feat(retention): COMP-002-RETENTION — federated-user PII purge pipeline

Sprint 6 closure of the audit's MED-severity COMP-002-RETENTION
finding.

Pre-fix posture: the federated-user admin surface
(auth_users.go::Deactivate) sets users.deactivated_at on soft-delete,
but the PII columns (email, display_name, oidc_subject) stay
populated forever. No in-code primitive for GDPR right-to-be-
forgotten; no scheduled retention purge.

This commit ships the audit's recommended two-phase fix:

  Phase 1 — operator-callable scrub primitive
    internal/service/user_retention.go
      UserRetentionService.DeleteUserPII(ctx, userID):
        - revoke all active sessions (defense-in-depth)
        - email := 'purged@redacted.local'
        - display_name := '[purged]'
        - oidc_subject := 'sha256:' || hex(sha256(original))
        - audit_events row with action=user.purge_pii,
          category=auth, actor=system

      Why hash oidc_subject instead of NULL:
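        1. (oidc_provider_id, oidc_subject) is UNIQUE, so purged rows
           cannot share one placeholder value; NULL would only be safe
           if the column is nullable and the constraint keeps
           PostgreSQL's default NULLS DISTINCT behavior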
        2. The hash is one-way; the original IdP-side identifier is
           unrecoverable. Re-login under the same subject mints a
           fresh u-id (right-to-be-forgotten semantics)
        3. Forensic continuity: an operator can recompute
           sha256(<known-subject>) and confirm "this user was
           deactivated then purged"

      users.id itself is preserved so historical
      audit_events.actor = u-X rows still resolve. The forensic-
      attribution chain stays intact even after the PII is gone.
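
A minimal sketch of the oidc_subject scrub and the forensic recompute
described above. The helper names (digestSubject, scrubSubject,
matchesPurged) are hypothetical and are not taken from
user_retention.go. Skipping values that already carry the prefix is an
assumption about how a repeated purge avoids hashing the hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// purgedPrefix marks an oidc_subject that has already been scrubbed.
const purgedPrefix = "sha256:"

// digestSubject returns 'sha256:' followed by the hex SHA-256 of subject.
func digestSubject(subject string) string {
	sum := sha256.Sum256([]byte(subject))
	return purgedPrefix + hex.EncodeToString(sum[:])
}

// scrubSubject is the idempotent form: a value that is already
// prefixed is returned unchanged (assumption, see lead-in).
func scrubSubject(current string) string {
	if strings.HasPrefix(current, purgedPrefix) {
		return current
	}
	return digestSubject(current)
}

// matchesPurged reports whether a stored, scrubbed value corresponds
// to a subject the operator already knows from the IdP side.
func matchesPurged(stored, knownSubject string) bool {
	return stored == digestSubject(knownSubject)
}
```

An operator holding a known subject can call matchesPurged against the
stored column value to confirm that a purged row belonged to that
identity, without the original ever being recoverable from the row.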

  Phase 2 — scheduled batch purge
    internal/scheduler/scheduler.go
      UserRetentionPurger interface + userRetentionLoop:
        - PurgeDeactivatedUsers enumerates every user with
          deactivated_at < NOW() - retention_window
        - DeleteUserPII per row
        - per-tick batch cap (default 200) keeps blast radius
          predictable; large backlogs spread across multiple ticks
        - atomic.Bool guard + 5-min per-tick context.WithTimeout

    Repository contract grew a single new method:
      internal/repository/user.go::ListDeactivatedBefore(ctx, t)
      internal/repository/postgres/user.go: SQL-side filter
      (deactivated_at IS NOT NULL AND deactivated_at < $1)
      ORDER BY deactivated_at ASC. The query is cross-tenant.

  Configuration
    CERTCTL_USER_RETENTION_INTERVAL   default 24h
    CERTCTL_USER_RETENTION_WINDOW     default 30 days
    CERTCTL_USER_RETENTION_BATCH_CAP  default 200

  Test stub additions for repository.UserRepository.ListDeactivatedBefore:
    internal/auth/oidc/service_test.go::stubUsers
    internal/api/handler/auth_users_test.go::stubFullUserRepo
    internal/api/handler/auth_session_oidc_test.go::stubUserRepo

  Documentation
    docs/operator/privacy-and-retention.md
      - retention pipeline diagram (day-0 deactivate → day-N purge)
      - operator config table
      - verification runbook (4 steps with SQL)
      - what's NOT covered (deferred: DSAR export, api_keys cascade,
        retroactive audit_events.details redaction)

  Tests
    internal/service/user_retention_test.go (NEW, 4 tests):
      TestDeleteUserPII_ScrubsAndRevokes
      TestDeleteUserPII_IsIdempotent
      TestPurgeDeactivatedUsers_RespectsWindow
      TestPurgeDeactivatedUsers_BatchCap

Verified locally:
  go vet ./...                                   (clean)
  gofmt -l internal/ cmd/                        (clean)
  go test -short -count=1 \
    ./internal/service/... ./internal/scheduler/... ./internal/config/...
    (all green)

Cross-sprint interaction: pairs with COMP-001-HASH (prior commit).
The user.purge_pii audit row this service emits flows through the
new hash chain, so the scrub event is itself tamper-evident.

Closes COMP-002-RETENTION. Sprint 6 is complete (2/2 findings).
shankar0123
2026-05-16 06:18:39 +00:00
parent 43836aca7c
commit 663b14bfd8
11 changed files with 874 additions and 0 deletions
+31
@@ -8,6 +8,7 @@ import (
	"database/sql"
	"errors"
	"fmt"
	"time"

	"github.com/lib/pq"
@@ -177,3 +178,33 @@ func (r *UserRepository) ListAll(ctx context.Context, tenantID string) ([]*userd
	}
	return out, rows.Err()
}

// ListDeactivatedBefore returns every user (across all tenants) whose
// deactivated_at is not NULL AND strictly before threshold. Sprint 6
// COMP-002-RETENTION — the userRetentionLoop in the scheduler walks
// this list per tick and calls UserRetentionService.DeleteUserPII on
// each. Cross-tenant on purpose: a single retention policy spans the
// whole control plane.
func (r *UserRepository) ListDeactivatedBefore(ctx context.Context, threshold time.Time) ([]*userdomain.User, error) {
	rows, err := r.db.QueryContext(ctx,
		`SELECT `+userColumns+`
		 FROM users
		 WHERE deactivated_at IS NOT NULL
		   AND deactivated_at < $1
		 ORDER BY deactivated_at ASC`,
		threshold)
	if err != nil {
		return nil, fmt.Errorf("users list_deactivated_before: %w", err)
	}
	defer rows.Close()

	var out []*userdomain.User
	for rows.Next() {
		u, err := scanUser(rows)
		if err != nil {
			return nil, fmt.Errorf("users scan: %w", err)
		}
		out = append(out, u)
	}
	return out, rows.Err()
}
+9
@@ -6,6 +6,7 @@ package repository
import (
	"context"
	"errors"
	"time"

	userdomain "github.com/certctl-io/certctl/internal/auth/user/domain"
)
@@ -46,4 +47,12 @@ type UserRepository interface {
	// ListAll returns every user in the tenant. Order:
	// created_at ASC. Used by the GUI's admin surface.
	ListAll(ctx context.Context, tenantID string) ([]*userdomain.User, error)

	// ListDeactivatedBefore returns every user whose deactivated_at is
	// not NULL AND strictly before the supplied threshold. Sprint 6
	// COMP-002-RETENTION closure — the scheduler's userRetentionLoop
	// uses this to enumerate purge-eligible rows on each tick. Order:
	// deactivated_at ASC (oldest first, so a tick-budget cap is
	// deterministic about which rows it processes).
	ListDeactivatedBefore(ctx context.Context, threshold time.Time) ([]*userdomain.User, error)
}