feat(retention): COMP-002-RETENTION — federated-user PII purge pipeline

Sprint 6 closure of the audit's MED-severity COMP-002-RETENTION
finding.

Pre-fix posture: the federated-user admin surface
(auth_users.go::Deactivate) sets users.deactivated_at on soft-delete,
but the PII columns (email, display_name, oidc_subject) stay
populated indefinitely. There is no in-code primitive for the GDPR
right to be forgotten and no scheduled retention purge.

This commit ships the audit's recommended two-phase fix:

  Phase 1 — operator-callable scrub primitive
    internal/service/user_retention.go
      UserRetentionService.DeleteUserPII(ctx, userID):
        - revoke all active sessions (defense-in-depth)
        - email := 'purged@redacted.local'
        - display_name := '[purged]'
        - oidc_subject := 'sha256:' || hex(sha256(original))
        - audit_events row with action=user.purge_pii,
          category=auth, actor=system

      Why hash oidc_subject instead of NULL:
        1. (oidc_provider_id, oidc_subject) UNIQUE constraint would
           trip on multiple purged users converging to NULL if it is
           declared NULLS NOT DISTINCT (PostgreSQL's default treats
           NULLs as distinct)
        2. The hash is one-way; the original IdP-side identifier is
           unrecoverable. Re-login under the same subject mints a
           fresh u-id (right-to-be-forgotten semantics)
        3. Forensic continuity: an operator can recompute
           sha256(<known-subject>) and confirm "this user was
           deactivated then purged"

      users.id itself is preserved so historical
      audit_events.actor = u-X rows still resolve. The forensic-
      attribution chain stays intact even after the PII is gone.
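
The oidc_subject rewrite described above can be sketched as follows. This is a minimal illustration, not the code in user_retention.go: the function name TombstoneSubject and the prefix check used for idempotency are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// tombstonePrefix marks a subject that has already been scrubbed.
const tombstonePrefix = "sha256:"

// TombstoneSubject (hypothetical name) replaces an OIDC subject with
// "sha256:" + hex(sha256(subject)). A value that already carries the
// prefix is returned unchanged, so a repeated purge does not hash the hash.
func TombstoneSubject(subject string) string {
	if strings.HasPrefix(subject, tombstonePrefix) {
		return subject
	}
	sum := sha256.Sum256([]byte(subject))
	return tombstonePrefix + hex.EncodeToString(sum[:])
}

func main() {
	first := TombstoneSubject("auth0|12345")
	second := TombstoneSubject(first)
	// 7-character prefix plus 64 hex characters.
	fmt.Println(len(first), first == second) // prints "71 true"
}
```

An operator who knows the original subject can call the same function and compare the result with the stored column, which is the forensic check from point 3. One limitation of the prefix check: a genuine IdP subject that happens to start with "sha256:" would be left unhashed, so a real implementation may prefer a separate purged marker.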

  Phase 2 — scheduled batch purge
    internal/scheduler/scheduler.go
      UserRetentionPurger interface + userRetentionLoop:
        - PurgeDeactivatedUsers enumerates every user with
          deactivated_at < NOW() - retention_window
        - DeleteUserPII per row
        - per-tick batch cap (default 200) keeps blast radius
          predictable; large backlogs spread across multiple ticks
        - atomic.Bool guard + 5-min per-tick context.WithTimeout

    Repository contract grew a single new method:
      internal/repository/user.go::ListDeactivatedBefore(ctx, t)
      internal/repository/postgres/user.go: SQL-side filter
      (deactivated_at IS NOT NULL AND deactivated_at < $1)
      ORDER BY deactivated_at ASC, cross-tenant.

  Configuration
    CERTCTL_USER_RETENTION_INTERVAL   default 24h
    CERTCTL_USER_RETENTION_WINDOW     default 30 days
    CERTCTL_USER_RETENTION_BATCH_CAP  default 200

  Test stub additions for repository.UserRepository.ListDeactivatedBefore:
    internal/auth/oidc/service_test.go::stubUsers
    internal/api/handler/auth_users_test.go::stubFullUserRepo
    internal/api/handler/auth_session_oidc_test.go::stubUserRepo

  Documentation
    docs/operator/privacy-and-retention.md
      - retention pipeline diagram (day-0 deactivate → day-N purge)
      - operator config table
      - verification runbook (4 steps with SQL)
      - what's NOT covered (deferred: DSAR export, api_keys cascade,
        retroactive audit_events.details redaction)

  Tests
    internal/service/user_retention_test.go (NEW, 4 tests):
      TestDeleteUserPII_ScrubsAndRevokes
      TestDeleteUserPII_IsIdempotent
      TestPurgeDeactivatedUsers_RespectsWindow
      TestPurgeDeactivatedUsers_BatchCap

Verified locally:
  go vet ./...                                   (clean)
  gofmt -l internal/ cmd/                        (clean)
  go test -short -count=1 \
    ./internal/service/... ./internal/scheduler/... ./internal/config/...
    (all green)

Cross-sprint interaction: pairs with COMP-001-HASH (prior commit).
The user.purge_pii audit row this service emits flows through the
new hash chain, so the scrub event is itself tamper-evident.

Closes COMP-002-RETENTION. Sprint 6 is complete (2/2 findings).
shankar0123
2026-05-16 06:18:39 +00:00
parent 43836aca7c
commit 663b14bfd8
11 changed files with 874 additions and 0 deletions
@@ -108,6 +108,10 @@ type Config struct {
	// cadence. Scheduler loop auditChainVerifyLoop reads VerifyInterval;
	// the metric-side counter is wired separately in cmd/server/main.go.
	AuditChain AuditChainConfig
	// UserRetention holds the Sprint 6 COMP-002-RETENTION purge cadence
	// + window. The scheduler's userRetentionLoop reads Interval; the
	// UserRetentionService reads RetentionWindow + BatchCap.
	UserRetention UserRetentionConfig
}
// AuditChainConfig configures the audit_events tamper-evidence
@@ -126,6 +130,26 @@ type AuditChainConfig struct {
	VerifyInterval time.Duration
}
// UserRetentionConfig configures the Sprint 6 COMP-002-RETENTION user
// PII purge sweeper. The scheduler's userRetentionLoop walks every
// user with deactivated_at older than RetentionWindow and scrubs the
// PII columns via UserRetentionService.DeleteUserPII.
type UserRetentionConfig struct {
	// Interval is the tick cadence. Default 24h.
	// Setting: CERTCTL_USER_RETENTION_INTERVAL.
	Interval time.Duration
	// RetentionWindow is how long after deactivated_at a row's PII
	// stays in the table. Default 30 days. Operators with strict
	// GDPR / CCPA expectations may shorten; operators who need
	// forensic recovery latitude may lengthen.
	// Setting: CERTCTL_USER_RETENTION_WINDOW.
	RetentionWindow time.Duration
	// BatchCap bounds how many users a single tick processes. Default
	// 200, which keeps blast radius predictable. Set to 0 to disable
	// the cap (test fixtures only).
	// Setting: CERTCTL_USER_RETENTION_BATCH_CAP.
	BatchCap int
}
// OCSPResponderConfig configures the dedicated OCSP-responder cert
// per issuer (RFC 6960 §2.6 + §4.2.2.2). When unset, the local issuer
@@ -724,6 +748,11 @@ func Load() (*Config, error) {
		AuditChain: AuditChainConfig{
			VerifyInterval: getEnvDuration("CERTCTL_AUDIT_CHAIN_VERIFY_INTERVAL", 6*time.Hour),
		},
		UserRetention: UserRetentionConfig{
			Interval:        getEnvDuration("CERTCTL_USER_RETENTION_INTERVAL", 24*time.Hour),
			RetentionWindow: getEnvDuration("CERTCTL_USER_RETENTION_WINDOW", 30*24*time.Hour),
			BatchCap:        getEnvInt("CERTCTL_USER_RETENTION_BATCH_CAP", 200),
		},
	}
	// Parse CERTCTL_API_KEYS_NAMED for named key authentication (M-002).