certctl/docs/operator/privacy-and-retention.md
shankar0123 663b14bfd8 feat(retention): COMP-002-RETENTION — federated-user PII purge pipeline
Sprint 6 closure of the audit's MED-severity COMP-002-RETENTION
finding.

Pre-fix posture: the federated-user admin surface
(auth_users.go::Deactivate) sets users.deactivated_at on soft-delete,
but the PII columns (email, display_name, oidc_subject) stay
populated forever. No in-code primitive for GDPR right-to-be-
forgotten; no scheduled retention purge.

This commit ships the audit's recommended two-phase fix:

  Phase 1 — operator-callable scrub primitive
    internal/service/user_retention.go
      UserRetentionService.DeleteUserPII(ctx, userID):
        - revoke all active sessions (defense-in-depth)
        - email := 'purged@redacted.local'
        - display_name := '[purged]'
        - oidc_subject := 'sha256:' || hex(sha256(original))
        - audit_events row with action=user.purge_pii,
          category=auth, actor=system

      Why hash oidc_subject instead of NULL:
        1. (oidc_provider_id, oidc_subject) UNIQUE constraint would
           trip on multiple purged users converging to NULL
        2. The hash is one-way; the original IdP-side identifier is
           unrecoverable. Re-login under the same subject mints a
           fresh u-id (right-to-be-forgotten semantics)
        3. Forensic continuity: an operator can recompute
           sha256(<known-subject>) and confirm "this user was
           deactivated then purged"

      users.id itself is preserved so historical
      audit_events.actor = u-X rows still resolve. The forensic-
      attribution chain stays intact even after the PII is gone.

  Phase 2 — scheduled batch purge
    internal/scheduler/scheduler.go
      UserRetentionPurger interface + userRetentionLoop:
        - PurgeDeactivatedUsers enumerates every user with
          deactivated_at < NOW() - retention_window
        - DeleteUserPII per row
        - per-tick batch cap (default 200) keeps blast radius
          predictable; large backlogs spread across multiple ticks
        - atomic.Bool guard + 5-min per-tick context.WithTimeout

    Repository contract grew a single new method:
      internal/repository/user.go::ListDeactivatedBefore(ctx, t)
      internal/repository/postgres/user.go: SQL-side filter
      (deactivated_at IS NOT NULL AND deactivated_at < $1)
      ORDER BY deactivated_at ASC, cross-tenant.

  Configuration
    CERTCTL_USER_RETENTION_INTERVAL   default 24h
    CERTCTL_USER_RETENTION_WINDOW     default 30 days
    CERTCTL_USER_RETENTION_BATCH_CAP  default 200

  Test stub additions for repository.UserRepository.ListDeactivatedBefore:
    internal/auth/oidc/service_test.go::stubUsers
    internal/api/handler/auth_users_test.go::stubFullUserRepo
    internal/api/handler/auth_session_oidc_test.go::stubUserRepo

  Documentation
    docs/operator/privacy-and-retention.md
      - retention pipeline diagram (day-0 deactivate → day-N purge)
      - operator config table
      - verification runbook (4 steps with SQL)
      - what's NOT covered (deferred: DSAR export, api_keys cascade,
        retroactive audit_events.details redaction)

  Tests
    internal/service/user_retention_test.go (NEW, 4 tests):
      TestDeleteUserPII_ScrubsAndRevokes
      TestDeleteUserPII_IsIdempotent
      TestPurgeDeactivatedUsers_RespectsWindow
      TestPurgeDeactivatedUsers_BatchCap

Verified locally:
  go vet ./...                                   (clean)
  gofmt -l internal/ cmd/                        (clean)
  go test -short -count=1 \
    ./internal/service/... ./internal/scheduler/... ./internal/config/...
    (all green)

Cross-sprint interaction: pairs with COMP-001-HASH (prior commit).
The user.purge_pii audit row this service emits flows through the
new hash chain, so the scrub event is itself tamper-evident.

Closes COMP-002-RETENTION. Sprint 6 is complete (2/2 findings).
2026-05-16 06:18:39 +00:00


Privacy & retention (federated-user PII)

Last reviewed: 2026-05-16

Sprint 6 COMP-002-RETENTION closure. certctl stores three categories of personally-identifiable information for federated humans (Auth Bundle 2 OIDC users):

| Column | Source | Used by |
| --- | --- | --- |
| users.email | IdP claim (email) | Operator GUI "find user by email", display in lists, audit attribution. |
| users.display_name | IdP claim (name) | UI display string for the human. |
| users.oidc_subject | IdP claim (sub) | Stable identifier, joined with oidc_provider_id in the (provider, subject) UNIQUE constraint. |

Pre-fix, deactivating a user (admin-side auth.user.deactivate) soft-deleted the row by setting deactivated_at but left the PII columns populated indefinitely. The Sprint 6 fix adds an automatic purge pipeline.

Retention pipeline shape

Day 0   admin → POST /api/v1/auth/users/u-X/deactivate
                ├─ users.deactivated_at = NOW()
                └─ all active sessions for u-X revoked

Day N   scheduler's userRetentionLoop tick (default cadence 24h)
        └─ UserRetentionService.PurgeDeactivatedUsers
           ├─ SELECT users WHERE deactivated_at < NOW() - retention_window
           ├─ For each row (batch-capped per tick):
           │     UserRetentionService.DeleteUserPII(u.id)
           │     ├─ revoke all active sessions (defense-in-depth)
           │     ├─ email        := "purged@redacted.local"
           │     ├─ display_name := "[purged]"
           │     ├─ oidc_subject := "sha256:" || hex(sha256(original))
           │     └─ audit_events row (action=user.purge_pii, category=auth)

users.id is preserved. Historical audit_events.actor = u-X rows still resolve to the row (now scrubbed). This is the forensic-attribution guarantee — the operator can prove "user u-X performed action Y on date Z" even after the PII is gone.

oidc_subject is hashed, not nullified, for two reasons:

  1. The (oidc_provider_id, oidc_subject) UNIQUE constraint would trip if multiple purged users converged on the same NULL.
  2. Re-login under the same IdP subject creates a fresh row with a new u- prefixed ID, because GetByOIDCSubject won't match the hashed value and the original subject is unrecoverable from the hash. This is the "right-to-be-forgotten" behavior: the same human logging back in is functionally a new account.
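
The subject transformation described above can be illustrated with a short Go sketch. It is not the certctl source: the function names `scrubSubject` and `matchesPurged` are hypothetical, and only the output format ("sha256:" followed by the hex-encoded SHA-256 of the original) is taken from this document. The second helper shows how an operator who already knows a subject could recompute the hash and compare it against a scrubbed row.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// purgedPrefix marks an oidc_subject value that has been scrubbed.
const purgedPrefix = "sha256:"

// scrubSubject returns the replacement value for users.oidc_subject:
// "sha256:" followed by the hex-encoded SHA-256 digest of the original.
// (Hypothetical helper name; the output format follows the document.)
func scrubSubject(original string) string {
	sum := sha256.Sum256([]byte(original))
	return purgedPrefix + hex.EncodeToString(sum[:])
}

// matchesPurged reports whether a stored, scrubbed value was derived from
// a subject the caller already knows. The hash cannot be reversed, so this
// only works when the original subject is supplied from elsewhere.
func matchesPurged(stored, knownSubject string) bool {
	return stored == scrubSubject(knownSubject)
}

func main() {
	stored := scrubSubject("abc")
	fmt.Println(stored)
	fmt.Println(matchesPurged(stored, "abc"))          // true
	fmt.Println(matchesPurged(stored, "someone-else")) // false
}
```

Because distinct subjects produce distinct digests, purged rows keep distinct oidc_subject values, which is what lets the (oidc_provider_id, oidc_subject) constraint continue to hold after a scrub.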

Operator configuration

| Env var | Default | Notes |
| --- | --- | --- |
| CERTCTL_USER_RETENTION_INTERVAL | 24h | Tick cadence for the scheduler's userRetentionLoop. Zero or negative ignored. |
| CERTCTL_USER_RETENTION_WINDOW | 30 * 24h (30 days) | How long after deactivated_at a row's PII stays in the table. Operators with stricter GDPR/CCPA expectations may shorten. |
| CERTCTL_USER_RETENTION_BATCH_CAP | 200 | Per-tick row budget. Larger backlogs spread across multiple ticks. 0 = unbounded (test fixtures only). |

How to verify retention is working

  1. Deactivate a test user via the admin path:

    curl -X POST -H "X-API-Key: $ADMIN_KEY" \
      https://certctl.example.com/api/v1/auth/users/u-test/deactivate
    
  2. Confirm the row's deactivated_at is set:

    SELECT id, email, deactivated_at FROM users WHERE id = 'u-test';
    
  3. Backdate deactivated_at to past the retention window (only for testing — never in production):

    UPDATE users SET deactivated_at = NOW() - INTERVAL '60 days'
    WHERE id = 'u-test';
    

    (Note: this UPDATE will succeed because users doesn't have a WORM trigger; the audit-events WORM trigger is unrelated.)

  4. Wait for the next userRetentionLoop tick (or restart the server to force an immediate sweep). Confirm scrub:

    SELECT id, email, display_name, oidc_subject
      FROM users
     WHERE id = 'u-test';
    

    Expected: email = 'purged@redacted.local', display_name = '[purged]', oidc_subject LIKE 'sha256:%'.

  5. Confirm an audit row was emitted:

    SELECT id, actor, action, resource_id, timestamp
      FROM audit_events
     WHERE action = 'user.purge_pii' AND resource_id = 'u-test'
     ORDER BY timestamp DESC LIMIT 1;
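
The expected values from step 4 can also be checked programmatically, for example in a smoke test that reads the row back. The following Go sketch assumes a hypothetical `userRow` struct and `scrubProblems` helper; only the three expected column values come from this runbook. The 64-hex-character check is inferred from the documented hex(sha256(original)) format.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// userRow is a hypothetical struct holding the columns selected in step 4.
type userRow struct {
	Email       string
	DisplayName string
	OIDCSubject string
}

// scrubProblems returns one message per column that does not hold the
// expected post-purge value. An empty result means the row looks scrubbed.
func scrubProblems(r userRow) []string {
	var problems []string
	if r.Email != "purged@redacted.local" {
		problems = append(problems, "email not scrubbed")
	}
	if r.DisplayName != "[purged]" {
		problems = append(problems, "display_name not scrubbed")
	}
	const prefix = "sha256:"
	digest := strings.TrimPrefix(r.OIDCSubject, prefix)
	raw, err := hex.DecodeString(digest)
	// A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
	if !strings.HasPrefix(r.OIDCSubject, prefix) || err != nil || len(raw) != 32 {
		problems = append(problems, "oidc_subject is not sha256:<64 hex chars>")
	}
	return problems
}

func main() {
	scrubbed := userRow{
		Email:       "purged@redacted.local",
		DisplayName: "[purged]",
		OIDCSubject: "sha256:" + strings.Repeat("ab", 32),
	}
	fmt.Println(len(scrubProblems(scrubbed))) // 0

	untouched := userRow{Email: "alice@example.com", DisplayName: "Alice", OIDCSubject: "abc123"}
	for _, p := range scrubProblems(untouched) {
		fmt.Println(p)
	}
}
```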
    

What's NOT covered (deferred work)

The Sprint 6 fix is Phase 1 of the audit's COMP-002-RETENTION recommendation. Three further pieces are forward-looking:

  • GDPR data-subject access request (DSAR) export. A "show me everything you know about me" endpoint is not yet implemented. Operators on EU-resident data should treat this as a manual SQL procedure today; track for Phase 2.
  • Cascade purge of related rows. Sessions are revoked (above); api_keys with created_by = u-X are NOT yet purged on scrub. The api_keys table doesn't have a foreign key to users (it indexes by actor_id strings, free-form), so the cascade is a service-layer concern that needs explicit wiring. Track for Phase 2.
  • Per-event PII redaction in audit_events.details. The existing RedactDetailsForAudit (internal/service/audit_redact.go) scrubs credential + PII keys at write time. A future feature for "retroactively re-redact existing rows" would interact with the WORM trigger; out of scope today.

See also

  • internal/service/user_retention.go: UserRetentionService source.
  • internal/scheduler/scheduler.go::userRetentionLoop: scheduler loop.
  • migrations/000036_users.up.sql: users table definition.
  • migrations/000045_users_deactivated_at.up.sql: deactivated_at column.
  • docs/operator/audit-chain.md: paired Sprint 6 tamper-evidence work.