feat(retention): COMP-002-RETENTION — federated-user PII purge pipeline

Sprint 6 closure of the audit's MED-severity COMP-002-RETENTION
finding.

Pre-fix posture: the federated-user admin surface
(auth_users.go::Deactivate) sets users.deactivated_at on soft-delete,
but the PII columns (email, display_name, oidc_subject) stay
populated forever. No in-code primitive for GDPR right-to-be-
forgotten; no scheduled retention purge.

This commit ships the audit's recommended two-phase fix:

  Phase 1 — operator-callable scrub primitive
    internal/service/user_retention.go
      UserRetentionService.DeleteUserPII(ctx, userID):
        - revoke all active sessions (defense-in-depth)
        - email := 'purged@redacted.local'
        - display_name := '[purged]'
        - oidc_subject := 'sha256:' || hex(sha256(original))
        - audit_events row with action=user.purge_pii,
          category=auth, actor=system

      Why hash oidc_subject instead of NULL:
        1. (oidc_provider_id, oidc_subject) UNIQUE constraint would
           trip on multiple purged users converging to NULL
        2. The hash is one-way; the original IdP-side identifier is
           unrecoverable. Re-login under the same subject mints a
           fresh u-id (right-to-be-forgotten semantics)
        3. Forensic continuity: an operator can recompute
           sha256(<known-subject>) and confirm "this user was
           deactivated then purged"

      users.id itself is preserved so historical
      audit_events.actor = u-X rows still resolve. The forensic-
      attribution chain stays intact even after the PII is gone.

  Phase 2 — scheduled batch purge
    internal/scheduler/scheduler.go
      UserRetentionPurger interface + userRetentionLoop:
        - PurgeDeactivatedUsers enumerates every user with
          deactivated_at < NOW() - retention_window
        - DeleteUserPII per row
        - per-tick batch cap (default 200) keeps blast radius
          predictable; large backlogs spread across multiple ticks
        - atomic.Bool guard + 5-min per-tick context.WithTimeout

    Repository contract grew a single new method:
      internal/repository/user.go::ListDeactivatedBefore(ctx, t)
      internal/repository/postgres/user.go: SQL-side filter
      (deactivated_at IS NOT NULL AND deactivated_at < $1)
      ORDER BY deactivated_at ASC, cross-tenant.

  Configuration
    CERTCTL_USER_RETENTION_INTERVAL   default 24h
    CERTCTL_USER_RETENTION_WINDOW     default 30 days
    CERTCTL_USER_RETENTION_BATCH_CAP  default 200

  Test stub additions for repository.UserRepository.ListDeactivatedBefore:
    internal/auth/oidc/service_test.go::stubUsers
    internal/api/handler/auth_users_test.go::stubFullUserRepo
    internal/api/handler/auth_session_oidc_test.go::stubUserRepo

  Documentation
    docs/operator/privacy-and-retention.md
      - retention pipeline diagram (day-0 deactivate → day-N purge)
      - operator config table
      - verification runbook (5 steps, 4 with SQL)
      - what's NOT covered (deferred: DSAR export, api_keys cascade,
        retroactive audit_events.details redaction)

  Tests
    internal/service/user_retention_test.go (NEW, 4 tests):
      TestDeleteUserPII_ScrubsAndRevokes
      TestDeleteUserPII_IsIdempotent
      TestPurgeDeactivatedUsers_RespectsWindow
      TestPurgeDeactivatedUsers_BatchCap

Verified locally:
  go vet ./...                                   (clean)
  gofmt -l internal/ cmd/                        (clean)
  go test -short -count=1 \
    ./internal/service/... ./internal/scheduler/... ./internal/config/...
    (all green)

Cross-sprint interaction: pairs with COMP-001-HASH (prior commit).
The user.purge_pii audit row this service emits flows through the
new hash chain, so the scrub event is itself tamper-evident.

Closes COMP-002-RETENTION. Sprint 6 is complete (2/2 findings).
shankar0123
2026-05-16 06:18:39 +00:00
parent 43836aca7c
commit 663b14bfd8
11 changed files with 874 additions and 0 deletions
@@ -0,0 +1,136 @@
# Privacy & retention (federated-user PII)
> Last reviewed: 2026-05-16
Sprint 6 COMP-002-RETENTION closure. certctl stores three categories
of personally-identifiable information for federated humans (Auth
Bundle 2 OIDC users):
| Column | Source | Used by |
|---|---|---|
| `users.email` | IdP claim (`email`) | Operator GUI "find user by email", display in lists, audit attribution. |
| `users.display_name` | IdP claim (`name`) | UI display string for the human. |
| `users.oidc_subject` | IdP claim (`sub`) | Stable identifier — joined with `oidc_provider_id` in the (provider, subject) UNIQUE constraint. |
Pre-fix, deactivating a user (admin-side `auth.user.deactivate`)
soft-deleted the row by setting `deactivated_at` but left the PII
columns populated indefinitely. The Sprint 6 fix adds an automatic
purge pipeline.
## Retention pipeline shape
```
Day 0   admin → POST /api/v1/auth/users/u-X/deactivate
          ├─ users.deactivated_at = NOW()
          └─ all active sessions for u-X revoked
Day N   scheduler's userRetentionLoop tick (default cadence 24h)
          └─ UserRetentionService.PurgeDeactivatedUsers
               ├─ SELECT users WHERE deactivated_at < NOW() - retention_window
               ├─ For each row (batch-capped per tick):
               │    UserRetentionService.DeleteUserPII(u.id)
               │      ├─ revoke all active sessions (defense-in-depth)
               │      ├─ email := "purged@redacted.local"
               │      ├─ display_name := "[purged]"
               │      ├─ oidc_subject := "sha256:" || hex(sha256(original))
               │      └─ audit_events row (action=user.purge_pii, category=auth)
```
`users.id` is **preserved**. Historical `audit_events.actor = u-X`
rows still resolve to the row (now scrubbed). This is the
forensic-attribution guarantee — the operator can prove "user u-X
performed action Y on date Z" even after the PII is gone.
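The Day-N branch of the pipeline can be sketched in Go as follows. This is a minimal in-memory illustration, not the code in `internal/scheduler/scheduler.go`: the `user` struct, `selectPurgeable`, `tickGuard`, `runTick`, and `sampleRun` are hypothetical names, and the selection only mirrors the documented filter (deactivated before `now - window`, oldest first, capped per tick). The guard follows the commit message's description of an `atomic.Bool` guard plus a 5-minute per-tick timeout.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync/atomic"
	"time"
)

// user is a pared-down stand-in for a users row (hypothetical shape).
type user struct {
	ID            string
	DeactivatedAt *time.Time // nil means the user is still active
}

// selectPurgeable mirrors the documented filter: deactivated_at IS NOT NULL
// AND deactivated_at < now-window, oldest first, truncated to batchCap.
// A batchCap of 0 means unbounded.
func selectPurgeable(users []user, now time.Time, window time.Duration, batchCap int) []string {
	cutoff := now.Add(-window)
	var due []user
	for _, u := range users {
		if u.DeactivatedAt != nil && u.DeactivatedAt.Before(cutoff) {
			due = append(due, u)
		}
	}
	sort.SliceStable(due, func(i, j int) bool {
		return due[i].DeactivatedAt.Before(*due[j].DeactivatedAt)
	})
	if batchCap > 0 && len(due) > batchCap {
		due = due[:batchCap]
	}
	ids := make([]string, 0, len(due))
	for _, u := range due {
		ids = append(ids, u.ID)
	}
	return ids
}

// tickGuard stops a slow tick from overlapping with the next one.
type tickGuard struct{ running atomic.Bool }

func (g *tickGuard) tryStart() bool { return g.running.CompareAndSwap(false, true) }
func (g *tickGuard) finish()        { g.running.Store(false) }

// runTick runs one purge pass under the guard with a 5-minute deadline.
// It reports false when a previous tick is still in flight.
func runTick(parent context.Context, g *tickGuard, pass func(context.Context) error) (bool, error) {
	if !g.tryStart() {
		return false, nil
	}
	defer g.finish()
	ctx, cancel := context.WithTimeout(parent, 5*time.Minute)
	defer cancel()
	return true, pass(ctx)
}

// sampleRun builds a small fixture and returns the uncapped and capped selections.
func sampleRun() (all, capped []string) {
	now := time.Date(2026, 5, 16, 0, 0, 0, 0, time.UTC)
	day := 24 * time.Hour
	at := func(daysAgo int) *time.Time {
		t := now.Add(-time.Duration(daysAgo) * day)
		return &t
	}
	users := []user{
		{ID: "u-a", DeactivatedAt: at(60)},
		{ID: "u-b", DeactivatedAt: at(90)},
		{ID: "u-c", DeactivatedAt: at(1)}, // still inside the 30-day window
		{ID: "u-d"},                       // active user, never selected
	}
	return selectPurgeable(users, now, 30*day, 0), selectPurgeable(users, now, 30*day, 1)
}

func main() {
	var g tickGuard
	ran, err := runTick(context.Background(), &g, func(ctx context.Context) error {
		all, capped := sampleRun()
		fmt.Println("due:", all, "this tick:", capped)
		return nil
	})
	fmt.Println("ran:", ran, "err:", err)
}
```

With a cap of 1, only the oldest deactivated user is handled in the tick; the remaining due user waits for a later tick, which is the "backlogs spread across multiple ticks" behavior.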
`oidc_subject` is **hashed**, not nullified, for two reasons:
1. The `(oidc_provider_id, oidc_subject)` UNIQUE constraint would
trip if multiple purged users converged on the same NULL.
2. Re-login under the same IdP subject creates a fresh row (different
`u-` id) because `GetByOIDCSubject` won't match the hashed token —
the original subject is unrecoverable from the hash. This is the
"right-to-be-forgotten" behavior: the same human logging back in
is functionally a new account.
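The scrub itself can be illustrated with a short Go sketch. The `userRow` struct and the `hashSubject`, `scrub`, and `matchesPurged` helpers are hypothetical, not the actual `UserRetentionService` code. Skipping values that already carry the `sha256:` prefix is one assumed way to make a repeated purge a no-op; the real service may achieve idempotency differently, and this shortcut would skip a genuine subject that happens to start with `sha256:`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// purgedPrefix marks an oidc_subject value that has already been scrubbed.
const purgedPrefix = "sha256:"

// userRow is a pared-down stand-in for a users row (hypothetical shape).
type userRow struct {
	ID, Email, DisplayName, OIDCSubject string
}

// hashSubject returns "sha256:" followed by the hex SHA-256 digest of the
// subject. Assumption: already-prefixed values are returned unchanged so
// that scrubbing twice yields the same row.
func hashSubject(subject string) string {
	if strings.HasPrefix(subject, purgedPrefix) {
		return subject
	}
	sum := sha256.Sum256([]byte(subject))
	return purgedPrefix + hex.EncodeToString(sum[:])
}

// scrub returns a copy of row with the three PII columns replaced.
// The ID is left untouched so audit_events.actor references still resolve.
func scrub(row userRow) userRow {
	row.Email = "purged@redacted.local"
	row.DisplayName = "[purged]"
	row.OIDCSubject = hashSubject(row.OIDCSubject)
	return row
}

// matchesPurged reports whether a stored, scrubbed value corresponds to a
// subject the operator already knows (the forensic check from the commit
// message).
func matchesPurged(stored, knownSubject string) bool {
	return stored == hashSubject(knownSubject)
}

func main() {
	row := userRow{ID: "u-test", Email: "a@example.com", DisplayName: "Alice", OIDCSubject: "idp|12345"}
	purged := scrub(row)
	fmt.Printf("%+v\n", purged)
	fmt.Println(matchesPurged(purged.OIDCSubject, "idp|12345"))
	fmt.Println(matchesPurged(purged.OIDCSubject, "idp|someone-else"))
}
```

Because every distinct subject hashes to a distinct token, purged rows keep distinct `(oidc_provider_id, oidc_subject)` pairs, and a fresh login presenting the raw subject does not match the stored hash.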
## Operator configuration
| Env var | Default | Notes |
|---|---|---|
| `CERTCTL_USER_RETENTION_INTERVAL` | `24h` | Tick cadence for the scheduler's userRetentionLoop. Zero or negative values are ignored. |
| `CERTCTL_USER_RETENTION_WINDOW` | `30 * 24h` (30 days) | How long after `deactivated_at` a row's PII stays in the table. Operators with stricter GDPR/CCPA expectations may shorten it. |
| `CERTCTL_USER_RETENTION_BATCH_CAP` | `200` | Per-tick row budget. Larger backlogs spread across multiple ticks. 0 = unbounded (test fixtures only). |
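As an illustration of how these three variables might be read, here is a small Go sketch. It is not the project's `internal/config` code: `retentionConfig` and `loadRetentionConfig` are hypothetical names, the values are assumed to be Go duration strings (so 30 days is written `720h`), and falling back to the default for a zero, negative, or unparseable value is an assumption that the table only states for the interval.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// retentionConfig holds the three documented knobs (hypothetical type).
type retentionConfig struct {
	Interval time.Duration // tick cadence
	Window   time.Duration // how long PII survives after deactivation
	BatchCap int           // per-tick row budget, 0 = unbounded
}

// loadRetentionConfig starts from the documented defaults and overrides each
// field only when the environment supplies a usable value. The lookup
// function is injected so the logic can be exercised without touching the
// real environment.
func loadRetentionConfig(getenv func(string) string) retentionConfig {
	cfg := retentionConfig{
		Interval: 24 * time.Hour,
		Window:   30 * 24 * time.Hour,
		BatchCap: 200,
	}
	// Assumption: zero, negative, or malformed durations keep the default.
	if d, err := time.ParseDuration(getenv("CERTCTL_USER_RETENTION_INTERVAL")); err == nil && d > 0 {
		cfg.Interval = d
	}
	if d, err := time.ParseDuration(getenv("CERTCTL_USER_RETENTION_WINDOW")); err == nil && d > 0 {
		cfg.Window = d
	}
	// 0 is accepted here because the table defines it as "unbounded".
	if n, err := strconv.Atoi(getenv("CERTCTL_USER_RETENTION_BATCH_CAP")); err == nil && n >= 0 {
		cfg.BatchCap = n
	}
	return cfg
}

func main() {
	cfg := loadRetentionConfig(os.Getenv)
	fmt.Printf("interval=%s window=%s batch_cap=%d\n", cfg.Interval, cfg.Window, cfg.BatchCap)
}
```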
## How to verify retention is working
1. Deactivate a test user via the admin path:
```bash
curl -X POST -H "X-API-Key: $ADMIN_KEY" \
https://certctl.example.com/api/v1/auth/users/u-test/deactivate
```
2. Confirm the row's `deactivated_at` is set:
```sql
SELECT id, email, deactivated_at FROM users WHERE id = 'u-test';
```
3. Backdate `deactivated_at` to past the retention window (only for
testing — never in production):
```sql
UPDATE users SET deactivated_at = NOW() - INTERVAL '60 days'
WHERE id = 'u-test';
```
(Note: this UPDATE will succeed because `users` doesn't have a
WORM trigger; the audit-events WORM trigger is unrelated.)
4. Wait for the next `userRetentionLoop` tick (or restart the server
to force an immediate sweep). Confirm scrub:
```sql
SELECT id, email, display_name, oidc_subject
FROM users
WHERE id = 'u-test';
```
Expected: `email = 'purged@redacted.local'`,
`display_name = '[purged]'`,
`oidc_subject LIKE 'sha256:%'`.
5. Confirm an audit row was emitted:
```sql
SELECT id, actor, action, resource_id, timestamp
FROM audit_events
WHERE action = 'user.purge_pii' AND resource_id = 'u-test'
ORDER BY timestamp DESC LIMIT 1;
```
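The expectations from step 4 can also be expressed as a small Go check, for example inside an operator smoke test. This is a sketch only: `userRow` and `scrubProblems` are hypothetical names, and requiring a 32-byte hex digest after the `sha256:` prefix is inferred from the pipeline description rather than from the `LIKE 'sha256:%'` check in the runbook.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// userRow carries the three columns selected in step 4 (hypothetical type).
type userRow struct {
	Email, DisplayName, OIDCSubject string
}

// scrubProblems returns one message per column that does not look scrubbed.
// An empty result means the row matches the expected post-purge state.
func scrubProblems(r userRow) []string {
	var problems []string
	if r.Email != "purged@redacted.local" {
		problems = append(problems, "email not scrubbed")
	}
	if r.DisplayName != "[purged]" {
		problems = append(problems, "display_name not scrubbed")
	}
	if !strings.HasPrefix(r.OIDCSubject, "sha256:") {
		problems = append(problems, "oidc_subject lacks sha256: prefix")
	} else {
		digest := strings.TrimPrefix(r.OIDCSubject, "sha256:")
		// A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
		if raw, err := hex.DecodeString(digest); err != nil || len(raw) != 32 {
			problems = append(problems, "oidc_subject digest is not 32 hex-encoded bytes")
		}
	}
	return problems
}

func main() {
	row := userRow{
		Email:       "purged@redacted.local",
		DisplayName: "[purged]",
		OIDCSubject: "sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
	}
	fmt.Println(len(scrubProblems(row)) == 0)
}
```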
## What's NOT covered (deferred work)
The Sprint 6 fix is Phase 1 of the audit's COMP-002-RETENTION
recommendation. Three further pieces are forward-looking:
- **GDPR data-subject access request (DSAR) export.** A "show me
everything you know about me" endpoint is not yet implemented.
Operators on EU-resident data should treat this as a manual SQL
procedure today; track for Phase 2.
- **Cascade purge of related rows.** Sessions are revoked (above);
api_keys with `created_by = u-X` are NOT yet purged on scrub. The
api_keys table doesn't have a foreign key to users (it indexes by
`actor_id` strings, free-form), so the cascade is a service-layer
concern that needs explicit wiring. Track for Phase 2.
- **Per-event PII redaction in `audit_events.details`.** The existing
`RedactDetailsForAudit` (`internal/service/audit_redact.go`) scrubs
credential + PII keys at write time. A future feature for
"retroactively re-redact existing rows" would interact with the WORM
trigger; out of scope today.
## See also
- `internal/service/user_retention.go` — `UserRetentionService` source.
- `internal/scheduler/scheduler.go::userRetentionLoop` — scheduler loop.
- `migrations/000036_users.up.sql` — `users` table definition.
- `migrations/000045_users_deactivated_at.up.sql` — `deactivated_at` column.
- `docs/operator/audit-chain.md` — paired Sprint 6 tamper-evidence work.