Sprint 6 closure of the audit's MED-severity COMP-002-RETENTION
finding.
Pre-fix posture: the federated-user admin surface
(auth_users.go::Deactivate) sets users.deactivated_at on soft-delete,
but the PII columns (email, display_name, oidc_subject) stay
populated forever. No in-code primitive for GDPR right-to-be-
forgotten; no scheduled retention purge.
This commit ships the audit's recommended two-phase fix:
Phase 1 — operator-callable scrub primitive
internal/service/user_retention.go
UserRetentionService.DeleteUserPII(ctx, userID):
- revoke all active sessions (defense-in-depth)
- email := 'purged@redacted.local'
- display_name := '[purged]'
- oidc_subject := 'sha256:' || hex(sha256(original))
- audit_events row with action=user.purge_pii,
category=auth, actor=system
Why hash oidc_subject instead of NULL:
1. (oidc_provider_id, oidc_subject) UNIQUE constraint would
trip on multiple purged users converging to NULL
2. The hash is one-way; the original IdP-side identifier is
unrecoverable. Re-login under the same subject mints a
fresh u-id (right-to-be-forgotten semantics)
3. Forensic continuity: an operator can recompute
sha256(<known-subject>) and confirm "this user was
deactivated then purged"
users.id itself is preserved so historical
audit_events.actor = u-X rows still resolve. The forensic-
attribution chain stays intact even after the PII is gone.
Phase 2 — scheduled batch purge
internal/scheduler/scheduler.go
UserRetentionPurger interface + userRetentionLoop:
- PurgeDeactivatedUsers enumerates every user with
deactivated_at < NOW() - retention_window
- DeleteUserPII per row
- per-tick batch cap (default 200) keeps blast radius
predictable; large backlogs spread across multiple ticks
- atomic.Bool guard + 5-min per-tick context.WithTimeout
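The overlap guard and per-tick timeout listed above can be sketched roughly as follows. This is a minimal illustration, not the code in scheduler.go: the PurgeDeactivatedUsers signature, the retentionRunner and countingPurger types, and the demo helper are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// UserRetentionPurger mirrors the interface named in the commit.
// Assumption: the real method signature may differ.
type UserRetentionPurger interface {
	PurgeDeactivatedUsers(ctx context.Context) (int, error)
}

// retentionRunner (hypothetical) holds the overlap guard for one loop.
type retentionRunner struct {
	purger  UserRetentionPurger
	running atomic.Bool
	timeout time.Duration
}

// tick runs one sweep unless the previous sweep is still in flight.
// It reports whether a sweep was actually started.
func (r *retentionRunner) tick(parent context.Context) bool {
	if !r.running.CompareAndSwap(false, true) {
		return false // previous tick still running; skip this one
	}
	defer r.running.Store(false)

	// Bound the sweep so a stuck database call cannot hold the guard forever.
	ctx, cancel := context.WithTimeout(parent, r.timeout)
	defer cancel()
	if _, err := r.purger.PurgeDeactivatedUsers(ctx); err != nil {
		fmt.Println("retention sweep failed:", err)
	}
	return true
}

// loop fires tick on every interval until ctx is cancelled.
func (r *retentionRunner) loop(ctx context.Context, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			r.tick(ctx)
		}
	}
}

// countingPurger is a test double that only counts invocations.
type countingPurger struct{ calls int }

func (c *countingPurger) PurgeDeactivatedUsers(ctx context.Context) (int, error) {
	c.calls++
	return 0, nil
}

// demo runs one normal tick, then one tick while a sweep is marked in flight.
func demo() (ran, skipped bool, calls int) {
	p := &countingPurger{}
	r := &retentionRunner{purger: p, timeout: 5 * time.Minute}
	ctx := context.Background()
	ran = r.tick(ctx)      // sweep runs
	r.running.Store(true)  // simulate a sweep still in flight
	skipped = !r.tick(ctx) // guard skips this tick
	return ran, skipped, p.calls
}

func main() {
	fmt.Println(demo())
}
```

Skipping an overlapping tick, rather than queueing it, means a slow sweep delays the next one by at most one interval.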
Repository contract grew a single new method:
internal/repository/user.go::ListDeactivatedBefore(ctx, t)
internal/repository/postgres/user.go: SQL-side filter
(deactivated_at IS NOT NULL AND deactivated_at < $1)
ORDER BY deactivated_at ASC, cross-tenant.
Configuration
CERTCTL_USER_RETENTION_INTERVAL default 24h
CERTCTL_USER_RETENTION_WINDOW default 30 days
CERTCTL_USER_RETENTION_BATCH_CAP default 200
Test stub additions for repository.UserRepository.ListDeactivatedBefore:
internal/auth/oidc/service_test.go::stubUsers
internal/api/handler/auth_users_test.go::stubFullUserRepo
internal/api/handler/auth_session_oidc_test.go::stubUserRepo
Documentation
docs/operator/privacy-and-retention.md
- retention pipeline diagram (day-0 deactivate → day-N purge)
- operator config table
- verification runbook (4 steps with SQL)
- what's NOT covered (deferred: DSAR export, api_keys cascade,
retroactive audit_events.details redaction)
Tests
internal/service/user_retention_test.go (NEW, 4 tests):
TestDeleteUserPII_ScrubsAndRevokes
TestDeleteUserPII_IsIdempotent
TestPurgeDeactivatedUsers_RespectsWindow
TestPurgeDeactivatedUsers_BatchCap
Verified locally:
go vet ./... (clean)
gofmt -l internal/ cmd/ (clean)
go test -short -count=1 \
./internal/service/... ./internal/scheduler/... ./internal/config/...
(all green)
Cross-sprint interaction: pairs with COMP-001-HASH (prior commit).
The user.purge_pii audit row this service emits flows through the
new hash chain, so the scrub event is itself tamper-evident.
Closes COMP-002-RETENTION. Sprint 6 is complete (2/2 findings).
Privacy & retention (federated-user PII)
Last reviewed: 2026-05-16
Sprint 6 COMP-002-RETENTION closure. certctl stores three categories of personally-identifiable information for federated humans (Auth Bundle 2 OIDC users):
| Column | Source | Used by |
|---|---|---|
| `users.email` | IdP claim (`email`) | Operator GUI "find user by email", display in lists, audit attribution. |
| `users.display_name` | IdP claim (`name`) | UI display string for the human. |
| `users.oidc_subject` | IdP claim (`sub`) | Stable identifier; joined with `oidc_provider_id` in the `(provider, subject)` UNIQUE constraint. |
Pre-fix, deactivating a user (admin-side auth.user.deactivate)
soft-deleted the row by setting deactivated_at but left the PII
columns populated indefinitely. The Sprint 6 fix adds an automatic
purge pipeline.
Retention pipeline shape

    Day 0   admin → POST /api/v1/auth/users/u-X/deactivate
              ├─ users.deactivated_at = NOW()
              └─ all active sessions for u-X revoked

    Day N   scheduler's userRetentionLoop tick (default cadence 24h)
              └─ UserRetentionService.PurgeDeactivatedUsers
                   ├─ SELECT users WHERE deactivated_at < NOW() - retention_window
                   ├─ For each row (batch-capped per tick):
                   │    UserRetentionService.DeleteUserPII(u.id)
                   │      ├─ revoke all active sessions (defense-in-depth)
                   │      ├─ email := "purged@redacted.local"
                   │      ├─ display_name := "[purged]"
                   │      ├─ oidc_subject := "sha256:" || hex(sha256(original))
                   │      └─ audit_events row (action=user.purge_pii, category=auth)

users.id is preserved. Historical audit_events.actor = u-X
rows still resolve to the row (now scrubbed). This is the
forensic-attribution guarantee — the operator can prove "user u-X
performed action Y on date Z" even after the PII is gone.
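
The window and batch-cap behavior of the pipeline can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the service's code: the `User` struct, `purgeDue`, and `demo` are hypothetical names. In particular, skipping rows that already carry the purged sentinel is an assumption made here so that a backlog larger than the cap drains over successive ticks.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// User is a pared-down stand-in for a users row (hypothetical shape).
type User struct {
	ID            string
	Email         string
	DisplayName   string
	DeactivatedAt *time.Time // nil while the user is active
}

const purgedEmail = "purged@redacted.local"

// purgeDue scrubs users whose DeactivatedAt is older than now-window, oldest
// first, stopping after batchCap rows (0 means unbounded). It returns the IDs
// scrubbed on this tick. users.id is never modified.
func purgeDue(users []*User, now time.Time, window time.Duration, batchCap int) []string {
	cutoff := now.Add(-window)
	var due []*User
	for _, u := range users {
		if u.DeactivatedAt == nil || !u.DeactivatedAt.Before(cutoff) {
			continue // active, or still inside the retention window
		}
		if u.Email == purgedEmail {
			continue // assumption: already-scrubbed rows are skipped
		}
		due = append(due, u)
	}
	// Oldest deactivation first, matching ORDER BY deactivated_at ASC.
	sort.Slice(due, func(i, j int) bool {
		return due[i].DeactivatedAt.Before(*due[j].DeactivatedAt)
	})
	if batchCap > 0 && len(due) > batchCap {
		due = due[:batchCap]
	}
	var scrubbed []string
	for _, u := range due {
		u.Email = purgedEmail
		u.DisplayName = "[purged]"
		scrubbed = append(scrubbed, u.ID)
	}
	return scrubbed
}

// demo runs three ticks with a 30-day window and a batch cap of 1 over users
// deactivated 10, 60, and 90 days ago, plus one active user.
func demo() (first, second, third []string) {
	now := time.Date(2026, 5, 16, 0, 0, 0, 0, time.UTC)
	ago := func(days int) *time.Time { t := now.AddDate(0, 0, -days); return &t }
	users := []*User{
		{ID: "u-recent", Email: "c@example.com", DeactivatedAt: ago(10)},
		{ID: "u-old", Email: "b@example.com", DeactivatedAt: ago(60)},
		{ID: "u-oldest", Email: "a@example.com", DeactivatedAt: ago(90)},
		{ID: "u-active", Email: "d@example.com"},
	}
	window := 30 * 24 * time.Hour
	first = purgeDue(users, now, window, 1)
	second = purgeDue(users, now, window, 1)
	third = purgeDue(users, now, window, 1)
	return first, second, third
}

func main() {
	fmt.Println(demo())
}
```

With a cap of 1, the oldest row is scrubbed on the first tick and the next-oldest on the second. The user deactivated 10 days ago is left alone until it ages past the window.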
oidc_subject is hashed, not nullified, for two reasons:
- The `(oidc_provider_id, oidc_subject)` UNIQUE constraint would trip if multiple purged users converged on the same NULL.
- Re-login under the same IdP subject creates a fresh row (different `u-id`) because `GetByOIDCSubject` won't match the hashed token, and the original subject is unrecoverable from the hash. This is the "right-to-be-forgotten" behavior: the same human logging back in is functionally a new account.
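
The hashing step can be illustrated with a short sketch. It assumes the stored value is the literal prefix `sha256:` followed by the lowercase hex digest of the raw subject bytes, as the pipeline diagram suggests. `ScrubSubject` and `MatchesKnownSubject` are hypothetical names, and the prefix check used for idempotency is an assumption about how a repeated purge avoids hashing the hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// purgedPrefix marks an oidc_subject that has already been scrubbed.
const purgedPrefix = "sha256:"

// ScrubSubject returns the one-way replacement for an OIDC subject:
// "sha256:" followed by the lowercase hex SHA-256 of the original.
// Assumption: an already-prefixed value is returned unchanged so that a
// repeated purge is a no-op. A real subject that happens to start with
// "sha256:" would need separate handling.
func ScrubSubject(subject string) string {
	if strings.HasPrefix(subject, purgedPrefix) {
		return subject
	}
	sum := sha256.Sum256([]byte(subject))
	return purgedPrefix + hex.EncodeToString(sum[:])
}

// MatchesKnownSubject recomputes the hash of a subject the operator already
// knows and compares it with the stored column value.
func MatchesKnownSubject(stored, known string) bool {
	return stored == ScrubSubject(known)
}

func main() {
	stored := ScrubSubject("abc")
	fmt.Println(stored)
	// A lookup by the raw subject no longer matches the stored value,
	// so a re-login would be treated as a new user.
	fmt.Println(stored == "abc")
	// An operator who knows the original subject can still confirm the link.
	fmt.Println(MatchesKnownSubject(stored, "abc"))
}
```

Because distinct subjects hash to distinct values, scrubbed rows stay unique per provider, while a lookup keyed on the raw subject misses.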
Operator configuration
| Env var | Default | Notes |
|---|---|---|
| `CERTCTL_USER_RETENTION_INTERVAL` | `24h` | Tick cadence for the scheduler's `userRetentionLoop`. Zero or negative ignored. |
| `CERTCTL_USER_RETENTION_WINDOW` | `30 * 24h` (30 days) | How long after `deactivated_at` a row's PII stays in the table. Operators with stricter GDPR/CCPA expectations may shorten. |
| `CERTCTL_USER_RETENTION_BATCH_CAP` | `200` | Per-tick row budget. Larger backlogs spread across multiple ticks. `0` = unbounded (test fixtures only). |
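
One way the three variables could be read is sketched below. It assumes values use Go duration syntax (for example `720h`) and that a non-positive window is ignored the same way a non-positive interval is. The table only states that rule for the interval, and `RetentionConfig` and `loadRetentionConfig` are hypothetical names rather than the project's config code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// RetentionConfig (hypothetical) groups the three retention settings.
type RetentionConfig struct {
	Interval time.Duration
	Window   time.Duration
	BatchCap int
}

// loadRetentionConfig starts from the documented defaults and overrides each
// field only when the variable is set and parses to an acceptable value.
// lookup has the same shape as os.LookupEnv so tests can pass a fake.
func loadRetentionConfig(lookup func(string) (string, bool)) RetentionConfig {
	cfg := RetentionConfig{
		Interval: 24 * time.Hour,
		Window:   30 * 24 * time.Hour,
		BatchCap: 200,
	}
	if v, ok := lookup("CERTCTL_USER_RETENTION_INTERVAL"); ok {
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			cfg.Interval = d // zero or negative is ignored
		}
	}
	if v, ok := lookup("CERTCTL_USER_RETENTION_WINDOW"); ok {
		// Assumption: non-positive windows are ignored as well.
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			cfg.Window = d
		}
	}
	if v, ok := lookup("CERTCTL_USER_RETENTION_BATCH_CAP"); ok {
		if n, err := strconv.Atoi(v); err == nil && n >= 0 {
			cfg.BatchCap = n // 0 means unbounded
		}
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", loadRetentionConfig(os.LookupEnv))
}
```

Note that `time.ParseDuration` has no day unit, so under this assumption a 14-day window would be written as `336h`.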
How to verify retention is working
1. Deactivate a test user via the admin path:

       curl -X POST -H "X-API-Key: $ADMIN_KEY" \
         https://certctl.example.com/api/v1/auth/users/u-test/deactivate

2. Confirm the row's `deactivated_at` is set:

       SELECT id, email, deactivated_at FROM users WHERE id = 'u-test';

3. Backdate `deactivated_at` to past the retention window (only for testing, never in production):

       UPDATE users SET deactivated_at = NOW() - INTERVAL '60 days' WHERE id = 'u-test';

   (Note: this UPDATE will succeed because `users` doesn't have a WORM trigger; the audit-events WORM trigger is unrelated.)

4. Wait for the next `userRetentionLoop` tick (or restart the server to force an immediate sweep). Confirm scrub:

       SELECT id, email, display_name, oidc_subject FROM users WHERE id = 'u-test';

   Expected: `email = 'purged@redacted.local'`, `display_name = '[purged]'`, `oidc_subject LIKE 'sha256:%'`.

5. Confirm an audit row was emitted:

       SELECT id, actor, action, resource_id, timestamp FROM audit_events
       WHERE action = 'user.purge_pii' AND resource_id = 'u-test'
       ORDER BY timestamp DESC LIMIT 1;
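
For operators who script this runbook, the expected post-scrub values from step 4 can be checked with something like the following. It is a sketch only: `userRow` and `checkScrubbed` are hypothetical, and the 64-character digest length assumes a hex-encoded SHA-256 after the `sha256:` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// userRow (hypothetical) carries the columns selected in step 4.
type userRow struct {
	ID          string
	Email       string
	DisplayName string
	OIDCSubject string
}

// checkScrubbed returns one message per column that does not match the
// expected post-purge value. An empty result means the row looks scrubbed.
func checkScrubbed(r userRow) []string {
	var problems []string
	if r.Email != "purged@redacted.local" {
		problems = append(problems, "email not scrubbed")
	}
	if r.DisplayName != "[purged]" {
		problems = append(problems, "display_name not scrubbed")
	}
	const prefix = "sha256:"
	// Assumption: hex-encoded SHA-256 is 64 characters after the prefix.
	if !strings.HasPrefix(r.OIDCSubject, prefix) || len(r.OIDCSubject) != len(prefix)+64 {
		problems = append(problems, "oidc_subject not hashed")
	}
	return problems
}

func main() {
	row := userRow{
		ID:          "u-test",
		Email:       "purged@redacted.local",
		DisplayName: "[purged]",
		OIDCSubject: "sha256:" + strings.Repeat("0", 64),
	}
	fmt.Println(len(checkScrubbed(row)) == 0)
}
```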
What's NOT covered (deferred work)
The Sprint 6 fix is Phase 1 of the audit's COMP-002-RETENTION recommendation. Three further pieces are deferred:
- GDPR data-subject access request (DSAR) export. A "show me everything you know about me" endpoint is not yet implemented. Operators on EU-resident data should treat this as a manual SQL procedure today; track for Phase 2.
- Cascade purge of related rows. Sessions are revoked (above); `api_keys` with `created_by = u-X` are NOT yet purged on scrub. The `api_keys` table doesn't have a foreign key to `users` (it indexes by `actor_id` strings, free-form), so the cascade is a service-layer concern that needs explicit wiring. Track for Phase 2.
- Per-event PII redaction in `audit_events.details`. The existing `RedactDetailsForAudit` (`internal/service/audit_redact.go`) scrubs credential + PII keys at write time. A future feature for "retroactively re-redact existing rows" would interact with the WORM trigger; out of scope today.
See also
- `internal/service/user_retention.go`: `UserRetentionService` source.
- `internal/scheduler/scheduler.go::userRetentionLoop`: scheduler loop.
- `migrations/000036_users.up.sql`: `users` table definition.
- `migrations/000045_users_deactivated_at.up.sql`: `deactivated_at` column.
- `docs/operator/audit-chain.md`: paired Sprint 6 tamper-evidence work.