mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:21:30 +00:00
feat(retention): COMP-002-RETENTION — federated-user PII purge pipeline
Sprint 6 closure of the audit's MED-severity COMP-002-RETENTION
finding.
Pre-fix posture: the federated-user admin surface
(auth_users.go::Deactivate) sets users.deactivated_at on soft-delete,
but the PII columns (email, display_name, oidc_subject) stay
populated forever. No in-code primitive for GDPR right-to-be-
forgotten; no scheduled retention purge.
This commit ships the audit's recommended two-phase fix:
Phase 1 — operator-callable scrub primitive
  internal/service/user_retention.go
  UserRetentionService.DeleteUserPII(ctx, userID):
    - revoke all active sessions (defense-in-depth)
    - email        := 'purged@redacted.local'
    - display_name := '[purged]'
    - oidc_subject := 'sha256:' || hex(sha256(original))
    - audit_events row with action=user.purge_pii,
      category=auth, actor=system
Why hash oidc_subject instead of NULL:
  1. (oidc_provider_id, oidc_subject) UNIQUE constraint would
     trip on multiple purged users converging to NULL
  2. The hash is one-way; the original IdP-side identifier is
     unrecoverable. Re-login under the same subject mints a
     fresh u-id (right-to-be-forgotten semantics)
  3. Forensic continuity: an operator can recompute
     sha256(<known-subject>) and confirm "this user was
     deactivated then purged"
users.id itself is preserved so historical
audit_events.actor = u-X rows still resolve. The forensic-
attribution chain stays intact even after the PII is gone.
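The oidc_subject rewrite and the forensic check from point 3 can be sketched in a few lines of Go. This is a minimal illustration, not the code in user_retention.go: the function names (hashSubject, scrubSubject, wasPurged) are hypothetical, and the prefix check used to make a repeated scrub a no-op is an assumption about one way idempotency could be achieved.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// purgedPrefix tags a value that has already been replaced by its digest.
const purgedPrefix = "sha256:"

// hashSubject returns "sha256:" followed by the lowercase hex SHA-256 of subject.
func hashSubject(subject string) string {
	sum := sha256.Sum256([]byte(subject))
	return purgedPrefix + hex.EncodeToString(sum[:])
}

// scrubSubject is a hypothetical idempotent wrapper: a value that already
// carries the prefix is returned unchanged, so purging twice does not
// hash the hash.
func scrubSubject(current string) string {
	if strings.HasPrefix(current, purgedPrefix) {
		return current
	}
	return hashSubject(current)
}

// wasPurged reports whether the stored column value is the digest of a
// subject the operator already knows (the forensic check in point 3).
func wasPurged(stored, knownSubject string) bool {
	return stored == hashSubject(knownSubject)
}

func main() {
	stored := scrubSubject("abc")
	fmt.Println(stored)
	fmt.Println(scrubSubject(stored) == stored) // second scrub leaves the value alone
	fmt.Println(wasPurged(stored, "abc"))
}
```

Because every distinct subject maps to a distinct digest, two purged users from the same provider still hold different oidc_subject values.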
Phase 2 — scheduled batch purge
  internal/scheduler/scheduler.go
  UserRetentionPurger interface + userRetentionLoop:
    - PurgeDeactivatedUsers enumerates every user with
      deactivated_at < NOW() - retention_window
    - DeleteUserPII per row
    - per-tick batch cap (default 200) keeps blast radius
      predictable; large backlogs spread across multiple ticks
    - atomic.Bool guard + 5-min per-tick context.WithTimeout
Repository contract grew a single new method:
  internal/repository/user.go::ListDeactivatedBefore(ctx, t)
  internal/repository/postgres/user.go: SQL-side filter
    (deactivated_at IS NOT NULL AND deactivated_at < $1)
    ORDER BY deactivated_at ASC, cross-tenant.
Configuration
  CERTCTL_USER_RETENTION_INTERVAL   default 24h
  CERTCTL_USER_RETENTION_WINDOW     default 30 days
  CERTCTL_USER_RETENTION_BATCH_CAP  default 200
Test stub additions for repository.UserRepository.ListDeactivatedBefore:
  internal/auth/oidc/service_test.go::stubUsers
  internal/api/handler/auth_users_test.go::stubFullUserRepo
  internal/api/handler/auth_session_oidc_test.go::stubUserRepo
Documentation
  docs/operator/privacy-and-retention.md
    - retention pipeline diagram (day-0 deactivate → day-N purge)
    - operator config table
    - verification runbook (4 steps with SQL)
    - what's NOT covered (deferred: DSAR export, api_keys cascade,
      retroactive audit_events.details redaction)
Tests
  internal/service/user_retention_test.go (NEW, 4 tests):
    TestDeleteUserPII_ScrubsAndRevokes
    TestDeleteUserPII_IsIdempotent
    TestPurgeDeactivatedUsers_RespectsWindow
    TestPurgeDeactivatedUsers_BatchCap
Verified locally:
  go vet ./...                 (clean)
  gofmt -l internal/ cmd/      (clean)
  go test -short -count=1 \
    ./internal/service/... ./internal/scheduler/... ./internal/config/...
                               (all green)
Cross-sprint interaction: pairs with COMP-001-HASH (prior commit).
The user.purge_pii audit row this service emits flows through the
new hash chain, so the scrub event is itself tamper-evident.
Closes COMP-002-RETENTION. Sprint 6 is complete (2/2 findings).
@@ -8,6 +8,7 @@ import (
 	"database/sql"
 	"errors"
 	"fmt"
+	"time"
 
 	"github.com/lib/pq"
 
@@ -177,3 +178,33 @@ func (r *UserRepository) ListAll(ctx context.Context, tenantID string) ([]*userd
 	}
 	return out, rows.Err()
 }
+
+// ListDeactivatedBefore returns every user (across all tenants) whose
+// deactivated_at is not NULL AND strictly before threshold. Sprint 6
+// COMP-002-RETENTION — the userRetentionLoop in the scheduler walks
+// this list per tick and calls UserRetentionService.DeleteUserPII on
+// each. Cross-tenant on purpose: a single retention policy spans the
+// whole control plane.
+func (r *UserRepository) ListDeactivatedBefore(ctx context.Context, threshold time.Time) ([]*userdomain.User, error) {
+	rows, err := r.db.QueryContext(ctx,
+		`SELECT `+userColumns+`
+		FROM users
+		WHERE deactivated_at IS NOT NULL
+		AND deactivated_at < $1
+		ORDER BY deactivated_at ASC`,
+		threshold)
+	if err != nil {
+		return nil, fmt.Errorf("users list_deactivated_before: %w", err)
+	}
+	defer rows.Close()
+
+	var out []*userdomain.User
+	for rows.Next() {
+		u, err := scanUser(rows)
+		if err != nil {
+			return nil, fmt.Errorf("users scan: %w", err)
+		}
+		out = append(out, u)
+	}
+	return out, rows.Err()
+}
@@ -6,6 +6,7 @@ package repository
 import (
 	"context"
 	"errors"
+	"time"
 
 	userdomain "github.com/certctl-io/certctl/internal/auth/user/domain"
 )
@@ -46,4 +47,12 @@ type UserRepository interface {
 	// ListAll returns every user in the tenant. Order:
 	// created_at ASC. Used by the GUI's admin surface.
 	ListAll(ctx context.Context, tenantID string) ([]*userdomain.User, error)
+
+	// ListDeactivatedBefore returns every user whose deactivated_at is
+	// not NULL AND strictly before the supplied threshold. Sprint 6
+	// COMP-002-RETENTION closure — the scheduler's userRetentionLoop
+	// uses this to enumerate purge-eligible rows on each tick. Order:
+	// deactivated_at ASC (oldest first, so a tick-budget cap is
+	// deterministic about which rows it processes).
+	ListDeactivatedBefore(ctx context.Context, threshold time.Time) ([]*userdomain.User, error)
 }