certctl/internal/repository/user.go
shankar0123 663b14bfd8 feat(retention): COMP-002-RETENTION — federated-user PII purge pipeline
Sprint 6 closure of the audit's MED-severity COMP-002-RETENTION
finding.

Pre-fix posture: the federated-user admin surface
(auth_users.go::Deactivate) sets users.deactivated_at on soft-delete,
but the PII columns (email, display_name, oidc_subject) stay
populated forever. No in-code primitive for GDPR right-to-be-
forgotten; no scheduled retention purge.

This commit ships the audit's recommended two-phase fix:

  Phase 1 — operator-callable scrub primitive
    internal/service/user_retention.go
      UserRetentionService.DeleteUserPII(ctx, userID):
        - revoke all active sessions (defense-in-depth)
        - email := 'purged@redacted.local'
        - display_name := '[purged]'
        - oidc_subject := 'sha256:' || hex(sha256(original))
        - audit_events row with action=user.purge_pii,
          category=auth, actor=system

      Why hash oidc_subject instead of NULL:
        1. (oidc_provider_id, oidc_subject) UNIQUE constraint would
           trip on multiple purged users converging to NULL
        2. The hash is one-way; the original IdP-side identifier is
           unrecoverable. Re-login under the same subject mints a
           fresh u-id (right-to-be-forgotten semantics)
        3. Forensic continuity: an operator can recompute
           sha256(<known-subject>) and confirm "this user was
           deactivated then purged"

      users.id itself is preserved so historical
      audit_events.actor = u-X rows still resolve. The forensic-
      attribution chain stays intact even after the PII is gone.
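
The oidc_subject rewrite described above can be sketched as follows. This is a minimal illustration, not the code from internal/service/user_retention.go: the helper name pseudonymizeSubject and the prefix-based idempotency check are assumptions made for the example (the commit only states that DeleteUserPII is idempotent, not how).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// purgedPrefix marks a subject that has already been replaced by its hash.
const purgedPrefix = "sha256:"

// pseudonymizeSubject is a hypothetical helper. It returns
// "sha256:" + hex(sha256(subject)). Distinct subjects map to distinct
// values, so a (provider, subject) UNIQUE constraint keeps holding.
// Assumption: a value that already carries the prefix is treated as purged
// and returned unchanged, which makes a second purge a no-op.
func pseudonymizeSubject(subject string) string {
	if strings.HasPrefix(subject, purgedPrefix) {
		return subject
	}
	sum := sha256.Sum256([]byte(subject))
	return purgedPrefix + hex.EncodeToString(sum[:])
}

func main() {
	once := pseudonymizeSubject("abc")
	twice := pseudonymizeSubject(once)
	fmt.Println(once)
	fmt.Println(once == twice) // true: re-running the scrub changes nothing
}
```

An operator who knows the original subject can recompute the same value and compare it with the stored column, which is the forensic-continuity property from point 3.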

  Phase 2 — scheduled batch purge
    internal/scheduler/scheduler.go
      UserRetentionPurger interface + userRetentionLoop:
        - PurgeDeactivatedUsers enumerates every user with
          deactivated_at < NOW() - retention_window
        - DeleteUserPII per row
        - per-tick batch cap (default 200) keeps blast radius
          predictable; large backlogs spread across multiple ticks
        - atomic.Bool guard + 5-min per-tick context.WithTimeout

    Repository contract grew a single new method:
      internal/repository/user.go::ListDeactivatedBefore(ctx, t)
      internal/repository/postgres/user.go: SQL-side filter
      (deactivated_at IS NOT NULL AND deactivated_at < $1)
      ORDER BY deactivated_at ASC, cross-tenant.

  Configuration
    CERTCTL_USER_RETENTION_INTERVAL   default 24h
    CERTCTL_USER_RETENTION_WINDOW     default 30 days
    CERTCTL_USER_RETENTION_BATCH_CAP  default 200

  Test stub additions for repository.UserRepository.ListDeactivatedBefore:
    internal/auth/oidc/service_test.go::stubUsers
    internal/api/handler/auth_users_test.go::stubFullUserRepo
    internal/api/handler/auth_session_oidc_test.go::stubUserRepo

  Documentation
    docs/operator/privacy-and-retention.md
      - retention pipeline diagram (day-0 deactivate → day-N purge)
      - operator config table
      - verification runbook (4 steps with SQL)
      - what's NOT covered (deferred: DSAR export, api_keys cascade,
        retroactive audit_events.details redaction)

  Tests
    internal/service/user_retention_test.go (NEW, 4 tests):
      TestDeleteUserPII_ScrubsAndRevokes
      TestDeleteUserPII_IsIdempotent
      TestPurgeDeactivatedUsers_RespectsWindow
      TestPurgeDeactivatedUsers_BatchCap

Verified locally:
  go vet ./...                                   (clean)
  gofmt -l internal/ cmd/                        (clean)
  go test -short -count=1 \
    ./internal/service/... ./internal/scheduler/... ./internal/config/...
    (all green)

Cross-sprint interaction: pairs with COMP-001-HASH (prior commit).
The user.purge_pii audit row this service emits flows through the
new hash chain, so the scrub event is itself tamper-evident.

Closes COMP-002-RETENTION. Sprint 6 is complete (2/2 findings).
2026-05-16 06:18:39 +00:00

// Copyright 2026 certctl LLC. All rights reserved.
// SPDX-License-Identifier: BUSL-1.1

package repository

import (
	"context"
	"errors"
	"time"

	userdomain "github.com/certctl-io/certctl/internal/auth/user/domain"
)

// Sentinel errors for the user repository.
var (
	// ErrUserNotFound: Get / GetByOIDCSubject returned no row. Phase
	// 3's HandleCallback treats this as "first login for this person;
	// create the row".
	ErrUserNotFound = errors.New("user: not found")

	// ErrUserDuplicateOIDCSubject: Create tripped the
	// (oidc_provider_id, oidc_subject) UNIQUE constraint. HTTP 409.
	ErrUserDuplicateOIDCSubject = errors.New("user: a user with this provider+subject already exists")
)

// UserRepository wraps the users table. Phase 3's HandleCallback
// uses GetByOIDCSubject + Create + Update on every login; the GUI's
// admin user-list surface uses ListAll + Get.
type UserRepository interface {
	// Get returns one user by id. ErrUserNotFound on miss.
	Get(ctx context.Context, id string) (*userdomain.User, error)

	// GetByOIDCSubject is the Phase 3 hot-path lookup at login time.
	// Returns the existing row if present, ErrUserNotFound otherwise.
	GetByOIDCSubject(ctx context.Context, providerID, subject string) (*userdomain.User, error)

	// Create persists a new user. Caller MUST have called u.Validate().
	// Returns ErrUserDuplicateOIDCSubject on UNIQUE constraint trip.
	Create(ctx context.Context, u *userdomain.User) error

	// Update writes the mutable field set back to the row. Immutable
	// fields (id, tenant_id, oidc_subject, oidc_provider_id,
	// created_at) are preserved. updated_at is set to NOW() by the
	// implementation.
	Update(ctx context.Context, u *userdomain.User) error

	// ListAll returns every user in the tenant. Order:
	// created_at ASC. Used by the GUI's admin surface.
	ListAll(ctx context.Context, tenantID string) ([]*userdomain.User, error)

	// ListDeactivatedBefore returns every user whose deactivated_at is
	// not NULL AND strictly before the supplied threshold. Sprint 6
	// COMP-002-RETENTION closure: the scheduler's userRetentionLoop
	// uses this to enumerate purge-eligible rows on each tick. Order:
	// deactivated_at ASC (oldest first, so a tick-budget cap is
	// deterministic about which rows it processes).
	ListDeactivatedBefore(ctx context.Context, threshold time.Time) ([]*userdomain.User, error)
}