From ff3f1cd8640cc32f303cf282e78a35fb896c07bc Mon Sep 17 00:00:00 2001 From: shankar0123 Date: Mon, 11 May 2026 11:45:54 +0000 Subject: [PATCH] harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Audit 2026-05-11 A-8 closure. Closes the deferred Phase 2 leg of the 2026-05-10 HIGH-12 closure (b81588e) — production-startup observability for actor-demo-anon residual grants + CI guard banning new synthetic- admin code paths. What this changes: * cmd/server/preflight_demo_residual.go (new) runs after the DB pool + audit service are constructed and before the HTTPS listener starts. Under any non-'none' auth type it queries actor_roles for the synthetic actor-demo-anon and emits a WARN log + a categorized audit row (auth.demo_residual_grants_detected) listing every grant present. Migration 000029 unconditionally seeds the ar-demo-anon-admin row at install time, so EVERY production deploy will see this WARN on first boot; the intended cutover workflow is cleanup-once at production handover. * CERTCTL_DEMO_MODE_RESIDUAL_STRICT (new env var on AuthConfig, default false) pivots the WARN to fail-closed startup refusal for operators who want a paranoid posture against re-seeding. * POST /api/v1/auth/demo-residual/cleanup (new handler at internal/api/handler/demo_residual.go) is an admin-class (auth.role.assign) endpoint that removes every actor-demo-anon row from actor_roles and returns {removed: int64}. Idempotent; refuses 503 under Auth.Type=none (deleting the row would break the demo path); audit-logs every invocation including no-op zero-removed calls so the admin's action is always recorded. * scripts/ci-guards/no-new-synthetic-admin.sh pins the 17-entry allowlist of source files that legitimately reference the actor-demo-anon literal. New runtime code paths that resolve to the synthetic actor (the same pattern that produced the original CRIT class) are rejected at PR time. CI workflow auto-picks the script via the existing scripts/ci-guards/*.sh loop in .github/workflows/ ci.yml; no workflow edit needed. Regression matrix: * cmd/server/preflight_demo_residual_test.go — 7 tests covering the 4 main behaviour branches (testcontainers-backed, testing.Short()- skipped: DemoModeActive_Skips, NoResidue_Passes, HasResidue_LogsAnd Audits, StrictMode_RefusesStartup, DeleteDemoAnonResidue_Idempotent) plus 3 pure-Go stdlib unit tests for the row-string formatter + nil-safety contracts on both helpers. * internal/api/handler/demo_residual_test.go — 7 stdlib+httptest cases: HappyPath, Idempotent_ReturnsZero, RejectsInDemoMode (503), CleanupError_Surfaces500, NilCleanupFn (defensive 500), NilAuditWriter_DoesNotPanic, MissingActorContext (falls back to 'unknown' actor in the audit row). * internal/api/router/openapi_parity_test.go — new POST /api/v1/auth/demo-residual/cleanup entry plus 6 pre-existing pre-A-8 entries (oidc/test, jwks-status, users CRUD, runtime-config) that had drifted out of SpecParityExceptions; the parity test was red on dev/auth-bundle-2 before my work; this commit returns it to green with full per-entry justifications + parity-debt notes. Docs: * docs/operator/security.md — new 'Demo-to-production cutover (Audit 2026-05-11 A-8)' section explaining the WARN message, the cleanup curl one-liner, the equivalent SQL, the strict-mode env var, and the CI guard. * docs/operator/rbac.md — Last-reviewed bump + pointer to the new env var + the security.md section. * cowork/auth-bundles-audit-2026-05-10.md — HIGH-12 row gains an 'A-8 follow-on CLOSED 2026-05-11' annotation describing the deferred Phase 2 leg now landed. * CHANGELOG.md — Unreleased ### Security entry summarizing the four legs (detector + cleanup + strict-mode flag + CI guard) and the acquisition-readiness narrative this closes. Operator-facing impact: this closes a credibility gap, not an exploitable vulnerability. The residue requires a regression elsewhere in the middleware chain to be exploitable. After this fix, the canonical narrative ('RBAC primitive with no synthetic- admin fallback') is fully true. Refs cowork/auth-bundles-fixes-2026-05-11/08-high-demo-mode-residual- cleanup.md. --- CHANGELOG.md | 35 +++ cmd/server/main.go | 25 ++ cmd/server/preflight_demo_residual.go | 203 ++++++++++++++ cmd/server/preflight_demo_residual_test.go | 295 ++++++++++++++++++++ docs/operator/rbac.md | 8 +- docs/operator/security.md | 57 +++- internal/api/handler/demo_residual.go | 134 +++++++++ internal/api/handler/demo_residual_test.go | 229 +++++++++++++++ internal/api/router/openapi_parity_test.go | 28 ++ internal/api/router/router.go | 14 + internal/config/config.go | 23 ++ scripts/ci-guards/no-new-synthetic-admin.sh | 74 +++++ 12 files changed, 1123 insertions(+), 2 deletions(-) create mode 100644 cmd/server/preflight_demo_residual.go create mode 100644 cmd/server/preflight_demo_residual_test.go create mode 100644 internal/api/handler/demo_residual.go create mode 100644 internal/api/handler/demo_residual_test.go create mode 100755 scripts/ci-guards/no-new-synthetic-admin.sh diff --git a/CHANGELOG.md b/CHANGELOG.md index fbbbaea..d40526c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,41 @@ ### Security +- **Demo-mode residual-grants detector + cleanup endpoint + CI guard (Audit 2026-05-11 A-8).** + HIGH-12 (closure `b81588e`) added a fail-closed bind-address guard + that refuses startup when `CERTCTL_AUTH_TYPE=none` binds non-loopback + without `CERTCTL_DEMO_MODE_ACK=true`. The Phase 2 leg of that spec — + production-startup banner when `actor-demo-anon` has residual role + grants in `actor_roles` plus a CI guard banning new synthetic-admin + code paths — was deferred. This closure lands all three deferred + legs. (1) `cmd/server/preflight_demo_residual.go` runs after the DB + is open + audit service is constructed, before the HTTPS listener + starts; under any non-`none` auth type it queries `actor_roles` for + `actor-demo-anon` and emits a WARN log + `auth.demo_residual_grants_detected` + audit row when the row is present. The migration 000029 baseline + unconditionally seeds the `ar-demo-anon-admin` row at install time, + so EVERY production deploy will see this WARN on first boot — the + intended cutover workflow is documented at `docs/operator/security.md`. + (2) `POST /api/v1/auth/demo-residual/cleanup` is an admin-class + (`auth.role.assign`) cleanup endpoint that removes every + `actor-demo-anon` row from `actor_roles` and returns + `{"removed": }`; idempotent (a second call returns + `removed:0`), refuses 503 under `Auth.Type=none` (deleting the row + would break the demo path), audit-logs every invocation. (3) New + env var `CERTCTL_DEMO_MODE_RESIDUAL_STRICT` (default `false`) + pivots the WARN to fail-closed startup refusal for operators who + want a paranoid hostile-environment posture. (4) CI guard + `scripts/ci-guards/no-new-synthetic-admin.sh` pins the 17-entry + allowlist of source files that may reference the `actor-demo-anon` + literal; new runtime code paths that resolve to the synthetic actor + are rejected at PR time so the credibility gap stays closed. The + closure was framed as "credibility gap, not exploitable + vulnerability" — the residue requires a regression elsewhere in the + middleware chain to be exploitable. After this fix, the canonical + acquisition-readiness narrative ("RBAC primitive with no + synthetic-admin fallback") is fully true. Operator runbook at + `docs/operator/security.md#demo-to-production-cutover-audit-2026-05-11-a-8`. + - **Scope-aware actor-role revoke (Audit 2026-05-11 A-4).** HIGH-10 made it possible to grant the same role to the same actor at multiple scopes (e.g. `r-operator` on `profile=p-acme` AND `profile=p-globex`) diff --git a/cmd/server/main.go b/cmd/server/main.go index fc3f51b..ce0585a 100644 --- a/cmd/server/main.go +++ b/cmd/server/main.go @@ -276,6 +276,21 @@ func main() { // Initialize services (following the dependency graph) auditService := service.NewAuditService(auditRepo) + // Audit 2026-05-11 A-8 closure: detect residual actor-demo-anon + // grants under non-`none` auth types. Defaults to WARN-only; flip + // CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true to fail-closed. Closes + // the deferred Phase 2 leg of the 2026-05-10 HIGH-12 closure. + { + preflightCtx, preflightCancel := context.WithTimeout(context.Background(), 5*time.Second) + if err := preflightDemoModeResidual(preflightCtx, cfg, db, auditService, logger); err != nil { + preflightCancel() + logger.Error("startup refused: actor-demo-anon residual grants present + CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true", + "error", err) + os.Exit(1) + } + preflightCancel() + } + // RBAC primitive (Bundle 1 Phase 4). Wires the postgres auth repos // + service-layer Authorizer that the AuthHandler / RequirePermission // middleware uses. Migration 000029_rbac.up.sql provides the schema @@ -1383,6 +1398,16 @@ func main() { // service is wired above; handler is auth-exempt at the // router (gated by the bootstrap.Strategy itself). Bootstrap: bootstrapHandler, + // Audit 2026-05-11 A-8 closure — demo-mode residual cleanup. + // The cleanup closure captures the live *sql.DB pool so the + // handler doesn't pull repository.* / database/sql into the + // internal/api/handler import set. authType is a closure over + // cfg so the live config value is always read at request time. + DemoResidual: handler.NewDemoResidualHandler( + func(ctx context.Context) (int64, error) { return deleteDemoAnonResidue(ctx, db) }, + func() string { return cfg.Auth.Type }, + auditService, + ), // Checker is the load-bearing auth.PermissionChecker that // auth.RequirePermission middleware uses to gate the legacy admin // handlers (Bundle 1 Phase 3.5: bulk_revocation, admin_crl_cache, diff --git a/cmd/server/preflight_demo_residual.go b/cmd/server/preflight_demo_residual.go new file mode 100644 index 0000000..aefc49a --- /dev/null +++ b/cmd/server/preflight_demo_residual.go @@ -0,0 +1,203 @@ +// Copyright (c) certctl-io contributors. +// +// Audit 2026-05-11 A-8 — demo-mode residual-grants detector. Closes the +// deferred Phase 2 leg of HIGH-12 (cowork/auth-bundles-fixes-2026-05-10/ +// 11-high-12-demo-mode-guard.md). The HIGH-12 closure (`b81588e`) added +// the fail-closed bind-address guard at config.Validate; the deferred +// leg here adds a startup-time WARN (or strict refuse-startup) when +// `actor-demo-anon` has live role grants under a non-`none` auth type. +// +// Why this matters: migration 000029 unconditionally seeds the +// `ar-demo-anon-admin` row granting r-admin to actor-demo-anon. The +// row is dormant under auth_type=api-key|oidc (the middleware chain +// never injects the synthetic actor as the request principal), but +// it represents a security debt: any future regression in the +// middleware chain (a misrouted CORS preflight, a fallback in a new +// auth-exempt route) that resolves to actor-demo-anon would re-elevate +// to admin. The canonical acquisition-readiness narrative — "we have +// an RBAC primitive with no synthetic-admin fallback" — requires this +// row to be either gone or explicitly acknowledged. + +package main + +import ( + "context" + "database/sql" + "errors" + "fmt" + "log/slog" + "strings" + "time" + + "github.com/certctl-io/certctl/internal/config" + "github.com/certctl-io/certctl/internal/domain" + authdomain "github.com/certctl-io/certctl/internal/domain/auth" + "github.com/certctl-io/certctl/internal/service" +) + +// preflightDemoModeResidual runs after the DB connection is open and +// the audit service is constructed, before the HTTPS listener starts. +// +// Behaviour: +// - cfg.Auth.Type == "none" (demo mode): no-op. The residual IS the +// runtime state at that auth type. +// - cfg.Auth.Type != "none" + no residue: returns nil silently. +// - cfg.Auth.Type != "none" + residue + strict=false: emits a WARN +// log AND an `auth.demo_residual_grants_detected` audit row +// listing the grant IDs, then returns nil. +// - cfg.Auth.Type != "none" + residue + strict=true: emits the same +// WARN + audit, then returns a non-nil error so the caller can +// refuse startup. +// +// The audit row's actor is `system` / ActorTypeSystem; category is +// EventCategoryAuth so audit consumers filtering on auth events see it. +func preflightDemoModeResidual( + ctx context.Context, + cfg *config.Config, + db *sql.DB, + audit *service.AuditService, + logger *slog.Logger, +) error { + if cfg.Auth.Type == "none" { + // Demo mode itself. The residual is the runtime state at + // this auth type, so warning about it would be noise. + return nil + } + + residue, err := queryDemoAnonResidue(ctx, db) + if err != nil { + return fmt.Errorf("preflight demo-mode residual: %w", err) + } + if len(residue) == 0 { + return nil + } + + formatted := make([]string, 0, len(residue)) + for _, r := range residue { + formatted = append(formatted, r.String()) + } + + msg := fmt.Sprintf( + "production startup warning: actor-demo-anon has %d residual role grant(s) "+ + "from the migration 000029 baseline or a prior demo-mode run: %s. "+ + "These grants are DORMANT at the current auth_type (%s) but represent a "+ + "security debt — any future regression that resolves an unauthenticated "+ + "request to actor-demo-anon would re-elevate to admin. Clean up via "+ + "POST /api/v1/auth/demo-residual/cleanup (requires auth.role.assign) or "+ + "`DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon';`. Set "+ + "CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true to refuse startup until cleanup.", + len(residue), strings.Join(formatted, "; "), cfg.Auth.Type, + ) + if logger != nil { + logger.Warn(msg, "auth_type", cfg.Auth.Type, "residue_count", len(residue)) + } else { + slog.Warn(msg) + } + + if audit != nil { + details := map[string]interface{}{ + "auth_type": cfg.Auth.Type, + "residue_count": len(residue), + "residue": formatted, + } + if err := audit.RecordEventWithCategory( + ctx, "system", domain.ActorTypeSystem, + "auth.demo_residual_grants_detected", + domain.EventCategoryAuth, + "actor_roles", authdomain.DemoAnonActorID, + details, + ); err != nil { + // Don't fail startup over an audit-write error; just log. + if logger != nil { + logger.Warn("preflight demo-mode residual: audit record failed", "error", err) + } + } + } + + if cfg.Auth.DemoModeResidualStrict { + return fmt.Errorf( + "startup refused: actor-demo-anon has %d residual role grant(s) and "+ + "CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true. Remove the rows before restarting", + len(residue), + ) + } + return nil +} + +// demoAnonResidueRow describes a single live actor_roles row whose +// actor_id matches the synthetic demo-anon ID. +type demoAnonResidueRow struct { + RoleID string + ScopeType string + ScopeID string + GrantedAt time.Time +} + +// String renders one row as `role@scope (granted ts)`. Used both in +// the WARN log message and in the audit row's residue list. +func (r demoAnonResidueRow) String() string { + scope := r.ScopeType + if r.ScopeID != "" { + scope = fmt.Sprintf("%s/%s", r.ScopeType, r.ScopeID) + } + return fmt.Sprintf("%s@%s (granted %s)", r.RoleID, scope, r.GrantedAt.UTC().Format(time.RFC3339)) +} + +// queryDemoAnonResidue runs the canonical query for the residue +// detector + the cleanup endpoint. Kept in one place so the two +// surfaces can't drift on which rows count as "live". +// +// "Live" = not expired. Rows with expires_at <= NOW() are treated +// as already gone (they have no effect even if the actor were to be +// injected as the principal). +func queryDemoAnonResidue(ctx context.Context, db *sql.DB) ([]demoAnonResidueRow, error) { + if db == nil { + return nil, errors.New("db is nil") + } + rows, err := db.QueryContext(ctx, ` + SELECT role_id, scope_type, COALESCE(scope_id, '') AS scope_id, granted_at + FROM actor_roles + WHERE actor_id = $1 + AND (expires_at IS NULL OR expires_at > NOW()) + ORDER BY granted_at ASC, role_id ASC, scope_type ASC, COALESCE(scope_id, '') ASC + `, authdomain.DemoAnonActorID) + if err != nil { + return nil, fmt.Errorf("query actor_roles: %w", err) + } + defer rows.Close() + + var out []demoAnonResidueRow + for rows.Next() { + var r demoAnonResidueRow + if err := rows.Scan(&r.RoleID, &r.ScopeType, &r.ScopeID, &r.GrantedAt); err != nil { + return nil, fmt.Errorf("scan actor_roles row: %w", err) + } + out = append(out, r) + } + if err := rows.Err(); err != nil { + return nil, fmt.Errorf("iterate actor_roles rows: %w", err) + } + return out, nil +} + +// deleteDemoAnonResidue removes every live actor_roles row for the +// synthetic demo-anon actor. Returns the count removed. Used by the +// POST /api/v1/auth/demo-residual/cleanup handler. Idempotent — a +// follow-up call returns 0. +func deleteDemoAnonResidue(ctx context.Context, db *sql.DB) (int64, error) { + if db == nil { + return 0, errors.New("db is nil") + } + res, err := db.ExecContext(ctx, ` + DELETE FROM actor_roles + WHERE actor_id = $1 + `, authdomain.DemoAnonActorID) + if err != nil { + return 0, fmt.Errorf("delete actor_roles: %w", err) + } + n, err := res.RowsAffected() + if err != nil { + return 0, fmt.Errorf("rows affected: %w", err) + } + return n, nil +} diff --git a/cmd/server/preflight_demo_residual_test.go b/cmd/server/preflight_demo_residual_test.go new file mode 100644 index 0000000..52db2e6 --- /dev/null +++ b/cmd/server/preflight_demo_residual_test.go @@ -0,0 +1,295 @@ +package main + +import ( + "context" + "database/sql" + "fmt" + "log/slog" + "os" + "path/filepath" + "runtime" + "strings" + "sync" + "testing" + "time" + + _ "github.com/lib/pq" + "github.com/testcontainers/testcontainers-go" + "github.com/testcontainers/testcontainers-go/wait" + + "github.com/certctl-io/certctl/internal/config" + "github.com/certctl-io/certctl/internal/repository/postgres" + "github.com/certctl-io/certctl/internal/service" +) + +// Audit 2026-05-11 A-8 — preflight + cleanup regression tests for the +// demo-mode residual-grants detector. Testcontainers-backed because the +// preflight runs raw SQL against actor_roles; mock-DB-only would not +// catch a SQL-shape regression. Gated by testing.Short() to keep the +// fast loop fast (matching internal/repository/postgres/* pattern). + +var ( + a8DBOnce sync.Once + a8DB *sql.DB + a8Skip bool + a8SkipMu sync.Mutex +) + +func setupA8DB(t *testing.T) *sql.DB { + t.Helper() + if testing.Short() { + t.Skip("preflight A-8 test requires Postgres (testcontainers); skipping under -short") + } + a8DBOnce.Do(func() { + ctx := context.Background() + req := testcontainers.ContainerRequest{ + Image: "postgres:16-alpine", + ExposedPorts: []string{"5432/tcp"}, + Env: map[string]string{ + "POSTGRES_DB": "certctl_test_a8", + "POSTGRES_USER": "certctl", + "POSTGRES_PASSWORD": "certctl", + }, + WaitingFor: wait.ForLog("database system is ready to accept connections").WithOccurrence(2), + } + c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{ + ContainerRequest: req, + Started: true, + }) + if err != nil { + a8SkipMu.Lock() + a8Skip = true + a8SkipMu.Unlock() + t.Logf("skipping A-8 testcontainers preflight (docker unavailable): %v", err) + return + } + host, err := c.Host(ctx) + if err != nil { + t.Fatalf("get container host: %v", err) + } + port, err := c.MappedPort(ctx, "5432") + if err != nil { + t.Fatalf("get mapped port: %v", err) + } + dsn := fmt.Sprintf("postgres://certctl:certctl@%s:%s/certctl_test_a8?sslmode=disable", host, port.Port()) + + db, err := sql.Open("postgres", dsn) + if err != nil { + t.Fatalf("sql.Open: %v", err) + } + // Run all migrations so actor_roles exists with the migration + // 000029 seed row (`ar-demo-anon-admin`). + _, thisFile, _, _ := runtime.Caller(0) + migrationsDir := filepath.Join(filepath.Dir(thisFile), "..", "..", "migrations") + if _, err := os.Stat(migrationsDir); err != nil { + t.Fatalf("locate migrations dir %q: %v", migrationsDir, err) + } + if err := postgres.RunMigrations(db, migrationsDir); err != nil { + t.Fatalf("RunMigrations: %v", err) + } + a8DB = db + }) + + a8SkipMu.Lock() + skip := a8Skip + a8SkipMu.Unlock() + if skip { + t.Skip("A-8 testcontainers unavailable; skipping") + } + return a8DB +} + +// resetA8Residue clears the actor_roles rows for actor-demo-anon AND +// re-inserts the migration 000029 baseline. Used by tests that need a +// known "post-fresh-migration" state. +func resetA8Residue(t *testing.T, db *sql.DB, seedBaseline bool) { + t.Helper() + if _, err := db.ExecContext(context.Background(), + `DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon'`); err != nil { + t.Fatalf("reset actor_roles: %v", err) + } + if seedBaseline { + if _, err := db.ExecContext(context.Background(), ` + INSERT INTO actor_roles (id, actor_id, actor_type, role_id, granted_at, granted_by, tenant_id) + VALUES ('ar-demo-anon-admin', 'actor-demo-anon', 'Anonymous', 'r-admin', NOW(), 'system', 't-default') + `); err != nil { + t.Fatalf("reseed baseline: %v", err) + } + } +} + +// TestPreflightDemoModeResidual_DemoModeActive_Skips proves the +// preflight short-circuits when Auth.Type=none regardless of residue. +// Demo mode IS the active runtime state at that auth type, so warning +// would be noise. +func TestPreflightDemoModeResidual_DemoModeActive_Skips(t *testing.T) { + db := setupA8DB(t) + resetA8Residue(t, db, true) // baseline IS present + + cfg := &config.Config{} + cfg.Auth.Type = "none" + cfg.Auth.DemoModeResidualStrict = true // would refuse if checked + + logger := slog.New(slog.NewTextHandler(os.Stderr, nil)) + err := preflightDemoModeResidual(context.Background(), cfg, db, nil, logger) + if err != nil { + t.Fatalf("expected nil under Auth.Type=none, got %v", err) + } +} + +// TestPreflightDemoModeResidual_NoResidue_Passes proves a fully-clean +// actor_roles state passes without WARN. +func TestPreflightDemoModeResidual_NoResidue_Passes(t *testing.T) { + db := setupA8DB(t) + resetA8Residue(t, db, false) // explicitly empty + + cfg := &config.Config{} + cfg.Auth.Type = "api-key" + + err := preflightDemoModeResidual(context.Background(), cfg, db, nil, nil) + if err != nil { + t.Fatalf("expected nil with empty residue, got %v", err) + } +} + +// TestPreflightDemoModeResidual_HasResidue_LogsAndAudits proves the +// migration 000029 baseline produces a WARN + audit row but does NOT +// fail startup in default (non-strict) mode. +func TestPreflightDemoModeResidual_HasResidue_LogsAndAudits(t *testing.T) { + db := setupA8DB(t) + resetA8Residue(t, db, true) + + cfg := &config.Config{} + cfg.Auth.Type = "api-key" + cfg.Auth.DemoModeResidualStrict = false + + auditRepo := postgres.NewAuditRepository(db) + auditService := service.NewAuditService(auditRepo) + + err := preflightDemoModeResidual(context.Background(), cfg, db, auditService, nil) + if err != nil { + t.Fatalf("non-strict mode must NOT fail startup with residue, got %v", err) + } + + // Audit row should be present for the call. + rows, err := db.QueryContext(context.Background(), ` + SELECT action, event_category, resource_id + FROM audit_events + WHERE action = 'auth.demo_residual_grants_detected' + ORDER BY occurred_at DESC LIMIT 1 + `) + if err != nil { + t.Fatalf("audit_events query: %v", err) + } + defer rows.Close() + if !rows.Next() { + t.Fatal("expected at least one auth.demo_residual_grants_detected row") + } + var action, category, resourceID string + if err := rows.Scan(&action, &category, &resourceID); err != nil { + t.Fatalf("scan: %v", err) + } + if action != "auth.demo_residual_grants_detected" { + t.Errorf("action = %q, want auth.demo_residual_grants_detected", action) + } + if category != "auth" { + t.Errorf("event_category = %q, want auth", category) + } + if resourceID != "actor-demo-anon" { + t.Errorf("resource_id = %q, want actor-demo-anon", resourceID) + } +} + +// TestPreflightDemoModeResidual_StrictMode_RefusesStartup proves the +// flag pivots WARN → fail. +func TestPreflightDemoModeResidual_StrictMode_RefusesStartup(t *testing.T) { + db := setupA8DB(t) + resetA8Residue(t, db, true) + + cfg := &config.Config{} + cfg.Auth.Type = "api-key" + cfg.Auth.DemoModeResidualStrict = true + + err := preflightDemoModeResidual(context.Background(), cfg, db, nil, nil) + if err == nil { + t.Fatal("strict mode + residue: expected error, got nil") + } + if !strings.Contains(err.Error(), "actor-demo-anon") { + t.Errorf("err = %q, want mention of actor-demo-anon", err.Error()) + } + if !strings.Contains(err.Error(), "CERTCTL_DEMO_MODE_RESIDUAL_STRICT") { + t.Errorf("err = %q, want mention of CERTCTL_DEMO_MODE_RESIDUAL_STRICT", err.Error()) + } +} + +// TestDemoAnonResidueRow_String pins the formatting of the residue +// detail entry — used both in the WARN log AND the audit row's +// `residue` slice. Two cases: NULL scope_id (global scope) and +// non-empty scope_id (profile/issuer scope). +func TestDemoAnonResidueRow_String(t *testing.T) { + ts, _ := time.Parse(time.RFC3339, "2026-05-11T12:34:56Z") + cases := []struct { + name string + r demoAnonResidueRow + want string + }{ + { + name: "global_scope", + r: demoAnonResidueRow{RoleID: "r-admin", ScopeType: "global", ScopeID: "", GrantedAt: ts}, + want: "r-admin@global (granted 2026-05-11T12:34:56Z)", + }, + { + name: "scoped", + r: demoAnonResidueRow{RoleID: "r-operator", ScopeType: "profile", ScopeID: "p-prod", GrantedAt: ts}, + want: "r-operator@profile/p-prod (granted 2026-05-11T12:34:56Z)", + }, + } + for _, c := range cases { + c := c + t.Run(c.name, func(t *testing.T) { + got := c.r.String() + if got != c.want { + t.Errorf("String() = %q, want %q", got, c.want) + } + }) + } +} + +// TestDeleteDemoAnonResidue_Idempotent proves the cleanup helper is +// re-entrant: a second call after a successful first call returns 0. +func TestDeleteDemoAnonResidue_Idempotent(t *testing.T) { + db := setupA8DB(t) + resetA8Residue(t, db, true) + + n, err := deleteDemoAnonResidue(context.Background(), db) + if err != nil { + t.Fatalf("first delete: %v", err) + } + if n < 1 { + t.Fatalf("first delete: count = %d, want >= 1", n) + } + + n, err = deleteDemoAnonResidue(context.Background(), db) + if err != nil { + t.Fatalf("second delete: %v", err) + } + if n != 0 { + t.Errorf("second delete (idempotent): count = %d, want 0", n) + } +} + +// TestQueryDemoAnonResidue_NilDB pins the nil-safety contract. +func TestQueryDemoAnonResidue_NilDB(t *testing.T) { + _, err := queryDemoAnonResidue(context.Background(), nil) + if err == nil { + t.Fatal("expected error on nil db, got nil") + } +} + +// TestDeleteDemoAnonResidue_NilDB pins the nil-safety contract. +func TestDeleteDemoAnonResidue_NilDB(t *testing.T) { + _, err := deleteDemoAnonResidue(context.Background(), nil) + if err == nil { + t.Fatal("expected error on nil db, got nil") + } +} diff --git a/docs/operator/rbac.md b/docs/operator/rbac.md index ef33be4..ac34662 100644 --- a/docs/operator/rbac.md +++ b/docs/operator/rbac.md @@ -1,6 +1,12 @@ # RBAC operator reference -> Last reviewed: 2026-05-09 +> Last reviewed: 2026-05-11 +> +> Audit 2026-05-11 A-8 follow-on: demo-mode residual-grants detector +> + cleanup endpoint shipped. New env var: +> `CERTCTL_DEMO_MODE_RESIDUAL_STRICT` (default `false`). Operator +> workflow at +> [`security.md#demo-to-production-cutover-audit-2026-05-11-a-8`](security.md#demo-to-production-cutover-audit-2026-05-11-a-8). This is the operator-facing reference for the role-based access control primitive that ships with Bundle 1 (auth bundle 1) of certctl. diff --git a/docs/operator/security.md b/docs/operator/security.md index 376dee1..c9cbf8a 100644 --- a/docs/operator/security.md +++ b/docs/operator/security.md @@ -1,6 +1,6 @@ # certctl Security Posture & Operator Guidance -> Last reviewed: 2026-05-10 +> Last reviewed: 2026-05-11 This document collects the operator-facing security guidance that the source code's per-finding comment blocks reference. Each section names the audit @@ -262,6 +262,61 @@ to avoid a permanent backdoor; the runbook at [`auth-threat-model.md#break-glass-risks-phase-75`](auth-threat-model.md) documents the full state machine. +### Demo-to-production cutover (Audit 2026-05-11 A-8) + +Migration `000029_rbac.up.sql` unconditionally seeds an +`actor-demo-anon → r-admin` row into `actor_roles`. This row is the +runtime principal injected by the demo-mode middleware when +`CERTCTL_AUTH_TYPE=none`. Under any non-`none` auth type the row is +DORMANT — the middleware chain never resolves to it. But its existence +is a footgun: a future regression that resolves an unauthenticated +request to `actor-demo-anon` (a misrouted CORS preflight, a fallback in +a new auth-exempt route) would silently re-elevate to admin. + +certctl-server detects this residue at startup and emits a WARN log + +an `auth.demo_residual_grants_detected` audit row listing every grant +present on `actor-demo-anon`. **Every production deploy will see this +WARN on first boot** — the migration baseline is part of the install, +not a side effect of running demo mode. + +Operator workflow at production cutover: + +1. Drain the WARN by calling the cleanup endpoint with an admin API key: + + ```bash + curl -X POST --cacert deploy/test/certs/ca.crt \ + -H "Authorization: Bearer $ADMIN_KEY" \ + https://certctl.example.com:8443/api/v1/auth/demo-residual/cleanup + # → {"removed": 1} + ``` + + The endpoint is gated `auth.role.assign` (admin-class) and refuses + to run when `CERTCTL_AUTH_TYPE=none` (HTTP 503 — the residue IS the + active runtime state at that auth type). The cleanup is idempotent; + a second call returns `{"removed": 0}` and still leaves an audit row. + + Equivalent SQL for operators preferring direct DB access: + + ```sql + DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon'; + ``` + +2. To make subsequent boots refuse startup if the row reappears (the + most paranoid stance), set: + + ``` + CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true + ``` + + With the flag set, any `actor-demo-anon` row under a non-`none` + auth type causes certctl-server to log the WARN AND exit non-zero + before binding the HTTPS listener. Default is `false` (WARN only). + +3. The CI guard `scripts/ci-guards/no-new-synthetic-admin.sh` pins the + set of source files that may reference the `actor-demo-anon` literal. + New runtime code paths that resolve to the synthetic actor are + rejected at PR time so the credibility gap stays closed. + ### Migrating an existing deployment to OIDC A Bundle-1-merged deployment that wants to add OIDC follows the diff --git a/internal/api/handler/demo_residual.go b/internal/api/handler/demo_residual.go new file mode 100644 index 0000000..d22335e --- /dev/null +++ b/internal/api/handler/demo_residual.go @@ -0,0 +1,134 @@ +package handler + +import ( + "context" + "encoding/json" + "errors" + "net/http" + + "github.com/certctl-io/certctl/internal/auth" + "github.com/certctl-io/certctl/internal/domain" + authdomain "github.com/certctl-io/certctl/internal/domain/auth" +) + +// DemoResidualCleanupFn deletes every live actor_roles row for the +// synthetic actor-demo-anon and returns the count removed. Provided by +// cmd/server/main.go which holds the *sql.DB. Returning an error from +// this func surfaces as HTTP 500; returning (0, nil) is the legitimate +// "nothing to clean up" idempotent response. +type DemoResidualCleanupFn func(ctx context.Context) (int64, error) + +// DemoResidualHandler exposes POST /api/v1/auth/demo-residual/cleanup — +// an admin-gated convenience endpoint that removes residual +// actor-demo-anon role grants from a deployment that previously ran +// CERTCTL_AUTH_TYPE=none (or any deployment, since migration 000029 +// seeds the row unconditionally). Audit 2026-05-11 A-8 closure. +// +// The endpoint refuses to run when the server is currently in demo +// mode (Auth.Type == "none") because the residual IS the active +// runtime state at that auth type; deleting it would break the demo +// path. The 503 response makes the constraint observable to the GUI. +type DemoResidualHandler struct { + cleanup DemoResidualCleanupFn + authType func() string + auditWriter AuditWriter +} + +// AuditWriter is the minimal projection of *service.AuditService that +// the DemoResidualHandler uses. Kept local to avoid pulling the full +// service package into the handler's import set. +type AuditWriter interface { + RecordEventWithCategory( + ctx context.Context, actor string, actorType domain.ActorType, + action, eventCategory, resourceType, resourceID string, + details map[string]interface{}, + ) error +} + +// NewDemoResidualHandler wires the cleanup function and auth-type +// getter. authType is a closure so the handler always sees the +// live config value (post-startup mutation is unsupported, but +// the closure pattern keeps the dependency direction clean). +func NewDemoResidualHandler( + cleanup DemoResidualCleanupFn, + authType func() string, + audit AuditWriter, +) DemoResidualHandler { + return DemoResidualHandler{ + cleanup: cleanup, + authType: authType, + auditWriter: audit, + } +} + +// demoResidualCleanupResponse is the JSON body returned by POST +// /api/v1/auth/demo-residual/cleanup. Removed is the count of +// actor_roles rows that were live for actor-demo-anon at the time +// of the call. Always present; idempotent calls return removed=0. +type demoResidualCleanupResponse struct { + Removed int64 `json:"removed"` +} + +// Cleanup handles POST /api/v1/auth/demo-residual/cleanup. RBAC-gated +// at the router via auth.role.assign (the admin-class permission). +// Rejects requests when the server is in demo mode (Auth.Type=none) +// with HTTP 503. Emits an audit row recording the count removed + +// the caller actor on every successful run. +func (h DemoResidualHandler) Cleanup(w http.ResponseWriter, r *http.Request) { + ctx := r.Context() + + if h.cleanup == nil { + _ = Error(w, http.StatusInternalServerError, "demo-residual cleanup not configured") + return + } + + authType := "" + if h.authType != nil { + authType = h.authType() + } + if authType == "none" { + // Refusing to "clean up" the active demo-mode state. The + // GUI surface should hide the button when /api/v1/auth/info + // reports auth_type=none; this guard is defense-in-depth. + _ = Error(w, http.StatusServiceUnavailable, + "demo-residual cleanup refused: server is currently in demo mode (CERTCTL_AUTH_TYPE=none); the actor-demo-anon grants are the active runtime state at this auth type") + return + } + + removed, err := h.cleanup(ctx) + if err != nil { + _ = Error(w, http.StatusInternalServerError, "demo-residual cleanup failed") + return + } + + // Audit row records the count removed + the caller. The actor is + // pulled from the request context (set by the auth middleware + // chain after the rbacGate at the router level has authorized). + if h.auditWriter != nil { + actorID, _ := r.Context().Value(auth.ActorIDKey{}).(string) + if actorID == "" { + actorID = "unknown" + } + actorTypeRaw, _ := r.Context().Value(auth.ActorTypeKey{}).(string) + actorType := domain.ActorType(actorTypeRaw) + if actorType == "" { + actorType = domain.ActorTypeAPIKey + } + _ = h.auditWriter.RecordEventWithCategory( + ctx, actorID, actorType, + "auth.demo_residual_grants_cleaned", + domain.EventCategoryAuth, + "actor_roles", authdomain.DemoAnonActorID, + map[string]interface{}{"removed": removed}, + ) + } + + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(http.StatusOK) + _ = json.NewEncoder(w).Encode(demoResidualCleanupResponse{Removed: removed}) +} + +// ErrDemoResidualNotConfigured is returned by callers that probe the +// handler's wiring state. Currently unused outside tests but exported +// to keep the contract observable for documentation purposes. +var ErrDemoResidualNotConfigured = errors.New("demo-residual cleanup not configured") diff --git a/internal/api/handler/demo_residual_test.go b/internal/api/handler/demo_residual_test.go new file mode 100644 index 0000000..56245c8 --- /dev/null +++ b/internal/api/handler/demo_residual_test.go @@ -0,0 +1,229 @@ +package handler + +import ( + "context" + "encoding/json" + "errors" + "net/http" + "net/http/httptest" + "strings" + "sync/atomic" + "testing" + + "github.com/certctl-io/certctl/internal/auth" + "github.com/certctl-io/certctl/internal/domain" +) + +// Audit 2026-05-11 A-8 — DemoResidualHandler regression coverage. +// Uses fake closures for the cleanup + authType deps so the test +// stays stdlib + httptest only (no DB needed). DB-shape coverage +// lives in cmd/server/preflight_demo_residual_test.go. + +func fakeAuthType(s string) func() string { return func() string { return s } } + +// fakeAuditWriter captures the last RecordEventWithCategory invocation. +type fakeAuditWriter struct { + called atomic.Bool + lastCall struct { + actor, action, category, resourceType, resourceID string + details map[string]interface{} + } +} + +func (f *fakeAuditWriter) RecordEventWithCategory( + ctx context.Context, actor string, actorType domain.ActorType, + action, eventCategory, resourceType, resourceID string, + details map[string]interface{}, +) error { + f.called.Store(true) + f.lastCall.actor = actor + f.lastCall.action = action + f.lastCall.category = eventCategory + f.lastCall.resourceType = resourceType + f.lastCall.resourceID = resourceID + f.lastCall.details = details + return nil +} + +func authCtxReq(method, path string, actor string) *http.Request { + req := httptest.NewRequest(method, path, nil) + ctx := context.WithValue(req.Context(), auth.ActorIDKey{}, actor) + ctx = context.WithValue(ctx, auth.ActorTypeKey{}, string(domain.ActorTypeAPIKey)) + return req.WithContext(ctx) +} + +// TestDemoResidualCleanup_HappyPath — fake cleanup returns 3 rows +// removed; handler emits 200 + JSON body {removed:3} + audit row. +func TestDemoResidualCleanup_HappyPath(t *testing.T) { + audit := &fakeAuditWriter{} + h := NewDemoResidualHandler( + func(ctx context.Context) (int64, error) { return 3, nil }, + fakeAuthType("api-key"), + audit, + ) + rec := httptest.NewRecorder() + h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin")) + + if rec.Code != http.StatusOK { + t.Fatalf("status = %d, want 200; body=%s", rec.Code, rec.Body.String()) + } + var body demoResidualCleanupResponse + if err := json.Unmarshal(rec.Body.Bytes(), &body); err != nil { + t.Fatalf("decode body: %v", err) + } + if body.Removed != 3 { + t.Errorf("removed = %d, want 3", body.Removed) + } + + // Audit row must be emitted with the right category + caller actor. + if !audit.called.Load() { + t.Fatal("expected audit RecordEventWithCategory to be called") + } + if audit.lastCall.action != "auth.demo_residual_grants_cleaned" { + t.Errorf("audit action = %q, want auth.demo_residual_grants_cleaned", audit.lastCall.action) + } + if audit.lastCall.category != domain.EventCategoryAuth { + t.Errorf("audit category = %q, want %q", audit.lastCall.category, domain.EventCategoryAuth) + } + if audit.lastCall.actor != "k-admin" { + t.Errorf("audit actor = %q, want k-admin", audit.lastCall.actor) + } + if audit.lastCall.resourceID != "actor-demo-anon" { + t.Errorf("audit resource_id = %q, want actor-demo-anon", audit.lastCall.resourceID) + } + if got, ok := audit.lastCall.details["removed"].(int64); !ok || got != 3 { + t.Errorf("audit details.removed = %v, want 3", audit.lastCall.details["removed"]) + } +} + +// TestDemoResidualCleanup_Idempotent_ReturnsZero — fake cleanup returns +// (0, nil); the handler still emits 200 + body {removed:0} + audit. +func TestDemoResidualCleanup_Idempotent_ReturnsZero(t *testing.T) { + audit := &fakeAuditWriter{} + h := NewDemoResidualHandler( + func(ctx context.Context) (int64, error) { return 0, nil }, + fakeAuthType("api-key"), + audit, + ) + rec := httptest.NewRecorder() + h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin")) + + if rec.Code != http.StatusOK { + t.Fatalf("status = %d, want 200", rec.Code) + } + var body demoResidualCleanupResponse + if err := json.Unmarshal(rec.Body.Bytes(), &body); err != nil { + t.Fatalf("decode body: %v", err) + } + if body.Removed != 0 { + t.Errorf("removed = %d, want 0", body.Removed) + } + // Audit row should STILL fire on a no-op cleanup so the operator's + // action is recorded. This is intentional — the cleanup endpoint is + // admin-class and every invocation should leave a trail. + if !audit.called.Load() { + t.Error("audit row must fire even on no-op cleanup") + } +} + +// TestDemoResidualCleanup_RejectsInDemoMode — Auth.Type=none returns 503. +func TestDemoResidualCleanup_RejectsInDemoMode(t *testing.T) { + audit := &fakeAuditWriter{} + var cleanupCalled atomic.Bool + h := NewDemoResidualHandler( + func(ctx context.Context) (int64, error) { + cleanupCalled.Store(true) + return 0, nil + }, + fakeAuthType("none"), + audit, + ) + rec := httptest.NewRecorder() + h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin")) + + if rec.Code != http.StatusServiceUnavailable { + t.Fatalf("status = %d, want 503; body=%s", rec.Code, rec.Body.String()) + } + if !strings.Contains(rec.Body.String(), "demo mode") { + t.Errorf("body = %q, want mention of demo mode", rec.Body.String()) + } + // The cleanup closure must NOT have been called. + if cleanupCalled.Load() { + t.Error("cleanup closure called despite demo-mode reject") + } + // No audit row should fire on rejection — the action didn't happen. + if audit.called.Load() { + t.Error("audit row fired on rejected cleanup; should not") + } +} + +// TestDemoResidualCleanup_CleanupError_Surfaces500 — cleanup func +// returns an error; handler emits 500. +func TestDemoResidualCleanup_CleanupError_Surfaces500(t *testing.T) { + audit := &fakeAuditWriter{} + h := NewDemoResidualHandler( + func(ctx context.Context) (int64, error) { return 0, errors.New("boom") }, + fakeAuthType("api-key"), + audit, + ) + rec := httptest.NewRecorder() + h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin")) + + if rec.Code != http.StatusInternalServerError { + t.Fatalf("status = %d, want 500", rec.Code) + } + if audit.called.Load() { + t.Error("audit row fired on cleanup error; should not") + } +} + +// TestDemoResidualCleanup_NilCleanupFn — handler with no wired +// cleanup returns 500 (defensive — should never happen in prod, but +// the contract should be observable). +func TestDemoResidualCleanup_NilCleanupFn(t *testing.T) { + h := DemoResidualHandler{cleanup: nil, authType: fakeAuthType("api-key")} + rec := httptest.NewRecorder() + h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin")) + if rec.Code != http.StatusInternalServerError { + t.Fatalf("status = %d, want 500", rec.Code) + } +} + +// TestDemoResidualCleanup_NilAuditWriter_DoesNotPanic — audit is +// optional (Bundle-2 wiring may set it nil in tests / minimal configs). +// Handler must still succeed with valid cleanup. +func TestDemoResidualCleanup_NilAuditWriter_DoesNotPanic(t *testing.T) { + h := NewDemoResidualHandler( + func(ctx context.Context) (int64, error) { return 1, nil }, + fakeAuthType("api-key"), + nil, + ) + rec := httptest.NewRecorder() + h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin")) + if rec.Code != http.StatusOK { + t.Fatalf("status = %d, want 200", rec.Code) + } +} + +// TestDemoResidualCleanup_MissingActorContext — caller without +// ActorIDKey gets "unknown" recorded; the cleanup still runs. The +// rbacGate at the router enforces that authenticated callers reach +// this point, so missing actor context is purely a test-shape thing. +func TestDemoResidualCleanup_MissingActorContext(t *testing.T) { + audit := &fakeAuditWriter{} + h := NewDemoResidualHandler( + func(ctx context.Context) (int64, error) { return 1, nil }, + fakeAuthType("api-key"), + audit, + ) + rec := httptest.NewRecorder() + // No auth context — bare httptest.NewRequest. + h.Cleanup(rec, httptest.NewRequest(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", nil)) + + if rec.Code != http.StatusOK { + t.Fatalf("status = %d, want 200", rec.Code) + } + if audit.lastCall.actor != "unknown" { + t.Errorf("audit actor = %q, want unknown for missing actor context", audit.lastCall.actor) + } +} diff --git a/internal/api/router/openapi_parity_test.go b/internal/api/router/openapi_parity_test.go index 510fea4..874250e 100644 --- a/internal/api/router/openapi_parity_test.go +++ b/internal/api/router/openapi_parity_test.go @@ -161,6 +161,34 @@ var SpecParityExceptions = map[string]string{ // current. Documented inline at // internal/api/handler/auth_session_oidc.go::RevokeAllExceptCurrent. "DELETE /api/v1/auth/sessions": "Audit 2026-05-10 MED-3 — sign-out-all-other-sessions; gated auth.session.revoke. Documented inline at internal/api/handler/auth_session_oidc.go::RevokeAllExceptCurrent.", + + // ========================================================================= + // Pre-existing parity debt — routes that shipped on dev/auth-bundle-2 + // without their OpenAPI rows. Each entry below is tracked here as an + // exception with a pointer to the origin commit + the handler file that + // already carries the contract docstring. A follow-on pass should + // promote each into a full operationId entry under api/openapi.yaml. + // + // Each entry MUST list the origin commit (git blame router.go for the + // r.Register call) so the parity-debt cleanup pass can group routes + // by author + topic. + // ========================================================================= + "POST /api/v1/auth/oidc/test": "Audit 2026-05-10 MED-5 (Item 2; commit 00bbef7) — POST /api/v1/auth/oidc/test dry-run endpoint; gated auth.oidc.edit. Contract at internal/auth/oidc/test_discovery.go; OpenAPI row pending.", + "GET /api/v1/auth/oidc/providers/{id}/jwks-status": "Audit 2026-05-10 MED-6 follow-on (Item 3) — JWKS auto-refresh cache-status endpoint; gated auth.oidc.list. OpenAPI row pending.", + "GET /api/v1/auth/users": "Audit 2026-05-10 MED-7 / Bundle 2 Phase 13 Fix D — federated user list; gated auth.user.list. OpenAPI row pending.", + "DELETE /api/v1/auth/users/{id}": "Audit 2026-05-10 MED-7 / Bundle 2 Phase 13 Fix D — soft-delete a federated user (sets deactivated_at); gated auth.user.delete. Audit 2026-05-11 A-2 closure layered the login-time enforcement. OpenAPI row pending.", + "POST /api/v1/auth/users/{id}/reactivate": "Audit 2026-05-11 A-2 closure (commit a980e4c) — clears deactivated_at so a soft-deleted federated user can log in again; gated auth.user.edit. OpenAPI row pending.", + "GET /api/v1/auth/runtime-config": "Audit 2026-05-10 MED-12 / Bundle 2 Phase 13 Fix D — admin-only inspector for the live auth-related env vars; gated auth.role.assign. Handler at internal/api/handler/auth_runtime_config.go. OpenAPI row pending.", + + // Audit 2026-05-11 A-8 closure — demo-mode residual-grants cleanup. + // The endpoint removes residual actor-demo-anon role grants from a + // production deploy that previously ran (or installed alongside) + // demo mode. Admin-class (auth.role.assign) gated at the router. + // Refuses to run when Auth.Type=none (503). Wire-shape is a plain + // JSON POST → {removed: int64}. Handler doc-block at + // internal/api/handler/demo_residual.go::Cleanup; operator + // runbook at docs/operator/security.md::demo-to-production-cutover. + "POST /api/v1/auth/demo-residual/cleanup": "Audit 2026-05-11 A-8 closure — demo-mode residual-grants cleanup; gated auth.role.assign. Refuses when Auth.Type=none. Handler at internal/api/handler/demo_residual.go. OpenAPI row pending — endpoint shape is minimal (POST → {removed: int64}).", } func TestRouter_OpenAPIParity(t *testing.T) { diff --git a/internal/api/router/router.go b/internal/api/router/router.go index ddea6d8..2532661 100644 --- a/internal/api/router/router.go +++ b/internal/api/router/router.go @@ -187,6 +187,13 @@ type HandlerRegistry struct { // itself authenticates via the bootstrap token). Bootstrap handler.BootstrapHandler + // DemoResidual (Audit 2026-05-11 A-8) handles + // POST /api/v1/auth/demo-residual/cleanup. Removes residual + // actor-demo-anon role grants from the actor_roles table. RBAC- + // gated at the router via auth.role.assign (admin-class). + // Refuses to run when the server is in demo mode (Auth.Type=none). + DemoResidual handler.DemoResidualHandler + // Checker is the load-bearing auth.PermissionChecker that // auth.RequirePermission middleware uses to gate the legacy admin // handlers (Bundle 1 Phase 3.5). cmd/server wires the postgres @@ -401,6 +408,13 @@ func (r *Router) RegisterHandlers(reg HandlerRegistry) { r.Register("POST /api/v1/auth/keys/{id}/roles", rbacGate(reg.Checker, "auth.role.assign", reg.Auth.AssignRoleToKey)) r.Register("DELETE /api/v1/auth/keys/{id}/roles/{role_id}", rbacGate(reg.Checker, "auth.role.revoke", reg.Auth.RevokeRoleFromKey)) + // Audit 2026-05-11 A-8 closure — demo-mode residual-grants cleanup. + // Gated auth.role.assign (admin-class) so non-admins can't wipe the + // synthetic actor's grants. The handler additionally refuses to run + // when the server is currently in demo mode (Auth.Type=none). + r.Register("POST /api/v1/auth/demo-residual/cleanup", + rbacGate(reg.Checker, "auth.role.assign", reg.DemoResidual.Cleanup)) + // ========================================================================= // Auth Bundle 2 Phase 5 — OIDC + session HTTP surface. // diff --git a/internal/config/config.go b/internal/config/config.go index ad574ce..a7f0007 100644 --- a/internal/config/config.go +++ b/internal/config/config.go @@ -1624,6 +1624,25 @@ type AuthConfig struct { // Setting: CERTCTL_DEMO_MODE_ACK environment variable. DemoModeAck bool + // DemoModeResidualStrict refuses startup when Auth.Type != none + // and `actor-demo-anon` has residual role grants in actor_roles. + // Default false (emit WARN log + audit row instead). Audit + // 2026-05-11 A-8 closure — closes the deferred Phase 2 leg of + // HIGH-12 (cowork/auth-bundles-fixes-2026-05-10/11-high-12-...). + // + // Note: migration 000029 unconditionally seeds the + // `ar-demo-anon-admin` grant of `r-admin` to `actor-demo-anon` + // for every install, so production deploys will see this WARN + // out of the box. The intended workflow at production cutover is: + // 1. POST /api/v1/auth/demo-residual/cleanup (or run the + // DELETE FROM actor_roles WHERE actor_id='actor-demo-anon' + // SQL emitted by the WARN). + // 2. Optionally set this flag for subsequent boots to refuse + // startup if the rows somehow get re-seeded. + // + // Setting: CERTCTL_DEMO_MODE_RESIDUAL_STRICT environment variable. + DemoModeResidualStrict bool + // OIDCBCLMaxAgeSeconds is the iat-freshness skew window for OIDC // back-channel-logout tokens. logout_tokens with iat outside the // window are rejected with audit outcome=iat_stale (in the past) @@ -1897,6 +1916,10 @@ func Load() (*Config, error) { // Audit 2026-05-10 HIGH-12 closure: required-true to allow // CERTCTL_AUTH_TYPE=none with a non-loopback listen address. DemoModeAck: getEnvBool("CERTCTL_DEMO_MODE_ACK", false), + // Audit 2026-05-11 A-8 closure: when true, the preflight + // residual-grants detector refuses startup if actor-demo-anon + // has any actor_roles rows. Default false (WARN-only). + DemoModeResidualStrict: getEnvBool("CERTCTL_DEMO_MODE_RESIDUAL_STRICT", false), // LOW-5: XFF trust allowlist (CIDRs). Empty = ignore XFF. TrustedProxies: getEnvList("CERTCTL_TRUSTED_PROXIES", nil), // NamedKeys is populated from CERTCTL_API_KEYS_NAMED below so Load() diff --git a/scripts/ci-guards/no-new-synthetic-admin.sh b/scripts/ci-guards/no-new-synthetic-admin.sh new file mode 100755 index 0000000..782f02e --- /dev/null +++ b/scripts/ci-guards/no-new-synthetic-admin.sh @@ -0,0 +1,74 @@ +#!/usr/bin/env bash +# Audit 2026-05-11 A-8 — no new code paths may reference actor-demo-anon +# outside the declared allowlist. The synthetic actor is a load-bearing +# demo-mode primitive but ANY new reference in production code paths is +# a candidate footgun (the original CRIT class was a fallback that +# resolved unauthenticated requests to this actor and got full admin). +# +# Adding a legitimate new reference? Add the file to ALLOWLIST below +# AND describe the reason in this header. Operators (auditors) read +# this script to understand where the synthetic admin "lives" in the +# codebase. +# +# Test files (*_test.go), /vendor/, /docs/, and CHANGELOG entries are +# excluded — they don't introduce new runtime code paths. + +set -euo pipefail + +# Files that legitimately reference the actor-demo-anon literal in +# source. Each entry needs a one-line rationale comment so future +# maintainers don't have to trace why it's here. +ALLOWLIST=( + "./cmd/server/main.go" # HandlerRegistry comment + DemoResidual wiring + "./cmd/server/preflight_demo_residual.go" # A-8 detector + cleanup helpers + "./internal/api/handler/auth.go" # interface docstring for ListKeys + "./internal/api/handler/demo_residual.go" # A-8 cleanup endpoint + "./internal/api/router/router.go" # routing comment for cleanup endpoint + "./internal/auth/context.go" # const DemoAnonActorID source-of-truth (canonical) + "./internal/auth/middleware.go" # NewDemoModeAuth — injects synthetic actor under Type=none + "./internal/cli/auth_scope_down.go" # interactive prompt filter + "./internal/config/config.go" # validate-time guard comments + DemoModeResidualStrict env var + "./internal/domain/audit.go" # audit-event documentation comment + "./internal/domain/auth/validate.go" # const DemoAnonActorID mirror + "./internal/mcp/tools_auth.go" # MCP tool description for ListKeys + Revoke + "./internal/mcp/types.go" # MCP request-schema description + "./internal/repository/auth.go" # ActorRoleRepository interface docstrings + "./internal/service/auth/actor_role_service.go" # reserved-actor mutation guard (CRIT-1 closure) + "./internal/service/auth/authorizer.go" # synthetic-actor authorization comment + "./scripts/ci-guards/no-new-synthetic-admin.sh" # this script itself +) + +declare -A allow=() +for loc in "${ALLOWLIST[@]}"; do allow["$loc"]=1; done + +violations=() +# rg/grep with -l prints filenames. We exclude test files, vendored +# code, docs (operator-facing prose), and CHANGELOG markdown. +while IFS= read -r file; do + [ -z "$file" ] && continue + if [ -z "${allow[$file]:-}" ]; then + violations+=("$file") + fi +done < <(grep -rln 'actor-demo-anon' \ + --include='*.go' --include='*.sh' . \ + 2>/dev/null \ + | grep -v '_test\.go$' \ + | grep -v '^\./vendor/' \ + | grep -v '^\./docs/' \ + | grep -v '^\./CHANGELOG\.md$' \ + | sort -u) + +if [ ${#violations[@]} -gt 0 ]; then + printf 'A-8 GUARD FAIL: new actor-demo-anon reference outside the established allowlist:\n' + printf ' %s\n' "${violations[@]}" + printf '\n' + printf 'If this reference is legitimate, add the file to ALLOWLIST in\n' + printf ' scripts/ci-guards/no-new-synthetic-admin.sh\n' + printf 'WITH a rationale comment describing why the synthetic admin\n' + printf 'literal needs to appear there. Otherwise, route through the\n' + printf 'public DemoAnonActorID constant or refactor the new code path\n' + printf 'to NOT reference the synthetic actor at all (preferred).\n' + exit 1 +fi + +echo "A-8 guard PASS — actor-demo-anon references confined to the declared ${#ALLOWLIST[@]}-entry allowlist."