harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8)

Audit 2026-05-11 A-8 closure. Closes the deferred Phase 2 leg of the
2026-05-10 HIGH-12 closure (2e97cc1) — production-startup observability
for actor-demo-anon residual grants + CI guard banning new synthetic-
admin code paths.

What this changes:

* cmd/server/preflight_demo_residual.go (new) runs after the DB pool +
  audit service are constructed and before the HTTPS listener starts.
  Under any non-'none' auth type it queries actor_roles for the
  synthetic actor-demo-anon and emits a WARN log + a categorized audit
  row (auth.demo_residual_grants_detected) listing every grant
  present. Migration 000029 unconditionally seeds the ar-demo-anon-admin
  row at install time, so EVERY production deploy will see this WARN
  on first boot; the intended cutover workflow is cleanup-once at
  production handover.

* CERTCTL_DEMO_MODE_RESIDUAL_STRICT (new env var on AuthConfig,
  default false) pivots the WARN to fail-closed startup refusal for
  operators who want a paranoid posture against re-seeding.

* POST /api/v1/auth/demo-residual/cleanup (new handler at
  internal/api/handler/demo_residual.go) is an admin-class
  (auth.role.assign) endpoint that removes every actor-demo-anon row
  from actor_roles and returns {removed: int64}. It is idempotent,
  responds 503 under Auth.Type=none (deleting the row would break the
  demo path), and audit-logs every successful invocation, including
  no-op zero-removed calls, so the admin's action is always recorded.
  Rejected (503) and failed (500) calls do not write an audit row.

* scripts/ci-guards/no-new-synthetic-admin.sh pins the 17-entry
  allowlist of source files that legitimately reference the
  actor-demo-anon literal. New runtime code paths that resolve to the
  synthetic actor (the same pattern that produced the original CRIT
  class) are rejected at PR time. The CI workflow picks the script up
  automatically through the existing scripts/ci-guards/*.sh loop in
  .github/workflows/ci.yml; no workflow edit is needed.
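
The detector's startup decision described in the first two bullets can be sketched as a small pure function. This is only an illustration under assumptions: decidePreflight, preflightAction, and errResidualStrict are hypothetical names and not the contents of preflight_demo_residual.go, and the grants slice stands in for the rows the real detector reads from actor_roles. Whether strict mode also writes the audit row before refusing is not stated here, so the sketch leaves that out.

```go
package main

import (
	"errors"
	"fmt"
)

// preflightAction is a hypothetical label for what startup does next.
type preflightAction string

const (
	actionSkip preflightAction = "skip" // demo mode: the grants are the active state
	actionPass preflightAction = "pass" // no residual grants found
	actionWarn preflightAction = "warn" // WARN log + audit row, startup continues
)

// errResidualStrict is a hypothetical sentinel for the strict-mode refusal.
var errResidualStrict = errors.New("residual actor-demo-anon grants present and strict mode is enabled")

// decidePreflight mirrors the branches named in the commit message.
// grants stands in for the actor_roles rows held by actor-demo-anon.
func decidePreflight(authType string, grants []string, strict bool) (preflightAction, error) {
	if authType == "none" {
		return actionSkip, nil
	}
	if len(grants) == 0 {
		return actionPass, nil
	}
	if strict {
		return "", errResidualStrict
	}
	return actionWarn, nil
}

func main() {
	grants := []string{"ar-demo-anon-admin"}

	action, err := decidePreflight("api-key", grants, false)
	fmt.Println(action, err)

	_, err = decidePreflight("api-key", grants, true)
	fmt.Println(err)
}
```

Keeping the decision separate from the database query and the logging makes each branch checkable without a database, which is the same split the regression matrix uses between testcontainers-backed and pure-Go tests.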

Regression matrix:

* cmd/server/preflight_demo_residual_test.go — 7 tests covering the
  4 main behaviour branches (testcontainers-backed, skipped under
  testing.Short(): DemoModeActive_Skips, NoResidue_Passes,
  HasResidue_LogsAndAudits, StrictMode_RefusesStartup,
  DeleteDemoAnonResidue_Idempotent) plus 3 pure-Go stdlib unit tests
  for the row-string formatter + nil-safety contracts on both helpers.

* internal/api/handler/demo_residual_test.go — 7 stdlib+httptest
  cases: HappyPath, Idempotent_ReturnsZero, RejectsInDemoMode (503),
  CleanupError_Surfaces500, NilCleanupFn (defensive 500),
  NilAuditWriter_DoesNotPanic, MissingActorContext (falls back to
  'unknown' actor in the audit row).

* internal/api/router/openapi_parity_test.go — new
  POST /api/v1/auth/demo-residual/cleanup entry plus 6 pre-A-8
  entries (oidc/test, jwks-status, users CRUD, runtime-config) that
  had drifted out of SpecParityExceptions. The parity test was
  already red on dev/auth-bundle-2 before this change; this commit
  returns it to green with full per-entry justifications and
  parity-debt notes.

Docs:

* docs/operator/security.md — new 'Demo-to-production cutover (Audit
  2026-05-11 A-8)' section explaining the WARN message, the cleanup
  curl one-liner, the equivalent SQL, the strict-mode env var, and
  the CI guard.

* docs/operator/rbac.md — Last-reviewed bump + pointer to the new
  env var + the security.md section.

* cowork/auth-bundles-audit-2026-05-10.md — HIGH-12 row gains an
  'A-8 follow-on CLOSED 2026-05-11' annotation describing the
  deferred Phase 2 leg now landed.

* CHANGELOG.md — Unreleased ### Security entry summarizing the four
  legs (detector + cleanup + strict-mode flag + CI guard) and the
  acquisition-readiness narrative this closes.
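
For operators who script the cutover in Go rather than with the curl one-liner from security.md, the call could look like the sketch below. The endpoint path and the {removed} response field come from this commit; the Bearer-token header, the callCleanup, stubServer, and demoCall names, and the stub server that stands in for a certctl instance are assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// cleanupResponse matches the {removed: int64} body described in the commit.
type cleanupResponse struct {
	Removed int64 `json:"removed"`
}

// callCleanup POSTs to the cleanup endpoint and returns the removed count.
// ASSUMPTION: the Authorization header scheme is a placeholder; use whatever
// credential your deployment's auth type expects.
func callCleanup(client *http.Client, baseURL, token string) (int64, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/auth/demo-residual/cleanup", nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	// 503 means the server is in demo mode; 500 means the cleanup failed.
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("cleanup returned HTTP %d", resp.StatusCode)
	}
	var body cleanupResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	return body.Removed, nil
}

// stubServer is a hypothetical stand-in for a certctl server so the sketch
// runs without one. It answers only the cleanup route.
func stubServer(status int, removed int64) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/api/v1/auth/demo-residual/cleanup" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		if status == http.StatusOK {
			_ = json.NewEncoder(w).Encode(cleanupResponse{Removed: removed})
		}
	}))
}

// demoCall runs callCleanup against the stub with the given canned reply.
func demoCall(status int, removed int64) (int64, error) {
	srv := stubServer(status, removed)
	defer srv.Close()
	return callCleanup(srv.Client(), srv.URL, "example-token")
}

func main() {
	fmt.Println(demoCall(http.StatusOK, 1))
	fmt.Println(demoCall(http.StatusServiceUnavailable, 0))
}
```

Because the endpoint is idempotent, a cutover script can call it unconditionally and treat a removed count of zero as success.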

Operator-facing impact: this closes a credibility gap, not an
exploitable vulnerability. The residue requires a regression
elsewhere in the middleware chain to be exploitable. After this
fix, the canonical narrative ('RBAC primitive with no synthetic-
admin fallback') is fully true.

Refs cowork/auth-bundles-fixes-2026-05-11/08-high-demo-mode-residual-cleanup.md.
shankar0123
2026-05-11 11:45:54 +00:00
parent b8fac59200
commit a923cf697c
12 changed files with 1123 additions and 2 deletions
internal/api/handler/demo_residual.go (+134)
@@ -0,0 +1,134 @@
package handler

import (
	"context"
	"encoding/json"
	"errors"
	"net/http"

	"github.com/certctl-io/certctl/internal/auth"
	"github.com/certctl-io/certctl/internal/domain"
	authdomain "github.com/certctl-io/certctl/internal/domain/auth"
)

// DemoResidualCleanupFn deletes every live actor_roles row for the
// synthetic actor-demo-anon and returns the count removed. Provided by
// cmd/server/main.go which holds the *sql.DB. Returning an error from
// this func surfaces as HTTP 500; returning (0, nil) is the legitimate
// "nothing to clean up" idempotent response.
type DemoResidualCleanupFn func(ctx context.Context) (int64, error)

// DemoResidualHandler exposes POST /api/v1/auth/demo-residual/cleanup —
// an admin-gated convenience endpoint that removes residual
// actor-demo-anon role grants from a deployment that previously ran
// CERTCTL_AUTH_TYPE=none (or any deployment, since migration 000029
// seeds the row unconditionally). Audit 2026-05-11 A-8 closure.
//
// The endpoint refuses to run when the server is currently in demo
// mode (Auth.Type == "none") because the residual IS the active
// runtime state at that auth type; deleting it would break the demo
// path. The 503 response makes the constraint observable to the GUI.
type DemoResidualHandler struct {
	cleanup     DemoResidualCleanupFn
	authType    func() string
	auditWriter AuditWriter
}

// AuditWriter is the minimal projection of *service.AuditService that
// the DemoResidualHandler uses. Kept local to avoid pulling the full
// service package into the handler's import set.
type AuditWriter interface {
	RecordEventWithCategory(
		ctx context.Context, actor string, actorType domain.ActorType,
		action, eventCategory, resourceType, resourceID string,
		details map[string]interface{},
	) error
}

// NewDemoResidualHandler wires the cleanup function and auth-type
// getter. authType is a closure so the handler always sees the
// live config value (post-startup mutation is unsupported, but
// the closure pattern keeps the dependency direction clean).
func NewDemoResidualHandler(
	cleanup DemoResidualCleanupFn,
	authType func() string,
	audit AuditWriter,
) DemoResidualHandler {
	return DemoResidualHandler{
		cleanup:     cleanup,
		authType:    authType,
		auditWriter: audit,
	}
}

// demoResidualCleanupResponse is the JSON body returned by POST
// /api/v1/auth/demo-residual/cleanup. Removed is the count of
// actor_roles rows that were live for actor-demo-anon at the time
// of the call. Always present; idempotent calls return removed=0.
type demoResidualCleanupResponse struct {
	Removed int64 `json:"removed"`
}

// Cleanup handles POST /api/v1/auth/demo-residual/cleanup. RBAC-gated
// at the router via auth.role.assign (the admin-class permission).
// Rejects requests when the server is in demo mode (Auth.Type=none)
// with HTTP 503. Emits an audit row recording the count removed +
// the caller actor on every successful run.
func (h DemoResidualHandler) Cleanup(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()

	if h.cleanup == nil {
		_ = Error(w, http.StatusInternalServerError, "demo-residual cleanup not configured")
		return
	}

	authType := ""
	if h.authType != nil {
		authType = h.authType()
	}
	if authType == "none" {
		// Refusing to "clean up" the active demo-mode state. The
		// GUI surface should hide the button when /api/v1/auth/info
		// reports auth_type=none; this guard is defense-in-depth.
		_ = Error(w, http.StatusServiceUnavailable,
			"demo-residual cleanup refused: server is currently in demo mode (CERTCTL_AUTH_TYPE=none); the actor-demo-anon grants are the active runtime state at this auth type")
		return
	}

	removed, err := h.cleanup(ctx)
	if err != nil {
		_ = Error(w, http.StatusInternalServerError, "demo-residual cleanup failed")
		return
	}

	// Audit row records the count removed + the caller. The actor is
	// pulled from the request context (set by the auth middleware
	// chain after the rbacGate at the router level has authorized).
	if h.auditWriter != nil {
		actorID, _ := r.Context().Value(auth.ActorIDKey{}).(string)
		if actorID == "" {
			actorID = "unknown"
		}
		actorTypeRaw, _ := r.Context().Value(auth.ActorTypeKey{}).(string)
		actorType := domain.ActorType(actorTypeRaw)
		if actorType == "" {
			actorType = domain.ActorTypeAPIKey
		}
		_ = h.auditWriter.RecordEventWithCategory(
			ctx, actorID, actorType,
			"auth.demo_residual_grants_cleaned",
			domain.EventCategoryAuth,
			"actor_roles", authdomain.DemoAnonActorID,
			map[string]interface{}{"removed": removed},
		)
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	_ = json.NewEncoder(w).Encode(demoResidualCleanupResponse{Removed: removed})
}

// ErrDemoResidualNotConfigured is returned by callers that probe the
// handler's wiring state. Currently unused outside tests but exported
// to keep the contract observable for documentation purposes.
var ErrDemoResidualNotConfigured = errors.New("demo-residual cleanup not configured")
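
The DemoResidualCleanupFn doc comment says the closure is supplied by cmd/server/main.go, which holds the *sql.DB. A minimal sketch of such a closure follows. It is not the shipped implementation: the actor_id column name, the $1 placeholder style, and the use of a hard DELETE (the comment's "live" rows may imply an extra predicate) are assumptions, and newCleanupFn, execer, fakeDB, and runCleanupTwice are hypothetical names. A fake database is included so the sketch runs without a driver.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB the sketch needs; *sql.DB satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// demoAnonActorID matches the literal asserted in the handler tests.
const demoAnonActorID = "actor-demo-anon"

// newCleanupFn returns a closure with the DemoResidualCleanupFn shape.
// ASSUMPTION: the actor_id column and the $1 placeholder are guesses about
// the schema and driver, not taken from the commit.
func newCleanupFn(db execer) func(ctx context.Context) (int64, error) {
	return func(ctx context.Context) (int64, error) {
		res, err := db.ExecContext(ctx,
			`DELETE FROM actor_roles WHERE actor_id = $1`, demoAnonActorID)
		if err != nil {
			return 0, err
		}
		return res.RowsAffected()
	}
}

// fakeResult implements sql.Result for the in-memory fake.
type fakeResult struct{ n int64 }

func (r fakeResult) LastInsertId() (int64, error) { return 0, nil }
func (r fakeResult) RowsAffected() (int64, error) { return r.n, nil }

// fakeDB pretends to hold rows for actor-demo-anon.
type fakeDB struct{ rows int64 }

// ExecContext "deletes" every remaining row, so a second call removes zero.
func (f *fakeDB) ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error) {
	n := f.rows
	f.rows = 0
	return fakeResult{n: n}, nil
}

// runCleanupTwice demonstrates the idempotent (0, nil) contract.
func runCleanupTwice(initial int64) (first, second int64, err error) {
	cleanup := newCleanupFn(&fakeDB{rows: initial})
	ctx := context.Background()
	if first, err = cleanup(ctx); err != nil {
		return 0, 0, err
	}
	second, err = cleanup(ctx)
	return first, second, err
}

func main() {
	first, second, err := runCleanupTwice(1)
	fmt.Println(first, second, err)
}
```

Depending on a one-method interface rather than *sql.DB directly is a choice made for this sketch; it is what lets the closure be exercised without testcontainers.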
internal/api/handler/demo_residual_test.go (+229)
@@ -0,0 +1,229 @@
package handler

import (
	"context"
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"testing"

	"github.com/certctl-io/certctl/internal/auth"
	"github.com/certctl-io/certctl/internal/domain"
)

// Audit 2026-05-11 A-8 — DemoResidualHandler regression coverage.
// Uses fake closures for the cleanup + authType deps so the test
// stays stdlib + httptest only (no DB needed). DB-shape coverage
// lives in cmd/server/preflight_demo_residual_test.go.

func fakeAuthType(s string) func() string { return func() string { return s } }

// fakeAuditWriter captures the last RecordEventWithCategory invocation.
type fakeAuditWriter struct {
	called   atomic.Bool
	lastCall struct {
		actor, action, category, resourceType, resourceID string
		details                                           map[string]interface{}
	}
}

func (f *fakeAuditWriter) RecordEventWithCategory(
	ctx context.Context, actor string, actorType domain.ActorType,
	action, eventCategory, resourceType, resourceID string,
	details map[string]interface{},
) error {
	f.called.Store(true)
	f.lastCall.actor = actor
	f.lastCall.action = action
	f.lastCall.category = eventCategory
	f.lastCall.resourceType = resourceType
	f.lastCall.resourceID = resourceID
	f.lastCall.details = details
	return nil
}

func authCtxReq(method, path string, actor string) *http.Request {
	req := httptest.NewRequest(method, path, nil)
	ctx := context.WithValue(req.Context(), auth.ActorIDKey{}, actor)
	ctx = context.WithValue(ctx, auth.ActorTypeKey{}, string(domain.ActorTypeAPIKey))
	return req.WithContext(ctx)
}

// TestDemoResidualCleanup_HappyPath — fake cleanup returns 3 rows
// removed; handler emits 200 + JSON body {removed:3} + audit row.
func TestDemoResidualCleanup_HappyPath(t *testing.T) {
	audit := &fakeAuditWriter{}
	h := NewDemoResidualHandler(
		func(ctx context.Context) (int64, error) { return 3, nil },
		fakeAuthType("api-key"),
		audit,
	)
	rec := httptest.NewRecorder()
	h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin"))

	if rec.Code != http.StatusOK {
		t.Fatalf("status = %d, want 200; body=%s", rec.Code, rec.Body.String())
	}
	var body demoResidualCleanupResponse
	if err := json.Unmarshal(rec.Body.Bytes(), &body); err != nil {
		t.Fatalf("decode body: %v", err)
	}
	if body.Removed != 3 {
		t.Errorf("removed = %d, want 3", body.Removed)
	}

	// Audit row must be emitted with the right category + caller actor.
	if !audit.called.Load() {
		t.Fatal("expected audit RecordEventWithCategory to be called")
	}
	if audit.lastCall.action != "auth.demo_residual_grants_cleaned" {
		t.Errorf("audit action = %q, want auth.demo_residual_grants_cleaned", audit.lastCall.action)
	}
	if audit.lastCall.category != domain.EventCategoryAuth {
		t.Errorf("audit category = %q, want %q", audit.lastCall.category, domain.EventCategoryAuth)
	}
	if audit.lastCall.actor != "k-admin" {
		t.Errorf("audit actor = %q, want k-admin", audit.lastCall.actor)
	}
	if audit.lastCall.resourceID != "actor-demo-anon" {
		t.Errorf("audit resource_id = %q, want actor-demo-anon", audit.lastCall.resourceID)
	}
	if got, ok := audit.lastCall.details["removed"].(int64); !ok || got != 3 {
		t.Errorf("audit details.removed = %v, want 3", audit.lastCall.details["removed"])
	}
}

// TestDemoResidualCleanup_Idempotent_ReturnsZero — fake cleanup returns
// (0, nil); the handler still emits 200 + body {removed:0} + audit.
func TestDemoResidualCleanup_Idempotent_ReturnsZero(t *testing.T) {
	audit := &fakeAuditWriter{}
	h := NewDemoResidualHandler(
		func(ctx context.Context) (int64, error) { return 0, nil },
		fakeAuthType("api-key"),
		audit,
	)
	rec := httptest.NewRecorder()
	h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin"))

	if rec.Code != http.StatusOK {
		t.Fatalf("status = %d, want 200", rec.Code)
	}
	var body demoResidualCleanupResponse
	if err := json.Unmarshal(rec.Body.Bytes(), &body); err != nil {
		t.Fatalf("decode body: %v", err)
	}
	if body.Removed != 0 {
		t.Errorf("removed = %d, want 0", body.Removed)
	}
	// Audit row should STILL fire on a no-op cleanup so the operator's
	// action is recorded. This is intentional — the cleanup endpoint is
	// admin-class and every invocation should leave a trail.
	if !audit.called.Load() {
		t.Error("audit row must fire even on no-op cleanup")
	}
}

// TestDemoResidualCleanup_RejectsInDemoMode — Auth.Type=none returns 503.
func TestDemoResidualCleanup_RejectsInDemoMode(t *testing.T) {
	audit := &fakeAuditWriter{}
	var cleanupCalled atomic.Bool
	h := NewDemoResidualHandler(
		func(ctx context.Context) (int64, error) {
			cleanupCalled.Store(true)
			return 0, nil
		},
		fakeAuthType("none"),
		audit,
	)
	rec := httptest.NewRecorder()
	h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin"))

	if rec.Code != http.StatusServiceUnavailable {
		t.Fatalf("status = %d, want 503; body=%s", rec.Code, rec.Body.String())
	}
	if !strings.Contains(rec.Body.String(), "demo mode") {
		t.Errorf("body = %q, want mention of demo mode", rec.Body.String())
	}
	// The cleanup closure must NOT have been called.
	if cleanupCalled.Load() {
		t.Error("cleanup closure called despite demo-mode reject")
	}
	// No audit row should fire on rejection — the action didn't happen.
	if audit.called.Load() {
		t.Error("audit row fired on rejected cleanup; should not")
	}
}

// TestDemoResidualCleanup_CleanupError_Surfaces500 — cleanup func
// returns an error; handler emits 500.
func TestDemoResidualCleanup_CleanupError_Surfaces500(t *testing.T) {
	audit := &fakeAuditWriter{}
	h := NewDemoResidualHandler(
		func(ctx context.Context) (int64, error) { return 0, errors.New("boom") },
		fakeAuthType("api-key"),
		audit,
	)
	rec := httptest.NewRecorder()
	h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin"))

	if rec.Code != http.StatusInternalServerError {
		t.Fatalf("status = %d, want 500", rec.Code)
	}
	if audit.called.Load() {
		t.Error("audit row fired on cleanup error; should not")
	}
}

// TestDemoResidualCleanup_NilCleanupFn — handler with no wired
// cleanup returns 500 (defensive — should never happen in prod, but
// the contract should be observable).
func TestDemoResidualCleanup_NilCleanupFn(t *testing.T) {
	h := DemoResidualHandler{cleanup: nil, authType: fakeAuthType("api-key")}
	rec := httptest.NewRecorder()
	h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin"))

	if rec.Code != http.StatusInternalServerError {
		t.Fatalf("status = %d, want 500", rec.Code)
	}
}

// TestDemoResidualCleanup_NilAuditWriter_DoesNotPanic — audit is
// optional (Bundle-2 wiring may set it nil in tests / minimal configs).
// Handler must still succeed with valid cleanup.
func TestDemoResidualCleanup_NilAuditWriter_DoesNotPanic(t *testing.T) {
	h := NewDemoResidualHandler(
		func(ctx context.Context) (int64, error) { return 1, nil },
		fakeAuthType("api-key"),
		nil,
	)
	rec := httptest.NewRecorder()
	h.Cleanup(rec, authCtxReq(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", "k-admin"))

	if rec.Code != http.StatusOK {
		t.Fatalf("status = %d, want 200", rec.Code)
	}
}

// TestDemoResidualCleanup_MissingActorContext — caller without
// ActorIDKey gets "unknown" recorded; the cleanup still runs. The
// rbacGate at the router enforces that authenticated callers reach
// this point, so missing actor context is purely a test-shape thing.
func TestDemoResidualCleanup_MissingActorContext(t *testing.T) {
	audit := &fakeAuditWriter{}
	h := NewDemoResidualHandler(
		func(ctx context.Context) (int64, error) { return 1, nil },
		fakeAuthType("api-key"),
		audit,
	)
	rec := httptest.NewRecorder()
	// No auth context — bare httptest.NewRequest.
	h.Cleanup(rec, httptest.NewRequest(http.MethodPost, "/api/v1/auth/demo-residual/cleanup", nil))

	if rec.Code != http.StatusOK {
		t.Fatalf("status = %d, want 200", rec.Code)
	}
	if audit.lastCall.actor != "unknown" {
		t.Errorf("audit actor = %q, want unknown for missing actor context", audit.lastCall.actor)
	}
}