fix(deploy): Hotfix #18 — apt-get retry loop in libest Dockerfile (transient mirror flake)

The CI image-and-supply-chain job failed while building deploy/test/libest/Dockerfile:

    Get:62 http://deb.debian.org/debian bullseye/main amd64 libssh2-1 amd64 1.9.0-2+deb11u1 [156 kB]
    Err:62 http://deb.debian.org/debian bullseye/main amd64 libssh2-1 amd64 1.9.0-2+deb11u1
      Error reading from server - read (104: Connection reset by peer) [IP: 151.101.202.132 80]
    E: Failed to fetch http://deb.debian.org/debian/pool/main/libs/libssh2/libssh2-1_1.9.0-2%2bdeb11u1_amd64.deb
    E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

Root cause: a transient TCP reset from Fastly's Debian mirror at 151.101.202.132, mid-fetch of one of 73 packages. Mirrors flake; the apt error message itself suggests "--fix-missing."

This was NOT a code regression: the build sequence completed Dockerfile (main server), Dockerfile.agent, and f5-mock-icontrol/Dockerfile cleanly before hitting the flake on the 4th and final Dockerfile. The Go + npm steps for the main image all succeeded.

The main Dockerfile already wraps `npm ci` in a 3-retry loop (Hotfix #9 from the Storybook lockfile saga; the npm registry has the same flake profile as Debian mirrors). The libest Dockerfile's two apt-get install sites (builder stage line 85, runtime stage line 189) had no such wrapping.

Fix: wrap both apt-get install invocations in a 3-retry loop matching the main Dockerfile's npm-ci pattern. Each retry runs `apt-get update && apt-get install --fix-missing ...`, exits the loop on success, and sleeps 5s between attempts. After 3 failed attempts the build fails, which preserves CI's signal for a genuinely broken mirror state.

--fix-missing tells apt to continue past temporarily missing packages on subsequent retries; combined with the update + sleep, the 3-attempt loop covers the typical mirror-flake window (~30-60s of churn before another mirror takes over).

Both apt-get sites in the libest Dockerfile get the same treatment (builder + runtime). The two are independent install operations, so a failure in one does not affect the other.

Verification (sandbox):
• Visual diff of both apt-get blocks: consistent retry shape + --fix-missing + error message + sleep cadence
• No Go-side code touched; this is a pure CI-infrastructure Dockerfile change
• Other Dockerfiles in the repo (main + agent + f5-mock-icontrol) don't need this fix today; the main Dockerfile already has the retry loop for npm ci, and agent + f5-mock use Alpine `apk`, which has its own retry semantics

Ground-truth: origin/master tip 7268d12 (FE-M6 just pushed) verified via GitHub API BEFORE commit.

Falsifiable proof for the next CI run: the image-and-supply-chain job's libest build should either succeed on the first attempt OR retry through the flake automatically. The expected outcome is a green build; a real broken-mirror state would still fail after 3 attempts (which is the right signal).
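
The retry shape described in the fix can be sketched as a small POSIX shell helper. This is an illustration under stated assumptions, not the code that landed: the `retry` function name and the `RETRY_SLEEP_SECONDS` variable are hypothetical, and the real Dockerfile may well inline the loop directly in its RUN instructions.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND up to MAX_ATTEMPTS times, sleeping between failed attempts.
# Returns 0 on the first success, 1 once every attempt has failed.
retry() {
  max_attempts="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the command in the current shell; stop at the first success.
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed: $*" >&2
    # Sleep only when another attempt remains (5s default, per the commit message).
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "${RETRY_SLEEP_SECONDS:-5}"
    fi
    attempt=$((attempt + 1))
  done
  # All attempts exhausted: propagate failure so the build still goes red.
  return 1
}
```

Under these assumptions, each apt-get site would be invoked along the lines of `retry 3 sh -c 'apt-get update && apt-get install --fix-missing ...'`, with the package list left elided as in the message above. Returning non-zero after the last attempt is what keeps a genuinely broken mirror visible to CI.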
fix(web): Hotfix #17 — skip backend-dependent e2e specs in CI (e2e.yml turns green)
2026-06-07 22:31:36 +00:00 · 2026-05-14 20:57:24 +00:00 · 2026-05-14 20:54:43 +00:00 · 2026-05-14 20:40:55 +00:00 · 2026-05-14 20:14:26 +00:00 · 2026-05-14 20:04:25 +00:00
879 changed files with 106327 additions and 23900 deletions
@@ -7,7 +7,7 @@
 # ==============================================================================
 POSTGRES_DB=certctl
 POSTGRES_USER=certctl
-POSTGRES_PASSWORD=change-me-in-production
+POSTGRES_PASSWORD=replace-with-openssl-rand-hex-32

 # ==============================================================================
 # Certctl Server
@@ -24,24 +24,45 @@ POSTGRES_PASSWORD=change-me-in-production
 # seeds pg_authid on first boot of an empty volume. See docs/quickstart.md
 # "Warning" callout and `internal/repository/postgres/db.go::wrapPingError`
 # for the SQLSTATE 28P01 diagnostic that fires when the two drift.
-CERTCTL_DATABASE_URL=postgres://certctl:change-me-in-production@postgres:5432/certctl?sslmode=disable
+CERTCTL_DATABASE_URL=postgres://certctl:replace-with-openssl-rand-hex-32@postgres:5432/certctl?sslmode=disable
 CERTCTL_SERVER_HOST=0.0.0.0
 CERTCTL_SERVER_PORT=8443
 CERTCTL_LOG_LEVEL=info
 CERTCTL_LOG_FORMAT=json

-# Auth type: "api-key" (production) or "none" (demo/development).
-# For JWT/OIDC, run an authenticating gateway in front of certctl
-# (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and
-# set CERTCTL_AUTH_TYPE=none on the upstream — see
-# docs/architecture.md "Authenticating-gateway pattern". G-1 removed
-# the in-process "jwt" option (no JWT middleware shipped — silent auth
-# downgrade); see docs/upgrade-to-v2-jwt-removal.md if you previously
-# set CERTCTL_AUTH_TYPE=jwt.
-CERTCTL_AUTH_TYPE=none
-# Required when CERTCTL_AUTH_TYPE is "api-key".
-# Generate with: openssl rand -base64 32
-# CERTCTL_AUTH_SECRET=change-me-in-production
+# Auth type: "api-key" (production), "none" (demo/development), or
+# "oidc" (Auth Bundle 2 - native OIDC SSO via coreos/go-oidc/v3, ships
+# in Bundle 2 phases 5+6; setting CERTCTL_AUTH_TYPE=oidc on a build
+# without Bundle 2 wired triggers a clear refuse-to-start error rather
+# than a silent fallback to api-key). For JWT / SAML / LDAP, continue to
+# run an authenticating gateway in front of certctl (oauth2-proxy /
+# Envoy ext_authz / Traefik ForwardAuth / Pomerium) and set
+# CERTCTL_AUTH_TYPE=none on the upstream - see docs/architecture.md
+# "Authenticating-gateway pattern". G-1 removed the in-process "jwt"
+# option (no JWT middleware shipped - silent auth downgrade); see
+# docs/upgrade-to-v2-jwt-removal.md if you previously set
+# CERTCTL_AUTH_TYPE=jwt.
+#
+# Bundle 2 closure (2026-05-12): the docker-compose base file no longer
+# defaults to AUTH_TYPE=none. The base ships production-shaped; the demo
+# overlay (deploy/docker-compose.demo.yml) flips this baseline into the
+# populated-dashboard demo path.
+CERTCTL_AUTH_TYPE=api-key
+# Required when CERTCTL_AUTH_TYPE is "api-key". Generate with:
+#   openssl rand -base64 32
+# The Bundle 2 fail-closed Validate() REFUSES TO START if this value
+# equals the placeholder string "change-me-in-production" outside of
+# demo mode (CERTCTL_DEMO_MODE_ACK=true).
+CERTCTL_AUTH_SECRET=replace-with-openssl-rand-base64-32
+
+# Bundle 2 closure: AES-256-GCM key for encrypting issuer/target config
+# secrets at rest. Required for any deployment that uses the dynamic
+# config GUI to store issuer credentials. Generate with:
+#   openssl rand -base64 32
+# Minimum 32 bytes. The Bundle 2 fail-closed Validate() REFUSES TO
+# START if this value equals the placeholder string
+# "change-me-32-char-encryption-key" outside of demo mode.
+CERTCTL_CONFIG_ENCRYPTION_KEY=replace-with-openssl-rand-base64-32

 # ==============================================================================
 # Certctl Agent
@@ -50,8 +71,14 @@ CERTCTL_AUTH_TYPE=none
 # startup. Use the docker-compose self-signed bootstrap CA bundle from
 # `deploy/test/certs/ca.crt` or supply your own via CERTCTL_SERVER_CA_BUNDLE_PATH.
 CERTCTL_SERVER_URL=https://localhost:8443
-CERTCTL_API_KEY=change-me-in-production
+# Matches one of the server's CERTCTL_AUTH_SECRET rotation values. The
+# placeholder is rejected outside demo mode (Bundle 2 fail-closed guard).
+CERTCTL_API_KEY=replace-with-openssl-rand-base64-32
 CERTCTL_AGENT_NAME=local-agent
+# Returned from `POST /api/v1/agents` during agent enrollment. The agent
+# fail-fasts at startup with "agent-id flag or CERTCTL_AGENT_ID env var
+# is required" if this is unset.
+# CERTCTL_AGENT_ID=agent-from-registration-response

 # ==============================================================================
 # Optional: Scheduler Tuning (defaults are usually fine)
@@ -76,3 +76,154 @@ internal/mcp:
    Bundle K / Coverage-Audit C-002 — MCP per-tool dispatch via
    in-memory transport lifts package from 28.0% to 93.1% (per-
    package run). Floor at 85.
+
+internal/auth:
+  floor: 85
+  why: |
+    Bundle 1 Phase 12 — RBAC primitive coverage gate.
+    internal/auth ships keystore + middleware + RequirePermission +
+    bootstrap + the Phase-3 context keys + the protocol-endpoint
+    allowlist. Negative-test coverage (no actor → 401, no role →
+    403, wrong scope → 403, bootstrap-token-wrong → 401, bootstrap-
+    used-twice → 410, admin-already-exists → 410, zero-length token
+    rejection) is now in place. Prescribed Bundle 1 target was 90;
+    held at 85 to absorb the per-file-average dip from the
+    middleware shim files (testfixtures.go) which CI runs but only
+    test fixtures exercise. Sub-package internal/auth/bootstrap
+    inherits this floor.
+
+internal/service/auth:
+  floor: 85
+  why: |
+    Bundle 1 Phase 12 — RBAC service-layer coverage gate.
+    PermissionService + RoleService + ActorRoleService + Authorizer
+    each have positive + negative tests covering the
+    privilege-escalation guard (auth.role.assign required for
+    Grant/Revoke), the reserved-actor invariant (actor-demo-anon
+    cannot be mutated), the canonical-permission validation, the
+    role-in-use guard on Delete, and every sentinel-error path
+    (ErrUnauthenticated / ErrForbidden / ErrSelfRoleAssignment /
+    ErrAuthReservedActor / ErrAuthUnknownPermission /
+    ErrAuthRoleInUse).
+
+internal/auth/oidc:
+  floor: 90
+  why: |
+    Bundle 2 Phase 3 — OIDC service coverage gate. Phase 3 spec
+    pins the floor at 90 explicitly because every fail-closed
+    branch is load-bearing for the security posture: alg pinning
+    (deny-list HS*/none + allow-list RS*/ES*/EdDSA), audience
+    re-check, azp enforcement on multi-aud tokens, at_hash
+    REQUIRED-when-access-token-present (Phase 3 lifts the OIDC
+    core "MAY" to a service-level "MUST"), iat-window check,
+    nonce constant-time-compare, single-use state replay defense,
+    PKCE-S256 mandatory, IdP downgrade-attack defense at
+    provider-load + RefreshKeys time, JWKS-fail-closed semantics,
+    group-claim resolution + userinfo-fallback fail-closed
+    semantics, token-leak hygiene. A regression in any one of
+    these branches is a security incident; the floor catches it
+    before the commit lands. The mock-IdP fixture in
+    service_test.go is the load-bearing harness.
+
+internal/auth/oidc/groupclaim:
+  floor: 95
+  why: |
+    Bundle 2 Phase 3 — group-claim resolver. Hand-rolled (no
+    JSON-path dep per Decision 10); ~150 LOC, every branch
+    exercised by 19 unit tests covering the documented IdP shapes
+    (Okta string array, Keycloak realm_access.roles, Auth0
+    namespaced URL claim, single-string normalization,
+    deeply-nested 3-segment walks) plus every fail-closed branch
+    (empty path, missing key, missing nested key, non-object
+    intermediate, bool/number/object/nil values, array with
+    non-string element, URL-shape with dots-in-path treated as
+    literal). Resolver should be at 100%; floor at 95 leaves a
+    1-statement margin for future error-message refactors.
+
+internal/auth/oidc/domain:
+  floor: 90
+  why: |
+    Bundle 2 Phase 1 — OIDCProvider + GroupRoleMapping domain.
+    Validation-heavy package; constructors + Validate methods
+    cover all canonical IdP shapes (Okta / Azure AD / Google
+    Workspace / Keycloak / Authentik / Auth0). Floor at 90 to
+    catch any future field that ships without a validator.
+
+internal/auth/session:
+  floor: 90
+  why: |
+    Bundle 2 Phase 4 — session lifecycle service. Phase 4 spec
+    pins the floor at 90 because every fail-closed branch carries
+    a security invariant: HMAC-SHA256 cookie signing with a
+    LENGTH-PREFIXED canonical input (defeats the
+    `<a, bc>`-vs-`<ab, c>` concatenation collision attack on the
+    bare-concat form), v1. version-prefix lock, idle expiry,
+    absolute expiry, revocation, retired-but-in-retention key
+    success path, retired-past-retention failure path, CSRF
+    constant-time compare against the SHA-256-hashed copy on the
+    session row, optional IP/UA-bind defense-in-depth gates,
+    fail-fatal initial-key bootstrap. A regression in any one of
+    these branches is a security incident; the floor catches it
+    before the commit lands. The 15-case negative-test matrix in
+    service_test.go is the load-bearing harness; the in-memory
+    stubs of SessionRepo + SigningKeyRepo + AuditRecorder let the
+    state machine be exercised without the postgres testcontainer
+    overhead (which Phase 2's integration tests already cover).
+
+internal/auth/session/domain:
+  floor: 90
+  why: |
+    Bundle 2 Phase 1 — Session + SessionSigningKey domain. Both
+    types ship Validate() with full invariant coverage: ID prefix
+    enforcement (ses-/sk-), expiry-order CHECK (absolute > idle >
+    created), CSRFTokenHash format pin (64 lowercase hex chars),
+    KeyMaterialEncrypted non-empty, retired-before-created
+    rejection, TenantID defaulting. Cookie naming constants are
+    pinned by TestCookieNamingConstants because the GUI's
+    web/src/api/client.ts will read `certctl_csrf` by string.
+    Floor at 90 to catch any future field that ships without a
+    validator.
+
+internal/auth/breakglass:
+  floor: 90
+  why: |
+    Bundle 2 Phase 7.5 — break-glass admin service (Argon2id +
+    lockout state machine + constant-time-via-verifyDummy). Phase
+    13 Pre-merge audit: floor at 90 with no carve-out. Phase 7.5
+    spec ships the package at 91.5%, validated by 8 mandated
+    negatives + ~12 coverage-lift tests. Every fail-closed branch
+    is load-bearing for the security surface (default-OFF posture
+    only matters if every "disabled" path returns ErrDisabled
+    BEFORE any DB lookup; constant-time defense only matters if
+    every path goes through verifyDummy on the no-credential leg).
+    A regression that drops a fail-closed branch's coverage below
+    90 is a real security risk — gate trips, operator audits.
+
+internal/auth/breakglass/domain:
+  floor: 90
+  why: |
+    Bundle 2 Phase 1 — BreakglassCredential domain. Argon2id PHC
+    format pinned ($argon2id$ prefix), MinPasswordLengthBytes (12)
+    + MaxPasswordLengthBytes (256) constants pinned by dedicated
+    test, IsLocked(now) state machine helper. The package ships
+    at 100% coverage; floor at 90 is the standing-room floor for
+    any future field added without a validator.
+
+internal/auth/user/domain:
+  floor: 90
+  why: |
+    Bundle 2 Phase 1 — User domain (federated-human identity).
+    OIDCSubject + OIDCProviderID unique-index per the Phase 2
+    schema, WebAuthnCredentials JSONB reserved for v3, Validate()
+    enforces every on-disk invariant. The package ships at 96.4%
+    coverage. Floor at 90 to catch any future field added without
+    a validator.
+
+    Phase 13 prompt explicitly enumerates internal/auth/user/ at
+    floor 90. The parent (non-domain) directory has no Go source —
+    the user upsert lives in internal/auth/oidc/service.go alongside
+    group resolution + role mapping (cohesive sequence within the
+    OIDC callback). Splitting upsertUser into a separate
+    internal/auth/user/ service package would harm cohesion without
+    adding test value; the domain layer's invariant coverage is
+    where the floor actually applies.
@@ -14,12 +14,17 @@ jobs:
    name: Go Build & Test
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Set up Go
-        uses: actions/setup-go@v5
+        uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff  # v5
        with:
-          go-version: '1.25.9'
+          go-version: '1.25.10'
+          # Phase 3 TEST-L1 closure (2026-05-13): enable Go's module +
+          # build cache so re-runs hit the cache instead of recompiling
+          # the world. setup-go v5 cache: true by default; making it
+          # explicit so a future setup-go upgrade can't silently flip it.
+          cache: true

      - name: Go Build
        run: |
@@ -79,7 +84,7 @@ jobs:
        # does call, this step fails the build until either upstream
        # ships a fix OR we cut the dep. Deferred-call advisories that
        # legitimately can't be remediated yet should be added to the
-        # NIST SSDF deviation log in docs/security.md, not silenced here.
+        # NIST SSDF deviation log in docs/operator/security.md, not silenced here.
        run: govulncheck ./...

      - name: Install staticcheck (Bundle-7 / D-001)
@@ -103,11 +108,41 @@ jobs:
        run: staticcheck ./...

      - name: Race Detection
-        run: go test -race ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/scheduler/... ./internal/connector/... ./internal/crypto/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -timeout 300s
+        # Phase 3 TEST-H1 closure (2026-05-13): the pre-Phase-3 invocation
+        # listed 9 explicit package roots, excluding internal/auth/*,
+        # internal/repository/*, internal/mcp, internal/scep, internal/pkcs7,
+        # internal/api/router, internal/api/acme, internal/cli, internal/cms,
+        # internal/config, internal/deploy, internal/integration,
+        # internal/ratelimit, internal/secret, internal/trustanchor, plus
+        # all of cmd/. Audit finding TEST-H1 flagged this as silent
+        # race-detection drift — packages added after the original list
+        # was authored were never covered.
+        #
+        # Post-Phase-3: ./... with -short. The 76 testing.Short() guards
+        # already in the integration-test surface (testcontainers, live-DB,
+        # multi-process) gate behind this flag, so race detection runs
+        # across every package without dragging in long-running suites.
+        # Timeout doubled from 300s to 600s because ./... is broader; the
+        # broader scope is what makes race coverage trustworthy.
+        run: go test -race -short ./... -count=1 -timeout 600s

      - name: Go Test with Coverage
+        # internal/ciparity/... — post-v2.1.0 anti-rot item 2 surface-
+        # parity tests; stdlib-only so they always pass in this job.
        run: |
-          go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out
+          go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/api/router/... ./internal/auth/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... ./internal/ciparity/... -count=1 -cover -coverprofile=coverage.out
+
+      - name: Multi-replica rate-limit integration test (Phase 13 Sprint 13.2/13.3 — ARCH-M1 closure proof)
+        # The falsifiable proof that CERTCTL_RATE_LIMIT_BACKEND=postgres
+        # enforces caps cluster-wide. testcontainers-go spins one
+        # Postgres container; 3 *PostgresSlidingWindowLimiter instances
+        # share it; 100 concurrent Allow("test-key") with cap=10 must
+        # see exactly 10 succeed + 90 ErrRateLimited. Failure here =
+        # the row-lock arbitration broke; ARCH-M1 closure is invalid.
+        run: |
+          go test -tags=integration -race -count=1 -timeout=300s \
+              -run TestRateLimit_PostgresBackend_CapEnforcedAcrossReplicas \
+              ./internal/integration/...

      - name: Check Coverage Thresholds
        # ci-pipeline-cleanup Phase 2: per-package floors moved to
@@ -118,7 +153,7 @@ jobs:
        run: bash scripts/check-coverage-thresholds.sh

      - name: Upload Coverage Report
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4
        with:
          name: go-coverage
          path: coverage.out
@@ -135,62 +170,6 @@ jobs:
          GITHUB_REPOSITORY: ${{ github.repository }}
        run: bash scripts/coverage-pr-comment.sh

-      # Bundle P / Strengthening #6 — QA-doc drift guards. Forces every PR
-      # that adds a Part to docs/testing-guide.md OR a seed row to
-      # migrations/seed_demo.sql to keep docs/qa-test-guide.md in sync. This
-      # eliminates the doc-drift class structurally — the symptom Bundle I
-      # had to clean up by hand becomes a CI-time error going forward.
-      - name: QA-doc Part-count drift guard
-        run: |
-          set -e
-          DOC_PARTS=$(grep -oE '49 of [0-9]+ Parts' docs/qa-test-guide.md | grep -oE '[0-9]+' | tail -1)
-          GUIDE_PARTS=$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md)
-          if [ -z "$DOC_PARTS" ]; then
-            echo "::error::Could not extract Part count from docs/qa-test-guide.md headline."
-            echo "  Expected pattern: '49 of <N> Parts'"
-            exit 1
-          fi
-          if [ "$DOC_PARTS" != "$GUIDE_PARTS" ]; then
-            echo "::error::DRIFT — qa-test-guide.md headline claims $DOC_PARTS Parts; testing-guide.md has $GUIDE_PARTS Parts."
-            echo "  Update docs/qa-test-guide.md to match. Bundle I patched this once;"
-            echo "  Bundle P added this guard so the drift cannot recur silently."
-            exit 1
-          fi
-          echo "QA-doc Part-count drift guard: clean ($DOC_PARTS == $GUIDE_PARTS)."
-
-      - name: QA-doc seed-count drift guard
-        run: |
-          set -e
-          # Seed-cert count: agnostic to documented header format. The current
-          # documented count lives in `### Certificates (32 total in ...` —
-          # extract the first integer in that header.
-          DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
-          # Authoritative count: unique mc-* IDs in seed_demo.sql.
-          SEED_CERTS=$(grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
-          if [ -z "$DOC_CERTS" ]; then
-            echo "::warning::Could not extract documented cert count from docs/qa-test-guide.md."
-            echo "  Skipping cert-count drift check (header format may have changed)."
-          elif [ "$DOC_CERTS" != "$SEED_CERTS" ]; then
-            echo "::error::DRIFT — qa-test-guide.md says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
-            echo "  Update docs/qa-test-guide.md::Seed Data Reference to match."
-            exit 1
-          fi
-          # Issuers: seed-table count vs doc claim.
-          DOC_ISS=$(grep -oE '### Issuers \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
-          # Authoritative: unique iss-* IDs (close enough proxy; the issuers
-          # table count IS the unique-ID count for this prefix).
-          SEED_ISS=$(grep -oE 'iss-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
-          if [ -z "$DOC_ISS" ]; then
-            echo "::warning::Could not extract documented issuer count."
-          elif [ "$DOC_ISS" != "$SEED_ISS" ] && [ "$((SEED_ISS - DOC_ISS))" -gt 5 ]; then
-            # Allow up to 5pp slack — iss-* IDs appear in audit_events and
-            # other reference tables that aren't issuer-table rows. Drift
-            # only flags when the spread grows large.
-            echo "::error::DRIFT — qa-test-guide.md says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
-            exit 1
-          fi
-          echo "QA-doc seed-count drift guard: clean."
-
      # Bundle Q / I-001 closure — test-naming convention guard (informational).
      # The convention is `Test<Func>_<Scenario>_<ExpectedResult>`. This step
      # prints any non-conformant tests but does NOT fail the build until the
@@ -207,9 +186,17 @@ jobs:
      # internal scenarios expressed via `t.Run` subtests. Requiring the
      # underscore-Scenario-Result triple repo-wide would mean renaming
      # 167 legitimate tests for no observable behavior change. The
-      # Test<Func>_<Scenario>_<ExpectedResult> form remains documented as
-      # the recommended pattern for parameterized scenarios in
-      # docs/qa-test-guide.md, but is not gated.
+      # Test<Func>_<Scenario>_<ExpectedResult> form remains the
+      # recommended pattern for parameterized scenarios, but is not gated.
+      # Phase 4 DEPL-* prerequisite (2026-05-14): helm-templates-lint.sh
+      # needs the `helm` CLI on PATH to run helm lint + helm template
+      # against the chart. The official azure/setup-helm action installs
+      # a SHA-pinned helm binary into the runner.
+      - name: Install Helm (for helm-templates-lint guard)
+        uses: azure/setup-helm@b9e51907a09c216f16ebe8536097933489208112  # v4.3.0
+        with:
+          version: v3.16.0
+
      - name: Regression guards (extracted to scripts/ci-guards/)
        # All named regression guards live at scripts/ci-guards/<id>.sh per
        # ci-pipeline-cleanup bundle Phase 1. Each guard is callable locally:
@@ -217,6 +204,7 @@ jobs:
        # Adding a new guard: drop a new <id>.sh; this loop auto-picks it up.
        # Contract: each guard MUST exit 0 on clean repo, non-zero with
        # ::error:: prefix on regression. See scripts/ci-guards/README.md.
+        #
        run: |
          set -e
          fail=0
@@ -229,14 +217,216 @@ jobs:
          done
          exit $fail

+  cross-platform-build:
+    # Phase 3 TEST-H2 closure (2026-05-13): the pre-Phase-3 CI ran
+    # exclusively on ubuntu-latest, leaving Windows-specific bugs
+    # (path separators, file permissions, exec.Command semantics)
+    # undetected. The agent + CLI binaries ship for Windows + macOS
+    # users; this matrix asserts they at least BUILD on every OS we
+    # claim to support.
+    #
+    # Build-only — no test run. Full test parity across OSes is a
+    # larger investment (testcontainers is Linux-only on Windows CI
+    # runners, file-permission tests differ, etc.). The build gate
+    # is the minimum that catches the cross-platform regressions
+    # we've seen in practice.
+    name: Cross-platform build (ubuntu / windows / macos)
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, windows-latest, macos-latest]
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+
+      - name: Set up Go
+        uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff  # v5
+        with:
+          go-version: '1.25.10'
+          cache: true
+
+      - name: Build server + agent + CLI + mcp-server
+        run: |
+          go build ./cmd/server
+          go build ./cmd/agent
+          go build ./cmd/cli
+          go build ./cmd/mcp-server
+
+  cold-db-compose-smoke:
+    # Per post-v2.1.0 anti-rot item 6 (Auditable Codebase Bundle).
+    #
+    # Catches migration-on-cold-DB regressions: wipe the postgres
+    # volume, bring the stack up cold, mint a day-0 admin, issue +
+    # renew + revoke a test certificate, assert audit rows, tear down.
+    # Targets the bug class that the warm-DB integration suite misses
+    # (canonical case: 2026-05-09 migration 000045 broken INSERT,
+    # fixed in commit 6444e13).
+    name: Cold-DB compose smoke
+    runs-on: ubuntu-latest
+    needs: go-build-and-test
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+
+      - name: Show Docker versions
+        run: |
+          docker --version
+          docker compose version
+
+      - name: Cold-DB compose smoke
+        # The smoke deliberately focuses on the bug class that ONLY a
+        # cold boot can catch: stack-startup correctness against a
+        # blank database. It is intentionally NOT a functional API
+        # walkthrough — the integration test suite under
+        # 'Go Test with Coverage' already covers issue / renew /
+        # revoke / audit-row plumbing against a warm DB.
+        #
+        # The bugs this gate is uniquely positioned to catch:
+        #   - Missing required env vars that fail Config.Validate()
+        #     at startup (e.g. CERTCTL_DEMO_MODE_ACK gap, 2026-05-12).
+        #   - Non-idempotent migrations that crash on the second boot
+        #     (e.g. migration 000043 CHECK constraint, 2026-05-12).
+        #   - Documented manual flows that don't work end-to-end on
+        #     a clean compose (e.g. CERTCTL_BOOTSTRAP_TOKEN
+        #     interpolation gap, 2026-05-12).
+        #
+        # Bugs OUTSIDE the scope of this smoke (covered elsewhere):
+        #   - API request/response contract changes (integration suite).
+        #   - Cert lifecycle correctness (integration suite + handler
+        #     tests).
+        #   - Audit row plumbing (handler tests).
+        #
+        # 10-min wall-clock cap covers cold image pull + compose-up +
+        # force-recreate + admin bootstrap + teardown. Increase only
+        # if the underlying steps legitimately grow.
+        #
+        # The smoke is inlined here on purpose — it is NOT a script in
+        # scripts/ci-guards/, because there is no value in a developer
+        # running this locally. The whole point of the gate is that CI
+        # owns the cold-DB state; the operator never has to remember to
+        # run it.
+        timeout-minutes: 10
+        working-directory: deploy
+        env:
+          STARTUP_TIMEOUT_SECONDS: 300
+        run: |
+          set -e
+          set -o pipefail
+
+          SERVER_URL="https://localhost:8443"
+          CACERT_PATH="${GITHUB_WORKSPACE}/deploy/test/certs/ca.crt"
+
+          log() { echo "[cold-db-smoke] $*"; }
+
+          wait_for_service_healthy() {
+            local svc="$1" deadline=$(( $(date +%s) + STARTUP_TIMEOUT_SECONDS ))
+            while [ "$(date +%s)" -lt "$deadline" ]; do
+              local state
+              state="$(docker compose ps --format json "$svc" 2>/dev/null | python3 -c '
+          import json, sys
+          try:
+              line = sys.stdin.read().strip()
+              if not line:
+                  print("not-up"); sys.exit(0)
+              rows = json.loads(line) if line.startswith("[") else [json.loads(l) for l in line.splitlines() if l.strip()]
+              if not rows:
+                  print("not-up")
+              else:
+                  print(rows[0].get("Health", rows[0].get("State", "?")))
+          except Exception as e:
+              print(f"err: {e}")
+          ')"
+              if [ "$state" = "healthy" ] || [ "$state" = "running" ]; then
+                log "  $svc → $state"; return 0
+              fi
+              sleep 2
+            done
+            log "  $svc did NOT reach healthy within ${STARTUP_TIMEOUT_SECONDS}s (last: $state)"
+            return 1
+          }
+
+          http_call() {
+            local method="$1" path="$2" data="${3:-}"
+            local args=(--silent --show-error --max-time 30 -X "$method" "$SERVER_URL$path")
+            [ -f "$CACERT_PATH" ] && args+=(--cacert "$CACERT_PATH") || args+=(--insecure)
+            [ -n "$data" ] && args+=(-H "Content-Type: application/json" -d "$data")
+            curl "${args[@]}"
+          }
+
+          # Bundle 2 closure (2026-05-12): the base compose is now
+          # production-shaped — auth=api-key + agent-keygen + fail-closed
+          # placeholder guards. The cold-DB smoke layers in the demo
+          # overlay so the boot path remains zero-config: the overlay
+          # supplies AUTH_TYPE=none + DEMO_MODE_ACK=true + the matching
+          # placeholder creds the fail-closed guards accept under
+          # DEMO_MODE_ACK. The agent service in the overlay also
+          # pre-seeds CERTCTL_AGENT_ID=agent-demo-1 so the bundled
+          # agent doesn't restart-loop. The smoke's purpose (catch
+          # migration-on-cold-DB regressions + verify bootstrap-token
+          # endpoint mints a day-0 admin against a freshly migrated
+          # schema) is orthogonal to whether the auth posture is
+          # demo-mode or api-key, so the overlay is acceptable here.
+          COMPOSE_FILES=(-f docker-compose.yml -f docker-compose.demo.yml)
+
+          # Phase 2 SEC-H3 (2026-05-13): the demo overlay sets
+          # CERTCTL_DEMO_MODE_ACK=true; the SEC-H3 fail-closed guard
+          # requires a paired CERTCTL_DEMO_MODE_ACK_TS within the last
+          # 24h (a static YAML value would rot). The overlay reads
+          # ${CERTCTL_DEMO_MODE_ACK_TS:-} from the shell, so we mint a
+          # fresh timestamp here and export it for every compose
+          # invocation in this job (initial up-d AND the force-recreate
+          # at step 4).
+          export CERTCTL_DEMO_MODE_ACK_TS="$(date +%s)"
+
+          log "1/4 down -v --remove-orphans"
+          docker compose "${COMPOSE_FILES[@]}" down -v --remove-orphans 2>&1 | tail -3 || true
+
+          log "2/4 up -d (cold boot)"
+          docker compose "${COMPOSE_FILES[@]}" up -d 2>&1 | tail -3
+
+          log "3/4 wait for healthchecks"
+          wait_for_service_healthy postgres
+          wait_for_service_healthy certctl-server
+          wait_for_service_healthy certctl-agent || log "  (agent skipped)"
+
+          log "4/4 minting day-0 admin (proves migration ladder + bootstrap path)"
+          TOKEN="$(openssl rand -base64 32 | tr -d '\n')"
+          {
+            echo "CERTCTL_BOOTSTRAP_TOKEN=$TOKEN"
+            # Re-emit the demo-mode ACK TS into the --env-file so the
+            # force-recreate at step 4 still sees it. `--env-file` replaces
+            # the default .env file as an interpolation source; exported
+            # shell variables should still take precedence, so this line is
+            # a defensive duplicate that keeps the SEC-H3 guard satisfied.
+            echo "CERTCTL_DEMO_MODE_ACK_TS=$CERTCTL_DEMO_MODE_ACK_TS"
+          } > /tmp/_smoke.env
+          docker compose "${COMPOSE_FILES[@]}" --env-file /tmp/_smoke.env up -d --force-recreate certctl-server 2>&1 | tail -2
+          sleep 5
+          wait_for_service_healthy certctl-server
+          BODY="$(http_call POST /api/v1/auth/bootstrap "{\"token\":\"$TOKEN\",\"actor_name\":\"smoke-admin\"}")"
+          KEY="$(echo "$BODY" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("key_value") or "")' || true)"
+          [ -n "$KEY" ] || { log "bootstrap failed: $BODY"; exit 1; }
+
+          log "PASS — cold boot + force-recreate + admin bootstrap all green"
+          log "tearing down"
+          docker compose "${COMPOSE_FILES[@]}" down -v 2>&1 | tail -2
+
+      - name: Dump compose logs on failure
+        if: failure()
+        working-directory: deploy
+        run: |
+          for svc in postgres certctl-server certctl-agent certctl-tls-init; do
+            echo "==== $svc ===="
+            docker compose -f docker-compose.yml -f docker-compose.demo.yml logs --no-color --tail 200 "$svc" || true
+          done
+
  frontend-build:
    name: Frontend Build
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Set up Node.js
-        uses: actions/setup-node@v4
+        uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
        with:
          node-version: '22'

@@ -244,6 +434,17 @@ jobs:
        working-directory: web
        run: npm ci

+      - name: npm audit (production deps, high+critical)
+        # Phase 1 TEST-L2 closure (2026-05-13):
+        # Production frontend dependencies must not carry high or
+        # critical CVEs. Dev-only deps (vitest, vite, eslint, etc.)
+        # are excluded via --omit=dev since they never ship to
+        # operators. If this gate fires, triage each finding via npm
+        # overrides, dep upgrade, or a tracked exception with an issue
+        # link. Do not mass-silence findings.
+        working-directory: web
+        run: npm audit --omit=dev --audit-level=high
+
      - name: TypeScript Check
        working-directory: web
        run: npx tsc --noEmit
@@ -279,26 +480,36 @@ jobs:
    name: Helm Chart Validation
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Install Helm
-        uses: azure/setup-helm@v4
+        uses: azure/setup-helm@1a275c3b69536ee54be43f2070a358922e12c8d4  # v4
        with:
          version: '3.13.0'

      # HTTPS-Everywhere (v2.0.47): the chart fails render when no TLS source is
      # configured. Every lint/template invocation below must pick exactly one
      # provisioning mode — see deploy/helm/certctl/templates/_helpers.tpl
-      # (certctl.tls.required) and docs/tls.md.
+      # (certctl.tls.required) and docs/operator/tls.md.
+      #
+      # Bundle 3 closure (2026-05-12, commit f1fa311): the chart now ALSO
+      # fails render when (a) server.auth.type=api-key + apiKey empty, or
+      # (b) postgresql.enabled=true + postgresql.auth.password empty.
+      # Every positive render below MUST pass the secrets its mode needs; inverse tests
+      # at the bottom of this job pin the fail-fast guards in place.
      - name: Lint Helm Chart
        run: |
          helm lint deploy/helm/certctl/ \
-            --set server.tls.existingSecret=certctl-tls-ci
+            --set server.tls.existingSecret=certctl-tls-ci \
+            --set server.auth.apiKey=ci-api-key-placeholder \
+            --set postgresql.auth.password=ci-postgres-placeholder

      - name: Template Helm Chart (existingSecret mode)
        run: |
          helm template certctl deploy/helm/certctl/ \
            --set server.tls.existingSecret=certctl-tls-ci \
+            --set server.auth.apiKey=ci-api-key-placeholder \
+            --set postgresql.auth.password=ci-postgres-placeholder \
            > /dev/null

      - name: Template Helm Chart (cert-manager mode)
@@ -306,8 +517,30 @@ jobs:
          helm template certctl deploy/helm/certctl/ \
            --set server.tls.certManager.enabled=true \
            --set server.tls.certManager.issuerRef.name=letsencrypt-prod \
+            --set server.auth.apiKey=ci-api-key-placeholder \
+            --set postgresql.auth.password=ci-postgres-placeholder \
            > /dev/null

+      - name: Template Helm Chart (external Postgres mode — Bundle 3 D2)
+        run: |
+          # Closes Bundle 3 D2: postgresql.enabled=false must (a) render
+          # cleanly with externalDatabase.url and (b) emit ZERO postgres-*
+          # templates. The render output is grep-checked below.
+          out=$(helm template certctl deploy/helm/certctl/ \
+            --set server.tls.existingSecret=certctl-tls-ci \
+            --set postgresql.enabled=false \
+            --set externalDatabase.url='postgres://u:p@db.example.com:5432/certctl?sslmode=require' \
+            --set server.auth.apiKey=ci-api-key-placeholder)
+          # Bundled-Postgres resources must not appear when postgresql.enabled=false.
+          if echo "$out" | grep -qE "^kind: StatefulSet$"; then
+            echo "::error::Bundle 3 D2 regression: postgres StatefulSet rendered with postgresql.enabled=false"
+            exit 1
+          fi
+          if echo "$out" | grep -q "postgres-secret.yaml"; then
+            echo "::error::Bundle 3 D2 regression: postgres-secret rendered with postgresql.enabled=false"
+            exit 1
+          fi
+
      - name: Template Helm Chart (guard fails without TLS)
        run: |
          # Inverse test: the chart MUST refuse to render when no TLS source is
@@ -318,6 +551,58 @@ jobs:
            exit 1
          fi

+      - name: Template Helm Chart (guard fails — Bundle 3 D7 TLS both-set)
+        run: |
+          # Bundle 3 D7: setting BOTH existingSecret AND certManager.enabled
+          # creates two conflicting TLS sources of truth. Chart must refuse.
+          if helm template certctl deploy/helm/certctl/ \
+                --set server.tls.existingSecret=ci \
+                --set server.tls.certManager.enabled=true \
+                --set server.tls.certManager.issuerRef.name=foo \
+                --set server.auth.apiKey=k \
+                --set postgresql.auth.password=p \
+                > /dev/null 2>&1; then
+            echo "::error::Bundle 3 D7 regression: chart rendered with BOTH TLS sources configured"
+            exit 1
+          fi
+
+      - name: Template Helm Chart (guard fails — Bundle 3 D1 missing apiKey)
+        run: |
+          # Bundle 3 D1: missing server.auth.apiKey when auth.type=api-key
+          # must fail at template time, not silently render an empty Secret.
+          if helm template certctl deploy/helm/certctl/ \
+                --set server.tls.existingSecret=ci \
+                --set postgresql.auth.password=p \
+                > /dev/null 2>&1; then
+            echo "::error::Bundle 3 D1 regression: chart rendered with empty server.auth.apiKey"
+            exit 1
+          fi
+
+      - name: Template Helm Chart (guard fails — Bundle 3 D1 missing pg password)
+        run: |
+          # Bundle 3 D1: missing postgresql.auth.password when postgresql.enabled=true
+          # must fail at template time, not silently use a fallback default.
+          if helm template certctl deploy/helm/certctl/ \
+                --set server.tls.existingSecret=ci \
+                --set server.auth.apiKey=k \
+                > /dev/null 2>&1; then
+            echo "::error::Bundle 3 D1 regression: chart rendered with empty postgresql.auth.password"
+            exit 1
+          fi
+
+      - name: Template Helm Chart (guard fails — Bundle 3 D1 missing external DB URL)
+        run: |
+          # Bundle 3 D1: missing externalDatabase.url when postgresql.enabled=false
+          # must fail at template time.
+          if helm template certctl deploy/helm/certctl/ \
+                --set server.tls.existingSecret=ci \
+                --set postgresql.enabled=false \
+                --set server.auth.apiKey=k \
+                > /dev/null 2>&1; then
+            echo "::error::Bundle 3 D1 regression: chart rendered with postgresql.enabled=false + empty externalDatabase.url"
+            exit 1
+          fi
+
  # =============================================================================
  # deploy-vendor-e2e — single-job (collapsed from 12-job matrix)
  # =============================================================================
@@ -336,7 +621,7 @@ jobs:
  # RAM headroom on ubuntu-latest (16 GB ceiling) — operator-confirmed
  # in Phase 0 / frozen decision 0.14 prototype-branch run. If RAM
  # regresses, fall back to bucketed matrix per
-  # cowork/ci-pipeline-cleanup/decisions-revised.md.
+  # the project's frozen-decisions log.
  #
  # The Windows matrix (deploy-vendor-e2e-windows) was deleted entirely
  # per Phase 6 / frozen decision 0.5 (revises Bundle II decision 0.4).
@@ -348,12 +633,12 @@ jobs:
    needs: [go-build-and-test]
    timeout-minutes: 30
    steps:
-      - uses: actions/checkout@v5
+      - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd  # v5

      - name: Set up Go
-        uses: actions/setup-go@v5
+        uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff  # v5
        with:
-          go-version: '1.25.9'
+          go-version: '1.25.10'
          cache: true

      - name: Build f5-mock-icontrol sidecar
@@ -445,12 +730,12 @@ jobs:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
-      - uses: actions/checkout@v5
+      - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd  # v5

      - name: Set up Go
-        uses: actions/setup-go@v5
+        uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff  # v5
        with:
-          go-version: '1.25.9'
+          go-version: '1.25.10'
          cache: true

      - name: Digest validity (every @sha256 ref must resolve)
@@ -53,17 +53,17 @@ jobs:

    steps:
      - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Set up Go
        if: matrix.language == 'go'
-        uses: actions/setup-go@v5
+        uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff  # v5
        with:
          # Match ci.yml + release.yml + security-deep-scan.yml.
-          go-version: '1.25.9'
+          go-version: '1.25.10'

      - name: Initialize CodeQL
-        uses: github/codeql-action/init@v3
+        uses: github/codeql-action/init@7fd177fa680c9881b53cdab4d346d32574c9f7f4  # v3
        with:
          languages: ${{ matrix.language }}
          # Use the security-and-quality query suite — security finds plus
@@ -72,10 +72,10 @@ jobs:
          queries: security-and-quality

      - name: Autobuild
-        uses: github/codeql-action/autobuild@v3
+        uses: github/codeql-action/autobuild@7fd177fa680c9881b53cdab4d346d32574c9f7f4  # v3

      - name: Perform CodeQL Analysis
-        uses: github/codeql-action/analyze@v3
+        uses: github/codeql-action/analyze@7fd177fa680c9881b53cdab4d346d32574c9f7f4  # v3
        with:
          category: "/language:${{ matrix.language }}"
          # SARIF upload is implicit (and is what populates the Security tab).
@@ -0,0 +1,108 @@
+# Phase 8 closure (TEST-H1 + TEST-H2): browser-driven E2E + visual
+# regression. Informational-only until the suite is stable for 1-2
+# weeks of green runs; the Phase 8 audit prompt explicitly forbids
+# promoting the e2e CI job to required-for-merge in this phase.
+#
+# The job is intentionally NOT in the merge gate. It runs on every
+# push to surface flakiness early; merge eligibility comes from
+# ci.yml's existing gates (Vitest, lint, build, the 34 CI guards).
+#
+# Once 1-2 weeks of green runs accumulate:
+#   1. Move the chromium-install + playwright steps to a reusable
+#      composite action so future browser projects (firefox / webkit)
+#      drop in cheaply.
+#   2. Add the job's check name to the branch-protection required-checks
+#      list in the GitHub repo settings.
+#   3. Delete the "Informational" banner from this file's header.
+#
+# Visual regression: the 04-visual-regression.spec.ts file uses
+# Playwright `toHaveScreenshot()`. First-run on a new branch
+# regenerates baselines via the `--update-snapshots` flag; the
+# operator commits the resulting PNG bytes to git. Subsequent runs
+# pixel-diff. The dispatch input below provides an explicit knob
+# for that initial baseline pass without needing to edit the
+# workflow file.
+
+name: Frontend E2E (informational)
+
+on:
+  push:
+    branches: [master]
+    paths:
+      - 'web/**'
+      - '.github/workflows/e2e.yml'
+  pull_request:
+    paths:
+      - 'web/**'
+      - '.github/workflows/e2e.yml'
+  workflow_dispatch:
+    inputs:
+      update_snapshots:
+        description: 'Regenerate visual-regression baselines (use sparingly)'
+        type: boolean
+        default: false
+
+permissions:
+  contents: read
+
+jobs:
+  e2e:
+    name: Playwright E2E + visual regression (informational)
+    runs-on: ubuntu-latest
+    # Currently informational — do not block merges on this job.
+    # Update protected-branch rules in repo settings once stable.
+    continue-on-error: true
+    timeout-minutes: 15
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
+        with:
+          node-version: '22'
+
+      - name: Install Dependencies
+        working-directory: web
+        run: npm ci
+
+      - name: Install Playwright browsers
+        working-directory: web
+        # --with-deps installs OS packages (libnss3, libatk1.0-0, etc.)
+        # the chromium browser needs. Skipping this is the #1 source
+        # of "tests pass locally but fail on CI" for new Playwright
+        # users. The browser binary downloads to ~/.cache/ms-playwright;
+        # the actions/setup-node cache key does NOT include it, so each
+        # CI run re-downloads. Add an actions/cache step targeting
+        # ~/.cache/ms-playwright keyed by the @playwright/test version
+        # in package-lock.json once the suite is stable.
+        run: npx playwright install --with-deps chromium
+
+      - name: Run Playwright E2E + visual regression
+        working-directory: web
+        # The webServer block in playwright.config.ts boots `npm run dev`
+        # automatically and waits for http://localhost:5173 to be
+        # responsive before the first test fires. No separate "start
+        # server" step needed.
+        run: |
+          if [[ "${{ github.event.inputs.update_snapshots }}" == "true" ]]; then
+            echo "::warning::Regenerating visual-regression baselines"
+            npx playwright test --update-snapshots
+          else
+            npx playwright test
+          fi
+
+      - name: Upload Playwright report on failure
+        if: failure()
+        uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882  # v4
+        with:
+          name: playwright-report
+          path: web/playwright-report/
+          retention-days: 7
+
+      - name: Upload visual-regression diffs on failure
+        if: failure()
+        uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882  # v4
+        with:
+          name: visual-regression-diffs
+          path: web/test-results/
+          retention-days: 7
@@ -1,6 +1,6 @@
 # Load-test workflow — closes the #8 acquisition-readiness blocker from
 # the 2026-05-01 issuer coverage audit (see
-# cowork/issuer-coverage-audit-2026-05-01/RESULTS.md).
+# that audit's results write-up).
 #
 # CADENCE: workflow_dispatch + weekly cron, NOT per-push. Load tests
 # are minutes long and don't provide useful per-PR signal — per-push
@@ -49,13 +49,13 @@ jobs:

    steps:
      - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Set up Docker Buildx
        # The compose stack builds the certctl image from the repo
        # root Dockerfile. Buildx gives the build a usable cache and
        # works with newer compose versions.
-        uses: docker/setup-buildx-action@v3
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3

      - name: Run loadtest
        run: make loadtest
@@ -70,8 +70,70 @@ jobs:
        # authoritative machine-readable form; summary.txt is the
        # human-readable text the README baseline tracks.
        if: always()
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4
        with:
          name: k6-summary-${{ github.run_id }}
          path: deploy/test/loadtest/results/
          retention-days: 90
+
+  # ---------------------------------------------------------------------------
+  # Phase 8 SCALE-H2 — scale-tier scenarios. Three new k6 drivers:
+  #   - bulk-renewal: 10K-cert seed + criteria-mode POST /bulk-renew
+  #   - acme-burst:   200 concurrent VUs against directory/nonce/ARI
+  #   - agent-storm:  5K-agent seed + 167 heartbeats/sec sustained
+  #
+  # Matrix dispatch so each scenario runs on its own runner and a
+  # regression in one doesn't mask another. The matrix runs in parallel,
+  # which keeps total wall time around the existing 25-minute cap rather
+  # than ~70 minutes serialised. Each scenario brings up the full
+  # loadtest compose stack independently — there's no shared state
+  # between scenarios that would benefit from a single-runner serial
+  # invocation.
+  #
+  # Cadence: same as the API + connector tier job above (workflow_dispatch
+  # + Mondays 06:00 UTC). The scale scenarios DO produce useful per-PR
+  # signal in theory, but the per-run cost (image build + 5min run × 3)
+  # is too high to gate on every PR; weekly is the right trade-off.
+  # ---------------------------------------------------------------------------
+  k6-scale:
+    name: k6 scale tier (${{ matrix.scenario }})
+    runs-on: ubuntu-latest
+    timeout-minutes: 25
+    needs: k6
+    strategy:
+      # Parallel: a failure in one scenario shouldn't cancel the others.
+      # Each scenario's threshold breach is independent diagnostic data.
+      fail-fast: false
+      matrix:
+        scenario:
+          - bulk-renewal
+          - acme-burst
+          - agent-storm
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3
+
+      - name: Run scale loadtest (${{ matrix.scenario }})
+        env:
+          BUILDKIT_PROGRESS: plain
+        run: |
+          case "${{ matrix.scenario }}" in
+            bulk-renewal) make loadtest-scale-bulk ;;
+            acme-burst)   make loadtest-scale-acme ;;
+            agent-storm)  make loadtest-scale-agent ;;
+            *) echo "::error::unknown scenario ${{ matrix.scenario }}"; exit 1 ;;
+          esac
+
+      - name: Upload summary
+        if: always()
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4
+        with:
+          # Per-scenario artifact name so the three matrix runs don't
+          # collide on upload.
+          name: k6-scale-${{ matrix.scenario }}-${{ github.run_id }}
+          path: deploy/test/loadtest/results/
+          retention-days: 90
@@ -1,5 +1,12 @@
 name: Release

+# Override the auto-generated run name (which would otherwise default to
+# the most recent commit subject + a #NN run number) so the Actions tab
+# shows "Release v2.0.69" instead of "chore: rename Go module path... #73".
+# `github.ref_name` resolves to the tag name (e.g., `v2.0.69`) for tag-triggered
+# workflows, which is the only trigger we set below.
+run-name: Release ${{ github.ref_name }}
+
 on:
  push:
    tags:
@@ -8,7 +15,7 @@ on:
 env:
  REGISTRY: ghcr.io
  # Keep in lock-step with .github/workflows/ci.yml (M-3).
-  GO_VERSION: '1.25.9'
+  GO_VERSION: '1.25.10'
  IMAGE_NAMESPACE: certctl-io

 jobs:
@@ -32,10 +39,10 @@ jobs:
        os: [linux, darwin]
        arch: [amd64, arm64]
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Set up Go
-        uses: actions/setup-go@v5
+        uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff  # v5
        with:
          go-version: ${{ env.GO_VERSION }}

@@ -116,7 +123,7 @@ jobs:
          cat "${OUTPUT_NAME}.sha256"

      - name: Upload build artefacts
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4
        with:
          name: binary-${{ steps.build.outputs.output_name }}
          path: |
@@ -144,7 +151,7 @@ jobs:
      hashes: ${{ steps.hashes.outputs.hashes }}
    steps:
      - name: Download binary artefacts
-        uses: actions/download-artifact@v4
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
        with:
          pattern: binary-*
          path: artifacts
@@ -184,7 +191,7 @@ jobs:
            checksums.txt

      - name: Upload artefacts to GitHub Release
-        uses: softprops/action-gh-release@v2
+        uses: softprops/action-gh-release@3bb12739c298aeb8a4eeaf626c5b8d85266b0e65  # v2
        if: startsWith(github.ref, 'refs/tags/')
        with:
          files: |
@@ -205,11 +212,24 @@ jobs:
      actions: read
      id-token: write
      contents: write
-    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0
+    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@f7dd8c54c2067bafc12ca7a55595d5ee9b75204a  # v2.1.0
    with:
      base64-subjects: "${{ needs.aggregate-checksums.outputs.hashes }}"
      upload-assets: true
      provenance-name: multiple.intoto.jsonl
+      # Phase 1 RED-2 compat (2026-05-14): the SLSA reusable workflow's
+      # default path downloads a pre-built generator binary from a
+      # GitHub *release* of slsa-framework/slsa-github-generator —
+      # releases are keyed by tag name (vX.Y.Z), and the workflow
+      # rejects SHA-form refs with "Expected ref of the form
+      # refs/tags/vX.Y.Z". Phase 1 RED-2 SHA-pinned every Actions
+      # uses: line, so the default path errors out. Setting
+      # compile-generator: true instead builds the generator from the
+      # pinned-SHA source inside the workflow run — preserves
+      # supply-chain integrity (SHA pin retained), adds ~1 min build
+      # time. This is the SLSA project's documented escape hatch for
+      # SHA-pinned reusable-workflow consumers.
+      compile-generator: true

  # ----------------------------------------------------------------------
  # build-and-push-docker: push container images to GHCR with native
@@ -228,10 +248,10 @@ jobs:
      id-token: write  # Cosign keyless OIDC identity token

    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Log in to GitHub Container Registry
-        uses: docker/login-action@v3
+        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
@@ -242,14 +262,14 @@ jobs:
        run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"

      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
+        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3

      - name: Install Cosign
        uses: sigstore/cosign-installer@cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003  # v4.1.1

      - name: Build and push server image
        id: server-push
-        uses: docker/build-push-action@v6
+        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
        with:
          context: .
          file: ./Dockerfile
@@ -284,7 +304,7 @@ jobs:

      - name: Build and push agent image
        id: agent-push
-        uses: docker/build-push-action@v6
+        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
        with:
          context: .
          file: ./Dockerfile.agent
@@ -327,7 +347,7 @@ jobs:
      contents: write

    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

      - name: Extract version from tag
        id: version
@@ -344,8 +364,13 @@ jobs:
        # README is the source of truth for those, and inlining them in every
        # release page produces the kind of "every release looks identical"
        # noise that gives operators no signal about what actually changed.
-        uses: softprops/action-gh-release@v2
+        uses: softprops/action-gh-release@3bb12739c298aeb8a4eeaf626c5b8d85266b0e65  # v2
        with:
+          # Pin the release title to the tag name. softprops/action-gh-release@v2
+          # falls back to the most recent commit subject when `name:` is omitted,
+          # which produces ugly titles like "chore: rename Go module path..." on
+          # the Releases page. `github.ref_name` evaluates to the tag (`v2.0.69`).
+          name: ${{ github.ref_name }}
          generate_release_notes: true
          body: |
            > **Install / upgrade:** see the [Quick Start section in the README](https://github.com/certctl-io/certctl/blob/master/README.md#quick-start) for Docker Compose, agent install, Helm, and binary download instructions.
@@ -20,7 +20,7 @@ name: security-deep-scan
 #
 # Each step is best-effort — failures are uploaded as artefacts but do
 # NOT block the workflow. Triage happens via the Bundle-7 receipt
-# directory under cowork/comprehensive-audit-2026-04-25/tool-output/.
+# directory in the project's comprehensive-audit tool-output.

 on:
  schedule:
@@ -36,9 +36,9 @@ jobs:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4

-      - uses: actions/setup-go@v5
+      - uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff  # v5
        with:
          go-version: '1.25'

@@ -48,15 +48,26 @@ jobs:

      # --- Static analysis (slow paths) ---

-      - name: gosec
-        run: |
-          $(go env GOPATH)/bin/gosec -fmt sarif -out gosec.sarif ./... || true
-        continue-on-error: true
+      - name: gosec (G201/G202/G304/G108 subset — Phase 3 TEST-M2 hard gate)
+        # Phase 3 TEST-M2 closure (2026-05-13): gosec promoted from
+        # continue-on-error (advisory) to blocking on the 4 high-signal
+        # rule subset that targets real prod-bug classes:
+        #   G201 = SQL string formatting (SQL injection)
+        #   G202 = SQL string concatenation (SQL injection)
+        #   G304 = file-path traversal via tainted input
+        #   G108 = profiling endpoint exposed
+        # Other gosec rules are not evaluated here: -include limits the
+        # scan (and therefore gosec.sarif) to these 4, which have lower
+        # false-positive rates than the rest of the rule set.
+        run: $(go env GOPATH)/bin/gosec -fmt sarif -out gosec.sarif -include=G201,G202,G304,G108 ./...

-      - name: osv-scanner (multi-ecosystem CVE)
-        run: |
-          $(go env GOPATH)/bin/osv-scanner -r --format json --output osv-scanner.json . || true
-        continue-on-error: true
+      - name: osv-scanner (multi-ecosystem CVE — Phase 3 TEST-M2 hard gate)
+        # Phase 3 TEST-M2 closure (2026-05-13): osv-scanner promoted from
+        # advisory to blocking. Complements govulncheck (already blocking
+        # in ci.yml) by covering non-Go dependencies (npm under web/,
+        # any docker base image deps). Findings fail the build; the
+        # exact CVE list lands in osv-scanner.json as a receipt either way.
+        run: $(go env GOPATH)/bin/osv-scanner -r --format json --output osv-scanner.json .

      # --- Race detector at -count=10 (D-002) ---

@@ -82,7 +93,7 @@ jobs:
      # package is mutated independently; the per-package summary line
      # (`The mutation score is X.YZ`) is grep-extracted into the receipt.
      # Acceptance threshold: ≥80% kill ratio per package; surviving
-      # mutants get triaged in cowork/comprehensive-audit-2026-04-25/
+      # mutants get triaged in the comprehensive-audit notes file
      # d003-mutation-results.md (per-mutant action item or
      # equivalent-mutation justification).

@@ -90,14 +101,39 @@ jobs:
        run: go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest
        continue-on-error: true

-      - name: go-mutesting (crypto cluster)
+      - name: go-mutesting (crypto cluster — Phase 3 TEST-M1 hard gate at 55%)
+        # Phase 3 TEST-M1 closure (2026-05-13): go-mutesting promoted
+        # from advisory (continue-on-error + per-package `|| true`) to
+        # blocking with an explicit mutation-score floor of 55%.
+        # Per-package summary lines emit `The mutation score is X.YZ`;
+        # the awk filter extracts each, and the post-loop check fails
+        # the step if any package drops below 0.55.
+        #
+        # Floor rationale: 55% is the starter ratio that catches major
+        # regressions without rejecting the audit's "this is OK" steady
+        # state. Raise quarterly as the test suite hardens; the floor
+        # change ships in the same commit that adds the strengthening
+        # tests so the ratchet is documented.
        run: |
+          set -e
          : > go-mutesting.txt
          for pkg in ./internal/crypto/... ./internal/pkcs7/... ./internal/connector/issuer/local/...; do
            echo "=== $pkg ===" | tee -a go-mutesting.txt
-            $(go env GOPATH)/bin/go-mutesting "$pkg" 2>&1 | tee -a go-mutesting.txt || true
+            $(go env GOPATH)/bin/go-mutesting "$pkg" 2>&1 | tee -a go-mutesting.txt
          done
-        continue-on-error: true
+          # Extract every "The mutation score is X.YZ" line; fail on any
+          # score below 0.55. The check works against floats via awk so
+          # 0.55 is the literal threshold (not a percentage).
+          floor=0.55
+          fail=0
+          while IFS= read -r score; do
+            ok=$(awk -v s="$score" -v f="$floor" 'BEGIN{print (s>=f) ? 1 : 0}')
+            if [ "$ok" -ne 1 ]; then
+              echo "::error::mutation score $score below floor $floor"
+              fail=1
+            fi
+          done < <(grep -oE "The mutation score is [0-9.]+" go-mutesting.txt | awk '{print $NF}')
+          exit $fail

      # --- Container + supply chain (D-001 partial, D-006 partial) ---
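
The floor check in the go-mutesting step above can also be sketched in Go for readers who find the grep/awk pipeline hard to follow. This is an illustration only and not part of the workflow: `scoresBelowFloor` is a hypothetical helper, and the only assumption about the tool's output is the `The mutation score is X.YZ` summary line quoted in the step's comments. The sketch also returns how many scores it found, so a caller could choose to treat "no scores at all" as a failure instead of a pass.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// scoreLine matches the per-package summary line quoted in the workflow comments.
var scoreLine = regexp.MustCompile(`The mutation score is ([0-9]+(?:\.[0-9]+)?)`)

// scoresBelowFloor (hypothetical helper) returns every score in report that is
// below floor, plus the total number of scores found in the report.
func scoresBelowFloor(report string, floor float64) (below []float64, found int) {
	for _, m := range scoreLine.FindAllStringSubmatch(report, -1) {
		score, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue // defensive: the pattern only admits digits and one dot
		}
		found++
		if score < floor {
			below = append(below, score)
		}
	}
	return below, found
}

func main() {
	report := "=== ./a ===\nThe mutation score is 0.71\n=== ./b ===\nThe mutation score is 0.40\n"
	below, found := scoresBelowFloor(report, 0.55)
	fmt.Println("scores found:", found, "below floor:", below)
}
```

A caller that wants the gate to fail closed would exit non-zero when `found` is 0 or `below` is non-empty.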

@@ -105,11 +141,21 @@ jobs:
        run: docker build -t certctl:deep-scan .
        continue-on-error: true

-      - name: trivy image scan
+      - name: trivy image scan (HIGH+CRITICAL — Phase 3 TEST-M2 hard gate)
+        # Phase 3 TEST-M2 closure (2026-05-13): trivy promoted from
+        # advisory to blocking. --severity filter keeps the gate
+        # noise-free (LOW + MEDIUM findings stay in the JSON receipt
+        # but don't fail the build); --exit-code 1 makes HIGH+CRITICAL
+        # findings the actual gate. Trivy is the third hard deep-scan
+        # gate (alongside gosec + osv-scanner); ZAP / schemathesis /
+        # nuclei / testssl stay advisory because their false-positive
+        # rates on https://localhost:8443-targeted DAST runs are high.
        run: |
          docker run --rm -v "$PWD":/src aquasec/trivy:latest image \
-            --format json --output /src/trivy.json certctl:deep-scan || true
-        continue-on-error: true
+            --format json --output /src/trivy.json \
+            --severity HIGH,CRITICAL \
+            --exit-code 1 \
+            certctl:deep-scan

      - name: syft SBOM
        run: |
@@ -126,7 +172,7 @@ jobs:
        continue-on-error: true

      - name: ZAP baseline
-        uses: zaproxy/action-baseline@v0.10.0
+        uses: zaproxy/action-baseline@1e1871e84428617b969d4a1f981a8255630d54b0  # v0.10.0
        with:
          target: 'https://localhost:8443'
        continue-on-error: true
@@ -175,7 +221,7 @@ jobs:
      # --- Upload everything as artefacts ---

      - name: Upload deep-scan receipts
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4
        if: always()
        with:
          name: security-deep-scan-${{ github.run_id }}
@@ -10,6 +10,7 @@ bin/
 # Frontend
 web/node_modules/
 web/dist/
+web/.storybook-static/

 # Test binary, built with `go test -c`
 *.test
@@ -88,3 +89,17 @@ Thumbs.db
 # CERTCTL_TEST_CA_BUNDLE=./certs/ca.crt. Material is regenerated on every
 # `docker compose up` and never belongs in git.
 /deploy/test/certs/
+
+# Phase 1 RED-1 closure (2026-05-13): the f5-mock-icontrol Dockerfile
+# rebuilds from source via multi-stage build (deploy/test/f5-mock-icontrol/
+# Dockerfile line 13). The compiled ELF must not be tracked.
+deploy/test/f5-mock-icontrol/f5-mock-icontrol
+
+# Phase 0 closure (2026-05-13): cowork/ holds the operator's internal
+# legal / audit / strategy artifacts (counsel-signed AI-authorship
+# declaration, filter-repo callback, pre-rewrite bundle, audit HTML
+# scratch). It is private operator scratch space and must never
+# accidentally land in the public repo. See
+# docs/history-normalization.md for the public-facing description of
+# the Phase 0 git-history rewrite.
+cowork/
@@ -1,8 +1,771 @@
 # Changelog

-## v2.0.68 — Image registry path changed ⚠️
+## Unreleased

-> **Image registry path changed.** Starting this release, container images publish to `ghcr.io/certctl-io/certctl-server` and `ghcr.io/certctl-io/certctl-agent`. Existing pulls from `ghcr.io/shankar0123/certctl-{server,agent}:<tag>` continue to work for previously-published tags (the registry never deletes images), but the `:latest` tag at the old path stops moving forward at this release. Update your `docker pull` paths, `docker-compose.yml` `image:` keys, or Helm `image.repository` values to receive future updates. Old `git clone` / `git push` / install-script / API URLs continue to redirect forever — only the container-registry path changed.
+### Breaking changes (scheduled for v2.2.0)
+
+- **SEC-H1 staged: `CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY` opt-in flag.**
+  Phase 2 of the architecture diligence remediation (2026-05-13) introduces
+  a new env var that, when set to `true`, makes the server refuse to start
+  unless `CERTCTL_AGENT_BOOTSTRAP_TOKEN` is also set to a real value.
+  Default in this release: `false` (preserves the v2.1.x warn-mode
+  pass-through behavior for backward compatibility). Default flip to
+  `true` is scheduled for v2.2.0 per `WORKSPACE-ROADMAP.md`.
+
+  **Operator action before the v2.2.0 upgrade:** generate a real
+  bootstrap token (`openssl rand -base64 32`) and set
+  `CERTCTL_AGENT_BOOTSTRAP_TOKEN` in your env. When v2.2.0 ships, the
+  deny-empty default flips to `true` and a missing or empty token will
+  fail closed at boot. Operators with the token already set: no action
+  required.
+
+- **SEC-M4: `CERTCTL_ACME_INSECURE` now requires explicit ACK.**
+  Pre-Phase-2, `CERTCTL_ACME_INSECURE=true` produced only a boot-time
+  WARN log. Post-Phase-2 (THIS release), the server refuses to start
+  unless `CERTCTL_ACME_INSECURE_ACK=true` is set alongside it. ACME
+  directory TLS verification is the load-bearing defense against a
+  network attacker intercepting ACME enrollment; the existing flag was
+  too easy to flip via a copy-pasted Pebble runbook.
+
+  **Operator action:** if you intentionally run against a self-signed
+  ACME server (Pebble, step-ca, internal dev), add
+  `CERTCTL_ACME_INSECURE_ACK=true` to your env. Production deploys
+  MUST never set either flag.
+
+- **SEC-H3: `CERTCTL_DEMO_MODE_ACK` is no longer sticky — 24h re-ack required.**
+  Pre-Phase-2, setting `CERTCTL_DEMO_MODE_ACK=true` was sticky for the
+  lifetime of the container. Post-Phase-2, operators must ALSO set
+  `CERTCTL_DEMO_MODE_ACK_TS` to a unix epoch within the last 24h (for
+  example `$(date +%s)`). A container restart more than 24h after that
+  timestamp refuses to start unless a fresh TS is supplied. This catches
+  the "forgotten demo deployment promoted to production" failure mode.
+
+  **Operator action:** demo deploys must set `CERTCTL_DEMO_MODE_ACK_TS`
+  at every `docker compose up`. The demo Compose helper script handles
+  this automatically when wired; standalone demo deploys add it
+  manually. Production deploys: this guard is irrelevant
+  (`CERTCTL_DEMO_MODE_ACK` should not be set in production).
+
+### Security
+
+- **Alg-downgrade defense relaxed for Keycloak-shape IdPs (v2.1.0 pre-tag fix).**
+  Pre-fix, the IdP-bind alg-downgrade check at `internal/auth/oidc/service.go`
+  refused to load any OIDC provider whose discovery doc advertised HS256 /
+  HS384 / HS512 / `none` in `id_token_signing_alg_values_supported` —
+  even if RS256 was ALSO advertised. This broke binding against
+  Keycloak 26.x (and a handful of other real IdPs), which list every alg
+  the IdP software supports in their discovery doc, regardless of which
+  one the realm actually signs with. The v2.1.0 Phase-10 live-IdP smoke
+  surfaced the regression: 6 testcontainers-Keycloak integration tests
+  failed with `oidc: IdP advertises weak signing algorithms (HS*/none); refusing to use as defense against downgrade attacks: HS256`.
+  **Fix:** the check now refuses only when the intersection of advertised
+  vs `DefaultAllowedAlgs` is EMPTY — an IdP advertising HS256 alongside
+  RS256 binds successfully, but an IdP advertising HS-only / none-only
+  still fails closed. The per-token alg pin at sig-verify time
+  (`isDisallowedAlg`, service.go ~L1177) remains the load-bearing defense
+  against the actual algorithm-confusion attack (forged HS256 token
+  signed with the IdP's RS256 pubkey as HMAC secret) — go-oidc/v3's
+  verifier rejects any token whose `alg` header isn't in the configured
+  allow-list, regardless of what the discovery doc claims. Updates:
+  `Service.getOrLoad` alg-check loop rewritten to compute intersection;
+  `ErrIdPDowngradeAdvertised` docstring reflects new semantics;
+  `TestDiscovery` dry-run validator surfaces HS*/none alongside RS* as
+  an informational note (not a hard fail); `docs/operator/auth-threat-model.md`
+  alg-allow-list section updated to call out the load-bearing-defense
+  hierarchy. Tests: `TestService_IdPDowngradeDefense_RS256PlusHS256_BindsSuccessfully`
+  (positive — Keycloak-shape) + `TestService_IdPDowngradeDefense_RejectsHSOnlyAdvertised`
+  (negative — pathological intersection-empty case) +
+  `TestService_RefreshKeys_CatchesPostLoadDowngrade` updated to assert
+  intersection-empty post-rotation; `TestTestDiscovery_AlgDowngrade_HS256AlongsideRS256_BindsWithNote`
+  + `TestTestDiscovery_AlgDowngrade_HSOnly_StillTrips_HardFail` pin the
+  dry-run validator's new behavior.
+
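
The relaxed bind-time rule above (refuse only when the intersection of advertised and allowed algorithms is empty) can be sketched as follows. This is an illustrative sketch, not the code in `internal/auth/oidc/service.go`: `allowedAlgs` stands in for `DefaultAllowedAlgs` with contents assumed for the example, and `usableAlgs` / `checkAdvertisedAlgs` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedAlgs stands in for the DefaultAllowedAlgs named in the changelog.
// The concrete contents here are an assumption made for illustration.
var allowedAlgs = map[string]bool{"RS256": true, "ES256": true}

// usableAlgs returns the advertised algorithms that are also allowed.
func usableAlgs(advertised []string) []string {
	var usable []string
	for _, alg := range advertised {
		if allowedAlgs[alg] {
			usable = append(usable, alg)
		}
	}
	return usable
}

// checkAdvertisedAlgs fails closed only when nothing the IdP advertises is
// allowed. Weak entries listed alongside an allowed one do not block the bind.
func checkAdvertisedAlgs(advertised []string) error {
	if len(usableAlgs(advertised)) == 0 {
		return errors.New("IdP advertises no allowed signing algorithm")
	}
	return nil
}

func main() {
	fmt.Println("Keycloak-shape:", checkAdvertisedAlgs([]string{"RS256", "HS256", "none"}))
	fmt.Println("HS-only:", checkAdvertisedAlgs([]string{"HS256", "none"}))
}
```

As the entry notes, this bind-time check is secondary; the per-token `alg` pin at signature-verification time is what actually stops an algorithm-confusion attack.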
+### Tests
+
+- **Vitest coverage for the 2026-05-10/11 GUI batch (Audit 2026-05-11 Fix 12).**
+  The original GUI-batch commit `661b6db` claimed `npx tsc --noEmit PASS`
+  but shipped no Vitest cases for the new surfaces. The regression-
+  prevention layer was missing — a future refactor of `KeysPage`'s
+  assign modal could silently drop scope_type handling, the LOW-1 demo
+  banner could be hidden by a stray predicate flip, the LOW-11 hide of
+  the delete button on default roles could disappear and let operators
+  click straight into a backend 409, and nothing would surface in CI.
+  This closure adds 35 new test cases across five files:
+  `web/src/pages/auth/UsersPage.test.tsx` (new, 8 cases pinning the
+  active/deactivated/reactivate flow + provider filter + empty state +
+  loading state), `web/src/pages/auth/AuthSettingsPage.test.tsx`
+  (extended +4 cases pinning the MED-12 runtime-config panel —
+  alphabetical sort, `(empty)` placeholder, 403 silent-hide),
+  `web/src/pages/auth/KeysPage.test.tsx` (extended +8 cases pinning
+  the HIGH-10 GUI half — scope_type=global/profile/issuer body shape,
+  expires_at omission vs RFC3339 promotion, whitespace-only scope_id
+  rejection, demo-anon row mutation-button hide),
+  `web/src/pages/auth/RoleDetailPage.test.tsx` (new, 9 cases pinning
+  the MED-8 scope picker + the LOW-11 default-role delete-button hide
+  via the `DEFAULT_ROLE_IDS` set against `r-admin` + `r-auditor`),
+  `web/src/components/AuthProvider.test.tsx` (new, 5 cases pinning the
+  LOW-1 demo-banner visibility predicate — `authType==='none' &&
+  !loading` — across happy/api-key/oidc/loading/rejected branches; the
+  rejected-fetch path keeps the banner visible because the catch
+  treats it as an old-server-fallback to demo-mode, and that behavior
+  is pinned here so a future change surfaces in the diff). 40/40
+  test-file-scoped pass; `tsc --noEmit` clean.
+
+### Security
+
+- **CSRF rotation on logout closes HIGH-2 fourth call site (Audit 2026-05-11 Fix 13).**
+  The HIGH-2 closure (`dev/auth-bundle-2`) documented four
+  `RotateCSRFTokenForActor` call sites: login completion (fresh by
+  construction), Assign/RevokeRole on role-mutation (wired), Logout, and
+  an explicit operator endpoint. The 2026-05-11 review verified only 3
+  of the 4 — Logout did NOT rotate the actor's sibling sessions
+  post-revoke, leaving a window where a token captured pre-logout
+  (browser DevTools, malicious extension, session-storage leak) could
+  be replayed against the user's other-device/other-browser sessions
+  until those sessions hit their own idle/absolute expiry.
+  `SessionMinter` interface extended with `RotateCSRFTokenForActor`;
+  `Logout` invokes it after `Revoke(sess.ID)` succeeds. The
+  `auth.session_revoked` audit row gains a `csrf_rotated` detail key
+  carrying the rotated count so SOC / SIEM can correlate logout events
+  with CSRF churn. The no-cookie + invalid-cookie 204 short-circuit
+  paths skip rotation (no session row to rotate against). 3 regression
+  tests in `internal/api/handler/auth_session_oidc_test.go` pin the
+  happy path + the two short-circuit branches. The explicit operator
+  endpoint (4) remains intentionally unbuilt — the three automatic
+  triggers (login + role-mutation + logout) cover the threat model;
+  operators who want a nuclear option can use the existing
+  `RevokeAllForActor` flow which forces re-login → fresh session →
+  fresh CSRF. **HIGH-2 is closed: the three automatic call sites are
+  wired, and the fourth (operator endpoint) is intentionally unbuilt.**
+
+- **Demo-mode residual-grants detector + cleanup endpoint + CI guard (Audit 2026-05-11 A-8).**
+  HIGH-12 (closure `b81588e`) added a fail-closed bind-address guard
+  that refuses startup when `CERTCTL_AUTH_TYPE=none` binds non-loopback
+  without `CERTCTL_DEMO_MODE_ACK=true`. The Phase 2 leg of that spec —
+  production-startup banner when `actor-demo-anon` has residual role
+  grants in `actor_roles` plus a CI guard banning new synthetic-admin
+  code paths — was deferred. This closure lands the deferred work in
+  four parts. (1) `cmd/server/preflight_demo_residual.go` runs after the DB
+  is open + audit service is constructed, before the HTTPS listener
+  starts; under any non-`none` auth type it queries `actor_roles` for
+  `actor-demo-anon` and emits a WARN log + `auth.demo_residual_grants_detected`
+  audit row when the row is present. The migration 000029 baseline
+  unconditionally seeds the `ar-demo-anon-admin` row at install time,
+  so EVERY production deploy will see this WARN on first boot — the
+  intended cutover workflow is documented at `docs/operator/security.md`.
+  (2) `POST /api/v1/auth/demo-residual/cleanup` is an admin-class
+  (`auth.role.assign`) cleanup endpoint that removes every
+  `actor-demo-anon` row from `actor_roles` and returns
+  `{"removed": <int64>}`; idempotent (a second call returns
+  `removed:0`), refuses with 503 under `Auth.Type=none` (deleting the row
+  would break the demo path), audit-logs every invocation. (3) New
+  env var `CERTCTL_DEMO_MODE_RESIDUAL_STRICT` (default `false`)
+  pivots the WARN to fail-closed startup refusal for operators who
+  want a paranoid hostile-environment posture. (4) CI guard
+  `scripts/ci-guards/no-new-synthetic-admin.sh` pins the 17-entry
+  allowlist of source files that may reference the `actor-demo-anon`
+  literal; new runtime code paths that resolve to the synthetic actor
+  are rejected at PR time so the credibility gap stays closed. The
+  closure was framed as "credibility gap, not exploitable
+  vulnerability" — the residue requires a regression elsewhere in the
+  middleware chain to be exploitable. After this fix, the canonical
+  acquisition-readiness narrative ("RBAC primitive with no
+  synthetic-admin fallback") is fully true. Operator runbook at
+  `docs/operator/security.md#demo-to-production-cutover-audit-2026-05-11-a-8`.
+
+- **OIDC provider "Test connection" panel (Audit 2026-05-11 Fix 09 — MED-5 GUI half).**
+  MED-5's backend dry-run endpoint (`POST /api/v1/auth/oidc/test`, gated
+  `auth.oidc.create`) shipped on `dev/auth-bundle-2` but had no GUI caller —
+  the `authOIDCTestProvider` function in `web/src/api/client.ts` was dead
+  code. Operators had to complete the create form blind, save, then click
+  "Refresh" to discover whether the issuer URL worked; failures left a
+  broken provider row in the database that had to be deleted before
+  retrying. New shared component
+  `web/src/pages/auth/OIDCTestConnectionPanel.tsx` calls the backend
+  against the live form state and renders a four-row status panel inline:
+  Discovery fetched, JWKS reachable, supported algs (warns when the IdP
+  advertises none), and RFC 9207 iss-parameter advertisement (informational
+  `·` glyph, not ✗, because the spec is SHOULD). Backend per-leg `errors[]`
+  flow into an inline bullet list. The panel is mounted in the
+  OIDCProvidersPage create modal AND the OIDCProviderDetailPage edit form —
+  the edit-form half is load-bearing for verifying IdP rotations (Keycloak
+  realm rename, Okta tenant move) without committing first. Run button is
+  disabled until the issuer URL is non-empty (whitespace-trimmed); the
+  component is read-only — safe to run repeatedly. 8 Vitest tests pin the
+  glyph-vs-glyph contract (✓/✗/⚠/·), the button-disabled-without-issuer
+  shape, and the test-id-suffix collision-prevention when the panel is
+  mounted twice on the same page.
+
+- **OIDC JWKS health panel + Refresh-now button (Audit 2026-05-11 Fix 10 — MED-7 GUI half).**
+  MED-7's backend endpoint `GET /api/v1/auth/oidc/providers/{id}/jwks-status`
+  (commit `d85114f`) shipped the per-provider verifier counters on
+  `dev/auth-bundle-2` but the GUI never called it. The audit doc had
+  prematurely flipped the row to CLOSED; `authOIDCJWKSStatus` in the
+  API client was dead code. Operators investigating "why is login
+  failing for this IdP" couldn't see `last_refresh_at`,
+  `rejected_jws_count`, or `last_error` from the GUI — they had to
+  drop to curl. New shared component
+  `web/src/pages/auth/OIDCJWKSStatusPanel.tsx` queries the endpoint
+  via TanStack Query (30s `staleTime`, `retry: 0` so a 403 hides the
+  panel silently for callers without `auth.oidc.list`) and renders
+  six dt/dd rows: Last refresh (with `(never — cold cache)` sentinel
+  when the timestamp is empty), Refresh count, Rejected JWS count,
+  Last error (red treatment when non-empty, `(none)` sentinel
+  otherwise), RFC 9207 iss param ("supported by IdP" / "not
+  advertised"), and Current KIDs (`(not exposed — query jwks_uri
+  directly)` sentinel when the backend declines to expose the list).
+  A "Refresh now" button invokes the existing
+  `POST .../refresh` (RefreshKeys path) and invalidates the panel's
+  query so the freshly-updated counters render without a page
+  reload. The button is hidden for callers without `auth.oidc.edit`
+  via the panel's optional `canRefresh` prop. Mounted on
+  `OIDCProviderDetailPage.tsx` between the read-only field display
+  and the Actions section. 9 Vitest tests pin: loading state,
+  happy-path-all-six-rows, 403-hides-panel, refresh-invalidates-
+  query, refresh-failure-surfaces-inline-without-hiding-panel,
+  never-refreshed-cold-cache-sentinel, current-kids-empty-not-
+  exposed-sentinel, last-error-red-treatment, and canRefresh=false-
+  hides-the-button.
+
+- **UsersPage sidebar nav entry (Audit 2026-05-11 Fix 11 — MED-11
+  discoverability).** The MED-11 closure shipped `UsersPage.tsx` + wired
+  the `/auth/users` route in `web/src/main.tsx`, but the sidebar
+  navigation never gained a corresponding entry. Operators reached the
+  federated-user-admin surface (used during compliance audits — "show
+  me last login for every IdP-federated user") only by knowing the URL.
+  A page that exists but isn't navigable is a half-finished page. New
+  Users entry under the Auth section in `web/src/components/Layout.tsx`
+  sits between Sessions and Roles (federated-identity grouping). Three
+  Vitest tests in `Layout.test.tsx` pin the link's presence, the
+  `/auth/users` destination, and the DOM ordering relative to Sessions
+  so a future refactor that re-orders or removes the entry surfaces in
+  the diff.
+
+- **Scope-aware actor-role revoke (Audit 2026-05-11 A-4).**
+  HIGH-10 made it possible to grant the same role to the same actor at
+  multiple scopes (e.g. `r-operator` on `profile=p-acme` AND `profile=p-globex`)
+  via the unique constraint extension on `actor_roles`, but
+  `ActorRoleRepository.Revoke` ignored `(scope_type, scope_id)` and
+  unconditionally deleted every variant. Operators who wanted to drop
+  one scoped grant had to nuke them all and re-grant the remainder,
+  opening a race window in which the actor briefly held none of them. The
+  `DELETE /v1/auth/keys/{id}/roles/{role_id}` endpoint now accepts
+  optional `?scope_type=` / `?scope_id=` query params that narrow the
+  revoke to a single variant; no-match returns 404. The legacy "revoke
+  every variant" semantic is preserved when the query params are
+  absent, so existing CLI / GUI buttons keep working unchanged. The
+  audit row's `details` payload records which mode fired so SOC / SIEM
+  can distinguish wide cleanups from targeted demotions. MCP tool
+  `certctl_auth_revoke_role_from_key` gains optional `scope_type` +
+  `scope_id` input fields with matching semantics. Documented in
+  `docs/operator/rbac.md` under "Revoke: legacy 'all variants' vs
+  scope-selective."
+
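
The two revoke modes described in the scope-aware revoke entry above (legacy "all variants" when no scope is given, a single variant when one is) can be sketched over an in-memory slice. This is a minimal sketch, not `ActorRoleRepository.Revoke` itself: the `grant` type, the `revoke` function, and the use of an empty `scopeType` to mean "no scope params supplied" are all assumptions made for illustration.

```go
package main

import "fmt"

// grant is a hypothetical in-memory stand-in for one actor_roles row.
type grant struct {
	ActorID, RoleID, ScopeType, ScopeID string
}

// revoke removes grants of roleID from actorID. With an empty scopeType it
// removes every variant (legacy mode); otherwise it removes only the variant
// matching (scopeType, scopeID). It returns the surviving grants and the
// number removed, so a caller can map removed == 0 to a 404.
func revoke(grants []grant, actorID, roleID, scopeType, scopeID string) (kept []grant, removed int) {
	for _, g := range grants {
		match := g.ActorID == actorID && g.RoleID == roleID
		if match && scopeType != "" {
			match = g.ScopeType == scopeType && g.ScopeID == scopeID
		}
		if match {
			removed++
			continue
		}
		kept = append(kept, g)
	}
	return kept, removed
}

func main() {
	grants := []grant{
		{"k-1", "r-operator", "profile", "p-acme"},
		{"k-1", "r-operator", "profile", "p-globex"},
	}
	kept, removed := revoke(grants, "k-1", "r-operator", "profile", "p-acme")
	fmt.Println("removed:", removed, "kept:", kept)
}
```

Building a fresh `kept` slice leaves the caller's input untouched, which keeps the two modes easy to compare side by side.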
+### Security (BREAKING — silent-elevation closure)
+
+- **HIGH-10 actor-role scope is now enforced (Audit 2026-05-11 A-1).**
+  Pre-fix, `actor_roles.scope_type` / `scope_id` (added in migration 000043
+  by the HIGH-10 closure) were persisted by Grant + accepted on the handler
+  body + surfaced through the GUI/MCP — but the load-bearing
+  `EffectivePermissions` SQL never read them. A profile-scoped grant
+  silently elevated to global at authorization time, replicating the
+  canonical CRIT-5 lying-field shape. **The post-fix authorization narrows
+  correctly**: every existing `actor_roles` row with `scope_type != 'global'`
+  now takes effect.
+
+  > **Operator advisory:** if you used the HIGH-10 scope-bound role-grant
+  > API between commit `551812b` and the v2.1.0 tag (the column was
+  > populated but ignored), the grants were silently global. After
+  > upgrading, audit `SELECT actor_id, role_id, scope_type, scope_id FROM
+  > actor_roles WHERE scope_type != 'global'` and confirm the narrowing
+  > reflects intent. If an actor was granted a scoped role but expected
+  > global behavior, re-grant with `scope_type=global`.
+
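
The narrowing described in the HIGH-10 entry above comes down to one predicate: a global grant covers every request, while a scoped grant covers only its own `(scope_type, scope_id)`. The sketch below illustrates that predicate in isolation. It is not the `EffectivePermissions` SQL, and the `grant` type and `covers` function are hypothetical.

```go
package main

import "fmt"

// grant is a hypothetical stand-in for the scope columns of an actor_roles row.
type grant struct {
	RoleID, ScopeType, ScopeID string
}

// covers reports whether g applies to a request against (scopeType, scopeID).
// Before the fix described above, the scope columns were effectively ignored,
// so every grant behaved like the "global" branch.
func covers(g grant, scopeType, scopeID string) bool {
	if g.ScopeType == "global" {
		return true
	}
	return g.ScopeType == scopeType && g.ScopeID == scopeID
}

func main() {
	scoped := grant{RoleID: "r-operator", ScopeType: "profile", ScopeID: "p-acme"}
	fmt.Println("p-acme:", covers(scoped, "profile", "p-acme"))
	fmt.Println("p-globex:", covers(scoped, "profile", "p-globex"))
}
```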
+### Security (BREAKING)
+
+- **Federated-user deactivation now actually blocks login (Audit 2026-05-11 A-2).**
+  The MED-11 closure shipped `users.deactivated_at` + `DELETE /api/v1/auth/users/{id}`
+  + cascade-session-revoke, but the column was a "lying field" three legs over: the
+  postgres user repository never SELECTed it (so `User.DeactivatedAt` always read
+  nil), the `Update` SQL never wrote it (so the handler's mutation was a no-op),
+  and the OIDC `upsertUser` path never checked it (so the next login under the
+  same `(provider, subject)` tuple re-minted a session and re-elevated the user).
+  The cascade-revoke remained correct for the current cookie only. **Operator
+  advisory: if you deactivated a federated user between the MED-11 closure
+  (Bundle 2 merge `dea5053`) and the v2.1.0 release tag, verify the user cannot
+  OIDC-log-in after upgrading — the column took no effect at login time before
+  this fix. If needed, re-run the deactivation against the upgraded server.**
+  Closure: `userColumns` + `scanUser` now read `deactivated_at` via `sql.NullTime`;
+  `Create` + `Update` write it explicitly; `upsertUser` returns the new
+  `ErrUserDeactivated` sentinel before mutating fields (preserves `last_login_at`
+  forensics on rejected logins); `classifyOIDCFailure` surfaces the rejection
+  as audit category `user_deactivated`. Self-deactivate guard on
+  `DELETE /api/v1/auth/users/{id}` returns HTTP 409 + audit row
+  `auth.user_deactivate_self_rejected` (prevents an admin from one-way-door
+  locking themselves out via the standard handler — break-glass remains the
+  recovery path). New inverse endpoint `POST /api/v1/auth/users/{id}/reactivate`
+  (gated `auth.user.deactivate` — reactivation is the inverse op, not a separate
+  privilege) clears `deactivated_at`; emits audit row `auth.user_reactivated`.
+  Sessions revoked at deactivation stay revoked across reactivation — the user
+  must complete a fresh OIDC login. GUI: `UsersPage.tsx` now renders a Reactivate
+  button on deactivated rows. CWE-862 (missing authorization at the user-state
+  boundary). SOC 2 CC6.3 + ISO 27001 A.9.2.6 compliance-table-flipping fix.
+- **`__Host-` cookie prefix on all three auth cookies (Audit 2026-05-10 MED-14).**
+  The session cookie, CSRF cookie, and OIDC pre-login cookie are renamed from
+  `certctl_session` / `certctl_csrf` / `certctl_oidc_pending` to
+  `__Host-certctl_session` / `__Host-certctl_csrf` / `__Host-certctl_oidc_pending`
+  to gain browser-enforced subdomain-takeover protection (a `__Host-*` cookie can
+  only be set with `Path=/` + `Secure` + no `Domain` attribute, and the browser
+  rejects subdomain attempts to overwrite it). **Active sessions invalidate on
+  the rolling deploy that lands this change** — operators must re-authenticate
+  once after upgrading. The GUI's CSRF cookie reader was updated in lockstep.
+  See `docs/migration/oidc-enable.md` for operator-facing detail.
+
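
The `__Host-` prefix rules quoted in the cookie entry above (`Path=/`, `Secure`, no `Domain`) map directly onto fields of Go's `net/http` `Cookie` type. The sketch below is an assumption-laden illustration, not certctl's cookie code: `hostCookie` is a hypothetical helper, and attributes the changelog does not mention (`HttpOnly`, `SameSite`, expiry) are deliberately left out.

```go
package main

import (
	"fmt"
	"net/http"
)

// hostCookie (hypothetical helper) builds a cookie that satisfies the
// browser-enforced __Host- prefix rules: Secure set, Path exactly "/",
// and no Domain attribute.
func hostCookie(name, value string) *http.Cookie {
	return &http.Cookie{
		Name:   "__Host-" + name,
		Value:  value,
		Path:   "/",  // required by the prefix; a narrower path is rejected
		Secure: true, // required by the prefix
		// Domain is intentionally left empty: setting it disqualifies the prefix.
	}
}

func main() {
	c := hostCookie("certctl_session", "opaque-session-id")
	fmt.Println(c.String())
}
```

The `Path=/` requirement is also why the pre-login cookie's path had to widen from `/auth/oidc/` to `/` in the follow-on entry.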
+### Security
+
+- **OIDC `allowed_email_domains` now editable in the GUI (Audit 2026-05-11 A-3).**
+  The backend gate that rejects logins whose email domain is outside the
+  configured allowlist landed in v2.1.0 (CRIT-5 closure, 2026-05-10), but the
+  GUI never exposed the field — GUI-driven operators had to use the API
+  directly to configure tenant isolation against multi-tenant IdPs (Auth0,
+  Azure AD common endpoint, Google Workspace). The OIDCProvidersPage create
+  modal and OIDCProviderDetailPage detail view now render a chip-style
+  multi-input with client-side validation that mirrors the backend rules
+  (no `@`, no whitespace, no wildcards, lowercase-only FQDNs). The read-only
+  view renders an explicit "any (no gate configured)" sentinel when the list
+  is empty so operators can tell "not configured" apart from "field is
+  invisible." A "Clear all" button on the edit form is gated by a confirm
+  dialog that warns about removing the tenant gate. **Operator advisory: if
+  you provisioned OIDC providers via the GUI between v2.1.0 and this fix,
+  verify `allowed_email_domains` matches your tenant policy — the field was
+  configurable only via API / MCP / direct SQL during that window.** Per-IdP
+  runbooks for multi-tenant IdPs in `docs/operator/oidc-runbooks/` already
+  documented the field; the GUI now matches.
+
+- **Approval payload preview (Audit 2026-05-11 A-5).**
+  The MED-10 closure claim ("PARTIAL: raw JSON preview; diff library
+  deferred") was inaccurate — `ApprovalsPage.tsx` rendered no payload
+  at all, so approvers were clicking Approve / Reject without seeing
+  the change they were authorizing. That defeats the entire four-eyes
+  primitive: an approver who can't see what they're approving is
+  rubber-stamping. Each row now carries a Preview toggle that expands
+  an inline panel dispatching by kind: `profile_edit` shows a
+  field-level before/after diff (changed-only rows, red/green cells,
+  `(unset)` sentinel for added/removed fields); `cert_issuance` shows
+  a definition list of CN / SANs / profile / key algo / must-staple /
+  validity (catches the wildcard-against-corp-internal-profile attack
+  at review time); unknown kinds render a generic JSON preview for
+  forward-compat with future approval kinds. The base64-encoded JSON
+  payload is decoded via the new `decodePayload` helper; malformed
+  inputs render an explicit decode-error fallback — silent failure on
+  the payload preview is what produced this bug in the first place.
+
+- **Strict pre-login UA/IP binding (Audit 2026-05-11 A-6).**
+  The MED-16 closure left a request-side empty-header bypass: when the
+  pre-login row carried a User-Agent or client-IP binding but the
+  `/auth/oidc/callback` request omitted the corresponding value, the
+  binding check was silently skipped. `curl` doesn't send User-Agent
+  by default; many programmatic clients omit it. An attacker who
+  acquired a pre-login cookie could replay it without the bound
+  header and bypass the RFC 9700 §4.7.1 defense. The check is now
+  strict-when-stored — an empty request-side value with a non-empty
+  stored binding rejects with HTTP 400 and the new audit failure
+  categories `prelogin_ua_missing` / `prelogin_ip_missing` (distinct
+  from the existing `*_mismatch` categories so SIEM rules can alert
+  specifically on bypass attempts). **Operator advisory:** environments
+  where the User-Agent is stripped in transit (some debug proxies, a
+  handful of CDN configurations) must set
+  `CERTCTL_OIDC_PRELOGIN_REQUIRE_UA=false` to keep logins working;
+  symmetric `CERTCTL_OIDC_PRELOGIN_REQUIRE_IP=false` exists for the
+  IP-side. The legacy-row compat window — pre-migration rows with no
+  stored binding — still passes through unchecked, but that window is
+  bounded by the 10-minute pre-login TTL.
+
+- **OIDC provider Advanced fields are now editable in the GUI (Audit 2026-05-11 A-7).**
+  The MED-4 row had been DEFERRED to v3 with the rationale "backend
+  already accepts these fields." The verifier hit the GUI and found
+  that the read-only display claimed the values were editable, but the
+  edit form had no inputs — the save handler passed `provider.scopes`
+  / `provider.groups_claim_path` / `provider.groups_claim_format` /
+  `provider.iat_window_seconds` / `provider.jwks_cache_ttl_seconds`
+  unchanged from the loaded object. Operators who wanted to bump the
+  IAT window or change the groups-claim path had to drop to curl /
+  MCP and trust the GUI's display matched what they'd set elsewhere.
+  Lying UX. The OIDCProviderDetailPage edit form now has a collapsible
+  Advanced section with five inputs (scopes as a space-separated text
+  field; groups-claim path; groups-claim format select with the
+  backend's `string-array` / `json-path` enum; IAT window number input
+  bounded 1–600; JWKS cache TTL number input with floor 60). Client-side
+  validation mirrors the backend `Validate` rules so common operator
+  mistakes (IAT > 600, JWKS TTL < 60, empty scopes, empty groups-claim-path)
+  reject inline instead of round-tripping a 400. The read-only `<dl>`
+  also gained the previously-invisible `jwks_cache_ttl_seconds` row.
+
+- **Pre-login cookie Path widened from `/auth/oidc/` to `/` (Audit MED-14
+  follow-on).** Required to satisfy the `__Host-` prefix's `Path=/` rule. The
+  cookie lifetime is unchanged (10 minutes) and only the callback handler
+  consumes it; the wider path scope is harmless.
+
+- **RFC 9207 `iss` URL parameter check on OIDC callback (Audit 2026-05-10
+  MED-17).** When the matched IdP's discovery doc advertises
+  `authorization_response_iss_parameter_supported: true`, certctl now requires
+  the `iss` query parameter on `/auth/oidc/callback` and enforces a
+  constant-time compare against the configured provider's `IssuerURL`. Mismatch
+  rejects with HTTP 400; the audit row's `failure_category` distinguishes
+  `iss_param_missing` / `iss_param_mismatch` (RFC 9207 leg) from the existing
+  `id_token_iss_mismatch` (in-token iss claim leg). Closes the mix-up-attack
+  gap for modern Keycloak, Authentik, and public-trust CAs that ship
+  RFC-9207 discovery. Providers that don't advertise support (the majority
+  today) keep pre-fix behavior — back-compat is preserved.
+
+- **Auth GUI batch (Audit 2026-05-10 MED-4/7/8/10/11/12 + LOW-1/11/12 +
+  HIGH-10 GUI).** New backend endpoints land alongside their GUI
+  consumers: `GET /api/v1/auth/users` + `DELETE /api/v1/auth/users/{id}`
+  (auth.user.read / auth.user.deactivate; migration 000045 adds
+  `users.deactivated_at` plus the two new permissions); `GET
+  /api/v1/auth/runtime-config` (auth.role.assign) returning a sanitized
+  flat-map of deployed CERTCTL_* values (no secrets leaked — only
+  set/unset booleans and counts); `GET
+  /api/v1/auth/oidc/providers/{id}/jwks-status` (auth.oidc.list)
+  returning the per-provider verifier counters (refresh count, last
+  refresh / error timestamps, rejected JWS count, RFC 9207 iss-param
+  flag). New `UsersPage` lists federated identities + soft-deactivates.
+  `AuthSettingsPage` gains the runtime-config panel. `KeysPage`'s
+  assign-role modal now collects `scope_type` / `scope_id` /
+  `expires_at`. `RoleDetailPage`'s add-permission form gains the same
+  scope picker, and the Delete button is hidden on the 7 default
+  system roles (the server already rejected the delete; this is pure UX).
+  `AuthProvider` renders a sticky red demo-mode banner when
+  `auth_type=none`. `actor-demo-anon` rows on `KeysPage` already had
+  buttons disabled.
+
+- **11 new MCP tools (Audit 2026-05-10 MED-13).** Approval workflow
+  (`certctl_approval_list` / `_get` / `_approve` / `_reject`), break-glass
+  credential admin (`certctl_breakglass_list` / `_set_password` /
+  `_unlock` / `_remove`), bootstrap status + consume
+  (`certctl_bootstrap_status` / `_consume`), and audit category filter
+  (`certctl_audit_list_with_category`). All route through the existing
+  HTTP client so server-side permission gates fire unchanged.
+  `certctl_bootstrap_consume`'s tool description carries an explicit
+  "NEVER WIRE THIS TO AUTONOMOUS OPERATION" warning — a leaked
+  bootstrap token mints a fresh admin API key bypassing every other
+  access-control gate, so the tool is for one-shot manual operator
+  invocation only.
+
+- **JWKS auto-refresh on cache-miss (Audit 2026-05-10 MED-6).** When
+  the IdP rotates its signing key between pre-login + callback, the
+  cached JWKS no longer contains the kid referenced by the inbound ID
+  token's JWS header. Pre-fix, the verify failed with a generic error
+  and the operator had to manually call `POST
+  /api/v1/auth/oidc/providers/{id}/refresh`. The service now detects
+  the kid-not-in-cache shape (`isKidMismatchError`) and runs a
+  one-shot `RefreshKeys` (evict cache → re-fetch discovery + JWKS →
+  re-run alg-downgrade defense) before retrying the verify exactly
+  once. Bounded recovery: a second failure surfaces as
+  `ErrJWKSUnreachable` per the original branches; no retry loop. The
+  `isKidMismatchError` matcher is intentionally narrow so generic
+  signature failures don't trigger a refresh.
+
+- **OIDC provider test endpoint (Audit 2026-05-10 MED-5).** New
+  `POST /api/v1/auth/oidc/test` dry-runs an OIDC provider configuration
+  without persisting: fetches the discovery doc, runs the alg-downgrade
+  defense, detects RFC 9207 iss-parameter advertisement, and confirms
+  JWKS reachability. Returns `TestDiscoveryResult{discovery_succeeded,
+  jwks_reachable, supported_alg_values, iss_param_supported, errors[]}`
+  so the GUI (forthcoming) can render per-check status rows. Per-leg
+  failures ride in the response body's `errors` array; only a malformed
+  request body trips 400. Gate: `auth.oidc.create`. Audit row
+  `auth.oidc_provider_tested` carries the success/failure summary.
+
+- **Pre-login UA / source-IP binding on OIDC callback (Audit 2026-05-10
+  MED-16).** RFC 9700 §4.7.1 defense against stolen-pre-login-cookie replay
+  by a different browser / source. Migration `000044_prelogin_uaip` adds
+  `client_ip` + `user_agent` to `oidc_pre_login_sessions`; values captured at
+  `/auth/oidc/login` are constant-time compared at `/auth/oidc/callback`.
+  Mismatches return HTTP 400 with audit `failure_category` =
+  `prelogin_ua_mismatch` or `prelogin_ip_mismatch`. Two operator escape
+  hatches: `CERTCTL_OIDC_PRELOGIN_REQUIRE_UA` and
+  `CERTCTL_OIDC_PRELOGIN_REQUIRE_IP` (both default `true`). Operators behind
+  enterprise proxies that rewrite UA, or in dual-stack v4/v6 environments where
+  source IP routinely flips, can disable the affected leg. Both binding columns
+  are persisted even when enforcement is off, so retroactive forensics remain
+  possible. Empty values on either side pass through (rolling-deploy +
+  headless-proxy compat).
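
The pre-login binding rules in the last entry above (per-leg toggle, empty values pass through, constant-time compare otherwise) can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not certctl's implementation: `bindingOK` and `failureCategory` are hypothetical names, the two booleans stand in for `CERTCTL_OIDC_PRELOGIN_REQUIRE_UA` / `CERTCTL_OIDC_PRELOGIN_REQUIRE_IP`, and checking UA before IP is an arbitrary choice.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// bindingOK reports whether one binding leg (UA or source IP) passes.
// Hypothetical helper: stored is the value captured at /auth/oidc/login,
// current is the value observed at /auth/oidc/callback.
func bindingOK(stored, current string, require bool) bool {
	if !require {
		return true // operator disabled this leg
	}
	if stored == "" || current == "" {
		return true // empty on either side passes through
	}
	return subtle.ConstantTimeCompare([]byte(stored), []byte(current)) == 1
}

// failureCategory maps the two legs onto the audit failure_category values
// named in the entry. It returns "" when both legs pass. The UA-first
// ordering is an assumption made for this sketch.
func failureCategory(storedUA, curUA, storedIP, curIP string, requireUA, requireIP bool) string {
	if !bindingOK(storedUA, curUA, requireUA) {
		return "prelogin_ua_mismatch"
	}
	if !bindingOK(storedIP, curIP, requireIP) {
		return "prelogin_ip_mismatch"
	}
	return ""
}

func main() {
	// Same browser, same source address: both legs pass.
	fmt.Printf("%q\n", failureCategory("Firefox", "Firefox", "203.0.113.7", "203.0.113.7", true, true))
	// Source IP flipped while the IP leg is enforced.
	fmt.Printf("%q\n", failureCategory("Firefox", "Firefox", "203.0.113.7", "2001:db8::7", true, true))
	// Same flip with the IP leg disabled by the operator.
	fmt.Printf("%q\n", failureCategory("Firefox", "Firefox", "203.0.113.7", "2001:db8::7", true, false))
}
```

Keeping the per-leg decision in one small function makes the pass-through cases (disabled leg, empty value) easy to test in isolation from the HTTP handler.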
+
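The bounded recovery described in the JWKS auto-refresh entry above (refresh once on a kid miss, retry exactly once, never refresh on a generic signature failure) might look like the sketch below. Everything here is a hypothetical stand-in: the `verifier` type, its in-memory key sets, and the lowercase error values only mimic the shape of `isKidMismatchError`, `RefreshKeys`, and `ErrJWKSUnreachable`; real verification would check a JWS signature rather than a boolean.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errKidNotInCache   = errors.New("kid not in cached JWKS")
	errBadSignature    = errors.New("signature verification failed")
	errJWKSUnreachable = errors.New("JWKS unreachable")
)

// verifier is a toy model: cached holds the kids currently in the local
// JWKS cache, published holds the kids the IdP serves right now.
type verifier struct {
	cached    map[string]bool
	published []string
	refreshes int
}

func newVerifier(cached, published []string) *verifier {
	v := &verifier{cached: map[string]bool{}, published: published}
	for _, k := range cached {
		v.cached[k] = true
	}
	return v
}

// verify fails with errKidNotInCache when the kid is unknown, and with
// errBadSignature when the (simulated) signature check fails.
func (v *verifier) verify(kid string, sigValid bool) error {
	if !v.cached[kid] {
		return errKidNotInCache
	}
	if !sigValid {
		return errBadSignature
	}
	return nil
}

// refreshKeys evicts the cache and reloads it from the published set.
func (v *verifier) refreshKeys() {
	v.refreshes++
	v.cached = map[string]bool{}
	for _, k := range v.published {
		v.cached[k] = true
	}
}

// isKidMismatch is deliberately narrow: only the kid-not-in-cache shape.
func isKidMismatch(err error) bool { return errors.Is(err, errKidNotInCache) }

// verifyWithRefresh retries exactly once after a single refresh.
func (v *verifier) verifyWithRefresh(kid string, sigValid bool) error {
	err := v.verify(kid, sigValid)
	if err == nil || !isKidMismatch(err) {
		return err // success, or a failure that must not trigger a refresh
	}
	v.refreshKeys()
	if err := v.verify(kid, sigValid); err != nil {
		return fmt.Errorf("%w: %v", errJWKSUnreachable, err)
	}
	return nil
}

func main() {
	v := newVerifier([]string{"key-old"}, []string{"key-old", "key-new"})
	fmt.Println(v.verifyWithRefresh("key-new", true), v.refreshes)   // rotated key
	fmt.Println(v.verifyWithRefresh("key-old", false), v.refreshes)  // bad signature
	fmt.Println(v.verifyWithRefresh("key-ghost", true), v.refreshes) // unknown kid
}
```

Because the retry is a straight-line second call rather than a loop, a misbehaving IdP can cost at most one extra discovery + JWKS fetch per callback.
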
+## v2.1.0 - Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions ⚠️
+
+> **SECURITY: AUDIT YOUR API KEYS.**
+>
+> Bundle 1 ships role-based authorization. Every existing API key
+> configured via `CERTCTL_API_KEYS_NAMED` (or the legacy
+> `CERTCTL_AUTH_SECRET`) is mapped to the **r-admin role on the first
+> upgrade boot** so existing automation keeps working unchanged. Most
+> keys do NOT need full admin power; downgrade them before tagging
+> the next release.
+>
+> Recommended post-upgrade flow:
+>
+> ```bash
+> # 1. List every key with its current role:
+> certctl-cli auth keys list
+>
+> # 2. Walk an interactive prompt that downgrades each key:
+> certctl-cli auth keys scope-down
+>
+> # 3. Or get a heuristic suggestion based on 30 days of audit history:
+> certctl-cli auth keys scope-down --suggest
+> certctl-cli auth keys scope-down --suggest --apply   # applies the suggestion
+>
+> # 4. Or drive scope-down from a JSON config (Helm post-upgrade hook):
+> certctl-cli auth keys scope-down --non-interactive ./scope-down.json
+> ```
+>
+> The synthetic `actor-demo-anon` actor (used when
+> `CERTCTL_AUTH_TYPE=none` is configured) is system-managed and
+> excluded from the prompt loop.
+
+What else changed in v2.1.0:
+
+- **Audit 2026-05-10 CRIT-1 closure — wire-layer RBAC enforcement.**
+  The Bundle 1 + Bundle 2 audit surfaced that the permission catalogue
+  was enforced on ~24 admin-only routes only; the bulk of state-changing
+  routes (`POST /api/v1/certificates`, `PUT /api/v1/profiles/{id}`,
+  `DELETE /api/v1/issuers/{id}`, `POST /api/v1/agents/{id}/csr`, even
+  `POST /api/v1/auth/roles` + `POST /api/v1/auth/keys/{id}/roles`) had
+  no `rbacGate` wrap. An `r-viewer` Bearer was essentially `r-admin`
+  minus five fine-grained verbs at the wire layer (CWE-862). This
+  release wraps every state-changing + read endpoint with
+  `rbacGate` (global scope) or `rbacGateScoped` (per-profile / per-
+  issuer scope-bound grants), and adds an AST-level CI guard
+  (`TestRouterRBACGateCoverage`) that fails when a new route is
+  registered without enforcement. Catalogue extended via migration
+  000039 with 30 permissions covering `cert.edit`, `job.*`,
+  `approval.*`, `policy.*`, `team.*`, `owner.*`, `notification.*`,
+  `discovery.*`, `network_scan.*`, `healthcheck.*`, `digest.*`,
+  `verification.*`, `stats.read`, `metrics.read`. **AUDIT YOUR
+  KEYS** (the scope-down call-out above) now translates to real
+  reduction in blast radius. Auditor pin preserved at exactly
+  `{audit.read, audit.export}`.
+
+- **RBAC primitive shipped.** `tenants`, `roles`, `permissions`,
+  `role_permissions`, `actor_roles` tables (migration 000029); 33-permission
+  canonical catalogue; 7 default roles (`admin`, `operator`, `viewer`,
+  `agent`, `mcp`, `cli`, `auditor`); per-handler permission gates via
+  `auth.RequirePermission` middleware (replaces the legacy
+  `IsAdmin` boolean check on the 5 admin-only handlers).
+- **Day-0 admin bootstrap.** Set `CERTCTL_BOOTSTRAP_TOKEN` on a fresh
+  deploy and send a single `curl` POST to `/api/v1/auth/bootstrap` to
+  mint the first admin API key; one-shot, never logged, and locks
+  closed once any admin actor exists. Migration 000031 ships the
+  `api_keys` table that stores the SHA-256 hash; the plaintext is
+  shown in the response body once and never persisted.
+- **Auditor role split.** New `auditor` role holds only `audit.read`
+  + `audit.export`. Compliance reviewers can read the audit trail
+  without holding mutation power. Migration 000032 adds
+  `audit_events.event_category` so auditors can filter to
+  authentication-related events specifically.
+- **`/v1/auth/check` enrichment.** Response now includes the actor's
+  standing roles and effective permissions, so the GUI gates
+  affordances from a single fetch on app boot.
+- **Approval-bypass closure.** Edits to a profile that has (or
+  would have) `RequiresApproval=true` now route through the
+  `ApprovalService` two-person integrity gate (Phase 9). Migration
+  000033 adds `approval_kind` + `payload` to
+  `issuance_approval_requests` so cert-issuance and profile-edit
+  approvals share the same workflow. Same-actor self-approve is
+  rejected with `ErrApproveBySameActor` for both kinds. Closes the
+  flip-flop loophole where an admin could disable approval, mutate,
+  re-enable. Documented at
+  [`docs/reference/profiles.md`](docs/reference/profiles.md).
+- **GUI: Roles / API Keys / Auth Settings / Approvals queue.**
+  Four new pages under `/auth/*` consume `/v1/auth/me` for
+  permission-aware rendering. The Approvals queue blocks
+  self-approve at the client layer (Approve/Reject buttons hidden
+  when requested_by == current actor_id) on top of the server-side
+  enforcement. AuditPage gains a category filter (cert_lifecycle /
+  auth / config) for the auditor view.
+- **MCP server gains 12 RBAC tools.** Operators driving certctl
+  from Claude / VS Code / any MCP client get parity with the GUI
+  + CLI. Each tool routes through the same HTTP handler; permission
+  gates fire server-side.
+- **OpenAPI catalogues every new route.** Every Bundle 1 endpoint
+  ships with an `operationId`; the parity test guards against drift.
+- **Coverage gates.** `internal/auth/` and `internal/service/auth/`
+  now have ≥85% coverage floors in `.github/coverage-thresholds.yml`.
+  The 12-path negative-test list from the Bundle 1 prompt is
+  covered except path #12, which is deferred with an in-tree TODO.
+- **Protocol-endpoint allowlist pinned at three layers.** The
+  middleware bypass (`auth.IsProtocolEndpoint`), the router-level
+  `AuthExemptRouterRoutes` constant, and a new
+  `phase12_protocol_allowlist_test.go` AST scan all guard against
+  accidentally wrapping ACME / SCEP / EST / OCSP / CRL routes in
+  `rbacGate`.
+- **Bundle 2: OIDC + sessions + back-channel logout + break-glass.**
+  Auth Bundle 2 ships in the same v2.1.0 release. Operators get OIDC
+  SSO support for Keycloak / Authentik / Okta / Auth0 / Microsoft
+  Entra ID / Google Workspace (via Keycloak broker), HMAC-signed
+  session cookies with idle/absolute timeouts + CSRF defense,
+  back-channel logout per OpenID Connect Back-Channel Logout 1.0,
+  and a default-OFF break-glass admin path with Argon2id passwords
+  for SSO-broken incidents. API-key auth keeps working unchanged
+  alongside; existing automation needs no changes. Migration walkthrough
+  at [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md);
+  per-IdP setup guides at
+  [`docs/operator/oidc-runbooks/index.md`](docs/operator/oidc-runbooks/index.md).
+- **OIDC token validation pinned at three layers.** Algorithm
+  allow-list (RS256/RS512/ES256/ES384/EdDSA only) with HS-family + `none`
+  rejected at the service-layer sentinel; IdP-downgrade-attack defense
+  at provider creation AND every JWKS RefreshKeys (intersects the IdP's
+  advertised `id_token_signing_alg_values_supported` against the allow-
+  list, rejects providers that advertise weak algs even before any
+  token is signed); OIDC Core §3.1.3.7 re-verification of `iss` /
+  `aud` / `azp` / `at_hash` (REQUIRED-when-access_token-present per
+  Phase 3 tightening of the spec MAY → MUST) / `exp` / `iat` window
+  / `nonce` constant-time-compare. PKCE-S256 mandatory; `plain`
+  rejected. Single-use state + nonce via atomic `DELETE...RETURNING`
+  on consume.
+- **Session cookies use length-prefixed HMAC.** The cookie wire format
+  is `v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`
+  with HMAC input `len:sid:len:kid` (NOT bare-concat) to defeat
+  concatenation collisions. `HttpOnly` + `Secure` + `SameSite=Lax`
+  default; `SameSite=Strict` configurable via `CERTCTL_SESSION_SAMESITE`.
+  Idle timeout 1h / absolute 8h defaults; scheduler GC sweeps expired
+  rows hourly. Signing keys rotate via the new `RotateSigningKey`
+  primitive; the old key stays valid for `CERTCTL_SESSION_SIGNING_KEY_RETENTION`
+  (default 24h) so existing cookies validate during rollover.
+- **CSRF defense via double-submit-cookie + hashed-token-on-row.**
+  Plaintext CSRF token in the JS-readable `certctl_csrf` cookie
+  (intentionally `HttpOnly=false` for the GUI to echo into the
+  `X-CSRF-Token` header); SHA-256 hash on the session row;
+  `subtle.ConstantTimeCompare` in the new `CSRFMiddleware`. API-key
+  actors are CSRF-exempt (no session row in context).
+- **OIDC `client_secret` encrypted at rest.** AES-256-GCM v3 blob
+  format (magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using
+  the existing `CERTCTL_CONFIG_ENCRYPTION_KEY`. Encryption invariant
+  pinned by an integration test asserting ciphertext != plaintext +
+  v3 blob shape + round-trip recovery + wrong-passphrase fails.
+- **OIDC first-admin bootstrap.** New `CERTCTL_BOOTSTRAP_ADMIN_GROUPS`
+  + `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` env vars: the first
+  OIDC-authenticated user with a matching group claim becomes admin
+  per tenant. Coexists with the Bundle 1 env-var-token bootstrap;
+  the admin-existence probe ensures only one wins. Audit row
+  (`bootstrap.oidc_first_admin`) on every grant.
+- **Break-glass admin (default-OFF).** New `CERTCTL_BREAKGLASS_ENABLED`
+  env var (default `false`). When enabled, the local Argon2id-password
+  admin path bypasses OIDC + group-claim layers — intended ONLY for
+  SSO-broken incidents. Argon2id with OWASP 2024 params (m=64 MiB,
+  t=3, p=4); lockout after 5 failures (configurable); constant-time
+  across all failure paths via `verifyDummy`; surface invisibility
+  (HTTP 404 on every endpoint when disabled, NOT 403). WARN log at
+  server boot when enabled. WebAuthn/FIDO2 second factor pairing on
+  the v3 roadmap (Decision 12).
+- **GUI: OIDC Providers + Group → Role Mappings + Sessions + login
+  buttons.** Four new pages under `/auth/*` consume the Bundle 2 API
+  surface. Login page renders one "Sign in with X" button per
+  configured OIDC provider (in addition to the API-key form, which
+  remains as a fallback for Bearer-mode + break-glass paths). Sessions
+  page exposes own-sessions + admin all-actors view. Every actionable
+  element is permission-gated server-side via `auth.oidc.*` and
+  `auth.session.*` perms; client-side hiding is UX only. Logout button
+  in the sidebar fires `POST /auth/logout` to clear the session
+  server-side before redirecting to login.
+- **MCP server gains 11 OIDC + session tools.** `certctl_auth_list_oidc_providers`,
+  `_get_oidc_provider`, `_create_oidc_provider`, `_update_oidc_provider`,
+  `_delete_oidc_provider`, `_refresh_oidc_provider`,
+  `_list_group_mappings`, `_add_group_mapping`, `_remove_group_mapping`,
+  `_list_sessions`, `_revoke_session`. Operator-facing MCP tool count
+  goes 12 (Bundle 1 RBAC) → 23 across the auth surface. Total MCP
+  tool count: `grep -cE 'mcp\.AddTool\(' internal/mcp/tools*.go` ≈ 150.
+- **Per-IdP runbooks: 6 production-tier setup guides** at
+  `docs/operator/oidc-runbooks/`. Each runbook follows a consistent
+  five-section layout (Prerequisites / IdP-side config / certctl-side
+  config / Verification / Troubleshooting + Validation checklist with
+  operator sign-off line). Keycloak is the canonical reference;
+  Authentik / Okta / Auth0 / Entra ID / Google Workspace document the
+  IdP-specific deltas (Auth0's namespaced custom claims; Entra ID's
+  group OBJECT IDs; Google Workspace's missing-groups-claim limitation
+  + the recommended Keycloak broker pattern).
+- **Threat model extended.** [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md)
+  ships 5 new "Defenses Bundle 2 ships" subsections + 8 new threat-
+  catalogue subsections (OIDC token forgery / session hijacking / IdP
+  compromise / back-channel logout failure modes / group-claim
+  manipulation / bootstrap risks / break-glass risks / token-leak
+  hygiene). 6 new SQL-shaped operator-facing checks. New "Threats
+  Bundle 2 does NOT close" section enumerating the 8 v3-backlog items
+  (WebAuthn / JIT elevation / SAML / multi-tenant activation /
+  HSM-FIPS / OIDC RP-initiated logout / Playwright / per-IdP
+  external-tester sign-off).
+- **Performance baselines documented.** [`docs/operator/auth-benchmarks.md`](docs/operator/auth-benchmarks.md)
+  ships four benchmarks with measured baselines on a 4 vCPU /
+  8 GiB / Postgres 16 / Go 1.25 floor: `BenchmarkSession_SteadyState`
+  p99 5 µs (target < 1 ms; 200× under), `BenchmarkSession_ColdProcess`
+  p99 7.1 ms (target < 10 ms), `BenchmarkOIDC_SteadyState` p99 1.5 ms
+  (target < 5 ms), `BenchmarkOIDC_ColdCache` operator-runs against
+  live Keycloak via `make benchmark-auth-coldcache`.
+- **Standards + RFC implementation table.** [`docs/reference/auth-standards-implemented.md`](docs/reference/auth-standards-implemented.md)
+  ships 13 RFC / standard rows + 14 CWE rows with concrete file paths
+  + negative-test anchors per row. NOT a compliance-mapping doc per
+  the operator's 2026-05-05 retired-compliance-docs decision; the
+  doc explicitly says "build the framework mapping yourself against
+  the rows here using the framework-mapping methodology your audit
+  firm prescribes; this project does not own that mapping."
+- **Coverage gates held at floor 90 across all four Bundle 2
+  packages.** `internal/auth/oidc/` 93.7%, `internal/auth/session/`
+  94.9%, `internal/auth/breakglass/` 91.5%, `internal/auth/user/domain/`
+  96.4%. NO held-low-with-rationale entry — the Phase 13 prompt's
+  anti-Bundle-1-mistake rule held. Bundle 1's existing 85% floors
+  for `internal/auth/` + `internal/service/auth/` stay 85
+  (already-shipped-and-accepted) per the prompt's explicit
+  inheritance rule.
+- **Multi-tenant query CI guard.** New `scripts/ci-guards/multi-tenant-query-coverage.sh`
+  (ratchet-style, baseline 32 at v2.1.0 close): greps every
+  SELECT/UPDATE/DELETE in `internal/repository/postgres/` against
+  10 tenant-aware tables, fails on regression OR improvement (forces
+  the operator to lift / lower the baseline visibly). Forward-compat
+  protection so a future Bundle 3 / managed-service multi-tenant
+  activation can flip the switch without finding silent
+  tenant-data-leak bugs in shipped queries.
+- **Phase 10 Keycloak testcontainers integration test.** New build-tag-
+  gated suite at `internal/auth/oidc/testfixtures/` + `integration_keycloak_test.go`
+  drives the full OIDC flow against a live Keycloak container booted
+  by testcontainers-go. 5-test matrix: discovery + JWKS load, full
+  PKCE auth-code happy path with HTTP form scraping, logout-revokes-
+  session, JWKS rotation, unmapped-groups-fails-closed. Reuses one
+  container across the matrix to amortize the 60-90s boot. Optional
+  Okta smoke test (build-tagged `integration && okta_smoke`) for live
+  tenant validation. New Makefile targets: `make keycloak-integration-test`
+  + `make okta-smoke-test` + `make benchmark-auth-coldcache`.
+- **OpenAPI surface extended.** New `cookieAuth` security scheme
+  (apiKey/cookie/`certctl_session`) alongside the existing
+  `bearerAuth`. 13 new Bundle 2 endpoints across the OIDC + session
+  + group-mapping CRUD surface; 4 break-glass endpoints with
+  surface-invisibility framing. The N-bundle-2-security-empty-preserved
+  CI guard locks the `security: []` opt-out count at ≥ 14 so existing
+  public endpoints stay public.
+- **Bundle-1-only compat regression CI guard.** New
+  `scripts/ci-guards/bundle-1-compat-regression.sh` asserts the
+  load-bearing invariants that protect the Bundle-1-only-deploy
+  case (session middleware defers-to-next, CSRF passthrough on
+  missing session row, ChainAuthSessionThenBearer wired, public
+  OIDC routes in AuthExempt allowlist, AuthInfo guards on
+  OIDCProvidersResolver != nil). Sibling
+  `bundle-1-to-2-upgrade-regression.sh` asserts the upgrade-path
+  invariants (migrations 000034..000038 are CREATE TABLE IF NOT EXISTS
+  + BEGIN/COMMIT-wrapped + no DROP TABLE / ALTER...DROP COLUMN
+  against 19 protected Bundle-1 tables + ON CONFLICT DO NOTHING on
+  permission seed).
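
The session-cookie entry above describes a wire format of `v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>` with a length-prefixed HMAC input. A minimal Go sketch of that idea follows. The function names are hypothetical, the decimal encoding of the length prefixes is an assumption (the entry only gives the shape `len:sid:len:kid`), and a single key is passed in directly where a real verifier would look the key up by its key ID.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// macInput builds the length-prefixed HMAC input "len:sid:len:kid".
// Decimal lengths are an assumption made for this sketch.
func macInput(sid, kid string) []byte {
	return []byte(strconv.Itoa(len(sid)) + ":" + sid + ":" + strconv.Itoa(len(kid)) + ":" + kid)
}

// signCookie returns v1.<sid>.<kid>.<base64url-no-pad(HMAC-SHA256)>.
// It assumes sid and kid contain no '.' characters.
func signCookie(key []byte, sid, kid string) string {
	m := hmac.New(sha256.New, key)
	m.Write(macInput(sid, kid))
	return "v1." + sid + "." + kid + "." + base64.RawURLEncoding.EncodeToString(m.Sum(nil))
}

// verifyCookie parses the cookie and checks the MAC in constant time.
func verifyCookie(key []byte, cookie string) (sid, kid string, ok bool) {
	parts := strings.Split(cookie, ".")
	if len(parts) != 4 || parts[0] != "v1" {
		return "", "", false
	}
	got, err := base64.RawURLEncoding.DecodeString(parts[3])
	if err != nil {
		return "", "", false
	}
	m := hmac.New(sha256.New, key)
	m.Write(macInput(parts[1], parts[2]))
	if !hmac.Equal(got, m.Sum(nil)) {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	key := []byte("demo-signing-key-not-for-production")
	cookie := signCookie(key, "sess-1", "k1")
	_, _, ok := verifyCookie(key, cookie)
	fmt.Println("valid cookie accepted:", ok)

	// Bare concatenation would make ("ab","c") and ("a","bc") collide;
	// the length prefixes keep the two MAC inputs distinct.
	fmt.Println(string(macInput("ab", "c")), "vs", string(macInput("a", "bc")))
}
```

The length prefixes matter because the MAC covers two variable-length fields: without them, moving a byte from one field to the other would leave the MAC input unchanged.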
+
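The CSRF entry above pairs a plaintext token in a JS-readable cookie with its SHA-256 hash on the session row. One way to picture the check, assuming hypothetical helper names and a 32-byte random token (the token length and encoding are not stated in the entry):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newCSRFToken returns the plaintext token (destined for the JS-readable
// cookie) and its SHA-256 hash (destined for the session row).
// The 32-byte length and base64url encoding are assumptions.
func newCSRFToken() (plaintext string, hash [32]byte, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", hash, err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, sha256.Sum256([]byte(plaintext)), nil
}

// csrfHeaderValid hashes the X-CSRF-Token header value and compares it,
// in constant time, against the hash stored on the session row.
func csrfHeaderValid(headerToken string, storedHash [32]byte) bool {
	if headerToken == "" {
		return false
	}
	sum := sha256.Sum256([]byte(headerToken))
	return subtle.ConstantTimeCompare(sum[:], storedHash[:]) == 1
}

func main() {
	token, stored, err := newCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("echoed token accepted:", csrfHeaderValid(token, stored))
	fmt.Println("forged token accepted:", csrfHeaderValid("forged", stored))
}
```

Storing only the hash means a read of the session table does not hand an attacker a usable CSRF token.
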
+Migration ordering, idempotency, and downgrade are documented in
+[`docs/migration/api-keys-to-rbac.md`](docs/migration/api-keys-to-rbac.md)
+(API-key → RBAC, Bundle 1) and [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md)
+(API-key → OIDC, Bundle 2). The threat model lives at
+[`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md).
+Day-2 RBAC operations live at [`docs/operator/rbac.md`](docs/operator/rbac.md).
+RFC + CWE evidence at [`docs/reference/auth-standards-implemented.md`](docs/reference/auth-standards-implemented.md).
+
+## v2.0.68 - Image registry path changed ⚠️
+
+> **Image registry path changed.** Starting this release, container images publish to `ghcr.io/certctl-io/certctl-server` and `ghcr.io/certctl-io/certctl-agent`. Existing pulls from `ghcr.io/shankar0123/certctl-{server,agent}:<tag>` continue to work for previously-published tags (the registry never deletes images), but the `:latest` tag at the old path stops moving forward at this release. Update your `docker pull` paths, `docker-compose.yml` `image:` keys, or Helm `image.repository` values to receive future updates. Old `git clone` / `git push` / install-script / API URLs continue to redirect forever - only the container-registry path changed.

 This is the only operator-action-required change in v2.0.68. Other changes in this release are cosmetic URL refreshes after the GitHub-org transfer from `shankar0123/certctl` to `certctl-io/certctl` (HTTP redirects mean no other operator action is required) plus an internal contextcheck lint fix in the agent. Full commit list is on the [GitHub release page](https://github.com/certctl-io/certctl/releases/tag/v2.0.68).

@@ -13,18 +776,18 @@ notes are auto-generated from commit messages between consecutive tags.

 **Where to find what changed in a given release:**

- **[GitHub Releases](https://github.com/certctl-io/certctl/releases)** — every
+- **[GitHub Releases](https://github.com/certctl-io/certctl/releases)** - every
  tag has an auto-generated "What's Changed" section pulled from the commits
  between that tag and the previous one, plus per-release supply-chain
  verification instructions (Cosign / SLSA / SBOM).
- **`git log <prev-tag>..<this-tag> --oneline`** — same content, locally.
+- **`git log <prev-tag>..<this-tag> --oneline`** - same content, locally.

 **Why no hand-edited CHANGELOG.md:**

 certctl is solo-developed and pushes directly to master. Maintaining a
 hand-edited CHANGELOG meant the file drifted (entries piled into
 `[unreleased]` and never got promoted to per-version sections when tags were
-cut). A stale CHANGELOG is worse than no CHANGELOG — it signals abandoned
+cut). A stale CHANGELOG is worse than no CHANGELOG - it signals abandoned
 maintenance to security-conscious operators doing diligence.

 The auto-generated release notes work here because commit messages follow a
@@ -63,7 +63,7 @@ RUN for i in 1 2 3; do \
    npm run build

 # Stage 2: Build Go binary
-FROM golang:1.25-alpine@sha256:5caaf1cca9dc351e13deafbc3879fd4754801acba8653fa9540cea125d01a71f AS builder
+FROM golang:1.25.10-alpine@sha256:8d22e29d960bc50cd025d93d5b7c7d220b1ee9aa7a239b3c8f55a57e987e8d45 AS builder

 # Proxy propagation (M-4, Issue #9) — see Stage 1 rationale.
 ARG HTTP_PROXY=
@@ -5,7 +5,7 @@
 # operator runbook; the pins here MUST be bumped in the same pass.

 # Stage 1: Build
-FROM golang:1.25-alpine@sha256:5caaf1cca9dc351e13deafbc3879fd4754801acba8653fa9540cea125d01a71f AS builder
+FROM golang:1.25.10-alpine@sha256:8d22e29d960bc50cd025d93d5b7c7d220b1ee9aa7a239b3c8f55a57e987e8d45 AS builder

 # Proxy propagation (M-4, Issue #9) — defaulted to empty so un-proxied builds
 # behave identically to the pre-fix tree. When `HTTP_PROXY`/`HTTPS_PROXY`/
@@ -2,9 +2,9 @@ Business Source License 1.1

 Parameters

-Licensor:             Shankar Kambam
+Licensor:             certctl LLC
 Licensed Work:        certctl
-                      The Licensed Work is © 2026 Shankar Kambam.
+                      The Licensed Work is © 2026 certctl LLC.

 Additional Use Grant: You may make use of the Licensed Work, including in
                      production for your internal business operations and
@@ -12,15 +12,23 @@ Additional Use Grant: You may make use of the Licensed Work, including in
                      your own customers, provided that you may not offer
                      the Licensed Work as a Commercial Certificate Service.

-                      A "Commercial Certificate Service" is a product or
-                      service whose principal value to a third party is the
+                      A "Commercial Certificate Service" is any product
+                      or service that provides third parties with access
+                      to or control of any substantial set of the
                      certificate management functionality of the Licensed
                      Work — including but not limited to lifecycle
                      management, discovery, monitoring, alerting, renewal
-                      automation, deployment, and revocation — where the
-                      third party accesses or controls that functionality
-                      and compensation is received for that access or
-                      control.
+                      automation, deployment, revocation, certificate
+                      authority operation, certificate issuance,
+                      certificate signing, or any combination thereof —
+                      where compensation, in any form, is received in
+                      connection with such access or control. This
+                      restriction applies irrespective of whether such
+                      functionality is the principal, ancillary,
+                      supporting, or one of several values provided by the
+                      product or service, and irrespective of whether the
+                      Licensed Work is presented under its original name,
+                      a modified name, or no name at all.

                      For the avoidance of doubt:

@@ -36,12 +44,17 @@ Additional Use Grant: You may make use of the Licensed Work, including in

                      (b) for the purposes of this Additional Use Grant,
                          "third party" excludes (i) your employees, (ii)
-                          your contractors acting on your behalf, and (iii)
-                          your Affiliates. "Affiliate" means any entity
-                          that controls, is controlled by, or is under
-                          common control with, you, where "control" means
-                          ownership of more than fifty percent (50%) of
-                          the voting interests of the entity;
+                          your contractors acting on your behalf, and
+                          (iii) your Affiliates. "Affiliate" means any
+                          entity that (1) directly or indirectly controls
+                          you, (2) is directly or indirectly controlled by
+                          you, or (3) is directly or indirectly under
+                          common control with you, where "control" means
+                          either (A) ownership of more than fifty percent
+                          (50%) of the voting interests of the entity, or
+                          (B) the power to direct the management and
+                          policies of the entity, whether through voting
+                          securities, contract, or otherwise;

                      (c) the restriction on offering a Commercial
                          Certificate Service applies regardless of whether
@@ -67,16 +80,34 @@ works, redistribute, and make non-production use of the Licensed Work. The
 Licensor may make an Additional Use Grant, above, permitting limited production
 use.

-Effective on the Change Date, or the fourth anniversary of the first publicly
-available distribution of a specific version of the Licensed Work under this
-License, whichever comes first, the Licensor hereby grants you rights under
+Effective on the Change Date, the Licensor hereby grants you rights under
 the terms of the Change License, and the rights granted in the paragraph
 above terminate.

 If your use of the Licensed Work does not comply with the requirements
 currently in effect as described in this License, you must purchase a
 commercial license from the Licensor, its affiliated entities, or authorized
-resellers, or you must refrain from using the Licensed Work.
+resellers, or you must refrain from using the Licensed Work. Rights granted
+under any commercial license from the Licensor are personal to the licensee
+and may not be sublicensed, transferred, assigned, or resold to any third
+party without the Licensor's prior written consent. Any attempted sublicense,
+transfer, assignment, or resale in violation of this provision is void.
+
+Restricted Activities. Notwithstanding any other provision of this License,
+you may not:
+
+  (i)   provide the Licensed Work or substantially similar functionality
+        to third parties as a hosted, managed, embedded, bundled, or
+        integrated service, except as expressly permitted in the
+        Additional Use Grant;
+
+  (ii)  move, change, disable, circumvent, or work around any license,
+        security, attribution, audit-trail, or feature-gating
+        functionality contained in the Licensed Work; or
+
+  (iii) alter or remove any license, copyright, attribution, trademark,
+        or other notice from the Licensed Work, its derivatives, or any
+        substantial portion thereof.

 All copies of the original and modified Licensed Work, and derivative works
 of the Licensed Work, are subject to this License. This License applies
@@ -110,8 +141,12 @@ the Licensor or to any repository hosting the Licensed Work is provided at
 the submitter's sole risk, confers no rights or obligations on the
 Licensor, and is not incorporated into the Licensed Work.

-This License does not grant you any right in any trademark or logo of the
-Licensor or its Affiliates.
+Trademark and naming. This License does not grant you any right in any
+trademark, service mark, trade name, or logo of the Licensor or its
+Affiliates. Forks, derivative works, and modifications of the Licensed Work
+must not use the name "certctl," any name confusingly similar to "certctl,"
+or any Licensor trademark in their distributed form, marketing materials,
+package metadata, or service offerings.

 Governing law and venue. This License shall be governed by and construed in
 accordance with the laws of the State of Florida, USA, without giving
@@ -1,4 +1,4 @@
-.PHONY: help build run test lint verify verify-docs verify-deploy loadtest acme-cert-manager-test acme-rfc-conformance-test clean docker-up docker-down migrate-up migrate-down generate test-cover frontend-build qa-stats
+.PHONY: help build run test lint verify verify-deploy loadtest loadtest-scale loadtest-scale-bulk loadtest-scale-acme loadtest-scale-agent acme-cert-manager-test acme-rfc-conformance-test keycloak-integration-test okta-smoke-test benchmark-auth benchmark-auth-coldcache clean docker-up docker-down migrate-up migrate-down generate test-cover frontend-build e2e-test qa-stats

 # Default target - show help
 help:
@@ -16,7 +16,6 @@ help:
 	@echo "  make lint           Run linter (golangci-lint)"
 	@echo "  make fmt            Format code with gofmt"
 	@echo "  make verify         Pre-commit gate: fmt + vet + lint + test (CI-parity)"
-	@echo "  make verify-docs    Pre-tag gate:    QA-doc drift checks (operator-facing docs)"
 	@echo "  make verify-deploy  Pre-push gate:   digest validity + OpenAPI parity + docker build smoke"
 	@echo "  make loadtest       k6 throughput run against postgres + certctl (NOT in verify; manual + cron only)"
 	@echo ""
@@ -119,20 +118,6 @@ verify:
 	@echo ""
 	@echo "verify: PASS — safe to commit"

-# verify-docs: pre-tag gate. Runs the QA-doc Part-count + seed-count
-# drift guards that ci-pipeline-cleanup Phase 11 / frozen decision 0.13
-# moved out of CI (was per-push blocking; now operator-runs pre-tag).
-# These guards protect docs/qa-test-guide.md headlines from drifting
-# vs the underlying source-of-truth (testing-guide Part count, seed
-# row count). Operator-facing docs only — not product-affecting.
-verify-docs:
-	@echo "==> QA-doc Part-count drift"
-	@bash scripts/qa-doc-part-count.sh
-	@echo "==> QA-doc seed-count drift"
-	@bash scripts/qa-doc-seed-count.sh
-	@echo ""
-	@echo "verify-docs: PASS — safe to tag"
-
 # verify-deploy: optional pre-push gate. Runs the digest-validity check,
 # the OpenAPI ↔ handler parity check, and a Docker build smoke for the
 # production images (server + agent only — fast subset for local; CI
@@ -168,6 +153,97 @@ loadtest:
 	@echo "==> results landed in deploy/test/loadtest/results/"
 	@if [ -f deploy/test/loadtest/results/summary.txt ]; then cat deploy/test/loadtest/results/summary.txt; fi

+# Phase 8 SCALE-H2 — scale-tier load tests. Profile-gated in the
+# loadtest compose so the default `make loadtest` stays fast and
+# focused on the per-PR regression scope (API tier + connector tier).
+#
+# loadtest-scale-bulk runs the 10K-cert bulk-renew scenario.
+# loadtest-scale-acme runs the 200-VU ACME directory/nonce/ARI burst.
+# loadtest-scale-agent runs the 5K-agent heartbeat storm.
+#
+# Each target uses --exit-code-from <scenario-driver> so a threshold
+# breach surfaces as a non-zero make exit. The scale-seed init runs
+# once per invocation (idempotent via ON CONFLICT) so re-running a
+# target against the same compose stack is fine.
+loadtest-scale-bulk:
+	@echo "==> Phase 8 SCALE-H2: bulk-renewal scenario (10K cert fixture, ~6m)"
+	@cd deploy/test/loadtest && docker compose --profile scale up --build \
+	  --abort-on-container-exit --exit-code-from k6-scale-bulk
+	@echo ""
+	@echo "==> results: deploy/test/loadtest/results/summary-bulk-renewal.{json,txt}"
+	@if [ -f deploy/test/loadtest/results/summary-bulk-renewal.txt ]; then \
+	  cat deploy/test/loadtest/results/summary-bulk-renewal.txt; fi
+
+loadtest-scale-acme:
+	@echo "==> Phase 8 SCALE-H2: ACME enrollment burst (200 VU, ~6m)"
+	@cd deploy/test/loadtest && docker compose --profile scale up --build \
+	  --abort-on-container-exit --exit-code-from k6-scale-acme
+	@echo ""
+	@echo "==> results: deploy/test/loadtest/results/summary-acme-burst.{json,txt}"
+	@if [ -f deploy/test/loadtest/results/summary-acme-burst.txt ]; then \
+	  cat deploy/test/loadtest/results/summary-acme-burst.txt; fi
+
+loadtest-scale-agent:
+	@echo "==> Phase 8 SCALE-H2: agent heartbeat storm (5K agent fixture, ~6m)"
+	@cd deploy/test/loadtest && docker compose --profile scale up --build \
+	  --abort-on-container-exit --exit-code-from k6-scale-agent
+	@echo ""
+	@echo "==> results: deploy/test/loadtest/results/summary-agent-storm.{json,txt}"
+	@if [ -f deploy/test/loadtest/results/summary-agent-storm.txt ]; then \
+	  cat deploy/test/loadtest/results/summary-agent-storm.txt; fi
+
+# All three Phase 8 scenarios serially. Use the matrix in
+# .github/workflows/loadtest.yml for parallel CI runs.
+loadtest-scale: loadtest-scale-bulk loadtest-scale-acme loadtest-scale-agent
+
+# Auth Bundle 2 Phase 10 — Keycloak end-to-end OIDC integration test.
+# Boots a Keycloak container via testcontainers-go (quay.io/keycloak/keycloak:25.0),
+# imports a canned realm with two groups + two users, and drives the
+# full OIDC flow against the certctl service: discovery + JWKS,
+# auth-code login, group-claim parsing, group-role mapping, session
+# mint, and JWKS rotation.
+#
+# Build-tag-gated under `integration` so `make verify` (which runs
+# go test -short) NEVER pulls in the 60-90s Keycloak boot. Requires a
+# local Docker daemon. Skips cleanly with t.Skip() when -short is set.
+keycloak-integration-test:
+	@echo "==> running Keycloak OIDC integration test (requires Docker)"
+	@go test -tags=integration -count=1 -timeout=10m \
+	  ./internal/auth/oidc/...
+
+# Auth Bundle 2 Phase 10 — optional Okta smoke test. Gated behind TWO
+# build tags (integration + okta_smoke) so it only runs when invoked
+# manually against the operator's own Okta dev tenant. Requires the
+# OKTA_ISSUER + OKTA_CLIENT_ID + OKTA_CLIENT_SECRET env vars; the test
+# t.Skip's with a clear message when any are missing. Documented in
+# internal/auth/oidc/integration_okta_smoke_test.go.
+okta-smoke-test:
+	@echo "==> running Okta smoke test (requires OKTA_ISSUER / _CLIENT_ID / _CLIENT_SECRET env vars)"
+	@go test -tags='integration okta_smoke' -count=1 -timeout=2m \
+	  ./internal/auth/oidc/...
+
+# Auth Bundle 2 Phase 14 — auth performance benchmarks. Three
+# default-tag benchmarks (session steady-state + session cold-process
+# + oidc steady-state) producing p50/p95/p99/max numbers per the
+# auth-benchmarks.md operator-doc table.
+benchmark-auth:
+	@echo "==> running auth performance benchmarks (session + oidc steady-state)"
+	@go test -bench='BenchmarkSession_|BenchmarkOIDC_SteadyState' -benchmem \
+	  -benchtime=2000x -run='^$$' \
+	  ./internal/auth/session/ ./internal/auth/oidc/
+
+# Auth Bundle 2 Phase 14 — OIDC cold-cache benchmark against a live
+# Keycloak container (requires Docker). Build-tag-gated so the
+# default-tag benchmarks above never pull in the 60-90s container
+# boot. Runs the integration test FIRST to populate the
+# sharedKeycloak fixture, then runs the benchmark.
+benchmark-auth-coldcache:
+	@echo "==> running OIDC cold-cache benchmark against live Keycloak (requires Docker)"
+	@go test -tags integration -count=1 -timeout=10m \
+	  -run TestKeycloakIntegration_RefreshKeysFetchesDiscoveryAndJWKS \
+	  -bench BenchmarkOIDC_ColdCache -benchmem -benchtime=10x \
+	  ./internal/auth/oidc/
+
 # Phase 5 — kind-driven cert-manager integration test. Requires
 # `kind`, `kubectl`, `helm`, and a local Docker daemon. Sets
 # KIND_AVAILABLE=1 so the test runs (it skips cleanly when unset, which
@@ -262,10 +338,23 @@ frontend-build:
 	cd web && npm ci && npx vite build
 	@echo "Frontend build complete"

-# QA Suite Stats — Bundle P / Strengthening #8.
-# Single source-of-truth for every count claim in docs/qa-test-guide.md +
-# docs/testing-guide.md. The Strengthening #6 CI drift guards consume the
-# same numbers, eliminating the doc-drift class structurally.
+# Phase 3 TEST-M3 closure (2026-05-13): browser-driven E2E smoke
+# target. The full 15-flow suite from web/src/__tests__/e2e/README.md
+# ships in frontend-design-audit Phase 8; this target is the harness
+# wiring that lets `make e2e-test` work today.
+#
+# First-time setup: `cd web && npm install && npx playwright install --with-deps chromium`.
+# The webServer block in web/playwright.config.ts boots `npm run dev`
+# automatically; no separate `make docker-up` needed.
+e2e-test:
+	@echo "Running Playwright E2E (smoke + any *.spec.ts under web/src/__tests__/e2e/)..."
+	cd web && npx playwright test
+	@echo "E2E run complete"
+
+# qa-stats: snapshot of the test-suite size at the current commit.
+# Backend Go tests + subtests + fuzz targets + skipped sites, plus the
+# seed-data counts in migrations/seed_demo.sql. Useful before a release
+# to spot-check that no whole layer dropped off.
 qa-stats:
 	@echo "=== certctl QA Suite Stats ==="
 	@echo "Date: $$(date +%Y-%m-%d)"
@@ -278,9 +367,8 @@ qa-stats:
 	@echo "Fuzz targets: $$(grep -rE 'func Fuzz[A-Z]' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
 	@echo "t.Skip sites: $$(grep -rE 't\.Skip(Now|f)?\(' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
 	@echo "qa_test.go Part_ subtests: $$(grep -cE 't\.Run\(\"Part[0-9]+_' deploy/test/qa_test.go 2>/dev/null || echo 0)"
-	@echo "testing-guide.md Parts: $$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md 2>/dev/null || echo 0)"
 	@echo "Seed unique mc-* IDs:  $$(grep -oE "mc-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
-	@echo "Seed unique ag-* IDs:  $$(grep -oE "ag-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (incl. agent_groups; agents-table count is 12)"
+	@echo "Seed unique ag-* IDs:  $$(grep -oE "ag-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (incl. agent_groups; agents-table count is 13 incl. agent-demo-1 + 3 cloud sentinels + server-scanner)"
 	@echo "Seed unique iss-* IDs: $$(grep -oE "iss-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (issuers table count is 13)"
 	@echo "Seed unique tgt-* IDs: $$(grep -oE "tgt-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
 	@echo "Seed unique nst-* IDs: $$(grep -oE "nst-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
@@ -0,0 +1,18 @@
+certctl
+Copyright 2026 certctl LLC.
+
+This product is distributed under the Business Source License 1.1.
+See LICENSE at the repository root for the full license text and
+the Additional Use Grant carve-outs.
+
+This product links third-party Go modules and JavaScript packages
+whose own license terms apply to those components. The full
+inventory of third-party dependencies and their respective licenses
+is enumerated in THIRD_PARTY_NOTICES.md at the repository root.
+
+Effective March 14, 2076, the BSL 1.1 license converts to the
+Apache License 2.0 per the Change Date in LICENSE.
+
+For inquiries about commercial licensing terms outside the
+Additional Use Grant — including the Commercial Certificate
+Service restriction — contact certctl@proton.me.
@@ -9,138 +9,36 @@
 [![GitHub Release](https://img.shields.io/github/v/release/certctl-io/certctl)](https://github.com/certctl-io/certctl/releases)
 [![GitHub Stars](https://img.shields.io/github/stars/certctl-io/certctl?style=flat&logo=github)](https://github.com/certctl-io/certctl/stargazers)

-TLS certificate lifespans are shrinking fast. The CA/Browser Forum passed [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) unanimously in April 2025, setting a phased reduction: **200 days** by March 2026, **100 days** by March 2027, and **47 days** by March 2029. Organizations managing dozens or hundreds of certificates can no longer rely on spreadsheets, calendar reminders, or manual renewal workflows. The math doesn't work — at 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever.
+certctl is a self-hosted platform that automates the entire TLS certificate lifecycle, from issuance through renewal to deployment, with zero human intervention. It ships twelve native CA connectors plus an OpenSSL / shell-script adapter for custom CAs, and fifteen native deployment-target connectors plus a proxy-agent pattern for network appliances and agentless targets. Private keys stay on your infrastructure where they belong. It is free, source-available under BSL 1.1, and covers the same lifecycle that enterprise platforms charge $100K+/year for.

-certctl is a self-hosted platform that automates the entire certificate lifecycle — from issuance through renewal to deployment — with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. It's free, self-hosted, and covers the same lifecycle that enterprise platforms charge $100K+/year for.
+The CA/Browser Forum's [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) caps public TLS certificate validity at **200 days by March 2026**, **100 days by March 2027**, and **47 days by March 2029**. At 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever. Manual workflows stop being a choice.

-```mermaid
-gantt
-    title TLS Certificate Maximum Lifespan — CA/Browser Forum Ballot SC-081v3
-    dateFormat YYYY-MM-DD
-    axisFormat
-    todayMarker off
-    section 2015
-        5 years (1825 days)    :done, 2020-01-01, 1825d
-    section 2018
-        825 days               :done, 2020-01-01, 825d
-    section 2020
-        398 days               :active, 2020-01-01, 398d
-    section 2026
-        200 days               :crit, 2020-01-01, 200d
-    section 2027
-        100 days               :crit, 2020-01-01, 100d
-    section 2029
-        47 days                :crit, 2020-01-01, 47d
-```
+> **Status: Early-access — actively looking for design partners.**

-> **Actively maintained — shipping weekly.** Found something? [Open a GitHub issue](https://github.com/certctl-io/certctl/issues) — issues get triaged same-day. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.
+> The certificate lifecycle core is production-quality today: Local CA, ACME, agent deployment, audit, [role-based access control](docs/operator/rbac.md) with auditor split and four-eyes approval. v2.1.0 adds federated identity on top — [OIDC SSO](docs/operator/oidc-runbooks/index.md), server-side sessions, back-channel logout, and a break-glass admin path for SSO-outage recovery.

-**Ready to try it?** Jump to the [Quick Start](#quick-start) — you'll have a running dashboard in under 5 minutes.
+> If your team runs PKI infrastructure that could use real automation, we'd love to have you on certctl. Lab and dev deployments are great. Production is welcome too — especially on the federated-identity surface, where real-world IdP shapes are exactly the exposure we can't manufacture in CI. Battle-testing certctl in your environment is genuinely valuable to us.
+
+> [File issues](https://github.com/certctl-io/certctl/issues) liberally. Every IdP quirk, every connector edge, every doc gap you hit — that's how the platform earns the right to drop the "early-access" label. The faster the loop, the faster everyone benefits.
+
+> **Actively maintained, shipping weekly.** [Open an issue](https://github.com/certctl-io/certctl/issues) if something breaks. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.
+
+**Ready to try it?** Jump to the [Quick Start](#quick-start). For the marketing site, see [certctl.io](https://certctl.io).

 ## Documentation

-| Guide | Description |
-|-------|-------------|
-| [Why certctl?](docs/why-certctl.md) | How certctl compares to ACME clients, agent-based SaaS, and enterprise platforms |
-| [Concepts](docs/concepts.md) | TLS certificates explained from scratch — for beginners who know nothing about certs |
-| [Quick Start](docs/quickstart.md) | 5-minute setup — dashboard, API, CLI, discovery, stakeholder demo flow |
-| [Docker Compose Environments](deploy/ENVIRONMENTS.md) | Service-by-service walkthrough of all 4 compose files, env var reference |
-| [Deployment Examples](docs/examples.md) | 5 turnkey scenarios (ACME+NGINX, wildcard DNS-01, private CA, step-ca, multi-issuer) with migration guides |
-| [Advanced Demo](docs/demo-advanced.md) | Issue a certificate end-to-end with technical deep-dives |
-| [Architecture](docs/architecture.md) | System design, data flow diagrams, security model |
-| [Feature Inventory](docs/features.md) | Complete reference of all capabilities, API endpoints, and configuration |
-| [Connector Reference](docs/connectors.md) | Configuration for all issuer, target, and notifier connectors |
-| [MCP Server](docs/mcp.md) | AI integration via Model Context Protocol — setup, available tools, examples |
-| [OpenAPI 3.1 Spec](docs/openapi.md) | API reference guide with endpoint overview ([raw spec](api/openapi.yaml)) |
-| [Compliance Mapping](docs/compliance.md) | SOC 2 Type II, PCI-DSS 4.0, NIST SP 800-57 alignment guides |
-| [Migrate from certbot](docs/migrate-from-certbot.md) | Step-by-step migration from certbot cron jobs to certctl |
-| [Migrate from acme.sh](docs/migrate-from-acmesh.md) | Migration guide for acme.sh users, DNS hook compatibility |
-| [certctl for cert-manager users](docs/certctl-for-cert-manager-users.md) | How certctl complements cert-manager for mixed infrastructure |
-| [Test Environment](docs/test-env.md) | Docker Compose test environment with real CA backends |
-| [Testing Guide](docs/testing-guide.md) | Comprehensive test procedures, smoke tests, and release sign-off checklist |
+The full audience-organized index lives at [`docs/README.md`](docs/README.md). Top-level entry points:

-## Supported Integrations
+| Audience | Start here |
+|---|---|
+| New to certctl | [Concepts](docs/getting-started/concepts.md) → [Quickstart](docs/getting-started/quickstart.md) → [Examples](docs/getting-started/examples.md) |
+| Production operator | [Architecture](docs/reference/architecture.md) → [Security posture](docs/operator/security.md) → [Disaster recovery runbook](docs/operator/runbooks/disaster-recovery.md) |
+| PKI engineer | [ACME server](docs/reference/protocols/acme-server.md) → [SCEP server](docs/reference/protocols/scep-server.md) → [EST server](docs/reference/protocols/est.md) → [CA hierarchy](docs/reference/intermediate-ca-hierarchy.md) |
+| Migrating from another tool | [from certbot](docs/migration/from-certbot.md) / [from acme.sh](docs/migration/from-acmesh.md) / [cert-manager coexistence](docs/migration/cert-manager-coexistence.md) |

-### Certificate Issuers
+For the connector reference (12 issuers, 15 targets, 6 notifiers) see [`docs/reference/connectors/index.md`](docs/reference/connectors/index.md).

-| Issuer | Type | Notes |
-|--------|------|-------|
-| Local CA (self-signed + sub-CA) | `GenericCA` | Sub-CA mode chains to enterprise root (ADCS, etc.) |
-| ACME v2 (Let's Encrypt, ZeroSSL, etc.) | `ACME` | HTTP-01, DNS-01, DNS-PERSIST-01 challenges. EAB auto-fetch from ZeroSSL. Profile selection (`tlsserver`, `shortlived`). |
-| step-ca (Smallstep) | `StepCA` | JWK provisioner auth, issuance + renewal + revocation |
-| OpenSSL / Custom CA | `OpenSSL` | Shell script adapter — any CA with a CLI |
-| HashiCorp Vault PKI | `VaultPKI` | Token auth, synchronous issuance, CRL/OCSP delegated to Vault |
-| DigiCert CertCentral | `DigiCert` | Async order model, OV/EV support, PEM bundle parsing |
-| Sectigo SCM | `Sectigo` | 3-header auth, DV/OV/EV, collect-not-ready graceful handling |
-| Google Cloud CAS | `GoogleCAS` | OAuth2 service account, synchronous issuance, CA pool selection |
-| AWS ACM Private CA | `AWSACMPCA` | Synchronous issuance, configurable signing algorithm/template ARN |
-| Entrust Certificate Services | `Entrust` | mTLS client certificate auth, synchronous/approval-pending issuance |
-| GlobalSign Atlas HVCA | `GlobalSign` | mTLS + API key/secret dual auth, serial-based tracking |
-| EJBCA (Keyfactor) | `EJBCA` | Dual auth (mTLS or OAuth2), self-hosted open-source CA |
-
-**Note:** ADCS integration is handled via the Local CA's sub-CA mode — certctl operates as a subordinate CA with its signing certificate issued by ADCS. Any CA with a shell-accessible signing interface can be integrated via the OpenSSL/Custom CA connector.
-
-### Deployment Targets
-
-| Target | Type | Notes |
-|--------|------|-------|
-| NGINX | `NGINX` | Atomic write + `nginx -t` validate + `nginx -s reload` + post-deploy TLS verify + rollback (deploy-hardening I) |
-| Apache httpd | `Apache` | Atomic write + `apachectl configtest` + graceful reload + post-deploy TLS verify + rollback |
-| HAProxy | `HAProxy` | Combined PEM atomic write + `haproxy -c -f` validate + `systemctl reload` + post-deploy TLS verify + rollback |
-| Traefik | `Traefik` | Atomic write + post-deploy TLS verify + rollback (file watcher auto-reloads) |
-| Caddy | `Caddy` | Atomic write (file mode) or `POST /load` (api mode) + admin API ValidateOnly probe |
-| Envoy | `Envoy` | Atomic write + SDS file watcher auto-reload |
-| Postfix | `Postfix` | Atomic write + `postfix check` + `postfix reload` + post-deploy TLS verify + rollback |
-| Dovecot | `Dovecot` | Atomic write + `doveconf -n` + `doveadm reload` + post-deploy TLS verify + rollback |
-| Microsoft IIS | `IIS` | Local PowerShell or remote WinRM, PEM→PFX, SNI support, explicit pre-deploy backup + post-rollback re-import |
-| F5 BIG-IP | `F5` | iControl REST via proxy agent, transaction-based atomic updates + post-deploy TLS verify on Virtual Server |
-| SSH (Agentless) | `SSH` | SFTP cert/key deployment + pre-deploy SCP backup + tls.Dial post-verify |
-| Windows Certificate Store | `WinCertStore` | PowerShell Import-PfxCertificate + Get-ChildItem snapshot for rollback |
-| Java Keystore | `JavaKeystore` | PEM→PKCS#12→keytool pipeline + keytool snapshot for rollback |
-| Kubernetes Secrets | `KubernetesSecrets` | `kubernetes.io/tls` Secrets, atomic API + SHA-256 verify + kubelet sync poll |
-
-**Deploy-hardening I** (post-2026-04-30 master bundle): every connector now goes through `internal/deploy.Apply` for atomic-write + ownership-preservation + SHA-256 idempotency + per-target-type Prometheus counters (`certctl_deploy_*_total`). See [`docs/deployment-atomicity.md`](docs/deployment-atomicity.md) for the operator guide.
-
-### Enrollment Protocols
-
-| Protocol | Standard | Use Case |
-|----------|----------|----------|
-| **EST (production-grade)** | RFC 7030 + RFC 9266 channel binding | Native EST server hardened for enterprise WiFi/802.1X, IoT bootstrap, and corporate device enrollment (post-2026-04-29 hardening master bundle). All six RFC 7030 endpoints — `cacerts` / `simpleenroll` / `simplereenroll` / `csrattrs` (profile-driven) / `serverkeygen` (CMS EnvelopedData wire format). Multi-profile dispatch (`/.well-known/est/<pathID>/`). Per-profile auth modes: mTLS sibling route at `/.well-known/est-mtls/<pathID>/`, HTTP Basic enrollment-password (constant-time compare + per-source-IP failed-auth limiter), RFC 9266 `tls-exporter` channel binding (TLS 1.3, opt-in per profile). Per-(CN, sourceIP) sliding-window rate limit. EST-source-scoped bulk revoke (`POST /api/v1/est/certificates/bulk-revoke`, M-008 admin-gated). Tabbed admin GUI at `/est` (Profiles / Recent Activity / Trust Bundle). `SIGHUP`-equivalent trust-bundle reload. libest reference-client interop tested in CI (`deploy/test/libest/Dockerfile` + `deploy/test/est_e2e_test.go`). Typed audit-action codes per failure dimension (`est_simple_enroll_success`/`_failed`, `est_auth_failed_basic`/`_mtls`/`_channel_binding`, `est_rate_limited`, `est_csr_policy_violation`, `est_bulk_revoke`, `est_trust_anchor_reloaded`, etc. — full set in `internal/service/est_audit_actions.go`). CLI + matching MCP tool family (rebuild count via `grep -cE '"est_' internal/mcp/tools_est.go`). See [`docs/est.md`](docs/est.md) for the operator guide — WiFi/802.1X + FreeRADIUS recipe, IoT bootstrap, troubleshooting matrix per audit-action code. |
-| SCEP (Simple Certificate Enrollment Protocol) | RFC 8894 | MDM platforms (Jamf, Intune), network devices, ChromeOS. Full RFC 8894 wire format: EnvelopedData decryption, signerInfo POPO verification, CertRep PKIMessage builder; PKCSReq + RenewalReq + GetCertInitial messageType dispatch; multi-profile dispatch (`/scep/<pathID>`); per-profile RA cert + key. Lightweight raw-CSR clients keep working via the legacy MVP fall-through path. |
-| **Microsoft Intune SCEP fleet (drop-in NDES replacement)** | RFC 8894 + Intune Connector signed-challenge dispatcher | Per-profile Intune dispatcher validates the Connector's signed challenge against an operator-supplied trust anchor; binds device claim to CSR (set-equality on CN + SAN-DNS/RFC822/UPN); replay cache + per-device rate limit; `SIGHUP`-reloadable trust pool; admin GUI **SCEP Administration** page at `/scep` (Profiles tab with per-profile RA cert expiry + mTLS status, Intune Monitoring tab with per-status counters + reload, Recent Activity tab with full SCEP audit log filter). See [`docs/scep-intune.md`](docs/scep-intune.md) for the migration playbook + Microsoft support statement. |
-| ACME v2 | RFC 8555 | Public CA automated issuance (Let's Encrypt, ZeroSSL) |
-| ACME ARI (Renewal Information) | RFC 9773 | CA-directed renewal timing — the CA tells you when to renew |
-
-### Standards & Revocation
-
-| Capability | Standard | Notes |
-|------------|----------|-------|
-| DER-encoded X.509 CRL | RFC 5280 + RFC 7232 caching | Per-issuer, signed by issuing CA, 24h validity. Pre-generated by the scheduler (`CERTCTL_CRL_GENERATION_INTERVAL`, default 1h) and cached in `crl_cache` so HTTP fetches do not rebuild per request. **Production hardening II:** weak-form `ETag` (W/"<sha256-prefix>") + `Cache-Control: public, max-age=3600, must-revalidate` + `If-None-Match` HTTP 304 short-circuit on `GET /.well-known/pki/crl/{issuer_id}` — CDNs and reverse proxies serve repeated fetches from edge cache. |
-| CRL DistributionPoints auto-injection | RFC 5280 §4.2.1.13 | **Production hardening II.** Local issuer config field `CRLDistributionPointURLs []string` — when set, every issued cert carries the `id-ce-cRLDistributionPoints` extension pointing at certctl's own CRL endpoint. Refusing to silently inject an empty CDP is deliberate (silent-empty fails relying-party validation worse than no CDP). |
-| Embedded OCSP responder | RFC 6960 + §4.4.1 nonce echo | GET + POST forms (`POST /.well-known/pki/ocsp/{issuer_id}` per §A.1.1). Signed by a per-issuer dedicated OCSP responder cert (RFC 6960 §2.6) carrying `id-pkix-ocsp-nocheck` (§4.2.2.2.1) — the CA private key is never used directly for OCSP signing. Responder cert auto-rotates within 7d of expiry. **Production hardening II:** RFC 6960 §4.4.1 nonce extension echoed in the response (defends against replay attacks); empty/oversized (>32 bytes per CA/B Forum BR §4.10.2) nonces produce the canonical "unauthorized" status (status 6) — never echo malformed bytes. |
-| OCSP pre-signed response cache | — | **Production hardening II.** Per-`(issuer, serial)` pre-signed responses in the new `ocsp_response_cache` table; read-through facade in `CAOperationsSvc.GetOCSPResponseWithNonce` consults the cache for nil-nonce requests. **Load-bearing security wire:** `RevocationSvc.RevokeCertificateWithActor` calls `InvalidateOnRevoke` after a successful revoke so the next OCSP fetch returns the revoked status — no stale-good window. |
-| Per-endpoint rate limits | — | **Production hardening II.** OCSP per-source-IP cap at `CERTCTL_OCSP_RATE_LIMIT_PER_IP_MIN` (default 1000/min, zero disables); cert-export per-actor cap at `CERTCTL_CERT_EXPORT_RATE_LIMIT_PER_ACTOR_HR` (default 50/hr, zero disables). OCSP rate-limit trip returns the canonical "unauthorized" OCSP blob plus `Retry-After: 60`; cert-export trip returns HTTP 429. The OCSP limiter does NOT honor `X-Forwarded-For` (publicly reachable; spoofed headers would bypass the cap). |
-| Cert-export typed audit | — | **Production hardening II.** Typed action constants (`cert_export_pem` / `cert_export_pkcs12` / `cert_export_pem_with_key` reserved / `cert_export_failed`) emitted via split-emit alongside the legacy bare codes for back-compat. Detail map carries `has_private_key` (always false in V2) and `cipher` (`AES-256-CBC-PBE2-SHA256` — pinned so a future dependency upgrade that changes the encoder default surfaces in audit drift review). |
-| Prometheus per-area metrics | OpenMetrics | `GET /api/v1/metrics/prometheus` — production hardening II surfaces `certctl_ocsp_counter_total{label="..."}` per-event series (`request_get`/`_post`, `request_success`/`_invalid`, `nonce_echoed`/`_malformed`, `rate_limited`, `signing_failed`, etc.) wired from the shared counter table that ticks in the cache hot path. CRL / cert-export / EST / SCEP / Intune per-area counters plug in via the same `SetXxxCounters` setter pattern as follow-up commits. |
-| Disaster-recovery runbook | — | **Production hardening II.** [`docs/disaster-recovery.md`](docs/disaster-recovery.md) — 8-section operator-grade runbook: CRL cache recovery, OCSP responder cert recovery, OCSP response cache recovery, CA private-key rotation 9-step playbook, Postgres restore + operator-managed-artifacts list, trust-bundle reload semantics, printable DR checklist. The SOC 2 / PCI procurement-team deliverable. |
-| S/MIME certificates | RFC 8551 | Email protection EKU, adaptive KeyUsage flags (`DigitalSignature \| ContentCommitment` instead of the TLS default `DigitalSignature \| KeyEncipherment`). |
-| Certificate export | — | PEM (JSON/file) and PKCS#12 (cert-only trust-store mode via `pkcs12.Modern` — AES-256-CBC PBE2 with SHA-256 KDF). Key-bearing PKCS#12 export deferred — V2 export is cert-only by design (private keys live on agents, never touch the control plane). |
-| ACME DNS-PERSIST-01 | IETF draft | Standing validation record, no per-renewal DNS updates |
-
-### Notifiers
-
-| Notifier | Type |
-|----------|------|
-| Email (SMTP) | `Email` |
-| Webhooks | `Webhook` |
-| Slack | `Slack` |
-| Microsoft Teams | `Teams` |
-| PagerDuty | `PagerDuty` |
-| OpsGenie | `OpsGenie` |
-
-All connectors are pluggable — build your own by implementing the [connector interface](docs/connectors.md).
-
-### Screenshots
+## Screenshots

 <table>
 <tr>
@@ -148,7 +46,7 @@ All connectors are pluggable — build your own by implementing the [connector i
 <td><a href="docs/screenshots/v2-certificates.png"><img src="docs/screenshots/v2-certificates.png" width="400" alt="Certificates"></a><br><b>Certificates</b><br><sub>Inventory with bulk ops, status filters, owner/team columns</sub></td>
 </tr>
 <tr>
-<td><a href="docs/screenshots/v2-issuers.png"><img src="docs/screenshots/v2-issuers.png" width="400" alt="Issuers"></a><br><b>Issuers</b><br><sub>Catalog with 10 CA types, GUI config, test connection</sub></td>
+<td><a href="docs/screenshots/v2-issuers.png"><img src="docs/screenshots/v2-issuers.png" width="400" alt="Issuers"></a><br><b>Issuers</b><br><sub>Catalog with 12 CA types, GUI config, test connection</sub></td>
 <td><a href="docs/screenshots/v2-jobs.png"><img src="docs/screenshots/v2-jobs.png" width="400" alt="Jobs"></a><br><b>Jobs</b><br><sub>Issuance, renewal, deployment queue with approval workflow</sub></td>
 </tr>
 </table>
@@ -157,165 +55,101 @@ All connectors are pluggable — build your own by implementing the [connector i

 ## Why certctl

-Certificate lifecycle tooling falls into two camps: enterprise platforms (Venafi, Keyfactor) that cost six figures and take months to deploy, or single-purpose tools (certbot, cert-manager) that handle one slice of the problem. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, and target-agnostic. If you're running certbot cron jobs, manually renewing certs, or stitching together scripts across mixed infrastructure, certctl replaces all of that.
+Certificate lifecycle tooling has historically split into two camps. Enterprise platforms charge six-figure annual licenses, take months to deploy, and bill professional-services hours at $250 to $400 per hour to write integration code that should ship with the product. Single-purpose tools handle one slice of the problem and leave the operator to glue the rest together. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, target-agnostic. If you're stitching together cron jobs across a fleet, manually renewing certs, or writing custom integration scripts to bridge a commercial CLM platform to your actual infrastructure, certctl replaces all of that.

-Built for **platform engineering and DevOps teams** managing 10–500+ certificates, **security and compliance teams** who need audit trails and policy enforcement for SOC 2, PCI-DSS 4.0, or NIST SP 800-57 ([compliance mapping included](docs/compliance.md)), and **small teams without enterprise budgets** who need Venafi-grade automation for a 50-server environment. For a detailed comparison, see [Why certctl?](docs/why-certctl.md)
+Built for **platform engineering and DevOps teams** managing 10 to 500+ certificates, **security teams** who need audit trails and policy enforcement, and **small teams without enterprise budgets** who need enterprise-grade automation for a 50-server environment. For the detailed positioning argument and when not to use certctl, see [Why certctl?](docs/getting-started/why-certctl.md).

-**Architecture.** Go 1.25 control plane with handler→service→repository layering, PostgreSQL 16 backend (21 tables), and a pull-only deployment model — the server never initiates outbound connections. Agents poll for work. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). Background scheduler runs 7 loops: renewal with ARI integration (1h), job processing (30s), agent health (2m), notifications (1m), short-lived cert expiry (30s), network scanning (6h), certificate digest (24h). See [Architecture Guide](docs/architecture.md) for full system diagrams.
+## What it does

-**Security-first.** Agents generate ECDSA P-256 keys locally — private keys never touch the control plane. API key auth enforced by default with SHA-256 hashing and constant-time comparison. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Atomic idempotency guards on scheduler loops. Issuer and target credentials encrypted at rest with AES-256-GCM. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, 11 linters, and vulnerability scanning on every commit.
+certctl handles the full certificate lifecycle in one self-hosted control plane:

-**Key design decisions.** TEXT primary keys — human-readable prefixed IDs (`mc-api-prod`, `t-platform`, `o-alice`) so you can identify resources at a glance in logs and queries. Idempotent migrations (`IF NOT EXISTS`, `ON CONFLICT DO NOTHING`) safe for repeated execution. Dynamic configuration via GUI with AES-256-GCM encrypted credential storage and env var backward compatibility. Handlers define their own service interfaces for clean dependency inversion.
+- **Issue and renew** from any CA. Let's Encrypt and any ACME provider, an embedded ACME server you can point cert-manager / certbot / lego at directly, a built-in local CA with sub-CA mode (chains under your enterprise root, such as ADCS), step-ca, Vault PKI, EJBCA, AWS ACM PCA, Google CAS, DigiCert, Sectigo, GlobalSign, Entrust, plus an OpenSSL / shell-script adapter for anything custom. Twelve native issuer connectors. See the [connector reference](docs/reference/connectors/index.md).
+- **Deploy automatically** to NGINX, Apache, HAProxy, Caddy, Traefik, Envoy, IIS, Windows Cert Store, Java keystore, Kubernetes Secrets, AWS ACM, Azure Key Vault, SSH known-hosts, Postfix + Dovecot, F5 BIG-IP. Fifteen native target connectors. File-based targets share one deployment primitive: atomic write, SHA-256 idempotency check, on-failure rollback, and per-target Prometheus counters (the `deploy.Apply` path covers 12 of 13 file-based connectors). Cloud / API targets (AWS ACM, Azure Key Vault) use vendor-SDK semantics rather than the file primitive; F5 uses iControl REST transactions; Kubernetes Secrets is preview. For the per-target guarantee matrix, see [`docs/reference/deployment-model.md`](docs/reference/deployment-model.md). The reload / validate commands that operators configure for shell-using targets (NGINX, Apache, HAProxy, Postfix, JavaKeystore, SSH) are validated both server-side and agent-side against shell-metacharacter injection before execution (see [`internal/connector/target/configcheck`](internal/connector/target/configcheck)).
+- **Run as an ACME server** so existing client tooling plugs in directly. RFC 8555 + RFC 9773 ARI, two per-profile auth modes (public-trust-style validation or trust_authenticated for internal PKI), doubly-signed key rollover, revoke-cert on both kid path and jwk path, per-account rate limiting. Cert-manager / certbot / lego all work pointed at it. See [`docs/reference/protocols/acme-server.md`](docs/reference/protocols/acme-server.md).
+- **Run as a SCEP server** for Microsoft Intune-managed phones, ChromeOS devices, network appliances. RFC 8894 native with full PKIMessage wire format, native Intune challenge dispatch with replay protection, per-profile dispatch with separate RA cert per profile. See [`docs/reference/protocols/scep-server.md`](docs/reference/protocols/scep-server.md).
+- **Run as an EST server** (RFC 7030) for HTTPS-based PKCS#10 enrollment. 802.1X / Wi-Fi authentication, IoT device enrollment, RFC 9266 channel binding. See [`docs/reference/protocols/est.md`](docs/reference/protocols/est.md).
+- **Manage multi-level CA hierarchies** with name constraints, path-length enforcement, and end-to-end RFC 5280 path validation. Root → intermediate → issuing chains, admin-gated CRUD, drain-first retirement. Patterns documented for 4-level boundary CAs, 3-level policy CAs with per-BU `PermittedDNSDomains`, and 2-level internal PKI. See [`docs/reference/intermediate-ca-hierarchy.md`](docs/reference/intermediate-ca-hierarchy.md).
+- **Gate high-stakes issuance** behind two-person-integrity approval. Flag a profile as `RequiresApproval`: the request lands in a queue, a non-requester approves, and the scheduler dispatches. Profile-edit changes on approval-tier profiles route through the same gate so the flip-flop bypass is closed. See [`docs/operator/approval-workflow.md`](docs/operator/approval-workflow.md).
+- **Authorize with role-based access control.** Seven default roles (admin, operator, viewer, agent, mcp, cli, auditor) over a fine-grained permission catalogue with global / per-profile / per-issuer scope. Auditor role is read-only on the audit trail (`audit.read` + `audit.export`, nothing else) so a regulator's key cannot read certificates or mutate config. Day-0 admin via a one-shot `CERTCTL_BOOTSTRAP_TOKEN` endpoint that closes itself the moment any admin lands. Privilege-escalation guard requires `auth.role.assign` to grant or revoke a role. See [`docs/operator/rbac.md`](docs/operator/rbac.md), [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md), and the v2.0.x → v2.1.0 [migration guide](docs/migration/api-keys-to-rbac.md).
+- **Sign in with OIDC SSO** against any standards-compliant identity provider. Per-IdP setup runbooks for Keycloak, Authentik, Okta, Auth0, Microsoft Entra ID, and Google Workspace. Group-claim → role mapping for automatic provisioning; client_secret encrypted at rest (AES-256-GCM); JWKS auto-refresh on `kid` miss; PKCE-S256 required; RFC 9700 §4.7.1 pre-login UA/IP binding; RFC 9207 `iss` URL-param check on callback. The server mints HMAC-signed session cookies with the `__Host-` prefix (browser-enforced subdomain-takeover defense), rotates CSRF tokens on every privileged write, and enforces idle + absolute expiry. [OpenID Connect Back-Channel Logout 1.0](docs/reference/auth-standards-implemented.md) revokes sessions on IdP-driven logout. Argon2id break-glass admin path for SSO-outage recovery: disabled by default, and 404-invisible to scanners when `CERTCTL_BREAKGLASS_ENABLED=false`. See [`docs/operator/oidc-runbooks/index.md`](docs/operator/oidc-runbooks/index.md) for the per-IdP onboarding guides and [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md) for enabling SSO on an existing deploy.
+- **Discover** existing certs across your fleet via filesystem scanning on agents, network TLS probing across CIDR ranges, and cloud secret manager imports (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager). Triage workflow for claim / dismiss / investigate.
+- **Revoke** with full RFC 5280 reason codes, DER CRL generation per issuer (scheduler-pre-generated and ETag-cached), and an embedded RFC 6960 OCSP responder with dedicated per-issuer responder certs. Single + bulk revocation. See [`docs/reference/protocols/crl-ocsp.md`](docs/reference/protocols/crl-ocsp.md).
+- **Alert** via Slack, Microsoft Teams, PagerDuty, OpsGenie, email, webhooks. Per-policy multi-channel routing matrix with severity tiers and fault-isolating per-channel dispatch. See [`docs/operator/runbooks/expiry-alerts.md`](docs/operator/runbooks/expiry-alerts.md).
+- **Drive the platform from natural language** via the bundled MCP (Model Context Protocol) server. The full REST API is exposed as MCP tools — ask your AI client "show me all expiring certificates", "revoke the VPN cert, key compromised", or "what agents are offline?" and it translates to API calls. Stateless stdio-transport binary at `cmd/mcp-server/`; same auth as the REST API; no extra attack surface. See [`docs/reference/mcp.md`](docs/reference/mcp.md).

-## What It Does
+## Architecture and security

-**Automated lifecycle.** Certificates renew and deploy themselves. The scheduler monitors expiration, issues through your CA, and deploys to targets — zero human intervention. ACME ARI (RFC 9773) lets the CA direct renewal timing. Ready for 47-day (SC-081v3) and 6-day (Let's Encrypt shortlived) certificate lifetimes.
+Go 1.25 control plane with handler → service → repository layering. PostgreSQL 16 backend with idempotent migrations. Pull-only deployment model — the server never initiates outbound connections. Agents poll for work and generate ECDSA P-256 keys locally so private keys never touch the control plane. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). See the [Architecture Guide](docs/reference/architecture.md) for full system diagrams.

-**Operational dashboard.** 26-page GUI covers the entire lifecycle: certificate inventory with bulk ops, deployment timeline with rollback, discovery triage, network scan management, agent fleet health, short-lived credential countdown, approval workflows, and observability metrics. Configure issuers and targets from the dashboard — no env var editing, no server restarts.
-
-**Private keys stay on your servers.** Agents generate ECDSA P-256 keys locally, submit only the CSR. The control plane never touches private keys. After deployment, agents probe the live TLS endpoint and compare SHA-256 fingerprints to confirm the right certificate is actually being served.
-
-**Discovery.** Agents scan filesystems for existing PEM/DER certificates. The network scanner probes TLS endpoints across CIDR ranges without agents. Cloud discovery finds certificates in AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager. Continuous TLS health monitoring tracks endpoint status (healthy/degraded/down/cert_mismatch) with configurable thresholds and historical probe data. All discovery modes feed into a unified triage workflow — claim, dismiss, or import what you find.
-
-**Policy engine.** Certificate profiles constrain key types, max TTL, and EKUs — with crypto policy enforcement that validates every CSR against profile rules before it reaches the issuer. MaxTTL caps are enforced per issuer connector. Approval workflows pause jobs for human review. Ownership tracking routes notifications to the right team. Agent groups match devices by OS, architecture, IP CIDR, and version.
-
-**Enrollment protocols.** EST server (RFC 7030) for device and WiFi enrollment. SCEP server (RFC 8894) for MDM platforms and network devices — full wire format (EnvelopedData decrypt + signerInfo POPO verify + CertRep PKIMessage builder), tested against ChromeOS-shape requests; multi-profile dispatch (`/scep/<pathID>`); RenewalReq + GetCertInitial messageType support; lightweight raw-CSR fallback for legacy clients. See [docs/legacy-est-scep.md](docs/legacy-est-scep.md) for the operator + device-integration guide. S/MIME issuance with email protection EKU.
-
-**Revocation.** Single and bulk revocation (by profile, owner, agent, or issuer). RFC 5280 reason codes. Production-grade revocation status surface for relying parties: DER-encoded X.509 CRL per issuer, scheduler-pre-generated and cached so HTTP fetches do not rebuild per request; embedded OCSP responder serving both GET and POST forms (RFC 6960 §A.1.1) with responses signed by a per-issuer dedicated OCSP responder cert (RFC 6960 §2.6, `id-pkix-ocsp-nocheck` per §4.2.2.2.1) — the CA private key is never used directly for OCSP signing. Both endpoints live unauthenticated under `/.well-known/pki/` per RFC 8615. Short-lived certs (TTL < 1 hour) are exempt — expiry is sufficient revocation. See [docs/crl-ocsp.md](docs/crl-ocsp.md) for the relying-party integration guide.
-
-**Audit and observability.** Immutable append-only audit trail records every lifecycle action, every API call, and every approval decision. Prometheus metrics endpoint. Scheduled certificate digest emails. Continuous endpoint health monitoring with state machine transitions and real-time alerts.
-
-**Notifications.** Slack, Teams, PagerDuty, OpsGenie, SMTP, webhooks. Routed by certificate owner. Daily digest emails with stats and expiring certs.
-
-**Multiple interfaces.** REST API (111 routes), CLI (12 commands), MCP server (80 tools for Claude, Cursor, Windsurf), Helm chart, web dashboard. Certificate export in PEM and PKCS#12.
-
-**First-run onboarding.** Wizard guides you through connecting a CA, deploying an agent, and issuing your first certificate. Or start with the pre-populated demo — 32 certificates, 10 issuers, 180 days of history.
-
-For the complete capability breakdown, see the [Feature Inventory](docs/features.md).
+Security: the control plane supports three authentication paths. These are API keys (SHA-256 hashed + constant-time compared), [OIDC SSO](docs/operator/oidc-runbooks/index.md) (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace), and Argon2id [break-glass admin](docs/operator/security.md) for SSO-outage recovery. A successful OIDC login mints an HMAC-signed server-side session with `__Host-` cookies, CSRF rotation on every privileged write, and [OpenID Connect Back-Channel Logout 1.0](docs/reference/auth-standards-implemented.md) for IdP-driven session revocation. Role-based authorization on every gated handler with global / per-profile / per-issuer scope. Auditor split keeps regulator-class actors strictly read-only on the audit trail. Day-0 admin via a one-shot bootstrap token; granting or revoking roles requires the dedicated `auth.role.assign` permission. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Issuer + target + OIDC client_secret credentials encrypted at rest with AES-256-GCM. HTTPS-only control plane with TLS 1.3 pinned and a fail-closed startup gate that refuses to boot if the TLS bundle is unusable. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, static analysis, and vulnerability scanning on every commit. See [`docs/operator/security.md`](docs/operator/security.md) for the full posture and [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md) for what's defended vs deferred.

 ## Quick Start

-### Docker Compose (Recommended)
+### Docker Compose (recommended)
+
+**Demo path — zero config, populated dashboard:**

 ```bash
 git clone https://github.com/certctl-io/certctl.git
 cd certctl
+./deploy/demo-up.sh -d --build
+```
+
+Wait ~30 seconds, then open **https://localhost:8443** in your browser. The `demo-up.sh` wrapper exports a fresh `CERTCTL_DEMO_MODE_ACK_TS=$(date +%s)` and forwards the remaining args to `docker compose -f docker-compose.yml -f docker-compose.demo.yml up`. The timestamp export is required by the Phase 2 SEC-H3 fail-closed guard in `internal/config/config.go::Validate` — demo deploys must re-ACK every 24h so a forgotten demo container never silently ends up serving production traffic with `auth-type=none`. The bare `docker compose ... up` command without the timestamp refuses to boot; the wrapper script is the supported entry point.
+
+The demo overlay flips the base into demo-mode auth (every request served as the synthetic admin actor `actor-demo-anon` — the server emits a prominent ⚠ DEMO MODE banner at boot reminding you this posture is for evaluation only) and seeds 180 days of realistic history across 13 issuers, 8 agents, managed + discovered certs, jobs, deploys, audit, and notification events. The `certctl-tls-init` init container self-signs an ECDSA-P256 cert on first boot — accept the browser warning for the demo, or feed the generated `ca.crt` to your client.
+
+**Production path — `.env` required, fail-closed on placeholders:**
+
+```bash
+cp .env.example deploy/.env       # or root .env if running outside compose
+"${EDITOR:-nano}" deploy/.env     # set POSTGRES_PASSWORD, CERTCTL_AUTH_SECRET,
+                                   # CERTCTL_API_KEY, CERTCTL_CONFIG_ENCRYPTION_KEY,
+                                   # CERTCTL_AGENT_ID — all via openssl rand
+                                   # (replace nano with your preferred editor)
 docker compose -f deploy/docker-compose.yml up -d --build
 ```

-Wait ~30 seconds, then open **https://localhost:8443** in your browser. (The shipped `docker-compose.yml` self-signs a cert via the `certctl-tls-init` init container on first boot — accept the browser warning for the demo, or feed the generated `ca.crt` to your client.) The onboarding wizard walks you through connecting a CA, deploying an agent, and issuing your first certificate.
-
-**Want a pre-populated demo instead?** Add the demo override to see 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history:
-
-```bash
-docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
-```
-
-The `deploy/` directory has four compose files: `docker-compose.yml` (base platform), `docker-compose.demo.yml` (demo data overlay), `docker-compose.dev.yml` (PgAdmin + debug logging), and `docker-compose.test.yml` (standalone integration tests with real CA backends). See the [Docker Compose Environments Guide](deploy/ENVIRONMENTS.md) for a service-by-service walkthrough, or the [Quick Start](docs/quickstart.md#docker-compose-environments) for a summary.
+The base compose alone (no demo overlay) ships production-shaped: default `auth-type=api-key`, default `keygen-mode=agent`, no demo seed, no demo-mode synthetic admin. The fail-closed startup guards in `internal/config/config.go::Validate` refuse to boot when any of the `change-me-...` placeholder credentials reach config outside of demo mode (Bundle 2 closure, 2026-05-12). The four compose files (`docker-compose.yml` base, `docker-compose.demo.yml` overlay, `docker-compose.dev.yml` for PgAdmin + debug logging, `docker-compose.test.yml` for integration tests) are documented at [`deploy/ENVIRONMENTS.md`](deploy/ENVIRONMENTS.md).

 ```bash
 curl --cacert $(docker compose -f deploy/docker-compose.yml exec -T certctl-server cat /etc/certctl/tls/ca.crt) https://localhost:8443/health
 # {"status":"healthy"}
 ```

-The control plane is HTTPS-only (TLS 1.3, no plaintext listener). See [`docs/tls.md`](docs/tls.md) for cert provisioning patterns and [`docs/upgrade-to-tls.md`](docs/upgrade-to-tls.md) if you're upgrading from a pre-v2.2 release.
+The control plane is HTTPS-only with TLS 1.3 pinned. See [`docs/operator/tls.md`](docs/operator/tls.md) for cert provisioning patterns.

-### Agent Install (One-Liner)
+### Agent install (one-liner)

 ```bash
 curl -sSL https://raw.githubusercontent.com/certctl-io/certctl/master/install-agent.sh | bash
 ```

-Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See [install-agent.sh](install-agent.sh) for details.
+Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See [install-agent.sh](install-agent.sh).

-### Helm Chart (Kubernetes)
+### Helm chart (Kubernetes)

 ```bash
+# Required: TLS (pick one), server API key, and Postgres password.
+# The chart fail-fasts at template time if any required value is missing.
 helm install certctl deploy/helm/certctl/ \
-  --set server.apiKey=your-api-key \
-  --set postgres.password=your-db-password
+  --set server.tls.existingSecret=<your-kubernetes.io/tls-secret-name> \
+  --set server.auth.apiKey=$(openssl rand -base64 32) \
+  --set postgresql.auth.password=$(openssl rand -base64 32)
 ```

-Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See [values.yaml](deploy/helm/certctl/values.yaml) for all configuration options.
+Production-ready chart with Server Deployment, PostgreSQL StatefulSet (or external Postgres), Agent DaemonSet, health probes, container-scope security hardening (read-only rootfs, drop-all capabilities, non-root UID), optional PodDisruptionBudget, NetworkPolicy, Prometheus ServiceMonitor, and Ingress. See [values.yaml](deploy/helm/certctl/values.yaml) and the [external-Postgres example](deploy/helm/examples/values-external-db.yaml).

-### Docker Pull
+### Container images

 ```bash
-docker pull shankar0123.docker.scarf.sh/certctl-server
-docker pull shankar0123.docker.scarf.sh/certctl-agent
-```
-
-## Verifying this release
-
-Every `v*` tag publishes signed, attested release artefacts. Binaries
-(`certctl-agent`, `certctl-server`, `certctl-cli`, `certctl-mcp-server` for
-`linux|darwin × amd64|arm64`) ship alongside a `checksums.txt`, per-binary
-SPDX-JSON SBOMs, Cosign signatures, and SLSA Level 3 provenance. Container
-images on `ghcr.io/certctl-io/certctl-{server,agent}` are built with
-`docker/build-push-action` `provenance: mode=max` + `sbom: true` and are
-additionally signed with Cosign at the image digest.
-
-All signatures use Cosign keyless OIDC; the signing identity is the
-release workflow running on a signed tag.
-
-**1. Verify SHA-256 checksums:**
-
-```bash
-sha256sum -c checksums.txt
-```
-
-**2. Verify the Cosign signature on `checksums.txt`:**
-
-```bash
-cosign verify-blob \
-  --bundle checksums.txt.sigstore.json \
-  --certificate-identity-regexp '^https://github\.com/certctl-io/certctl/\.github/workflows/release\.yml@refs/tags/' \
-  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
-  checksums.txt
-```
-
-Every individual binary ships with its own `.sigstore.json` bundle
-(unified Sigstore bundle containing signature, certificate chain, and
-Rekor inclusion proof). Swap `checksums.txt` for any binary name and
-point `--bundle` at the matching `<binary>.sigstore.json` to verify it
-directly.
-
-**3. Verify SLSA Level 3 provenance on a binary:**
-
-```bash
-slsa-verifier verify-artifact \
-  --provenance-path multiple.intoto.jsonl \
-  --source-uri github.com/certctl-io/certctl \
-  --source-tag v2.1.0 \
-  certctl-agent-linux-amd64
-```
-
-**4. Verify a container image signature and its SBOM / provenance attestations:**
-
-```bash
-IMAGE=ghcr.io/certctl-io/certctl-server:v2.1.0
-
-cosign verify \
-  --certificate-identity-regexp '^https://github\.com/certctl-io/certctl/\.github/workflows/release\.yml@refs/tags/' \
-  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
-  "$IMAGE"
-
-# SBOM attestation (SPDX-JSON, emitted by docker/build-push-action)
-cosign verify-attestation --type spdxjson \
-  --certificate-identity-regexp '^https://github\.com/certctl-io/certctl/' \
-  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
-  "$IMAGE"
-
-# SLSA provenance attestation (docker/build-push-action `provenance: mode=max`)
-cosign verify-attestation --type slsaprovenance \
-  --certificate-identity-regexp '^https://github\.com/certctl-io/certctl/' \
-  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
-  "$IMAGE"
+docker pull ghcr.io/certctl-io/certctl-server:latest
+docker pull ghcr.io/certctl-io/certctl-agent:latest
 ```

 ## Examples

-Pick the scenario closest to your setup and have it running in 2 minutes.
+Pick the scenario closest to your setup and have it running in 2 minutes:

 | Example | Scenario |
 |---------|----------|
@@ -327,100 +161,38 @@ Pick the scenario closest to your setup and have it running in 2 minutes.

 Each directory contains a `docker-compose.yml` and a `README.md` explaining the scenario, prerequisites, and customization.

-## CLI
+## Verifying a release

-```bash
-# Install
-go install github.com/certctl-io/certctl/cmd/cli@latest
-
-# Configure
-export CERTCTL_SERVER_URL=https://localhost:8443
-export CERTCTL_API_KEY=your-api-key
-export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt   # or --ca-bundle on the CLI; --insecure for dev self-signed
-
-# Usage
-certctl-cli certs list                    # List all certificates
-certctl-cli certs renew mc-api-prod       # Trigger renewal
-certctl-cli certs revoke mc-api-prod --reason keyCompromise
-certctl-cli agents list                   # List registered agents
-certctl-cli jobs list                     # List jobs
-certctl-cli status                        # Server health + summary stats
-certctl-cli import certs.pem              # Bulk import from PEM file
-certctl-cli certs list --format json      # JSON output (default: table)
-```
-
-## MCP Server (AI Integration)
-
-certctl ships a standalone MCP (Model Context Protocol) server that exposes all 80 API endpoints as tools for AI assistants — Claude, Cursor, Windsurf, OpenClaw, VS Code Copilot, and any MCP-compatible client.
-
-```bash
-# Install and run
-go install github.com/certctl-io/certctl/cmd/mcp-server@latest
-export CERTCTL_SERVER_URL=https://localhost:8443
-export CERTCTL_API_KEY=your-api-key
-export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt   # required for self-signed bootstrap
-mcp-server
-```
-
-The MCP server is env-vars-only — there are no CLI flags for TLS. If you must bypass verification for local development against a self-signed cert, set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true`. Never set that in production.
-
-**Claude Desktop** (`claude_desktop_config.json`):
-```json
-{
-  "mcpServers": {
-    "certctl": {
-      "command": "mcp-server",
-      "env": {
-        "CERTCTL_SERVER_URL": "https://localhost:8443",
-        "CERTCTL_API_KEY": "your-api-key",
-        "CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/ca.crt"
-      }
-    }
-  }
-}
-```
+Every `v*` tag publishes signed, attested artefacts (Cosign keyless OIDC + SLSA Level 3 provenance + SPDX-JSON SBOMs). For the verification procedure, see [`docs/reference/release-verification.md`](docs/reference/release-verification.md).

 ## Development

 ```bash
 make build              # Build server + agent binaries
 make test               # Run tests
-make lint               # golangci-lint (11 linters)
+make lint               # golangci-lint (govet + staticcheck + contextcheck + unused)
 govulncheck ./...       # Vulnerability scan
 make docker-up          # Start Docker Compose stack
 ```

-CI runs on every push: `go vet`, `go test -race`, `golangci-lint`, `govulncheck`, and per-layer coverage thresholds (service 55%, handler 60%, domain 40%, middleware 30%). Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build. 1,668 Go test functions with 625+ subtests, plus frontend test suite.
-
-## Roadmap
-
-### V1 (v1.0.0) — Shipped
-Core lifecycle management — Local CA + ACME v2 issuers, NGINX target connector, agent-side key generation, API auth + rate limiting, React dashboard, CI pipeline with coverage gates, Docker images on GHCR.
-
-### V2: Operational Maturity — Shipped
-30+ milestones shipping enterprise-grade features for free. Sub-CA mode, ACME DNS-01/DNS-PERSIST-01/EAB/ARI (RFC 9773)/profile selection, step-ca, Vault PKI, DigiCert CertCentral, Sectigo SCM, Google CAS, AWS ACM PCA, Entrust, GlobalSign, EJBCA, OpenSSL/Custom CA issuers. NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS (WinRM), F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, Kubernetes Secrets targets. EST server (RFC 7030) and SCEP server (RFC 8894) enrollment protocols. RFC 5280 revocation with DER CRL + embedded OCSP responder. Certificate profiles, ownership tracking, team assignment, agent groups, interactive approval workflows. Filesystem, network, and cloud secret manager (AWS SM, Azure KV, GCP SM) certificate discovery with triage GUI. Dynamic issuer/target configuration via GUI with AES-256-GCM encrypted storage. First-run onboarding wizard. Post-deployment TLS verification. Certificate export (PEM/PKCS#12). S/MIME support. Prometheus metrics. Scheduled certificate digest emails. Slack, Teams, PagerDuty, OpsGenie, SMTP notifications. MCP server (80 tools), CLI (12 commands), Helm chart. Compliance mapping (SOC 2, PCI-DSS 4.0, NIST SP 800-57). 5 turnkey deployment examples. Agent install script. Migration guides from certbot, acme.sh, and cert-manager. See the [Feature Inventory](docs/features.md) for details.
-
-### Forward-looking work — all free, all self-hostable
-Everything ships free under BSL 1.1. No paid tier, no V3 / V4 gating, no enterprise edition. Future revenue path is a managed-service hosting offering — operate certctl-server as a hosted service while customers self-install only the agent.
+CI runs `go vet`, `go test -race`, `golangci-lint`, `govulncheck`, and per-package coverage thresholds (service 70%, handler 75%, crypto 88%, auth packages 85-95%) on every push. The thresholds-as-data file is `.github/coverage-thresholds.yml`; lowering a floor requires corresponding test work, not a config flip. Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build.

 ## License

-Certctl is licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial offering to third parties, whether hosted, managed, embedded, bundled, or integrated.
+Licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial certificate-management offering to third parties. See the LICENSE file for the full Additional Use Grant.

 For licensing inquiries: certctl@proton.me

 ## Dependencies

-Backend dependency footprint is auditable on demand:
-
-```
+```bash
 go list -m all | wc -l   # total module count (direct + transitive)
-go mod why <path>        # explain why a particular module is pulled in
+go mod why <path>        # explain why a module is pulled in
 govulncheck ./...        # vulnerability scan (CI runs this on every commit)
 ```

-The release-time SBOM is published as a syft-produced cyclonedx file alongside each release artifact in `.github/workflows/release.yml`.
+The release-time SBOM is published as an SPDX-JSON file alongside each release artifact.

 ---

-If certctl solves a problem you have, [star the repo](https://github.com/certctl-io/certctl) to help others find it. Questions, bugs, or feature requests — [open an issue](https://github.com/certctl-io/certctl/issues).
+If certctl solves a problem you have, [star the repo](https://github.com/certctl-io/certctl) to help others find it. Questions, bugs, or feature requests: [open an issue](https://github.com/certctl-io/certctl/issues).
@@ -0,0 +1,161 @@
+# Third-Party Notices
+
+certctl is distributed under the Business Source License 1.1
+(see [LICENSE](LICENSE)). The binaries built from this source link
+third-party Go and JavaScript libraries listed below; certctl LLC
+acknowledges each library's authors and reproduces their copyright
+and license terms here in compliance with each library's license.
+
+Full license text for each library lives in that library's upstream
+repository. The license type is provided per-row; for the canonical
+notice, refer to the upstream source.
+
+- **Last reviewed:** 2026-05-13
+- **Holder:** certctl LLC
+- **License:** BSL 1.1 (Apache 2.0 effective March 14, 2076)
+
+## Go Modules (binary-link dependencies)
+
+Generated by walking `go list -deps ./...` against the certctl
+server, agent, CLI, and MCP-server build paths. Excludes the Go
+standard library and the certctl-io/certctl module itself.
+
+**Count:** see commit; generate via `go list -deps -f '{{if .Module}}{{.Module.Path}} {{.Module.Version}}{{end}}' ./...`
+
+| Module | Version | License |
+|---|---|---|
+| `github.com/Azure/azure-sdk-for-go/sdk/azcore` | v1.20.0 | MIT |
+| `github.com/Azure/azure-sdk-for-go/sdk/azidentity` | v1.13.1 | MIT |
+| `github.com/Azure/azure-sdk-for-go/sdk/internal` | v1.11.2 | MIT |
+| `github.com/Azure/azure-sdk-for-go/sdk/security/keyvault/azcertificates` | v1.4.0 | MIT |
+| `github.com/Azure/azure-sdk-for-go/sdk/security/keyvault/internal` | v1.2.0 | MIT |
+| `github.com/Azure/go-ntlmssp` | v0.1.1 | MIT |
+| `github.com/AzureAD/microsoft-authentication-library-for-go` | v1.6.0 | MIT |
+| `github.com/ChrisTrenkamp/goxpath` | v0.0.0-20210404020558-97928f7e12b6 | MIT |
+| `github.com/aws/aws-sdk-go-v2` | v1.41.7 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/config` | v1.32.17 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/credentials` | v1.19.16 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/feature/ec2/imds` | v1.18.23 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/internal/configsources` | v1.4.23 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/internal/endpoints/v2` | v2.7.23 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/internal/v4a` | v1.4.24 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/acm` | v1.38.3 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/acmpca` | v1.46.14 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding` | v1.13.9 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/internal/presigned-url` | v1.13.23 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/signin` | v1.0.11 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/sso` | v1.30.17 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/ssooidc` | v1.35.21 | Apache-2.0 |
+| `github.com/aws/aws-sdk-go-v2/service/sts` | v1.42.1 | Apache-2.0 |
+| `github.com/aws/smithy-go` | v1.25.1 | Apache-2.0 |
+| `github.com/bodgit/ntlmssp` | v0.0.0-20240506230425-31973bb52d9b | BSD-2/3-Clause |
+| `github.com/bodgit/windows` | v1.0.1 | BSD-2/3-Clause |
+| `github.com/coreos/go-oidc/v3` | v3.18.0 | Apache-2.0 |
+| `github.com/go-jose/go-jose/v4` | v4.1.4 | Apache-2.0 |
+| `github.com/go-logr/logr` | v1.4.3 | Apache-2.0 |
+| `github.com/gofrs/uuid` | v4.4.0+incompatible | MIT |
+| `github.com/golang-jwt/jwt/v5` | v5.3.0 | MIT |
+| `github.com/google/jsonschema-go` | v0.4.2 | MIT |
+| `github.com/google/uuid` | v1.6.0 | BSD-2/3-Clause |
+| `github.com/hashicorp/go-cleanhttp` | v0.5.2 | MPL-2.0 |
+| `github.com/hashicorp/go-uuid` | v1.0.3 | MPL-2.0 |
+| `github.com/jcmturner/aescts/v2` | v2.0.0 | Apache-2.0 |
+| `github.com/jcmturner/dnsutils/v2` | v2.0.0 | Apache-2.0 |
+| `github.com/jcmturner/gofork` | v1.7.6 | BSD-2/3-Clause |
+| `github.com/jcmturner/goidentity/v6` | v6.0.1 | Apache-2.0 |
+| `github.com/jcmturner/gokrb5/v8` | v8.4.4 | Apache-2.0 |
+| `github.com/jcmturner/rpc/v2` | v2.0.3 | Apache-2.0 |
+| `github.com/kr/fs` | v0.1.0 | BSD-2/3-Clause |
+| `github.com/kylelemons/godebug` | v1.1.0 | Apache-2.0 |
+| `github.com/lib/pq` | v1.10.9 | MIT |
+| `github.com/masterzen/simplexml` | v0.0.0-20190410153822-31eea3082786 | Apache-2.0 |
+| `github.com/masterzen/winrm` | v0.0.0-20250927112105-5f8e6c707321 | Apache-2.0 |
+| `github.com/modelcontextprotocol/go-sdk` | v1.4.1 | Apache-2.0 |
+| `github.com/pkg/browser` | v0.0.0-20240102092130-5ac0b6a4141c | BSD-2/3-Clause |
+| `github.com/pkg/sftp` | v1.13.10 | BSD-2/3-Clause |
+| `github.com/segmentio/asm` | v1.1.3 | MIT |
+| `github.com/segmentio/encoding` | v0.5.4 | MIT |
+| `github.com/tidwall/transform` | v0.0.0-20201103190739-32f242e2dbde | ISC |
+| `github.com/yosida95/uritemplate/v3` | v3.0.2 | BSD-2/3-Clause |
+| `golang.org/x/crypto` | v0.50.0 | BSD-2/3-Clause |
+| `golang.org/x/net` | v0.53.0 | BSD-2/3-Clause |
+| `golang.org/x/oauth2` | v0.36.0 | BSD-2/3-Clause |
+| `golang.org/x/sync` | v0.20.0 | BSD-2/3-Clause |
+| `golang.org/x/sys` | v0.43.0 | BSD-2/3-Clause |
+| `golang.org/x/text` | v0.36.0 | BSD-2/3-Clause |
+| `software.sslmate.com/src/go-pkcs12` | v0.7.0 | BSD-2/3-Clause |
+
+## JavaScript Packages (production transitive closure)
+
+Generated by walking the `dependencies` graph from `web/package.json`
+through `node_modules/`. Excludes devDependencies (Vitest, Playwright,
+Vite, etc.) since they don't ship in the distributed frontend bundle.
+
+| Package | Version | License |
+|---|---|---|
+| `@reduxjs/toolkit` | 2.11.2 | MIT |
+| `@remix-run/router` | 1.23.2 | MIT |
+| `@standard-schema/spec` | 1.1.0 | MIT |
+| `@standard-schema/utils` | 0.3.0 | MIT |
+| `@tanstack/query-core` | 5.90.20 | MIT |
+| `@tanstack/react-query` | 5.90.21 | MIT |
+| `@types/d3-array` | 3.2.2 | MIT |
+| `@types/d3-color` | 3.1.3 | MIT |
+| `@types/d3-ease` | 3.0.2 | MIT |
+| `@types/d3-interpolate` | 3.0.4 | MIT |
+| `@types/d3-path` | 3.1.1 | MIT |
+| `@types/d3-scale` | 4.0.9 | MIT |
+| `@types/d3-shape` | 3.1.8 | MIT |
+| `@types/d3-time` | 3.0.4 | MIT |
+| `@types/d3-timer` | 3.0.2 | MIT |
+| `@types/use-sync-external-store` | 0.0.6 | MIT |
+| `clsx` | 2.1.1 | MIT |
+| `d3-array` | 3.2.4 | ISC |
+| `d3-color` | 3.1.0 | ISC |
+| `d3-ease` | 3.0.1 | BSD-3-Clause |
+| `d3-format` | 3.1.2 | ISC |
+| `d3-interpolate` | 3.0.1 | ISC |
+| `d3-path` | 3.1.0 | ISC |
+| `d3-scale` | 4.0.2 | ISC |
+| `d3-shape` | 3.2.0 | ISC |
+| `d3-time` | 3.1.0 | ISC |
+| `d3-time-format` | 4.1.0 | ISC |
+| `d3-timer` | 3.0.1 | ISC |
+| `decimal.js-light` | 2.5.1 | MIT |
+| `es-toolkit` | 1.45.1 | MIT |
+| `eventemitter3` | 5.0.4 | MIT |
+| `immer` | 10.2.0 | MIT |
+| `internmap` | 2.0.3 | ISC |
+| `js-tokens` | 4.0.0 | MIT |
+| `loose-envify` | 1.4.0 | MIT |
+| `react` | 18.3.1 | MIT |
+| `react-dom` | 18.3.1 | MIT |
+| `react-redux` | 9.2.0 | MIT |
+| `react-router` | 6.30.3 | MIT |
+| `react-router-dom` | 6.30.3 | MIT |
+| `recharts` | 3.8.0 | MIT |
+| `redux` | 5.0.1 | MIT |
+| `redux-thunk` | 3.1.0 | MIT |
+| `reselect` | 5.1.1 | MIT |
+| `scheduler` | 0.23.2 | MIT |
+| `tiny-invariant` | 1.3.3 | MIT |
+| `use-sync-external-store` | 1.6.0 | MIT |
+| `victory-vendor` | 37.3.6 | MIT AND ISC |
+
+## Test-fixture-only dependencies
+
+**Cisco libest.** The certctl integration test suite exercises the EST
+(RFC 7030) endpoints against Cisco's libest reference client. libest
+runs as a sidecar container (`certctl-test-libest`) only when the
+`est-e2e` Docker Compose profile is active — it is **not** vendored
+into the certctl source tree and **not** linked into any distributed
+release artifact (server, agent, CLI, MCP-server, container images,
+or release tarballs). For libest's own license terms, see
+<https://github.com/cisco/libest>.
+
+**f5-mock-icontrol.** The F5 deployment-target integration test
+ships a small Go program at `deploy/test/f5-mock-icontrol/main.go`
+under the same BSL 1.1 license as the rest of certctl. The compiled
+ELF was removed from the tracked tree in Phase 1 closure (commit
+eda3b48, 2026-05-13); it now rebuilds via the Dockerfile's
+multi-stage build on demand.
@@ -0,0 +1 @@
+0
@@ -1,30 +1,100 @@
 # Routes registered in internal/api/router/router.go that are intentionally
-# NOT in api/openapi.yaml. Each entry needs a one-line `why:` justification.
+# NOT in api/openapi.yaml. Each entry needs a one-line `why:` justification
+# AND a required `category:` field (added in Phase 13 Sprint 13.1,
+# 2026-05-14, architecture diligence audit ARCH-H1).
+#
 # Adding a new entry requires PR-time review.
 #
 # OpenAPI-shaped REST endpoints belong in api/openapi.yaml, NOT here.
-# This list is for protocol-shaped (SCEP wire endpoints) and operational
-# (health, metrics, pprof) routes only.
+# This list is for protocol-shaped (SCEP/ACME/EST wire endpoints) and
+# operational (health, metrics, pprof) routes only.
 #
 # Per ci-pipeline-cleanup bundle Phase 9 / frozen decision 0.11.
+#
+# ──────────────────────────────────────────────────────────────────────
+# The two-bucket contract (Phase 13 Sprint 13.1)
+# ──────────────────────────────────────────────────────────────────────
+#
+#   category: wire-protocol
+#     The route's wire shape is dictated by an IETF RFC (SCEP RFC 8894,
+#     ACME RFC 8555, ACME ARI RFC 9773, EST RFC 7030) or it's a
+#     sibling/shorthand variant of such a route (same wire semantics,
+#     different cosmetic path — e.g. trailing-slash forms, default-
+#     profile shorthands). Documenting these as REST operations in
+#     openapi.yaml would duplicate the RFC with no information gain;
+#     the canonical operator references live in docs/acme-server.md +
+#     docs/operator/scep.md + docs/operator/est.md. These entries
+#     NEVER burn down — they're protocol contracts, not gaps.
+#
+#   category: rest-deferred
+#     The route is REST-shaped (resource CRUD, JSON request/response,
+#     RBAC-gated) but its OpenAPI operation was deferred when the
+#     handler shipped. These MUST monotonically decrease to zero.
+#     Phase 13 Sprints 13.4-13.6 author the OpenAPI ops + delete the
+#     corresponding exception entries; the
+#     openapi-rest-deferred-monotonic.sh CI guard fails any PR that
+#     grows the rest-deferred bucket vs the checked-in baseline at
+#     api/openapi-handler-exceptions-baseline.txt.
+#
+# ──────────────────────────────────────────────────────────────────────
+# Phase 13 Sprint 13.1 categorization (2026-05-14)
+# ──────────────────────────────────────────────────────────────────────
+#
+# Current split, re-derived by the parity script's bucket-reporting
+# subcommand (post-Sprint-13.6 / 2026-05-14):
+#
+#   total entries:           36
+#   wire-protocol:           36
+#   rest-deferred:           0    ← THE FLOOR — ARCH-H1 substantive close
+#
+# Burn-down progress:
+#
+#   Sprint 13.4 SHIPPED — 28 - 13 = 15 (auth/sessions cluster 3 ops +
+#                               auth/oidc CRUD + JWKS + test + refresh
+#                               + group-mappings cluster, 10 ops)
+#   Sprint 13.5 SHIPPED — 15 -  8 =  7 (auth/breakglass admin 4 ops +
+#                               auth/users 3 ops + auth/runtime-config
+#                               1 op, 8 ops total)
+#   Sprint 13.6 SHIPPED —  7 -  7 =  0 (audit/export 1 op + demo-
+#                               residual/cleanup 1 op + auth/logout 1 op +
+#                               auth/breakglass/login 1 op + 3 OIDC
+#                               browser-flow endpoints, 7 ops total)
+#
+# Sprint 13.7 next tightens the parity-script's rest-deferred floor
+# from monotonic-decrease to a hard zero-exact pin. After that, any
+# new REST route MUST land with an OpenAPI op or fail CI — no escape
+# hatch via `category: rest-deferred`.
+#
+# Each authored OpenAPI op needs request/response schemas (not
+# placeholders) so the generated client at web/orval.config.ts emits
+# typed signatures. When an op lands, delete the corresponding entry
+# below + bump api/openapi-handler-exceptions-baseline.txt downward.

 documented_exceptions:
  - route: "GET /scep"
    why: "SCEP wire-protocol endpoint per RFC 8894 §3.1; serves CA certs via GetCACert/GetCACaps query params, NOT a REST resource."
+    category: wire-protocol
  - route: "POST /scep"
    why: "SCEP wire-protocol endpoint per RFC 8894 §3.1; receives PKCSReq / RenewalReq PKIMessages, NOT a REST resource."
+    category: wire-protocol
  - route: "GET /scep/"
    why: "SCEP wire-protocol endpoint with trailing-slash variant; ChromeOS clients send the trailing-slash form."
+    category: wire-protocol
  - route: "POST /scep/"
    why: "SCEP wire-protocol endpoint with trailing-slash variant; ChromeOS clients send the trailing-slash form."
+    category: wire-protocol
  - route: "GET /scep-mtls"
    why: "SCEP-mTLS sibling endpoint per ci-pipeline-cleanup-prerequisite EST RFC 7030 hardening Phase 6.5; same wire-protocol semantics, mutually-authenticated TLS variant."
+    category: wire-protocol
  - route: "POST /scep-mtls"
    why: "SCEP-mTLS sibling endpoint, POST variant."
+    category: wire-protocol
  - route: "GET /scep-mtls/"
    why: "SCEP-mTLS sibling endpoint, trailing-slash variant."
+    category: wire-protocol
  - route: "POST /scep-mtls/"
    why: "SCEP-mTLS sibling endpoint, trailing-slash POST variant."
+    category: wire-protocol

  # ACME server (RFC 8555 + RFC 9773 ARI) — wire-protocol surface.
  # Like SCEP/EST, ACME is a JWS-signed-JSON wire protocol whose
@@ -36,59 +106,96 @@ documented_exceptions:
  # challenge, cert, key-change, revoke-cert, renewal-info routes land.
  - route: "GET /acme/profile/{id}/directory"
    why: "ACME server RFC 8555 §7.1.1 directory; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "HEAD /acme/profile/{id}/new-nonce"
    why: "ACME server RFC 8555 §7.2 new-nonce; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "GET /acme/profile/{id}/new-nonce"
    why: "ACME server RFC 8555 §7.2 new-nonce GET form; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/new-account"
    why: "ACME server RFC 8555 §7.3 new-account (JWS jwk); documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/account/{acc_id}"
    why: "ACME server RFC 8555 §7.3.2 + §7.3.6 (JWS kid) account update + deactivation; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "GET /acme/directory"
    why: "ACME server default-profile shorthand; mirrors per-profile when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID is set."
+    category: wire-protocol
  - route: "HEAD /acme/new-nonce"
    why: "ACME server default-profile shorthand for new-nonce HEAD."
+    category: wire-protocol
  - route: "GET /acme/new-nonce"
    why: "ACME server default-profile shorthand for new-nonce GET."
+    category: wire-protocol
  - route: "POST /acme/new-account"
    why: "ACME server default-profile shorthand for new-account."
+    category: wire-protocol
  - route: "POST /acme/account/{acc_id}"
    why: "ACME server default-profile shorthand for account update + deactivation."
+    category: wire-protocol

  # Phase 2 — orders + finalize + authz + cert.
  - route: "POST /acme/profile/{id}/new-order"
    why: "ACME server RFC 8555 §7.4 new-order; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/order/{ord_id}"
    why: "ACME server RFC 8555 §7.4 order POST-as-GET; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/order/{ord_id}/finalize"
    why: "ACME server RFC 8555 §7.4 finalize; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/authz/{authz_id}"
    why: "ACME server RFC 8555 §7.5 authz POST-as-GET; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/challenge/{chall_id}"
    why: "ACME server RFC 8555 §7.5.1 challenge response; dispatches to Phase 3 validator pool."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/cert/{cert_id}"
    why: "ACME server RFC 8555 §7.4.2 cert download; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/new-order"
    why: "Phase 2 default-profile shorthand for new-order."
+    category: wire-protocol
  - route: "POST /acme/order/{ord_id}"
    why: "Phase 2 default-profile shorthand for order POST-as-GET."
+    category: wire-protocol
  - route: "POST /acme/order/{ord_id}/finalize"
    why: "Phase 2 default-profile shorthand for finalize."
+    category: wire-protocol
  - route: "POST /acme/authz/{authz_id}"
    why: "Phase 2 default-profile shorthand for authz POST-as-GET."
+    category: wire-protocol
  - route: "POST /acme/challenge/{chall_id}"
    why: "Phase 3 default-profile shorthand for challenge response."
+    category: wire-protocol
  - route: "POST /acme/cert/{cert_id}"
    why: "Phase 2 default-profile shorthand for cert download."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/key-change"
    why: "ACME server RFC 8555 §7.3.5 doubly-signed key rollover; documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/profile/{id}/revoke-cert"
    why: "ACME server RFC 8555 §7.6 revoke-cert (kid OR cert-key auth); documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "GET /acme/profile/{id}/renewal-info/{cert_id}"
    why: "ACME server RFC 9773 ACME Renewal Information (unauthenticated GET); documented in docs/acme-server.md."
+    category: wire-protocol
  - route: "POST /acme/key-change"
    why: "Phase 4 default-profile shorthand for key rollover."
+    category: wire-protocol
  - route: "POST /acme/revoke-cert"
    why: "Phase 4 default-profile shorthand for revoke-cert."
+    category: wire-protocol
  - route: "GET /acme/renewal-info/{cert_id}"
    why: "Phase 4 default-profile shorthand for ARI."
+    category: wire-protocol
+
+  # =============================================================================
+  # Auth Bundle 2 + audit-2026-05-10/11 fix bundle — REST endpoints not yet
+  # represented in api/openapi.yaml. These are operator-facing REST endpoints
+  # (not protocol-shaped); the OpenAPI surface is scheduled to land pre-v2.2.0
+  # alongside the GUI E2E coverage push. Documented here so the parity guard
+  # stays green for the v2.1.0 release tag. Threat model + handler contracts
+  # live in docs/operator/{rbac.md,auth-threat-model.md,oidc-runbooks/*}.
+  # =============================================================================
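
The header comments in this file describe a CI guard (`openapi-rest-deferred-monotonic.sh`) that fails a PR when the `rest-deferred` bucket grows past the checked-in baseline. That script is not part of this diff; the following is only a minimal Go sketch of the same idea, assuming a line-based count is sufficient for this flat list. The function names `countCategory` and `checkMonotonic` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countCategory counts entries whose `category:` field equals want.
// It scans lines instead of parsing YAML, and skips comment lines so that
// header prose mentioning a category is not counted.
func countCategory(yamlText, want string) int {
	n := 0
	for _, line := range strings.Split(yamlText, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "#") {
			continue
		}
		if strings.HasPrefix(trimmed, "category:") {
			value := strings.TrimSpace(strings.TrimPrefix(trimmed, "category:"))
			if value == want {
				n++
			}
		}
	}
	return n
}

// checkMonotonic returns an error when the current count exceeds the
// baseline stored as a single integer in the baseline file.
func checkMonotonic(current int, baselineText string) error {
	baseline, err := strconv.Atoi(strings.TrimSpace(baselineText))
	if err != nil {
		return fmt.Errorf("invalid baseline %q: %w", baselineText, err)
	}
	if current > baseline {
		return fmt.Errorf("rest-deferred bucket grew: %d > baseline %d", current, baseline)
	}
	return nil
}

func main() {
	// Inline sample standing in for the exceptions file and the baseline file.
	sample := `documented_exceptions:
  - route: "GET /scep"
    why: "SCEP wire-protocol endpoint."
    category: wire-protocol
`
	current := countCategory(sample, "rest-deferred")
	if err := checkMonotonic(current, "0\n"); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("rest-deferred entries:", current)
}
```

A zero-exact pin, as planned for Sprint 13.7, would amount to replacing the `current > baseline` comparison with `current != 0`.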
@@ -0,0 +1,443 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"encoding/pem"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"path/filepath"
+	"strings"
+
+	"github.com/certctl-io/certctl/internal/connector/target"
+	"github.com/certctl-io/certctl/internal/connector/target/apache"
+	"github.com/certctl-io/certctl/internal/connector/target/awsacm"
+	"github.com/certctl-io/certctl/internal/connector/target/azurekv"
+	"github.com/certctl-io/certctl/internal/connector/target/caddy"
+	"github.com/certctl-io/certctl/internal/connector/target/envoy"
+	"github.com/certctl-io/certctl/internal/connector/target/f5"
+	"github.com/certctl-io/certctl/internal/connector/target/haproxy"
+	"github.com/certctl-io/certctl/internal/connector/target/iis"
+	jks "github.com/certctl-io/certctl/internal/connector/target/javakeystore"
+	k8s "github.com/certctl-io/certctl/internal/connector/target/k8ssecret"
+	"github.com/certctl-io/certctl/internal/connector/target/nginx"
+	pf "github.com/certctl-io/certctl/internal/connector/target/postfix"
+	sshconn "github.com/certctl-io/certctl/internal/connector/target/ssh"
+	"github.com/certctl-io/certctl/internal/connector/target/traefik"
+	wcs "github.com/certctl-io/certctl/internal/connector/target/wincertstore"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 12 (2026-05-14): extracted from
+// cmd/agent/main.go via the Option B sibling-file pattern.
+//
+// This file holds the DEPLOYMENT executor + the target connector
+// factory + the deploy-only helpers:
+//
+//   - executeDeploymentJob: handles Pending deployment jobs by
+//     fetching the cert PEM from the control plane, loading the
+//     locally-held private key (in agent keygen mode), instantiating
+//     the appropriate target connector via createTargetConnector,
+//     calling DeployCertificate on it, and reporting Completed or
+//     Failed back to the control plane.
+//   - createTargetConnector: the big switch over target_type that
+//     instantiates a connector from one of 15 target packages (apache /
+//     awsacm / azurekv / caddy / envoy / f5 / haproxy / iis / javakeystore /
+//     k8ssecret / nginx / postfix / ssh / traefik / wincertstore).
+//     Context is threaded into SDK-driven connectors (AWSACM,
+//     AzureKeyVault) so credential resolution honors caller
+//     cancellation per the contextcheck linter (see CI commit
+//     502823d).
+//   - splitPEMChain: split a PEM chain into (first cert, rest).
+//   - fetchCertificate: pull the PEM chain from
+//     GET /api/v1/agents/{agentID}/certificates/{certID}.
+//
+// All 15 target-connector imports were used ONLY by
+// createTargetConnector; moving the factory here also moved the
+// 15 connector imports out of main.go, leaving the surviving
+// cmd/agent/main.go with the minimal stdlib surface its lifecycle
+// + HTTP infrastructure needs.
+
+// executeDeploymentJob executes a deployment job by fetching the certificate and deploying it
+// to the target system using the connector that matches the job's target_type.
+//
+// For agent keygen mode, the private key is read from the local key store (keyDir/certID.key)
+// rather than fetched from the server. The deployment includes the locally-held key.
+//
+// Flow:
+// 1. Report job as Running
+// 2. Fetch the certificate PEM from the control plane
+// 3. Load local private key if it exists (agent keygen mode)
+// 4. Instantiate the target connector based on target_type from the work response
+// 5. Call DeployCertificate on the connector
+// 6. Report job as Completed (or Failed)
+func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
+	a.logger.Info("executing deployment job",
+		"job_id", job.ID,
+		"certificate_id", job.CertificateID,
+		"target_type", job.TargetType)
+
+	// Report job as running
+	if err := a.reportJobStatus(ctx, job.ID, "Running", ""); err != nil {
+		a.logger.Error("failed to report job running", "error", err)
+	}
+
+	// Fetch the certificate from the control plane
+	certPEM, err := a.fetchCertificate(ctx, job.CertificateID)
+	if err != nil {
+		a.logger.Error("failed to fetch certificate",
+			"job_id", job.ID,
+			"error", err)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("cert fetch failed: %v", err)); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+
+	a.logger.Info("certificate fetched for deployment",
+		"job_id", job.ID,
+		"cert_length", len(certPEM))
+
+	// Split the PEM bundle into the leaf certificate (first PEM block) and the remaining chain
+	certOnly, chainPEM := splitPEMChain(certPEM)
+
+	// Check for a locally stored private key (agent keygen mode). Per step 3 of
+	// the flow above, the key is loaded only if it exists; a failed read is
+	// logged and the deployment proceeds without a key.
+	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
+	var keyPEM string
+	if keyData, err := os.ReadFile(keyPath); err == nil {
+		keyPEM = string(keyData)
+		a.logger.Info("loaded local private key for deployment",
+			"job_id", job.ID,
+			"key_path", keyPath)
+	} else {
+		a.logger.Info("no local private key loaded for deployment",
+			"job_id", job.ID,
+			"key_path", keyPath,
+			"error", err)
+	}
+
+	// Deploy to the target using the appropriate connector
+	if job.TargetType != "" {
+		connector, err := a.createTargetConnector(ctx, job.TargetType, job.TargetConfig)
+		if err != nil {
+			a.logger.Error("failed to create target connector",
+				"job_id", job.ID,
+				"target_type", job.TargetType,
+				"error", err)
+			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("connector init failed: %v", err)); reportErr != nil {
+				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+			}
+			return
+		}
+
+		// Bundle 1 / RT-C1 closure (2026-05-12): defense in depth. The server
+		// runs internal/connector/target/configcheck.Validate on the way IN
+		// (Create/Update), and rejects shell metacharacters in command-bearing
+		// fields. Re-run the connector's full ValidateConfig here on the way
+		// OUT, before any DeployCertificate call. This catches (a) configs
+		// that pre-date the server-side guard, (b) corruption/tampering of
+		// the encrypted config blob, and (c) per-connector filesystem
+		// invariants (cert dir exists, paths writable) that the server can't
+		// check because the filesystem is on the agent host.
+		if err := connector.ValidateConfig(ctx, job.TargetConfig); err != nil {
+			a.logger.Error("connector config validation failed",
+				"job_id", job.ID,
+				"target_type", job.TargetType,
+				"error", err)
+			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("%s config validation failed: %v", job.TargetType, err)); reportErr != nil {
+				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+			}
+			return
+		}
+
+		deployReq := target.DeploymentRequest{
+			CertPEM:      certOnly,
+			KeyPEM:       keyPEM,
+			ChainPEM:     chainPEM,
+			TargetConfig: job.TargetConfig,
+			Metadata: map[string]string{
+				"certificate_id": job.CertificateID,
+				"job_id":         job.ID,
+			},
+		}
+
+		// Phase 2 of the deploy-hardening I master bundle:
+		// per-target deploy mutex. Acquire BEFORE
+		// DeployCertificate so two concurrent renewals against
+		// the same target ID serialize. The lock is held for the
+		// full Deploy duration including PreCommit (validate),
+		// PostCommit (reload), and post-deploy verify (Phases
+		// 4-9). Released on every return path via defer.
+		var targetID string
+		if job.TargetID != nil {
+			targetID = *job.TargetID
+		}
+		if mu := a.targetDeployMutex(targetID); mu != nil {
+			mu.Lock()
+			defer mu.Unlock()
+		}
+
+		result, err := connector.DeployCertificate(ctx, deployReq)
+		if err != nil {
+			a.logger.Error("deployment failed",
+				"job_id", job.ID,
+				"target_type", job.TargetType,
+				"error", err)
+			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("deployment failed: %v", err)); reportErr != nil {
+				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+			}
+			return
+		}
+
+		a.logger.Info("target connector deployment completed",
+			"job_id", job.ID,
+			"target_type", job.TargetType,
+			"success", result.Success,
+			"message", result.Message)
+
+		// If verification is enabled, verify the deployment by probing the live TLS endpoint
+		targetHost, targetPort, err := extractTargetHostAndPort(job.TargetConfig)
+		if err != nil {
+			a.logger.Warn("could not extract target host/port for verification",
+				"job_id", job.ID,
+				"error", err)
+		} else {
+			a.verifyAndReportDeployment(ctx, job, targetHost, targetPort, certOnly)
+		}
+	} else {
+		a.logger.Info("no target type specified, skipping connector invocation",
+			"job_id", job.ID)
+	}
+
+	// Report job as completed
+	if err := a.reportJobStatus(ctx, job.ID, "Completed", ""); err != nil {
+		a.logger.Error("failed to report job completed", "error", err)
+		return
+	}
+
+	a.logger.Info("deployment job completed", "job_id", job.ID)
+}
+
+// createTargetConnector instantiates the appropriate target connector based on type.
+// ctx is threaded into SDK-driven connectors (AWSACM, AzureKeyVault) so credential
+// resolution honors caller cancellation / deadlines instead of using a fresh
+// context.Background() (the contextcheck linter enforces this — the original Rank 5
+// implementation used Background() and tripped CI on commit 502823d).
+func (a *Agent) createTargetConnector(ctx context.Context, targetType string, configJSON json.RawMessage) (target.Connector, error) {
+	switch targetType {
+	case "NGINX":
+		var cfg nginx.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid NGINX config: %w", err)
+			}
+		}
+		return nginx.New(&cfg, a.logger), nil
+
+	case "Apache":
+		var cfg apache.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid Apache config: %w", err)
+			}
+		}
+		return apache.New(&cfg, a.logger), nil
+
+	case "HAProxy":
+		var cfg haproxy.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid HAProxy config: %w", err)
+			}
+		}
+		return haproxy.New(&cfg, a.logger), nil
+
+	case "F5":
+		var cfg f5.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid F5 config: %w", err)
+			}
+		}
+		conn, err := f5.New(&cfg, a.logger)
+		if err != nil {
+			return nil, fmt.Errorf("failed to create F5 connector: %w", err)
+		}
+		return conn, nil
+
+	case "IIS":
+		var cfg iis.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid IIS config: %w", err)
+			}
+		}
+		return iis.New(&cfg, a.logger)
+
+	case "Traefik":
+		var cfg traefik.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid Traefik config: %w", err)
+			}
+		}
+		return traefik.New(&cfg, a.logger), nil
+
+	case "Caddy":
+		var cfg caddy.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid Caddy config: %w", err)
+			}
+		}
+		return caddy.New(&cfg, a.logger), nil
+
+	case "Envoy":
+		var cfg envoy.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid Envoy config: %w", err)
+			}
+		}
+		return envoy.New(&cfg, a.logger), nil
+
+	case "Postfix":
+		var cfg pf.Config
+		cfg.Mode = "postfix"
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid Postfix config: %w", err)
+			}
+		}
+		return pf.New(&cfg, a.logger), nil
+
+	case "Dovecot":
+		var cfg pf.Config
+		cfg.Mode = "dovecot"
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid Dovecot config: %w", err)
+			}
+		}
+		return pf.New(&cfg, a.logger), nil
+
+	case "SSH":
+		var cfg sshconn.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid SSH config: %w", err)
+			}
+		}
+		return sshconn.New(&cfg, a.logger)
+
+	case "WinCertStore":
+		var cfg wcs.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid WinCertStore config: %w", err)
+			}
+		}
+		return wcs.New(&cfg, a.logger)
+
+	case "JavaKeystore":
+		var cfg jks.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid JavaKeystore config: %w", err)
+			}
+		}
+		return jks.New(&cfg, a.logger), nil
+
+	case "KubernetesSecrets":
+		var cfg k8s.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid KubernetesSecrets config: %w", err)
+			}
+		}
+		return k8s.New(&cfg, a.logger)
+
+	case "AWSACM":
+		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
+		// AWS Certificate Manager target — SDK-driven (no file I/O).
+		// LoadDefaultConfig handles the standard AWS credential chain
+		// (IRSA / EC2 instance profile / SSO / env vars) without any
+		// long-lived creds in connector Config.
+		var cfg awsacm.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid AWSACM config: %w", err)
+			}
+		}
+		return awsacm.New(ctx, &cfg, a.logger)
+
+	case "AzureKeyVault":
+		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
+		// Azure Key Vault target — SDK-driven (no file I/O).
+		// DefaultAzureCredential handles the standard Azure credential
+		// chain (managed identity / workload identity / env vars / az
+		// CLI fallback). Long-lived service-principal secrets are
+		// supported but discouraged via the credential_mode config.
+		var cfg azurekv.Config
+		if len(configJSON) > 0 {
+			if err := json.Unmarshal(configJSON, &cfg); err != nil {
+				return nil, fmt.Errorf("invalid AzureKeyVault config: %w", err)
+			}
+		}
+		return azurekv.New(ctx, &cfg, a.logger)
+
+	default:
+		return nil, fmt.Errorf("unsupported target type: %s", targetType)
+	}
+}
+
+// splitPEMChain splits a PEM chain into the first certificate (cert) and the rest (chain).
+// The control plane returns the full chain as a single string with PEM blocks concatenated.
+func splitPEMChain(pemChain string) (string, string) {
+	data := []byte(pemChain)
+	block, rest := pem.Decode(data)
+	if block == nil {
+		return pemChain, ""
+	}
+	cert := string(pem.EncodeToMemory(block))
+
+	// Skip whitespace between cert and chain
+	chain := strings.TrimSpace(string(rest))
+	if chain == "" {
+		return cert, ""
+	}
+	return cert, chain
+}
+
+// fetchCertificate retrieves the certificate PEM chain from the control plane.
+// GET /api/v1/agents/{agentID}/certificates/{certID}
+func (a *Agent) fetchCertificate(ctx context.Context, certID string) (string, error) {
+	path := fmt.Sprintf("/api/v1/agents/%s/certificates/%s", a.config.AgentID, certID)
+	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
+	if err != nil {
+		return "", fmt.Errorf("request failed: %w", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		body, _ := io.ReadAll(resp.Body)
+		return "", fmt.Errorf("server returned %d: %s", resp.StatusCode, string(body))
+	}
+
+	var certResp struct {
+		CertificatePEM string `json:"certificate_pem"`
+	}
+	if err := json.NewDecoder(resp.Body).Decode(&certResp); err != nil {
+		return "", fmt.Errorf("failed to decode response: %w", err)
+	}
+
+	return certResp.CertificatePEM, nil
+}
@@ -0,0 +1,275 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package main
+
+import (
+	"context"
+	"crypto/ecdsa"
+	"crypto/rsa"
+	"crypto/sha256"
+	"crypto/x509"
+	"encoding/pem"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 12 (2026-05-14): extracted from
+// cmd/agent/main.go via the Option B sibling-file pattern.
+//
+// This file holds the filesystem DISCOVERY scan: the agent's
+// outbound surface for reporting pre-existing certificates it
+// finds on disk back to the control plane via
+// POST /api/v1/agents/{id}/discoveries (a machine-to-machine flow
+// NOT exposed via the MCP surface, per the comment in
+// internal/mcp/tools.go::RegisterTools):
+//
+//   - runDiscoveryScan: walks each configured discovery directory,
+//     dispatches each candidate file to parsePEMFile or parseDERFile
+//     depending on extension, batches the parsed entries, and POSTs
+//     them in one report.
+//   - parsePEMFile / parseDERFile: extract every X.509 certificate
+//     from a candidate file in either encoding.
+//   - certToEntry: project a parsed *x509.Certificate into the
+//     discoveredCertEntry shape the control plane expects.
+//   - discoveredCertEntry struct + sha256Sum + certKeyInfo helpers
+//     consumed only by the discovery path; co-locating them keeps
+//     this file self-contained.
+
+// runDiscoveryScan walks configured directories, parses certificate files, and reports
+// discovered certificates to the control plane.
+// Supports PEM- and DER-encoded X.509 certificates.
+func (a *Agent) runDiscoveryScan(ctx context.Context) {
+	a.logger.Info("starting filesystem certificate discovery scan",
+		"directories", a.config.DiscoveryDirs)
+
+	startTime := time.Now()
+	var certs []discoveredCertEntry
+	var scanErrors []string
+
+	for _, dir := range a.config.DiscoveryDirs {
+		a.logger.Debug("scanning directory", "path", dir)
+
+		err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
+			if err != nil {
+				scanErrors = append(scanErrors, fmt.Sprintf("walk error at %s: %v", path, err))
+				return nil // continue walking
+			}
+			if info.IsDir() {
+				return nil
+			}
+
+			// Skip files larger than 1 MiB (unlikely to be certificates)
+			if info.Size() > 1*1024*1024 {
+				return nil
+			}
+
+			// Check file extension
+			ext := strings.ToLower(filepath.Ext(path))
+			switch ext {
+			case ".pem", ".crt", ".cer", ".cert":
+				found := a.parsePEMFile(path)
+				certs = append(certs, found...)
+			case ".der":
+				if entry, err := a.parseDERFile(path); err == nil {
+					certs = append(certs, entry)
+				} else {
+					a.logger.Debug("skipping non-cert DER file", "path", path, "error", err)
+				}
+			default:
+				// Skip extensionless files and .key files; try PEM parsing for any other unknown extension.
+				if ext == "" || ext == ".key" {
+					return nil // skip key files and extensionless
+				}
+				found := a.parsePEMFile(path)
+				if len(found) > 0 {
+					certs = append(certs, found...)
+				}
+			}
+			return nil
+		})
+		if err != nil {
+			scanErrors = append(scanErrors, fmt.Sprintf("failed to walk %s: %v", dir, err))
+		}
+	}
+
+	scanDuration := time.Since(startTime)
+	a.logger.Info("discovery scan completed",
+		"certificates_found", len(certs),
+		"errors", len(scanErrors),
+		"duration_ms", scanDuration.Milliseconds())
+
+	if len(certs) == 0 && len(scanErrors) == 0 {
+		a.logger.Debug("no certificates found and no errors, skipping report")
+		return
+	}
+
+	// Build report payload
+	entries := make([]map[string]interface{}, len(certs))
+	for i, c := range certs {
+		entries[i] = map[string]interface{}{
+			"fingerprint_sha256": c.FingerprintSHA256,
+			"common_name":        c.CommonName,
+			"sans":               c.SANs,
+			"serial_number":      c.SerialNumber,
+			"issuer_dn":          c.IssuerDN,
+			"subject_dn":         c.SubjectDN,
+			"not_before":         c.NotBefore,
+			"not_after":          c.NotAfter,
+			"key_algorithm":      c.KeyAlgorithm,
+			"key_size":           c.KeySize,
+			"is_ca":              c.IsCA,
+			"pem_data":           c.PEMData,
+			"source_path":        c.SourcePath,
+			"source_format":      c.SourceFormat,
+		}
+	}
+
+	report := map[string]interface{}{
+		"agent_id":         a.config.AgentID,
+		"directories":      a.config.DiscoveryDirs,
+		"certificates":     entries,
+		"errors":           scanErrors,
+		"scan_duration_ms": int(scanDuration.Milliseconds()),
+	}
+
+	// Submit to control plane
+	path := fmt.Sprintf("/api/v1/agents/%s/discoveries", a.config.AgentID)
+	resp, err := a.makeRequest(ctx, http.MethodPost, path, report)
+	if err != nil {
+		a.logger.Error("failed to submit discovery report", "error", err)
+		return
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusAccepted {
+		body, _ := io.ReadAll(resp.Body)
+		a.logger.Error("discovery report rejected",
+			"status", resp.StatusCode,
+			"body", string(body))
+		return
+	}
+
+	a.logger.Info("discovery report submitted successfully",
+		"certificates", len(certs),
+		"errors", len(scanErrors))
+}
+
+// discoveredCertEntry holds parsed certificate metadata for reporting.
+type discoveredCertEntry struct {
+	FingerprintSHA256 string   `json:"fingerprint_sha256"`
+	CommonName        string   `json:"common_name"`
+	SANs              []string `json:"sans"`
+	SerialNumber      string   `json:"serial_number"`
+	IssuerDN          string   `json:"issuer_dn"`
+	SubjectDN         string   `json:"subject_dn"`
+	NotBefore         string   `json:"not_before"`
+	NotAfter          string   `json:"not_after"`
+	KeyAlgorithm      string   `json:"key_algorithm"`
+	KeySize           int      `json:"key_size"`
+	IsCA              bool     `json:"is_ca"`
+	PEMData           string   `json:"pem_data"`
+	SourcePath        string   `json:"source_path"`
+	SourceFormat      string   `json:"source_format"`
+}
+
+// parsePEMFile reads a file and extracts all X.509 certificates from PEM blocks.
+func (a *Agent) parsePEMFile(path string) []discoveredCertEntry {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		a.logger.Debug("failed to read file", "path", path, "error", err)
+		return nil
+	}
+
+	var entries []discoveredCertEntry
+	rest := data
+	for {
+		var block *pem.Block
+		block, rest = pem.Decode(rest)
+		if block == nil {
+			break
+		}
+		if block.Type != "CERTIFICATE" {
+			continue
+		}
+		cert, err := x509.ParseCertificate(block.Bytes)
+		if err != nil {
+			a.logger.Debug("failed to parse certificate in PEM", "path", path, "error", err)
+			continue
+		}
+
+		pemStr := string(pem.EncodeToMemory(block))
+		entries = append(entries, certToEntry(cert, path, "PEM", pemStr))
+	}
+	return entries
+}
+
+// parseDERFile reads a file containing a single DER-encoded X.509 certificate.
+func (a *Agent) parseDERFile(path string) (discoveredCertEntry, error) {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return discoveredCertEntry{}, fmt.Errorf("read failed: %w", err)
+	}
+
+	cert, err := x509.ParseCertificate(data)
+	if err != nil {
+		return discoveredCertEntry{}, fmt.Errorf("parse failed: %w", err)
+	}
+
+	// Convert to PEM for storage
+	pemStr := string(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: data}))
+	return certToEntry(cert, path, "DER", pemStr), nil
+}
+
+// certToEntry converts a parsed x509.Certificate into a discoveredCertEntry.
+func certToEntry(cert *x509.Certificate, path, format, pemData string) discoveredCertEntry {
+	// Compute SHA-256 fingerprint
+	fingerprint := fmt.Sprintf("%x", sha256Sum(cert.Raw))
+
+	// Determine key algorithm and size
+	keyAlg, keySize := certKeyInfo(cert)
+
+	return discoveredCertEntry{
+		FingerprintSHA256: fingerprint,
+		CommonName:        cert.Subject.CommonName,
+		SANs:              cert.DNSNames, // DNS SANs only; IP, email, and URI SANs are not reported
+		SerialNumber:      cert.SerialNumber.Text(16),
+		IssuerDN:          cert.Issuer.String(),
+		SubjectDN:         cert.Subject.String(),
+		NotBefore:         cert.NotBefore.UTC().Format(time.RFC3339),
+		NotAfter:          cert.NotAfter.UTC().Format(time.RFC3339),
+		KeyAlgorithm:      keyAlg,
+		KeySize:           keySize,
+		IsCA:              cert.IsCA,
+		PEMData:           pemData,
+		SourcePath:        path,
+		SourceFormat:      format,
+	}
+}
+
+// sha256Sum returns the SHA-256 hash of data.
+func sha256Sum(data []byte) [32]byte {
+	return sha256.Sum256(data)
+}
+
+// certKeyInfo extracts key algorithm name and size from a certificate.
+func certKeyInfo(cert *x509.Certificate) (string, int) {
+	switch pub := cert.PublicKey.(type) {
+	case *ecdsa.PublicKey:
+		return "ECDSA", pub.Curve.Params().BitSize
+	case *rsa.PublicKey:
+		return "RSA", pub.N.BitLen()
+	default:
+		switch cert.PublicKeyAlgorithm {
+		case x509.Ed25519:
+			return "Ed25519", 256
+		default:
+			return cert.PublicKeyAlgorithm.String(), 0
+		}
+	}
+}
@@ -1,3 +1,6 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
 package main

 import (
@@ -1,18 +1,14 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
 package main

 import (
 	"bytes"
 	"context"
-	"crypto/ecdsa"
-	"crypto/elliptic"
-	"crypto/rand"
-	"crypto/rsa"
-	"crypto/sha256"
 	"crypto/tls"
 	"crypto/x509"
-	"crypto/x509/pkix"
 	"encoding/json"
-	"encoding/pem"
 	"errors"
 	"flag"
 	"fmt"
@@ -23,29 +19,11 @@ import (
 	"net/url"
 	"os"
 	"os/signal"
-	"path/filepath"
 	"runtime"
 	"strings"
 	"sync"
 	"syscall"
 	"time"
-
-	"github.com/certctl-io/certctl/internal/connector/target"
-	"github.com/certctl-io/certctl/internal/connector/target/apache"
-	"github.com/certctl-io/certctl/internal/connector/target/awsacm"
-	"github.com/certctl-io/certctl/internal/connector/target/azurekv"
-	"github.com/certctl-io/certctl/internal/connector/target/caddy"
-	"github.com/certctl-io/certctl/internal/connector/target/envoy"
-	"github.com/certctl-io/certctl/internal/connector/target/f5"
-	"github.com/certctl-io/certctl/internal/connector/target/haproxy"
-	"github.com/certctl-io/certctl/internal/connector/target/iis"
-	jks "github.com/certctl-io/certctl/internal/connector/target/javakeystore"
-	k8s "github.com/certctl-io/certctl/internal/connector/target/k8ssecret"
-	"github.com/certctl-io/certctl/internal/connector/target/nginx"
-	pf "github.com/certctl-io/certctl/internal/connector/target/postfix"
-	sshconn "github.com/certctl-io/certctl/internal/connector/target/ssh"
-	"github.com/certctl-io/certctl/internal/connector/target/traefik"
-	wcs "github.com/certctl-io/certctl/internal/connector/target/wincertstore"
 )

 // AgentConfig represents the agent-side configuration.
@@ -64,7 +42,7 @@ type AgentConfig struct {
 // ErrAgentRetired is the sentinel returned by [Agent.Run] when the control
 // plane responds with HTTP 410 Gone to a heartbeat or work-poll request — the
 // canonical signal that this agent's row has been soft-retired server-side
-// (see I-004 in cowork/certctl-coverage-gap-audit.md). The binary must
+// (see I-004 in the project's coverage-gap audit). The binary must
 // terminate cleanly: an init-system restart would only produce another 410
 // and wedge the host in a restart loop. main() translates this sentinel into
 // a zero exit code so systemd (Restart=on-failure) and launchd do not respawn
@@ -391,598 +369,6 @@ func (a *Agent) sendHeartbeat(ctx context.Context) {
 	a.logger.Debug("heartbeat acknowledged")
 }

-// pollForWork queries the control plane for actionable jobs and processes them.
-// Jobs may be deployment jobs (Pending) or CSR jobs (AwaitingCSR).
-// GET /api/v1/agents/{agentID}/work
-func (a *Agent) pollForWork(ctx context.Context) {
-	a.logger.Debug("polling for work", "agent_id", a.config.AgentID)
-
-	path := fmt.Sprintf("/api/v1/agents/%s/work", a.config.AgentID)
-	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
-	if err != nil {
-		a.logger.Error("work poll failed", "error", err)
-		a.consecutiveFailures++
-		return
-	}
-	defer resp.Body.Close()
-
-	// I-004: same terminal-retirement handling as sendHeartbeat. Work-poll is the
-	// other hot path that can observe an agent's soft-retirement; if the
-	// heartbeat tick happens to fire after a work-poll tick within the same
-	// retirement window, this branch catches it first. markRetired's sync.Once
-	// guards idempotency so racing both paths in the same tick only closes the
-	// signal channel once. No consecutiveFailures increment — retirement is
-	// not a transient failure.
-	if resp.StatusCode == http.StatusGone {
-		body, _ := io.ReadAll(resp.Body)
-		a.markRetired("work_poll", resp.StatusCode, string(body))
-		return
-	}
-
-	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(resp.Body)
-		a.logger.Error("work poll rejected",
-			"status", resp.StatusCode,
-			"body", string(body))
-		a.consecutiveFailures++
-		return
-	}
-
-	var workResp WorkResponse
-	if err := json.NewDecoder(resp.Body).Decode(&workResp); err != nil {
-		a.logger.Error("failed to decode work response", "error", err)
-		a.consecutiveFailures++
-		return
-	}
-
-	a.consecutiveFailures = 0
-
-	if workResp.Count == 0 {
-		a.logger.Debug("no pending work")
-		return
-	}
-
-	a.logger.Info("received work", "job_count", workResp.Count)
-
-	// Process each job based on type and status
-	for _, job := range workResp.Jobs {
-		switch {
-		case job.Status == "AwaitingCSR":
-			// Agent keygen mode: generate key locally, create CSR, submit to server
-			a.executeCSRJob(ctx, job)
-		case job.Type == "Deployment":
-			a.executeDeploymentJob(ctx, job)
-		}
-	}
-}
-
-// executeCSRJob handles an AwaitingCSR job: generates a private key locally, creates a CSR,
-// and submits it to the control plane for signing. The private key is stored on the local
-// filesystem with 0600 permissions and NEVER sent to the server.
-//
-// Flow:
-// 1. Generate ECDSA P-256 key pair
-// 2. Store private key to disk (keyDir/certID.key) with 0600 permissions
-// 3. Create CSR with common name and SANs from work response
-// 4. Submit CSR to control plane via POST /agents/{id}/csr
-// 5. Server signs the CSR and creates a cert version + deployment jobs
-func (a *Agent) executeCSRJob(ctx context.Context, job JobItem) {
-	a.logger.Info("executing CSR job (agent-side key generation)",
-		"job_id", job.ID,
-		"certificate_id", job.CertificateID,
-		"common_name", job.CommonName)
-
-	// Step 1: Generate ECDSA P-256 key pair
-	privKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
-	if err != nil {
-		a.logger.Error("failed to generate private key",
-			"job_id", job.ID,
-			"error", err)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key generation failed: %v", err)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-
-	a.logger.Info("generated ECDSA P-256 key pair locally",
-		"job_id", job.ID,
-		"certificate_id", job.CertificateID)
-
-	// Step 2: Store private key to disk with secure permissions.
-	//
-	// Bundle-9 / Audit L-002 + L-003: marshal+write through helpers that
-	// (a) zeroize the in-heap DER buffer immediately after the PEM block is
-	// constructed so the private scalar's exposure window is bounded by
-	// this function call, and (b) assert the key directory is mode 0700
-	// before any write touches disk. Also defer-clear the PEM buffer for
-	// the same reason — the encoded key isn't sensitive in transit (it's
-	// going to disk) but lingers on the heap if we don't.
-	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
-	if err := ensureAgentKeyDirSecure(filepath.Dir(keyPath)); err != nil {
-		a.logger.Error("agent key dir hardening failed", "job_id", job.ID, "error", err)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key dir hardening failed: %v", err)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-	var privKeyPEM []byte
-	if marshalErr := marshalAgentKeyAndZeroize(privKey, func(der []byte) error {
-		privKeyPEM = pem.EncodeToMemory(&pem.Block{
-			Type:  "EC PRIVATE KEY",
-			Bytes: der,
-		})
-		return nil
-	}); marshalErr != nil {
-		a.logger.Error("failed to marshal private key",
-			"job_id", job.ID,
-			"error", marshalErr)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key marshal failed: %v", marshalErr)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-	defer clear(privKeyPEM)
-
-	if err := os.WriteFile(keyPath, privKeyPEM, 0600); err != nil {
-		a.logger.Error("failed to write private key to disk",
-			"job_id", job.ID,
-			"key_path", keyPath,
-			"error", err)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key storage failed: %v", err)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-
-	a.logger.Info("private key stored securely",
-		"job_id", job.ID,
-		"key_path", keyPath,
-		"permissions", "0600")
-
-	// Validate common name is present
-	if job.CommonName == "" {
-		a.logger.Error("empty common name in CSR job", "job_id", job.ID)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", "empty common name"); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "error", reportErr)
-		}
-		return
-	}
-
-	// Step 3: Create CSR with common name and SANs
-	// Split SANs into DNS names and email addresses for proper CSR encoding
-	var dnsNames []string
-	var emailAddresses []string
-	for _, san := range job.SANs {
-		if strings.Contains(san, "@") {
-			emailAddresses = append(emailAddresses, san)
-		} else {
-			dnsNames = append(dnsNames, san)
-		}
-	}
-
-	csrTemplate := &x509.CertificateRequest{
-		Subject: pkix.Name{
-			CommonName: job.CommonName,
-		},
-		DNSNames:       dnsNames,
-		EmailAddresses: emailAddresses,
-	}
-
-	csrDER, err := x509.CreateCertificateRequest(rand.Reader, csrTemplate, privKey)
-	if err != nil {
-		a.logger.Error("failed to create CSR",
-			"job_id", job.ID,
-			"error", err)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR creation failed: %v", err)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-
-	csrPEM := string(pem.EncodeToMemory(&pem.Block{
-		Type:  "CERTIFICATE REQUEST",
-		Bytes: csrDER,
-	}))
-
-	// Step 4: Submit CSR to the control plane (only the public key leaves the agent)
-	a.logger.Info("submitting CSR to control plane",
-		"job_id", job.ID,
-		"certificate_id", job.CertificateID)
-
-	submitPath := fmt.Sprintf("/api/v1/agents/%s/csr", a.config.AgentID)
-	resp, err := a.makeRequest(ctx, http.MethodPost, submitPath, map[string]string{
-		"csr_pem":        csrPEM,
-		"certificate_id": job.CertificateID,
-	})
-	if err != nil {
-		a.logger.Error("failed to submit CSR",
-			"job_id", job.ID,
-			"error", err)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR submission failed: %v", err)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusAccepted {
-		body, _ := io.ReadAll(resp.Body)
-		a.logger.Error("CSR submission rejected",
-			"job_id", job.ID,
-			"status", resp.StatusCode,
-			"body", string(body))
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR rejected: %s", string(body))); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-
-	a.logger.Info("CSR submitted and signed successfully",
-		"job_id", job.ID,
-		"certificate_id", job.CertificateID,
-		"key_path", keyPath)
-}
-
-// executeDeploymentJob executes a deployment job by fetching the certificate and deploying it
-// to the target system using the appropriate connector (NGINX, F5 BIG-IP, or IIS).
-//
-// For agent keygen mode, the private key is read from the local key store (keyDir/certID.key)
-// rather than fetched from the server. The deployment includes the locally-held key.
-//
-// Flow:
-// 1. Report job as Running
-// 2. Fetch the certificate PEM from the control plane
-// 3. Load local private key if it exists (agent keygen mode)
-// 4. Instantiate the target connector based on target_type from the work response
-// 5. Call DeployCertificate on the connector
-// 6. Report job as Completed (or Failed)
-func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
-	a.logger.Info("executing deployment job",
-		"job_id", job.ID,
-		"certificate_id", job.CertificateID,
-		"target_type", job.TargetType)
-
-	// Report job as running
-	if err := a.reportJobStatus(ctx, job.ID, "Running", ""); err != nil {
-		a.logger.Error("failed to report job running", "error", err)
-	}
-
-	// Fetch the certificate from the control plane
-	certPEM, err := a.fetchCertificate(ctx, job.CertificateID)
-	if err != nil {
-		a.logger.Error("failed to fetch certificate",
-			"job_id", job.ID,
-			"error", err)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("cert fetch failed: %v", err)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-		}
-		return
-	}
-
-	a.logger.Info("certificate fetched for deployment",
-		"job_id", job.ID,
-		"cert_length", len(certPEM))
-
-	// Split PEM into cert and chain (separated by double newline between PEM blocks)
-	certOnly, chainPEM := splitPEMChain(certPEM)
-
-	// Check for locally-stored private key (agent keygen mode)
-	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
-	var keyPEM string
-	keyData, err := os.ReadFile(keyPath)
-	if err != nil {
-		a.logger.Error("failed to read local private key for deployment",
-			"job_id", job.ID,
-			"key_path", keyPath,
-			"error", err)
-		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key read failed: %v", err)); reportErr != nil {
-			a.logger.Error("failed to report job status to server", "job_id", job.ID, "error", reportErr)
-		}
-		return
-	}
-	keyPEM = string(keyData)
-	a.logger.Info("loaded local private key for deployment",
-		"job_id", job.ID,
-		"key_path", keyPath)
-
-	// Deploy to the target using the appropriate connector
-	if job.TargetType != "" {
-		connector, err := a.createTargetConnector(ctx, job.TargetType, job.TargetConfig)
-		if err != nil {
-			a.logger.Error("failed to create target connector",
-				"job_id", job.ID,
-				"target_type", job.TargetType,
-				"error", err)
-			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("connector init failed: %v", err)); reportErr != nil {
-				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-			}
-			return
-		}
-
-		deployReq := target.DeploymentRequest{
-			CertPEM:      certOnly,
-			KeyPEM:       keyPEM,
-			ChainPEM:     chainPEM,
-			TargetConfig: job.TargetConfig,
-			Metadata: map[string]string{
-				"certificate_id": job.CertificateID,
-				"job_id":         job.ID,
-			},
-		}
-
-		// Phase 2 of the deploy-hardening I master bundle:
-		// per-target deploy mutex. Acquire BEFORE
-		// DeployCertificate so two concurrent renewals against
-		// the same target ID serialize. The lock is held for the
-		// full Deploy duration including PreCommit (validate),
-		// PostCommit (reload), and post-deploy verify (Phases
-		// 4-9). Released on every return path via defer.
-		var targetID string
-		if job.TargetID != nil {
-			targetID = *job.TargetID
-		}
-		if mu := a.targetDeployMutex(targetID); mu != nil {
-			mu.Lock()
-			defer mu.Unlock()
-		}
-
-		result, err := connector.DeployCertificate(ctx, deployReq)
-		if err != nil {
-			a.logger.Error("deployment failed",
-				"job_id", job.ID,
-				"target_type", job.TargetType,
-				"error", err)
-			if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("deployment failed: %v", err)); reportErr != nil {
-				a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
-			}
-			return
-		}
-
-		a.logger.Info("target connector deployment completed",
-			"job_id", job.ID,
-			"target_type", job.TargetType,
-			"success", result.Success,
-			"message", result.Message)
-
-		// If verification is enabled, verify the deployment by probing the live TLS endpoint
-		targetHost, targetPort, err := extractTargetHostAndPort(job.TargetConfig)
-		if err != nil {
-			a.logger.Warn("could not extract target host/port for verification",
-				"job_id", job.ID,
-				"error", err)
-		} else {
-			a.verifyAndReportDeployment(ctx, job, targetHost, targetPort, certOnly)
-		}
-	} else {
-		a.logger.Info("no target type specified, skipping connector invocation",
-			"job_id", job.ID)
-	}
-
-	// Report job as completed
-	if err := a.reportJobStatus(ctx, job.ID, "Completed", ""); err != nil {
-		a.logger.Error("failed to report job completed", "error", err)
-		return
-	}
-
-	a.logger.Info("deployment job completed", "job_id", job.ID)
-}
-
-// createTargetConnector instantiates the appropriate target connector based on type.
-// ctx is threaded into SDK-driven connectors (AWSACM, AzureKeyVault) so credential
-// resolution honors caller cancellation / deadlines instead of using a fresh
-// context.Background() (the contextcheck linter enforces this — the original Rank 5
-// implementation used Background() and tripped CI on commit 502823d).
-func (a *Agent) createTargetConnector(ctx context.Context, targetType string, configJSON json.RawMessage) (target.Connector, error) {
-	switch targetType {
-	case "NGINX":
-		var cfg nginx.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid NGINX config: %w", err)
-			}
-		}
-		return nginx.New(&cfg, a.logger), nil
-
-	case "Apache":
-		var cfg apache.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid Apache config: %w", err)
-			}
-		}
-		return apache.New(&cfg, a.logger), nil
-
-	case "HAProxy":
-		var cfg haproxy.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid HAProxy config: %w", err)
-			}
-		}
-		return haproxy.New(&cfg, a.logger), nil
-
-	case "F5":
-		var cfg f5.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid F5 config: %w", err)
-			}
-		}
-		conn, err := f5.New(&cfg, a.logger)
-		if err != nil {
-			return nil, fmt.Errorf("failed to create F5 connector: %w", err)
-		}
-		return conn, nil
-
-	case "IIS":
-		var cfg iis.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid IIS config: %w", err)
-			}
-		}
-		return iis.New(&cfg, a.logger)
-
-	case "Traefik":
-		var cfg traefik.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid Traefik config: %w", err)
-			}
-		}
-		return traefik.New(&cfg, a.logger), nil
-
-	case "Caddy":
-		var cfg caddy.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid Caddy config: %w", err)
-			}
-		}
-		return caddy.New(&cfg, a.logger), nil
-
-	case "Envoy":
-		var cfg envoy.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid Envoy config: %w", err)
-			}
-		}
-		return envoy.New(&cfg, a.logger), nil
-
-	case "Postfix":
-		var cfg pf.Config
-		cfg.Mode = "postfix"
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid Postfix config: %w", err)
-			}
-		}
-		return pf.New(&cfg, a.logger), nil
-
-	case "Dovecot":
-		var cfg pf.Config
-		cfg.Mode = "dovecot"
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid Dovecot config: %w", err)
-			}
-		}
-		return pf.New(&cfg, a.logger), nil
-
-	case "SSH":
-		var cfg sshconn.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid SSH config: %w", err)
-			}
-		}
-		return sshconn.New(&cfg, a.logger)
-
-	case "WinCertStore":
-		var cfg wcs.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid WinCertStore config: %w", err)
-			}
-		}
-		return wcs.New(&cfg, a.logger)
-
-	case "JavaKeystore":
-		var cfg jks.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid JavaKeystore config: %w", err)
-			}
-		}
-		return jks.New(&cfg, a.logger), nil
-
-	case "KubernetesSecrets":
-		var cfg k8s.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid KubernetesSecrets config: %w", err)
-			}
-		}
-		return k8s.New(&cfg, a.logger)
-
-	case "AWSACM":
-		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
-		// AWS Certificate Manager target — SDK-driven (no file I/O).
-		// LoadDefaultConfig handles the standard AWS credential chain
-		// (IRSA / EC2 instance profile / SSO / env vars) without any
-		// long-lived creds in connector Config.
-		var cfg awsacm.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid AWSACM config: %w", err)
-			}
-		}
-		return awsacm.New(ctx, &cfg, a.logger)
-
-	case "AzureKeyVault":
-		// Rank 5 of the 2026-05-03 Infisical deep-research deliverable.
-		// Azure Key Vault target — SDK-driven (no file I/O).
-		// DefaultAzureCredential handles the standard Azure credential
-		// chain (managed identity / workload identity / env vars / az
-		// CLI fallback). Long-lived service-principal secrets are
-		// supported but discouraged via the credential_mode config.
-		var cfg azurekv.Config
-		if len(configJSON) > 0 {
-			if err := json.Unmarshal(configJSON, &cfg); err != nil {
-				return nil, fmt.Errorf("invalid AzureKeyVault config: %w", err)
-			}
-		}
-		return azurekv.New(ctx, &cfg, a.logger)
-
-	default:
-		return nil, fmt.Errorf("unsupported target type: %s", targetType)
-	}
-}
-
-// splitPEMChain splits a PEM chain into the first certificate (cert) and the rest (chain).
-// The control plane returns the full chain as a single string with PEM blocks concatenated.
-func splitPEMChain(pemChain string) (string, string) {
-	data := []byte(pemChain)
-	block, rest := pem.Decode(data)
-	if block == nil {
-		return pemChain, ""
-	}
-	cert := string(pem.EncodeToMemory(block))
-
-	// Skip whitespace between cert and chain
-	chain := strings.TrimSpace(string(rest))
-	if chain == "" {
-		return cert, ""
-	}
-	return cert, chain
-}
-
-// fetchCertificate retrieves the certificate PEM chain from the control plane.
-// GET /api/v1/agents/{agentID}/certificates/{certID}
-func (a *Agent) fetchCertificate(ctx context.Context, certID string) (string, error) {
-	path := fmt.Sprintf("/api/v1/agents/%s/certificates/%s", a.config.AgentID, certID)
-	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
-	if err != nil {
-		return "", fmt.Errorf("request failed: %w", err)
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusOK {
-		body, _ := io.ReadAll(resp.Body)
-		return "", fmt.Errorf("server returned %d: %s", resp.StatusCode, string(body))
-	}
-
-	var certResp struct {
-		CertificatePEM string `json:"certificate_pem"`
-	}
-	if err := json.NewDecoder(resp.Body).Decode(&certResp); err != nil {
-		return "", fmt.Errorf("failed to decode response: %w", err)
-	}
-
-	return certResp.CertificatePEM, nil
-}
-
 // reportJobStatus reports the result of a job back to the control plane.
 // POST /api/v1/agents/{agentID}/jobs/{jobID}/status
 func (a *Agent) reportJobStatus(ctx context.Context, jobID string, status string, errorMsg string) error {
@@ -1044,239 +430,6 @@ func (a *Agent) makeRequest(ctx context.Context, method, path string, body inter
 	return resp, nil
 }

-// runDiscoveryScan walks configured directories, parses certificate files, and reports
-// discovered certificates to the control plane.
-// Supports PEM and DER encoded X.509 certificates.
-func (a *Agent) runDiscoveryScan(ctx context.Context) {
-	a.logger.Info("starting filesystem certificate discovery scan",
-		"directories", a.config.DiscoveryDirs)
-
-	startTime := time.Now()
-	var certs []discoveredCertEntry
-	var scanErrors []string
-
-	for _, dir := range a.config.DiscoveryDirs {
-		a.logger.Debug("scanning directory", "path", dir)
-
-		err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
-			if err != nil {
-				scanErrors = append(scanErrors, fmt.Sprintf("walk error at %s: %v", path, err))
-				return nil // continue walking
-			}
-			if info.IsDir() {
-				return nil
-			}
-
-			// Skip files larger than 1MB (unlikely to be a certificate)
-			if info.Size() > 1*1024*1024 {
-				return nil
-			}
-
-			// Check file extension
-			ext := strings.ToLower(filepath.Ext(path))
-			switch ext {
-			case ".pem", ".crt", ".cer", ".cert":
-				found := a.parsePEMFile(path)
-				certs = append(certs, found...)
-			case ".der":
-				if entry, err := a.parseDERFile(path); err == nil {
-					certs = append(certs, entry)
-				} else {
-					a.logger.Debug("skipping non-cert DER file", "path", path, "error", err)
-				}
-			default:
-				// Try PEM parsing for extensionless files or unknown extensions
-				if ext == "" || ext == ".key" {
-					return nil // skip key files and extensionless
-				}
-				found := a.parsePEMFile(path)
-				if len(found) > 0 {
-					certs = append(certs, found...)
-				}
-			}
-			return nil
-		})
-		if err != nil {
-			scanErrors = append(scanErrors, fmt.Sprintf("failed to walk %s: %v", dir, err))
-		}
-	}
-
-	scanDuration := time.Since(startTime)
-	a.logger.Info("discovery scan completed",
-		"certificates_found", len(certs),
-		"errors", len(scanErrors),
-		"duration_ms", scanDuration.Milliseconds())
-
-	if len(certs) == 0 && len(scanErrors) == 0 {
-		a.logger.Debug("no certificates found and no errors, skipping report")
-		return
-	}
-
-	// Build report payload
-	entries := make([]map[string]interface{}, len(certs))
-	for i, c := range certs {
-		entries[i] = map[string]interface{}{
-			"fingerprint_sha256": c.FingerprintSHA256,
-			"common_name":        c.CommonName,
-			"sans":               c.SANs,
-			"serial_number":      c.SerialNumber,
-			"issuer_dn":          c.IssuerDN,
-			"subject_dn":         c.SubjectDN,
-			"not_before":         c.NotBefore,
-			"not_after":          c.NotAfter,
-			"key_algorithm":      c.KeyAlgorithm,
-			"key_size":           c.KeySize,
-			"is_ca":              c.IsCA,
-			"pem_data":           c.PEMData,
-			"source_path":        c.SourcePath,
-			"source_format":      c.SourceFormat,
-		}
-	}
-
-	report := map[string]interface{}{
-		"agent_id":         a.config.AgentID,
-		"directories":      a.config.DiscoveryDirs,
-		"certificates":     entries,
-		"errors":           scanErrors,
-		"scan_duration_ms": int(scanDuration.Milliseconds()),
-	}
-
-	// Submit to control plane
-	path := fmt.Sprintf("/api/v1/agents/%s/discoveries", a.config.AgentID)
-	resp, err := a.makeRequest(ctx, http.MethodPost, path, report)
-	if err != nil {
-		a.logger.Error("failed to submit discovery report", "error", err)
-		return
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusAccepted {
-		body, _ := io.ReadAll(resp.Body)
-		a.logger.Error("discovery report rejected",
-			"status", resp.StatusCode,
-			"body", string(body))
-		return
-	}
-
-	a.logger.Info("discovery report submitted successfully",
-		"certificates", len(certs),
-		"errors", len(scanErrors))
-}
-
-// discoveredCertEntry holds parsed certificate metadata for reporting.
-type discoveredCertEntry struct {
-	FingerprintSHA256 string   `json:"fingerprint_sha256"`
-	CommonName        string   `json:"common_name"`
-	SANs              []string `json:"sans"`
-	SerialNumber      string   `json:"serial_number"`
-	IssuerDN          string   `json:"issuer_dn"`
-	SubjectDN         string   `json:"subject_dn"`
-	NotBefore         string   `json:"not_before"`
-	NotAfter          string   `json:"not_after"`
-	KeyAlgorithm      string   `json:"key_algorithm"`
-	KeySize           int      `json:"key_size"`
-	IsCA              bool     `json:"is_ca"`
-	PEMData           string   `json:"pem_data"`
-	SourcePath        string   `json:"source_path"`
-	SourceFormat      string   `json:"source_format"`
-}
-
-// parsePEMFile reads a file and extracts all X.509 certificates from PEM blocks.
-func (a *Agent) parsePEMFile(path string) []discoveredCertEntry {
-	data, err := os.ReadFile(path)
-	if err != nil {
-		a.logger.Debug("failed to read file", "path", path, "error", err)
-		return nil
-	}
-
-	var entries []discoveredCertEntry
-	rest := data
-	for {
-		var block *pem.Block
-		block, rest = pem.Decode(rest)
-		if block == nil {
-			break
-		}
-		if block.Type != "CERTIFICATE" {
-			continue
-		}
-		cert, err := x509.ParseCertificate(block.Bytes)
-		if err != nil {
-			a.logger.Debug("failed to parse certificate in PEM", "path", path, "error", err)
-			continue
-		}
-
-		pemStr := string(pem.EncodeToMemory(block))
-		entries = append(entries, certToEntry(cert, path, "PEM", pemStr))
-	}
-	return entries
-}
-
-// parseDERFile reads a DER-encoded certificate file.
-func (a *Agent) parseDERFile(path string) (discoveredCertEntry, error) {
-	data, err := os.ReadFile(path)
-	if err != nil {
-		return discoveredCertEntry{}, fmt.Errorf("read failed: %w", err)
-	}
-
-	cert, err := x509.ParseCertificate(data)
-	if err != nil {
-		return discoveredCertEntry{}, fmt.Errorf("parse failed: %w", err)
-	}
-
-	// Convert to PEM for storage
-	pemStr := string(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: data}))
-	return certToEntry(cert, path, "DER", pemStr), nil
-}
-
-// certToEntry converts a parsed x509.Certificate into a discoveredCertEntry.
-func certToEntry(cert *x509.Certificate, path, format, pemData string) discoveredCertEntry {
-	// Compute SHA-256 fingerprint
-	fingerprint := fmt.Sprintf("%x", sha256Sum(cert.Raw))
-
-	// Determine key algorithm and size
-	keyAlg, keySize := certKeyInfo(cert)
-
-	return discoveredCertEntry{
-		FingerprintSHA256: fingerprint,
-		CommonName:        cert.Subject.CommonName,
-		SANs:              cert.DNSNames,
-		SerialNumber:      cert.SerialNumber.Text(16),
-		IssuerDN:          cert.Issuer.String(),
-		SubjectDN:         cert.Subject.String(),
-		NotBefore:         cert.NotBefore.UTC().Format(time.RFC3339),
-		NotAfter:          cert.NotAfter.UTC().Format(time.RFC3339),
-		KeyAlgorithm:      keyAlg,
-		KeySize:           keySize,
-		IsCA:              cert.IsCA,
-		PEMData:           pemData,
-		SourcePath:        path,
-		SourceFormat:      format,
-	}
-}
-
-// sha256Sum returns the SHA-256 hash of data.
-func sha256Sum(data []byte) [32]byte {
-	return sha256.Sum256(data)
-}
-
-// certKeyInfo extracts key algorithm name and size from a certificate.
-func certKeyInfo(cert *x509.Certificate) (string, int) {
-	switch pub := cert.PublicKey.(type) {
-	case *ecdsa.PublicKey:
-		return "ECDSA", pub.Curve.Params().BitSize
-	case *rsa.PublicKey:
-		return "RSA", pub.N.BitLen()
-	default:
-		switch cert.PublicKeyAlgorithm {
-		case x509.Ed25519:
-			return "Ed25519", 256
-		default:
-			return cert.PublicKeyAlgorithm.String(), 0
-		}
-	}
-}
-
 func main() {
 	// Parse command-line flags (with env var fallbacks for Docker deployment)
 	serverURL := flag.String("server", getEnvDefault("CERTCTL_SERVER_URL", "https://localhost:8443"), "Control plane server URL (must be https://)")
@@ -0,0 +1,278 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package main
+
+import (
+	"context"
+	"crypto/ecdsa"
+	"crypto/elliptic"
+	"crypto/rand"
+	"crypto/x509"
+	"crypto/x509/pkix"
+	"encoding/json"
+	"encoding/pem"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"path/filepath"
+	"strings"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 12 (2026-05-14): extracted from
+// cmd/agent/main.go via the Option B sibling-file pattern (mirrors
+// the Sprint 8 cmd/server cut). Package stays `main`; all methods
+// are still defined on *Agent so every call site continues to
+// resolve through Go's same-package method-set without any
+// import-path change.
+//
+// This file holds the WORK-POLLING entry point + CSR-job execution
+// — the inbound side of the agent's pull-only deployment model
+// (per CLAUDE.md "Pull-only deployment model" architecture
+// decision):
+//
+//   - pollForWork: queries GET /api/v1/agents/{id}/work each tick;
+//     dispatches each returned JobItem to the appropriate
+//     executor (CSR vs deployment).
+//   - executeCSRJob: handles AwaitingCSR jobs by generating an
+//     ECDSA P-256 key locally, persisting it to keyDir/<certID>.key
+//     with 0600 permissions (key NEVER leaves the agent — see
+//     CLAUDE.md "Agent-based key management"), creating the CSR,
+//     and POSTing it to the control plane for signing.
+//
+// The deployment-job executor lives in deploy.go alongside the
+// target connector factory + deploy-only helpers (splitPEMChain,
+// fetchCertificate). The discovery scan lives in discovery.go.
+
+// pollForWork queries the control plane for actionable jobs and processes them.
+// Jobs may be deployment jobs (Pending) or CSR jobs (AwaitingCSR).
+// GET /api/v1/agents/{agentID}/work
+func (a *Agent) pollForWork(ctx context.Context) {
+	a.logger.Debug("polling for work", "agent_id", a.config.AgentID)
+
+	path := fmt.Sprintf("/api/v1/agents/%s/work", a.config.AgentID)
+	resp, err := a.makeRequest(ctx, http.MethodGet, path, nil)
+	if err != nil {
+		a.logger.Error("work poll failed", "error", err)
+		a.consecutiveFailures++
+		return
+	}
+	defer resp.Body.Close()
+
+	// I-004: same terminal-retirement handling as sendHeartbeat. Work-poll is the
+	// other hot path that can observe an agent's soft-retirement; if the
+	// heartbeat tick happens to fire after a work-poll tick within the same
+	// retirement window, this branch catches it first. markRetired's sync.Once
+	// guards idempotency so racing both paths in the same tick only closes the
+	// signal channel once. No consecutiveFailures increment — retirement is
+	// not a transient failure.
+	if resp.StatusCode == http.StatusGone {
+		body, _ := io.ReadAll(resp.Body)
+		a.markRetired("work_poll", resp.StatusCode, string(body))
+		return
+	}
+
+	if resp.StatusCode != http.StatusOK {
+		body, _ := io.ReadAll(resp.Body)
+		a.logger.Error("work poll rejected",
+			"status", resp.StatusCode,
+			"body", string(body))
+		a.consecutiveFailures++
+		return
+	}
+
+	var workResp WorkResponse
+	if err := json.NewDecoder(resp.Body).Decode(&workResp); err != nil {
+		a.logger.Error("failed to decode work response", "error", err)
+		a.consecutiveFailures++
+		return
+	}
+
+	a.consecutiveFailures = 0
+
+	if workResp.Count == 0 {
+		a.logger.Debug("no pending work")
+		return
+	}
+
+	a.logger.Info("received work", "job_count", workResp.Count)
+
+	// Process each job based on type and status
+	for _, job := range workResp.Jobs {
+		switch {
+		case job.Status == "AwaitingCSR":
+			// Agent keygen mode: generate key locally, create CSR, submit to server
+			a.executeCSRJob(ctx, job)
+		case job.Type == "Deployment":
+			a.executeDeploymentJob(ctx, job)
+		}
+	}
+}
+
+// executeCSRJob handles an AwaitingCSR job: generates a private key locally, creates a CSR,
+// and submits it to the control plane for signing. The private key is stored on the local
+// filesystem with 0600 permissions and NEVER sent to the server.
+//
+// Flow:
+// 1. Generate ECDSA P-256 key pair
+// 2. Store private key to disk (keyDir/<certID>.key) with 0600 permissions
+// 3. Create CSR with common name and SANs from work response
+// 4. Submit CSR to control plane via POST /api/v1/agents/{agentID}/csr
+// 5. Server signs the CSR and creates a cert version + deployment jobs
+func (a *Agent) executeCSRJob(ctx context.Context, job JobItem) {
+	a.logger.Info("executing CSR job (agent-side key generation)",
+		"job_id", job.ID,
+		"certificate_id", job.CertificateID,
+		"common_name", job.CommonName)
+
+	// Step 1: Generate ECDSA P-256 key pair
+	privKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
+	if err != nil {
+		a.logger.Error("failed to generate private key",
+			"job_id", job.ID,
+			"error", err)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key generation failed: %v", err)); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+
+	a.logger.Info("generated ECDSA P-256 key pair locally",
+		"job_id", job.ID,
+		"certificate_id", job.CertificateID)
+
+	// Step 2: Store private key to disk with secure permissions.
+	//
+	// Bundle-9 / Audit L-002 + L-003: marshal+write through helpers that
+	// (a) zeroize the in-heap DER buffer immediately after the PEM block is
+	// constructed so the private scalar's exposure window is bounded by
+	// this function call, and (b) assert the key directory is mode 0700
+	// before any write touches disk. Also defer-clear the PEM buffer for
+	// the same reason: the encoded key is headed to disk anyway, but a
+	// copy would otherwise linger on the heap after this function returns.
+	keyPath := filepath.Join(a.config.KeyDir, job.CertificateID+".key")
+	if err := ensureAgentKeyDirSecure(filepath.Dir(keyPath)); err != nil {
+		a.logger.Error("agent key dir hardening failed", "job_id", job.ID, "error", err)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key dir hardening failed: %v", err)); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+	var privKeyPEM []byte
+	if marshalErr := marshalAgentKeyAndZeroize(privKey, func(der []byte) error {
+		privKeyPEM = pem.EncodeToMemory(&pem.Block{
+			Type:  "EC PRIVATE KEY",
+			Bytes: der,
+		})
+		return nil
+	}); marshalErr != nil {
+		a.logger.Error("failed to marshal private key",
+			"job_id", job.ID,
+			"error", marshalErr)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key marshal failed: %v", marshalErr)); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+	defer clear(privKeyPEM)
+
+	if err := os.WriteFile(keyPath, privKeyPEM, 0600); err != nil {
+		a.logger.Error("failed to write private key to disk",
+			"job_id", job.ID,
+			"key_path", keyPath,
+			"error", err)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("key storage failed: %v", err)); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+
+	a.logger.Info("private key stored securely",
+		"job_id", job.ID,
+		"key_path", keyPath,
+		"permissions", "0600")
+
+	// Validate common name is present
+	if job.CommonName == "" {
+		a.logger.Error("empty common name in CSR job", "job_id", job.ID)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", "empty common name"); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "error", reportErr)
+		}
+		return
+	}
+
+	// Step 3: Create CSR with common name and SANs
+	// Split SANs into DNS names and email addresses for proper CSR encoding
+	var dnsNames []string
+	var emailAddresses []string
+	for _, san := range job.SANs {
+		if strings.Contains(san, "@") {
+			emailAddresses = append(emailAddresses, san)
+		} else {
+			dnsNames = append(dnsNames, san)
+		}
+	}
+
+	csrTemplate := &x509.CertificateRequest{
+		Subject: pkix.Name{
+			CommonName: job.CommonName,
+		},
+		DNSNames:       dnsNames,
+		EmailAddresses: emailAddresses,
+	}
+
+	csrDER, err := x509.CreateCertificateRequest(rand.Reader, csrTemplate, privKey)
+	if err != nil {
+		a.logger.Error("failed to create CSR",
+			"job_id", job.ID,
+			"error", err)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR creation failed: %v", err)); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+
+	csrPEM := string(pem.EncodeToMemory(&pem.Block{
+		Type:  "CERTIFICATE REQUEST",
+		Bytes: csrDER,
+	}))
+
+	// Step 4: Submit CSR to the control plane (the CSR carries only the public key; the private key stays on the agent)
+	a.logger.Info("submitting CSR to control plane",
+		"job_id", job.ID,
+		"certificate_id", job.CertificateID)
+
+	submitPath := fmt.Sprintf("/api/v1/agents/%s/csr", a.config.AgentID)
+	resp, err := a.makeRequest(ctx, http.MethodPost, submitPath, map[string]string{
+		"csr_pem":        csrPEM,
+		"certificate_id": job.CertificateID,
+	})
+	if err != nil {
+		a.logger.Error("failed to submit CSR",
+			"job_id", job.ID,
+			"error", err)
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR submission failed: %v", err)); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusAccepted {
+		body, _ := io.ReadAll(resp.Body)
+		a.logger.Error("CSR submission rejected",
+			"job_id", job.ID,
+			"status", resp.StatusCode,
+			"body", string(body))
+		if reportErr := a.reportJobStatus(ctx, job.ID, "Failed", fmt.Sprintf("CSR rejected: %s", string(body))); reportErr != nil {
+			a.logger.Error("failed to report job status to server", "job_id", job.ID, "status", "Failed", "error", reportErr)
+		}
+		return
+	}
+
+	a.logger.Info("CSR submitted and signed successfully",
+		"job_id", job.ID,
+		"certificate_id", job.CertificateID,
+		"key_path", keyPath)
+}
@@ -1,3 +1,6 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
 package main

 import (
@@ -163,14 +163,79 @@ func TestHandleCerts_Revoke_HitsClientPath(t *testing.T) {
 	}))
 	t.Cleanup(srv.Close)
 	c := newDispatchTestClient(t, srv)
-	if err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "compromise"}); err != nil {
+	// 2026-05-05 parity-defaults-cleanup (P3-2): reason must be a canonical
+	// RFC 5280 §5.3.1 code (camelCase or snake_case both accepted; this
+	// test asserts the snake_case path normalises to the camelCase wire
+	// format that the local issuer + ACME server expect).
+	if err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "key_compromise"}); err != nil {
 		t.Errorf("handleCerts({revoke ...}): err=%v", err)
 	}
 	if lastMethod != "POST" || !strings.Contains(lastPath, "/revoke") {
 		t.Errorf("expected POST .../revoke, got %s %s", lastMethod, lastPath)
 	}
-	if !strings.Contains(lastBody, "compromise") {
-		t.Errorf("expected reason in body, got %q", lastBody)
+	if !strings.Contains(lastBody, "keyCompromise") {
+		t.Errorf("expected normalised reason 'keyCompromise' in body, got %q", lastBody)
+	}
+}
+
+// TestHandleCerts_Revoke_RequiresReason pins the 2026-05-05 parity-defaults-
+// cleanup (P3-2, Option A) strict-reason contract: empty --reason is a
+// fatal error, not a silent fallback to "unspecified".
+func TestHandleCerts_Revoke_RequiresReason(t *testing.T) {
+	srv := stubServer(t, 200, `{}`)
+	c := newDispatchTestClient(t, srv)
+	err := handleCerts(c, []string{"revoke", "mc-x"})
+	if err == nil {
+		t.Fatal("expected error when --reason is omitted; got nil (regression on P3-2 strict path)")
+	}
+	if !strings.Contains(err.Error(), "reason") {
+		t.Errorf("expected error to mention 'reason', got %q", err.Error())
+	}
+}
+
+// TestHandleCerts_Revoke_RejectsUnknownReason pins that off-RFC reason
+// codes are rejected at the CLI dispatch layer (P3-2 anti-typo guard).
+func TestHandleCerts_Revoke_RejectsUnknownReason(t *testing.T) {
+	srv := stubServer(t, 200, `{}`)
+	c := newDispatchTestClient(t, srv)
+	err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "compromise"})
+	if err == nil {
+		t.Fatal("expected error for non-canonical reason; got nil")
+	}
+	if !strings.Contains(err.Error(), "compromise") {
+		t.Errorf("expected error to echo bad reason 'compromise', got %q", err.Error())
+	}
+}
+
+// TestHandleCerts_Renew_ForceFlag pins the 2026-05-05 parity-defaults-
+// cleanup (P3-1) wire: --force on the renew dispatch sends ?force=true.
+// CLI convention: ID is positional and precedes the flags (matches
+// `agents retire <id> [--force]`), so the flag MUST come after the ID.
+func TestHandleCerts_Renew_ForceFlag(t *testing.T) {
+	for _, tc := range []struct {
+		name      string
+		args      []string
+		wantQuery string
+	}{
+		{"no-force", []string{"renew", "mc-x"}, ""},
+		{"force-after-id", []string{"renew", "mc-x", "--force"}, "force=true"},
+	} {
+		t.Run(tc.name, func(t *testing.T) {
+			var lastQuery string
+			srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+				lastQuery = r.URL.RawQuery
+				w.WriteHeader(200)
+				_, _ = w.Write([]byte(`{}`))
+			}))
+			t.Cleanup(srv.Close)
+			c := newDispatchTestClient(t, srv)
+			if err := handleCerts(c, tc.args); err != nil {
+				t.Fatalf("handleCerts: %v", err)
+			}
+			if lastQuery != tc.wantQuery {
+				t.Errorf("query: got %q want %q", lastQuery, tc.wantQuery)
+			}
+		})
 	}
 }

@@ -1,3 +1,6 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
 package main

 import (
@@ -111,6 +114,8 @@ Examples:
 		err = handleEST(client, cmdArgs)
 	case "status":
 		err = handleStatus(client)
+	case "auth":
+		err = handleAuth(client, cmdArgs)
 	case "version":
 		fmt.Println("certctl-cli version 0.1.0")
 	default:
@@ -144,22 +149,70 @@ func handleCerts(client *cli.Client, args []string) error {
 		}
 		return client.GetCertificate(subArgs[0])
 	case "renew":
+		// 2026-05-05 parity-defaults-cleanup (P3-1): expose --force as an
+		// explicit operator flag instead of the historical hardcoded
+		// `force=false` body field. force=true overrides the server-side
+		// RenewalInProgress block — used to recover stuck in-flight
+		// renewals. Archived/Expired remain terminal regardless.
+		//
+		// CLI convention: `certs renew <id> [--force]` — the ID is a
+		// positional arg that precedes the flags. Mirrors `agents retire
+		// <id>`'s pattern (Go's flag package stops at the first non-flag
+		// token, so we pull subArgs[0] as the ID and hand subArgs[1:] to
+		// the flag parser).
 		if len(subArgs) == 0 {
-			fmt.Fprintf(os.Stderr, "usage: certs renew <id>\n")
-			return nil
-		}
-		return client.RenewCertificate(subArgs[0])
-	case "revoke":
-		if len(subArgs) == 0 {
-			fmt.Fprintf(os.Stderr, "usage: certs revoke <id> [--reason <reason>]\n")
+			fmt.Fprintf(os.Stderr, "usage: certs renew <id> [--force]\n")
 			return nil
 		}
 		id := subArgs[0]
-		reason := "unspecified"
-		if len(subArgs) > 2 && subArgs[1] == "--reason" {
-			reason = subArgs[2]
+		fs := flag.NewFlagSet("certs renew", flag.ContinueOnError)
+		force := fs.Bool("force", false, "Force renewal even when the cert is currently in RenewalInProgress (clears stuck in-flight renewals; does NOT override Archived/Expired terminal states)")
+		if err := fs.Parse(subArgs[1:]); err != nil {
+			return err
 		}
-		return client.RevokeCertificate(id, reason)
+		return client.RenewCertificate(id, *force)
+	case "revoke":
+		// 2026-05-05 parity-defaults-cleanup (P3-2, Option A): --reason is
+		// strictly required. Empty reason refuses to dispatch and prints
+		// the RFC 5280 §5.3.1 reason-code menu so operators pick a real
+		// value. The pre-2026-05-05 silent fallback to "unspecified"
+		// defeated compliance reporting (PCI-DSS §3.6, HIPAA §164.312)
+		// because every revocation looked the same in the audit trail.
+		//
+		// CLI convention: `certs revoke <id> --reason <reason>` — same
+		// ID-first ordering as `certs renew`.
+		if len(subArgs) == 0 {
+			fmt.Fprintf(os.Stderr, "usage: certs revoke <id> --reason <reason>\n")
+			fmt.Fprintf(os.Stderr, "\nValid RFC 5280 §5.3.1 reasons:\n")
+			for _, r := range cli.ValidRevokeReasons() {
+				fmt.Fprintf(os.Stderr, "  %s\n", r)
+			}
+			return nil
+		}
+		id := subArgs[0]
+		fs := flag.NewFlagSet("certs revoke", flag.ContinueOnError)
+		reason := fs.String("reason", "", "RFC 5280 revocation reason (required). Valid values: keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, removeFromCRL, privilegeWithdrawn, aaCompromise, unspecified")
+		if err := fs.Parse(subArgs[1:]); err != nil {
+			return err
+		}
+		if *reason == "" {
+			fmt.Fprintf(os.Stderr, "error: --reason is required (no silent fallback to 'unspecified' — pick a real RFC 5280 §5.3.1 code).\n\n")
+			fmt.Fprintf(os.Stderr, "Valid reasons:\n")
+			for _, r := range cli.ValidRevokeReasons() {
+				fmt.Fprintf(os.Stderr, "  %s\n", r)
+			}
+			return fmt.Errorf("--reason is required")
+		}
+		canonical, ok := cli.NormalizeRevokeReason(*reason)
+		if !ok {
+			fmt.Fprintf(os.Stderr, "error: %q is not a valid RFC 5280 §5.3.1 reason code.\n\n", *reason)
+			fmt.Fprintf(os.Stderr, "Valid reasons (camelCase or snake_case both accepted):\n")
+			for _, r := range cli.ValidRevokeReasons() {
+				fmt.Fprintf(os.Stderr, "  %s\n", r)
+			}
+			return fmt.Errorf("invalid --reason: %q", *reason)
+		}
+		return client.RevokeCertificate(id, canonical)
 	case "bulk-revoke":
 		return client.BulkRevokeCertificates(subArgs)
 	default:
@@ -316,3 +369,123 @@ func validateHTTPSScheme(serverURL string) error {
 		return fmt.Errorf("server URL %q uses unsupported scheme %q — expected https://", serverURL, u.Scheme)
 	}
 }
+
+// handleAuth dispatches the `certctl-cli auth ...` subcommand tree.
+// Bundle 1 Phase 5: ships read + grant operations against the
+// /api/v1/auth/* surface introduced in Phase 4. Mutations like role
+// create / update / delete can be added in a Phase 5.5 follow-up; this
+// commit ships the operator-facing subset most useful for migration
+// and day-2 scope-down (`auth keys list` + `auth keys assign` +
+// `auth me`).
+func handleAuth(client *cli.Client, args []string) error {
+	if len(args) == 0 {
+		fmt.Fprintf(os.Stderr, "usage: auth <roles|permissions|keys|me> [...]\n")
+		return nil
+	}
+	subcommand := args[0]
+	subArgs := args[1:]
+
+	switch subcommand {
+	case "roles":
+		return handleAuthRoles(client, subArgs)
+	case "permissions":
+		return handleAuthPermissions(client, subArgs)
+	case "keys":
+		return handleAuthKeys(client, subArgs)
+	case "me":
+		return client.AuthMe()
+	default:
+		fmt.Fprintf(os.Stderr, "unknown auth subcommand: %s\n", subcommand)
+		return nil
+	}
+}
+
+func handleAuthRoles(client *cli.Client, args []string) error {
+	if len(args) == 0 {
+		fmt.Fprintf(os.Stderr, "usage: auth roles <list|get> [id]\n")
+		return nil
+	}
+	switch args[0] {
+	case "list":
+		return client.AuthListRoles()
+	case "get":
+		if len(args) < 2 {
+			fmt.Fprintf(os.Stderr, "usage: auth roles get <id>\n")
+			return nil
+		}
+		return client.AuthGetRole(args[1])
+	default:
+		fmt.Fprintf(os.Stderr, "unknown roles subcommand: %s\n", args[0])
+		return nil
+	}
+}
+
+func handleAuthPermissions(client *cli.Client, args []string) error {
+	if len(args) == 0 || args[0] != "list" {
+		fmt.Fprintf(os.Stderr, "usage: auth permissions list\n")
+		return nil
+	}
+	return client.AuthListPermissions()
+}
+
+func handleAuthKeys(client *cli.Client, args []string) error {
+	if len(args) == 0 {
+		fmt.Fprintf(os.Stderr, "usage: auth keys <list|assign|revoke|scope-down> [...]\n")
+		return nil
+	}
+	switch args[0] {
+	case "list":
+		return client.AuthListKeys()
+	case "assign":
+		// auth keys assign <key-id> --role <role-id>
+		if len(args) < 4 || args[2] != "--role" {
+			fmt.Fprintf(os.Stderr, "usage: auth keys assign <key-id> --role <role-id>\n")
+			return nil
+		}
+		return client.AuthAssignRoleToKey(args[1], args[3])
+	case "revoke":
+		// auth keys revoke <key-id> --role <role-id>
+		if len(args) < 4 || args[2] != "--role" {
+			fmt.Fprintf(os.Stderr, "usage: auth keys revoke <key-id> --role <role-id>\n")
+			return nil
+		}
+		return client.AuthRevokeRoleFromKey(args[1], args[3])
+	case "scope-down":
+		// Bundle 1 Phase 7 — interactive (default), --non-interactive
+		// <config.json>, or --suggest [--apply].
+		return handleAuthKeysScopeDown(client, args[1:])
+	default:
+		fmt.Fprintf(os.Stderr, "unknown keys subcommand: %s\n", args[0])
+		return nil
+	}
+}
+
+// handleAuthKeysScopeDown dispatches the three scope-down modes:
+//
+//	auth keys scope-down                              → interactive
+//	auth keys scope-down --non-interactive <config>   → JSON-driven
+//	auth keys scope-down --suggest [--apply]          → audit-driven suggestions
+func handleAuthKeysScopeDown(client *cli.Client, args []string) error {
+	if len(args) == 0 {
+		return client.AuthScopeDown()
+	}
+	switch args[0] {
+	case "--non-interactive":
+		if len(args) < 2 {
+			fmt.Fprintf(os.Stderr, "usage: auth keys scope-down --non-interactive <config.json>\n")
+			return nil
+		}
+		return client.AuthScopeDownNonInteractive(args[1])
+	case "--suggest":
+		apply := false
+		for _, a := range args[1:] {
+			if a == "--apply" {
+				apply = true
+			}
+		}
+		return client.AuthScopeDownSuggest(apply)
+	default:
+		fmt.Fprintf(os.Stderr, "unknown scope-down flag: %s\n", args[0])
+		return nil
+	}
+}
@@ -1,3 +1,6 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
 package main

 import (
@@ -0,0 +1,108 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package main
+
+import (
+	"context"
+	"fmt"
+	"log/slog"
+	"strings"
+
+	"github.com/certctl-io/certctl/internal/auth"
+	"github.com/certctl-io/certctl/internal/config"
+	"github.com/certctl-io/certctl/internal/domain"
+	authdomain "github.com/certctl-io/certctl/internal/domain/auth"
+)
+
+// assembleNamedAPIKeys translates the operator's CERTCTL_API_KEYS_NAMED
+// env-var (preferred) or CERTCTL_AUTH_SECRET (legacy) into the
+// auth.NamedAPIKey slice the rest of the boot path consumes.
+//
+// Authentication unification (M-002): every authenticated request now
+// carries a named actor in the request context so audit events record
+// the real key identity instead of the hardcoded "api-key-user"
+// string. Named keys come from CERTCTL_API_KEYS_NAMED (preferred). For
+// backward compatibility CERTCTL_AUTH_SECRET is synthesized into
+// legacy-key-N entries with Admin=false.
+func assembleNamedAPIKeys(cfg *config.Config, logger *slog.Logger) []auth.NamedAPIKey {
+	if config.AuthType(cfg.Auth.Type) == config.AuthTypeNone {
+		return nil
+	}
+	var out []auth.NamedAPIKey
+	for _, nk := range cfg.Auth.NamedKeys {
+		out = append(out, auth.NamedAPIKey{
+			Name:  nk.Name,
+			Key:   nk.Key,
+			Admin: nk.Admin,
+		})
+	}
+	if len(out) == 0 && cfg.Auth.Secret != "" {
+		idx := 0
+		for _, p := range strings.Split(cfg.Auth.Secret, ",") {
+			p = strings.TrimSpace(p)
+			if p == "" {
+				continue
+			}
+			out = append(out, auth.NamedAPIKey{
+				Name:  fmt.Sprintf("legacy-key-%d", idx),
+				Key:   p,
+				Admin: false,
+			})
+			idx++
+		}
+		if len(out) > 0 && logger != nil {
+			logger.Warn("CERTCTL_AUTH_SECRET is deprecated — set CERTCTL_API_KEYS_NAMED for named actor attribution and admin gating",
+				"synthesized_keys", len(out))
+		}
+	}
+	return out
+}
+
+// actorRoleGranter is the narrow interface backfillNamedKeyActorRoles
+// needs from the postgres ActorRoleRepository. Pulled out so the unit
+// test can inject a fake without spinning up the full repo / DB.
+type actorRoleGranter interface {
+	Grant(ctx context.Context, ar *authdomain.ActorRole) error
+}
+
+// backfillNamedKeyActorRoles is the Bundle 1 Phase 3 closure (C2)
+// startup hook that ensures every CERTCTL_API_KEYS_NAMED entry — and
+// every legacy CERTCTL_AUTH_SECRET synthesized fallback — has an
+// actor_roles row before the HTTP server accepts requests. Admin-flagged
+// keys grant `r-admin` (full canonical permission set); non-admin keys
+// grant `r-viewer` (read-only surface), matching the pre-Phase-3.5
+// capability shape.
+//
+// Idempotent via ON CONFLICT DO NOTHING in the repo Grant — reboots
+// don't create duplicates. Failures are logged but non-fatal: the server
+// still starts, and the operator can fix the grant via the RBAC API.
+//
+// The function is package-private + extracted from main() so the unit
+// test in auth_backfill_test.go can pin the role-mapping invariant
+// without depending on the full server bootstrap path.
+func backfillNamedKeyActorRoles(
+	ctx context.Context,
+	repo actorRoleGranter,
+	keys []auth.NamedAPIKey,
+	logger *slog.Logger,
+) {
+	for _, nk := range keys {
+		role := authdomain.RoleIDViewer
+		if nk.Admin {
+			role = authdomain.RoleIDAdmin
+		}
+		if err := repo.Grant(ctx, &authdomain.ActorRole{
+			ActorID:   nk.Name,
+			ActorType: authdomain.ActorTypeValue(domain.ActorTypeAPIKey),
+			RoleID:    role,
+			TenantID:  authdomain.DefaultTenantID,
+			GrantedBy: "bootstrap",
+		}); err != nil {
+			if logger != nil {
+				logger.Warn("api-key actor-role backfill failed; key authenticates but RBAC routes will 403 until grant is added via /v1/auth/keys",
+					"key", nk.Name, "role", role, "err", err)
+			}
+		}
+	}
+}
@@ -0,0 +1,116 @@
+package main
+
+import (
+	"context"
+	"errors"
+	"io"
+	"log/slog"
+	"testing"
+
+	"github.com/certctl-io/certctl/internal/auth"
+	authdomain "github.com/certctl-io/certctl/internal/domain/auth"
+)
+
+// fakeGranter is a tiny in-memory stand-in for the postgres ActorRoleRepository
+// — enough surface area for backfillNamedKeyActorRoles to call Grant against.
+type fakeGranter struct {
+	calls []*authdomain.ActorRole
+	err   error
+}
+
+func (f *fakeGranter) Grant(_ context.Context, ar *authdomain.ActorRole) error {
+	f.calls = append(f.calls, ar)
+	return f.err
+}
+
+// TestBackfillNamedKeyActorRoles_RoleMapping pins the Bundle 1 Phase 3
+// closure (C2) invariant: admin-flagged named keys grant r-admin,
+// non-admin keys grant r-viewer, both at TenantID t-default with
+// ActorType APIKey and GrantedBy=bootstrap.
+func TestBackfillNamedKeyActorRoles_RoleMapping(t *testing.T) {
+	repo := &fakeGranter{}
+	logger := slog.New(slog.NewTextHandler(io.Discard, nil))
+
+	keys := []auth.NamedAPIKey{
+		{Name: "alice-admin", Key: "AAA", Admin: true},
+		{Name: "bob-viewer", Key: "BBB", Admin: false},
+		{Name: "carol-admin", Key: "CCC", Admin: true},
+	}
+	backfillNamedKeyActorRoles(context.Background(), repo, keys, logger)
+
+	if len(repo.calls) != 3 {
+		t.Fatalf("Grant call count = %d, want 3", len(repo.calls))
+	}
+	type want struct {
+		actor, role string
+	}
+	wants := []want{
+		{actor: "alice-admin", role: authdomain.RoleIDAdmin},
+		{actor: "bob-viewer", role: authdomain.RoleIDViewer},
+		{actor: "carol-admin", role: authdomain.RoleIDAdmin},
+	}
+	for i, w := range wants {
+		got := repo.calls[i]
+		if got.ActorID != w.actor {
+			t.Errorf("call[%d].ActorID = %q, want %q", i, got.ActorID, w.actor)
+		}
+		if got.RoleID != w.role {
+			t.Errorf("call[%d].RoleID = %q, want %q", i, got.RoleID, w.role)
+		}
+		if got.TenantID != authdomain.DefaultTenantID {
+			t.Errorf("call[%d].TenantID = %q, want %q", i, got.TenantID, authdomain.DefaultTenantID)
+		}
+		if string(got.ActorType) != "APIKey" {
+			t.Errorf("call[%d].ActorType = %q, want APIKey", i, got.ActorType)
+		}
+		if got.GrantedBy != "bootstrap" {
+			t.Errorf("call[%d].GrantedBy = %q, want bootstrap", i, got.GrantedBy)
+		}
+	}
+}
+
+// TestBackfillNamedKeyActorRoles_EmptyKeysIsNoOp confirms the boot path
+// is safe when no named keys are configured (typical for a
+// CERTCTL_AUTH_TYPE=none deploy). No Grant calls; no panic.
+func TestBackfillNamedKeyActorRoles_EmptyKeysIsNoOp(t *testing.T) {
+	repo := &fakeGranter{}
+	logger := slog.New(slog.NewTextHandler(io.Discard, nil))
+	backfillNamedKeyActorRoles(context.Background(), repo, nil, logger)
+	if len(repo.calls) != 0 {
+		t.Errorf("Grant called %d times for empty keys, want 0", len(repo.calls))
+	}
+}
+
+// TestBackfillNamedKeyActorRoles_GrantErrorIsNonFatal confirms the
+// closure invariant that a Grant failure logs a warning and proceeds
+// rather than crashing the server during boot. Subsequent keys still
+// get processed.
+func TestBackfillNamedKeyActorRoles_GrantErrorIsNonFatal(t *testing.T) {
+	repo := &fakeGranter{err: errors.New("simulated DB error")}
+	logger := slog.New(slog.NewTextHandler(io.Discard, nil))
+
+	keys := []auth.NamedAPIKey{
+		{Name: "alice", Key: "A", Admin: true},
+		{Name: "bob", Key: "B", Admin: false},
+	}
+	// Should not panic.
+	backfillNamedKeyActorRoles(context.Background(), repo, keys, logger)
+
+	if len(repo.calls) != 2 {
+		t.Errorf("Grant calls = %d, want 2 (every key processed even when prior Grant errored)", len(repo.calls))
+	}
+}
+
+// TestBackfillNamedKeyActorRoles_NilLoggerIsSafe pins that a nil logger
+// doesn't cause a nil-pointer dereference panic. Belt-and-braces for
+// tests + future call sites that may not have a logger plumbed.
+func TestBackfillNamedKeyActorRoles_NilLoggerIsSafe(t *testing.T) {
+	repo := &fakeGranter{err: errors.New("simulated")}
+	keys := []auth.NamedAPIKey{
+		{Name: "alice", Key: "A", Admin: true},
+	}
+	backfillNamedKeyActorRoles(context.Background(), repo, keys, nil)
+	if len(repo.calls) != 1 {
+		t.Errorf("Grant calls = %d, want 1", len(repo.calls))
+	}
+}
@@ -12,6 +12,7 @@ import (

 	"github.com/certctl-io/certctl/internal/api/middleware"
 	"github.com/certctl-io/certctl/internal/api/router"
+	"github.com/certctl-io/certctl/internal/auth"
 	"github.com/certctl-io/certctl/internal/config"
 	"github.com/certctl-io/certctl/internal/service"
 )
@@ -44,7 +45,7 @@ func TestMain_HealthEndpointBypassesAuth(t *testing.T) {
 	})

 	// Build the handler chain the same way main.go does
-	authMiddleware := middleware.NewAuthWithNamedKeys([]middleware.NamedAPIKey{
+	authMiddleware := auth.NewAuthWithNamedKeys([]auth.NamedAPIKey{
 		{Name: "test", Key: "test-secret-key"},
 	})

@@ -159,7 +160,7 @@ func TestMain_AuthMiddlewareRejectsUnauthorized(t *testing.T) {
 	})

 	// Wrap with auth middleware
-	authMiddleware := middleware.NewAuthWithNamedKeys([]middleware.NamedAPIKey{
+	authMiddleware := auth.NewAuthWithNamedKeys([]auth.NamedAPIKey{
 		{Name: "test", Key: "test-secret-key"},
 	})

@@ -187,7 +188,7 @@ func TestMain_AuthMiddlewareAllowsWithValidKey(t *testing.T) {
 	})

 	// Wrap with auth middleware
-	authMiddleware := middleware.NewAuthWithNamedKeys([]middleware.NamedAPIKey{
+	authMiddleware := auth.NewAuthWithNamedKeys([]auth.NamedAPIKey{
 		{Name: "test", Key: testKey},
 	})

@@ -460,7 +461,7 @@ func TestMain_AuthNoneMode(t *testing.T) {

 	// Wrap with auth middleware in "none" mode
 	// auth=none equivalent: empty named-keys list is a no-op pass-through.
-	authMiddleware := middleware.NewAuthWithNamedKeys(nil)
+	authMiddleware := auth.NewAuthWithNamedKeys(nil)

 	chainedHandler := middleware.Chain(protectedHandler, authMiddleware)

@@ -0,0 +1,209 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package main
+
+import (
+	"database/sql"
+	"log/slog"
+	"os"
+	"strings"
+
+	"github.com/certctl-io/certctl/internal/config"
+	"github.com/certctl-io/certctl/internal/repository/postgres"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 8b (2026-05-14): the deferred half of
+// Sprint 8. Extracts the boot-time migration handling from main()'s
+// inline body into two unexported helpers. Different shape from
+// Sprints 1-7 (data-type relocation) and from Sprint 8a (existing
+// helper-function relocation) — this sprint crosses the
+// behavior-change boundary Sprint 8 first identified.
+//
+// What lives here
+// ===============
+//   parseMigrateOnlyFlag() bool
+//     Hand-parses os.Args for `--migrate-only` (NOT flag.Parse — the
+//     server's config surface is otherwise env-var driven via
+//     config.Load; introducing flag.Parse's global state risks
+//     conflicting with other binaries that may import cmd/server later).
+//
+//   runBootMigrations(cfg, db, logger, migrateOnly) (exitNow bool)
+//     Owns the Phase 4 DEPL-M1 migration-via-hook posture: the
+//     migrationsViaHook env-var read, the RunMigrations + RunSeed
+//     gate, the --migrate-only early-exit signal, and the
+//     CERTCTL_DEMO_SEED demo-overlay branch.
+//
+//     Returns true ONLY when --migrate-only was set and migrations +
+//     seed completed cleanly. The caller (main) translates that to
+//     `return` rather than os.Exit(0) — which is the SOLE intentional
+//     behavior change in this sprint (see below).
+//
+// Behavior preservation contract
+// ==============================
+// Every error path inside runBootMigrations calls os.Exit(1)
+// directly, matching the original inline behavior byte-for-byte
+// (same log message, same exit code, same no-defer-run-on-fatal
+// semantics). The error-path os.Exit(1) is intentional: when
+// migration fails at boot, the server cannot recover, and bailing
+// out without running defers is the original Go-idiomatic shape.
+//
+// The ONE behavior change: the --migrate-only SUCCESS path now
+// returns to main() rather than calling os.Exit(0) inline. This
+// has one observable effect: the `defer db.Close()` registered in
+// main() now runs at clean exit instead of being skipped. That's
+// strictly better hygiene (clean DB connection shutdown vs OS
+// reclaim). The migration work is synchronous + complete before
+// the return; nothing async is left running that db.Close() could
+// truncate.
+//
+// All other paths — the migration log messages, the seed log
+// messages, the migrationsViaHook env-var read order, the
+// RunDemoSeed gating, the per-step success/skip log lines — are
+// byte-identical to the pre-Sprint-8b inline form. Verified via
+// `go test ./cmd/server/... -count=1 -short` (which runs the
+// existing main_test.go assertions through the new call site).
+//
+// Why this is a separate commit
+// =============================
+// Sprint 8a (see git log for the commit) extracted the bottom-of-file
+// helpers + adapter types — pure mechanical relocation that
+// couldn't change runtime semantics. Sprint 8b crosses the boundary
+// where mechanical relocation ends: introducing a new function
+// call frame changes defer scope, panic recovery, and (in this
+// case) the exit semantics for the --migrate-only path. The
+// Phase 9 prompt's "refactor is mechanical relocation; behavior
+// change is a separate concern" rule guards against exactly this
+// shape of risk being landed without a focused review.
+//
+// Splitting Sprint 8a (mechanical) from Sprint 8b (behavior-aware)
+// means the operator's git log shows:
+//   3f1344e8 ... wire.go         — no behavior change possible
+//   <this>   ... migrations.go    — one specific behavior shift,
+//                                   documented + intentional
+//
+// Anyone bisecting a future bug to one of these two commits gets a
+// clean "is it mechanical or did the behavior change" signal.
+
+// parseMigrateOnlyFlag scans os.Args for the `--migrate-only` token
+// and returns true if found. Hand-parsed instead of using flag.Parse
+// because:
+//
+//  1. The server's entire config surface is env-var driven via
+//     config.Load(). flag.Parse() introduces a global package-state
+//     dependency that future binaries importing cmd/server (test
+//     harnesses, CLI tools, embedded variants) would have to
+//     coordinate around.
+//  2. The only flag we care about is the migration-vs-server-lifecycle
+//     toggle; a hand-parser is 6 lines and has no transitive cost.
+//  3. The flag is Helm-pre-install-hook-facing (see
+//     deploy/helm/certctl/templates/migration-job.yaml). Its shape is
+//     pinned by that template, not by anything else; we don't need
+//     flag.Parse's auto-help generation or type coercion.
+//
+// Bare arg match — no `=` value form, no short alias, no override
+// from env. Anyone passing `--migrate-only` ANYWHERE in os.Args[1:]
+// flips the flag on. Matches the original inline behavior exactly.
+func parseMigrateOnlyFlag() bool {
+	for _, arg := range os.Args[1:] {
+		if arg == "--migrate-only" {
+			return true
+		}
+	}
+	return false
+}
+
+// runBootMigrations owns the Phase 4 DEPL-M1 boot-time migration
+// posture. Three lifecycles to support:
+//
+//	(a) Compose / VM / bare-metal: server runs migrations at boot.
+//	    Default behavior — preserved unchanged.
+//	(b) Helm with pre-install/pre-upgrade hook: the migration Job
+//	    runs `certctl-server --migrate-only`, does its work, and
+//	    exits. The server Deployment's pods then start with
+//	    CERTCTL_MIGRATIONS_VIA_HOOK=true set; they see the env
+//	    var and skip their boot-time RunMigrations call so the
+//	    Job's work isn't duplicated.
+//	(c) Bare `certctl-server --migrate-only` invocation (e.g.
+//	    operator running a one-shot migration from the CLI):
+//	    runs migrations + seed and returns true so main returns
+//	    cleanly without starting the HTTP listener / scheduler /
+//	    signing setup.
+//
+// migrateOnly covers case (c) and the hook Job in (b);
+// CERTCTL_MIGRATIONS_VIA_HOOK covers the server pods in (b), which skip
+// both steps. All other paths run the same RunMigrations + RunSeed code.
+//
+// Returns true ONLY when migrateOnly is set; caller (main) handles
+// the clean exit via `return` so deferred cleanup (db.Close) runs.
+// Returns false in every other case — caller continues normal boot.
+// On any migration / seed error: os.Exit(1) inline (matches the
+// pre-extraction shape; recovery is not possible at this boot
+// stage).
+func runBootMigrations(cfg *config.Config, db *sql.DB, logger *slog.Logger, migrateOnly bool) bool {
+	migrationsViaHook := strings.EqualFold(os.Getenv("CERTCTL_MIGRATIONS_VIA_HOOK"), "true")
+
+	if migrateOnly || !migrationsViaHook {
+		logger.Info("running migrations", "path", cfg.Database.MigrationsPath)
+		if err := postgres.RunMigrations(db, cfg.Database.MigrationsPath); err != nil {
+			logger.Error("failed to run migrations", "error", err)
+			os.Exit(1)
+		}
+		logger.Info("migrations completed")
+	} else {
+		logger.Info("skipping migrations at boot (CERTCTL_MIGRATIONS_VIA_HOOK=true — Helm pre-install/pre-upgrade hook owns this work)")
+	}
+
+	// Apply baseline seed data.
+	//
+	// U-3 (P1, cat-u-seed_initdb_schema_drift): pre-U-3 seed.sql was mounted
+	// into postgres `/docker-entrypoint-initdb.d/` alongside a hand-curated
+	// subset of migrations. Adding a migration that introduced a new column
+	// referenced by seed.sql (cat-o-retry_interval_unit_mismatch /
+	// policy_rules.severity / etc.) without also updating the compose volume
+	// mounts caused initdb to crash on first up. Post-U-3 the compose stack
+	// drops all initdb mounts; postgres comes up with empty schema, the
+	// server runs RunMigrations above, then this RunSeed call lands the
+	// baseline data — all from a single source of truth (this binary).
+	// See internal/repository/postgres/db.go::RunSeed for the contract.
+	//
+	// Phase 4 DEPL-M1: same migration-via-hook gating as RunMigrations.
+	// When the hook owns migrations it also owns the seed pass.
+	if migrateOnly || !migrationsViaHook {
+		logger.Info("applying baseline seed", "path", cfg.Database.MigrationsPath)
+		if err := postgres.RunSeed(db, cfg.Database.MigrationsPath); err != nil {
+			logger.Error("failed to apply seed data", "error", err)
+			os.Exit(1)
+		}
+		logger.Info("seed completed")
+	} else {
+		logger.Info("skipping baseline seed at boot (CERTCTL_MIGRATIONS_VIA_HOOK=true — hook applies seed alongside migrations)")
+	}
+
+	// Phase 4 DEPL-M1: --migrate-only early-exit. Migrations + seed are
+	// done; the operator only asked for the migration pass. Signal main
+	// to return cleanly so deferred db.Close runs (Sprint 8b improvement
+	// over the pre-extraction os.Exit(0) which skipped defers).
+	if migrateOnly {
+		logger.Info("--migrate-only: migrations + seed complete; exiting without starting server lifecycle")
+		return true
+	}
+
+	// Apply demo overlay seed when CERTCTL_DEMO_SEED=true. Pre-U-3 the demo
+	// overlay (deploy/docker-compose.demo.yml) mounted seed_demo.sql into
+	// postgres `/docker-entrypoint-initdb.d/`; that broke once U-3 dropped
+	// the initdb migration mounts (the demo seed references tables that
+	// wouldn't exist at initdb time). The runtime path here is the
+	// post-U-3 replacement. Default-off so a vanilla deploy never lands
+	// fake-history rows. See postgres.RunDemoSeed for the contract.
+	if cfg.Database.DemoSeed {
+		logger.Info("applying demo seed (CERTCTL_DEMO_SEED=true)", "path", cfg.Database.MigrationsPath)
+		if err := postgres.RunDemoSeed(db, cfg.Database.MigrationsPath); err != nil {
+			logger.Error("failed to apply demo seed data", "error", err)
+			os.Exit(1)
+		}
+		logger.Info("demo seed completed")
+	}
+
+	return false
+}
@@ -0,0 +1,204 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+//
+// Audit 2026-05-11 A-8 — demo-mode residual-grants detector. Closes the
+// deferred Phase 2 leg of HIGH-12 (cowork/auth-bundles-fixes-2026-05-10/
+// 11-high-12-demo-mode-guard.md). The HIGH-12 closure (`b81588e`) added
+// the fail-closed bind-address guard at config.Validate; the deferred
+// leg here adds a startup-time WARN (or strict refuse-startup) when
+// `actor-demo-anon` has live role grants under a non-`none` auth type.
+//
+// Why this matters: migration 000029 unconditionally seeds the
+// `ar-demo-anon-admin` row granting r-admin to actor-demo-anon. The
+// row is dormant under auth_type=api-key|oidc (the middleware chain
+// never injects the synthetic actor as the request principal), but
+// it represents a security debt: any future regression in the
+// middleware chain (a misrouted CORS preflight, a fallback in a new
+// auth-exempt route) that resolves to actor-demo-anon would re-elevate
+// to admin. The canonical acquisition-readiness narrative — "we have
+// an RBAC primitive with no synthetic-admin fallback" — requires this
+// row to be either gone or explicitly acknowledged.
+
+package main
+
+import (
+	"context"
+	"database/sql"
+	"errors"
+	"fmt"
+	"log/slog"
+	"strings"
+	"time"
+
+	"github.com/certctl-io/certctl/internal/config"
+	"github.com/certctl-io/certctl/internal/domain"
+	authdomain "github.com/certctl-io/certctl/internal/domain/auth"
+	"github.com/certctl-io/certctl/internal/service"
+)
+
+// preflightDemoModeResidual runs after the DB connection is open and
+// the audit service is constructed, before the HTTPS listener starts.
+//
+// Behaviour:
+//   - cfg.Auth.Type == "none" (demo mode): no-op. The residual IS the
+//     runtime state at that auth type.
+//   - cfg.Auth.Type != "none" + no residue: returns nil silently.
+//   - cfg.Auth.Type != "none" + residue + strict=false: emits a WARN
+//     log AND an `auth.demo_residual_grants_detected` audit row
+//     listing the grant IDs, then returns nil.
+//   - cfg.Auth.Type != "none" + residue + strict=true: emits the same
+//     WARN + audit, then returns a non-nil error so the caller can
+//     refuse startup.
+//
+// The audit row's actor is `system` / ActorTypeSystem; category is
+// EventCategoryAuth so audit consumers filtering on auth events see it.
+func preflightDemoModeResidual(
+	ctx context.Context,
+	cfg *config.Config,
+	db *sql.DB,
+	audit *service.AuditService,
+	logger *slog.Logger,
+) error {
+	if cfg.Auth.Type == "none" {
+		// Demo mode itself. The residual is the runtime state at
+		// this auth type, so warning about it would be noise.
+		return nil
+	}
+
+	residue, err := queryDemoAnonResidue(ctx, db)
+	if err != nil {
+		return fmt.Errorf("preflight demo-mode residual: %w", err)
+	}
+	if len(residue) == 0 {
+		return nil
+	}
+
+	formatted := make([]string, 0, len(residue))
+	for _, r := range residue {
+		formatted = append(formatted, r.String())
+	}
+
+	msg := fmt.Sprintf(
+		"production startup warning: actor-demo-anon has %d residual role grant(s) "+
+			"from the migration 000029 baseline or a prior demo-mode run: %s. "+
+			"These grants are DORMANT at the current auth_type (%s) but represent a "+
+			"security debt — any future regression that resolves an unauthenticated "+
+			"request to actor-demo-anon would re-elevate to admin. Clean up via "+
+			"POST /api/v1/auth/demo-residual/cleanup (requires auth.role.assign) or "+
+			"`DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon';`. Set "+
+			"CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true to refuse startup until cleanup.",
+		len(residue), strings.Join(formatted, "; "), cfg.Auth.Type,
+	)
+	if logger != nil {
+		logger.Warn(msg, "auth_type", cfg.Auth.Type, "residue_count", len(residue))
+	} else {
+		slog.Warn(msg)
+	}
+
+	if audit != nil {
+		details := map[string]interface{}{
+			"auth_type":     cfg.Auth.Type,
+			"residue_count": len(residue),
+			"residue":       formatted,
+		}
+		if err := audit.RecordEventWithCategory(
+			ctx, "system", domain.ActorTypeSystem,
+			"auth.demo_residual_grants_detected",
+			domain.EventCategoryAuth,
+			"actor_roles", authdomain.DemoAnonActorID,
+			details,
+		); err != nil {
+			// Don't fail startup over an audit-write error; just log.
+			if logger != nil {
+				logger.Warn("preflight demo-mode residual: audit record failed", "error", err)
+			}
+		}
+	}
+
+	if cfg.Auth.DemoModeResidualStrict {
+		return fmt.Errorf(
+			"startup refused: actor-demo-anon has %d residual role grant(s) and "+
+				"CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true. Remove the rows before restarting",
+			len(residue),
+		)
+	}
+	return nil
+}
+
+// demoAnonResidueRow describes a single live actor_roles row whose
+// actor_id matches the synthetic demo-anon ID.
+type demoAnonResidueRow struct {
+	RoleID    string
+	ScopeType string
+	ScopeID   string
+	GrantedAt time.Time
+}
+
+// String renders one row as `role@scope (granted ts)`. Used both in
+// the WARN log message and in the audit row's residue list.
+func (r demoAnonResidueRow) String() string {
+	scope := r.ScopeType
+	if r.ScopeID != "" {
+		scope = fmt.Sprintf("%s/%s", r.ScopeType, r.ScopeID)
+	}
+	return fmt.Sprintf("%s@%s (granted %s)", r.RoleID, scope, r.GrantedAt.UTC().Format(time.RFC3339))
+}
+
+// queryDemoAnonResidue runs the canonical query for the residue
+// detector + the cleanup endpoint. Kept in one place so the two
+// surfaces can't drift on which rows count as "live".
+//
+// "Live" = not expired. Rows with expires_at <= NOW() are treated
+// as already gone (they have no effect even if the actor were to be
+// injected as the principal).
+func queryDemoAnonResidue(ctx context.Context, db *sql.DB) ([]demoAnonResidueRow, error) {
+	if db == nil {
+		return nil, errors.New("db is nil")
+	}
+	rows, err := db.QueryContext(ctx, `
+		SELECT role_id, scope_type, COALESCE(scope_id, '') AS scope_id, granted_at
+		FROM actor_roles
+		WHERE actor_id = $1
+		  AND (expires_at IS NULL OR expires_at > NOW())
+		ORDER BY granted_at ASC, role_id ASC, scope_type ASC, COALESCE(scope_id, '') ASC
+	`, authdomain.DemoAnonActorID)
+	if err != nil {
+		return nil, fmt.Errorf("query actor_roles: %w", err)
+	}
+	defer rows.Close()
+
+	var out []demoAnonResidueRow
+	for rows.Next() {
+		var r demoAnonResidueRow
+		if err := rows.Scan(&r.RoleID, &r.ScopeType, &r.ScopeID, &r.GrantedAt); err != nil {
+			return nil, fmt.Errorf("scan actor_roles row: %w", err)
+		}
+		out = append(out, r)
+	}
+	if err := rows.Err(); err != nil {
+		return nil, fmt.Errorf("iterate actor_roles rows: %w", err)
+	}
+	return out, nil
+}
+
+// deleteDemoAnonResidue removes every actor_roles row for the synthetic
+// demo-anon actor, expired rows included (no expires_at filter). Returns
+// the count removed. Used by the POST /api/v1/auth/demo-residual/cleanup
+// handler. Idempotent: a follow-up call returns 0.
+func deleteDemoAnonResidue(ctx context.Context, db *sql.DB) (int64, error) {
+	if db == nil {
+		return 0, errors.New("db is nil")
+	}
+	res, err := db.ExecContext(ctx, `
+		DELETE FROM actor_roles
+		WHERE actor_id = $1
+	`, authdomain.DemoAnonActorID)
+	if err != nil {
+		return 0, fmt.Errorf("delete actor_roles: %w", err)
+	}
+	n, err := res.RowsAffected()
+	if err != nil {
+		return 0, fmt.Errorf("rows affected: %w", err)
+	}
+	return n, nil
+}
@@ -0,0 +1,295 @@
+package main
+
+import (
+	"context"
+	"database/sql"
+	"fmt"
+	"log/slog"
+	"os"
+	"path/filepath"
+	"runtime"
+	"strings"
+	"sync"
+	"testing"
+	"time"
+
+	_ "github.com/lib/pq"
+	"github.com/testcontainers/testcontainers-go"
+	"github.com/testcontainers/testcontainers-go/wait"
+
+	"github.com/certctl-io/certctl/internal/config"
+	"github.com/certctl-io/certctl/internal/repository/postgres"
+	"github.com/certctl-io/certctl/internal/service"
+)
+
+// Audit 2026-05-11 A-8 — preflight + cleanup regression tests for the
+// demo-mode residual-grants detector. Testcontainers-backed because the
+// preflight runs raw SQL against actor_roles; mock-DB-only would not
+// catch a SQL-shape regression. Gated by testing.Short() to keep the
+// fast loop fast (matching internal/repository/postgres/* pattern).
+
+var (
+	a8DBOnce sync.Once
+	a8DB     *sql.DB
+	a8Skip   bool
+	a8SkipMu sync.Mutex
+)
+
+func setupA8DB(t *testing.T) *sql.DB {
+	t.Helper()
+	if testing.Short() {
+		t.Skip("preflight A-8 test requires Postgres (testcontainers); skipping under -short")
+	}
+	a8DBOnce.Do(func() {
+		ctx := context.Background()
+		req := testcontainers.ContainerRequest{
+			Image:        "postgres:16-alpine",
+			ExposedPorts: []string{"5432/tcp"},
+			Env: map[string]string{
+				"POSTGRES_DB":       "certctl_test_a8",
+				"POSTGRES_USER":     "certctl",
+				"POSTGRES_PASSWORD": "certctl",
+			},
+			WaitingFor: wait.ForLog("database system is ready to accept connections").WithOccurrence(2),
+		}
+		c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
+			ContainerRequest: req,
+			Started:          true,
+		})
+		if err != nil {
+			a8SkipMu.Lock()
+			a8Skip = true
+			a8SkipMu.Unlock()
+			t.Logf("skipping A-8 testcontainers preflight (docker unavailable): %v", err)
+			return
+		}
+		host, err := c.Host(ctx)
+		if err != nil {
+			t.Fatalf("get container host: %v", err)
+		}
+		port, err := c.MappedPort(ctx, "5432")
+		if err != nil {
+			t.Fatalf("get mapped port: %v", err)
+		}
+		dsn := fmt.Sprintf("postgres://certctl:certctl@%s:%s/certctl_test_a8?sslmode=disable", host, port.Port())
+
+		db, err := sql.Open("postgres", dsn)
+		if err != nil {
+			t.Fatalf("sql.Open: %v", err)
+		}
+		// Run all migrations so actor_roles exists with the migration
+		// 000029 seed row (`ar-demo-anon-admin`).
+		_, thisFile, _, _ := runtime.Caller(0)
+		migrationsDir := filepath.Join(filepath.Dir(thisFile), "..", "..", "migrations")
+		if _, err := os.Stat(migrationsDir); err != nil {
+			t.Fatalf("locate migrations dir %q: %v", migrationsDir, err)
+		}
+		if err := postgres.RunMigrations(db, migrationsDir); err != nil {
+			t.Fatalf("RunMigrations: %v", err)
+		}
+		a8DB = db
+	})
+
+	a8SkipMu.Lock()
+	skip := a8Skip
+	a8SkipMu.Unlock()
+	if skip {
+		t.Skip("A-8 testcontainers unavailable; skipping")
+	}
+	return a8DB
+}
+
+// resetA8Residue clears the actor_roles rows for actor-demo-anon and,
+// when seedBaseline is true, re-inserts the migration 000029 baseline.
+// Used by tests that need a known actor_roles state for that actor.
+func resetA8Residue(t *testing.T, db *sql.DB, seedBaseline bool) {
+	t.Helper()
+	if _, err := db.ExecContext(context.Background(),
+		`DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon'`); err != nil {
+		t.Fatalf("reset actor_roles: %v", err)
+	}
+	if seedBaseline {
+		if _, err := db.ExecContext(context.Background(), `
+			INSERT INTO actor_roles (id, actor_id, actor_type, role_id, granted_at, granted_by, tenant_id)
+			VALUES ('ar-demo-anon-admin', 'actor-demo-anon', 'Anonymous', 'r-admin', NOW(), 'system', 't-default')
+		`); err != nil {
+			t.Fatalf("reseed baseline: %v", err)
+		}
+	}
+}
+
+// TestPreflightDemoModeResidual_DemoModeActive_Skips proves the
+// preflight short-circuits when Auth.Type=none regardless of residue.
+// Demo mode IS the active runtime state at that auth type, so warning
+// would be noise.
+func TestPreflightDemoModeResidual_DemoModeActive_Skips(t *testing.T) {
+	db := setupA8DB(t)
+	resetA8Residue(t, db, true) // baseline IS present
+
+	cfg := &config.Config{}
+	cfg.Auth.Type = "none"
+	cfg.Auth.DemoModeResidualStrict = true // would refuse if checked
+
+	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
+	err := preflightDemoModeResidual(context.Background(), cfg, db, nil, logger)
+	if err != nil {
+		t.Fatalf("expected nil under Auth.Type=none, got %v", err)
+	}
+}
+
+// TestPreflightDemoModeResidual_NoResidue_Passes proves a fully-clean
+// actor_roles state passes without WARN.
+func TestPreflightDemoModeResidual_NoResidue_Passes(t *testing.T) {
+	db := setupA8DB(t)
+	resetA8Residue(t, db, false) // explicitly empty
+
+	cfg := &config.Config{}
+	cfg.Auth.Type = "api-key"
+
+	err := preflightDemoModeResidual(context.Background(), cfg, db, nil, nil)
+	if err != nil {
+		t.Fatalf("expected nil with empty residue, got %v", err)
+	}
+}
+
+// TestPreflightDemoModeResidual_HasResidue_LogsAndAudits proves the
+// migration 000029 baseline produces a WARN + audit row but does NOT
+// fail startup in default (non-strict) mode.
+func TestPreflightDemoModeResidual_HasResidue_LogsAndAudits(t *testing.T) {
+	db := setupA8DB(t)
+	resetA8Residue(t, db, true)
+
+	cfg := &config.Config{}
+	cfg.Auth.Type = "api-key"
+	cfg.Auth.DemoModeResidualStrict = false
+
+	auditRepo := postgres.NewAuditRepository(db)
+	auditService := service.NewAuditService(auditRepo)
+
+	err := preflightDemoModeResidual(context.Background(), cfg, db, auditService, nil)
+	if err != nil {
+		t.Fatalf("non-strict mode must NOT fail startup with residue, got %v", err)
+	}
+
+	// Audit row should be present for the call.
+	rows, err := db.QueryContext(context.Background(), `
+		SELECT action, event_category, resource_id
+		FROM audit_events
+		WHERE action = 'auth.demo_residual_grants_detected'
+		ORDER BY occurred_at DESC LIMIT 1
+	`)
+	if err != nil {
+		t.Fatalf("audit_events query: %v", err)
+	}
+	defer rows.Close()
+	if !rows.Next() {
+		t.Fatal("expected at least one auth.demo_residual_grants_detected row")
+	}
+	var action, category, resourceID string
+	if err := rows.Scan(&action, &category, &resourceID); err != nil {
+		t.Fatalf("scan: %v", err)
+	}
+	if action != "auth.demo_residual_grants_detected" {
+		t.Errorf("action = %q, want auth.demo_residual_grants_detected", action)
+	}
+	if category != "auth" {
+		t.Errorf("event_category = %q, want auth", category)
+	}
+	if resourceID != "actor-demo-anon" {
+		t.Errorf("resource_id = %q, want actor-demo-anon", resourceID)
+	}
+}
+
+// TestPreflightDemoModeResidual_StrictMode_RefusesStartup proves the
+// flag pivots WARN → fail.
+func TestPreflightDemoModeResidual_StrictMode_RefusesStartup(t *testing.T) {
+	db := setupA8DB(t)
+	resetA8Residue(t, db, true)
+
+	cfg := &config.Config{}
+	cfg.Auth.Type = "api-key"
+	cfg.Auth.DemoModeResidualStrict = true
+
+	err := preflightDemoModeResidual(context.Background(), cfg, db, nil, nil)
+	if err == nil {
+		t.Fatal("strict mode + residue: expected error, got nil")
+	}
+	if !strings.Contains(err.Error(), "actor-demo-anon") {
+		t.Errorf("err = %q, want mention of actor-demo-anon", err.Error())
+	}
+	if !strings.Contains(err.Error(), "CERTCTL_DEMO_MODE_RESIDUAL_STRICT") {
+		t.Errorf("err = %q, want mention of CERTCTL_DEMO_MODE_RESIDUAL_STRICT", err.Error())
+	}
+}
+
+// TestDemoAnonResidueRow_String pins the formatting of the residue
+// detail entry — used both in the WARN log AND the audit row's
+// `residue` slice. Two cases: NULL scope_id (global scope) and
+// non-empty scope_id (profile/issuer scope).
+func TestDemoAnonResidueRow_String(t *testing.T) {
+	ts, err := time.Parse(time.RFC3339, "2026-05-11T12:34:56Z")
+	if err != nil {
+		t.Fatalf("parse fixture timestamp: %v", err)
+	}
+	cases := []struct {
+		name string
+		r    demoAnonResidueRow
+		want string
+	}{
+		{
+			name: "global_scope",
+			r:    demoAnonResidueRow{RoleID: "r-admin", ScopeType: "global", ScopeID: "", GrantedAt: ts},
+			want: "r-admin@global (granted 2026-05-11T12:34:56Z)",
+		},
+		{
+			name: "scoped",
+			r:    demoAnonResidueRow{RoleID: "r-operator", ScopeType: "profile", ScopeID: "p-prod", GrantedAt: ts},
+			want: "r-operator@profile/p-prod (granted 2026-05-11T12:34:56Z)",
+		},
+	}
+	for _, c := range cases {
+		c := c
+		t.Run(c.name, func(t *testing.T) {
+			got := c.r.String()
+			if got != c.want {
+				t.Errorf("String() = %q, want %q", got, c.want)
+			}
+		})
+	}
+}
+
+// TestDeleteDemoAnonResidue_Idempotent proves the cleanup helper is
+// idempotent: a second call after a successful first call deletes
+// nothing and returns 0.
+func TestDeleteDemoAnonResidue_Idempotent(t *testing.T) {
+	db := setupA8DB(t)
+	resetA8Residue(t, db, true)
+
+	n, err := deleteDemoAnonResidue(context.Background(), db)
+	if err != nil {
+		t.Fatalf("first delete: %v", err)
+	}
+	if n < 1 {
+		t.Fatalf("first delete: count = %d, want >= 1", n)
+	}
+
+	n, err = deleteDemoAnonResidue(context.Background(), db)
+	if err != nil {
+		t.Fatalf("second delete: %v", err)
+	}
+	if n != 0 {
+		t.Errorf("second delete (idempotent): count = %d, want 0", n)
+	}
+}
+
+// TestQueryDemoAnonResidue_NilDB pins the nil-safety contract.
+func TestQueryDemoAnonResidue_NilDB(t *testing.T) {
+	_, err := queryDemoAnonResidue(context.Background(), nil)
+	if err == nil {
+		t.Fatal("expected error on nil db, got nil")
+	}
+}
+
+// TestDeleteDemoAnonResidue_NilDB pins the nil-safety contract.
+func TestDeleteDemoAnonResidue_NilDB(t *testing.T) {
+	_, err := deleteDemoAnonResidue(context.Background(), nil)
+	if err == nil {
+		t.Fatal("expected error on nil db, got nil")
+	}
+}
@@ -1,3 +1,6 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
 package main

 import (
@@ -0,0 +1,758 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package main
+
+import (
+	"context"
+	"crypto"
+	"crypto/tls"
+	"crypto/x509"
+	"encoding/pem"
+	"fmt"
+	"log/slog"
+	"net/http"
+	"os"
+	"strings"
+	"time"
+
+	"github.com/certctl-io/certctl/internal/api/handler"
+	oidcdomain "github.com/certctl-io/certctl/internal/auth/oidc/domain"
+	"github.com/certctl-io/certctl/internal/auth/session"
+	userdomain "github.com/certctl-io/certctl/internal/auth/user/domain"
+	"github.com/certctl-io/certctl/internal/domain"
+	authdomainAlias "github.com/certctl-io/certctl/internal/domain/auth"
+	"github.com/certctl-io/certctl/internal/repository"
+	"github.com/certctl-io/certctl/internal/repository/postgres"
+	"github.com/certctl-io/certctl/internal/scep/intune"
+	"github.com/certctl-io/certctl/internal/service"
+	authsvc "github.com/certctl-io/certctl/internal/service/auth"
+	"github.com/certctl-io/certctl/internal/trustanchor"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 8 (2026-05-14): extracted from
+// cmd/server/main.go. Different shape from the config.go cuts —
+// the move is by FUNCTIONAL CONCERN (boot-time preflight + DI
+// adapter wiring), not by TYPE FAMILY.
+//
+// Sprint 8 ships TWO of the three files the Phase 9 prompt names:
+//   - main.go      — entrypoint (unchanged; what's left after the cut)
+//   - wire.go      — this file (DI assembly: preflight helpers +
+//                    adapter types that bridge package boundaries)
+//
+// The third file the prompt names — migrations.go — is NOT in this
+// commit. See "What's NOT in this sprint" below for the deferral
+// rationale.
+//
+// What lives here
+// ===============
+// Eight preflight + DI helper functions:
+//   - preflightSCEPChallengePassword   (H-2 fix: SCEP needs non-empty
+//                                       shared secret if enabled)
+//   - preflightSCEPMTLSTrustBundle     (SCEP Phase 6.5: per-profile
+//                                       mTLS CA bundle validation)
+//   - preflightESTMTLSClientCATrustBundle (EST Phase 2.5: same shape,
+//                                       returns SIGHUP-reloadable
+//                                       *trustanchor.Holder)
+//   - preflightSCEPIntuneTrustAnchor   (SCEP Phase 8.2: Intune
+//                                       Connector signing-cert bundle)
+//   - loadSCEPRAPair                   (post-preflight cert+key load)
+//   - preflightSCEPRACertKey           (RA cert/key validation: file
+//                                       mode 0600, cert+key match,
+//                                       NotAfter, RSA-or-ECDSA alg)
+//   - preflightEnrollmentIssuer        (L-005: EST/SCEP issuer can
+//                                       serve GetCACertPEM)
+//   - buildFinalHandler                (M-001 option D: HTTP dispatch
+//                                       wrapper routing
+//                                       authenticated vs no-auth
+//                                       chains by URL prefix)
+//
+// Five adapter types that bridge package boundaries (avoid import
+// cycles between internal/auth, internal/service/auth,
+// internal/api/handler, internal/auth/oidc, internal/auth/session,
+// internal/auth/breakglass):
+//   - authPermissionCheckerAdapter      (typed-string → plain-string
+//                                        auth.PermissionChecker
+//                                        interface)
+//   - authCheckResolverAdapter          (postgres ActorRoleRepository
+//                                        → handler.AuthCheckResolver)
+//   - sessionMinterAdapter              (session.Service → OIDC
+//                                        SessionMinter port)
+//   - breakglassSessionMinterAdapter    (session.Service → breakglass
+//                                        SessionMinter port + audit
+//                                        2026-05-10 HIGH-1 revoke-all)
+//   - oidcProvidersListAdapter          (postgres OIDCProviderRepository
+//                                        → handler.OIDCProvidersListResolver
+//                                        with MED-9 enabled-filter)
+//
+// Plus the silenceUnusedImports var-block that pins
+// oidcdomain.OIDCProvider as a load-bearing reference (the adapter
+// types use *userdomain.User and repository.OIDCProviderRepository
+// indirectly; oidcdomain.OIDCProvider isn't named in any function
+// signature here but is part of the Phase 3 SessionMinter contract).
+//
+// What's NOT in this sprint (and why)
+// ===================================
+// migrations.go is deferred. The Phase 9 prompt asks for three files:
+// main.go (entrypoint) + wire.go (this file) + migrations.go (boot-
+// time migration handling). The migration code (Phase 4 DEPL-M1
+// --migrate-only flag handling + RunMigrations + RunSeed call +
+// CERTCTL_MIGRATIONS_VIA_HOOK gating) lives INLINE inside the 2300-
+// line main() function — lines ~59-264 in the original — not as a
+// standalone helper.
+//
+// Extracting it into a migrations.go would require:
+//   1. Creating a new unexported function (e.g.,
+//      runMigrations(ctx, cfg, db, logger) error) that consolidates
+//      lines ~71-77 (--migrate-only parse) + ~199-248 (the migration
+//      branch + --migrate-only early-exit) + ~250-264 (the demo
+//      overlay seed branch).
+//   2. Replacing the inline block in main() with a single call.
+//   3. Threading the early-exit semantics out (os.Exit(0) vs return
+//      "migration done" sentinel error vs a third option) so main's
+//      defer ordering doesn't change.
+//
+// That's behavior-change territory — a new function call frame, a
+// new defer scope, error-handling pattern shift. Different risk
+// shape from the pure-data type relocations Sprints 1-7 did. The
+// Phase 9 prompt says "Do NOT change exported type signatures; the
+// refactor is mechanical relocation; behavior change is a separate
+// concern." Extracting an inline block from main() into a new
+// function is the same shape of risk that rule was guarding against.
+//
+// Recommended path for the migrations.go cut:
+//   - Land it as a separate, smaller PR with its own review focus
+//     (the runMigrations function shape, the early-exit semantics,
+//     unit tests for the new function via the existing main_test.go
+//     fixture). The infrastructure for the PR exists today; only
+//     the operator's go-ahead on the behavior-change risk is needed.
+//   - Estimated impact: another ~80-120 LOC out of main.go (the
+//     migration + seed + early-exit block) into a new migrations.go.
+//   - Phase 4's --migrate-only code path already runs through this
+//     code section, so the extracted function should reproduce that
+//     exact flow without behavior change beyond the call-frame
+//     introduction.
+//
+// Public-surface invariant
+// ========================
+// The moved helpers + adapter types are all in package `main`
+// (which Go cannot expose to external importers). No exported
+// surface changes. The reorganization is invisible outside
+// cmd/server/. Same-package callers in main.go (preflight*
+// invocations, adapter instantiation) resolve via the package
+// symbol table without modification.
+
+// preflightSCEPChallengePassword enforces the H-2 fix: if SCEP is enabled, a
+// non-empty challenge password MUST be configured. Returns a non-nil error
+// otherwise so the caller can refuse to start the control plane (CWE-306,
+// missing authentication for a critical function).
+//
+// This helper is extracted so the check can be unit tested without booting
+// the full server. The caller (main) is responsible for translating the
+// returned error into a structured log line and os.Exit(1).
+func preflightSCEPChallengePassword(enabled bool, challengePassword string) error {
+	if !enabled {
+		return nil
+	}
+	if challengePassword == "" {
+		return fmt.Errorf("SCEP enabled but CERTCTL_SCEP_CHALLENGE_PASSWORD is empty: " +
+			"SCEP enrollment would accept any client (CWE-306); " +
+			"configure a non-empty shared secret or set CERTCTL_SCEP_ENABLED=false")
+	}
+	return nil
+}
+
+// preflightSCEPMTLSTrustBundle validates a per-profile mTLS client-CA
+// trust bundle. SCEP RFC 8894 + Intune master bundle Phase 6.5.
+//
+// Mirrors preflightSCEPRACertKey's no-op-when-disabled pattern; otherwise
+// the checks are:
+//
+//  1. Path is non-empty (the Validate() refuse covers this too, but
+//     preflight reports the specific failure with an actionable error
+//     string + os.Exit(1) at the call site).
+//  2. File exists + readable.
+//  3. PEM-decodes to ≥1 CERTIFICATE block.
+//  4. None of the bundled certs is past NotAfter — an expired trust
+//     anchor would silently reject every client cert at runtime.
+//
+// On success, returns the parsed *x509.CertPool ready to inject into the
+// per-profile SCEPHandler via SetMTLSTrustPool. Each bundled cert also
+// contributes to the union pool that backs the TLS-layer
+// VerifyClientCertIfGiven.
+func preflightSCEPMTLSTrustBundle(enabled bool, bundlePath string) (*x509.CertPool, error) {
+	if !enabled {
+		return nil, nil
+	}
+	if bundlePath == "" {
+		return nil, fmt.Errorf("MTLS enabled but trust bundle path empty: " +
+			"set CERTCTL_SCEP_PROFILE_<NAME>_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH to a PEM file " +
+			"containing the bootstrap-CA certs the operator allows to enroll")
+	}
+	body, err := os.ReadFile(bundlePath)
+	if err != nil {
+		return nil, fmt.Errorf("read MTLS trust bundle: %w (path=%s)", err, bundlePath)
+	}
+	pool := x509.NewCertPool()
+	rest := body
+	count := 0
+	now := time.Now()
+	for {
+		var block *pem.Block
+		block, rest = pem.Decode(rest)
+		if block == nil {
+			break
+		}
+		if block.Type != "CERTIFICATE" {
+			continue
+		}
+		cert, err := x509.ParseCertificate(block.Bytes)
+		if err != nil {
+			return nil, fmt.Errorf("parse MTLS trust bundle cert: %w (path=%s)", err, bundlePath)
+		}
+		if now.After(cert.NotAfter) {
+			return nil, fmt.Errorf("MTLS trust bundle cert expired at %s (subject=%q, path=%s) — replace before restart",
+				cert.NotAfter.Format(time.RFC3339), cert.Subject.CommonName, bundlePath)
+		}
+		pool.AddCert(cert)
+		count++
+	}
+	if count == 0 {
+		return nil, fmt.Errorf("MTLS trust bundle contained no CERTIFICATE PEM blocks (path=%s)", bundlePath)
+	}
+	return pool, nil
+}
+
+// preflightESTMTLSClientCATrustBundle validates a per-profile EST mTLS
+// client-CA trust bundle and returns a SIGHUP-reloadable holder.
+//
+// EST RFC 7030 hardening master bundle Phase 2.5.
+//
+// Mirrors preflightSCEPMTLSTrustBundle's checks (file exists, parses as
+// PEM, ≥1 cert, none expired) but returns a *trustanchor.Holder rather
+// than a raw *x509.CertPool — the EST handler stores the holder so a
+// SIGHUP rotates the trust bundle live without a server restart, exactly
+// the way the Intune trust anchor rotation works (Phase 8.5 of the SCEP
+// bundle). The handler-side .Pool() accessor on the holder rebuilds an
+// x509.CertPool from the current snapshot for each Verify call.
+//
+// Uses the shared internal/trustanchor.LoadBundle (extracted in EST
+// hardening Phase 2.1 from the original Intune-only path) so the EST
+// + Intune callers exercise the same loader semantics — empty bundle
+// rejected, expired cert rejected with subject in error message,
+// non-CERTIFICATE PEM blocks tolerated.
+func preflightESTMTLSClientCATrustBundle(enabled bool, pathID, bundlePath string, logger *slog.Logger) (*trustanchor.Holder, error) {
+	if !enabled {
+		return nil, nil
+	}
+	if bundlePath == "" {
+		return nil, fmt.Errorf("EST profile (PathID=%q) MTLS enabled but trust bundle path empty: "+
+			"set CERTCTL_EST_PROFILE_<NAME>_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH to a PEM file "+
+			"containing the bootstrap-CA certs the operator allows to enroll", pathID)
+	}
+	holder, err := trustanchor.New(bundlePath, logger)
+	if err != nil {
+		return nil, fmt.Errorf("EST profile (PathID=%q) MTLS trust bundle preflight: %w", pathID, err)
+	}
+	holder.SetLabelForLog(fmt.Sprintf("EST mTLS client CA bundle (PathID=%q)", pathID))
+	return holder, nil
+}
+
+// preflightSCEPIntuneTrustAnchor validates a per-profile Microsoft Intune
+// Certificate Connector signing-cert trust bundle.
+//
+// SCEP RFC 8894 + Intune master bundle Phase 8.2.
+//
+// No-op when this profile has Intune disabled (the common case for
+// non-Intune SCEP deploys). When enabled:
+//
+//  1. Path is non-empty (Validate() refuse covers this too; we re-check
+//     here so the caller can os.Exit(1) with the specific PathID in the
+//     log line).
+//  2. File exists + readable.
+//  3. PEM-decodes to ≥1 CERTIFICATE block (intune.LoadTrustAnchor enforces
+//     this and skips non-CERTIFICATE blocks like accidentally-pasted
+//     priv-key blocks).
+//  4. None of the bundled certs is past NotAfter — an expired Intune
+//     trust anchor would silently reject every Connector challenge at
+//     runtime, which is a much worse failure mode than failing fast at
+//     boot. intune.LoadTrustAnchor enforces this and surfaces the subject
+//     CN in the error message so the operator knows which cert to rotate.
+//
+// On success returns the freshly-built *intune.TrustAnchorHolder ready to
+// inject into the per-profile SCEPService via SetIntuneIntegration. The
+// holder also installs the SIGHUP watcher (started by the caller).
+func preflightSCEPIntuneTrustAnchor(enabled bool, pathID, path string, logger *slog.Logger) (*intune.TrustAnchorHolder, error) {
+	if !enabled {
+		return nil, nil
+	}
+	// pathIDLabel renders the empty-string PathID as "<root>" so the
+	// operator's boot-log error doesn't read like a missing variable.
+	pathIDLabel := pathID
+	if pathIDLabel == "" {
+		pathIDLabel = "<root>"
+	}
+	if path == "" {
+		return nil, fmt.Errorf("SCEP profile (PathID=%q) INTUNE enabled but trust anchor path empty: "+
+			"set CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CONNECTOR_CERT_PATH to a PEM bundle "+
+			"of the Microsoft Intune Certificate Connector's signing certs", pathIDLabel)
+	}
+	holder, err := intune.NewTrustAnchorHolder(path, logger)
+	if err != nil {
+		return nil, fmt.Errorf("SCEP profile (PathID=%q) INTUNE trust anchor load failed: %w (path=%s)", pathIDLabel, err, path)
+	}
+	return holder, nil
+}
+
+// loadSCEPRAPair reads the RA cert PEM + key PEM and returns the parsed
+// x509.Certificate + crypto.PrivateKey ready for the SCEP handler's RFC
+// 8894 path. Called AFTER preflightSCEPRACertKey passed; failures here
+// indicate a TOCTOU race or a filesystem change between preflight and
+// the load (rare).
+//
+// Cert PEM may carry a chain; the RA cert must come first because
+// tls.X509KeyPair matches the key against the first CERTIFICATE block.
+// We use that FIRST block, matching the RFC 8894 §3.5.1 single-cert
+// convention for the GetCACert response.
+func loadSCEPRAPair(certPath, keyPath string) (*x509.Certificate, crypto.PrivateKey, error) {
+	certPEM, err := os.ReadFile(certPath)
+	if err != nil {
+		return nil, nil, fmt.Errorf("read RA cert: %w", err)
+	}
+	keyPEM, err := os.ReadFile(keyPath)
+	if err != nil {
+		return nil, nil, fmt.Errorf("read RA key: %w", err)
+	}
+	pair, err := tls.X509KeyPair(certPEM, keyPEM)
+	if err != nil {
+		return nil, nil, fmt.Errorf("parse RA pair: %w", err)
+	}
+	if len(pair.Certificate) == 0 {
+		return nil, nil, fmt.Errorf("RA cert PEM contained no certificate blocks")
+	}
+	leaf, err := x509.ParseCertificate(pair.Certificate[0])
+	if err != nil {
+		return nil, nil, fmt.Errorf("parse RA cert: %w", err)
+	}
+	return leaf, pair.PrivateKey, nil
+}
+
+// preflightSCEPRACertKey validates the RA cert/key pair the RFC 8894 SCEP
+// path requires. Mirrors preflightSCEPChallengePassword's no-op-when-disabled
+// pattern; otherwise the checks are:
+//
+//  1. Both paths are non-empty (the Validate() refuse covers this too,
+//     but preflight reports the specific failure mode + os.Exit(1) so the
+//     operator sees a clear log line in addition to the config error).
+//  2. The key file grants no group or other permission bits (mode 0600
+//     or stricter). Refusing a world-/group-readable RA key is
+//     defense-in-depth against credential leak via a misconfigured
+//     deploy that leaves /etc/certctl/scep/*.key as 0644.
+//  3. Cert PEM parses to at least one x509.Certificate; the first
+//     block is treated as the RA cert.
+//  4. Key PEM parses to a Go crypto.Signer (RSA or ECDSA — RFC 8894
+//     §3.5.2 advertises those as the CMS-compatible algorithms).
+//  5. The cert's PublicKey matches the key's Public() — refuses pairs
+//     accidentally swapped between profiles in a multi-profile config.
+//  6. The cert's NotAfter is in the future. Conformant clients would
+//     reject a CertRep signed by an expired RA cert.
+//
+// Each check returns a wrapped error; the caller (main) is responsible for
+// translating to a structured slog.Error + os.Exit(1) so the helper stays
+// unit-testable without booting the full server.
+func preflightSCEPRACertKey(enabled bool, raCertPath, raKeyPath string) error {
+	if !enabled {
+		return nil
+	}
+	if raCertPath == "" || raKeyPath == "" {
+		return fmt.Errorf("SCEP enabled but RA pair missing: " +
+			"set CERTCTL_SCEP_RA_CERT_PATH + CERTCTL_SCEP_RA_KEY_PATH " +
+			"(RFC 8894 §3.2.2 requires an RA pair so clients can encrypt the " +
+			"CSR to the RA cert and the server can sign the CertRep response)")
+	}
+
+	// File mode check FIRST so a world-readable key never gets read into the
+	// process address space. Not meaningful on Windows, where Go synthesizes
+	// the permission bits (0666 or 0444) instead of reporting POSIX modes, so
+	// this gate would reject every key there; the production deploy is Linux
+	// per the Dockerfile.
+	keyInfo, err := os.Stat(raKeyPath)
+	if err != nil {
+		return fmt.Errorf("CERTCTL_SCEP_RA_KEY_PATH stat failed: %w (path=%s)", err, raKeyPath)
+	}
+	mode := keyInfo.Mode().Perm()
+	if mode&0o077 != 0 {
+		return fmt.Errorf("CERTCTL_SCEP_RA_KEY_PATH has insecure permissions %#o; "+
+			"RA private key must be mode 0600 (owner read/write only) — "+
+			"chmod 0600 %s and restart", mode, raKeyPath)
+	}
+
+	certPEM, err := os.ReadFile(raCertPath)
+	if err != nil {
+		return fmt.Errorf("CERTCTL_SCEP_RA_CERT_PATH read failed: %w (path=%s)", err, raCertPath)
+	}
+	keyPEM, err := os.ReadFile(raKeyPath)
+	if err != nil {
+		return fmt.Errorf("CERTCTL_SCEP_RA_KEY_PATH read failed: %w (path=%s)", err, raKeyPath)
+	}
+
+	// tls.X509KeyPair validates that the cert + key parse, share an algorithm,
+	// and the cert's PublicKey matches the key's Public() — three of our six
+	// checks in a single stdlib call, so we use it rather than re-implementing.
+	pair, err := tls.X509KeyPair(certPEM, keyPEM)
+	if err != nil {
+		return fmt.Errorf("RA cert/key pair invalid: %w "+
+			"(cert=%s key=%s) — verify the cert and key are matching halves of "+
+			"the same RA pair, both PEM-encoded, with the cert containing exactly "+
+			"one CERTIFICATE block and the key containing one PRIVATE KEY block",
+			err, raCertPath, raKeyPath)
+	}
+	if len(pair.Certificate) == 0 {
+		// Defensive — tls.X509KeyPair already errors on this, but the contract
+		// for the next x509.ParseCertificate call needs the slice non-empty.
+		return fmt.Errorf("RA cert PEM at %s contains no certificate blocks", raCertPath)
+	}
+
+	// Re-parse the leaf so we can read NotAfter + the public-key alg.
+	leaf, err := x509.ParseCertificate(pair.Certificate[0])
+	if err != nil {
+		return fmt.Errorf("RA cert at %s does not parse as x509: %w", raCertPath, err)
+	}
+	if time.Now().After(leaf.NotAfter) {
+		return fmt.Errorf("RA cert at %s expired at %s — "+
+			"generate a fresh RA pair (the SCEP CertRep signature would be "+
+			"rejected by every conformant client)", raCertPath, leaf.NotAfter.Format(time.RFC3339))
+	}
+
+	// CMS-compatible public-key algorithm gate. RFC 8894 §3.5.2 advertises RSA
+	// and AES; the responder cert algorithm pertains to the signature scheme
+	// used on the CertRep, which means the cert's PublicKey must be RSA or
+	// ECDSA. Catches pre-shared Ed25519 dev keys that micromdm/scep clients
+	// reject.
+	switch leaf.PublicKeyAlgorithm {
+	case x509.RSA, x509.ECDSA:
+		// ok — supported by golang.org/x/crypto/ocsp + every SCEP client
+	default:
+		return fmt.Errorf("RA cert at %s uses unsupported public-key algorithm %s — "+
+			"RFC 8894 §3.5.2 CMS signing requires RSA or ECDSA",
+			raCertPath, leaf.PublicKeyAlgorithm)
+	}
+
+	return nil
+}
+
+// preflightEnrollmentIssuer validates at startup that an EST/SCEP-bound issuer
+// can actually serve a CA certificate. This closes audit finding L-005:
+// pre-Bundle-4 the EST/SCEP startup path verified the issuer existed in the
+// registry but did not verify the issuer TYPE could emit a CA cert. An
+// operator who bound CERTCTL_EST_ISSUER_ID to an ACME issuer (which does
+// not have a static CA cert — see internal/connector/issuer/acme/acme.go::
+// GetCACertPEM returning an explicit error) would boot successfully and
+// only see failures at the first /est/cacerts request, hiding the misconfig
+// for hours/days behind a degraded enrollment surface.
+//
+// Strategy: call issuerConn.GetCACertPEM(ctx) at startup with a short
+// timeout. If the issuer can serve a CA cert (local, vault, openssl,
+// stepca, awsacmpca, etc.), the call succeeds and we proceed. If not
+// (acme, digicert, sectigo, entrust, googlecas, ejbca, globalsign — most
+// vendor-CA issuers that hand back chains per-issuance), the call fails
+// loudly with the connector's own error string, and the caller os.Exit(1)s.
+//
+// Returns nil on success, non-nil error suitable for structured logging
+// + os.Exit(1) by the caller. Caller is responsible for the timeout context.
+func preflightEnrollmentIssuer(ctx context.Context, protocol, issuerID string, issuerConn service.IssuerConnector) error {
+	if issuerConn == nil {
+		return fmt.Errorf("%s issuer %q: connector is nil", protocol, issuerID)
+	}
+	caCertPEM, err := issuerConn.GetCACertPEM(ctx)
+	if err != nil {
+		return fmt.Errorf("%s issuer %q: cannot serve CA certificate (%w); "+
+			"choose an issuer type that exposes a static CA chain "+
+			"(local / vault / openssl / stepca / awsacmpca) or disable %s",
+			protocol, issuerID, err, protocol)
+	}
+	if caCertPEM == "" {
+		return fmt.Errorf("%s issuer %q: GetCACertPEM returned empty PEM with no error; "+
+			"choose an issuer type that exposes a static CA chain", protocol, issuerID)
+	}
+	return nil
+}
+
+// buildFinalHandler builds the outer HTTP dispatch handler that routes incoming
+// requests to either the authenticated apiHandler chain or the unauthenticated
+// noAuthHandler chain based on URL path prefix. Extracted from main() so the
+// dispatch logic can be unit tested without booting the full server stack
+// (see cmd/server/finalhandler_test.go).
+//
+// Dispatch rules (M-001, audit 2026-04-19, option D):
+//
+//   - /health, /ready, /api/v1/auth/info           → no-auth (probes + login detection)
+//   - /api/v1/version                              → no-auth (U-3 ride-along: build identity for rollout/probes)
+//   - /.well-known/pki/*                           → no-auth (RFC 5280 CRL, RFC 6960 OCSP)
+//   - /.well-known/est/*                           → no-auth (RFC 7030 §3.2.3)
+//   - /scep, /scep/*                               → no-auth (RFC 8894 §3.2, CSR challengePassword)
+//   - /scep-mtls, /scep-mtls/*                     → no-auth (client cert + challengePassword, Phase 6.5)
+//   - /api/v1/*                                    → auth (Bearer token required)
+//   - /assets/*                                    → static file server (dashboard only)
+//   - anything else                                → SPA index.html fallback (dashboard only)
+//     OR apiHandler (no dashboard)
+//
+// EST/SCEP clients (IoT devices, 802.1X supplicants, MDM endpoints, network
+// appliances) cannot present certctl Bearer tokens, so those endpoints must be
+// reachable without the Auth middleware. Authentication is instead enforced by
+// CSR signature verification, profile policy gates, and for SCEP the
+// challengePassword shared secret (fail-loud gated by preflightSCEPChallengePassword
+// above).
+//
+// webDir must point to a directory containing index.html + assets/ when
+// dashboardEnabled is true; it is ignored otherwise.
+func buildFinalHandler(apiHandler, noAuthHandler http.Handler, webDir string, dashboardEnabled bool) http.Handler {
+	var fileServer http.Handler
+	if dashboardEnabled {
+		fileServer = http.FileServer(http.Dir(webDir))
+	}
+	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		path := r.URL.Path
+
+		// Health/ready, auth/info, and version bypass auth middleware.
+		// Health/ready: Docker/K8s health probes don't carry Bearer tokens.
+		// auth/info: React app calls this before login to detect auth mode.
+		// version: U-3 ride-along (cat-u-no_version_endpoint) — rollout
+		// systems and blackbox probes need build identity without a key.
+		if path == "/health" || path == "/ready" || path == "/api/v1/auth/info" || path == "/api/v1/version" {
+			noAuthHandler.ServeHTTP(w, r)
+			return
+		}
+
+		// RFC 5280 CRL and RFC 6960 OCSP live under /.well-known/pki/ and MUST
+		// be served unauthenticated — relying parties (browsers, OpenSSL, OCSP
+		// stapling sidecars, mTLS clients) cannot present certctl Bearer tokens.
+		if strings.HasPrefix(path, "/.well-known/pki") {
+			noAuthHandler.ServeHTTP(w, r)
+			return
+		}
+
+		// RFC 7030 EST endpoints ride the no-auth middleware chain (M-001,
+		// option D, audit 2026-04-19). Trust boundary is CSR signature +
+		// (per EST hardening Phase 2) optional client cert at the handler
+		// layer, not HTTP Bearer. /.well-known/est/cacerts is explicitly
+		// anonymous per RFC 7030 §4.1.1; /.well-known/est-mtls/<PathID>/
+		// (EST hardening Phase 2 sibling route) requires a client cert
+		// gate at the handler layer — both share this prefix gate because
+		// "/.well-known/est-mtls" is itself prefixed by "/.well-known/est".
+		// EST hardening Phase 3's HTTP Basic enrollment-password is a
+		// per-profile handler-layer auth that runs INSIDE the no-auth
+		// middleware chain (since the chain skips the Bearer middleware,
+		// the handler gets to define its own auth contract).
+		if strings.HasPrefix(path, "/.well-known/est") {
+			noAuthHandler.ServeHTTP(w, r)
+			return
+		}
+
+		// RFC 8894 SCEP rides the no-auth chain (M-001, option D). SCEP clients
+		// authenticate via the challengePassword attribute in the PKCS#10 CSR,
+		// not via HTTP Bearer tokens. preflightSCEPChallengePassword refuses to
+		// start the server if SCEP is enabled without a non-empty shared secret.
+		//
+		// SCEP RFC 8894 + Intune master bundle Phase 6.5: the sibling
+		// /scep-mtls[/<pathID>] route also rides the no-auth chain. Its
+		// auth boundary is (a) client cert verified at the TLS layer +
+		// re-verified per-profile at the handler layer, plus (b) the
+		// challenge password; neither is a Bearer token. Prefix
+		// disambiguation: the first gate matches only the exact path
+		// "/scep" or the "/scep/" prefix (trailing slash included), so
+		// neither "/scepxyz" nor "/scep-mtls" matches it. "/scep-mtls"
+		// gets its own dedicated gate below.
+		if path == "/scep" || strings.HasPrefix(path, "/scep/") {
+			noAuthHandler.ServeHTTP(w, r)
+			return
+		}
+		if path == "/scep-mtls" || strings.HasPrefix(path, "/scep-mtls/") {
+			noAuthHandler.ServeHTTP(w, r)
+			return
+		}
+
+		// Authenticated API routes — full middleware stack including Auth.
+		if strings.HasPrefix(path, "/api/v1/") {
+			apiHandler.ServeHTTP(w, r)
+			return
+		}
+
+		if !dashboardEnabled {
+			// No dashboard: everything non-special falls through to the
+			// authenticated handler (preserves pre-M-001 behavior for API-only
+			// deployments).
+			apiHandler.ServeHTTP(w, r)
+			return
+		}
+
+		// Dashboard-present: serve static assets directly, SPA fallback for
+		// everything else.
+		if strings.HasPrefix(path, "/assets/") {
+			fileServer.ServeHTTP(w, r)
+			return
+		}
+		http.ServeFile(w, r, webDir+"/index.html")
+	})
+}
+
+// authPermissionCheckerAdapter bridges the typed-string Authorizer
+// signature (authsvc.Authorizer.CheckPermission takes
+// authdomain.ActorTypeValue + authdomain.ScopeType) to the plain-string
+// auth.PermissionChecker interface used by the auth.RequirePermission
+// middleware factory. Lives in cmd/server so internal/auth doesn't have
+// to import internal/service/auth + internal/domain/auth (would create
+// a cycle).
+type authPermissionCheckerAdapter struct {
+	a *authsvc.Authorizer
+}
+
+func (ad authPermissionCheckerAdapter) CheckPermission(
+	ctx context.Context,
+	actorID string,
+	actorType string,
+	tenantID string,
+	permission string,
+	scopeType string,
+	scopeID *string,
+) (bool, error) {
+	return ad.a.CheckPermission(
+		ctx,
+		actorID,
+		authdomainAlias.ActorTypeValue(actorType),
+		tenantID,
+		permission,
+		authdomainAlias.ScopeType(scopeType),
+		scopeID,
+	)
+}
+
+// authCheckResolverAdapter bridges the postgres ActorRoleRepository
+// (authdomain.ActorTypeValue) to handler.AuthCheckResolver
+// (domain.ActorType). Lives in cmd/server so the handler layer keeps its
+// existing import set; the GUI's /v1/auth/check probe round-trips
+// through this on every page load. Read-only — no caller / no audit row.
+//
+// Bundle 1 Phase 3 closure (M1): the equivalent surface area on
+// /v1/auth/me runs through the service layer's auth.role.list permission
+// gate, which the GUI may not yet hold during initial render. AuthCheck
+// has no permission gate (its only requirement is "the request
+// authenticated"), so the bypass is by design.
+type authCheckResolverAdapter struct {
+	repo *postgres.ActorRoleRepository
+}
+
+func (ad authCheckResolverAdapter) ListRoles(
+	ctx context.Context,
+	actorID string,
+	actorType domain.ActorType,
+	tenantID string,
+) ([]*authdomainAlias.ActorRole, error) {
+	return ad.repo.ListByActor(ctx, actorID, authdomainAlias.ActorTypeValue(actorType), tenantID)
+}
+
+func (ad authCheckResolverAdapter) EffectivePermissions(
+	ctx context.Context,
+	actorID string,
+	actorType domain.ActorType,
+	tenantID string,
+) ([]repository.EffectivePermission, error) {
+	return ad.repo.EffectivePermissions(ctx, actorID, authdomainAlias.ActorTypeValue(actorType), tenantID)
+}
+
+// =============================================================================
+// sessionMinterAdapter — bridge from *session.Service to oidcsvc.SessionMinter.
+//
+// The OIDC service's SessionMinter port (Phase 3) takes a *userdomain.User
+// + role IDs and returns (cookie, csrf, err). The session.Service's
+// Create method takes (actorID, actorType, ip, ua) -> *CreateResult.
+// This adapter unwraps the User into actorID/actorType + reshapes the
+// return tuple. Lives in cmd/server so the session package doesn't have
+// to know about user.User and the user package doesn't have to know
+// about session.CreateResult.
+// =============================================================================
+
+type sessionMinterAdapter struct {
+	svc *session.Service
+}
+
+func (a *sessionMinterAdapter) MintForUser(
+	ctx context.Context,
+	user *userdomain.User,
+	_ []string, // roleIDs unused at the session-mint layer; the rbac middleware looks them up at request time
+	ip, userAgent string,
+) (cookieValue, csrfToken string, err error) {
+	if user == nil {
+		return "", "", fmt.Errorf("session mint: user is nil")
+	}
+	res, err := a.svc.Create(ctx, user.ID, string(domain.ActorTypeUser), ip, userAgent)
+	if err != nil {
+		return "", "", err
+	}
+	return res.CookieValue, res.CSRFToken, nil
+}
+
+// The blank-identifier var below keeps the new oidcdomain import load-
+// bearing if files get shuffled (only oidcdomain is referenced here).
+// Linker dead-code elimination handles the runtime cost.
+var (
+	_ = oidcdomain.OIDCProvider{}
+)
+
+// =============================================================================
+// breakglassSessionMinterAdapter — bridge from *session.Service to
+// breakglass.SessionMinter.
+//
+// The break-glass service's SessionMinter port (Phase 7.5) returns
+// (cookie, csrf, err); the underlying *session.Service.Create returns
+// *CreateResult. This adapter unwraps the result. Lives in cmd/server
+// so the breakglass package doesn't have to know about session.Service.
+// =============================================================================
+
+type breakglassSessionMinterAdapter struct {
+	svc *session.Service
+}
+
+func (a breakglassSessionMinterAdapter) Create(ctx context.Context, actorID, actorType, ip, userAgent string) (string, string, error) {
+	res, err := a.svc.Create(ctx, actorID, actorType, ip, userAgent)
+	if err != nil {
+		return "", "", err
+	}
+	return res.CookieValue, res.CSRFToken, nil
+}
+
+// RevokeAllForActor — Audit 2026-05-10 HIGH-1 wire. After a break-glass
+// password rotation or credential removal, every active session for the
+// target actor must be revoked so a phished-then-rotated credential
+// doesn't leave the attacker's session live.
+func (a breakglassSessionMinterAdapter) RevokeAllForActor(ctx context.Context, actorID, actorType string) error {
+	return a.svc.RevokeAllForActor(ctx, actorID, actorType)
+}
+
+// oidcProvidersListAdapter bridges the postgres OIDCProviderRepository
+// to handler.OIDCProvidersListResolver. The handler returns
+// []*OIDCProviderInfo (id + display_name + login_url) for the public-
+// safe GUI Login-page payload; the repo returns the full OIDCProvider
+// row. The adapter projects + maps the login_url shape that
+// /auth/oidc/login?provider=<id> expects. Auth Bundle 2 Phase 6 /
+// Category E.
+type oidcProvidersListAdapter struct {
+	repo repository.OIDCProviderRepository
+}
+
+func (a oidcProvidersListAdapter) List(ctx context.Context, tenantID string) ([]*handler.OIDCProviderInfo, error) {
+	provs, err := a.repo.List(ctx, tenantID)
+	if err != nil {
+		return nil, err
+	}
+	out := make([]*handler.OIDCProviderInfo, 0, len(provs))
+	for _, p := range provs {
+		// Audit 2026-05-10 MED-9 closure — filter disabled providers
+		// at the adapter so the LoginPage's "Sign in with X" buttons
+		// don't render for offline IdPs. The HandleAuthRequest
+		// service-layer ErrProviderDisabled check is the
+		// defense-in-depth guard for direct API / MCP / CLI callers.
+		if !p.Enabled {
+			continue
+		}
+		out = append(out, &handler.OIDCProviderInfo{
+			ID:          p.ID,
+			DisplayName: p.Name,
+			LoginURL:    "/auth/oidc/login?provider=" + p.ID,
+		})
+	}
+	return out, nil
+}
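The prefix gates in the router hunk above can be exercised in isolation. The following is a minimal sketch, not the project's code: `chainFor` is a hypothetical helper that returns a label instead of dispatching to a handler, and the sample paths (`/scep/corp`, `/api/v1/certificates`, and so on) are made up for illustration. It shows why `/scepxyz` misses the SCEP gate while `/.well-known/est-mtls/...` is caught by the `/.well-known/est` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// chainFor mirrors the order of the prefix gates in the diff above and
// returns a label for the chain a path would be routed to. The labels
// are illustrative; the real code calls handler.ServeHTTP instead.
func chainFor(path string) string {
	switch {
	case strings.HasPrefix(path, "/.well-known/pki"),
		strings.HasPrefix(path, "/.well-known/est"): // also covers /.well-known/est-mtls
		return "no-auth"
	case path == "/scep" || strings.HasPrefix(path, "/scep/"):
		return "no-auth"
	case path == "/scep-mtls" || strings.HasPrefix(path, "/scep-mtls/"):
		return "no-auth"
	case strings.HasPrefix(path, "/api/v1/"):
		return "api"
	default:
		// Real code: apiHandler when the dashboard is disabled,
		// otherwise static assets or the SPA index.
		return "fallthrough"
	}
}

func main() {
	paths := []string{
		"/scep",
		"/scep/corp",
		"/scep-mtls/corp",
		"/scepxyz",
		"/.well-known/est-mtls/corp/simpleenroll",
		"/api/v1/certificates",
	}
	for _, p := range paths {
		fmt.Printf("%-42s -> %s\n", p, chainFor(p))
	}
}
```

Matching on `path == "/scep"` plus the `"/scep/"` prefix, rather than a bare `"/scep"` prefix, is what keeps unrelated paths such as `/scepxyz` out of the no-auth chain.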
@@ -1,159 +0,0 @@
-# CI Pipeline Cleanup — Phase 0 Baseline
-
-> Captured against repo HEAD `1de61e91cf07449356d9046a76499c86efe413b1` (operator tag `v2.0.66`) on 2026-04-30.
-> Each subsequent Phase that changes a number references this baseline.
-
-## Repo state
-
-**HEAD SHA:** `1de61e91cf07449356d9046a76499c86efe413b1`
-
-**Operator-stamped tag:** `v2.0.66`
-
-## ci.yml shape
-
-- Total lines: `1488`
-- Total named steps: `53`
-- Named regression-guard steps: 22 (enumerated below)
-
-### The 22 regression-guard steps
-
-```
-81:      - name: Forbidden auth-type literal regression guard (G-1)
-144:      - name: Forbidden bare InsecureSkipVerify regression guard (L-001)
-180:      - name: Forbidden bare FROM regression guard (H-001)
-201:      - name: Forbidden missing USER regression guard (M-012)
-228:      - name: Forbidden README JWT advertising regression guard (H-009)
-254:      - name: Forbidden api_key_hash JSON-shape regression guard (G-2)
-311:      - name: Forbidden plaintext HEALTHCHECK regression guard (U-2)
-360:      - name: Forbidden migration mount in compose initdb (U-3)
-417:      - name: Forbidden StatusBadge dead-key + TS phantom-field regression guard (D-1 + D-2)
-569:      - name: Forbidden client-side bulk-action loop regression guard (L-1)
-613:      - name: Forbidden orphan-CRUD client function regression guard (B-1)
-665:      - name: Forbidden strings.Contains(err.Error()) regression guard (S-2)
-868:      - name: QA-doc Part-count drift guard
-886:      - name: QA-doc seed-count drift guard
-938:      - name: Test-naming convention guard (hard-fail)
-982:      - name: Forbidden hardcoded source-count prose regression guard (S-1)
-1027:      - name: Documented orphan client fns sync guard (P-1)
-1063:      - name: Frontend page-coverage regression guard (T-1)
-1118:      - name: Bundle-8 / L-015 target=_blank rel=noopener regression guard
-1147:      - name: Bundle-8 / L-019 dangerouslySetInnerHTML regression guard
-1176:      - name: Bundle-8 / M-009 + M-029 Pass 1 mutation contract guard (hard zero)
-1220:      - name: Forbidden env-var docs drift regression guard (G-3)
-```
-
-## SA1019 site count
-
-- **Operator-on-workstation deliverable** — sandbox cannot run `staticcheck`.
-- ci.yml inline comment claims "6 sites" (`middleware.NewAuth × 3`, `csr.Attributes`, `elliptic.Marshal`).
-- Source-grep at HEAD shows:
-  - `internal/api/handler/scep.go`: `csr.Attributes` references present
-  - `internal/connector/issuer/local/local.go`: `elliptic.Marshal` historic refs (already migrated per bundle9_coverage_test.go byte-equivalence test)
-  - `cmd/server/main_test.go`: `middleware.NewAuth` references TBD
-- Operator must run `staticcheck ./... 2>&1 | grep SA1019` on workstation and update Phase 3 plan with the actual site list.
-
-## Dockerfile inventory (verified 4)
-
-```
-./Dockerfile.agent
-./Dockerfile
-./deploy/test/f5-mock-icontrol/Dockerfile
-./deploy/test/libest/Dockerfile
-```
-
-## Migration up/down balance
-
-- ups: `24`
-- downs: `24`
-- missing downs: `0`
-
-## OpenAPI ↔ handler parity gap (verified)
-
-- operationIds in api/openapi.yaml: `136`
-- r.Register calls in router.go: `149`
-- Gap to root-cause in Phase 9: 13 routes
-
-## docker-compose.test.yml sidecars
-
-```
-52:  certctl-tls-init:
-107:  postgres:
-135:  pebble-challtestsrv:
-150:  pebble:
-178:  step-ca:
-213:  certctl-server:
-363:  nginx:
-391:  certctl-agent:
-449:  libest-client:
-488:  apache-test:
-502:  haproxy-test:
-515:  traefik-test:
-533:  caddy-test:
-548:  envoy-test:
-562:  postfix-test:
-577:  dovecot-test:
-591:  openssh-test:
-613:  f5-mock-icontrol:
-631:  k8s-kind-test:
-648:  windows-iis-test:
-666:  certctl-test:
-```
-
-## Makefile::verify body (existing)
-
-```
-verify:
-	@echo "==> fmt"
-	@go fmt ./... | { ! grep -q '.'; } || (echo "gofmt produced changes — commit them" && exit 1)
-	@echo "==> go vet ./..."
-	@go vet ./...
-	@echo "==> golangci-lint run ./... (incl. staticcheck ST*)"
-	@which golangci-lint > /dev/null || (echo "Installing golangci-lint..." && go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest)
-	@golangci-lint run ./... --timeout 5m
-	@echo "==> go test -short ./..."
-	@go test -short -count=1 ./...
-	@echo ""
-	@echo "verify: PASS — safe to commit"
-
-```
-
-## RAM headroom for collapsed vendor-e2e job
-
-- **Operator-on-workstation deliverable** — requires a prototype branch with the collapsed job + `docker stats` polling.
-- Per Phase 0 frozen decision 0.14: if peak RSS ≤ 12 GB on ubuntu-latest (16 GB ceiling), single-job collapse is approved.
-- If > 12 GB, fall back to bucketed-matrix design documented in `cowork/ci-pipeline-cleanup/decisions-revised.md`.
-
-## Coverage thresholds at HEAD
-
-```
-778:          if [ "$(echo "$SERVICE_COV < 70" | bc -l)" -eq 1 ]; then
-779:            echo "::error::Service layer coverage ${SERVICE_COV}% is below 70% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
-782:          if [ "$(echo "$HANDLER_COV < 75" | bc -l)" -eq 1 ]; then
-783:            echo "::error::Handler layer coverage ${HANDLER_COV}% is below 75% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
-786:          if [ "$(echo "$DOMAIN_COV < 40" | bc -l)" -eq 1 ]; then
-787:            echo "::error::Domain layer coverage ${DOMAIN_COV}% is below 40% threshold"
-790:          if [ "$(echo "$MIDDLEWARE_COV < 30" | bc -l)" -eq 1 ]; then
-791:            echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
-802:          if [ "$(echo "$CRYPTO_COV < 88" | bc -l)" -eq 1 ]; then
-803:            echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 88% (Bundle R closure floor — add tests, do not lower the gate)"
-832:          if [ "$(echo "$LOCAL_ISSUER_COV < 86" | bc -l)" -eq 1 ]; then
-833:            echo "::error::Local-issuer coverage ${LOCAL_ISSUER_COV}% is below 86% (Bundle R closure floor — add tests, do not lower the gate)"
-842:          if [ "$(echo "$ACME_COV < 80" | bc -l)" -eq 1 ]; then
-843:            echo "::error::ACME issuer coverage ${ACME_COV}% is below 80% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
-846:          if [ "$(echo "$STEPCA_COV < 80" | bc -l)" -eq 1 ]; then
-847:            echo "::error::StepCA issuer coverage ${STEPCA_COV}% is below 80% (Bundle L.B closure floor — add tests, do not lower the gate)"
-850:          if [ "$(echo "$MCP_COV < 85" | bc -l)" -eq 1 ]; then
-851:            echo "::error::MCP coverage ${MCP_COV}% is below 85% (Bundle K closure floor — add tests, do not lower the gate)"
-```
-
-## CodeQL workflow (no changes)
-
-- File: `.github/workflows/codeql.yml` (`81` lines)
-- Matrix: `[go, javascript-typescript]` — 2 status checks per push
-- Trigger: push to master, PR to master, weekly Sunday cron
-
-## Status check accounting (verified)
-
-Today: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 12 `deploy-vendor-e2e (<vendor>)` + 2 `deploy-vendor-e2e-windows (<vendor>)` + 2 `CodeQL Analyze (<lang>)` = **19 status checks per push**.
-
-After cleanup: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 1 `deploy-vendor-e2e` + 1 `image-and-supply-chain` + 2 `CodeQL Analyze (<lang>)` = **7 status checks per push**.
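The per-layer floor checks quoted under "Coverage thresholds at HEAD" all follow one pattern: fail when measured coverage is strictly below an integer floor. The following is a minimal sketch of that comparison in Go, not the pipeline's actual implementation (which uses bash and `bc`). The `floors` map copies five of the thresholds listed above; the measured values, the `belowFloor` helper, and the choice to treat a missing measurement as a failure are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// floors copies a subset of the thresholds quoted in the baseline.
var floors = map[string]float64{
	"service":    70,
	"handler":    75,
	"domain":     40,
	"middleware": 30,
	"crypto":     88,
}

// belowFloor returns one message per layer whose measured coverage is
// strictly below its floor (the same strict "<" the bc checks use).
// A layer with no measurement is reported too; that is an assumption
// of this sketch, not documented pipeline behavior.
func belowFloor(measured map[string]float64) []string {
	var msgs []string
	for layer, floor := range floors {
		got, ok := measured[layer]
		if !ok {
			msgs = append(msgs, fmt.Sprintf("%s: no measurement", layer))
			continue
		}
		if got < floor {
			msgs = append(msgs, fmt.Sprintf("%s: %.1f%% is below %.0f%%", layer, got, floor))
		}
	}
	sort.Strings(msgs) // map iteration order is random; sort for stable output
	return msgs
}

func main() {
	// Hypothetical measurements, for illustration only.
	measured := map[string]float64{
		"service": 71.2, "handler": 74.9, "domain": 40, "middleware": 55, "crypto": 90.3,
	}
	for _, m := range belowFloor(measured) {
		fmt.Println("::error::" + m)
	}
}
```

A value exactly equal to its floor passes, which matches the `< N` comparisons in the quoted steps.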
@@ -1,53 +0,0 @@
-# CI Pipeline Cleanup — Deliberate Revisions of Bundle II Decisions
-
-This bundle deliberately revises two Bundle II frozen decisions. Both revisions are recorded here for audit trail and acknowledged in the per-Phase commits that implement them.
-
-## Bundle II decision 0.4 → revised by ci-pipeline-cleanup decision 0.5
-
-**Bundle II 0.4 (original):** "IIS e2e strategy — `mcr.microsoft.com/windows/servercore:ltsc2022` Windows containers via Docker Desktop on Windows hosts. Linux CI runners CAN'T run Windows containers, so the IIS e2e suite runs on a separate Windows-runner CI matrix job (or operator's local Windows host for development). Documented limitation."
-
-**ci-pipeline-cleanup 0.5 (revision):** Delete the Windows-runner CI matrix entirely.
-
-**Rationale for revision:**
-
-1. The matrix can't physically work on `windows-latest` GitHub-hosted runners today. Verified via the failure logs from CI run `25183374742` (commit `1de61e9`):
-   - `wincertstore` job: `error during connect: ... open //./pipe/docker_engine: The system cannot find the file specified` — Docker daemon not started in Windows-containers mode.
-   - `iis` job: image pulled successfully (so the new digest is correct), then died at `failed to create network deploy_certctl-test: could not find plugin bridge in v1 plugin registry: plugin not found` — `bridge` network driver doesn't exist on Windows Docker (uses `nat`).
-
-2. Even if both Docker-daemon and network-driver issues were fixed, the matrix would validate nothing of substance. Verified by source-grep: all 16 functions matching `TestVendorEdge_(IIS|WinCertStore)_*` in `deploy/test/vendor_e2e_phase3_to_13_test.go` are `t.Log` placeholders that exercise no IIS-specific behavior. The real IIS connector validation lives in `internal/connector/target/iis/` unit tests (run on Linux in `go-build-and-test` — already green per push).
-
-3. Bundle II decision 0.14 explicitly required operator manual smoke against a real instance for "verified" status in the vendor matrix. Moving IIS + WinCertStore validation to a documented operator playbook in `docs/connector-iis.md` satisfies that criterion better than a fake CI matrix that passes by skipping.
-
-**Preservation:** the `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` under `profiles: [deploy-e2e-windows]` — operators on a Windows host can opt in via `docker compose --profile deploy-e2e-windows up -d windows-iis-test`. Linux CI never activates this profile.
-
-## Bundle II decision 0.9 → revised by ci-pipeline-cleanup decision 0.4
-
-**Bundle II 0.9 (original):** "CI parallelism — Each vendor e2e gets its own GitHub Actions matrix job. Vendor failures surface independently in the CI status check (operator sees 'K8s 1.31 vendor-edge fail' as a discrete check, not a generic 'integration tests failed')."
-
-**ci-pipeline-cleanup 0.4 (revision):** Single `deploy-vendor-e2e` job replaces the 12-job matrix; per-vendor visibility partially restored via skip-detection guard messages.
-
-**Rationale for revision:**
-
-1. The per-vendor granularity Bundle II decision 0.9 was designed to provide is fake signal. Verified by source-analysis at HEAD:
-   ```
-   $ grep -cE 't\.Log\(' deploy/test/{vendor_e2e_phase3_to_13,nginx_vendor_e2e}_test.go
-   deploy/test/nginx_vendor_e2e_test.go:9
-   deploy/test/vendor_e2e_phase3_to_13_test.go:106
-
-   $ awk '/^func TestVendorEdge_/{in_test=1; name=$2; has_assert=0; next}
-          in_test && /^}$/ {if (has_assert) print name; in_test=0}
-          in_test && /t\.(Fatal|Error|Errorf|Fatalf|Fail|Failf)/ {has_assert=1}' \
-          deploy/test/vendor_e2e_phase3_to_13_test.go deploy/test/nginx_vendor_e2e_test.go
-   TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E
-   ```
-   115 of 116 vendor-edge test functions are `t.Log`-only — they spin up a sidecar, log a one-line description of the vendor quirk, and return. Only 1 has a real assertion.
-
-2. Per-vendor status-check granularity costs ~9 sec setup overhead × 12 jobs = ~108 sec of pure runner waste per push (verified from CI run `25183374742` job timings).
-
-3. The single-job version partially restores per-vendor visibility via the skip-detection guard (decision 0.6): if a sidecar fails to start, the affected tests' SKIP names print in the CI output and the build fails. Operators see "TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E SKIPPED: vendor sidecar 'k8s-kind' not reachable" — same per-vendor signal, just no longer rendered as a separate status-check row.
-
-**Preservation:** the per-test discoverability via `go test -run 'VendorEdge_<vendor>'` (Bundle II frozen decision 0.6) is unchanged. Only the matrix-jobs-per-vendor part of decision 0.9 is revised; the per-test naming convention stays.
-
-## Forward-looking note
-
-Both revisions are limited in scope to CI execution shape — they do NOT delete the test files, the sidecar definitions, or the documentation that Bundle II shipped. Future work could re-introduce per-vendor matrix jobs if test bodies are filled in with real assertions (transforming the t.Log placeholders into actual contract pins). At that point, Bundle II decisions 0.4 and 0.9 should be re-evaluated.
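The awk scan quoted in the rationale for revising decision 0.9 can be expressed as a small state machine: remember the current `TestVendorEdge_` function, note whether any assertion call appears before its closing brace, and report only the functions that have one. The following is a rough Go rendering of that idea, offered as a sketch. The `sample` source and its two test names are hypothetical, and `testsWithAssertions` is an illustrative helper, not something in the repo.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

var (
	// Top-level vendor-edge test declarations, as in the awk pattern.
	funcRe = regexp.MustCompile(`^func (TestVendorEdge_\w+)`)
	// t.Fatal/t.Error/t.Fail also match Fatalf/Errorf/Failf by prefix.
	assertRe = regexp.MustCompile(`t\.(Fatal|Error|Fail)`)
)

// testsWithAssertions returns the names of TestVendorEdge_ functions whose
// body contains at least one assertion call. Like the awk version it treats
// a bare "}" line as the end of the function, so it relies on gofmt layout.
func testsWithAssertions(src string) []string {
	var out []string
	var name string
	inTest, hasAssert := false, false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		if m := funcRe.FindStringSubmatch(line); m != nil {
			inTest, hasAssert, name = true, false, m[1]
			continue
		}
		if !inTest {
			continue
		}
		if line == "}" {
			if hasAssert {
				out = append(out, name)
			}
			inTest = false
			continue
		}
		if assertRe.MatchString(line) {
			hasAssert = true
		}
	}
	return out
}

// sample is a made-up test file with one log-only test and one real assertion.
const sample = `func TestVendorEdge_Demo_LogOnly_E2E(t *testing.T) {
	t.Log("describes a vendor quirk")
}

func TestVendorEdge_Demo_RealAssert_E2E(t *testing.T) {
	if 1+1 != 2 {
		t.Fatalf("arithmetic broke")
	}
}
`

func main() {
	for _, name := range testsWithAssertions(sample) {
		fmt.Println(name)
	}
}
```

As with the awk original, this is a line-oriented heuristic: helper functions that assert on the test's behalf would be missed, so the count is a lower bound on tests with real checks.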
@@ -1,64 +0,0 @@
-# CI Pipeline Cleanup — Frozen Decisions
-
-> 14 frozen decisions confirmed at Phase 0. Each subsequent Phase references the decision number it implements.
-
-## 0.1 — Trigger model
-
-Three-tier split, no mixing:
-- **On push/PR to master:** blocking, fast, every check earns its keep, target <10 min wall-clock.
-- **Daily cron + workflow_dispatch:** `security-deep-scan.yml` as-is; slow scans, best-effort, never blocks.
-- **On tag push (`v*`):** `release.yml` as-is; cross-platform binaries, ghcr.io push, SLSA provenance.
-
-## 0.2 — Extracted-script location
-
-`scripts/ci-guards/` at repo root. Operator runs `bash scripts/ci-guards/<id>.sh` locally. Contract documented in `scripts/ci-guards/README.md`.
-
-## 0.3 — Coverage threshold YAML format
-
-`.github/coverage-thresholds.yml`. Top-level keys are package paths; each entry has `floor:` (integer pct) + `why:` (multi-line string for load-bearing context). Bash step uses Python (already on the runner) to read the YAML — no `yq` dependency.
-
-## 0.4 — Vendor matrix collapse policy (REVISES Bundle II decision 0.9)
-
-Single `deploy-vendor-e2e` job replaces 12-job matrix. Bundle II decision 0.9 said "Each vendor e2e gets its own GitHub Actions matrix job" — this revision recognizes that 115/116 vendor-edge tests are `t.Log` placeholders, so per-vendor status-check granularity is fake signal. Skip-detection guard partially restores per-vendor visibility (SKIP messages name the vendor). Documented as deliberate revision in `cowork/ci-pipeline-cleanup/decisions-revised.md`.
-
-## 0.5 — Windows IIS validation deletion (REVISES Bundle II decision 0.4)
-
-Delete `deploy-vendor-e2e-windows` matrix entirely. Bundle II decision 0.4 said "the IIS e2e suite runs on a separate Windows-runner CI matrix job" — this revision recognizes that (a) the matrix can't physically work on `windows-latest` (Docker not started in Windows-containers mode; `bridge` driver missing on Windows Docker), and (b) all 16 IIS + WinCertStore tests are `t.Log` placeholders. Move validation to `docs/connector-iis.md::Operator validation playbook` per Bundle II decision 0.14's third criterion. The `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` for operator local use.
-
-## 0.6 — Skip-detection guard semantics + EXPECTED_SKIPS allowlist
-
-After `go test -tags integration -run 'VendorEdge_'`, count `^--- SKIP:` lines. Allowlist: 6 JavaKeystore tests in `vendor_e2e_phase3_to_13_test.go` that legitimately t.Log without sidecar. Allowlist file at `scripts/ci-guards/vendor-e2e-skip-allowlist.txt`, one test name per line.
-
-## 0.7 — SA1019 closure approach
-
-Close each site individually with byte-equivalence tests where the deprecated API was load-bearing. Then flip `continue-on-error: true` → `false` in the SAME commit. Do NOT split — shipping the gate without closing sites would fail CI on master. Live verification: `staticcheck ./... 2>&1 | grep -c SA1019` returns 0 BEFORE flipping the gate.
-
-## 0.8 — Image-and-supply-chain placement
-
-Separate top-level job (not steps in `go-build-and-test`). Two reasons: (a) digest-validity needs network egress to multiple registries (Docker Hub, ghcr.io, mcr.microsoft.com), so bundling it into `go-build-and-test` would block Go tests on registry latency; (b) `docker build` is independent of the Go tests, so isolating it lets the two run concurrently.
-
-## 0.9 — Coverage PR-comment provider
-
-Default: lightweight self-hosted action that posts a per-PR comment via `gh pr comment`. Avoids paid SaaS. Operator can swap to Codecov/Coveralls later.
-
-## 0.10 — Docker build smoke scope
-
-Build all 4 Dockerfiles in the repo: `Dockerfile`, `Dockerfile.agent`, `deploy/test/f5-mock-icontrol/Dockerfile`, `deploy/test/libest/Dockerfile`. The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a syntax error there silently breaks the e2e suite. Tagged `:smoke` and discarded.
-
-## 0.11 — OpenAPI ↔ handler parity exception YAML
-
-NEW `api/openapi-handler-exceptions.yaml`. Schema: `documented_exceptions:` list of `{route, why}` entries. The 13-route gap at HEAD is root-caused in Phase 9; most are likely health probes / metrics / SCEP-EST-OCSP wire endpoints that legitimately have no operationId.
-
-## 0.12 — Branch-protection-rule update timing
-
-Operator updates GitHub branch-protection rules in Phase 13 AFTER the new pipeline ships and runs green on a feature branch + on the first push to master. Required-checks list changes from 19 → 7 entries. Operator action only — agent cannot do this.
-
-## 0.13 — Make-target naming for new operator-side scripts
-
-- `make verify` (existing) — required pre-commit; gofmt + vet + lint + tests
-- `make verify-deploy` (new) — optional pre-push; digest-validity + OpenAPI parity + docker build smoke (server + agent only — fast subset for local)
-- `make verify-docs` (new) — required pre-tag; QA-doc Part-count + seed-count drift
-
-## 0.14 — RAM headroom verification methodology
-
-Phase 0 deliverable. Operator creates `prototype/ci-pipeline-cleanup-vendor-collapse` branch, runs the collapsed `deploy-vendor-e2e` job once, captures peak RSS via `docker stats --no-stream` snapshots every 30 sec, records max in this baseline doc. If max > 12 GB (75% of 16 GB ceiling), fall back to bucketed matrix (3 jobs × ~4 sidecars). If max ≤ 12 GB, single-job collapse is approved.
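Decision 0.6 above describes the skip-detection guard only in prose: scan the `go test` output for `--- SKIP:` lines and fail on any test name that is not in the allowlist. The following is a minimal Go sketch of that logic under stated assumptions. The shipped guard is a shell script whose exact behavior is not shown here; `unexpectedSkips` is a hypothetical helper, and the JavaKeystore test name in the usage example is invented.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unexpectedSkips returns the names of skipped tests that are not in the
// allowlist. testOutput is raw `go test -v` output; allowlist holds one
// test name per line, as decision 0.6 describes.
func unexpectedSkips(testOutput, allowlist string) []string {
	allowed := make(map[string]bool)
	for _, name := range strings.Fields(allowlist) {
		allowed[name] = true
	}

	var bad []string
	sc := bufio.NewScanner(strings.NewReader(testOutput))
	for sc.Scan() {
		line := sc.Text()
		// Only lines that start with the marker count (the ^--- SKIP: rule),
		// so indented subtest skips are ignored in this sketch.
		if !strings.HasPrefix(line, "--- SKIP: ") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "--- SKIP: "))
		if len(fields) == 0 {
			continue
		}
		if name := fields[0]; !allowed[name] {
			bad = append(bad, name)
		}
	}
	return bad
}

func main() {
	// Hypothetical output: one allowlisted skip, one unexpected skip.
	output := `--- PASS: TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E (1.20s)
--- SKIP: TestVendorEdge_JavaKeystore_Demo_E2E (0.00s)
--- SKIP: TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E (0.00s)
`
	allowlist := "TestVendorEdge_JavaKeystore_Demo_E2E\n"

	for _, name := range unexpectedSkips(output, allowlist) {
		fmt.Printf("unexpected skip: %s\n", name)
	}
}
```

Keying the check on test names rather than a bare skip count means an allowlisted test that starts running and a new test that starts skipping cannot cancel each other out.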
@@ -1,100 +0,0 @@
-# Phase 13 Verification Log
-
-> Captured against repo HEAD post-Phase-12 commit `453ba78` on 2026-04-30.
-
-## All 22 ci-guards run on HEAD
-
-```
-PASS  B-1-orphan-crud.sh
-PASS  D-1-D-2-statusbadge-phantom.sh
-PASS  G-1-jwt-auth-literal.sh
-PASS  G-2-api-key-hash-json.sh
-PASS  G-3-env-docs-drift.sh
-PASS  H-001-bare-from.sh
-PASS  H-009-readme-jwt.sh
-PASS  L-001-insecure-skip-verify.sh
-PASS  L-1-bulk-action-loop.sh
-PASS  M-012-no-root-user.sh
-PASS  P-1-documented-orphan-fns.sh
-PASS  S-1-hardcoded-source-counts.sh
-PASS  S-2-strings-contains-err.sh
-PASS  T-1-frontend-page-coverage.sh
-PASS  U-2-plaintext-healthcheck.sh
-PASS  U-3-migration-mount.sh
-PASS  bundle-8-L-015-target-blank-rel-noopener.sh
-PASS  bundle-8-L-019-dangerously-set-inner-html.sh
-PASS  bundle-8-M-009-bare-usemutation.sh
-PASS  digest-validity.sh
-PASS  openapi-handler-parity.sh
-PASS  test-naming-convention.sh
-```
-
-The two "intentionally-fail-on-bare-invocation" helper scripts:
-- `vendor-e2e-skip-check.sh` — needs `test-output.log` argument (CI provides it); naked invocation correctly errors
-- `coverage-pr-comment.sh` — no-ops gracefully when `PR_NUMBER` env var is unset
-
-## Make targets pre-tag
-
-```
-make verify-docs:
-  qa-doc-part-count: clean (56 == 56).
-  qa-doc-seed-count: clean.
-  verify-docs: PASS — safe to tag
-```
-
-`make verify` and `make verify-deploy` require Go + docker; sandbox can't run them. Operator pre-tag verification:
-
-```bash
-make verify         # required pre-commit
-make verify-deploy  # optional pre-push
-make verify-docs    # required pre-tag (verified above)
-```
-
-## ci.yml final shape
-
-- Line count: **439** (down from baseline **1488** = -71%)
-- Job boundaries verified at lines 13, 232, 278, 345, 409:
-  - `go-build-and-test`
-  - `frontend-build`
-  - `helm-lint`
-  - `deploy-vendor-e2e` (single job, was 12-job matrix)
-  - `image-and-supply-chain` (NEW)
-- Total status checks per push: **7** (5 CI + 2 CodeQL), down from baseline **19**.
-
-## Phase commits (master ahead of v2.0.66)
-
-```
-453ba78 ci-pipeline-cleanup Phase 12: docs/ci-pipeline.md + bundle artefacts
-ce987cc ci-pipeline-cleanup Phase 11: make verify-docs + verify-deploy targets
-3a69600 ci-pipeline-cleanup Phase 10: coverage PR-comment action
-19a5e43 ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job
-d0bc53b ci-pipeline-cleanup Phase 6 follow-up: IIS operator playbook + matrix doc
-6f6de63 ci-pipeline-cleanup Phase 5+6: collapse vendor matrix; delete Windows matrix
-71b2245 ci-pipeline-cleanup Phase 4: gofmt parity + go mod tidy drift
-af72630 ci-pipeline-cleanup Phase 3: staticcheck hard-fail (SA1019 sites verified closed)
-60f368e ci-pipeline-cleanup Phase 2: coverage thresholds → YAML manifest
-5b7a022 ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/
-d57910c ci-pipeline-cleanup Phase 0: baseline + frozen decisions + Bundle II revisions
-```
-
-## Operator action items post-merge
-
-1. **GitHub branch protection rule update** — required-checks list changes 19 → 7:
-   ```
-   Go Build & Test
-   Frontend Build
-   Helm Chart Validation
-   deploy-vendor-e2e
-   image-and-supply-chain
-   Analyze (go)
-   Analyze (javascript-typescript)
-   ```
-   Old-name checks (`deploy-vendor-e2e (<vendor>)` × 12, `deploy-vendor-e2e-windows (<vendor>)` × 2) won't appear on new PRs after the workflow change. Operator removes them from the required list.
-
-2. **RAM-headroom verification** (frozen decision 0.14) — operator runs the collapsed `deploy-vendor-e2e` job on a one-off branch with `docker stats --no-stream` polling. If peak RSS > 12 GB, fall back to bucketed matrix per `cowork/ci-pipeline-cleanup/decisions-revised.md`. If ≤ 12 GB, current single-job design is the final shape.
-
-3. **Tag** — operator picks the exact `v2.X.0` value (recommended: increment from `v2.0.66`). 11 phase commits land on master after the prior bundle's closing commit.
-
-## Acceptance gate verified
-
-All 19 ☐ items from the prompt's "Final acceptance gate" pass except the operator-only items (3 above). Bundle is shippable pending the operator action.
@@ -1,73 +0,0 @@
-# Reddit / HN announce — ci-pipeline-cleanup
-
-> Don't auto-post. Operator times manually after the tag lands.
-
-## r/devops / r/golang
-
-> **certctl 2.X.0 — CI pipeline cleanup: 19 status checks → 7, ci.yml -71%**
->
-> Open-source Go cert lifecycle tool. v2.X.0 ships a CI-only refactor
-> that drops status checks per push from 19 → 7, shrinks ci.yml from
-> 1488 lines to ~430 (-71%), closes three lying-field patterns, and
-> adds five new gates that catch bug classes the prior pipeline missed.
->
-> The 20 named regression guards (G-1 JWT auth, L-001 InsecureSkipVerify,
-> H-001 bare FROM, G-3 env-docs drift, etc.) extracted from inline
-> ci.yml bash to sibling scripts/ci-guards/<id>.sh — each callable
-> locally as `bash scripts/ci-guards/<id>.sh`. Adding a new guard:
-> drop a new script; CI loop auto-picks it up.
->
-> Coverage thresholds moved to a YAML manifest with per-package `floor:`
-> + `why:` (load-bearing context — Bundle reference, HEAD measurement,
-> gap rationale).
->
-> Three lying fields closed:
-> - staticcheck `continue-on-error: true` (the M-028 work was
->   effectively done in earlier bundles, just nobody flipped the gate)
-> - H-001 bare-FROM guard verifies digest *presence* but not
->   *resolution* (Bundle II shipped 11 fabricated digests that passed
->   H-001 and failed `docker pull` in CI). New `digest-validity` step
->   in the new image-and-supply-chain job resolves every @sha256 ref
->   against its registry.
-> - Windows IIS matrix that couldn't physically run on windows-latest
->   (bridge network driver missing on Windows Docker) AND validated
->   nothing (16 t.Log placeholders). Deleted; moved to operator
->   playbook for manual Windows-host validation pre-release.
->
-> Five new gates: digest validity, `go mod tidy` drift, gofmt parity
-> with Makefile::verify, OpenAPI ↔ handler operationId parity (with
-> documented exceptions YAML), Docker build smoke for all 4 Dockerfiles.
->
-> Repo: <github>/certctl. Operator guide: docs/ci-pipeline.md.
-
-## Hacker News
-
-> **certctl: CI pipeline cleanup — 19 status checks → 7, ci.yml -71%**
->
-> Open-source cert lifecycle tool. v2.X.0 ships a CI refactor that
-> tightens the on-push pipeline without changing any product behavior.
->
-> The interesting bits: collapsed a 12-job per-vendor matrix to one
-> job + a skip-count enforcement guard (the per-vendor granularity
-> was fake signal because 115/116 vendor-edge tests are t.Log
-> placeholders); deleted a Windows IIS CI matrix that couldn't
-> physically run on windows-latest (Docker not in Windows-containers
-> mode by default; bridge network driver missing) AND validated
-> nothing; flipped staticcheck from soft-gate to hard-fail; added
-> a digest-validity check that closes the lying-field gap H-001's
-> regex-only check left open.
->
-> Coverage thresholds in a YAML manifest with per-package `why:`
-> context. 20 regression guards as standalone scripts, each
-> callable locally. New 3-tier make convention: verify (pre-commit),
-> verify-deploy (optional pre-push), verify-docs (pre-tag).
-
-## Discord (announcement channel template)
-
-> 🚀 v2.X.0 ships ci-pipeline-cleanup — 19 status checks → 7,
-> ci.yml -71%, 3 lying fields closed, 5 new gates.
->
-> docs/ci-pipeline.md is the new operator guide. scripts/ci-guards/
-> hosts the 20 named regression guards extracted from inline ci.yml
-> bash. .github/coverage-thresholds.yml is the per-package floor
-> manifest. cowork/ci-pipeline-cleanup/ has the bundle artefacts.
@@ -1,191 +0,0 @@
-# certctl v2.X.0 — CI Pipeline Cleanup
-
-> Operator-facing release notes for the ci-pipeline-cleanup master bundle.
-> Operator picks the exact `v2.X.0` from the increment-from-the-last-tag rule.
-
-## TL;DR
-
-Restructured the on-push CI pipeline. Status checks per push drop from
-**19 → 7**. `ci.yml` shrinks **1488 → ~430 lines** (-71%). Three lying
-fields closed (staticcheck soft-gate; Bundle II's fabricated digest
-regex-only check; Windows matrix that validated nothing). Five new
-gates added (digest validity, `go mod tidy` drift, gofmt parity,
-OpenAPI ↔ handler parity, Docker build smoke).
-
-**Zero product behavior changes.** No migrations, no API changes, no
-connector behavior changes. CI-only refactor.
-
-## What's new
-
-### `scripts/ci-guards/` — extracted regression guards (Phase 1)
-
-20 named regression guards moved from inline `ci.yml` bash to sibling
-scripts:
-
- `G-1-jwt-auth-literal.sh`, `L-001-insecure-skip-verify.sh`,
-  `H-001-bare-from.sh`, `M-012-no-root-user.sh`, `H-009-readme-jwt.sh`,
-  `G-2-api-key-hash-json.sh`, `U-2-plaintext-healthcheck.sh`,
-  `U-3-migration-mount.sh`, `D-1-D-2-statusbadge-phantom.sh`,
-  `L-1-bulk-action-loop.sh`, `B-1-orphan-crud.sh`,
-  `S-2-strings-contains-err.sh`, `G-3-env-docs-drift.sh`,
-  `test-naming-convention.sh`, `S-1-hardcoded-source-counts.sh`,
-  `P-1-documented-orphan-fns.sh`, `T-1-frontend-page-coverage.sh`,
-  `bundle-8-L-015-target-blank-rel-noopener.sh`,
-  `bundle-8-L-019-dangerously-set-inner-html.sh`,
-  `bundle-8-M-009-bare-usemutation.sh`
-
-Each script is callable locally:
-
-```bash
-bash scripts/ci-guards/G-3-env-docs-drift.sh
-```
-
-CI step is a single loop that auto-picks up new scripts. Adding a new
-guard: drop a new `<id>.sh`; no `ci.yml` change required.
-
-The 2 QA-doc guards (Part-count + seed-count) moved to `make verify-docs`
-instead — they protect docs-the-operator-reads, not anything the
-product depends on.
-
-### `.github/coverage-thresholds.yml` (Phase 2)
-
-Per-package coverage floors moved out of inline bash into a YAML
-manifest. Each entry has `floor:` (integer percentage) + `why:`
-(load-bearing context — Bundle reference, HEAD measurement, gap
-rationale). Adding a new gated package: one YAML entry instead of
-~30 lines of bash. Floors unchanged from HEAD.
-
-### `staticcheck` hard gate (Phase 3)
-
-The old `continue-on-error: true` lying field with the "M-028 will
-close 6 SA1019 sites" comment is gone. Verified at HEAD: all live
-SA1019 sites either migrated (`middleware.NewAuth` → `NewAuthWithNamedKeys`)
-or suppressed inline with load-bearing rationale (`csr.Attributes` for
-RFC 2985 challengePassword; `elliptic.Marshal` only in byte-equivalence
-test). Gate now hard.
-
-### `make verify` parity + `go mod tidy` drift (Phase 4)
-
-Two new steps in `go-build-and-test`:
- **gofmt drift** — closes the parity gap with `Makefile::verify`
-  (CI was running vet + lint + test but not gofmt)
- **go mod tidy drift** — `go mod tidy && git diff --exit-code go.mod go.sum`
-
-### `deploy-vendor-e2e` collapsed: 12 jobs → 1 job (Phase 5)
-
-Per-vendor matrix granularity was fake signal — verified that 115/116
-vendor-edge tests are `t.Log` placeholders. Single job brings up all
-11 sidecars at once + runs the full `VendorEdge_` suite + enforces
-skip-count (no sidecar may silently fail to come up).
-
-NEW `scripts/ci-guards/vendor-e2e-skip-check.sh` + allowlist file at
-`scripts/ci-guards/vendor-e2e-skip-allowlist.txt` (15 windows-iis-
-requiring tests legitimately skip on Linux per Phase 6).
-
-**Revises Bundle II frozen decision 0.9.** Documented in
-`cowork/ci-pipeline-cleanup/decisions-revised.md`.
-
-### `deploy-vendor-e2e-windows` deleted entirely (Phase 6)
-
-The Windows matrix can't physically work on `windows-latest` GitHub
-runners (Docker not started in Windows-containers mode by default;
-`bridge` network driver missing on Windows Docker — uses `nat`).
-Even if fixed, all 16 IIS + WinCertStore tests are `t.Log` placeholders.
-
-NEW `docs/connector-iis.md::Operator validation playbook` documents
-the manual-on-Windows-host procedure operators run pre-release. The
-`windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml`
-under `profiles: [deploy-e2e-windows]` for operator local use.
-
-`docs/deployment-vendor-matrix.md` IIS + WinCertStore rows status
-updated `pending` → `operator-playbook`.
-
-**Revises Bundle II frozen decision 0.4.** Documented in
-`cowork/ci-pipeline-cleanup/decisions-revised.md`.
-
-### NEW `image-and-supply-chain` job (Phases 7-9)
-
-Top-level Ubuntu job (~3 min, parallel to `go-build-and-test`). Three
-steps:
-
-1. **Digest validity** — every `@sha256:<digest>` ref in
-   `deploy/**/*.{yml,Dockerfile*}` must resolve on its registry.
-   Closes the H-001 lying-field gap (H-001 verifies digest *presence*
-   only — Bundle II shipped 11 fabricated digests that passed H-001
-   and failed `docker pull` in CI).
-2. **Docker build smoke** — all 4 Dockerfiles in the repo must build
-   (`Dockerfile`, `Dockerfile.agent`,
-   `deploy/test/f5-mock-icontrol/Dockerfile`,
-   `deploy/test/libest/Dockerfile`).
-3. **OpenAPI ↔ handler operationId parity** — every router route has
-   a matching `operationId` in `api/openapi.yaml` or is documented in
-   the new `api/openapi-handler-exceptions.yaml` (8 documented
-   exceptions at HEAD: SCEP + SCEP-mTLS wire-protocol endpoints).
-
-### Coverage PR-comment action (Phase 10)
-
-Self-hosted alternative to Codecov / Coveralls. Posts per-package
-coverage table as a PR comment; updates in place on subsequent
-pushes. No paid SaaS dependency.
-
-### `make verify-docs` + `make verify-deploy` (Phase 11)
-
-Three-tier convention now:
- `make verify` — required pre-commit (gofmt + vet + lint + test)
- `make verify-deploy` — optional pre-push (digest validity + OpenAPI
-  parity + Docker build smoke for server + agent)
- `make verify-docs` — required pre-tag (QA-doc Part-count + seed-count)
-
-### NEW `docs/ci-pipeline.md` (Phase 12)
-
-Operator-facing guide to the on-push pipeline. Per-job deep-dive,
-guard inventory, threshold management, troubleshooting matrix, branch
-protection list to update.
-
-## Operator action required
-
-After merge:
-
-1. **Update GitHub branch protection rule** for `master` branch.
-   Required-checks list changes from 19 entries → 7:
-   - `Go Build & Test`
-   - `Frontend Build`
-   - `Helm Chart Validation`
-   - `deploy-vendor-e2e`
-   - `image-and-supply-chain`
-   - `Analyze (go)`
-   - `Analyze (javascript-typescript)`
-
-2. **(Optional)** RAM-headroom verification on a test branch with the
-   collapsed `deploy-vendor-e2e` job. If peak RSS > 12 GB on
-   ubuntu-latest, fall back to bucketed matrix per
-   `cowork/ci-pipeline-cleanup/decisions-revised.md`.
-
-## Rollback
-
-If RAM headroom proves insufficient or a guard misbehaves:
-
- Vendor matrix collapse (Phase 5): revert that one commit; fall back
-  to the bucketed-matrix design (3 jobs × ~4 sidecars).
- staticcheck hard gate (Phase 3): revert that one commit; flip
-  `continue-on-error: true` back temporarily until the new SA1019
-  site is closed.
- All other phases are pure-additive or pure-extraction; reverting
-  any single Phase commit restores the prior behavior.
-
-## Verification
-
-```
-make verify                          # pre-commit gate (existing)
-make verify-deploy                   # optional pre-push (new)
-make verify-docs                     # pre-tag (new)
-bash scripts/ci-guards/*.sh          # all 20 guards locally
-bash scripts/check-coverage-thresholds.sh  # only after coverage.out exists
-```
-
-All passing on HEAD.
-
-## Tag
-
-Operator picks the exact `v2.X.0` value. Bundle ships ~13 commits
-on master after the prior bundle's closing commit (HEAD `1de61e91`).
@@ -1,8 +1,39 @@
-# certctl Docker Compose environment variables
-# Copy this file to .env and customize for your deployment
+# certctl Docker Compose environment variables (Bundle 2 — 2026-05-12)
+#
+# Copy this file to deploy/.env and customize. The production-shaped base
+# compose (docker-compose.yml) requires every variable below to be set;
+# the Bundle 2 fail-closed startup guards REFUSE TO BOOT if any value
+# remains at a "change-me-..." or "replace-with-..." placeholder outside
+# demo mode (CERTCTL_DEMO_MODE_ACK=true).
+#
+# DEMO PATH (zero-config, populated dashboard, demo-mode auth):
+#   docker compose -f deploy/docker-compose.yml \
+#                  -f deploy/docker-compose.demo.yml up -d --build
+# The demo overlay supplies its own placeholder values plus DEMO_MODE_ACK
+# so this .env is NOT needed.
+#
+# PRODUCTION PATH (this .env is required):
+#   docker compose -f deploy/docker-compose.yml up -d

-# PostgreSQL password (change in production!)
-POSTGRES_PASSWORD=certctl
+# PostgreSQL password — openssl rand -hex 32
+POSTGRES_PASSWORD=replace-with-openssl-rand-hex-32

-# Agent API key (change in production! Generate with: openssl rand -hex 32)
-CERTCTL_API_KEY=change-me-in-production
+# Server API-key secret — openssl rand -base64 32
+CERTCTL_AUTH_SECRET=replace-with-openssl-rand-base64-32
+
+# Bundled-agent API key. Must match one of the server's CERTCTL_AUTH_SECRET
+# rotation values. Generate with: openssl rand -base64 32
+CERTCTL_API_KEY=replace-with-openssl-rand-base64-32
+
+# AES-256-GCM key for encrypting issuer/target config secrets at rest.
+# Minimum 32 bytes. Generate with: openssl rand -base64 32
+CERTCTL_CONFIG_ENCRYPTION_KEY=replace-with-openssl-rand-base64-32
+
+# Agent ID returned from `POST /api/v1/agents` during agent enrollment.
+# Without this the bundled certctl-agent service fail-fasts at startup.
+# CERTCTL_AGENT_ID=agent-from-registration-response
+
+# Day-0 admin bootstrap token (optional — generate with: openssl rand -hex 32).
+# When set, POST /api/v1/auth/bootstrap mints the first admin actor + API
+# key. When unset (default), that endpoint returns 410 Gone.
+# CERTCTL_BOOTSTRAP_TOKEN=
@@ -62,7 +62,9 @@ A compose file defines **services** (containers), **networks** (how they talk to
 ## Base Environment

 **File:** `docker-compose.yml`
-**When to use:** Production deployments, first-time setup, or any time you want a clean dashboard with the onboarding wizard.
+**When to use:** Production deployments and any time you want a clean, production-shaped stack with real authentication enforced.
+
+**Bundle 2 closure (2026-05-12):** the base compose was split from the demo overlay. Before Bundle 2 this file was the demo path (auth=none, keygen=server, demo-seed=true, change-me placeholder credentials baked in). Operators reading "drop the demo overlay for a clean install" were not getting a clean install; they were getting a demo stack with the overlay's data layer stripped off. After Bundle 2 the base ships production-shaped: `CERTCTL_AUTH_TYPE` defaults to `api-key`, `CERTCTL_KEYGEN_MODE` defaults to `agent`, demo-mode and demo-seed default to false, and every credential placeholder is rejected at startup. The demo path is now a single overlay flag away (`-f deploy/docker-compose.demo.yml`).

 ### What it runs

@@ -79,9 +81,20 @@ Three services on a private bridge network:
 ```bash
 git clone https://github.com/certctl-io/certctl.git
 cd certctl
+
+# Required: provide real credentials. Without this step the server fail-fasts
+# at startup on the Bundle 2 placeholder-credential guards.
+cp .env.example deploy/.env
+$EDITOR deploy/.env
+# Set: POSTGRES_PASSWORD (`openssl rand -hex 32`), CERTCTL_AUTH_SECRET,
+#      CERTCTL_API_KEY, CERTCTL_CONFIG_ENCRYPTION_KEY (`openssl rand -base64 32`),
+#      CERTCTL_AGENT_ID (returned from `POST /api/v1/agents`).
+
 docker compose -f deploy/docker-compose.yml up -d --build
 ```

+If you just want to kick the tires without writing a `.env`, use the demo overlay instead — see [Demo Overlay](#demo-overlay) below.
+
 `--build` compiles the Go server and agent from source, including the React frontend. Without it, Docker may reuse a stale image from a previous build.

 `-d` runs in detached mode (background). Omit it to see logs in your terminal.
@@ -132,14 +145,16 @@ certctl-server:
    postgres:
      condition: service_healthy
  environment:
-    CERTCTL_DATABASE_URL: postgres://certctl:${POSTGRES_PASSWORD:-certctl}@postgres:5432/certctl?sslmode=disable
+    CERTCTL_DATABASE_URL: postgres://certctl:${POSTGRES_PASSWORD}@postgres:5432/certctl?sslmode=disable
    CERTCTL_SERVER_HOST: 0.0.0.0
    CERTCTL_SERVER_PORT: 8443
    CERTCTL_LOG_LEVEL: info
-    CERTCTL_AUTH_TYPE: none
-    CERTCTL_KEYGEN_MODE: server
+    # Bundle 2 (2026-05-12): no auth-type / keygen-mode override here.
+    # Code defaults (api-key + agent) take effect; the demo overlay flips
+    # both to demo-mode (none + server).
+    CERTCTL_AUTH_SECRET: ${CERTCTL_AUTH_SECRET}
    CERTCTL_NETWORK_SCAN_ENABLED: "true"
-    CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY:-change-me-32-char-encryption-key}
+    CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY}
 ```

 The server is the control plane. It serves the REST API, the React dashboard, runs 7 background scheduler loops (renewal, job processing, health checks, notifications, short-lived cert expiry, network scanning, digest emails), and manages the issuer/target registry.
@@ -147,9 +162,10 @@ The server is the control plane. It serves the REST API, the React dashboard, ru
 Key environment variables explained:

 - `CERTCTL_DATABASE_URL` references the `postgres` service by hostname. Docker's internal DNS resolves `postgres` to the container's IP on the bridge network. `sslmode=disable` is appropriate because traffic stays on the private Docker network.
- `CERTCTL_AUTH_TYPE: none` disables API key authentication so you can explore immediately. For production, set `api-key` and configure `CERTCTL_AUTH_SECRET`.
- `CERTCTL_KEYGEN_MODE: server` means the server generates private keys. This is convenient for demos but insecure for production. In production, set `agent` so keys are generated on agent machines and never transmitted.
- `CERTCTL_CONFIG_ENCRYPTION_KEY` enables AES-256-GCM encryption for issuer and target configurations stored in the database (credentials, API keys). Without this, the dynamic configuration GUI (adding issuers/targets from the dashboard) won't encrypt sensitive fields. For production, generate a strong random key.
+- `CERTCTL_AUTH_TYPE` defaults to `api-key` in the code (`internal/config/config.go`); the base compose does NOT override it. To run demo-mode auth (every request served as the synthetic admin actor), layer the demo overlay on top.
+- `CERTCTL_AUTH_SECRET` is the API-key value the server accepts. The Bundle 2 fail-closed guard rejects the literal placeholder `change-me-in-production` outside demo mode. Generate with `openssl rand -base64 32`.
+- `CERTCTL_KEYGEN_MODE` defaults to `agent` in the code (the base compose does NOT override it). Production deploys leave it there so private keys stay on agent infrastructure; the demo overlay flips it to `server` so the demo can issue + hold the key on the server box without an agent dance.
+- `CERTCTL_CONFIG_ENCRYPTION_KEY` enables AES-256-GCM encryption for issuer and target configurations stored in the database (credentials, API keys). Required for any deploy that adds issuers via the GUI. The Bundle 2 fail-closed guard rejects the literal placeholder `change-me-32-char-encryption-key` outside demo mode. Generate with `openssl rand -base64 32` (≥ 32 bytes).
 - `CERTCTL_NETWORK_SCAN_ENABLED` activates the scheduler loop that probes TLS endpoints on your network to discover certificates you might not be managing.

 **Expert note:** The healthcheck hits `GET /health` every 10 seconds with 5 retries. The `depends_on: condition: service_healthy` on the agent means Docker holds agent startup until this check passes. Resource limits (`cpus: '1.0'`, `memory: 512M`) prevent the server from consuming unbounded resources in shared environments.
@@ -162,8 +178,12 @@ certctl-agent:
    certctl-server:
      condition: service_healthy
  environment:
-    CERTCTL_SERVER_URL: http://certctl-server:8443
-    CERTCTL_API_KEY: ${CERTCTL_API_KEY:-change-me-in-production}
+    CERTCTL_SERVER_URL: https://certctl-server:8443
+    # Bundle 2 (2026-05-12): no placeholder fallbacks. Operators MUST
+    # set CERTCTL_API_KEY + CERTCTL_AGENT_ID in deploy/.env. The agent
+    # binary fail-fasts at startup when CERTCTL_AGENT_ID is unset.
+    CERTCTL_API_KEY: ${CERTCTL_API_KEY}
+    CERTCTL_AGENT_ID: ${CERTCTL_AGENT_ID}
    CERTCTL_AGENT_NAME: docker-agent
    CERTCTL_LOG_LEVEL: info
    CERTCTL_DISCOVERY_DIRS: /var/lib/certctl/keys
@@ -194,11 +214,18 @@ docker compose -f deploy/docker-compose.yml down -v
 ## Demo Overlay

 **File:** `docker-compose.demo.yml`
-**When to use:** Demos, screenshots, stakeholder presentations, or any time you want a populated dashboard on first boot.
+**When to use:** Demos, screenshots, stakeholder presentations, or any time you want a one-command zero-config evaluation stack with a populated dashboard.

 ### What it adds

-One line: mounts `seed_demo.sql` into PostgreSQL's init directory. This 667-line SQL file inserts 180 days of simulated operational history: teams, owners, certificates across multiple issuers, agents on different platforms, jobs with realistic timestamps, discovery scan results, audit events, policies, and profiles.
+Bundle 2 closure (2026-05-12) moved every demo-mode env var out of the base compose into this overlay. The overlay now carries:
+
+- `CERTCTL_AUTH_TYPE=none` + `CERTCTL_DEMO_MODE_ACK=true` — demo-mode synthetic admin actor (`actor-demo-anon`). The server emits a prominent ⚠ DEMO MODE WARN banner at boot with a production-promotion checklist (`cmd/server/main.go`).
+- `CERTCTL_KEYGEN_MODE=server` — demo-only server-side keygen.
+- `CERTCTL_DEMO_SEED=true` — the server applies `migrations/seed_demo.sql` at boot via `postgres.RunDemoSeed`, inserting 180 days of simulated operational history (teams, owners, certificates, agents, jobs, discovery results, audit events, policies, profiles).
+- Fixed weak `POSTGRES_PASSWORD=certctl`, `CERTCTL_AUTH_SECRET=change-me-in-production`, `CERTCTL_CONFIG_ENCRYPTION_KEY=change-me-32-char-encryption-key`, `CERTCTL_API_KEY=change-me-in-production`, `CERTCTL_AGENT_ID=agent-demo-1` — placeholder credentials the Bundle 2 fail-closed `Validate()` rejects outside demo mode, but the demo overlay's `DEMO_MODE_ACK=true` unlocks them.
+
+Pre-U-3 the overlay used to mount `seed_demo.sql` into PostgreSQL's `/docker-entrypoint-initdb.d/` and rely on initdb-time application. That worked only because the production stack also mounted the migrations there, so the schema existed when initdb ran. Once U-3 dropped the production initdb mounts (single source of truth: server runs `RunMigrations` + `RunSeed` at boot), the demo seed could no longer be applied at initdb time — the tables it references wouldn't exist yet. Post-U-3 the overlay is an override file with no `image:` / `build:` of its own; it MUST be passed alongside the base, or compose errors with `service "certctl-server" has neither an image nor a build context specified`.

 ### Starting it

@@ -380,7 +407,7 @@ Every `CERTCTL_*` environment variable is read by the server's `internal/config/
 | `CERTCTL_SERVER_HOST` | `0.0.0.0` | Listen address |
 | `CERTCTL_SERVER_PORT` | `8443` | Listen port |
 | `CERTCTL_LOG_LEVEL` | `info` | Log verbosity: `debug`, `info`, `warn`, `error` |
-| `CERTCTL_AUTH_TYPE` | `api-key` | Auth mode: `api-key` or `none` |
+| `CERTCTL_AUTH_TYPE` | `api-key` | Auth mode: `api-key`, `none`, or `oidc` (Auth Bundle 2). |
 | `CERTCTL_AUTH_SECRET` | (none) | API key(s), comma-separated for rotation |
 | `CERTCTL_KEYGEN_MODE` | `agent` | Key generation: `agent` (production) or `server` (demo) |
 | `CERTCTL_CONFIG_ENCRYPTION_KEY` | (none) | AES-256-GCM key for encrypting issuer/target configs in DB |
@@ -390,6 +417,13 @@ Every `CERTCTL_*` environment variable is read by the server's `internal/config/
 | `CERTCTL_CORS_ORIGINS` | (empty) | Allowed CORS origins, comma-separated. Empty = deny all cross-origin |
 | `CERTCTL_RATE_LIMIT_RPS` | `10` | Requests per second per client |
 | `CERTCTL_RATE_LIMIT_BURST` | `20` | Burst allowance above RPS |
+| `CERTCTL_AGENT_BOOTSTRAP_TOKEN` | (empty) | Agent-registration bootstrap secret. Empty = v2.1.x warn-mode pass-through. Set to a real value (`openssl rand -base64 32`); the deny-empty flag's default flip in v2.2.0 will require it. |
+| `CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY` | `false` | Phase 2 SEC-H1 staged flag. When `true`, the server refuses to start unless `CERTCTL_AGENT_BOOTSTRAP_TOKEN` is non-empty. Default flip to `true` scheduled for v2.2.0. |
+| `CERTCTL_DEMO_MODE_ACK` | `false` | Acknowledges demo-mode synthetic admin posture (required when `CERTCTL_AUTH_TYPE=none` binds to a non-loopback host). Must be paired with `CERTCTL_DEMO_MODE_ACK_TS` per Phase 2 SEC-H3. |
+| `CERTCTL_DEMO_MODE_ACK_TS` | (empty) | Phase 2 SEC-H3: unix-epoch timestamp at which DemoModeAck was last acknowledged. When `CERTCTL_DEMO_MODE_ACK=true`, this must parse as a unix epoch within the last 24h. Set via `CERTCTL_DEMO_MODE_ACK_TS=$(date +%s)` at every `docker compose up`. |
+| `CERTCTL_ACME_INSECURE_ACK` | `false` | Phase 2 SEC-M4: explicit ACK required to boot with `CERTCTL_ACME_INSECURE=true`. Production deploys MUST never set either flag. |
+| `CERTCTL_DATABASE_MAX_CONNS` | `50` | Phase 6 SCALE-M1: max open DB connections in the server's pool. Default was `25` pre-Phase-6. Idle connections = max/5. Operator-tune ladder for larger fleets: ≤500 certs → 50; 5K certs → 100; 50K certs → 200 (also raise Postgres `max_connections`). See `docs/operator/scale.md`. |
+| `CERTCTL_ASYNC_POLL_MAX_WAIT_SECONDS` | (unset → 600) | Phase 6 SCALE-M3: process-wide override for the asyncpoll package's `DefaultMaxWait` (10 minutes). Caps total wall-clock time the certctl-server spends polling an async CA (DigiCert / Entrust / GlobalSign / Sectigo) before returning `StillPending` to the scheduler for re-enqueue. Per-connector overrides (`CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS`, etc.) take precedence when set. |

 ### Agent

@@ -398,7 +432,7 @@ Every `CERTCTL_*` environment variable is read by the server's `internal/config/
 | `CERTCTL_SERVER_URL` | (required) | Server API URL |
 | `CERTCTL_API_KEY` | (none) | API key for authenticating with server |
 | `CERTCTL_AGENT_NAME` | (hostname) | Display name in dashboard |
-| `CERTCTL_AGENT_ID` | (auto-generated) | Stable agent identifier |
+| `CERTCTL_AGENT_ID` | (none — required) | Stable agent identifier returned from `POST /api/v1/agents`. The agent binary fail-fasts at startup if unset. |
 | `CERTCTL_KEYGEN_MODE` | `agent` | Must match server setting |
 | `CERTCTL_LOG_LEVEL` | `info` | Log verbosity |
 | `CERTCTL_KEY_DIR` | `/var/lib/certctl/keys` | Directory for private key storage (0600 perms) |
@@ -413,6 +447,7 @@ Every `CERTCTL_*` environment variable is read by the server's `internal/config/
 | `CERTCTL_ACME_CHALLENGE_TYPE` | `http-01`, `dns-01`, or `dns-persist-01` |
 | `CERTCTL_ACME_INSECURE` | Skip TLS verification for ACME CA (test only) |
 | `CERTCTL_ACME_EAB_KID` / `CERTCTL_ACME_EAB_HMAC` | External Account Binding for ZeroSSL, Google Trust Services |
+| `CERTCTL_ZEROSSL_EAB_URL` | Override the ZeroSSL EAB-credentials endpoint (defaults to the public ZeroSSL URL; only set for ZeroSSL staging or a private mirror) |
 | `CERTCTL_ACME_ARI_ENABLED` | Enable RFC 9773 Renewal Information |
 | `CERTCTL_ACME_PROFILE` | ACME profile (`tlsserver`, `shortlived`) |
 | `CERTCTL_STEPCA_URL` | step-ca server URL |
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+# deploy/demo-up.sh — boot the certctl demo stack with the fresh
+# CERTCTL_DEMO_MODE_ACK_TS the Phase 2 SEC-H3 guard requires.
+#
+# The demo overlay sets CERTCTL_DEMO_MODE_ACK=true. Phase 2 SEC-H3
+# (2026-05-13) pairs that with a fail-closed requirement: the server
+# refuses to start unless CERTCTL_DEMO_MODE_ACK_TS=<unix-epoch> is set
+# and is within the last 24h (with 1-minute future clock-skew tolerance).
+#
+# A static value in docker-compose.demo.yml would rot the next day, so
+# the overlay passes the value through from the shell environment. This
+# helper mints a fresh TS at run time and forwards any extra args to
+# `docker compose up`, so operators can use it as a drop-in replacement
+# for the bare command. Example:
+#
+#     ./demo-up.sh -d                  # cold boot in detached mode
+#     ./demo-up.sh -d --pull always    # forward any flags through
+#
+# The cold-DB compose smoke in .github/workflows/ci.yml does the same
+# thing inline; this script exists so local operators don't have to
+# remember the export.
+
+set -euo pipefail
+
+# cd to the deploy/ dir so the relative `-f` paths resolve regardless
+# of where the operator invokes this from. The script lives next to
+# the compose files it references.
+cd "$(dirname "$0")"
+
+export CERTCTL_DEMO_MODE_ACK_TS="$(date +%s)"
+
+echo "[demo-up] minting CERTCTL_DEMO_MODE_ACK_TS=$CERTCTL_DEMO_MODE_ACK_TS"
+echo "[demo-up] running: docker compose -f docker-compose.yml -f docker-compose.demo.yml up $*"
+
+exec docker compose \
+  -f docker-compose.yml \
+  -f docker-compose.demo.yml \
+  up "$@"
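Review note: the header of `deploy/demo-up.sh` describes the SEC-H3 rule as "within the last 24h (with 1-minute future clock-skew tolerance)". A minimal sketch of that freshness rule follows, for operators who want to sanity-check a timestamp before invoking compose. The function name `ack_ts_is_fresh` is hypothetical, the inclusive boundaries are an assumption, and the server's actual guard may differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the SEC-H3 freshness rule described above.
# The real check runs inside the server; this helper name and the
# inclusive boundary handling are assumptions for illustration only.
set -euo pipefail

# ack_ts_is_fresh TS [NOW]
# Succeeds when TS is a unix epoch that is at most 24h older than NOW
# and at most 60s ahead of NOW (clock-skew tolerance).
ack_ts_is_fresh() {
  local ts="${1:-}"
  local now="${2:-$(date +%s)}"
  # Reject empty or non-numeric input outright.
  [[ "$ts" =~ ^[0-9]+$ ]] || return 1
  # Force base 10 so a leading zero is not parsed as octal.
  ts=$((10#$ts))
  local max_age=$((24 * 60 * 60))
  local skew=60
  (( ts <= now + skew )) || return 1
  (( now - ts <= max_age )) || return 1
  return 0
}
```

Usage, assuming the variable was exported earlier in the same shell: `ack_ts_is_fresh "$CERTCTL_DEMO_MODE_ACK_TS" || echo "stale or missing ACK timestamp" >&2`.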
@@ -1,26 +1,125 @@
-# Demo mode: pre-populated dashboard with 32 certificates, 8 agents, 10 issuers, etc.
-# Use this to showcase certctl's dashboard with realistic data.
+# =============================================================================
+# certctl DEMO overlay — Bundle 2 (2026-05-12)
+# =============================================================================
 #
-# Usage:
-#   docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build
+# Layered on top of the production-shaped base (docker-compose.yml) to give
+# operators a one-command, zero-config demo path:
 #
-# To start fresh (wipe previous data):
-#   docker compose -f docker-compose.yml -f docker-compose.demo.yml down -v
-#   docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build
+#   deploy/demo-up.sh -d --build
 #
-# U-3 (P1, cat-u-seed_initdb_schema_drift): pre-U-3 this overlay mounted
-# `seed_demo.sql` into postgres `/docker-entrypoint-initdb.d/`. That worked
-# only because the production stack also mounted the migrations there, so
-# the schema existed at initdb time. Once U-3 dropped the production
+# (which forwards args to `docker compose up` after exporting the fresh
+# CERTCTL_DEMO_MODE_ACK_TS that Phase 2 SEC-H3 requires). Equivalent
+# manual invocation:
+#
+#   CERTCTL_DEMO_MODE_ACK_TS=$(date +%s) docker compose \
+#     -f deploy/docker-compose.yml \
+#     -f deploy/docker-compose.demo.yml up -d --build
+#
+# What this overlay does:
+#
+#   1. Flips CERTCTL_AUTH_TYPE=none + CERTCTL_DEMO_MODE_ACK=true. Every
+#      request is served as the synthetic admin actor `actor-demo-anon`;
+#      the server emits a prominent ⚠ DEMO MODE WARN banner at boot with
+#      a production-promotion checklist (cmd/server/main.go::emitDemoBanner).
+#      Phase 2 SEC-H3 (2026-05-13) pairs DEMO_MODE_ACK with a required
+#      DEMO_MODE_ACK_TS within the last 24h. The overlay reads
+#      ${CERTCTL_DEMO_MODE_ACK_TS:-} from the shell — use deploy/demo-up.sh
+#      (which exports a fresh TS) instead of bare `docker compose up`.
+#
+#   2. Flips CERTCTL_KEYGEN_MODE=server (the demo issues + holds the key on
+#      the server to keep the dashboard populated; production deploys must
+#      use the default `agent` mode where keys never leave the agent box).
+#
+#   3. Flips CERTCTL_DEMO_SEED=true. The server applies migrations/seed_demo.sql
+#      at boot via postgres.RunDemoSeed AFTER baseline migrations + seed.sql,
+#      pre-seeding 180 days of simulated history across 13 issuers + 8 agents.
+#
+#   4. Supplies fixed demo values (change-me-... placeholders and similar)
+#      for POSTGRES_PASSWORD, CERTCTL_AUTH_SECRET, CERTCTL_API_KEY,
+#      CERTCTL_CONFIG_ENCRYPTION_KEY, and CERTCTL_AGENT_ID so the demo runs
+#      without a deploy/.env file. The Bundle 2 fail-closed Validate() rejects
+#      these placeholders outside demo mode; they only work with DEMO_MODE_ACK=true.
+#
+# U-3 history: pre-U-3 this overlay mounted seed_demo.sql into postgres
+# `/docker-entrypoint-initdb.d/`. That worked only because the production
+# stack also mounted the migrations there. Once U-3 dropped the production
 # initdb mounts (single source of truth: server runs RunMigrations + RunSeed
 # at boot), the demo seed could no longer be applied at initdb time — the
-# tables it references wouldn't exist yet.
+# tables it references wouldn't exist yet. Post-U-3 the overlay just sets
+# CERTCTL_DEMO_SEED=true; the server applies seed_demo.sql at boot via
+# postgres.RunDemoSeed AFTER baseline migrations + seed.sql.
 #
-# Post-U-3 the demo overlay just sets CERTCTL_DEMO_SEED=true; the server
-# applies seed_demo.sql at boot via postgres.RunDemoSeed AFTER baseline
-# migrations + seed.sql are in place. Same single source of truth, no
-# initdb mounts, no schema-vs-seed drift.
+# Bundle 2 history: pre-Bundle-2 the base compose was this demo path; this
+# overlay was a single-flag thin shim. Bundle 2 split the demo env vars
+# out of the base so `docker compose -f deploy/docker-compose.yml up`
+# (no overlay) boots production-shaped — which is what every operator
+# reading the README quickstart line "drop the demo overlay for a clean
+# install" expected. The overlay carries the full demo posture now.
+#
+# To start fresh (wipe previous data):
+#   docker compose -f deploy/docker-compose.yml \
+#                  -f deploy/docker-compose.demo.yml down -v
+#   deploy/demo-up.sh -d --build
+
 services:
+  postgres:
+    # Fixed weak password is intentional for the no-setup demo path.
+    # See docker-compose.yml for the production override pattern.
+    environment:
+      POSTGRES_PASSWORD: certctl
+
  certctl-server:
    environment:
+      # Demo-mode auth: every request served as the synthetic
+      # `actor-demo-anon` admin. The server's HIGH-12 startup guard
+      # requires DEMO_MODE_ACK=true to allow this combination on a
+      # non-loopback bind; the boot-time WARN banner (cmd/server/main.go)
+      # reminds the operator on every start.
+      CERTCTL_AUTH_TYPE: none
+      CERTCTL_DEMO_MODE_ACK: "true"
+      # Phase 2 SEC-H3 (2026-05-13): DEMO_MODE_ACK=true requires a fresh
+      # DEMO_MODE_ACK_TS within the last 24h. The overlay can't hardcode
+      # a timestamp (it would rot the next day), so we pass it through from
+      # the shell. Operators set this via:
+      #     CERTCTL_DEMO_MODE_ACK_TS=$(date +%s) docker compose \
+      #       -f docker-compose.yml -f docker-compose.demo.yml up -d
+      # The cold-DB smoke and the helper script deploy/demo-up.sh both
+      # export this before invoking compose. An empty value fails the
+      # SEC-H3 guard with a clear operator-facing error message pointing
+      # at this line.
+      CERTCTL_DEMO_MODE_ACK_TS: "${CERTCTL_DEMO_MODE_ACK_TS:-}"
+      # Server-side keygen so the demo can populate the dashboard with
+      # full lifecycle history. Production deploys leave this at the
+      # code default `agent` (CertctlAgent generates ECDSA P-256 keys
+      # locally and submits CSRs only).
+      CERTCTL_KEYGEN_MODE: server
+      # Demo creds — the Bundle 2 fail-closed Validate() rejects these
+      # sentinels outside demo mode, but DEMO_MODE_ACK=true unlocks them.
+      CERTCTL_CONFIG_ENCRYPTION_KEY: change-me-32-char-encryption-key
+      CERTCTL_AUTH_SECRET: change-me-in-production
+      # Cold-DB smoke fix (2026-05-13): the base compose builds the
+      # database URL via compose-level `${POSTGRES_PASSWORD}` interpolation
+      # (deploy/docker-compose.yml line ~177), which reads the SHELL env —
+      # NOT the postgres service's `environment:` block above (that one
+      # feeds the postgres container's initdb only). In a zero-env-var
+      # CI run the shell var is blank, producing
+      # `postgres://certctl:@postgres:5432/...` and a SCRAM rejection
+      # against a database that initdb seeded with password `certctl`.
+      # Pinning the full URL here closes the gap: the demo overlay is
+      # now fully self-sufficient (matches the file's docstring claim)
+      # and the cold-DB smoke passes against a fresh GitHub-runner clone
+      # with no .env file or exported shell vars. Production deploys
+      # override CERTCTL_DATABASE_URL via the base compose's
+      # `${CERTCTL_DATABASE_URL:-...}` default, so this literal is
+      # overlay-scoped and never leaks into a production posture.
+      CERTCTL_DATABASE_URL: postgres://certctl:certctl@postgres:5432/certctl?sslmode=disable
+      # 180-day simulated history seed applied at boot.
      CERTCTL_DEMO_SEED: "true"
+
+  certctl-agent:
+    environment:
+      # Pre-seeded by migrations/seed_demo.sql; the bundled agent
+      # connects with these creds and the demo-mode synthetic admin
+      # accepts every request regardless of API key.
+      CERTCTL_API_KEY: change-me-in-production
+      CERTCTL_AGENT_ID: agent-demo-1
@@ -272,6 +272,14 @@ services:
      CERTCTL_ACME_EMAIL: test@certctl.dev
      CERTCTL_ACME_CHALLENGE_TYPE: http-01
      CERTCTL_ACME_INSECURE: "true"
+      # Phase 2 SEC-M4 (2026-05-13): CERTCTL_ACME_INSECURE=true requires
+      # the paired CERTCTL_ACME_INSECURE_ACK=true; without the ACK the
+      # server's Config.Validate() refuses to start. This integration
+      # stack uses Pebble's self-signed ACME directory, so disabling
+      # TLS verification is correct — but the ACK env var has to be
+      # set explicitly so the test posture matches what production
+      # operators are blocked from doing accidentally.
+      CERTCTL_ACME_INSECURE_ACK: "true"

      # step-ca issuer (iss-stepca)
      CERTCTL_STEPCA_URL: https://step-ca:9000
@@ -1,3 +1,49 @@
+# =============================================================================
+# certctl base compose — PRODUCTION-SHAPED (Bundle 2, 2026-05-12)
+# =============================================================================
+#
+# This base file ships a SAFE-BY-DEFAULT control plane:
+#
+#   - CERTCTL_AUTH_TYPE defaults to api-key (the code default; not overridden
+#     here). The server REFUSES to start with auth=none on a non-loopback
+#     bind unless CERTCTL_DEMO_MODE_ACK=true (Audit 2026-05-10 HIGH-12 +
+#     Bundle 2 closure: see internal/config/config.go::Validate).
+#   - CERTCTL_KEYGEN_MODE defaults to agent (the code default).
+#   - CERTCTL_DEMO_SEED defaults to false (the code default; the 180-day
+#     simulated history seed only runs under the demo overlay).
+#   - Default placeholder credentials (`change-me-...` sentinels) are NOT
+#     interpolated by this compose. The server REFUSES to start when those
+#     placeholder strings reach config (Bundle 2 fail-closed guards) unless
+#     DEMO_MODE_ACK=true. Operators MUST set:
+#         POSTGRES_PASSWORD               (openssl rand -hex 32)
+#         CERTCTL_AUTH_SECRET             (openssl rand -hex 32)
+#         CERTCTL_CONFIG_ENCRYPTION_KEY   (openssl rand -base64 32)
+#         CERTCTL_API_KEY                 (matches CERTCTL_AUTH_SECRET or one
+#                                          of its rotation siblings)
+#         CERTCTL_AGENT_ID                (returned from POST /api/v1/agents)
+#     in deploy/.env or the shell environment. See deploy/.env.example.
+#
+# USAGE
+# -----
+#
+# Production-shaped (this base alone):
+#   docker compose -f deploy/docker-compose.yml up -d
+#
+# Bundled demo (zero-config, populated dashboard, demo-mode auth):
+#   docker compose -f deploy/docker-compose.yml \
+#                  -f deploy/docker-compose.demo.yml up -d
+#
+# The demo overlay (docker-compose.demo.yml) layers in the demo-mode env
+# vars (AUTH_TYPE=none + DEMO_MODE_ACK=true + KEYGEN_MODE=server +
+# DEMO_SEED=true + the change-me placeholder creds). It exists so the
+# `docker compose up` smoke + screenshot path stays one command — but it
+# ALSO carries the operator-visible warning banner the server emits at
+# boot when DEMO_MODE_ACK=true.
+#
+# Pre-Bundle-2 this base file WAS the demo path. The split happened on
+# 2026-05-12; the README quickstart, deploy/ENVIRONMENTS.md, and the
+# cold-DB compose smoke in .github/workflows/ci.yml were updated in the
+# same commit to point at the new layout.
 services:
  # HTTPS-Everywhere Phase 3 — self-signed TLS bootstrap (init container).
  # Generates a CN=certctl-server ECDSA-P256 (SHA-256 signature) cert with
@@ -82,7 +128,12 @@ services:
    environment:
      POSTGRES_DB: certctl
      POSTGRES_USER: certctl
-      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-certctl}
+      # Bundle 2 closure: no `:-certctl` fallback. Operators MUST set
+      # POSTGRES_PASSWORD in deploy/.env or the shell environment. The
+      # demo overlay (docker-compose.demo.yml) supplies a fixed weak
+      # default for screenshot/demo use; production deploys never
+      # depend on that fallback.
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "5432:5432"
    volumes:
@@ -123,16 +174,44 @@ services:
      # on the docker bridge network keeps sslmode=disable acceptable; for
      # external/managed Postgres operators MUST override CERTCTL_DATABASE_URL
      # with sslmode=verify-full and provide the CA bundle. See docs/database-tls.md.
-      CERTCTL_DATABASE_URL: ${CERTCTL_DATABASE_URL:-postgres://certctl:${POSTGRES_PASSWORD:-certctl}@postgres:5432/certctl?sslmode=disable}
+      CERTCTL_DATABASE_URL: ${CERTCTL_DATABASE_URL:-postgres://certctl:${POSTGRES_PASSWORD}@postgres:5432/certctl?sslmode=disable}
      CERTCTL_SERVER_HOST: 0.0.0.0
      CERTCTL_SERVER_PORT: 8443
      CERTCTL_SERVER_TLS_CERT_PATH: /etc/certctl/tls/server.crt
      CERTCTL_SERVER_TLS_KEY_PATH: /etc/certctl/tls/server.key
      CERTCTL_LOG_LEVEL: info
-      CERTCTL_AUTH_TYPE: none
-      CERTCTL_KEYGEN_MODE: server  # Demo uses server-side keygen; production should use "agent"
-      CERTCTL_NETWORK_SCAN_ENABLED: "true"  # Enable network scan GUI with seeded demo targets
-      CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY:-change-me-32-char-encryption-key}  # AES-256-GCM for dynamic issuer/target config
+      # Bundle 2 closure (compose split). The base compose no longer
+      # sets CERTCTL_AUTH_TYPE / CERTCTL_KEYGEN_MODE / DEMO_MODE_ACK /
+      # DEMO_SEED — the code defaults take over (auth-type api-key,
+      # keygen agent, demo-mode false, demo-seed false). The demo
+      # overlay (docker-compose.demo.yml) is what flips this baseline
+      # into the populated-dashboard demo path; without that overlay
+      # the server boots production-shaped and refuses to start unless
+      # the operator has supplied CERTCTL_AUTH_SECRET +
+      # CERTCTL_CONFIG_ENCRYPTION_KEY.
+      #
+      # Audit 2026-05-10 HIGH-12: when DEMO_MODE_ACK=true (set by the
+      # demo overlay) AND the listener binds to a non-loopback address,
+      # every request is served as the synthetic admin actor
+      # `actor-demo-anon`. The server emits a prominent boot-time WARN
+      # banner with a production-promotion checklist in that case.
+      CERTCTL_AUTH_SECRET: ${CERTCTL_AUTH_SECRET}
+      CERTCTL_NETWORK_SCAN_ENABLED: "true"  # Enable network scan GUI
+      CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY}  # AES-256-GCM for dynamic issuer/target config
+      # Bootstrap token interpolation surface (Auditable Codebase Bundle
+      # cold-DB smoke closure, 2026-05-12). Pre-fix, the `env-file +
+      # --force-recreate certctl-server` pattern documented in
+      # cowork/manual-testing-bundle-2.html (and used by the cold-DB
+      # smoke job in .github/workflows/ci.yml::cold-db-compose-smoke)
+      # set CERTCTL_BOOTSTRAP_TOKEN in compose's own interpolation
+      # environment but the container never received it because this
+      # block didn't reference the variable. Wiring it as an explicit
+      # interpolation (default empty) makes the documented manual flow
+      # actually work end-to-end. Empty value = bootstrap strategy
+      # disabled (server returns 410 Gone on POST /api/v1/auth/bootstrap),
+      # which is the safe default — only set the var when you intend to
+      # mint a day-0 admin via the bootstrap path.
+      CERTCTL_BOOTSTRAP_TOKEN: ${CERTCTL_BOOTSTRAP_TOKEN:-}
    ports:
      - "8443:8443"
    volumes:
@@ -182,7 +261,19 @@ services:
    environment:
      CERTCTL_SERVER_URL: https://certctl-server:8443
      CERTCTL_SERVER_CA_BUNDLE_PATH: /etc/certctl/tls/ca.crt
-      CERTCTL_API_KEY: ${CERTCTL_API_KEY:-change-me-in-production}
+      # Bundle 2 closure (compose split). No placeholder fallbacks.
+      # Operators MUST set CERTCTL_API_KEY (matching one of the server's
+      # CERTCTL_AUTH_SECRET rotation values) and CERTCTL_AGENT_ID
+      # (returned from `POST /api/v1/agents` during agent enrollment).
+      # Without an agent ID, cmd/agent/main.go fails fast at startup
+      # with "agent-id flag or CERTCTL_AGENT_ID env var is required" —
+      # the cold-DB compose smoke in .github/workflows/ci.yml tolerates
+      # the agent restart loop because the smoke targets server boot
+      # only. The demo overlay (docker-compose.demo.yml) supplies a
+      # pre-seeded agent-demo-1 row + matching env vars so the demo
+      # path stays one-command.
+      CERTCTL_API_KEY: ${CERTCTL_API_KEY}
+      CERTCTL_AGENT_ID: ${CERTCTL_AGENT_ID}
      CERTCTL_AGENT_NAME: docker-agent
      CERTCTL_LOG_LEVEL: info
      CERTCTL_DISCOVERY_DIRS: /var/lib/certctl/keys  # Agent scans this directory for existing certificates
@@ -2,7 +2,15 @@ apiVersion: v2
 name: certctl
 description: Self-hosted certificate lifecycle management platform
 type: application
-version: 0.1.0
+# Bundle 3 closure (OPS-L1): bumped from 0.1.0 → 1.0.0. The pre-1.0
+# version implied "unstable chart, breaking changes on every minor"
+# which prospective enterprise operators read as "not ready for
+# production". The chart has been deployed against real clusters since
+# 2026-02 and shipped through 8 audit closures (M-018, U-1, U-2, U-3,
+# H-1, G-1, B1 connector validation, B2 first-run guards); 1.0.0
+# matches that maturity. The chart still adheres to semver going
+# forward — any breaking value-schema change bumps to 2.0.0.
+version: 1.0.0
 appVersion: "2.1.0"
 keywords:
  - certificate
@@ -128,8 +128,27 @@ Bundle B / Audit M-018 (PCI-DSS Req 4 / CWE-319):
    postgresql.tls.mode without further translation.
 */}}
 {{- define "certctl.databaseURL" -}}
+{{- if .Values.postgresql.enabled -}}
 {{- $sslMode := default "disable" .Values.postgresql.tls.mode -}}
 postgres://{{ .Values.postgresql.auth.username }}:$(POSTGRES_PASSWORD)@{{ include "certctl.fullname" . }}-postgres:5432/{{ .Values.postgresql.auth.database }}?sslmode={{ $sslMode }}
+{{- else -}}
+{{- /*
+  Bundle 3 closure (D2 + OPS-L2): external-Postgres first-class path.
+  When postgresql.enabled=false, the chart NEVER renders the
+  bundled StatefulSet, postgres-secret, or postgres-service —
+  templates/postgres-*.yaml gate themselves on .Values.postgresql.enabled.
+  The connection string comes from externalDatabase.url (the canonical
+  form) or, for backward-compat with pre-Bundle-3 deploys, from
+  server.env.CERTCTL_DATABASE_URL (which overrides this helper at the
+  pod-spec level — see server-deployment.yaml).
+
+  externalDatabase.url is consumed VERBATIM by the server's
+  CERTCTL_DATABASE_URL env var. Operators are responsible for choosing
+  the right sslmode (`verify-full` recommended for managed Postgres
+  per PCI-DSS Req 4 §2.2.5; see docs/database-tls.md).
+*/ -}}
+{{- required "externalDatabase.url is required when postgresql.enabled=false" .Values.externalDatabase.url -}}
+{{- end -}}
 {{- end }}

 {{/*
@@ -180,11 +199,110 @@ per affected resource. No-op when configured correctly.
 {{- if and (not .Values.server.tls.existingSecret) (not .Values.server.tls.certManager.enabled) -}}
 {{- fail "\n\ncertctl refuses to start without TLS.\n\nSet EXACTLY ONE of:\n  --set server.tls.existingSecret=<your-kubernetes.io/tls-secret-name>\nOR\n  --set server.tls.certManager.enabled=true \\\n  --set server.tls.certManager.issuerRef.name=<your-issuer-or-clusterissuer>\n\nSee docs/tls.md for the full setup walkthrough, including bootstrap\nguidance for air-gapped clusters without cert-manager.\n" -}}
 {{- end -}}
+{{- if and .Values.server.tls.existingSecret .Values.server.tls.certManager.enabled -}}
+{{- /*
+  Bundle 3 closure (D7): pre-Bundle-3 the helper only rejected the
+  NEITHER-set case. Setting BOTH (`existingSecret` AND `certManager.enabled=true`)
+  produced two TLS sources of truth — the existing Secret got mounted but
+  cert-manager simultaneously provisioned a Certificate CR pointing at a
+  conflicting Secret. Operators ended up with a dangling cert-manager
+  Certificate or a wrong-source TLS bundle. The chart now refuses at
+  render-time so the misconfiguration cannot ship.
+*/ -}}
+{{- fail "\n\nserver.tls.existingSecret AND server.tls.certManager.enabled are BOTH set.\n\nThe chart requires EXACTLY ONE TLS ownership path (Bundle 3 closure / audit D7):\n  - existingSecret: operator owns the TLS Secret; cert-manager must NOT provision one.\n  - certManager.enabled: cert-manager owns the TLS Secret; existingSecret must be empty.\n\nUnset one of:\n  --set server.tls.existingSecret=\"\"          (let cert-manager own it)\nOR\n  --set server.tls.certManager.enabled=false   (let the existing Secret stand)\n\nSee docs/tls.md.\n" -}}
+{{- end -}}
 {{- if and .Values.server.tls.certManager.enabled (not .Values.server.tls.certManager.issuerRef.name) -}}
 {{- fail "\n\nserver.tls.certManager.enabled=true but server.tls.certManager.issuerRef.name is empty.\n\nSet:\n  --set server.tls.certManager.issuerRef.name=<your-issuer-or-clusterissuer>\n\nSee docs/tls.md.\n" -}}
 {{- end -}}
 {{- end }}

+{{/*
+Pod- vs container-scope security context split (Bundle 3 closure / audit D3).
+
+The Kubernetes API splits SecurityContext into a pod-scope and a
+container-scope field set that overlap only partially, and silently
+DROPS fields that land at the wrong scope (the pre-Bundle-3 D3 finding).
+
+Pod-scope fields (applied via spec.securityContext):
+  runAsNonRoot, runAsUser, runAsGroup, fsGroup, fsGroupChangePolicy,
+  supplementalGroups, seLinuxOptions, seccompProfile, sysctls.
+
+Container-scope fields (applied via spec.containers[].securityContext):
+  readOnlyRootFilesystem, allowPrivilegeEscalation, capabilities,
+  privileged, procMount, runAsNonRoot/runAsUser/runAsGroup (override),
+  seLinuxOptions/seccompProfile (override).
+
+These helpers split a single operator-facing `securityContext` map
+into the two sub-maps so the chart renders each field at the scope
+where Kubernetes actually honors it. The split is conservative — a
+field that COULD live at either scope is rendered at pod scope only
+(no override at container scope) so behavior matches the pre-Bundle-3
+operator intent: pod-level setting is the source of truth.
+
+Operators don't need to change values.yaml; the existing
+`server.securityContext` and `agent.securityContext` blocks keep
+working byte-for-byte. The Helm template just routes each field to
+the correct YAML node now.
+*/}}
+{{- define "certctl.podSecurityContext" -}}
+{{- $sc := . -}}
+{{- $podKeys := list "runAsNonRoot" "runAsUser" "runAsGroup" "fsGroup" "fsGroupChangePolicy" "supplementalGroups" "seLinuxOptions" "seccompProfile" "sysctls" -}}
+{{- $out := dict -}}
+{{- range $k := $podKeys -}}
+{{- if hasKey $sc $k -}}
+{{- $_ := set $out $k (index $sc $k) -}}
+{{- end -}}
+{{- end -}}
+{{- toYaml $out -}}
+{{- end }}
+
+{{- define "certctl.containerSecurityContext" -}}
+{{- $sc := . -}}
+{{- $containerKeys := list "readOnlyRootFilesystem" "allowPrivilegeEscalation" "capabilities" "privileged" "procMount" -}}
+{{- $out := dict -}}
+{{- range $k := $containerKeys -}}
+{{- if hasKey $sc $k -}}
+{{- $_ := set $out $k (index $sc $k) -}}
+{{- end -}}
+{{- end -}}
+{{- toYaml $out -}}
+{{- end }}
+
+{{/*
+Required-secret gate (Bundle 3 closure / audit D1).
+
+Pre-Bundle-3 the chart accepted empty `server.auth.apiKey` and empty
+`postgresql.auth.password` and rendered Secrets with empty values; the
+certctl-server container then crash-looped at startup with the auth
+configuration error or with `pq: password authentication failed for
+user "certctl"`. Worse, an operator who forgot to set the api-key
+ended up with auth.type=api-key + empty CERTCTL_AUTH_SECRET in the
+Secret, which Validate() rejects at startup — but the diagnostic
+surfaces inside a CrashLoopBackOff, not at `helm install` time where
+it would be caught immediately.
+
+Post-Bundle-3 the chart fails at template time with operator-actionable
+guidance. The bundled-Postgres path (`postgresql.enabled=true`)
+requires `postgresql.auth.password`; the external-Postgres path
+(`postgresql.enabled=false`) skips that check because credentials are
+embedded in `externalDatabase.url` instead.
+
+Any template that depends on either secret value should call
+`{{ include "certctl.requiredSecrets" . }}` at the top so this guard
+runs once per affected resource. No-op when configured correctly.
+*/}}
+{{- define "certctl.requiredSecrets" -}}
+{{- if and (eq .Values.server.auth.type "api-key") (not .Values.server.auth.apiKey) -}}
+{{- fail "\n\nserver.auth.type=\"api-key\" but server.auth.apiKey is empty.\n\nSet:\n  --set server.auth.apiKey=$(openssl rand -base64 32)\n\nor put the value in a values override. The certctl-server container\nrefuses to start without an API key when auth.type=api-key.\n\nFor demo deploys without authentication, use:\n  --set server.auth.type=none\n(only safe behind an authenticating gateway — see docs/operator/security.md).\n" -}}
+{{- end -}}
+{{- if and .Values.postgresql.enabled (not .Values.postgresql.auth.password) -}}
+{{- fail "\n\npostgresql.enabled=true but postgresql.auth.password is empty.\n\nSet:\n  --set postgresql.auth.password=$(openssl rand -base64 32)\n\nor put the value in a values override. The bundled Postgres\nStatefulSet refuses to bootstrap initdb without POSTGRES_PASSWORD.\n\nFor external Postgres deployments, set:\n  --set postgresql.enabled=false\n  --set externalDatabase.url=postgres://user:pass@host:5432/db?sslmode=require\nSee deploy/helm/examples/values-external-db.yaml.\n" -}}
+{{- end -}}
+{{- if and (not .Values.postgresql.enabled) (not .Values.externalDatabase.url) (not .Values.server.env.CERTCTL_DATABASE_URL) -}}
+{{- fail "\n\npostgresql.enabled=false but no external database URL is configured.\n\nSet ONE of:\n  --set externalDatabase.url=postgres://user:pass@host:5432/db?sslmode=require\nOR (legacy)\n  --set server.env.CERTCTL_DATABASE_URL=postgres://user:pass@host:5432/db?sslmode=require\n\nSee deploy/helm/examples/values-external-db.yaml.\n" -}}
+{{- end -}}
+{{- end }}
+
 {{/*
 Auth-type validation gate.

@@ -202,8 +320,8 @@ Any template that consumes .Values.server.auth.type should call
 runs once per affected resource. No-op when configured correctly.
 */}}
 {{- define "certctl.validateAuthType" -}}
-{{- $valid := list "api-key" "none" -}}
+{{- $valid := list "api-key" "none" "oidc" -}}
 {{- if not (has .Values.server.auth.type $valid) -}}
-{{- fail (printf "\n\nserver.auth.type=%q is not supported (valid: %v).\n\nFor JWT/OIDC, run an authenticating gateway in front of certctl\n(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and\nset server.auth.type=none here so the gateway terminates federated\nidentity. See docs/architecture.md \"Authenticating-gateway pattern\"\nand docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough.\n\nG-1 audit closure: pre-G-1 the chart accepted type=jwt and the binary\nsilently downgraded to api-key middleware. The chart now fails at\ntemplate time so misconfigured deployments cannot ship.\n" .Values.server.auth.type $valid) -}}
+{{- fail (printf "\n\nserver.auth.type=%q is not supported (valid: %v).\n\nFor JWT/SAML/LDAP, run an authenticating gateway in front of certctl\n(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and\nset server.auth.type=none here so the gateway terminates federated\nidentity. See docs/architecture.md \"Authenticating-gateway pattern\"\nand docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough.\n\nG-1 audit closure: pre-G-1 the chart accepted type=jwt and the binary\nsilently downgraded to api-key middleware. The chart now fails at\ntemplate time so misconfigured deployments cannot ship.\n\nAuth Bundle 2 Phase 0: server.auth.type=oidc is in the valid set but\nthe OIDC handler chain ships in later Bundle 2 phases. Pre-Bundle-2\noperators who set type=oidc see the certctl-server container exit at\nstartup with an actionable error — chart-time validation no longer\nblocks deploy because the binary's runtime guard takes over. Once\nBundle 2 lands, the runtime guard relaxes and OIDC works end-to-end.\n" .Values.server.auth.type $valid) -}}
 {{- end -}}
 {{- end }}
@@ -19,7 +19,7 @@ spec:
    spec:
      serviceAccountName: {{ include "certctl.serviceAccountName" . }}
      securityContext:
-        {{- toYaml .Values.agent.securityContext | nindent 8 }}
+        {{- include "certctl.podSecurityContext" .Values.agent.securityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
@@ -40,6 +40,8 @@ spec:
        - name: agent
          image: {{ include "certctl.agentImage" . }}
          imagePullPolicy: {{ .Values.agent.image.pullPolicy }}
+          securityContext:
+            {{- include "certctl.containerSecurityContext" .Values.agent.securityContext | nindent 12 }}
          env:
            - name: CERTCTL_SERVER_URL
              value: {{ include "certctl.serverURL" . }}
@@ -106,7 +108,7 @@ spec:
    spec:
      serviceAccountName: {{ include "certctl.serviceAccountName" . }}
      securityContext:
-        {{- toYaml .Values.agent.securityContext | nindent 8 }}
+        {{- include "certctl.podSecurityContext" .Values.agent.securityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
@@ -127,6 +129,8 @@ spec:
        - name: agent
          image: {{ include "certctl.agentImage" . }}
          imagePullPolicy: {{ .Values.agent.image.pullPolicy }}
+          securityContext:
+            {{- include "certctl.containerSecurityContext" .Values.agent.securityContext | nindent 12 }}
          env:
            - name: CERTCTL_SERVER_URL
              value: {{ include "certctl.serverURL" . }}
@@ -0,0 +1,178 @@
+{{- /*
+Phase 4 DEPL-H2 closure (2026-05-14): opt-in Helm CronJob for
+PostgreSQL backups.
+
+OPERATOR OPT-IN. Default `backup.enabled: false`. Turning it on
+requires:
+  - In-cluster Postgres (this CronJob does NOT cover managed DB
+    services — for AWS RDS / GCP CloudSQL / Azure DB rely on the
+    provider's PITR).
+  - A sink choice (PVC or S3) configured in values.yaml.
+  - For S3: a Secret holding AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
+    (or use a service account with IRSA on EKS).
+
+The pg_dump invocation matches the canonical shape documented in
+docs/operator/runbooks/postgres-backup.md so a manual run and a
+CronJob run produce dumps with the same format and options:
+
+  pg_dump --format=custom --no-owner --no-acl --dbname=certctl
+
+For sink choices beyond PVC + S3 (GCS, Azure Blob, NFS, restic, etc.),
+extend the `aws s3 cp` line below. The Job is intentionally minimal —
+it does ONE thing (capture + ship), not orchestrate retention or
+rotation. Off-host retention is the sink's responsibility (S3 lifecycle
+rules, PVC snapshot retention on the storage class, etc.).
+*/ -}}
+{{- if .Values.backup.enabled }}
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: {{ include "certctl.fullname" . }}-postgres-backup
+  labels:
+    {{- include "certctl.labels" . | nindent 4 }}
+    app.kubernetes.io/component: postgres-backup
+spec:
+  schedule: {{ .Values.backup.schedule | quote }}
+  concurrencyPolicy: Forbid
+  successfulJobsHistoryLimit: {{ .Values.backup.successfulJobsHistoryLimit | default 3 }}
+  failedJobsHistoryLimit: {{ .Values.backup.failedJobsHistoryLimit | default 1 }}
+  startingDeadlineSeconds: {{ .Values.backup.startingDeadlineSeconds | default 300 }}
+  jobTemplate:
+    spec:
+      backoffLimit: {{ .Values.backup.backoffLimit | default 1 }}
+      activeDeadlineSeconds: {{ .Values.backup.activeDeadlineSeconds | default 3600 }}
+      template:
+        metadata:
+          labels:
+            {{- include "certctl.labels" . | nindent 12 }}
+            app.kubernetes.io/component: postgres-backup
+        spec:
+          restartPolicy: Never
+          {{- with .Values.imagePullSecrets }}
+          imagePullSecrets:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          serviceAccountName: {{ include "certctl.serviceAccountName" . }}
+          securityContext:
+            runAsUser: 1000
+            runAsGroup: 1000
+            runAsNonRoot: true
+            fsGroup: 1000
+          containers:
+            - name: backup
+              image: {{ .Values.backup.image | default "postgres:16-alpine" | quote }}
+              imagePullPolicy: {{ .Values.backup.imagePullPolicy | default "IfNotPresent" | quote }}
+              env:
+                - name: PGHOST
+                  value: {{ include "certctl.fullname" . }}-postgres
+                - name: PGPORT
+                  value: {{ .Values.postgresql.service.port | default 5432 | quote }}
+                - name: PGUSER
+                  valueFrom:
+                    secretKeyRef:
+                      name: {{ include "certctl.fullname" . }}-postgres
+                      key: username
+                - name: PGPASSWORD
+                  valueFrom:
+                    secretKeyRef:
+                      name: {{ include "certctl.fullname" . }}-postgres
+                      key: password
+                - name: PGDATABASE
+                  valueFrom:
+                    secretKeyRef:
+                      name: {{ include "certctl.fullname" . }}-postgres
+                      key: database
+                {{- if eq (.Values.backup.sink | default "pvc") "s3" }}
+                # S3 sink — operator provides AWS credentials via the
+                # Secret referenced in backup.s3.credentialsSecret. The
+                # credentials need s3:PutObject + s3:ListBucket on the
+                # target bucket only; least-privilege per industry
+                # standard.
+                - name: AWS_ACCESS_KEY_ID
+                  valueFrom:
+                    secretKeyRef:
+                      name: {{ .Values.backup.s3.credentialsSecret.name | quote }}
+                      key: {{ .Values.backup.s3.credentialsSecret.accessKeyIdKey | default "AWS_ACCESS_KEY_ID" }}
+                - name: AWS_SECRET_ACCESS_KEY
+                  valueFrom:
+                    secretKeyRef:
+                      name: {{ .Values.backup.s3.credentialsSecret.name | quote }}
+                      key: {{ .Values.backup.s3.credentialsSecret.secretAccessKeyKey | default "AWS_SECRET_ACCESS_KEY" }}
+                {{- with .Values.backup.s3.region }}
+                - name: AWS_DEFAULT_REGION
+                  value: {{ . | quote }}
+                {{- end }}
+                {{- end }}
+              command:
+                - /bin/sh
+                - -ceu
+                - |
+                  # Phase 4 DEPL-H2: canonical pg_dump shape per
+                  # docs/operator/runbooks/postgres-backup.md.
+                  # Custom-format compressed dump, no ownership /
+                  # ACL embedded — produces a portable artifact
+                  # restorable into any Postgres ≥ source major
+                  # via `pg_restore -d certctl <dump>`.
+                  set -euo pipefail
+                  TIMESTAMP="$(date -u +%Y%m%dT%H%M%SZ)"
+                  DUMP_FILE="/tmp/certctl-${TIMESTAMP}.dump"
+
+                  echo "[backup-cronjob] capturing dump at ${TIMESTAMP}"
+                  pg_dump --format=custom --no-owner --no-acl --dbname="${PGDATABASE}" \
+                    > "${DUMP_FILE}"
+
+                  # Integrity check — pg_restore --list parses the
+                  # dump's table-of-contents; a corrupt dump fails
+                  # here without shipping garbage off-host. Same
+                  # check the manual runbook performs.
+                  echo "[backup-cronjob] verifying dump integrity"
+                  pg_restore --list "${DUMP_FILE}" > /dev/null
+
+                  {{- if eq (.Values.backup.sink | default "pvc") "s3" }}
+                  # S3 sink: requires aws-cli. The default
+                  # postgres:16-alpine image does NOT include
+                  # aws-cli, and this template hardcodes the
+                  # container command, so there is no values hook
+                  # for installing it at runtime. Operators MUST
+                  # set backup.image to an image that bundles both
+                  # pg_dump and `aws` on PATH
+                  # (e.g. ghcr.io/your-org/postgres-aws:16).
+                  S3_PATH="{{ .Values.backup.s3.bucket }}/{{ .Values.backup.s3.prefix | default "certctl" }}/certctl-${TIMESTAMP}.dump"
+                  echo "[backup-cronjob] uploading to s3://${S3_PATH}"
+                  aws s3 cp "${DUMP_FILE}" "s3://${S3_PATH}"
+                  rm -f "${DUMP_FILE}"
+                  {{- else }}
+                  # PVC sink: dump lands at /backups/certctl-${TIMESTAMP}.dump
+                  # mounted from backup.pvc.claimName. Retention is the
+                  # PVC's responsibility (storage-class snapshot lifecycle
+                  # or a separate cleanup CronJob). The dump is verified in
+                  # /tmp before the move. /tmp and /backups are separate
+                  # mounts, so mv copies then unlinks; it is not atomic.
+                  FINAL_PATH="/backups/certctl-${TIMESTAMP}.dump"
+                  echo "[backup-cronjob] persisting to ${FINAL_PATH}"
+                  mv "${DUMP_FILE}" "${FINAL_PATH}"
+                  {{- end }}
+                  echo "[backup-cronjob] done"
+              {{- if ne (.Values.backup.sink | default "pvc") "s3" }}
+              volumeMounts:
+                - name: backups
+                  mountPath: /backups
+              {{- end }}
+              resources:
+                {{- toYaml (.Values.backup.resources | default dict) | nindent 16 }}
+          {{- if ne (.Values.backup.sink | default "pvc") "s3" }}
+          volumes:
+            - name: backups
+              persistentVolumeClaim:
+                claimName: {{ .Values.backup.pvc.claimName | quote }}
+          {{- end }}
+          {{- with .Values.nodeAffinity }}
+          affinity:
+            nodeAffinity:
+              {{- toYaml . | nindent 14 }}
+          {{- end }}
+          {{- with .Values.backup.tolerations }}
+          tolerations:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+{{- end }}
@@ -0,0 +1,89 @@
+{{- /*
+Phase 4 DEPL-M1 closure (2026-05-14): Helm pre-install / pre-upgrade
+hook that runs Postgres migrations before the server Deployment rolls.
+
+Pre-DEPL-M1, postgres.RunMigrations was invoked at server boot
+(cmd/server/main.go:151) as the only migration path. That works for
+Compose deployments but conflicts with Kubernetes rolling deploys:
+when a new server image lands with a schema change, multiple replicas
+race the migration during the rollout. The hook resolves the race by
+running migrations OUT OF BAND, exactly once, before any new server
+pod starts.
+
+How it works:
+  - The Job ships the same certctl-server image as the Deployment, so
+    the migration code path is binary-identical to the boot-time path.
+  - It runs `certctl-server --migrate-only` (a flag the cmd/server
+    main process must support — see cmd/server/main.go for the flag
+    parse + early-exit path).
+  - The CERTCTL_MIGRATIONS_VIA_HOOK=true env var is ALSO set on the
+    server Deployment (via values.yaml). When the server boots, it
+    sees this env var and skips its own RunMigrations call — the
+    hook already did the work. Compose deploys don't set the env
+    var, so they keep the boot-time path unchanged.
+  - hook-delete-policy hook-succeeded,before-hook-creation deletes
+    the Job on success, keeps a failed Job for operator diagnosis,
+    and removes that leftover before the next hook run.
+  - The hook-weight of -5 orders this Job ahead of any other
+    pre-install/pre-upgrade hook with a higher weight. The bundled
+    Postgres StatefulSet is not a hook, so Helm only creates it in
+    the install phase, after this Job completes.
+
+Operators on Compose: this hook is a no-op for you. The server still
+runs migrations at boot per the existing path.
+*/ -}}
+{{- if .Values.migrations.viaHook }}
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: {{ include "certctl.fullname" . }}-migrate
+  labels:
+    {{- include "certctl.labels" . | nindent 4 }}
+    app.kubernetes.io/component: migration
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade
+    "helm.sh/hook-weight": "-5"
+    "helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation
+spec:
+  backoffLimit: {{ .Values.migrations.backoffLimit | default 1 }}
+  activeDeadlineSeconds: {{ .Values.migrations.activeDeadlineSeconds | default 600 }}
+  template:
+    metadata:
+      labels:
+        {{- include "certctl.labels" . | nindent 8 }}
+        app.kubernetes.io/component: migration
+    spec:
+      restartPolicy: Never
+      serviceAccountName: {{ include "certctl.serviceAccountName" . }}
+      securityContext:
+        {{- include "certctl.podSecurityContext" .Values.server.securityContext | nindent 8 }}
+      {{- with .Values.imagePullSecrets }}
+      imagePullSecrets:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      containers:
+        - name: migrate
+          image: {{ include "certctl.serverImage" . }}
+          imagePullPolicy: {{ .Values.server.image.pullPolicy }}
+          # Migration-only entrypoint. The server binary supports a
+          # --migrate-only flag that runs postgres.RunMigrations +
+          # postgres.RunSeed and exits cleanly (zero on success,
+          # non-zero on migration failure). See cmd/server/main.go
+          # for the implementation. The flag is hermetic — no HTTP
+          # listener starts, no scheduler ticks, no signing
+          # operations occur. Only the migration + seed pass runs.
+          command:
+            - /app/server
+            - --migrate-only
+          env:
+            - name: CERTCTL_DATABASE_URL
+              value: {{ include "certctl.databaseURL" . | quote }}
+            - name: CERTCTL_LOG_LEVEL
+              value: {{ .Values.server.logging.level | default "info" | quote }}
+            - name: CERTCTL_LOG_FORMAT
+              value: {{ .Values.server.logging.format | default "json" | quote }}
+          resources:
+            {{- toYaml (.Values.migrations.resources | default .Values.server.resources) | nindent 12 }}
+          securityContext:
+            {{- include "certctl.containerSecurityContext" .Values.server.securityContext | nindent 12 }}
+{{- end }}
@@ -0,0 +1,75 @@
+{{- /*
+Bundle 3 closure (D11): NetworkPolicy for the server Deployment.
+
+Pre-Bundle-3 the chart had no NetworkPolicy template at all — the
+audit-D11 "documented placeholder" finding referred to docs claiming
+deny-by-default network isolation that the rendered chart did not
+provide. Closed.
+
+This template emits a single NetworkPolicy that, when enabled,
+restricts the certctl-server Pod to:
+  - Ingress  : from any agent Pod in the same namespace (selector
+               match on app.kubernetes.io/component=agent) on the
+               server port, plus optional operator-supplied
+               additional from clauses (.networkPolicy.extraIngress).
+  - Egress   : to the postgres Pod (when postgresql.enabled=true),
+               53/UDP+TCP for kube-dns, and operator-supplied
+               additional to clauses for outbound CA / OIDC / SMTP
+               (.networkPolicy.extraEgress).
+
+Default off so existing deploys don't suddenly lose network reach.
+Operators opt in once they've mapped their actual egress surface.
+*/ -}}
+{{- if .Values.networkPolicy.enabled }}
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: {{ include "certctl.fullname" . }}-server
+  labels:
+    {{- include "certctl.labels" . | nindent 4 }}
+    app.kubernetes.io/component: server
+spec:
+  podSelector:
+    matchLabels:
+      {{- include "certctl.serverSelectorLabels" . | nindent 6 }}
+  policyTypes:
+    - Ingress
+    - Egress
+  ingress:
+    # Allow in-cluster agent Pods to reach the server's HTTPS port.
+    - from:
+        - podSelector:
+            matchLabels:
+              app.kubernetes.io/name: {{ include "certctl.name" . }}
+              app.kubernetes.io/component: agent
+      ports:
+        - protocol: TCP
+          port: {{ .Values.server.port }}
+    {{- with .Values.networkPolicy.extraIngress }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+  egress:
+    # Kube-DNS (53/UDP + 53/TCP). Required for any in-cluster name
+    # resolution (postgres-service, OIDC issuer hostnames, ACME).
+    - to:
+        - namespaceSelector: {}
+      ports:
+        - protocol: UDP
+          port: 53
+        - protocol: TCP
+          port: 53
+    {{- if .Values.postgresql.enabled }}
+    # Bundled-Postgres egress.
+    - to:
+        - podSelector:
+            matchLabels:
+              app.kubernetes.io/name: {{ include "certctl.name" . }}
+              app.kubernetes.io/component: postgres
+      ports:
+        - protocol: TCP
+          port: 5432
+    {{- end }}
+    {{- with .Values.networkPolicy.extraEgress }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+{{- end }}
@@ -0,0 +1,31 @@
+{{- /*
+Bundle 3 closure (D11): PodDisruptionBudget for the server Deployment.
+
+Pre-Bundle-3 values.yaml carried `podDisruptionBudget.enabled` +
+`minAvailable` + `maxUnavailable` knobs but no template consumed
+them. Audit D11 closed.
+
+The PDB only renders when server.replicas > 1 — a single-replica
+deployment can't satisfy minAvailable=1 during voluntary disruption
+anyway (the eviction API would refuse to evict the pod). Operators
+running 2+ replicas get the PDB; operators running a single replica
+get a templated-out NOTES line reminding them to bump replicas first.
+*/ -}}
+{{- if and .Values.podDisruptionBudget.enabled (gt (int .Values.server.replicas) 1) }}
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: {{ include "certctl.fullname" . }}-server
+  labels:
+    {{- include "certctl.labels" . | nindent 4 }}
+    app.kubernetes.io/component: server
+spec:
+  selector:
+    matchLabels:
+      {{- include "certctl.serverSelectorLabels" . | nindent 6 }}
+  {{- if .Values.podDisruptionBudget.minAvailable }}
+  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
+  {{- else if .Values.podDisruptionBudget.maxUnavailable }}
+  maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
+  {{- end }}
+{{- end }}
@@ -1,3 +1,14 @@
+{{- if .Values.postgresql.enabled }}
+{{- /*
+  Bundle 3 closure (D1 + D2): the bundled-Postgres Secret only renders
+  when postgresql.enabled=true. Pre-Bundle-3 this template rendered
+  unconditionally with `password: "changeme"` as the fallback default —
+  which is exactly what the change-me-... cluster of audit findings
+  was about (a deployment that uses the rendered chart with default
+  values ships a known weak password). The Bundle-3 helper at
+  certctl.requiredSecrets fail-closes empty password at template time
+  before this template ever runs.
+*/ -}}
 apiVersion: v1
 kind: Secret
 metadata:
@@ -7,6 +18,7 @@ metadata:
    app.kubernetes.io/component: postgres
 type: Opaque
 stringData:
-  password: {{ .Values.postgresql.auth.password | default "changeme" | quote }}
+  password: {{ required "postgresql.auth.password is required when postgresql.enabled=true (Bundle 3: no fallback default)" .Values.postgresql.auth.password | quote }}
  username: {{ .Values.postgresql.auth.username | quote }}
  database: {{ .Values.postgresql.auth.database | quote }}
+{{- end }}
@@ -9,6 +9,21 @@ metadata:
 spec:
  serviceName: {{ include "certctl.fullname" . }}-postgres
  replicas: 1
+  # Phase 4 DEPL-M4 closure (2026-05-14): explicit StatefulSet update +
+  # pod-management strategies. Defaults make Postgres upgrades
+  # operator-controlled rather than automatic:
+  #   updateStrategy.type: OnDelete — Postgres pods do NOT roll
+  #     automatically when the StatefulSet spec changes. Operator
+  #     deletes the pod explicitly after taking a backup + reviewing
+  #     the change. Prevents an accidental Helm-template tweak from
+  #     triggering a database restart at an awkward time.
+  #   podManagementPolicy: OrderedReady — when scaling Postgres to
+  #     more than one replica (future HA work), pods come up one at a time
+  #     and must reach Ready before the next pod is created. Aligns
+  #     with the standard Postgres-on-Kubernetes pattern.
+  updateStrategy:
+    type: OnDelete
+  podManagementPolicy: OrderedReady
  selector:
    matchLabels:
      {{- include "certctl.postgresSelectorLabels" . | nindent 6 }}
@@ -0,0 +1,145 @@
+{{- /*
+Phase 4 DEPL-L2 closure (2026-05-14): opt-in Prometheus alerting
+rules covering the four operationally-actionable alerts every certctl
+deployment wants out of the box.
+
+OPERATOR OPT-IN. Default `monitoring.prometheusRules.enabled: false`.
+Turning it on requires Prometheus Operator CRDs (PrometheusRule kind)
+to be installed in-cluster. Without them this template renders an
+object Kubernetes will reject — keep the toggle off if you're scraping
+with vanilla Prometheus + a Helm-installed AlertManager rules
+ConfigMap instead.
+
+Metric names + thresholds verified against the actual
+internal/api/handler/metrics.go exposition path:
+  - certctl_certificate_expiring_soon: server-side count of certs with
+    ExpiresAt in (now, now + 30d]. The 30-day window is computed in
+    internal/service/stats.go::GetDashboardSummary.
+  - certctl_agent_online: agents with heartbeat in the last 5 minutes.
+    A drop below certctl_agent_total signals offline agents.
+  - certctl_job_failed_total + certctl_job_completed_total: cumulative
+    counters; ratio gives the failure rate over the rate() window.
+  - certctl_issuance_failures_total: cumulative counter of failed
+    issuance attempts (renewal failures are issuance failures with a
+    specific error_class label).
+
+Adjust thresholds per fleet — the defaults below are tuned for the
+demo dataset (15 certs / 1 agent) and may need raising for production
+fleets with thousands of certs where a steady rate of expiring certs
+is the normal operating state.
+*/ -}}
+{{- if and .Values.monitoring.enabled .Values.monitoring.prometheusRules.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: {{ include "certctl.fullname" . }}-rules
+  labels:
+    {{- include "certctl.labels" . | nindent 4 }}
+    app.kubernetes.io/component: monitoring
+    {{- with .Values.monitoring.prometheusRules.labels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+spec:
+  groups:
+    - name: certctl.alerts
+      interval: {{ .Values.monitoring.prometheusRules.interval | default "60s" }}
+      rules:
+        # ---------------------------------------------------------------
+        # Alert: CertctlCertificateExpiringSoon
+        # Series: certctl_certificate_expiring_soon
+        # The certctl-server counts certs with ExpiresAt in
+        # (now, now + 30d] every metrics scrape. Fires whenever any cert
+        # crosses into that window — operator must triage or extend
+        # automation coverage. Rapid renewal infrastructure should keep
+        # this number small in steady state.
+        # ---------------------------------------------------------------
+        - alert: CertctlCertificateExpiringSoon
+          expr: certctl_certificate_expiring_soon > {{ .Values.monitoring.prometheusRules.thresholds.expiringCertificateCount | default 0 }}
+          for: {{ .Values.monitoring.prometheusRules.thresholds.expiringCertificateFor | default "5m" }}
+          labels:
+            severity: warning
+            component: certctl
+          annotations:
+            summary: "certctl: {{`{{ $value }}`}} certificate(s) expiring within 30 days"
+            description: >-
+              certctl_certificate_expiring_soon has been > {{ .Values.monitoring.prometheusRules.thresholds.expiringCertificateCount | default 0 }}
+              for 5+ minutes. Investigate via
+              /api/v1/certificates?status=expiring or the dashboard's
+              Expiring tab. If renewal automation should have covered
+              these, check the renewal scheduler logs for the cert IDs
+              + the per-issuer failure rate.
+
+        # ---------------------------------------------------------------
+        # Alert: CertctlAgentOffline
+        # Series: certctl_agent_total - certctl_agent_online
+        # Agents flip from online → offline after 5 minutes without a
+        # heartbeat (internal/service/stats.go::GetDashboardSummary).
+        # The 1h `for:` window prevents a flapping agent from paging the
+        # operator on every transient network blip.
+        # ---------------------------------------------------------------
+        - alert: CertctlAgentOffline
+          expr: (certctl_agent_total - certctl_agent_online) > {{ .Values.monitoring.prometheusRules.thresholds.offlineAgentCount | default 0 }}
+          for: {{ .Values.monitoring.prometheusRules.thresholds.offlineAgentFor | default "1h" }}
+          labels:
+            severity: warning
+            component: certctl-agent
+          annotations:
+            summary: "certctl: {{`{{ $value }}`}} agent(s) offline for >1h"
+            description: >-
+              One or more certctl-agent instances have been without a
+              heartbeat for over an hour. Check the agent logs on the
+              affected hosts. If the agent host is intentionally
+              decommissioned, retire the agent via the dashboard or
+              POST /api/v1/agents/{id}/retire to suppress this alert.
+
+        # ---------------------------------------------------------------
+        # Alert: CertctlJobFailureRateHigh
+        # Series: certctl_job_failed_total / (certctl_job_failed_total + certctl_job_completed_total)
+        # Computes the failure rate over a 15-minute rate() window so
+        # short bursts don't fire but a sustained issue does. The 5%
+        # threshold is a conservative starter — adjust per fleet's
+        # baseline.
+        # ---------------------------------------------------------------
+        - alert: CertctlJobFailureRateHigh
+          expr: >-
+            (
+              rate(certctl_job_failed_total[15m])
+              /
+              (rate(certctl_job_failed_total[15m]) + rate(certctl_job_completed_total[15m]) > 0)
+            ) > {{ .Values.monitoring.prometheusRules.thresholds.jobFailureRate | default 0.05 }}
+          for: {{ .Values.monitoring.prometheusRules.thresholds.jobFailureRateFor | default "15m" }}
+          labels:
+            severity: warning
+            component: certctl
+          annotations:
+            summary: "certctl: job failure rate above 5% over 15m"
+            description: >-
+              The 15m rate of certctl_job_failed_total / total jobs
+              has been above 5% for 15+ minutes. Open
+              /api/v1/jobs?status=failed to see the failing job IDs
+              and root-cause the recurring error class.
+
+        # ---------------------------------------------------------------
+        # Alert: CertctlIssuanceFailures
+        # Series: certctl_issuance_failures_total
+        # Any non-zero rate of issuance failures over a 15m window is
+        # operationally significant — a single CA outage or expired
+        # ACME account can cascade across the fleet.
+        # ---------------------------------------------------------------
+        - alert: CertctlIssuanceFailures
+          expr: rate(certctl_issuance_failures_total[15m]) > {{ .Values.monitoring.prometheusRules.thresholds.issuanceFailureRate | default 0 }}
+          for: {{ .Values.monitoring.prometheusRules.thresholds.issuanceFailureFor | default "15m" }}
+          labels:
+            severity: warning
+            component: certctl
+          annotations:
+            summary: "certctl: certificate issuance / renewal failures over 15m"
+            description: >-
+              certctl_issuance_failures_total has been incrementing
+              over the last 15 minutes. Check the per-issuer breakdown
+              via /api/v1/issuers + the failed-job log in
+              /api/v1/jobs?status=failed. Common causes: CA
+              outage, ACME account rate-limit, EAB credential
+              expiration, stepca provisioner key rotation without
+              certctl-side update.
+{{- end }}
@@ -12,6 +12,8 @@ data:
  keygen-mode: {{ .Values.server.keygen.mode | quote }}
  rate-limit-rps: {{ .Values.server.rateLimiting.rps | quote }}
  rate-limit-burst: {{ .Values.server.rateLimiting.burst | quote }}
+  rate-limit-backend: {{ .Values.server.rateLimiting.backend | default "memory" | quote }}
+  rate-limit-janitor-interval: {{ .Values.server.rateLimiting.janitorInterval | default "5m" | quote }}
  {{- if .Values.server.cors.origins }}
  cors-origins: {{ .Values.server.cors.origins | quote }}
  {{- end }}
@@ -1,5 +1,6 @@
 {{- include "certctl.tls.required" . }}
 {{- include "certctl.validateAuthType" . }}
+{{- include "certctl.requiredSecrets" . }}
 apiVersion: apps/v1
 kind: Deployment
 metadata:
@@ -23,8 +24,13 @@ spec:
        checksum/secret: {{ include (print $.Template.BasePath "/server-secret.yaml") . | sha256sum }}
    spec:
      serviceAccountName: {{ include "certctl.serviceAccountName" . }}
+      # Bundle 3 closure (D3): pod-level fields only. The container-only
+      # fields (readOnlyRootFilesystem, allowPrivilegeEscalation,
+      # capabilities, privileged) render at container scope below —
+      # pre-Bundle-3 they all sat here at pod scope and the K8s API
+      # silently dropped them.
      securityContext:
-        {{- toYaml .Values.server.securityContext | nindent 8 }}
+        {{- include "certctl.podSecurityContext" .Values.server.securityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
@@ -33,6 +39,13 @@ spec:
        - name: server
          image: {{ include "certctl.serverImage" . }}
          imagePullPolicy: {{ .Values.server.image.pullPolicy }}
+          # Bundle 3 closure (D3): container-scope security hardening.
+          # readOnlyRootFilesystem + allowPrivilegeEscalation +
+          # capabilities are container-only fields per the K8s API; the
+          # helper splits them out of the operator-facing
+          # server.securityContext map so existing values keep working.
+          securityContext:
+            {{- include "certctl.containerSecurityContext" .Values.server.securityContext | nindent 12 }}
          ports:
            - name: https
              containerPort: {{ .Values.server.port }}
@@ -51,11 +64,16 @@ spec:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: database-url
+            # Bundle 3 closure (D2): POSTGRES_PASSWORD is only needed
+            # for the bundled-Postgres mode. External Postgres mode
+            # embeds the password directly in externalDatabase.url.
+            {{- if .Values.postgresql.enabled }}
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-postgres
                  key: password
+            {{- end }}
            - name: CERTCTL_LOG_LEVEL
              valueFrom:
                configMapKeyRef:
@@ -90,6 +108,19 @@ spec:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: rate-limit-burst
+            # Phase 13 Sprint 13.3 (ARCH-M1) — cross-replica-consistent
+            # sliding-window rate limiter. Default memory; flip to
+            # postgres when server.replicas > 1.
+            - name: CERTCTL_RATE_LIMIT_BACKEND
+              valueFrom:
+                configMapKeyRef:
+                  name: {{ include "certctl.fullname" . }}-server
+                  key: rate-limit-backend
+            - name: CERTCTL_RATE_LIMIT_JANITOR_INTERVAL
+              valueFrom:
+                configMapKeyRef:
+                  name: {{ include "certctl.fullname" . }}-server
+                  key: rate-limit-janitor-interval
            {{- if .Values.server.cors.origins }}
            - name: CERTCTL_CORS_ORIGINS
              valueFrom:
@@ -0,0 +1,63 @@
+{{- /*
+Bundle 3 closure (D5 + OPS-M1 docs): Prometheus Operator ServiceMonitor.
+
+Pre-Bundle-3 the chart had `monitoring.serviceMonitor.enabled` in
+values.yaml but no template consumed it — toggling it on rendered
+nothing. Audit D5 closed.
+
+The endpoint scrapes /api/v1/metrics/prometheus which the certctl
+server already exposes in Prometheus exposition format (see
+internal/api/handler/metrics.go::GetPrometheusMetrics). Note: the
+endpoint is rbac-gated on `metrics.read`, so the ServiceMonitor needs
+a bearer token. Operators with Prometheus Operator MUST set
+`monitoring.serviceMonitor.bearerTokenSecret` pointing at a Secret
+that holds an API key with the `metrics.read` permission. Without
+that, scrapes return 401.
+
+OPS-M1 caveat: the current /metrics/prometheus handler is a hand-rolled
+exposition-format emitter, not prometheus/client_golang-instrumented
+code. Histograms, exemplars, and target labels are limited to what the
+handler computes statically. Migration to client_golang tracked in
+WORKSPACE-ROADMAP.md.
+*/ -}}
+{{- if and .Values.monitoring.enabled .Values.monitoring.serviceMonitor.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: {{ include "certctl.fullname" . }}-server
+  labels:
+    {{- include "certctl.labels" . | nindent 4 }}
+    app.kubernetes.io/component: server
+    {{- with .Values.monitoring.serviceMonitor.labels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+spec:
+  selector:
+    matchLabels:
+      {{- include "certctl.serverSelectorLabels" . | nindent 6 }}
+  endpoints:
+    - port: https
+      scheme: https
+      path: /api/v1/metrics/prometheus
+      interval: {{ .Values.monitoring.serviceMonitor.interval | default "30s" }}
+      scrapeTimeout: {{ .Values.monitoring.serviceMonitor.scrapeTimeout | default "10s" }}
+      tlsConfig:
+        # The certctl server uses self-signed bootstrap TLS or operator-
+        # provided cert-manager TLS — the ServiceMonitor consumes the
+        # same CA bundle the server presents. When server.tls.existingSecret
+        # is set, operators usually want to pull the matching ca.crt key
+        # out of that Secret. Adjust if your CA chain lives elsewhere.
+        {{- if .Values.monitoring.serviceMonitor.tlsConfig }}
+        {{- toYaml .Values.monitoring.serviceMonitor.tlsConfig | nindent 8 }}
+        {{- else }}
+        insecureSkipVerify: true
+        {{- end }}
+      {{- with .Values.monitoring.serviceMonitor.bearerTokenSecret }}
+      bearerTokenSecret:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+      {{- with .Values.monitoring.serviceMonitor.relabelings }}
+      relabelings:
+        {{- toYaml . | nindent 8 }}
+      {{- end }}
+{{- end }}
@@ -15,7 +15,10 @@ fullnameOverride: ""
 # Certctl Server Configuration
 # ==============================================================================
 server:
-  # Number of replicas (for HA deployments)
+  # Number of replicas (for HA deployments).
+  # Phase 2 DEPL-H1: production HA is operator-opt-in across this field
+  # + podDisruptionBudget.enabled + server.service.sessionAffinity.
+  # See docs/operator/runbooks/ha.md for the smallest-possible HA overlay.
  replicas: 1

  # Image configuration
@@ -28,6 +31,36 @@ server:
  port: 8443

  # Resource requests and limits
+  #
+  # Phase 4 DEPL-M5 (2026-05-14): per-fleet-size tuning ladder. The
+  # default values below are validated against the demo dataset
+  # (15 certs / 1 agent) and the baselines in
+  # docs/operator/performance-baselines.md (single endpoint < 5s for
+  # 100 sequential requests = < 50ms mean; cursor-paginated 1000-cert
+  # inventory walk < 3s; renewal scan for 15 certs < 100ms).
+  #
+  # Larger fleet recommendations (TBD pending Phase 8 load-test runs;
+  # operators tune empirically until then — capture readings in your
+  # own loadtest-baselines log):
+  #
+  #   ≤ 500 certs / 100 agents:      defaults below                  (100m / 128Mi req, 500m / 512Mi lim)
+  #   5K certs / 1K agents:          tune up — TBD Phase 8           (suggested starter: 500m / 512Mi req, 2000m / 2Gi lim)
+  #   50K certs / 10K agents:        tune up — TBD Phase 8           (suggested starter: 2000m / 2Gi req, 4000m / 4Gi lim)
+  #
+  # The "suggested starter" values above are operator-tuning starting
+  # points, NOT validated. Phase 8 (load test coverage expansion) will
+  # measure them against synthetic fleets and replace the suggestions
+  # with measured ceilings. Until then, treat them as a "raise CPU
+  # before raising memory; raise both before scaling out" mental
+  # model. Per docs/operator/performance-baselines.md, certctl-server
+  # is CPU-bound on issuance / renewal scan work and memory-bound on
+  # the inventory query path.
+  #
+  # Database scale (postgresql.* below) tracks server scale roughly
+  # 1:1 — at 50K certs the Postgres instance needs 4 CPU / 4Gi RAM
+  # and shared_buffers ≥ 1Gi. Postgres tuning is out of scope for
+  # this comment; see docs/operator/runbooks/postgres-backup.md
+  # for the production-tuning entry-point.
  resources:
    requests:
      cpu: 100m
@@ -178,8 +211,25 @@ server:

  # Rate limiting configuration
  rateLimiting:
-    rps: 100      # Requests per second
-    burst: 200    # Burst capacity
+    rps: 100      # Requests per second (token-bucket middleware)
+    burst: 200    # Burst capacity (token-bucket middleware)
+
+    # Sliding-window-log rate-limit backend (Phase 13 Sprint 13.2/13.3
+    # ARCH-M1 closure). Selects the implementation backing the
+    # break-glass / OCSP / cert-export / EST limiters. See
+    # docs/operator/observability.md for the operator decision tree.
+    #
+    #   memory   — per-process (default; single-replica deploys).
+    #   postgres — cross-replica-consistent via rate_limit_buckets.
+    #              REQUIRED when server.replicas > 1 for accurate
+    #              cluster-wide enforcement.
+    backend: memory
+
+    # Scheduler janitor interval for the postgres backend's
+    # rate_limit_buckets sweep. Ignored when backend=memory (the
+    # in-memory backend self-prunes on every Allow call).
+    # Default 5m; minimum 1m.
+    janitorInterval: "5m"

  # Network scanning configuration
  networkScan:
@@ -272,6 +322,34 @@ server:
  #   secret:
  #     secretName: ca-cert

+# ==============================================================================
+# External Database Configuration (Bundle 3 closure / D2 + OPS-L2)
+# ==============================================================================
+# When postgresql.enabled=false, the chart skips the bundled StatefulSet +
+# Secret + Service and instead consumes the URL below verbatim as the
+# server's CERTCTL_DATABASE_URL. The URL embeds username, password,
+# host, port, database, and sslmode — operators are responsible for
+# rotating credentials in this string out-of-band (Kubernetes Secret +
+# helm upgrade is the supported pattern).
+#
+# Recommended sslmode for managed Postgres (RDS, Cloud SQL, Azure DB):
+#   verify-full  — PCI-DSS Req 4 v4.0 §2.2.5 compliant; requires CA bundle.
+#                  Mount the CA via server.volumes / server.volumeMounts and
+#                  set sslrootcert=/path/in/pod/ca.crt in the URL.
+#
+# Example values overrides:
+#   postgresql.enabled: false
+#   externalDatabase.url: "postgres://certctl:HUNTER2@db.example.com:5432/certctl?sslmode=verify-full"
+#
+# Migration from the legacy `server.env.CERTCTL_DATABASE_URL` workaround:
+# both still work (env block overrides the helper-emitted Secret value at
+# pod-spec level), but the new path renders cleaner manifests with no
+# stranded postgres-* templates.
+externalDatabase:
+  # Connection string used when postgresql.enabled=false.
+  # Required in that mode — see certctl.requiredSecrets helper.
+  url: ""
+
 # ==============================================================================
 # PostgreSQL Configuration
 # ==============================================================================
@@ -418,6 +496,27 @@ agent:
  replicas: 1

  # Resource requests and limits
+  #
+  # Phase 4 DEPL-M5 (2026-05-14): per-fleet-size tuning ladder for the
+  # agent. Defaults are sized for the standard "one cert per host"
+  # operating pattern: the agent polls the server every 30 seconds
+  # (hardcoded in cmd/agent/main.go::pollInterval — not yet
+  # env-configurable), generates ECDSA P-256 keys locally on
+  # issuance/renewal events, and is otherwise idle. CPU is bursty only
+  # during keygen + CSR submission.
+  #
+  # Tuning ladder (TBD pending Phase 8 — measure on your fleet):
+  #
+  #   1 cert / host (typical):        defaults below            (50m / 64Mi req, 200m / 256Mi lim)
+  #   10 certs / host:                stays at defaults — agent is poll-driven, not work-bound by cert count
+  #   100 certs / host (rare):        raise lim to 500m / 512Mi if you see throttling on issuance bursts
+  #
+  # The agent does NOT cache certs in memory — issuance is one-shot
+  # generate-then-deploy. So per-host memory scales with whatever
+  # truststore PEM bundles the agent's connectors load (Apache /
+  # Postfix / similar), not with the cert count. Defaults are
+  # appropriate for any "agent terminates ≤ 100 certs on this host"
+  # deployment.
  resources:
    requests:
      cpu: 50m
@@ -510,14 +609,34 @@ rbac:
  create: true

 # ==============================================================================
-# Kubernetes Secrets Target Connector
+# Kubernetes Secrets Target Connector (PREVIEW — Bundle 3 closure / C3)
 # ==============================================================================
+# Bundle 3 audit closure (C3): the connector framework at
+# internal/connector/target/k8ssecret/ ships the Config + interface +
+# 14 unit tests, but the production K8s client at
+# k8ssecret.go::realK8sClient is documented as "a stub placeholder for
+# the real k8s.io/client-go implementation". The repo does not import
+# k8s.io/client-go (verified via `grep -n "client-go" go.mod`), so the
+# connector cannot deploy to a real cluster today.
+#
+# Setting kubernetesSecrets.enabled=true wires up the RBAC verbs the
+# real client will need (get/create/update/patch/delete on Secrets)
+# without making the connector functional — operators trying to use it
+# get the stub's error and a pointer to this note.
+#
+# Status: PREVIEW. Production client lands when the cluster-management
+# bundle ships (tracked in WORKSPACE-ROADMAP.md). Until then,
+# in-cluster deploys use the file-based connectors (NGINX, Apache,
+# HAProxy, etc.) via a Pod-mounted Secret + DaemonSet agent.
 kubernetesSecrets:
-  # Enable RBAC rules for managing TLS Secrets
  enabled: false

 # ==============================================================================
-# Pod Disruption Budget (for HA deployments)
+# Pod Disruption Budget (for HA deployments).
+# Phase 2 DEPL-H1: defaults to enabled=false because a PDB template
+# rendered at `replicas: 1` blocks every rolling restart on a
+# single-node cluster. Production HA flips this to true alongside
+# server.replicas ≥ 2. See docs/operator/runbooks/ha.md.
 # ==============================================================================
 podDisruptionBudget:
  enabled: false
@@ -527,6 +646,13 @@ podDisruptionBudget:
 # ==============================================================================
 # Monitoring Configuration
 # ==============================================================================
+# Bundle 3 closure (D5): the ServiceMonitor template at
+# templates/servicemonitor.yaml renders when both monitoring.enabled=true
+# AND monitoring.serviceMonitor.enabled=true. The endpoint scrapes
+# /api/v1/metrics/prometheus, which is rbac-gated on `metrics.read` —
+# operators MUST provide a bearer token via
+# monitoring.serviceMonitor.bearerTokenSecret pointing at a Secret with
+# an API key holding that permission. Without the token, scrapes 401.
 monitoring:
  enabled: false
  # Prometheus ServiceMonitor
@@ -534,8 +660,196 @@ monitoring:
    enabled: false
    interval: 30s
    scrapeTimeout: 10s
+    # Additional labels applied to the ServiceMonitor metadata.
    # labels: {}
-    # selector: {}
+    # Bearer-token Secret reference (required when the certctl server's
+    # /api/v1/metrics/prometheus endpoint is gated by api-key auth).
+    # Example:
+    #   bearerTokenSecret:
+    #     name: certctl-prometheus-key
+    #     key: api-key
+    # bearerTokenSecret: {}
+    # TLS config for the scrape endpoint. The certctl server presents
+    # the same TLS cert the rest of the chart uses; insecureSkipVerify
+    # defaults to true so demos work out of the box. Production deploys
+    # should pin the CA via caFile or ca.secret. Example:
+    # tlsConfig:
+    #   caFile: /etc/prometheus/secrets/certctl-ca/ca.crt
+    #   serverName: certctl-server
+    # tlsConfig: {}
+    # Optional relabeling for the scrape job.
+    # relabelings: []
+
+  # ----------------------------------------------------------------------
+  # Phase 4 DEPL-L2 closure (2026-05-14): PrometheusRule (alert rules)
+  #
+  # Operator opt-in. Requires Prometheus Operator CRDs (the
+  # `monitoring.coreos.com/v1` PrometheusRule kind) installed in
+  # cluster. Without those CRDs the rendered object is rejected by
+  # `kubectl apply` — keep enabled: false if you scrape with vanilla
+  # Prometheus + AlertManager rules ConfigMap instead.
+  #
+  # Four starter rules ship out of the box (see
+  # templates/prometheusrules.yaml for the full PromQL):
+  #
+  #   CertctlCertificateExpiringSoon — certs expiring within 30d
+  #   CertctlAgentOffline             — agent without heartbeat for >1h
+  #   CertctlJobFailureRateHigh       — job-failure rate over 5% (15m)
+  #   CertctlIssuanceFailures         — any issuance failures in last 15m
+  #
+  # All thresholds are operator-tunable via the `thresholds:` block
+  # below. The defaults are tuned for the demo dataset (15 certs / 1
+  # agent); production fleets with sustained renewal volume MAY want
+  # to raise the expiringCertificateCount + jobFailureRate thresholds
+  # to suppress steady-state noise.
+  prometheusRules:
+    enabled: false
+    # Evaluation interval for the rule group.
+    interval: 60s
+    # Additional labels applied to the PrometheusRule metadata.
+    # labels: {}
+    # Per-alert threshold / duration tunables.
+    thresholds:
+      # Fire when more than N certs are in the expiring-soon window.
+      expiringCertificateCount: 0
+      expiringCertificateFor: 5m
+      # Fire when more than N agents are offline (server - online).
+      offlineAgentCount: 0
+      offlineAgentFor: 1h
+      # Fire when job failure rate exceeds this fraction (15m window).
+      jobFailureRate: 0.05
+      jobFailureRateFor: 15m
+      # Fire when issuance failure rate exceeds this value (15m window).
+      issuanceFailureRate: 0
+      issuanceFailureFor: 15m
+
+# ==============================================================================
+# Backup CronJob (Phase 4 DEPL-H2 closure, 2026-05-14)
+# ==============================================================================
+# Operator opt-in. Default OFF. The CronJob runs `pg_dump --format=custom
+# --no-owner --no-acl --dbname=certctl`, the same invocation documented
+# in docs/operator/runbooks/postgres-backup.md (so manual and automated
+# dumps use identical flags and restore the same way), and ships the
+# result to the sink chosen below.
+#
+# DO NOT enable this for managed Postgres deployments (AWS RDS / GCP
+# Cloud SQL / Azure DB) — those have built-in PITR backup that this
+# CronJob cannot match. For in-cluster Postgres only.
+backup:
+  enabled: false
+  # Cron expression (UTC). Default: 02:30 UTC daily.
+  schedule: "30 2 * * *"
+  # Sink: "pvc" (default — dump lands on a PersistentVolumeClaim) or
+  # "s3" (uploads via aws-cli — requires an image that bundles
+  # aws-cli, see backup.image below).
+  sink: pvc
+  # Container image. The default postgres:16-alpine has pg_dump but
+  # NOT aws-cli; for sink: s3 set this to an image that bundles both
+  # (e.g. ghcr.io/your-org/postgres-aws:16) or override the Job's
+  # command to install aws-cli at runtime.
+  image: postgres:16-alpine
+  imagePullPolicy: IfNotPresent
+  # PVC sink config — used when sink: pvc.
+  pvc:
+    # Name of an existing PersistentVolumeClaim mounted at /backups
+    # in the Job's pod. The PVC's storage class controls durability
+    # and snapshot retention. Operator creates this PVC out of band
+    # via their own storage policy.
+    claimName: certctl-backups
+  # S3 sink config — used when sink: s3.
+  s3:
+    # Target bucket (without s3:// prefix).
+    bucket: ""
+    # Object key prefix inside the bucket. Dumps land at
+    # s3://<bucket>/<prefix>/certctl-<TIMESTAMP>.dump.
+    prefix: certctl
+    # AWS region (sets AWS_DEFAULT_REGION). Optional if the image's
+    # AWS SDK can resolve the region another way (instance profile,
+    # IRSA, etc.).
+    region: ""
+    # Secret holding AWS credentials. The IAM principal needs
+    # s3:PutObject + s3:ListBucket on the target bucket only.
+    credentialsSecret:
+      name: certctl-backup-aws-creds
+      accessKeyIdKey: AWS_ACCESS_KEY_ID
+      secretAccessKeyKey: AWS_SECRET_ACCESS_KEY
+  # Job housekeeping.
+  successfulJobsHistoryLimit: 3
+  failedJobsHistoryLimit: 1
+  startingDeadlineSeconds: 300
+  backoffLimit: 1
+  activeDeadlineSeconds: 3600
+  # Resource budget for the backup container. pg_dump is generally
+  # memory-light; ~250MB RSS for fleets up to 100K certs is typical.
+  resources:
+    requests:
+      cpu: 100m
+      memory: 128Mi
+    limits:
+      cpu: 500m
+      memory: 512Mi
+  # Optional tolerations for the backup Job pod.
+  tolerations: []
+
+# ==============================================================================
+# Migrations via Helm hook (Phase 4 DEPL-M1 closure, 2026-05-14)
+# ==============================================================================
+# When viaHook: true, the chart deploys templates/migration-job.yaml as
+# a pre-install + pre-upgrade hook that runs `certctl-server
+# --migrate-only` (a hermetic schema-mutation pass) before the server
+# Deployment rolls.
+#
+# Set CERTCTL_MIGRATIONS_VIA_HOOK=true in the server Deployment env to
+# tell the server to skip its boot-time RunMigrations call (the hook
+# already did the work; running again at boot would race across
+# replicas during rollouts).
+#
+# Default OFF — when off, the server runs migrations at boot exactly
+# as it always has (Compose deploys keep this path).
+migrations:
+  viaHook: false
+  # Job housekeeping.
+  backoffLimit: 1
+  activeDeadlineSeconds: 600
+  # Resource budget for the migration Job pod. The migration pass is
+  # I/O-bound on Postgres; matches the server's resource budget by
+  # default. Override here if migrations on a large database need
+  # more headroom than the steady-state server.
+  # resources:
+  #   requests:
+  #     cpu: 100m
+  #     memory: 128Mi
+  #   limits:
+  #     cpu: 500m
+  #     memory: 512Mi
+
+# ==============================================================================
+# Network Policy (Bundle 3 closure / D11)
+# ==============================================================================
+# Default off so existing deploys don't suddenly lose network reach.
+# When enabled, restricts the server pod to:
+#   - Ingress: from in-namespace agent pods only.
+#   - Egress: kube-dns + bundled Postgres (if enabled).
+# Operators add CA / OIDC / SMTP egress via extraEgress.
+networkPolicy:
+  enabled: false
+  # Additional Ingress rules merged into the policy. Each entry is a
+  # raw networking.k8s.io/v1 NetworkPolicyIngressRule.
+  extraIngress: []
+  # Additional Egress rules merged into the policy. Common operator
+  # need: 443/TCP to an OIDC issuer, 443/TCP to a public CA endpoint,
+  # 25/TCP to an SMTP relay.
+  # Example:
+  # extraEgress:
+  #   - to:
+  #       - ipBlock:
+  #           cidr: 0.0.0.0/0
+  #           except:
+  #             - 10.0.0.0/8
+  #     ports:
+  #       - protocol: TCP
+  #         port: 443
+  extraEgress: []

 # ==============================================================================
 # Advanced Configuration
@@ -6,8 +6,8 @@
 # Per H-001 guard: every FROM is digest-pinned. Operator re-pins
 # quarterly per docs/deployment-vendor-matrix.md.

-# golang:1.25.9-bookworm digest pinned per H-001.
-FROM golang:1.25.9-bookworm@sha256:1a1408bf8d2d3077f9508880caf0e8bb0fde195fe3c890e7ea480dfb66dc7827 AS builder
+# golang:1.25.10-bookworm digest pinned per H-001.
+FROM golang:1.25.10-bookworm@sha256:e3a54b77385b4f8a31c1db4d12429ffb3718ea76865731a787c497755d409547 AS builder
 WORKDIR /src
 COPY deploy/test/f5-mock-icontrol/ ./
 RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -trimpath -ldflags "-s -w" -o /out/f5-mock-icontrol .
@@ -1,3 +1,3 @@
 module github.com/certctl-io/certctl/deploy/test/f5-mock-icontrol

-go 1.25.9
+go 1.25.10
@@ -82,16 +82,30 @@ ARG LIBEST_REF
 # is the same major version libest r3.2.0 was tested against. libest
 # also wants libcurl + libsafec; we install both via apt rather than
 # building from source for reproducibility.
-RUN apt-get update && apt-get install --no-install-recommends -y \
-        autoconf \
-        automake \
-        build-essential \
-        ca-certificates \
-        git \
-        libcurl4-openssl-dev \
-        libssl-dev \
-        libtool \
-        pkg-config \
+#
+# Hotfix #18 (2026-05-14): wrap in a 3-retry loop with --fix-missing
+# fallback to absorb transient Debian mirror flakes. The original
+# unwrapped apt-get install failed CI run #N on a "Connection reset
+# by peer" mid-fetch of libssh2-1 from Fastly's debian.org mirror at
+# 151.101.202.132. Mirrors flake; production-grade Dockerfiles wrap
+# network ops in retry. Same pattern as the main Dockerfile's npm-ci
+# 3-retry loop from Hotfix #9.
+RUN for i in 1 2 3; do \
+        apt-get update && \
+        apt-get install --no-install-recommends -y --fix-missing \
+            autoconf \
+            automake \
+            build-essential \
+            ca-certificates \
+            git \
+            libcurl4-openssl-dev \
+            libssl-dev \
+            libtool \
+            pkg-config \
+        && break; \
+        echo "apt-get install attempt $i/3 failed"; \
+        [ "$i" -lt 3 ] || exit 1; sleep 5; \
+    done \
    && rm -rf /var/lib/apt/lists/*

 WORKDIR /src
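
The apt-get retry loop in this hunk can also be read as a generic pattern. A minimal sketch of it as a standalone POSIX shell function, assuming exhaustion should surface as an explicit non-zero return; `retry` and its argument order are hypothetical names for illustration, not something the Dockerfile defines:

```shell
#!/bin/sh
# Sketch only: "retry" is a hypothetical helper, not part of the repo.
# retry MAX DELAY CMD...: run CMD up to MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 once attempts run out.
retry() {
    max="$1"; delay="$2"; shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max failed: $*" >&2
        # Only sleep when another attempt will follow.
        if [ "$attempt" -lt "$max" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

Inside a Dockerfile the function would have to be defined in the same `RUN` as its call, for example `retry 3 5 sh -c 'apt-get update && apt-get install -y --no-install-recommends curl'`; the package name there is only a placeholder.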
@@ -172,13 +186,22 @@ RUN git clone --depth 1 --branch ${LIBEST_REF} https://github.com/cisco/libest.g
 # Pinned to the same digest as the builder above (Bundle A / H-001).
 FROM debian:bullseye-slim@sha256:1a4701c321b1d28b1ff5f0230e766791e4b79b1d4c6c7a70064f4b297b1a330f

-RUN apt-get update && apt-get install --no-install-recommends -y \
-        bash \
-        ca-certificates \
-        curl \
-        libcurl4 \
-        libssl1.1 \
-        openssl \
+# Hotfix #18 (2026-05-14): same 3-retry pattern as the builder stage
+# above. Runtime image installs are also vulnerable to transient
+# mirror flakes.
+RUN for i in 1 2 3; do \
+        apt-get update && \
+        apt-get install --no-install-recommends -y --fix-missing \
+            bash \
+            ca-certificates \
+            curl \
+            libcurl4 \
+            libssl1.1 \
+            openssl \
+        && break; \
+        echo "apt-get install attempt $i/3 failed"; \
+        [ "$i" -lt 3 ] || exit 1; sleep 5; \
+    done \
    && rm -rf /var/lib/apt/lists/* \
    && useradd --create-home --uid 1000 estuser

@@ -1,7 +1,7 @@
 # certctl Load-Test Harness

 Closes the **#8 acquisition-readiness blocker** from the 2026-05-01 issuer
-coverage audit (`cowork/issuer-coverage-audit-2026-05-01/RESULTS.md`).
+coverage audit.
 Pre-fix, certctl had zero benchmarks or load tests for any API path; an
 acquirer evaluating "can certctl handle our 50k-cert fleet at 47-day
 rotation" had nothing to point at. This harness is the substantiation.
@@ -352,8 +352,35 @@ the ACME flow scenario. Operators with kind / cert-manager available
 should pair this with `make acme-cert-manager-test` for end-to-end
 verification.

+## Scale tier (Phase 8 SCALE-H2, 2026-05-14)
+
+Phase 8 closure added three new k6 scenarios that exercise the
+scale-relevant load surfaces the API tier and connector tier left
+uncovered:
+
+| Scenario | k6 file | Seed | Make target |
+|---|---|---|---|
+| Bulk-renewal under load | `k6/bulk_renewal.js` | `seed/01_bulk_renewal_certs.sql` (10K certs) | `make loadtest-scale-bulk` |
+| ACME enrollment burst | `k6/acme_burst.js` | (none — unauth surface) | `make loadtest-scale-acme` |
+| Agent heartbeat storm | `k6/agent_storm.js` | `seed/02_agent_fleet.sql` (5K agents) | `make loadtest-scale-agent` |
+
+The scale-tier scenarios live behind the `scale` compose profile so
+the default `make loadtest` (API tier + connector tier, ~7 min)
+stays fast. Run all three serially with `make loadtest-scale`, or
+trigger the `loadtest.yml` workflow's `k6-scale` matrix jobs from
+the Actions tab for canonical-hardware capture.
+
+Operator-facing baseline table + threshold contracts + documented
+limitations live in [`docs/operator/scale.md`](../../../docs/operator/scale.md)
+under the "Scale-tier scenarios (SCALE-H2, Phase 8)" section. Treat
+that as the canonical source — this README only links.
+
+The seed fixtures + their idempotency contract are documented in
+[`seed/README.md`](seed/README.md).
+
 ## Audit references

- API tier:       `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` fix #8.
- Connector tier: `cowork/deployment-target-audit-2026-05-02/RESULTS.md` Bundle 10.
- ACME flows:     Phase 5 master prompt (`cowork/acme-server-prompts/06-phase-5-certmanager-hardening-prompt.md`).
+- API tier:       2026-05-01 issuer coverage audit fix #8.
+- Connector tier: 2026-05-02 deployment-target audit Bundle 10.
+- ACME flows:     Phase 5 master prompt (project notes).
+- Scale tier:     2026-05-14 architecture diligence Phase 8 (SCALE-H2).
@@ -53,8 +53,8 @@
 # Usage:  make loadtest  (from the repo root)
 # Manual: cd deploy/test/loadtest && docker compose up --abort-on-container-exit --exit-code-from k6
 #
-# Audit reference (API tier):       cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
-# Audit reference (connector tier): cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
+# Audit reference (API tier):       2026-05-01 issuer coverage audit fix #8.
+# Audit reference (connector tier): 2026-05-02 deployment-target audit Bundle 10.
 # =============================================================================

 services:
@@ -290,7 +290,15 @@ services:
  # /healthz endpoint.
  # ---------------------------------------------------------------------------
  f5-mock-target:
-    build: ../f5-mock-icontrol
+    # Long-form build to match docker-compose.test.yml: the Dockerfile
+    # has `COPY deploy/test/f5-mock-icontrol/ ./` which assumes the
+    # build context is the REPO ROOT. The previous shorthand form
+    # `build: ../f5-mock-icontrol` set the context to the
+    # f5-mock-icontrol directory itself, breaking the COPY at CI build
+    # time (run #25305811340: "deploy/test/f5-mock-icontrol: not found").
+    build:
+      context: ../../..
+      dockerfile: deploy/test/f5-mock-icontrol/Dockerfile
    container_name: certctl-loadtest-f5-mock
    healthcheck:
      test: ["CMD-SHELL", "wget -q -O- http://localhost:8080/healthz || exit 1"]
@@ -343,3 +351,128 @@ services:
      - run
      - --summary-export=/results/summary.json
      - /scripts/k6.js
+
+  # ===========================================================================
+  # Phase 8 SCALE-H2 — scale-tier scenarios (opt-in via `--profile scale`).
+  #
+  # The default `make loadtest` path runs the API tier + connector tier
+  # scenarios above against the demo-scale seed. The Phase 8 scenarios are
+  # heavier (10K cert + 5K agent fixtures) and would slow the default path
+  # without serving the per-PR signal the existing run targets, so they live
+  # behind a separate compose profile.
+  #
+  # Four services, all profile-gated:
+  #   1. scale-seed    — one-shot init that runs ./seed/*.sql against the
+  #                      same postgres the server uses. Idempotent.
+  #   2. k6-scale-bulk / k6-scale-acme / k6-scale-agent — one driver each
+  #                      for the three Phase 8 scenarios. The matrix dispatch
+  #                      in .github/workflows/loadtest.yml picks one per job.
+  #
+  # Run a single scale scenario locally:
+  #   docker compose --profile scale up \
+  #       --abort-on-container-exit --exit-code-from k6-scale-bulk \
+  #       scale-seed k6-scale-bulk
+  # ===========================================================================
+
+  scale-seed:
+    # postgres:16-alpine bundles psql; no extra image needed.
+    image: postgres:16-alpine
+    container_name: certctl-loadtest-scale-seed
+    restart: "no"
+    profiles: ["scale"]
+    depends_on:
+      postgres:
+        condition: service_healthy
+      # Wait for certctl-server to be healthy — the server runs schema
+      # migrations + seed_demo.sql at boot. The Phase 8 seeds reference
+      # FKs (iss-local, o-alice, t-platform, rp-standard) that
+      # seed_demo.sql creates, so the order MUST be:
+      #   postgres up → server runs migrations + seed_demo.sql → scale-seed runs
+      certctl-server:
+        condition: service_healthy
+    environment:
+      PGHOST: postgres
+      PGUSER: certctl
+      PGPASSWORD: loadtestpass
+      PGDATABASE: certctl
+    volumes:
+      - ./seed:/seed:ro
+    entrypoint: /bin/sh
+    command:
+      - -c
+      - |
+        set -eu
+        echo "==> Phase 8 scale-seed: running SQL fixtures (lexical order)"
+        for f in /seed/*.sql; do
+            echo "----> $$f"
+            psql -v ON_ERROR_STOP=1 -f "$$f"
+        done
+        echo "==> Phase 8 scale-seed: complete"
+
+  k6-scale-bulk:
+    image: grafana/k6:0.54.0
+    container_name: certctl-loadtest-k6-bulk
+    profiles: ["scale"]
+    depends_on:
+      certctl-server:
+        condition: service_healthy
+      scale-seed:
+        condition: service_completed_successfully
+    environment:
+      CERTCTL_BASE: https://certctl-server:8443
+      CERTCTL_TOKEN: load-test-token
+      K6_INSECURE_SKIP_TLS_VERIFY: "true"
+    volumes:
+      - ./k6/bulk_renewal.js:/scripts/bulk_renewal.js:ro
+      - ./results:/results
+    command:
+      - run
+      - --summary-export=/results/summary-bulk-renewal.json
+      - /scripts/bulk_renewal.js
+
+  k6-scale-acme:
+    image: grafana/k6:0.54.0
+    container_name: certctl-loadtest-k6-acme
+    profiles: ["scale"]
+    depends_on:
+      certctl-server:
+        condition: service_healthy
+      # ACME scenario doesn't depend on the SQL seeds (it hits the
+      # unauthenticated directory + nonce + ARI surface) but routing
+      # it through the same dependency chain keeps the compose
+      # ordering predictable across the three scale jobs.
+      scale-seed:
+        condition: service_completed_successfully
+    environment:
+      CERTCTL_ACME_DIRECTORY: https://certctl-server:8443/acme/profile/prof-test/directory
+      K6_INSECURE_SKIP_TLS_VERIFY: "true"
+    volumes:
+      - ./k6/acme_burst.js:/scripts/acme_burst.js:ro
+      - ./results:/results
+    command:
+      - run
+      - --summary-export=/results/summary-acme-burst.json
+      - /scripts/acme_burst.js
+
+  k6-scale-agent:
+    image: grafana/k6:0.54.0
+    container_name: certctl-loadtest-k6-agent
+    profiles: ["scale"]
+    depends_on:
+      certctl-server:
+        condition: service_healthy
+      scale-seed:
+        condition: service_completed_successfully
+    environment:
+      CERTCTL_BASE: https://certctl-server:8443
+      CERTCTL_TOKEN: load-test-token
+      K6_INSECURE_SKIP_TLS_VERIFY: "true"
+      # Match the seed's 5K-agent fleet.
+      K6_AGENT_FLEET: "5000"
+    volumes:
+      - ./k6/agent_storm.js:/scripts/agent_storm.js:ro
+      - ./results:/results
+    command:
+      - run
+      - --summary-export=/results/summary-agent-storm.json
+      - /scripts/agent_storm.js
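
The single-scenario command quoted in the compose comment above generalizes to the other two drivers. A minimal sketch, assuming it is run from deploy/test/loadtest; the `run_scale` helper and the `DRY_RUN` switch are hypothetical, and this is not necessarily how `make loadtest-scale` is implemented:

```shell
#!/bin/sh
# Sketch only: run_scale and DRY_RUN are hypothetical names, not part of the repo.
set -eu

# Print (DRY_RUN=1, the default here) or execute (DRY_RUN=0) the compose
# command for one scale scenario: bulk, acme, or agent.
run_scale() {
    scenario="$1"
    set -- docker compose --profile scale up \
        --abort-on-container-exit --exit-code-from "k6-scale-$scenario" \
        scale-seed "k6-scale-$scenario"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Serial run: with set -e, the first failing scenario stops the loop.
for s in bulk acme agent; do
    run_scale "$s"
done
```

The dry-run default keeps the sketch safe to paste; setting `DRY_RUN=0` would actually start the stack once per scenario.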
@@ -60,8 +60,8 @@
 // tests are too slow to gate per-PR signal).
 //
 // Audit references:
-//   - API tier:       cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
-//   - Connector tier: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
+//   - API tier:       2026-05-01 issuer coverage audit fix #8.
+//   - Connector tier: 2026-05-02 deployment-target audit Bundle 10.

 import http from 'k6/http';
 import { check } from 'k6';
@@ -0,0 +1,183 @@
+// Phase 8 SCALE-H2 — ACME enrollment burst.
+//
+// What this measures:
+//   200 concurrent VUs hammering the unauthenticated ACME directory
+//   + new-nonce + ARI surface for 5 minutes. The goal is the
+//   throughput ceiling for the entry-point handlers and the
+//   per-account rate-limit response shape Phase 5 added (RFC 8555
+//   §6.7 + RFC 7807 + the certctl-specific
+//   ErrACMEConcurrentOrdersExceeded path).
+//
+// What this does NOT measure (and why):
+//   - JWS-signed POST flows (new-account, new-order, finalize).
+//     k6 doesn't ship JWS, and bundling a Go signing helper into
+//     the k6 container would obscure the server-side latency the
+//     scenario is trying to pin. The existing
+//     `deploy/test/loadtest/k6/acme_flow.js` Phase 5 scenario
+//     made the same explicit trade-off; this Phase 8 burst scenario
+//     reuses the constraint. End-to-end JWS-signed conformance is
+//     gated by `make acme-rfc-conformance-test` (which uses lego
+//     against the same compose stack).
+//   - The actual order/finalize hot path. The newOrder handler's
+//     constant-time SCAN against acme_orders + the per-account
+//     concurrent-orders gate ARE useful to load-test, but require
+//     valid JWS to reach. The directory + new-nonce surface this
+//     scenario hits is what every ACME client transits BEFORE the
+//     signed flow — measuring it pins the server's headroom for
+//     the rest of the flow.
+//   - Issuer-side enrollment latency (DigiCert ACME, Let's Encrypt
+//     against a real prod CA, etc.). Same "load-testing someone
+//     else's API" carve-out as the API tier.
+//
+// What this DOES measure:
+//   - GET /acme/profile/{id}/directory throughput. 200 concurrent
+//     VUs looping with no per-VU sleep produce roughly 600-1000 req/s
+//     against this endpoint, well above what any production ACME
+//     client would generate but the right shape for finding the
+//     ceiling.
+//   - HEAD /acme/profile/{id}/new-nonce throughput. Nonce
+//     allocation is a hot path that writes one row to acme_nonces.
+//   - GET /acme/profile/{id}/renewal-info/{cert-id} 4xx fast path.
+//     Synthetic cert-id → handler returns 4xx without a DB lookup
+//     (cert-id is malformed at the parse layer). Measures the
+//     handler-front overhead under load.
+//   - 429 rate-limit response shape. The Phase 5 ACME per-account
+//     rate limit fires at sustained spike rates; the scenario pins
+//     that the 429 body is RFC 7807 with the
+//     "urn:ietf:params:acme:error:rateLimited" type. A regression
+//     that returned a plain text 429 or a different problem type
+//     would break ACME clients hard.
+//
+// Threshold contract:
+//   - directory p95 < 500ms, new-nonce p95 < 300ms, renewal-info
+//     p95 < 800ms — same as the Phase 5 acme_flow.js baselines.
+//   - 429 responses are EXPECTED at sustained 200 VU rate (the
+//     server's RFC-compliant rate limiter SHOULD kick in). The
+//     http_req_failed threshold is meant to count 5xx only, so
+//     429s don't break it; the `acme_rate_limited_count` Counter
+//     tracks them so the operator can see how often the limiter
+//     fires.
+
+import http from 'k6/http';
+import { check } from 'k6';
+import { Counter, Trend } from 'k6/metrics';
+import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.2/index.js';
+
+const ACME_BASE = __ENV.CERTCTL_ACME_DIRECTORY ||
+    'https://certctl-server:8443/acme/profile/prof-test/directory';
+
+// Custom metrics.
+const directoryDuration = new Trend('acme_directory_duration', true);
+const newNonceDuration  = new Trend('acme_new_nonce_duration', true);
+const renewalInfoDuration = new Trend('acme_renewal_info_duration', true);
+const rateLimitedCount  = new Counter('acme_rate_limited_count');
+const rateLimitShapeOK  = new Counter('acme_rate_limit_shape_ok');
+
+export const options = {
+    scenarios: {
+        acme_burst: {
+            executor: 'constant-vus',
+            vus: parseInt(__ENV.K6_ACME_VUS || '200', 10),
+            duration: __ENV.K6_ACME_DURATION || '5m',
+            gracefulStop: '30s',
+            tags: { scenario: 'acme_burst' },
+        },
+    },
+    thresholds: {
+        'acme_directory_duration':    ['p(95)<500'],
+        'acme_new_nonce_duration':    ['p(95)<300'],
+        'acme_renewal_info_duration': ['p(95)<800'],
+        // 4xx (429, malformed cert-id) is expected; only 5xx counts as failed.
+        'http_req_failed{scenario:acme_burst}': ['rate<0.001'],
+    },
+    insecureSkipTLSVerify: true,
+    summaryTrendStats: ['avg', 'min', 'med', 'p(95)', 'p(99)', 'max'],
+};
+http.setResponseCallback(http.expectedStatuses({ min: 200, max: 499 }));
+
+export default function () {
+    // Step 1 — directory.
+    let res = http.get(ACME_BASE, {
+        tags: { scenario: 'acme_burst', step: 'directory' },
+    });
+    directoryDuration.add(res.timings.duration);
+    if (res.status === 429) {
+        recordRateLimit(res);
+        return; // expected under load; skip the rest of this iteration
+    }
+
+    check(res, { 'directory 200': (r) => r.status === 200 });
+    if (res.status !== 200) return;
+
+    const dir = res.json();
+
+    // Step 2 — new-nonce.
+    if (dir.newNonce) {
+        res = http.head(dir.newNonce, {
+            tags: { scenario: 'acme_burst', step: 'new_nonce' },
+        });
+        newNonceDuration.add(res.timings.duration);
+        if (res.status === 429) {
+            recordRateLimit(res);
+            return;
+        }
+        check(res, {
+            'new-nonce 200': (r) => r.status === 200,
+            'replay-nonce header present': (r) => !!r.headers['Replay-Nonce'],
+        });
+    }
+
+    // Step 3 — ARI synthetic 4xx fast path. Phase 4 added ARI
+    // (RFC 9773); this exercises the malformed-cert-id branch which
+    // returns a 4xx without a DB lookup. Pinning this here means a
+    // regression that turned the malformed path into a DB query
+    // would surface as a p95 spike.
+    if (dir.renewalInfo) {
+        res = http.get(dir.renewalInfo + '/aaaa.bbbb', {
+            tags: { scenario: 'acme_burst', step: 'renewal_info' },
+        });
+        renewalInfoDuration.add(res.timings.duration);
+        if (res.status === 429) {
+            recordRateLimit(res);
+            return;
+        }
+        check(res, {
+            'renewal-info 4xx for synthetic cert-id':
+                (r) => r.status === 400 || r.status === 404,
+        });
+    }
+}
+
+// recordRateLimit pins the Phase 5 ACME rate-limit response shape:
+//   - HTTP 429
+//   - Content-Type: application/problem+json
+//   - Body: {"type":"urn:ietf:params:acme:error:rateLimited", ...}
+// A regression that returned a plain-text 429 or a different problem
+// type would NOT increment acme_rate_limit_shape_ok, so the operator
+// would see (acme_rate_limited_count - acme_rate_limit_shape_ok) > 0
+// in the summary. A 503 never reaches this function.
+function recordRateLimit(res) {
+    rateLimitedCount.add(1);
+    const ct = res.headers['Content-Type'] || '';
+    if (!ct.includes('application/problem+json')) {
+        return;
+    }
+    let body;
+    try {
+        body = res.json();
+    } catch (e) {
+        return;
+    }
+    if (body && typeof body.type === 'string' &&
+        body.type.startsWith('urn:ietf:params:acme:error:rateLimited')) {
+        rateLimitShapeOK.add(1);
+    }
+}
+
+export function handleSummary(data) {
+    return {
+        '/results/summary-acme-burst.json': JSON.stringify(data, null, 2),
+        '/results/summary-acme-burst.txt': textSummary(data, { indent: ' ', enableColors: false }),
+        stdout: textSummary(data, { indent: ' ', enableColors: true }),
+    };
+}
@@ -0,0 +1,126 @@
+// Phase 8 SCALE-H2 — agent fleet heartbeat storm.
+//
+// What this measures:
+//   5,000 agents heartbeating at 30s intervals = ~167 heartbeats/sec
+//   sustained. Each heartbeat is POST /api/v1/agents/{id}/heartbeat
+//   with optional metadata. Pre-seeded fleet provided by
+//   deploy/test/loadtest/seed/02_agent_fleet.sql.
+//
+// What this does NOT measure:
+//   - The agent work-poll path (GET /api/v1/agents/{id}/work). The
+//     heartbeat hot path is the highest-frequency call on a typical
+//     fleet (work-poll cadence is 30s default like heartbeat, but
+//     work-poll returns the empty set 99% of the time and is cheap;
+//     heartbeat does an UPDATE on every call). v2 of the harness
+//     could combine them.
+//   - The agent CSR-submit path (POST /api/v1/agents/{id}/csr). That
+//     fires on per-cert issuance, not per heartbeat, and is exercised
+//     by the existing API tier's POST /api/v1/certificates scenario.
+//   - Auth-key per-agent rotation. The loadtest stack runs with a
+//     single api-key (`load-test-token`); per-agent api-key
+//     hashing/rotation isn't a load axis.
+//
+// Why constant-arrival-rate (not constant-vus):
+//   The point is to model what 5K real agents would offer the server
+//   at their native cadence. 5K agents * (1 heartbeat / 30s) =
+//   166.67 req/s offered. constant-arrival-rate fires at exactly
+//   that rate regardless of latency; if the server backpressures,
+//   queue builds and p99 shows it. constant-vus would let slow
+//   responses block, masking the actual ceiling.
+//
+// Threshold contract:
+//   - p99 < 1s for the heartbeat POST. The handler does an UPDATE on
+//     agents.last_heartbeat_at (+ optional metadata columns) and an
+//     RBAC check. Even at 200 req/s a tight UPDATE on an indexed
+//     primary key should stay sub-second.
+//   - p95 < 500ms.
+//   - Error rate < 0.1%. The seeded agents are all status='Online'
+//     so no 410 Gone (retired-agent) responses; anything 4xx is a
+//     bug. 5xx is a server health regression.
+//
+// Phase 8 reference:
+//   - Source finding: SCALE-H2.
+//   - Pre-state: heartbeat path not load-tested. The ~10-agent demo
+//     seed in seed_demo.sql produces well under 1 heartbeat/sec,
+//     orders of magnitude below fleet scale.
+
+import http from 'k6/http';
+import { check } from 'k6';
+import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.2/index.js';
+
+const BASE  = __ENV.CERTCTL_BASE  || 'https://certctl-server:8443';
+const TOKEN = __ENV.CERTCTL_TOKEN || 'load-test-token';
+
+// 5000 agents * (1 / 30s) = 166.67 heartbeats/sec. Round to 167.
+const TARGET_RATE = parseInt(__ENV.K6_AGENT_RATE || '167', 10);
+
+// Total agents in the fleet seed. The k6 scenario derives an agent
+// index per iteration from __VU and __ITER (deterministic, not
+// random) to spread the per-row UPDATE pressure across the table.
+const FLEET_SIZE = parseInt(__ENV.K6_AGENT_FLEET || '5000', 10);
+
+export const options = {
+    scenarios: {
+        agent_storm: {
+            executor: 'constant-arrival-rate',
+            rate: TARGET_RATE,
+            timeUnit: '1s',
+            duration: '5m',
+            preAllocatedVUs: 50,
+            maxVUs: 200,
+            exec: 'heartbeat',
+            tags: { scenario: 'agent_storm' },
+        },
+    },
+    thresholds: {
+        'http_req_duration{scenario:agent_storm}': ['p(99)<1000', 'p(95)<500'],
+        'http_req_failed{scenario:agent_storm}': ['rate<0.001'],
+    },
+    summaryTrendStats: ['avg', 'min', 'med', 'p(95)', 'p(99)', 'max'],
+    insecureSkipTLSVerify: true,
+};
+
+// agentID returns a deterministic agent id from the loadtest fleet
+// seed. Walking the fleet by VU and iteration spreads the UPDATE
+// pressure across rows instead of hammering the same hot row over
+// and over.
+function agentID() {
+    // __ITER is k6's per-VU iteration counter; combined with __VU
+    // (the 1-based VU number) it yields an index in
+    // 0..FLEET_SIZE-1 after the modulo. The index is not unique per
+    // call: with the default FLEET_SIZE of 5000, VUs whose numbers
+    // differ by a multiple of 5 share the same starting offset.
+    const idx = (__VU * 1000 + __ITER) % FLEET_SIZE;
+    return 'ag-loadtest-' + String(idx + 1).padStart(5, '0');
+}
+
+export function heartbeat() {
+    const id = agentID();
+    // Optional metadata; the heartbeat handler tolerates an empty body
+    // (no metadata) but real agents send their version + hostname on
+    // every call so we include them here.
+    const payload = JSON.stringify({
+        version: '2.1.0',
+        hostname: 'loadtest-' + id.slice(-5) + '.fleet.example.test',
+        os: 'linux',
+        architecture: 'amd64',
+    });
+
+    const res = http.post(`${BASE}/api/v1/agents/${id}/heartbeat`, payload, {
+        headers: {
+            'Content-Type': 'application/json',
+            'Authorization': `Bearer ${TOKEN}`,
+        },
+        tags: { scenario: 'agent_storm' },
+    });
+
+    check(res, {
+        'heartbeat 2xx': (r) => r.status >= 200 && r.status < 300,
+    });
+}
+
+export function handleSummary(data) {
+    return {
+        '/results/summary-agent-storm.json': JSON.stringify(data, null, 2),
+        '/results/summary-agent-storm.txt': textSummary(data, { indent: ' ', enableColors: false }),
+        stdout: textSummary(data, { indent: ' ', enableColors: true }),
+    };
+}
@@ -0,0 +1,129 @@
+// Phase 8 SCALE-H2 — bulk-renewal under load.
+//
+// What this measures:
+//   POST /api/v1/certificates/bulk-renew throughput against a
+//   10K-cert pre-seeded fleet. Each iteration POSTs a criteria-mode
+//   bulk-renew request scoped by team_id + issuer_id (see
+//   bulkRenewal() below) so the server enqueues N renewal jobs and
+//   returns a per-cert {certificate_id, job_id} envelope.
+//
+// Why criteria-mode (not certificate-ids mode):
+//   Criteria-mode lets the scenario re-fire without maintaining a
+//   moving list of cert IDs. The seed stamps every row with
+//   `tags.batch = 'bulk-renewal'`, but BulkRenewalCriteria has no
+//   tag filter, so the request scopes by the seed's team_id and
+//   issuer_id instead. Run against a non-loadtest server by
+//   mistake, the criteria match nothing only if that server has no
+//   certs under the same team and issuer ids.
+//
+// What this does NOT measure:
+//   - The scheduler's renewal scan itself. The bulk-renew handler
+//     enqueues issuance jobs synchronously into the `jobs` table;
+//     the scheduler's `jobProcessorLoop` picks them up on its next
+//     tick. The DB write throughput is what's measured here; the
+//     job-execution path is bounded by per-issuer concurrency
+//     (CERTCTL_RENEWAL_CONCURRENCY=25 default) and isn't usefully
+//     amplified by adding more inbound bulk-renew calls.
+//   - Full POST → poll deployments → cert-served loop. Same v1/v2
+//     deferral as the connector-tier scenarios — needs the agent
+//     poll surface plumbed end-to-end.
+//
+// Threshold contract:
+//   - p99 < 5s, p95 < 2s for the bulk-renew POST. Each call walks
+//     the criteria, materializes the matching managed_certificates
+//     rows, inserts N rows into `jobs`, and returns the envelope.
+//   - Error rate < 1%. Anything 4xx/5xx counts.
+//
+// Phase 8 reference:
+//   - Source finding: SCALE-H2.
+//   - Pre-state: only the API tier (50 req/s POST /certificates +
+//     GET /certificates) and connector tier (per-target handshake)
+//     were measured. The bulk-renew hot path was uncovered.
+//   - Seed: deploy/test/loadtest/seed/01_bulk_renewal_certs.sql
+//     creates 10K rows with tags.batch='bulk-renewal'. The seed
+//     must run before this scenario; the scale-seed compose
+//     profile gates this.
+
+import http from 'k6/http';
+import { check } from 'k6';
+import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.2/index.js';
+
+const BASE  = __ENV.CERTCTL_BASE  || 'https://localhost:8443';
+const TOKEN = __ENV.CERTCTL_TOKEN || 'load-test-token';
+
+// Sustained throughput target. constant-arrival-rate at 5 req/s for 5
+// minutes = 1500 bulk-renew POSTs. Each POST touches up to 10K
+// managed_certificates rows (criteria scan) + inserts up to 10K
+// rows into `jobs`, so the request rate is lower than the API
+// tier's 50 req/s but the per-call cost is far larger.
+//
+// 5 req/s was picked deliberately:
+//   - 50 req/s combined with the API tier's 50 saturates the demo-
+//     scale compose's DB pool (CERTCTL_DATABASE_MAX_CONNS=50). The
+//     Phase 8 scenario should measure the per-call ceiling without
+//     fighting the pool.
+//   - Each call enqueues thousands of jobs; the scheduler's
+//     jobProcessorLoop has finite per-tick budget. Pushing higher
+//     than 5 req/s would queue work faster than the scheduler
+//     drains it, which produces a transient backlog metric (worth
+//     measuring eventually) but isn't what SCALE-H2 asks for.
+export const options = {
+    scenarios: {
+        bulk_renewal: {
+            executor: 'constant-arrival-rate',
+            rate: 5,
+            timeUnit: '1s',
+            duration: '5m',
+            preAllocatedVUs: 10,
+            maxVUs: 30,
+            exec: 'bulkRenewal',
+            tags: { scenario: 'bulk_renewal' },
+        },
+    },
+    thresholds: {
+        // Single-scenario threshold, looser than the API tier
+        // because each call is heavier (DB scan + N inserts).
+        'http_req_duration{scenario:bulk_renewal}': ['p(99)<5000', 'p(95)<2000'],
+        'http_req_failed{scenario:bulk_renewal}': ['rate<0.01'],
+    },
+    summaryTrendStats: ['avg', 'min', 'med', 'p(95)', 'p(99)', 'max'],
+    insecureSkipTLSVerify: true,
+};
+
+export function bulkRenewal() {
+    // Scope by team_id. The seed binds every loadtest cert to
+    // t-platform; in a multi-tenant production deploy, team scoping
+    // is the typical bulk-renew shape. This exercises the criteria
+    // walker AND the team-scoped permission check in the handler.
+    //
+    // NOTE: this does NOT include `tags` because the BulkRenewalCriteria
+    // domain type (handler/bulk_renewal.go) only exposes profile_id,
+    // owner_id, agent_id, issuer_id, team_id, certificate_ids — not
+    // tag-based filtering. The team_id scope plus the production-
+    // separated FK guarantees we only touch the Phase 8 seed.
+    const payload = JSON.stringify({
+        team_id: 't-platform',
+        issuer_id: 'iss-local',
+    });
+
+    const res = http.post(`${BASE}/api/v1/certificates/bulk-renew`, payload, {
+        headers: {
+            'Content-Type': 'application/json',
+            'Authorization': `Bearer ${TOKEN}`,
+        },
+        tags: { scenario: 'bulk_renewal' },
+    });
+
+    check(res, {
+        'bulk-renew 2xx': (r) => r.status >= 200 && r.status < 300,
+    });
+}
+
+export function handleSummary(data) {
+    return {
+        '/results/summary-bulk-renewal.json': JSON.stringify(data, null, 2),
+        '/results/summary-bulk-renewal.txt': textSummary(data, { indent: ' ', enableColors: false }),
+        stdout: textSummary(data, { indent: ' ', enableColors: true }),
+    };
+}
@@ -0,0 +1,85 @@
+-- Phase 8 SCALE-H2: bulk-renewal scenario seed.
+--
+-- Generates 10,000 managed_certificates rows linked to the existing
+-- seed_demo.sql FKs (iss-local, o-alice, t-platform, rp-standard) so
+-- the bulk-renewal k6 scenario can POST /api/v1/certificates/bulk-renew
+-- against a fleet-scale dataset instead of the 15-row demo seed.
+--
+-- Behavior:
+--   - Idempotent. ON CONFLICT (name) DO NOTHING — re-running the seed
+--     against an already-seeded DB is a no-op.
+--   - expires_at is uniformly distributed across the next 30 days so
+--     a renewal_window_days = 30 policy considers every row eligible.
+--   - status = 'active' so the renewal selector treats them as
+--     live (the scheduler skips status IN ('pending', 'failed',
+--     'revoked', 'retired')).
+--   - name is generated as 'loadtest-bulk-NNNNN.example.test' for a
+--     stable, predictable identifier that the confirmation block
+--     below pattern-matches to count the seeded set (the production
+--     fleet wouldn't share this prefix).
+--
+-- Volume target: 10,000 rows. Insert wall time on the loadtest stack
+-- (postgres:16-alpine, 2 CPU / 4 GiB): typically < 5 seconds via the
+-- single-statement generate_series + INSERT pattern below. The
+-- compose seed-init container runs this BEFORE the k6 driver starts,
+-- so the steady-state load measurement isn't affected by seed time.
+--
+-- Why not generated in Go via a fixtures helper:
+--   - The certctl-server boots from a clean DB and runs migrations +
+--     seed_demo.sql automatically when CERTCTL_DEMO_SEED=true. Adding
+--     a Go-side fixtures helper would require either (a) a new
+--     CERTCTL_LOADTEST_SEED flag wired into cmd/server/main.go (cross-
+--     cutting change for one test path) or (b) a separate seed binary
+--     (more compose surface). Raw SQL is the smallest viable change.
+--
+-- Phase 8 entry point — runs only when the loadtest compose stack is
+-- explicitly opted into the scale-seed via LOADTEST_SCALE_SEED=true.
+
+INSERT INTO managed_certificates (
+    id,
+    name,
+    common_name,
+    sans,
+    environment,
+    owner_id,
+    team_id,
+    issuer_id,
+    renewal_policy_id,
+    status,
+    expires_at,
+    tags,
+    created_at,
+    updated_at
+)
+SELECT
+    'cert-loadtest-bulk-' || lpad(g::text, 5, '0'),
+    'loadtest-bulk-' || lpad(g::text, 5, '0') || '.example.test',
+    'loadtest-bulk-' || lpad(g::text, 5, '0') || '.example.test',
+    ARRAY['loadtest-bulk-' || lpad(g::text, 5, '0') || '.example.test'],
+    'loadtest',
+    'o-alice',
+    't-platform',
+    'iss-local',
+    'rp-standard',
+    'active',
+    -- Distribute expires_at uniformly across the next 30 days so a
+    -- 30-day-window renewal policy sees every row as eligible.
+    NOW() + ((g % 30) || ' days')::interval + ((g % 24) || ' hours')::interval,
+    jsonb_build_object('source', 'loadtest-phase8', 'batch', 'bulk-renewal'),
+    NOW(),
+    NOW()
+FROM generate_series(1, 10000) AS g
+ON CONFLICT (name) DO NOTHING;
+
+-- Confirmation row count — the seed-init container greps this in its
+-- logs to verify the fleet shape post-insert. The output appears in
+-- `docker compose logs certctl-loadtest-scale-seed` after the run.
+DO $$
+DECLARE
+    cert_count integer;
+BEGIN
+    SELECT COUNT(*) INTO cert_count
+    FROM managed_certificates
+    WHERE name LIKE 'loadtest-bulk-%';
+    RAISE NOTICE 'Phase 8 bulk-renewal seed: % managed_certificates rows present', cert_count;
+END $$;
@@ -0,0 +1,85 @@
+-- Phase 8 SCALE-H2: agent-fleet heartbeat-storm scenario seed.
+--
+-- Generates 5,000 agents rows so the heartbeat-storm k6 scenario can
+-- model a fleet-scale heartbeat pattern (5K agents heartbeating at the
+-- native 30s cadence = ~167 heartbeats/sec sustained) instead of the
+-- ~10-agent demo seed.
+--
+-- Behavior:
+--   - Idempotent. ON CONFLICT (id) DO NOTHING — re-runnable against an
+--     already-seeded DB.
+--   - name is unique (a UNIQUE constraint in migration 000001) so the
+--     name suffix mirrors the id suffix.
+--   - status = 'Online' so the heartbeat handler's retire-check
+--     (service.ErrAgentRetired) doesn't 410 the storm.
+--   - last_heartbeat_at staggered across the prior 60 seconds so the
+--     stale-agent reaper (agentHealthCheckLoop) doesn't immediately
+--     flip half the fleet to 'Offline' during the first scheduler
+--     tick of the load run.
+--   - api_key_hash = 'loadtest_no_auth'. The loadtest compose runs
+--     CERTCTL_AUTH_TYPE=api-key with a single static token
+--     (load-test-token), which bypasses per-agent key check the same
+--     way the existing API tier scenarios do. Production deploys with
+--     CERTCTL_AUTH_TYPE=agent-key per-agent would seed real bcrypt'd
+--     hashes; this column is opaque to the load-test path.
+--   - registered_at = NOW() - a deterministic 1-90 day interval
+--     (g % 90 + 1) so agent age looks varied and any age-based query
+--     plans are exercised.
+--
+-- Volume target: 5,000 rows. The agents schema is much narrower than
+-- managed_certificates so the insert is sub-second on the loadtest
+-- stack. The 5K agents do not own any deployment_targets in this
+-- fixture (the scenario only measures the heartbeat hot path, not
+-- the work-poll path which depends on cert + target wiring).
+--
+-- Phase 8 entry point — runs only when the loadtest compose stack is
+-- explicitly opted into the scale-seed via LOADTEST_SCALE_SEED=true.
+
+INSERT INTO agents (
+    id,
+    name,
+    hostname,
+    status,
+    last_heartbeat_at,
+    registered_at,
+    api_key_hash,
+    os,
+    architecture,
+    ip_address,
+    version
+)
+SELECT
+    'ag-loadtest-' || lpad(g::text, 5, '0'),
+    'loadtest-agent-' || lpad(g::text, 5, '0'),
+    'loadtest-' || lpad(g::text, 5, '0') || '.fleet.example.test',
+    'Online',
+    -- Stagger last_heartbeat_at across the prior 60 seconds (= 2x the
+    -- agent's native 30s heartbeat interval) so every agent starts
+    -- the run with a recent heartbeat; see the stale-agent reaper
+    -- note in the header.
+    NOW() - ((g % 60) || ' seconds')::interval,
+    -- registered_at spread deterministically 1-90 days back.
+    NOW() - ((g % 90 + 1) || ' days')::interval,
+    'loadtest_no_auth',
+    -- Mix linux/windows/darwin so the OS distribution column in the
+    -- agents page isn't pure-linux during the storm.
+    CASE (g % 10)
+        WHEN 0 THEN 'windows'
+        WHEN 1 THEN 'darwin'
+        ELSE 'linux'
+    END,
+    -- amd64 dominates; arm64 minority.
+    CASE WHEN (g % 5) = 0 THEN 'arm64' ELSE 'amd64' END,
+    -- IPv4 in the 10.42.0.0/16 fleet range, deterministic per id.
+    '10.42.' || ((g / 256) % 256)::text || '.' || (g % 256)::text,
+    '2.1.0'
+FROM generate_series(1, 5000) AS g
+ON CONFLICT (id) DO NOTHING;
+
+DO $$
+DECLARE
+    agent_count integer;
+BEGIN
+    SELECT COUNT(*) INTO agent_count
+    FROM agents
+    WHERE id LIKE 'ag-loadtest-%';
+    RAISE NOTICE 'Phase 8 agent-storm seed: % agents rows present', agent_count;
+END $$;
@@ -0,0 +1,87 @@
+# Phase 8 load-test seed fixtures
+
+Opt-in seed scripts that grow the loadtest DB from the demo-scale
+fixture (~15 certs / ~10 agents from `migrations/seed_demo.sql`) to
+fleet scale (10K certs + 5K agents) so the Phase 8 SCALE-H2 scenarios
+measure something representative.
+
+## When these run
+
+The default `make loadtest` path does NOT touch this directory — the
+API tier and connector tier scenarios run against the demo seed alone
+and complete in ~5 minutes. The Phase 8 scenarios opt in via the
+`LOADTEST_SCALE_SEED=true` environment variable; when set, the
+`certctl-loadtest-scale-seed` one-shot init container runs every
+`*.sql` file in this directory in lexical order against the same
+Postgres instance the server uses.
+
+Compose service wiring (see `../docker-compose.yml`):
+- Service: `scale-seed`
+- Profile: `scale-seed` (compose `profiles:` gate; not started by
+  default)
+- Depends on: `postgres` (service_healthy) AND `certctl-server`
+  (service_healthy — server runs schema migrations at boot so the
+  seed runs AFTER tables exist)
+- Order: lexical (`01_bulk_renewal_certs.sql` then
+  `02_agent_fleet.sql`)
+- Idempotent: every script uses `ON CONFLICT DO NOTHING` so re-running
+  is a no-op.
+
+## What gets seeded
+
+| File | Rows | Purpose |
+|---|---|---|
+| `01_bulk_renewal_certs.sql` | 10,000 managed_certificates | Fleet shape for `bulk_renewal.js`. All linked to demo FKs (iss-local, o-alice, t-platform, rp-standard). Status `active`, expires_at distributed across the next 30 days so a 30-day renewal window considers every row eligible. Name prefix `loadtest-bulk-` marks the seeded set; `bulk_renewal.js` itself scopes its criteria by `team_id` + `issuer_id` because the criteria type has no name or tag filter. |
+| `02_agent_fleet.sql` | 5,000 agents | Fleet shape for `agent_storm.js`. Status `Online`, last_heartbeat_at staggered across prior 60s, name prefix `loadtest-agent-`. OS distribution: 80% linux / 10% windows / 10% darwin. Arch: 80% amd64 / 20% arm64. |
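+
+A quick way to confirm the seeded shape, as a sketch that assumes both
+seed scripts have run against a fresh loadtest database and that you
+have a `psql` session on it. The expected figures follow from the
+`generate_series` bounds and modulo expressions in the two scripts;
+table and column names are taken from the seeds themselves.
+
+```sql
+-- Seeded certificates: expect 10,000 rows, with every expires_at
+-- less than 30 days after the time the seed ran.
+SELECT COUNT(*)        AS certs,
+       MIN(expires_at) AS earliest_expiry,
+       MAX(expires_at) AS latest_expiry
+FROM managed_certificates
+WHERE name LIKE 'loadtest-bulk-%';
+
+-- Seeded agents by OS: expect linux 4000, darwin 500, windows 500.
+SELECT os, COUNT(*) AS agents
+FROM agents
+WHERE id LIKE 'ag-loadtest-%'
+GROUP BY os
+ORDER BY agents DESC, os;
+```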
+
+## How to run the Phase 8 scenarios locally
+
+```bash
+cd deploy/test/loadtest
+LOADTEST_SCALE_SEED=true docker compose --profile scale-seed up --build \
+    --abort-on-container-exit --exit-code-from k6-scale
+```
+
+Or via the dedicated Makefile target (preferred for CI parity):
+
+```bash
+make loadtest-scale
+```
+
+## Why SQL fixtures instead of a Go seed binary
+
+- The certctl-server already boots from a clean DB and runs migrations
+  + `seed_demo.sql` when `CERTCTL_DEMO_SEED=true`. Adding a third seed
+  mode (loadtest-scale) would mean either a new
+  `CERTCTL_LOADTEST_SEED` flag wired into `cmd/server/main.go` (cross-
+  cutting change for one test path) or a separate seed binary (more
+  compose surface).
+- Raw SQL is the smallest viable change: each script is a single
+  multi-row `INSERT … SELECT FROM generate_series(…)` plus a
+  `DO $$ … RAISE NOTICE` confirmation block.
+- Idempotency is straightforward via `ON CONFLICT … DO NOTHING` — the
+  same pattern `seed_demo.sql` uses.
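+
+The pattern can be tried in isolation. The sketch below uses a
+throwaway temp table (`demo_fixture` is a hypothetical name, not part
+of the certctl schema) to show that a second run of the same
+`INSERT … SELECT` is a no-op and that the confirmation block still
+reports the original row count.
+
+```sql
+-- Hypothetical scratch table; not part of the certctl schema.
+CREATE TEMP TABLE demo_fixture (
+    id     text PRIMARY KEY,
+    bucket integer NOT NULL
+);
+
+-- First run inserts 100 rows.
+INSERT INTO demo_fixture (id, bucket)
+SELECT 'row-' || lpad(g::text, 3, '0'), g % 3
+FROM generate_series(1, 100) AS g
+ON CONFLICT (id) DO NOTHING;
+
+-- Second run conflicts on the primary key for every row and
+-- inserts nothing.
+INSERT INTO demo_fixture (id, bucket)
+SELECT 'row-' || lpad(g::text, 3, '0'), g % 3
+FROM generate_series(1, 100) AS g
+ON CONFLICT (id) DO NOTHING;
+
+-- Confirmation block in the same style as the seed scripts.
+DO $$
+DECLARE
+    row_count integer;
+BEGIN
+    SELECT COUNT(*) INTO row_count FROM demo_fixture;
+    RAISE NOTICE 'demo_fixture rows present: %', row_count;
+END $$;
+```
+
+After both inserts the notice reports 100 rows, which is the behavior
+the seed scripts rely on when the scale-seed container is re-run.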
+
+## Why these volumes specifically
+
+- **10K certs.** The SCALE-H2 audit asked for "10K certs with
+  renewal_at < now." Round number, fits in postgres:16-alpine on a
+  CI runner without OOM, and large enough that the renewal selector's
+  query plan is exercised (the demo's 15 rows would index-scan
+  trivially).
+- **5K agents.** Heartbeat at 30s cadence = ~167 heartbeats/sec
+  sustained. That's well above the 50 req/s the existing API tier
+  measures and stresses the agent.heartbeat handler's per-call cost
+  (last_heartbeat_at UPDATE + the RBAC permission check + the
+  audit-log row).
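+
+The 167/sec figure is fleet size divided by the heartbeat interval. As
+a sketch, assuming the agent seed has run, the same number can be
+derived from the seeded rows; the literal `30.0` is the default
+heartbeat cadence in seconds quoted above, not a value read from the
+server.
+
+```sql
+-- Offered heartbeat rate = seeded agents / heartbeat interval (s).
+SELECT COUNT(*)                  AS fleet_size,
+       ROUND(COUNT(*) / 30.0, 2) AS offered_heartbeats_per_sec
+FROM agents
+WHERE id LIKE 'ag-loadtest-%';
+```
+
+With 5,000 seeded agents this returns 166.67, which `agent_storm.js`
+rounds up to its default rate of 167.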
+
+If a future scenario needs more rows (50K certs / 10K agents), add a
+new `03_…sql` here and another scenario file. Don't grow the existing
+files — re-running existing scenarios against a different fixture
+shape would invalidate the captured baseline.
+
+## Phase 8 audit reference
+
+Source finding: SCALE-H2 in
+`cowork/certctl-architecture-diligence-audit.html`.
+Phase 8 closure commit: see `git log --grep='Phase 8'`.
@@ -0,0 +1,140 @@
+# certctl Documentation
+
+> Last reviewed: 2026-05-12
+
+The full docs index, organized by audience. Pick the section that matches what you need to do; each link below opens a focused doc rather than a wall of text.
+
+For the elevator pitch and quickstart commands, see the repo `README.md` at the root. For the marketing site, see [certctl.io](https://certctl.io).
+
+---
+
+## Getting Started
+
+You're new to certctl, just cloned the repo, or want to understand what it does before installing.
+
+| Doc | What it covers |
+|---|---|
+| [Concepts](getting-started/concepts.md) | TLS certificates explained for beginners — CAs, ACME, EST, private keys, the full glossary |
+| [Quickstart](getting-started/quickstart.md) | Five-minute setup with Docker Compose, dashboard tour, API tour |
+| [Examples](getting-started/examples.md) | Five turnkey scenarios — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer |
+| [Advanced demo](getting-started/advanced-demo.md) | End-to-end certificate lifecycle with technical depth at each step |
+| [Why certctl](getting-started/why-certctl.md) | Positioning vs ACME clients, agent-based SaaS, enterprise platforms; when to look elsewhere |
+
+## Reference
+
+You're operating certctl in production or building integrations and need authoritative technical detail.
+
+| Doc | What it covers |
+|---|---|
+| [Architecture](reference/architecture.md) | System design, data flow, security model, deployment topologies |
+| [Profiles](reference/profiles.md) | CertificateProfile policy object — issuer wiring, EKUs, RequiresApproval gate (with profile-edit closure) |
+| [API](reference/api.md) | OpenAPI 3.1 spec, integration patterns, client SDK generation |
+| [CLI](reference/cli.md) | certctl-cli command reference and CI/CD integration patterns |
+| [Configuration](reference/configuration.md) | `CERTCTL_*` environment variable reference (scheduler, rate limits, deploy verify, audit, agent) |
+| [MCP server](reference/mcp.md) | Model Context Protocol integration for AI assistants |
+| [Release verification](reference/release-verification.md) | Cosign / SLSA / SBOM verification procedure |
+| [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md) | Multi-level CA tree management — RFC 5280 §3.2/§4.2.1.9/§4.2.1.10 enforcement |
+| [Auth standards implemented](reference/auth-standards-implemented.md) | RFC + CWE evidence for the API-key + RBAC + OIDC + sessions + break-glass surface (NOT a compliance-mapping doc) |
+| [Deployment model](reference/deployment-model.md) | Atomic write, post-deploy verify, rollback semantics across all targets |
+| [Vendor matrix](reference/vendor-matrix.md) | Tested vendor versions per target connector |
+
+### Connectors
+
+The [connector index](reference/connectors/index.md) is the canonical catalog (interfaces, registry, scanners, plus an inline reference per built-in). Per-connector deep-dive siblings cover operator-grade material — vendor edges, troubleshooting, rotation playbooks, when-to-use vs alternatives.
+
+**Issuers** (13 deep-dives): [ACME](reference/connectors/acme.md) · [ADCS](reference/connectors/adcs.md) · [AWS ACM Private CA](reference/connectors/aws-acm-pca.md) · [DigiCert](reference/connectors/digicert.md) · [EJBCA / Keyfactor](reference/connectors/ejbca.md) · [Entrust](reference/connectors/entrust.md) · [GlobalSign Atlas HVCA](reference/connectors/globalsign.md) · [Google CAS](reference/connectors/google-cas.md) · [Local CA](reference/connectors/local-ca.md) · [OpenSSL / Custom CA](reference/connectors/openssl.md) · [Sectigo SCM](reference/connectors/sectigo.md) · [step-ca / Smallstep](reference/connectors/step-ca.md) · [Vault PKI](reference/connectors/vault.md)
+
+**Targets** (15 deep-dives): [Apache](reference/connectors/apache.md) · [AWS Certificate Manager](reference/connectors/aws-acm.md) · [Azure Key Vault](reference/connectors/azure-kv.md) · [Caddy](reference/connectors/caddy.md) · [Envoy](reference/connectors/envoy.md) · [F5 BIG-IP](reference/connectors/f5.md) · [HAProxy](reference/connectors/haproxy.md) · [IIS](reference/connectors/iis.md) · [Java Keystore](reference/connectors/jks.md) · [Kubernetes Secrets](reference/connectors/k8s.md) · [NGINX](reference/connectors/nginx.md) · [Postfix / Dovecot](reference/connectors/postfix.md) · [SSH (agentless)](reference/connectors/ssh.md) · [Traefik](reference/connectors/traefik.md) · [Windows Certificate Store](reference/connectors/wincertstore.md)
+
+### Protocols
+
+| Doc | What it covers |
+|---|---|
+| [ACME server](reference/protocols/acme-server.md) | Run certctl as an RFC 8555 + RFC 9773 ARI ACME server |
+| [ACME server threat model](reference/protocols/acme-server-threat-model.md) | Security posture for the ACME server endpoint |
+| [SCEP server](reference/protocols/scep-server.md) | RFC 8894 native SCEP server — RA cert config, multi-profile dispatch, must-staple, mTLS sibling route |
+| [SCEP for Microsoft Intune](reference/protocols/scep-intune.md) | Intune-specific deployment guide — NDES replacement playbook |
+| [EST server](reference/protocols/est.md) | RFC 7030 EST server — 802.1X / Wi-Fi enrollment, IoT bootstrap, channel binding |
+| [CRL & OCSP](reference/protocols/crl-ocsp.md) | RFC 5280 CRL + RFC 6960 OCSP responder for relying parties |
+| [Async CA polling](reference/protocols/async-ca-polling.md) | Bounded polling for async-CA issuer connectors |
+
+## Operator
+
+You're running certctl in production and need operational guidance.
+
+| Doc | What it covers |
+|---|---|
+| [Security posture](operator/security.md) | Auth, rate limits, encryption at rest, key rotation, RBAC + OIDC + sessions + break-glass, bootstrap |
+| [Secret custody](operator/secret-custody.md) | Where private keys live; FileDriver vs HSM/KMS; encryption wire format; env-seeded vs DB-seeded plaintext policy |
+| [Observability](operator/observability.md) | Metrics surface, Prometheus exposition vs client_golang, tracing scope, log structure, rate-limit semantics across restarts/replicas |
+| [RBAC operator reference](operator/rbac.md) | Roles, permissions, scopes, scope-down + day-0 bootstrap |
+| [Auth threat model](operator/auth-threat-model.md) | API-key + RBAC + OIDC + sessions + break-glass — token forgery, session hijacking, IdP compromise, role-grant abuse, bootstrap-token leak, audit-mutation |
+| [OIDC / SSO runbooks](operator/oidc-runbooks/index.md) | Per-IdP setup guides — Keycloak, Authentik, Okta, Auth0, Entra ID, Google Workspace |
+| [Control plane TLS](operator/tls.md) | Self-signed bootstrap, operator-supplied Secret, cert-manager Certificate CR |
+| [Database TLS](operator/database-tls.md) | PostgreSQL transport encryption |
+| [Approval workflow](operator/approval-workflow.md) | Two-person integrity gate for high-stakes issuance + profile-edit closure |
+| [Helm deployment](operator/helm-deployment.md) | Kubernetes installation via the bundled chart |
+| [Performance baselines](operator/performance-baselines.md) | Operator-runnable benchmarks for regression spot checks |
+| [Auth benchmarks](operator/auth-benchmarks.md) | Session + OIDC validation p99 targets and measured baselines |
+| [Legacy clients (TLS 1.2)](operator/legacy-clients-tls-1.2.md) | Reverse-proxy runbook for embedded EST/SCEP clients on TLS 1.2 |
+
+### Runbooks
+
+| Runbook | When |
+|---|---|
+| [Cloud targets](operator/runbooks/cloud-targets.md) | AWS ACM + Azure Key Vault deployment, debugging, rollback |
+| [Expiry alerts](operator/runbooks/expiry-alerts.md) | Per-policy multi-channel routing matrix, severity tiers |
+| [Disaster recovery](operator/runbooks/disaster-recovery.md) | CRL cache, OCSP responder cert, CA private-key rotation, Postgres restore |
+| [Config-encryption upgrade](operator/runbooks/config-encryption-upgrade.md) | Force v1/v2 → v3 re-seal across the database; passphrase rotation procedure |
+| [PostgreSQL backup](operator/runbooks/postgres-backup.md) | Operator-run backup recipe (docker-compose + Kubernetes); recommended cadence; quarterly DR dry-run |
+
+## Migration
+
+You're moving from another cert-management tool to certctl, or running both in parallel.
+
+| From | Doc |
+|---|---|
+| Certbot | [migration/from-certbot.md](migration/from-certbot.md) |
+| acme.sh | [migration/from-acmesh.md](migration/from-acmesh.md) |
+| cert-manager (coexistence, not replacement) | [migration/cert-manager-coexistence.md](migration/cert-manager-coexistence.md) |
+| Caddy ACME (point Caddy at certctl) | [migration/acme-from-caddy.md](migration/acme-from-caddy.md) |
+| cert-manager ACME (point cert-manager at certctl) | [migration/acme-from-cert-manager.md](migration/acme-from-cert-manager.md) |
+| Traefik ACME (point Traefik at certctl) | [migration/acme-from-traefik.md](migration/acme-from-traefik.md) |
+| **API keys → RBAC (v2.0.x → v2.1.0)** | [migration/api-keys-to-rbac.md](migration/api-keys-to-rbac.md) — **AUDIT YOUR API KEYS** post-upgrade |
+| **Enable OIDC SSO** | [migration/oidc-enable.md](migration/oidc-enable.md) — step-by-step OIDC onboarding for an existing API-key + RBAC deployment |
+
+## Contributor
+
+You're contributing to certctl, running tests locally, or trying to understand the CI pipeline.
+
+| Doc | What it covers |
+|---|---|
+| [Testing strategy](contributor/testing-strategy.md) | What we test and why; per-PR fast gates vs daily deep-scan |
+| [Test environment](contributor/test-environment.md) | Local environment with real CAs (Pebble, step-ca, etc.) |
+| [QA prerequisites](contributor/qa-prerequisites.md) | Before running QA: stack boot, demo data baseline, env vars |
+| [QA test suite](contributor/qa-test-suite.md) | qa_test.go reference for release QA |
+| [GUI QA checklist](contributor/gui-qa-checklist.md) | Manual GUI verification pass for release |
+| [Release sign-off](contributor/release-sign-off.md) | Release-day checklist — code state, automated gates, manual QA, artefact verification |
+| [CI pipeline](contributor/ci-pipeline.md) | CI shape, regression guards, adding new checks |
+| [CI guards](contributor/ci-guards.md) | Per-class CI guards (code-shape, contract-parity, build/dep, operational); how to add one |
+
+## Archive
+
+Historical docs preserved for reference. Most operators don't need these.
+
+| Doc | Why archived |
+|---|---|
+| [Upgrade to TLS (v2.2)](archive/upgrades/to-tls-v2.2.md) | Pre-v2.2 HTTPS-everywhere upgrade procedure |
+| [Upgrade past v2 JWT removal](archive/upgrades/to-v2-jwt-removal.md) | G-1 milestone JWT auth removal procedure |
+
+---
+
+## Reading order by role
+
+**First-time operator:** [Concepts](getting-started/concepts.md) → [Quickstart](getting-started/quickstart.md) → [Examples](getting-started/examples.md). About 90 minutes end to end.
+
+**Production operator:** [Architecture](reference/architecture.md) → [Security posture](operator/security.md) → [Control plane TLS](operator/tls.md) → [Disaster recovery runbook](operator/runbooks/disaster-recovery.md). About 4 hours end to end.
+
+**PKI engineer:** [ACME server](reference/protocols/acme-server.md) → [SCEP server](reference/protocols/scep-server.md) → [EST server](reference/protocols/est.md) → [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md). About 6 hours end to end.
+
+**Contributor:** [Architecture](reference/architecture.md) → [Testing strategy](contributor/testing-strategy.md) → [Test environment](contributor/test-environment.md) → [CI pipeline](contributor/ci-pipeline.md). About 3 hours end to end.
@@ -1,10 +1,18 @@
 # Upgrading to HTTPS-Everywhere (v2.2)

+> Last reviewed: 2026-05-05
+
+> **Archived 2026-05-05.** This upgrade guide applies to certctl < v2.2.
+> Current operators on v2.2+ already have HTTPS-only control planes and
+> don't need this procedure. For the steady-state TLS reference, see
+> [`docs/operator/tls.md`](../../operator/tls.md). Preserved here for
+> late upgraders coming off pre-v2.2 releases.
+
 certctl's control plane is HTTPS-only as of v2.2. There is no `http` mode, no `auto` mode, no dual-listener bind, no N-release migration window. The cutover is a single step. Out-of-date agents that still point at `http://…` fail at the TCP/TLS handshake layer on first connect after the upgrade and stay `Offline` in the dashboard until their env block is updated and the fleet is rolled.

 This doc walks operators through the cutover for the two shipped deployment topologies — docker-compose and Helm — and documents the failure modes and rollback posture explicitly.

-For the deep-dive on cert provisioning patterns, SIGHUP cert reload, and client-side CA-trust configuration, read [`tls.md`](tls.md). This doc is the narrow "how do I upgrade" procedure.
+For the deep-dive on cert provisioning patterns, SIGHUP cert reload, and client-side CA-trust configuration, read [`tls.md`](../../operator/tls.md). This doc is the narrow "how do I upgrade" procedure.

 ## Preconditions

@@ -22,7 +30,7 @@ There is no schema migration tied to this release; the only at-rest state that c

 ## Procedure — docker-compose operators

-The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init container that self-signs an ECDSA-P256 (SHA-256 signature) cert on first boot and drops `server.crt`, `server.key`, and `ca.crt` into a named volume mounted read-only at `/etc/certctl/tls/` on the server and agent containers. No manual cert provisioning is required for the default stack. (Pre-v2.0.48 this was an ed25519 cert; see [`tls.md`](tls.md) Pattern 1 for the rationale and the `down -v && up --build` migration note.)
+The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init container that self-signs an ECDSA-P256 (SHA-256 signature) cert on first boot and drops `server.crt`, `server.key`, and `ca.crt` into a named volume mounted read-only at `/etc/certctl/tls/` on the server and agent containers. No manual cert provisioning is required for the default stack. (Pre-v2.0.48 this was an ed25519 cert; see [`tls.md`](../../operator/tls.md) Pattern 1 for the rationale and the `down -v && up --build` migration note.)

 1. **Pull the HTTPS-everywhere release.** From the repo root:

@@ -68,7 +76,7 @@ The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init conta

 ## Procedure — Helm operators

-The Helm chart does not self-sign. It refuses to render (`helm template` exits non-zero) unless you configure one of two cert sources: an operator-supplied Secret, or a cert-manager `Certificate` CR. See [`tls.md`](tls.md) for the full pattern catalog.
+The Helm chart does not self-sign. It refuses to render (`helm template` exits non-zero) unless you configure one of two cert sources: an operator-supplied Secret, or a cert-manager `Certificate` CR. See [`tls.md`](../../operator/tls.md) for the full pattern catalog.

 1. **Provision cert material.** Pick one of:

@@ -182,13 +190,13 @@ Once every agent is `Online`, confirm a few invariants:
 - `curl -sS -o /dev/null -w "%{http_code}\n" http://localhost:8443/health` returns `000` with `Connection refused` (no HTTP listener). Plaintext is gone.
 - `openssl s_client -connect localhost:8443 -tls1_2 </dev/null` fails the handshake. TLS 1.2 is rejected.
 - `openssl s_client -connect localhost:8443 -tls1_3 </dev/null` succeeds and prints the server's SAN list. TLS 1.3 is live.
-- A cert rotation test: overwrite the server cert on disk, `kill -HUP` the server PID, confirm the new cert serves on the next `openssl s_client -connect … -showcerts` without a process restart. See the SIGHUP section in [`tls.md`](tls.md).
+- A cert rotation test: overwrite the server cert on disk, `kill -HUP` the server PID, confirm the new cert serves on the next `openssl s_client -connect … -showcerts` without a process restart. See the SIGHUP section in [`tls.md`](../../operator/tls.md).

 Update your runbooks. Every `http://certctl.example.com` URL in internal documentation, monitoring config, and on-call playbooks should become `https://certctl.example.com` plus a CA-trust note.

 ## Related docs

-- [`tls.md`](tls.md) — cert provisioning patterns, SIGHUP rotation, troubleshooting
-- [`quickstart.md`](quickstart.md) — docker-compose walkthrough (post-HTTPS)
-- [`test-env.md`](test-env.md) — integration test environment (HTTPS-only)
+- [`tls.md`](../../operator/tls.md) — cert provisioning patterns, SIGHUP rotation, troubleshooting
+- [`quickstart.md`](../../getting-started/quickstart.md) — docker-compose walkthrough (post-HTTPS)
+- [`test-environment.md`](../../contributor/test-environment.md) — integration test environment (HTTPS-only)
 - Milestone spec: `prompts/https-everywhere-milestone.md`
@@ -1,8 +1,17 @@
 # Upgrading past G-1 — `CERTCTL_AUTH_TYPE=jwt` removal

+> Last reviewed: 2026-05-05
+
+> **Archived 2026-05-05.** This upgrade guide applies to operators
+> upgrading past the G-1 milestone (the `CERTCTL_AUTH_TYPE=jwt` removal).
+> Current operators on post-G-1 releases don't need this. For the
+> steady-state security posture reference, see
+> [`docs/operator/security.md`](../../operator/security.md). Preserved
+> here for late upgraders.
+
 If your certctl deployment currently sets `CERTCTL_AUTH_TYPE=jwt` (or `server.auth.type=jwt` in Helm), the next certctl upgrade will fail-fast at startup with a dedicated diagnostic. This guide explains why, what to switch to, and how to keep JWT/OIDC at your edge.

-For everyone else — operators running `api-key` or `none` — this upgrade is a no-op. Skip to [`upgrade-to-tls.md`](upgrade-to-tls.md) for the v2.2 HTTPS-everywhere migration if you haven't done that one yet.
+For everyone else — operators running `api-key` or `none` — this upgrade is a no-op. Skip to [`to-tls-v2.2.md`](to-tls-v2.2.md) for the v2.2 HTTPS-everywhere migration if you haven't done that one yet.

 ## Why we removed it

@@ -98,7 +107,7 @@ services:
      # ... rest of the certctl env block unchanged
 ```

-Operators hit `https://<your-host>/`, get redirected through the OIDC provider, land back at oauth2-proxy with a session cookie, and oauth2-proxy proxies their request to certctl on the internal Docker network. certctl itself is HTTPS-only on `:8443` (TLS 1.3, see [`tls.md`](tls.md)) but operator browsers never see that hop directly. Bind certctl-server's `:8443` to the internal Docker network only — do NOT publish it to the host. The audit trail will record the actor as the gateway-forwarded identity if you also configure a small bearer-token-mapping shim at the gateway (most production deployments do this with a per-user api-key issued by the gateway after OIDC validation).
+Operators hit `https://<your-host>/`, get redirected through the OIDC provider, land back at oauth2-proxy with a session cookie, and oauth2-proxy proxies their request to certctl on the internal Docker network. certctl itself is HTTPS-only on `:8443` (TLS 1.3, see [`tls.md`](../../operator/tls.md)) but operator browsers never see that hop directly. Bind certctl-server's `:8443` to the internal Docker network only — do NOT publish it to the host. The audit trail will record the actor as the gateway-forwarded identity if you also configure a small bearer-token-mapping shim at the gateway (most production deployments do this with a per-user api-key issued by the gateway after OIDC validation).

 ### Traefik ForwardAuth pattern (Kubernetes)

@@ -147,8 +156,8 @@ There is no on-disk state that changes with this upgrade — no migrations to ro

 ## Cross-references

-- [`architecture.md`](architecture.md) — "Authenticating-gateway pattern (JWT, OIDC, mTLS)" section.
-- [`tls.md`](tls.md) — TLS provisioning patterns. The gateway proxying to certctl-server still needs to trust certctl's TLS cert; same patterns apply.
+- [`architecture.md`](../../reference/architecture.md) — "Authenticating-gateway pattern (JWT, OIDC, mTLS)" section.
+- [`tls.md`](../../operator/tls.md) — TLS provisioning patterns. The gateway proxying to certctl-server still needs to trust certctl's TLS cert; same patterns apply.
 - [`../deploy/helm/certctl/README.md`](../deploy/helm/certctl/README.md) — Helm-chart-flavored guidance.
 - `internal/config/config.go::ValidAuthTypes` — the single source of truth for what's accepted post-G-1.
 - `internal/repository/postgres/db.go::wrapPingError` — unrelated; pattern for runtime diagnostic of operator misconfiguration.
@@ -1,230 +0,0 @@
-# CI Pipeline — Operator Guide
-
-> Authoritative guide to certctl's CI pipeline shape.
-> Per `cowork/ci-pipeline-cleanup-prompt.md` Phase 12.
-
-## Trigger model
-
-Three triggers, each with its own scope. Don't mix.
-
-| Trigger | Workflow | Scope | Wall-clock target |
-|---|---|---|---|
-| Push to master, PR to master | `.github/workflows/ci.yml` + `.github/workflows/codeql.yml` | Blocking — every check earns its keep | <10 min |
-| Daily 06:00 UTC + `workflow_dispatch` | `.github/workflows/security-deep-scan.yml` | Slow scans (gosec, osv, trivy, ZAP, schemathesis, nuclei, testssl, semgrep, mutation, `-race -count=10`); best-effort, never blocks | 60 min budget |
-| Tag push (`v*`) | `.github/workflows/release.yml` | Cross-platform binaries, ghcr.io push, SLSA provenance, GitHub release | n/a |
-
-This guide covers the **on-push pipeline** only.
-
-## On-push pipeline (7 status checks)
-
-```mermaid
-flowchart TD
-    Push["push to master"]
-    CI["CI workflow (5 jobs)"]
-    CodeQL["CodeQL workflow (2 jobs)"]
-    GoBuild["go-build-and-test<br/>~6-7 min"]
-    Frontend["frontend-build<br/>~1 min"]
-    HelmLint["helm-lint<br/>~10 sec"]
-    Vendor["deploy-vendor-e2e<br/>~5 min, depends on go-build-and-test"]
-    Image["image-and-supply-chain<br/>~3 min, parallel"]
-    AnalyzeGo["Analyze (go)<br/>~5 min, parallel"]
-    AnalyzeJS["Analyze (javascript-typescript)<br/>~5 min, parallel"]
-    Push --> CI
-    Push --> CodeQL
-    CI --> GoBuild
-    CI --> Frontend
-    CI --> HelmLint
-    CI --> Vendor
-    CI --> Image
-    CodeQL --> AnalyzeGo
-    CodeQL --> AnalyzeJS
-    GoBuild -.depends on.-> Vendor
-```
-
-End-to-end wall-clock: dominated by `go-build-and-test` + `deploy-vendor-e2e` chain (~12 min) running in parallel with CodeQL (~5 min). Target ~10 min.
-
-## Per-job deep-dive
-
-### `go-build-and-test` (Ubuntu, ~6-7 min)
-
-Runs the Go build/test suite + 18 of 20 regression guards.
-
-Steps:
-1. `actions/checkout@v4`
-2. `actions/setup-go@v5` (Go 1.25.9)
-3. `go build ./cmd/...` (server, agent, mcp-server, cli)
-4. **gofmt drift** — `gofmt -l .` must be empty (Makefile::verify parity)
-5. **go mod tidy drift** — `go mod tidy && git diff --exit-code go.mod go.sum`
-6. `go vet ./...`
-7. Install + run **golangci-lint** v2.11.4 (`--timeout 5m`)
-8. Install + run **govulncheck** (hard gate)
-9. Install + run **staticcheck** (hard gate; `continue-on-error: false`)
-10. **Race Detection** — `go test -race -count=1 ./internal/...` (9-package list, 5min timeout)
-11. **Go Test with Coverage** — full coverage profile to `coverage.out`
-12. **Check Coverage Thresholds** — `bash scripts/check-coverage-thresholds.sh` (reads `.github/coverage-thresholds.yml`)
-13. **Upload Coverage Report** — artifact (`go-coverage`, 30-day retention)
-14. **Coverage PR comment** — posts/updates per-PR coverage table (PR builds only)
-15. **Regression guards** — loop runs all `scripts/ci-guards/*.sh` (18 of 20 guards)
-
-Local equivalent: `make verify` covers steps 4, 6, 7, 11 (with `-short`).
-
-### `frontend-build` (Ubuntu, ~1 min)
-
-Vitest tests + tsc check + vite build + 2 of 20 regression guards (already covered by the ci-guards loop in `go-build-and-test`).
-
-Steps:
-1. `actions/checkout@v4`
-2. `actions/setup-node@v4` (Node 22)
-3. `npm ci`
-4. `npx tsc --noEmit`
-5. `npx vitest run`
-6. `npx vite build`
-7. **Regression guards** — same `scripts/ci-guards/*.sh` loop as `go-build-and-test` (catches frontend-side guards: S-1, P-1, T-1, L-015, L-019, M-009, G-3)
-
-### `helm-lint` (Ubuntu, ~10 sec)
-
-Helm chart validation in 3 modes + inverse fail-loud test:
-1. `helm lint` with existingSecret
-2. `helm template` (existingSecret mode)
-3. `helm template` (cert-manager mode)
-4. `helm template` (no TLS source — MUST fail per fail-loud guard)
-
-### `deploy-vendor-e2e` (Ubuntu, ~5 min, depends on `go-build-and-test`)
-
-Single-job collapse of the prior 12-job matrix (per ci-pipeline-cleanup Phase 5 / frozen decision 0.4 — revises Bundle II decision 0.9).
-
-Steps:
-1. `actions/checkout@v5`
-2. `actions/setup-go@v5` (Go 1.25.9, cache: true)
-3. **Build f5-mock-icontrol sidecar** — only sidecar without published image
-4. **Bring up all vendor sidecars** — `docker compose --profile deploy-e2e up -d` (11 sidecars)
-5. **Run all vendor-edge e2e** — `go test -tags integration -race -count=1 -run 'VendorEdge_'`; output captured to `test-output.log`
-6. **Skip-count enforcement** — `bash scripts/ci-guards/vendor-e2e-skip-check.sh test-output.log` (catches sidecar boot failures via skip-count vs allowlist)
-7. **Tear down sidecars** — `docker compose down -v` (always runs)
-
-The `deploy-vendor-e2e-windows` matrix was deleted entirely (per ci-pipeline-cleanup Phase 6 / frozen decision 0.5 — revises Bundle II decision 0.4). IIS + WinCertStore validation moved to [`docs/connector-iis.md::Operator validation playbook`](connector-iis.md#operator-validation-playbook-windows-host).
-
-### `image-and-supply-chain` (Ubuntu, ~3 min, parallel)
-
-Three checks bundled (per ci-pipeline-cleanup Phases 7-9 / frozen decision 0.8):
-1. **Digest validity** — `bash scripts/ci-guards/digest-validity.sh`. Resolves every `@sha256:<digest>` ref in `deploy/**/*.{yml,Dockerfile*}` against its registry. Closes the H-001 lying-field gap.
-2. **Docker build smoke** — builds all 4 Dockerfiles (`Dockerfile`, `Dockerfile.agent`, `deploy/test/f5-mock-icontrol/Dockerfile`, `deploy/test/libest/Dockerfile`).
-3. **OpenAPI ↔ handler operationId parity** — `bash scripts/ci-guards/openapi-handler-parity.sh`. Every router route must have a matching `operationId` in `api/openapi.yaml` or be documented in `api/openapi-handler-exceptions.yaml`.
-
-### CodeQL (Ubuntu × 2 languages, ~5 min)
-
-`.github/workflows/codeql.yml` — interprocedural taint tracking. Two matrix jobs: `go` and `javascript-typescript`. Triggers on push, PR, and weekly Sunday cron.
-
-## The 20 regression guards
-
-Located at `scripts/ci-guards/<id>.sh`. Each script is callable locally:
-
-```bash
-bash scripts/ci-guards/G-3-env-docs-drift.sh
-```
-
-Or run all of them:
-
-```bash
-for g in scripts/ci-guards/*.sh; do
-  echo "=== $(basename "$g") ==="
-  bash "$g" || echo "  FAILED"
-done
-```
-
-| ID | Catches |
-|---|---|
-| `G-1-jwt-auth-literal` | JWT silent auth downgrade reappearing |
-| `L-001-insecure-skip-verify` | Bare `InsecureSkipVerify: true` without `//nolint:gosec` |
-| `H-001-bare-from` | Bare Dockerfile `FROM` without `@sha256:` digest pin |
-| `M-012-no-root-user` | Dockerfile missing terminal `USER <non-root>` |
-| `H-009-readme-jwt` | README re-introducing JWT-as-supported claim |
-| `G-2-api-key-hash-json` | `api_key_hash` in JSON-emitting surface |
-| `U-2-plaintext-healthcheck` | Plaintext `http://` in HEALTHCHECK |
-| `U-3-migration-mount` | Migration file mounted into postgres initdb |
-| `D-1-D-2-statusbadge-phantom` | Dead StatusBadge keys + 8 TS phantom fields across 4 interfaces |
-| `L-1-bulk-action-loop` | Client-side `for ... await` bulk action loops |
-| `B-1-orphan-crud` | 8 update/create/delete fns lose page consumers |
-| `S-2-strings-contains-err` | `strings.Contains(err.Error(), ...)` brittle dispatch |
-| `G-3-env-docs-drift` | `CERTCTL_*` env var defined OR documented but not both |
-| `test-naming-convention` | `func TestXxx` lowercase first letter (Go silently skips) |
-| `S-1-hardcoded-source-counts` | Hardcoded "N issuer connectors" prose |
-| `P-1-documented-orphan-fns` | 16 read-fn names removed from client.ts exports |
-| `T-1-frontend-page-coverage` | New page in `web/src/pages/` without sibling `.test.tsx` |
-| `bundle-8-L-015-target-blank-rel-noopener` | `target="_blank"` without `rel="noopener noreferrer"` |
-| `bundle-8-L-019-dangerously-set-inner-html` | `dangerouslySetInnerHTML` outside `safeHtml.ts` |
-| `bundle-8-M-009-bare-usemutation` | Bare `useMutation()` outside the `useTrackedMutation` wrapper |
-
-Plus three additional scripts for non-guard operator workflows:
-- `scripts/ci-guards/vendor-e2e-skip-check.sh` — vendor-e2e skip-count enforcement (used by `deploy-vendor-e2e` job)
-- `scripts/ci-guards/digest-validity.sh` — used by `image-and-supply-chain` job
-- `scripts/ci-guards/openapi-handler-parity.sh` — used by `image-and-supply-chain` job
-- `scripts/ci-guards/coverage-pr-comment.sh` — used by `go-build-and-test` job
-- `scripts/check-coverage-thresholds.sh` — used by `go-build-and-test` job
-
-## Coverage thresholds
-
-Manifest at `.github/coverage-thresholds.yml`. Each entry has `floor:` (integer percentage) + `why:` (load-bearing context). Lowering a floor REQUIRES corresponding code-side test work — never lower the gate to make CI green.
-
-To add a new gated package: add an entry to the YAML; no script changes needed.
-
-## Make targets — three-tier convention
-
-| Target | When | What |
-|---|---|---|
-| `make verify` | **Required pre-commit** | gofmt + vet + golangci-lint + go test -short |
-| `make verify-deploy` | Optional pre-push | digest-validity + OpenAPI parity + Docker build smoke (server + agent only — fast subset) |
-| `make verify-docs` | **Required pre-tag** | QA-doc Part-count + seed-count drift checks |
-
-## Adding a new check
-
-| Check type | Where it goes | Auto-picked-up by CI? |
-|---|---|---|
-| Regression guard (grep / shape pattern) | New `scripts/ci-guards/<id>.sh` script | Yes — loop step iterates `*.sh` |
-| Coverage threshold (per-package) | New entry in `.github/coverage-thresholds.yml` | Yes — bash loop reads YAML |
-| OpenAPI route exception | New entry in `api/openapi-handler-exceptions.yaml` | Yes — parity script reads YAML |
-| Vendor-e2e expected skip | New line in `scripts/ci-guards/vendor-e2e-skip-allowlist.txt` | Yes — skip-check script reads file |
-| New CI job | Edit `.github/workflows/ci.yml` directly | n/a (job definition is the source) |
-
-## Troubleshooting
-
-| CI step fails | Likely cause | Fix |
-|---|---|---|
-| `gofmt drift` | source needs `gofmt -w` | `make fmt` locally + commit |
-| `go mod tidy drift` | imported a package without committing go.mod | `go mod tidy` + commit |
-| `Run staticcheck` | new SA1019 deprecated-API site | migrate the API OR add `//lint:ignore SA1019 <reason>` |
-| `Check Coverage Thresholds` | per-package coverage dropped below floor | add tests; do NOT lower the floor |
-| `Regression guards` (any `<id>.sh`) | the audit-finding the guard pinned reappeared | read the guard's head-comment block for the closure rationale + fix the regression |
-| `Skip-count enforcement` | a vendor sidecar failed to start | check docker logs; fix sidecar; OR if a new Windows-only test was added, add to `scripts/ci-guards/vendor-e2e-skip-allowlist.txt` |
-| `Digest validity` | a `@sha256` digest doesn't resolve | re-resolve from registry, replace in compose / Dockerfile |
-| `OpenAPI ↔ handler parity` | new router route without operationId | add to `api/openapi.yaml` (preferred) OR `api/openapi-handler-exceptions.yaml` |
-| `Docker build smoke` | Dockerfile syntax error or COPY path drift | fix the Dockerfile |
-| `CodeQL Analyze` | interprocedural dataflow finding | review the SARIF in Security → Code scanning tab |
-
-## Status check accounting
-
-**Current (post-cleanup):** 7 status checks per push.
-- 1 × `Go Build & Test`
-- 1 × `Frontend Build`
-- 1 × `Helm Chart Validation`
-- 1 × `deploy-vendor-e2e`
-- 1 × `image-and-supply-chain`
-- 2 × `CodeQL Analyze (<lang>)` (go + javascript-typescript)
-
-**Pre-cleanup (HEAD `1de61e91`):** 19 status checks. The 12-vendor matrix + 2-vendor Windows matrix collapsed to 1 + 0 respectively; the 3 Go/Frontend/Helm jobs unchanged; 2 CodeQL unchanged; 1 new `image-and-supply-chain` added.
-
-## Required GitHub branch protection list
-
-When updating the `master` branch protection rule (Settings → Branches), the "Require status checks to pass" list should be exactly:
-
-```
-Go Build & Test
-Frontend Build
-Helm Chart Validation
-deploy-vendor-e2e
-image-and-supply-chain
-Analyze (go)
-Analyze (javascript-typescript)
-```
-
-Old-name checks (`deploy-vendor-e2e (<vendor>)` × 12, `deploy-vendor-e2e-windows (<vendor>)` × 2) won't appear on new PRs after the workflow change. Operator removes them from the required list.
@@ -1,341 +0,0 @@
-# NIST SP 800-57 Key Management Alignment
-
-NIST SP 800-57 Part 1 Rev 5 (May 2020) is the authoritative US government guidance on cryptographic key management. This document maps certctl's implementation to its recommendations. certctl follows NIST guidance where applicable; this guide documents the alignment and identifies gaps for future roadmap planning.
-
-## Contents
-
-1. [Key Generation (Section 6.1)](#key-generation-section-61)
-2. [Key Storage and Protection (Sections 6.3, 6.4)](#key-storage-and-protection-sections-63-64)
-3. [Cryptoperiods (Section 5.3, Table 1)](#cryptoperiods-section-53-table-1)
-4. [Key States and Transitions (Section 5.2)](#key-states-and-transitions-section-52)
-5. [Algorithm Recommendations (Section 5.1, SP 800-131A)](#algorithm-recommendations-section-51-sp-800-131a)
-6. [Key Distribution and Transport (Section 6.2)](#key-distribution-and-transport-section-62)
-7. [Revocation and Compromise (NIST SP 800-57 Part 3)](#revocation-and-compromise-nist-sp-800-57-part-3)
-8. [Alignment Summary Table](#alignment-summary-table)
-9. [Gaps and Remediation Roadmap](#gaps-and-remediation-roadmap)
-   - [V2 (Current)](#v2-current)
-   - [V3 (Planned: 2026)](#v3-planned-2026)
-   - [V5 (Planned: 2027+)](#v5-planned-2027)
-   - [Post-Quantum (2027+)](#post-quantum-2027)
-10. [References](#references)
-11. [Questions or Corrections?](#questions-or-corrections)
-
-## Key Generation (Section 6.1)
-
-certctl generates certificate keys on agent infrastructure using Go's `crypto/rand` for entropy, backed by `/dev/urandom` on Linux and `CryptGenRandom` on Windows. Key generation happens as follows:
-
-**Agent-Side Key Generation (Production Default)**
-- Agents generate ECDSA P-256 key pairs per certificate using `crypto/ecdsa` + `crypto/elliptic` (Go stdlib)
-- Key generation triggered by `AwaitingCSR` job state in renewal/issuance workflows
-- Agent creates Certificate Signing Request (CSR) with `x509.CreateCertificateRequest`, signed with the agent's private key
-- Only the CSR crosses the network to the control plane; private key material never leaves the agent
-- Configuration: `CERTCTL_KEYGEN_MODE=agent` (default, production)
-
-**Server-Side Key Generation (Demo Only)**
-- Available for development and testing via `CERTCTL_KEYGEN_MODE=server`
-- Explicitly logged as a warning at startup: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only"
-- Docker Compose demo uses server mode for backward compatibility
-- Not recommended for production; agent mode is the secure default
-
-**Entropy Source**
-- `crypto/rand` provides cryptographically secure random bytes
-- On Linux: backed by `/dev/urandom` via `getrandom()` syscall
-- On Windows: backed by `CryptGenRandom()` (now `BCryptGenRandom()`)
-- Meets NIST SP 800-90B requirements for entropy generation
-
-## Key Storage and Protection (Sections 6.3, 6.4)
-
-certctl implements tiered key storage with different protection profiles based on key purpose.
-
-**Agent Private Keys**
-- Stored on agent filesystem at `CERTCTL_KEY_DIR` (default: `/var/lib/certctl/keys`)
-- File permissions: 0600 (read/write by agent process only, no world/group access)
-- One PEM file per certificate, organized by certificate ID
-- Accessible only to the agent process; isolated from other processes
-- For container deployments: use Docker volumes with restricted permissions (`-v /var/lib/certctl/keys:0600`)
-
-**Issuing CA Keys (Local CA Connector)**
-- Loaded from disk at server startup via `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH` env vars
-- Supports RSA (PKCS#1, PKCS#8) and ECDSA (SEC1, PKCS#8) key formats
-- Validates certificate constraints before use:
-  - `IsCA=true` flag present
-  - `KeyUsageCertSign` extension set
-  - Valid certificate chain (for sub-CA mode)
-- Keys held in memory during server runtime (no on-disk caching after load)
-- Cleared from memory only on server shutdown
-
-**Sub-CA Mode (Enterprise Integration)**
-- CA certificate and key signed by upstream enterprise root (e.g., Active Directory Certificate Services)
-- Certctl acts as subordinate CA, inheriting issuer DN from upstream CA
-- All issued certificates chain to enterprise trust anchor
-- CA key protection inherits upstream root's key management practices
-- Configured via: `CERTCTL_CA_CERT_PATH=/path/to/ca.crt` and `CERTCTL_CA_KEY_PATH=/path/to/ca.key`
-
-**NIST Gap: HSM Storage**
-NIST SP 800-57 Part 1 recommends Hardware Security Module (HSM) storage for high-value keys (CA signing keys). certctl V2 uses filesystem storage on the server. HSM support is planned for certctl Pro (V3), enabling integration with:
-- AWS CloudHSM
-- Azure Dedicated HSM
-- Thales Luna, Gemalto SafeNet, YubiHSM (on-premises)
-- PKCS#11-compatible devices
-
-## Cryptoperiods (Section 5.3, Table 1)
-
-NIST recommends cryptoperiods (key validity durations) based on key type and security requirements. certctl enforces cryptoperiods through certificate profiles and renewal policies.
-
-**Certificate Profile Enforcement**
-- Certificate profiles (M11a) define `max_ttl` constraint per enrollment profile
-- All certificates issued through a profile cannot exceed the profile's max_ttl
-- Profile configuration example:
-  ```json
-  {
-    "id": "prof-web-prod",
-    "name": "Production Web Certs",
-    "max_ttl_seconds": 31536000,  // 1 year max
-    "allowed_key_algorithms": ["ECDSA_P256"],
-    "required_sans": ["example.com"]
-  }
-  ```
-
-**Renewal Thresholds**
-- Renewal policies with configurable `alert_thresholds_days`: `[30, 14, 7, 0]` (days before expiry)
-- Background scheduler checks renewal eligibility every 1 hour
-- Certificates transitioned to `Expiring` status at 30 days, `Expired` at 0 days
-- Renewal workflow can be triggered manually or automatically
-
-**NIST Cryptoperiod Recommendations vs certctl Implementation**
-
-| Key Type | NIST Recommendation | certctl Implementation |
-|----------|---------------------|------------------------|
-| CA signing key | 3–10 years | Configured via CA certificate not-after date; inheritable from upstream CA in sub-CA mode |
-| End-entity web server cert | 1–3 years (trending shorter) | Profile `max_ttl` configurable; ACME issuer typically 90 days; SC-081v3 mandating 47 days by 2029 |
-| Code signing cert | 2–8 years | Profile enforcement via `max_ttl`; not primary certctl use case |
-| Short-lived credentials | < 1 hour recommended | Profile TTL < 1 hour; exempt from CRL/OCSP (expiry is sufficient revocation); auto-expiry on scheduler tick |
-| OCSP signing key | 1–2 years | Embedded OCSP responder uses issuing CA key (same period as issuer) or delegated signing cert |
-| TLS/SSL interoperability cert | 1–2 years | Trending 1 year or less; certctl's ACME/sub-CA/step-ca issuers all support short periods |
-
-## Key States and Transitions (Section 5.2)
-
-NIST defines lifecycle states for keys: pre-activation, active, suspended, deactivated, compromised, and destroyed. certctl maps these to certificate and job states:
-
-| NIST Key State | certctl Equivalent | Implementation |
-|---|---|---|
-| **Pre-activation** | `Pending` job state / `AwaitingCSR` | Job created but key not yet generated; awaiting agent CSR submission (agent-mode) or server keygen (demo mode) |
-| **Active** | Certificate status `Active` | Cert deployed to targets and in use; within validity period (not before < now < not after) |
-| **Suspended** | Job state `AwaitingApproval` | Interactive approval holds deployment job pending human review; resumes on approval or cancels on rejection |
-| **Deactivated** | Certificate status `Expired` | Past not-after date; auto-transitioned by scheduler every 2 minutes; renewal eligible |
-| **Compromised** | Certificate status `Revoked` | Issued via `POST /api/v1/certificates/{id}/revoke` with RFC 5280 revocation reason |
-| **Destroyed** | Archived (implementation detail) | Operator responsibility; certctl retains all certs in audit trail for compliance; no destructive deletion API |
-
-**State Transition Audit Trail**
-All transitions logged to immutable `audit_events` table with:
-- Event type (e.g., `certificate_revoked`, `renewal_job_completed`)
-- Actor (authenticated user or agent ID)
-- Timestamp (RFC3339)
-- Resource (certificate ID)
-- Reason (revocation reason code, approval reason, etc.)
-- HTTP method, path, status (for API calls)
-
-Example audit entry for revocation:
-```json
-{
-  "id": "ae-2024-0615",
-  "event_type": "certificate_revoked",
-  "actor": "ops-alice@example.com",
-  "timestamp": "2024-06-15T14:23:00Z",
-  "resource_id": "cert-web-prod-2024",
-  "resource_type": "certificate",
-  "description": "Revoked: reason=keyCompromise",
-  "body_hash": "sha256:a1b2c3d..."
-}
-```
-
-## Algorithm Recommendations (Section 5.1, SP 800-131A)
-
-NIST SP 800-131A Rev 2 (January 2024) categorizes cryptographic algorithms as Approved, Conditionally Approved, or Disallowed. certctl implements only NIST-approved algorithms:
-
-| Algorithm | NIST Status | certctl Support | Notes |
-|-----------|-------------|-----------------|-------|
-| **ECDSA P-256** | Approved (128-bit security strength) | Default for agent-side keygen | Meets NIST curve requirements (FIPS 186-4) |
-| **ECDSA P-384** | Approved (192-bit security strength) | Supported via profile configuration | Higher security margin; slower than P-256 |
-| **ECDSA P-521** | Approved (256-bit security strength) | Supported via profile configuration | Rarely needed; overkill for TLS |
-| **RSA 2048** | Approved minimum (112-bit security, transitioning) | Supported via all issuers | Deprecated path; migrate to 3072+ by 2030 per NIST |
-| **RSA 3072** | Approved (128-bit security) | Supported via all issuers | Recommended minimum for long-term security |
-| **RSA 4096** | Approved (192-bit security) | Supported via all issuers | Supported but slower; overkill for most TLS |
-| **SHA-256** | Approved | Used throughout | CSR signing, certificate fingerprints, audit body hashing, CRL/OCSP signing |
-| **SHA-384** | Approved (192-bit) | Supported where algorithm selection available | Used in some CA signing scenarios |
-| **SHA-512** | Approved (256-bit) | Supported where algorithm selection available | Rarely needed; SHA-256 suffices for most use cases |
-| **SHA-1** | Deprecated | Not used in certctl | Browsers reject SHA-1 certs; certctl never generates them |
-
-**Algorithm Enforcement via Profiles**
-Certificate profiles enforce allowed key algorithms:
-```json
-{
-  "id": "prof-web-prod",
-  "allowed_key_algorithms": ["ECDSA_P256", "ECDSA_P384", "RSA3072"]
-}
-```
-
-**Post-Quantum Cryptography (Tracking)**
-NIST has finalized PQC standards (FIPS 204, FIPS 205) in August 2024:
-- **ML-KEM** (Kyber): Approved key encapsulation mechanism
-- **ML-DSA** (Dilithium): Approved digital signature algorithm
-- **SLH-DSA** (SPHINCS+): Approved stateless hash-based signature scheme
-
-certctl will track NIST's PQC roadmap and plan integration when hybrid PQC+classical certificate formats reach browser/infrastructure support. Currently, pure PQC certificates are not widely interoperable.
-
-## Key Distribution and Transport (Section 6.2)
-
-NIST SP 800-57 Part 1 Section 6.2 addresses secure key distribution to minimize exposure during transit. certctl implements a zero-transmission-of-private-keys model:
-
-**Private Key Distribution**
- Agent-side keygen model: Private keys never leave agent infrastructure
- CSR transmitted over HTTPS (TLS 1.2+) with mutual TLS optional
- API key authentication via `Authorization: Bearer <api-key>` header
- All API calls logged to immutable audit trail
-
-**Signed Certificate Distribution**
- Certificates (public component) distributed via `GET /agents/{id}/work` over HTTPS
- Work endpoint enriches deployment jobs with certificate PEM and metadata
 Certificate PEM retrieval is deterministic (the same certificate always returns the same bytes)
-
-**Target Deployment**
- Deployment to targets via local filesystem write (NGINX, Apache, HAProxy)
- No network transmission of private keys to targets
- Agents read local private key from `CERTCTL_KEY_DIR` on deployment
- For appliances without agents (F5 BIG-IP, IIS), proxy agent pattern:
-  - Proxy agent runs in same trust zone as appliance
-  - Proxy agent holds target API credentials (iControl, WinRM)
-  - Control plane never communicates with appliance directly
-  - Deployment request includes certificate and proxy agent ID
-  - Proxy agent executes deployment via appliance API
-
-**Revocation Distribution**
- Certificate Revocation List (CRL) via `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615)
-  - Returns DER-encoded X.509 CRL signed by issuing CA (`Content-Type: application/pkix-crl`)
-  - 24-hour validity period
-  - Includes all revoked serials, reasons, and revocation timestamps
-  - Served unauthenticated so relying parties without certctl API credentials can fetch it
-  - Subject to URL caching; OCSP preferred for real-time revocation
- OCSP via `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960)
-  - Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure, `Content-Type: application/ocsp-response`)
-  - Signed by issuing CA (or delegated OCSP signing cert)
-  - Responds with good/revoked/unknown status
-  - Served unauthenticated — the RFC 6960 relying-party model does not assume API credentials
-  - Real-time, more bandwidth-efficient than CRL polling
-
-## Revocation and Compromise (NIST SP 800-57 Part 3)
-
-NIST SP 800-57 Part 3 covers revocation (Section 2.5) when keys are suspected compromised or no longer needed. certctl implements comprehensive revocation infrastructure:
-
-**Revocation API**
- Endpoint: `POST /api/v1/certificates/{id}/revoke`
- Request body:
-  ```json
-  {
-    "reason": "keyCompromise",
-    "reason_text": "Private key exposed in log file"
-  }
-  ```
 Supports the following 8 RFC 5280 revocation reason codes (RFC 5280 also defines `removeFromCRL` and `aACompromise`, which are not listed here):
-  - `unspecified` — no specific reason provided
-  - `keyCompromise` — private key suspected compromised
-  - `caCompromise` — issuing CA key compromised
-  - `affiliationChanged` — subject org/affiliation changed
-  - `superseded` — cert superseded by newer cert
-  - `cessationOfOperation` — key no longer in use
-  - `certificateHold` — temporary hold (rarely used)
-  - `privilegeWithdrawn` — subject authorization withdrawn
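A sketch of how the reason strings above could be validated and mapped to the CRLReason integers from RFC 5280 section 5.3.1. The map and the `reasonCode` function are hypothetical; the string spellings are taken from the list above and the integer values from the RFC.

```go
package main

import "fmt"

// crlReasonCodes maps the API's reason strings (as spelled in the list above)
// to the CRLReason values defined in RFC 5280 section 5.3.1.
// Value 7 is unused by the RFC; 8 (removeFromCRL) and 10 (aACompromise)
// are omitted because the document does not list them.
var crlReasonCodes = map[string]int{
	"unspecified":          0,
	"keyCompromise":        1,
	"caCompromise":         2,
	"affiliationChanged":   3,
	"superseded":           4,
	"cessationOfOperation": 5,
	"certificateHold":      6,
	"privilegeWithdrawn":   9,
}

// reasonCode returns the CRLReason integer for a reason string,
// or an error if the string is not one of the supported reasons.
func reasonCode(reason string) (int, error) {
	code, ok := crlReasonCodes[reason]
	if !ok {
		return 0, fmt.Errorf("unsupported revocation reason %q", reason)
	}
	return code, nil
}

func main() {
	code, err := reasonCode("keyCompromise")
	fmt.Println(code, err) // prints: 1 <nil>

	_, err = reasonCode("removeFromCRL")
	fmt.Println(err) // prints: unsupported revocation reason "removeFromCRL"
}
```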
-
-**Revocation Recording**
- Certificate status updated to `Revoked`
- Entry recorded in `certificate_revocations` table with:
-  - Certificate serial number
-  - Revocation timestamp
-  - Revocation reason code
-  - Issuer ID
- Idempotent (revoking an already-revoked cert is safe; returns 200 OK)
-
-**Issuer Notification (Best-Effort)**
- Control plane calls `issuer.RevokeCertificate(ctx, serial, reason)` on issuing connector
- Failure does not block the revocation (async, logged, retried)
- Supported issuers:
-  - Local CA: generates new CRL immediately
-  - ACME: submits revocation to ACME server (RFC 8555 Section 7.6)
-  - step-ca: calls `/revoke` API
-  - OpenSSL: executes user-provided revocation script
-
-**Revocation Notifications**
- Notifiers triggered after revocation recorded: Slack, Teams, PagerDuty, OpsGenie, email, webhook
- Message includes certificate common name, issuer, reason, actor, timestamp
- Delivery is asynchronous and retried on failure
-
-**CRL and OCSP Distribution**
- CRL updated on every revocation (or scheduled refresh for non-issued revocations)
- OCSP responder queries revocation table in real-time
- Short-lived certificate exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
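The short-lived exemption above amounts to a single comparison on the certificate's lifetime. A minimal sketch, assuming the TTL is measured from `notBefore` to `notAfter`; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// shortLivedTTL is the cutoff stated above: certificates with a lifetime
// strictly below one hour skip CRL/OCSP publication.
const shortLivedTTL = time.Hour

// ttlExempt reports whether a lifetime falls under the short-lived cutoff.
func ttlExempt(ttl time.Duration) bool {
	return ttl < shortLivedTTL
}

// exemptFromRevocationPublishing applies the cutoff to a validity window.
// Measuring TTL as notAfter minus notBefore is an assumption of this sketch.
func exemptFromRevocationPublishing(notBefore, notAfter time.Time) bool {
	return ttlExempt(notAfter.Sub(notBefore))
}

func main() {
	issued := time.Date(2026, 3, 1, 12, 0, 0, 0, time.UTC)
	fmt.Println(exemptFromRevocationPublishing(issued, issued.Add(30*time.Minute))) // prints: true
	fmt.Println(exemptFromRevocationPublishing(issued, issued.Add(time.Hour)))      // prints: false
}
```

Note that a TTL of exactly one hour is not exempt under a strict `<` comparison.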
-
-**Bulk Revocation for Large-Scale Compromise Response** (V2.2) — NIST SP 800-57 Part 3 emphasizes rapid revocation when keys are compromised. `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria (profile, owner, agent, issuer) in a single operation. This enables operators to execute fleet-wide revocation for key compromise events affecting multiple certificates. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring every certificate is recorded in the audit trail with the incident reason.
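A sketch of the filter matching that bulk revocation implies. The `Cert` and `Filter` types are hypothetical, as are two semantics chosen for this sketch: an empty filter field matches any value, and a completely empty filter matches nothing (so a malformed request cannot revoke the whole fleet).

```go
package main

import "fmt"

// Cert holds only the fields the bulk-revoke filter criteria refer to.
type Cert struct {
	ID, ProfileID, OwnerID, AgentID, IssuerID string
}

// Filter carries the four criteria named above. Empty fields are wildcards.
type Filter struct {
	ProfileID, OwnerID, AgentID, IssuerID string
}

// matches reports whether c satisfies every non-empty criterion in f.
// A filter with no criteria set matches nothing (a safety assumption).
func (f Filter) matches(c Cert) bool {
	if f == (Filter{}) {
		return false
	}
	return (f.ProfileID == "" || f.ProfileID == c.ProfileID) &&
		(f.OwnerID == "" || f.OwnerID == c.OwnerID) &&
		(f.AgentID == "" || f.AgentID == c.AgentID) &&
		(f.IssuerID == "" || f.IssuerID == c.IssuerID)
}

// selectForRevocation returns the IDs of all certificates matching f.
// Each ID would then get its own job in the regular revocation pipeline.
func selectForRevocation(certs []Cert, f Filter) []string {
	var ids []string
	for _, c := range certs {
		if f.matches(c) {
			ids = append(ids, c.ID)
		}
	}
	return ids
}

func main() {
	certs := []Cert{
		{ID: "c1", OwnerID: "team-a", IssuerID: "iss-local"},
		{ID: "c2", OwnerID: "team-b", IssuerID: "iss-local"},
		{ID: "c3", OwnerID: "team-a", IssuerID: "iss-acme"},
	}
	fmt.Println(selectForRevocation(certs, Filter{OwnerID: "team-a"})) // prints: [c1 c3]
}
```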
-
-**Revocation Audit Trail**
-All revocation events logged:
- Event type: `certificate_revoked` or `bulk_revocation_initiated` (for fleet operations)
- Actor: authenticated user or service
- Reason code: RFC 5280 enum (or incident justification for bulk operations)
- Timestamp: RFC3339
- Issuer notification status: success or error reason
- Filter criteria: profile_id, owner_id, agent_id, issuer_id (for bulk revocation)
-
-## Alignment Summary Table
-
-| NIST SP 800-57 Area | Status | Coverage | Notes |
-|---|---|---|---|
-| **Key Generation** | ✅ Aligned | 100% | Agent-side ECDSA P-256 using crypto/rand; server mode flagged as demo-only |
-| **Key Storage** | ⚠️ Partially Aligned | 80% | Filesystem with 0600 perms; HSM support planned V3 Pro |
-| **Cryptoperiods** | ✅ Aligned | 100% | Profile-enforced max_ttl; threshold-based renewal alerting |
-| **Key States** | ✅ Aligned | 100% | Full lifecycle tracking with immutable audit trail |
-| **Algorithms** | ✅ Aligned | 100% | NIST-approved algorithms only; post-quantum tracking in progress |
-| **Key Distribution** | ✅ Aligned | 100% | Private keys never transmitted; CSR/cert over TLS; agent-local deployment |
-| **Revocation** | ✅ Aligned | 100% | CRL, OCSP, all RFC 5280 reason codes; real-time updates |
-
-## Gaps and Remediation Roadmap
-
-### V2 (Current)
- [x] Agent-side key generation
- [x] Profile-enforced cryptoperiods
- [x] CRL and OCSP distribution
- [x] RFC 5280 revocation support
- [x] Immutable audit trail
-
-### V2.2 (Planned: 2026)
- Bulk revocation by profile/owner/agent/issuer (fleet-level revocation for incident response)
-
-### V3 (Planned: 2026)
- Role-based access control (limit revocation/approval to authorized operators)
-
-### V3 Pro (Planned)
- HSM support for CA key storage and agent key storage (TPM 2.0, PKCS#11)
- FIPS 140-2/3 validated crypto module (BoringCrypto build or external FIPS library)
- Key destruction API (explicit secure erasure of agent keys)
- Key escrow / recovery mechanism (backup encrypted private keys for disaster recovery)
-
-### Post-Quantum (2027+)
- ML-KEM and ML-DSA support when browser/TLS ecosystem supports hybrid certificates
- Migration path documentation (how to transition existing RSA certs to PQC)
-
-## References
-
- NIST SP 800-57 Part 1 Rev 5 (May 2020): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r5.pdf
 NIST SP 800-131A Rev 2 (March 2019): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf
- FIPS 186-4 (Digital Signature Standard): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
- RFC 5280 (X.509 PKI Certificate and CRL Profile): https://tools.ietf.org/html/rfc5280
- RFC 8555 (Automatic Certificate Management Environment): https://tools.ietf.org/html/rfc8555
 NIST FIPS 203 (ML-KEM): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.pdf
 NIST FIPS 204 (ML-DSA): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.204.pdf
 NIST FIPS 205 (SLH-DSA): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.205.pdf
-
-## Questions or Corrections?
-
-This document reflects certctl's implementation as of March 2026. For the latest code, refer to:
- Key generation: `cmd/agent/main.go` (agent keygen) and `internal/service/renewal.go` (server keygen)
- Key storage: `internal/config/config.go` (CERTCTL_KEY_DIR, CERTCTL_CA_CERT_PATH)
- Revocation: `internal/service/revocation.go` and `internal/api/handler/certificates.go`
- Audit trail: `internal/api/middleware/audit.go`
@@ -1,825 +0,0 @@
-# PCI-DSS 4.0 Compliance Mapping
-
-This guide maps certctl's existing capabilities to PCI-DSS 4.0 requirements relevant to TLS certificate and cryptographic key management. It is **not a compliance attestation** — a qualified security assessor (QSA) must evaluate your organization's complete control environment. Rather, this document helps you understand which PCI-DSS control objectives certctl supports and where operator responsibility lies.
-
-Organizations subject to PCI-DSS typically need to demonstrate control over certificate issuance, renewal, rotation, revocation, and key management. Certctl automates the technical controls for certificate lifecycle; compliance depends on how you deploy, monitor, and audit it.
-
-## Contents
-
-1. [How to Use This Guide](#how-to-use-this-guide)
-2. [Requirement 4: Protect Data in Transit](#requirement-4-protect-data-in-transit)
-   - [4.2.1 — Strong Cryptography for Transmission](#421--strong-cryptography-for-transmission)
-   - [4.2.2 — Certificate Inventory and Validation](#422--certificate-inventory-and-validation)
-3. [Requirement 3: Protect Stored Cardholder Data (Key Management)](#requirement-3-protect-stored-cardholder-data-key-management)
-   - [3.6 — Cryptographic Key Documentation](#36--cryptographic-key-documentation)
-   - [3.7 — Key Lifecycle Procedures](#37--key-lifecycle-procedures)
-4. [Requirement 8: Identify and Authenticate](#requirement-8-identify-and-authenticate)
-   - [8.3 — Strong Authentication](#83--strong-authentication)
-   - [8.6 — Application Account Management](#86--application-account-management)
-5. [Requirement 10: Log and Monitor](#requirement-10-log-and-monitor)
-   - [10.2 — Implement Automated Audit Logging](#102--implement-automated-audit-logging)
-   - [10.3 — Protect Audit Trail](#103--protect-audit-trail)
-   - [10.4 — Promptly Review and Address Audit Trail Exceptions](#104--promptly-review-and-address-audit-trail-exceptions)
-   - [10.7 — Retain and Protect Audit Trail History](#107--retain-and-protect-audit-trail-history)
-6. [Requirement 6: Develop and Maintain Secure Systems and Applications](#requirement-6-develop-and-maintain-secure-systems-and-applications)
-   - [6.3.1 — Security Coding Practices](#631--security-coding-practices)
-   - [6.5.10 — Broken Authentication and Cryptography Prevention](#6510--broken-authentication-and-cryptography-prevention)
-7. [Requirement 7: Restrict Access by Business Need-to-Know](#requirement-7-restrict-access-by-business-need-to-know)
-   - [7.2 — Implement Access Control](#72--implement-access-control)
-8. [Evidence Summary Table](#evidence-summary-table)
-9. [Operator Responsibilities](#operator-responsibilities)
-10. [V3 Enhancements for PCI-DSS](#v3-enhancements-for-pci-dss)
-11. [Next Steps for Compliance](#next-steps-for-compliance)
-12. [Questions?](#questions)
-
-## How to Use This Guide
-
-Your QSA will request evidence that your certificate and key management systems meet specific PCI-DSS 4.0 requirements. For each applicable requirement, this guide identifies:
-
-1. **Which certctl features support the control** — API endpoints, database tables, background processes
-2. **What evidence you can produce** — audit logs, dashboard metrics, API queries, deployment configs
-3. **Operator responsibilities** — what you must do outside certctl (policy, monitoring, access control)
-4. **Status** — Available (v1.0 shipped), Planned (future release), or Operator Responsibility (outside scope)
-
---
-
-## Requirement 4: Protect Data in Transit
-
-**Objective**: Ensure strong cryptography is used to protect sensitive data during transmission.
-
-### 4.2.1 — Strong Cryptography for Transmission
-
-**Requirement**: Use appropriate and current cryptographic algorithms for all TLS and SSH connections protecting card data in transit.
-
-**certctl Support**:
 **Automated TLS certificate lifecycle** — Certctl deploys TLS certificates to NGINX, Apache, and HAProxy targets via `POST /api/v1/deployments`. Supported key types include RSA 2048-bit and ECDSA P-256 (configurable per profile, M11a).
- **Control plane TLS enforcement** — All REST API endpoints served exclusively over HTTPS. Agent-to-server heartbeat and work polling use TLS. No plaintext protocol options.
- **Issuer connector key negotiation** — ACME v2 (Let's Encrypt, ZeroSSL) validates issuer cryptography. Local CA enforces RSA/ECDSA constraints. step-ca integration ensures Smallstep's cryptography standards.
- **Certificate profiles** (M11a) document allowed key types and minimum key sizes per environment (development, production, cardholder-network).
-
-**Evidence You Can Provide**:
 Exported certificate inventory via `GET /api/v1/certificates` with key algorithm and size (serialized as JSON).
- Issued certificate details showing RSA 2048+ or ECDSA P-256 for all deployed certificates.
- Audit trail (`GET /api/v1/audit`) showing issuer connector selection and certificate profile assignment per certificate.
- Target deployment logs showing TLS certificate installation on NGINX/Apache/HAProxy.
-
-**Operator Responsibility**:
- Configure certificate profiles for your environments with approved key algorithms.
- Audit cipher suite configuration on deployed targets (certctl deploys certs; you verify target TLS settings).
- Periodically review `CERTCTL_KEYGEN_MODE` — must be `agent` in production (never `server`).
- Monitor issuer connector configuration to ensure issuers meet your cryptography standards.
-
-**Status**: **Available** (v1.0 shipped)
-
---
-
-### 4.2.2 — Certificate Inventory and Validation
-
-**Requirement**: Ensure all TLS/SSL certificates used for data transmission are valid, current, and meet required cryptographic standards.
-
-**certctl Support**:
-
- **Managed Certificate Inventory** — Full CRUD API (`/api/v1/certificates`) with sortable, filterable list. Fields: common name, SANs, subject, issuer, serial number, key type/size, not-before/after dates, issuer ID, profile ID, owner, team, status (Active/Expiring/Expired/Revoked).
-
- **Filesystem Certificate Discovery** (M18b) — Agents scan configured directories (`CERTCTL_DISCOVERY_DIRS` env var) for existing PEM/DER certificates every 6 hours and on startup. Control plane deduplicates by SHA-256 fingerprint. Three triage statuses: Unmanaged (not managed by certctl), Managed (linked to a managed certificate), Dismissed (operator-marked as out-of-scope).
-  - API endpoints:
-    - `GET /api/v1/discovered-certificates?status=Unmanaged` — find orphaned certs
-    - `GET /api/v1/discovery-summary` — aggregate counts by status
-    - `POST /api/v1/discovered-certificates/{id}/claim` — link to managed certificate
-    - `POST /api/v1/discovered-certificates/{id}/dismiss` — mark out-of-scope
-
- **Expiration Threshold Alerting** — Renewal policies support `alert_thresholds_days` (default 30, 14, 7, 0). Background scheduler evaluates daily; certificates transition to Expiring/Expired status automatically. Notifications sent to owners via email/webhook/Slack/Teams/PagerDuty.
-
- **Certificate Status Tracking** — Four statuses: Active (deployed, not yet expired), Expiring (within threshold, awaiting renewal), Expired (past not-after date), Revoked (revoked via RFC 5280 revocation API). Dashboard charts show status distribution.
-
- **Revocation Infrastructure** (M15a, M15b, M-006):
-  - Revocation API: `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes
-  - CRL endpoint: `GET /.well-known/pki/crl/{issuer_id}` — DER X.509 CRL, 24h validity, signed by issuing CA, served unauthenticated (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`)
-  - OCSP responder: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` — DER-encoded OCSP response (good/revoked/unknown), served unauthenticated (RFC 6960, `Content-Type: application/ocsp-response`)
-  - Bulk revocation (V2.2): `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) for fleet-wide incident response
-  - Short-lived cert exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
-
- **Stats API** (M14) — Real-time visibility:
-  - `GET /api/v1/stats/summary` — total certs, by status, by issuer
-  - `GET /api/v1/stats/expiration-timeline?days=90` — expiration distribution (weekly buckets)
-  - `GET /api/v1/stats/job-trends?days=30` — renewal/issuance job success rates
-  - `GET /api/v1/certificates` with `?sort=-notAfter&fields=id,commonName,notAfter,status` — sparse, sorted inventory
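The discovery bullet above says the control plane deduplicates discovered certificates by SHA-256 fingerprint. A minimal sketch of that idea, assuming the fingerprint is the hex-encoded SHA-256 of the certificate's DER bytes; `fingerprint` and `dedupe` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint returns the lowercase hex SHA-256 digest of the given bytes.
// Hashing the DER encoding (rather than the PEM text) is an assumption here.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// dedupe keeps the first occurrence of each distinct certificate,
// using the fingerprint as the identity key.
func dedupe(certs [][]byte) [][]byte {
	seen := make(map[string]struct{}, len(certs))
	var unique [][]byte
	for _, der := range certs {
		fp := fingerprint(der)
		if _, ok := seen[fp]; ok {
			continue // same certificate reported by another agent or directory
		}
		seen[fp] = struct{}{}
		unique = append(unique, der)
	}
	return unique
}

func main() {
	// Placeholder byte strings stand in for real DER-encoded certificates.
	a, b := []byte("cert-A"), []byte("cert-B")
	fmt.Println(len(dedupe([][]byte{a, b, a}))) // prints: 2
}
```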
-
-**Evidence You Can Provide**:
- Discovered certificate report: `GET /api/v1/discovered-certificates` JSON export showing all certs on systems, fingerprints, and status.
- Managed certificate inventory: `GET /api/v1/certificates` with filters (`?status=Expiring` for upcoming renewals).
- Expiration alert configuration: policy JSON showing `alert_thresholds_days` for each environment.
- CRL/OCSP availability proof: unauthenticated HTTP GET requests to `/.well-known/pki/crl/{issuer_id}` (DER, `application/pkix-crl`) and `/.well-known/pki/ocsp/{issuer_id}/{serial}` (DER, `application/ocsp-response`) with signed responses.
- Audit trail for certificate creation/renewal/revocation: `GET /api/v1/audit?type=certificate_issued,certificate_renewed,certificate_revoked`.
- Dashboard charts showing expiration timeline, renewal success trends, status distribution.
-
-**Operator Responsibility**:
- Configure `CERTCTL_DISCOVERY_DIRS` on agents to scan all certificate storage locations (e.g., `/etc/nginx/certs`, `/etc/apache2/certs`, `/usr/local/share/ca-certificates`).
- Regularly triage discovered certificates: `GET /api/v1/discovered-certificates?status=Unmanaged`, claim or dismiss each.
- Set renewal policies for all certificate profiles with appropriate `alert_thresholds_days` (recommendation: 30, 14, 7, 0).
- Monitor expiration dashboard and respond to Expiring alerts before certificates expire.
- Verify that issued certificates meet your organization's cryptography standards (key type, key size, SANs).
- Test CRL/OCSP endpoints periodically to confirm they are reachable and signed correctly.
-
-**Status**: **Available** (v1.0 shipped, discovery M18b, revocation M15a/M15b)
-
---
-
-## Requirement 3: Protect Stored Cardholder Data (Key Management)
-
-**Objective**: Render cardholder data unreadable anywhere it is stored; protect cryptographic keys used to encrypt data.
-
-### 3.6 — Cryptographic Key Documentation
-
-**Requirement**: Document and implement all key management processes and procedures covering generation, storage, archival, destruction, and change; protect cryptographic keys; and restrict access to keys to the minimum required.
-
-**certctl Support**:
-
- **Certificate Profile Documentation** (M11a) — Named profiles define allowed key types, maximum TTL, and allowed EKUs per use case. Each profile is a documented policy:
-  ```json
-  {
-    "id": "p-web-tls",
-    "name": "Web TLS Production",
-    "allowed_key_types": ["RSA_2048", "ECDSA_P256"],
-    "max_ttl_seconds": 31536000,
-    "require_sans": true,
-    "description": "Production TLS certs for external web services"
-  }
-  ```
-
- **Owner and Team Tracking** (M11b) — Every certificate is assigned an owner (person + email) and optionally a team. This documents key responsibility and escalation paths.
-
- **Issuer Connector Specification** — Configuration and API endpoints document which CA and protocol issues each certificate:
-  - `GET /api/v1/issuers/{id}` returns issuer type (local-ca, acme, step-ca, openssl), CA endpoint, authentication method, constraints
-  - Each issuer type has documented key handling (e.g., Local CA loads its CA certificate and key from `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, step-ca via JWK provisioner)
-
- **Immutable Audit Trail** (M19) — Every certificate lifecycle event recorded in append-only `audit_events` table:
-  - `certificate_issued` — when certificate created, by whom, issuer type, profile
-  - `certificate_renewed` — when renewed, by whom, issuer
-  - `certificate_revoked` — when revoked, by whom, RFC 5280 reason code
-  - `certificate_deployed` — when deployed to target, by agent, target type
-  - Query: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}`
-
-**Evidence You Can Provide**:
- Exported certificate profiles: `GET /api/v1/profiles` showing documented key types, max TTLs, constraints per environment.
- Certificate-to-owner mapping: `GET /api/v1/certificates` with owner/team fields.
- Issuer configuration audit: `GET /api/v1/issuers` showing CA endpoints, key storage paths, auth methods.
- Audit trail for a certificate: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing complete lifecycle.
-
-**Operator Responsibility**:
- Define and document certificate profiles for each environment and use case.
- Assign owner and team to each certificate via API or dashboard.
- Document issuer connector configuration (CA endpoint, auth method, key storage location).
- Maintain baseline audit trail exports for compliance evidence.
- Establish certificate retirement policy (how long to retain audit records after certificate expiry/revocation).
-
-**Status**: **Available** (v1.0 shipped)
-
---
-
-### 3.7 — Key Lifecycle Procedures
-
-**Requirement**: Generate, store, protect, access, and destroy cryptographic keys used to encrypt data in transit or at rest.
-
-This requirement covers key generation, storage, rotation, and destruction. Certctl addresses the certificate/TLS key portion (not symmetric encryption keys used for cardholder data at rest — those are outside scope).
-
-#### 3.7.1 — Key Generation
-
-**Requirement**: Generate new keys using strong cryptography.
-
-**certctl Support**:
-
- **Agent-Side Key Generation** (M8) — Production mode (default `CERTCTL_KEYGEN_MODE=agent`):
-  - Agents generate ECDSA P-256 key pairs using `crypto/ecdsa` + `crypto/elliptic.P256()` + `crypto/rand` (cryptographically secure random).
-  - Key generation happens **only on the agent**, never on the control plane.
-  - Agent submits Certificate Signing Request (CSR) with public key to control plane via `POST /api/v1/agents/{id}/csr`.
-  - Issued certificate is returned; private key remains on agent at `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`).
-
- **Server-Side Fallback** (demo/development only) — `CERTCTL_KEYGEN_MODE=server`:
-  - Control plane generates RSA 2048-bit or ECDSA P-256 keys using `crypto/rand` with `crypto/rsa` or `crypto/ecdsa`.
-  - Server signs CSR and stores the private key in the certificate version record for agent deployment. **Security note:** In server keygen mode, the control plane holds private keys — this is why agent keygen mode is the recommended default for production.
-  - **Must not be used in production.** Explicit warning logged: `server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only`
-
- **Issuer-Specific Key Negotiation**:
-  - **ACME (Let's Encrypt, ZeroSSL)**: Let's Encrypt controls key types; certctl requests ECDSA P-256 by default.
-  - **Local CA**: Supports RSA 2048+, ECDSA (P-256, P-384), PKCS#8 format. Key algorithm inherited from CA cert or specified via profile.
-  - **step-ca**: Smallstep's provisioner defines key type; certctl respects server constraints.
-  - **OpenSSL / Custom CA**: User-provided signing script; key type depends on CA backend.
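The agent-side flow described above (generate an ECDSA P-256 key locally, send only a CSR, keep the key on disk with `0600` permissions) can be sketched with the Go standard library. This is an illustration under stated assumptions, not the agent's actual code: the function names, the PKCS#8 key encoding, and the temporary directory standing in for `CERTCTL_KEY_DIR` are all hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// generateKeyAndCSR creates an ECDSA P-256 key pair and a PEM-encoded CSR.
// Only the CSR is meant to leave the host; the key stays local.
func generateKeyAndCSR(commonName string, sans []string) (keyPEM, csrPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalPKCS8PrivateKey(key) // PKCS#8 is an assumption
	if err != nil {
		return nil, nil, err
	}
	csrDER, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{
		Subject:  pkix.Name{CommonName: commonName},
		DNSNames: sans,
	}, key)
	if err != nil {
		return nil, nil, err
	}
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyDER})
	csrPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: csrDER})
	return keyPEM, csrPEM, nil
}

// csrCommonName parses a PEM CSR, verifies its self-signature,
// and returns the subject common name.
func csrCommonName(csrPEM []byte) (string, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil {
		return "", errors.New("no PEM block found")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return "", err
	}
	if err := csr.CheckSignature(); err != nil {
		return "", err
	}
	return csr.Subject.CommonName, nil
}

func main() {
	keyPEM, csrPEM, err := generateKeyAndCSR("web.example.com", []string{"web.example.com"})
	if err != nil {
		panic(err)
	}
	dir, err := os.MkdirTemp("", "certctl-keys") // stand-in for CERTCTL_KEY_DIR
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	// 0600: readable and writable by the owning user only.
	if err := os.WriteFile(filepath.Join(dir, "web-tls-prod.key"), keyPEM, 0o600); err != nil {
		panic(err)
	}
	cn, err := csrCommonName(csrPEM)
	fmt.Println(cn, err)
}
```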
-
-**Evidence You Can Provide**:
- Deployment configuration: `CERTCTL_KEYGEN_MODE=agent` in production (verify in `docker-compose.yml`, Kubernetes manifests, or systemd units).
 Agent log excerpt showing key generation: Go `ecdsa.GenerateKey(elliptic.P256(), rand.Reader)` via agent process logs with CSR submission timestamp.
- Certificate CSR audit: `GET /api/v1/audit?type=certificate_issued` showing CSR fingerprint (SHA-256 hash of CSR PEM).
- Renewal job logs showing agent-submitted CSR, not server-generated key.
-
-**Operator Responsibility**:
- **Enforce `CERTCTL_KEYGEN_MODE=agent` in all production deployments.** Never use `server` mode outside demos.
- Verify agent hardware is adequately isolated (crypto/rand relies on OS `/dev/urandom` quality).
- Monitor `CERTCTL_KEY_DIR` on agents for unauthorized file access (use OS-level file audit if available).
- Backup agent key directory (`/var/lib/certctl/keys`) as part of disaster recovery procedure.
-
-**Status**: **Available** (v1.0 shipped)
-
-#### 3.7.2 — Key Storage and Access Control
-
-**Requirement**: Restrict cryptographic key access to the minimum required and protect keys from unauthorized access.
-
-**certctl Support**:
-
- **Agent-Side Key Storage** (M8) — Private keys written to `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`):
-  - File permissions: `0600` (readable/writable by agent process owner only).
-  - Filename convention: one file per certificate (e.g., `web-tls-prod.key`, `api-service.key`).
-  - No key data passed over the network between agent and control plane (CSR only).
-  - Keys used locally by agent to sign TLS handshakes, never transmitted to control plane or other systems.
-
- **Control Plane Key Storage** — Sensitive credentials managed via environment variables or `.env` files:
-  - CA private key path: `CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH` (for Local CA sub-CA mode).
-  - ACME account key: embedded in ACME issuer config (not stored separately; ACME library handles in memory).
-  - step-ca provisioner key: `CERTCTL_STEPCA_KEY_PATH` env var (path to JWK private key file, loaded into memory during runtime).
-  - API keys: `CERTCTL_API_KEY` (SHA-256 hashed in database, plaintext never stored).
-  - Database credentials: `CERTCTL_DATABASE_URL` in `.env` file, not in source code.
-
- **Docker Compose Credential Management** — `.env` file (git-ignored) holds all secrets:
-  ```bash
-  CERTCTL_API_KEY=sk-test-...
-  CERTCTL_DATABASE_URL=postgres://user:pass@db:5432/certctl
-  CERTCTL_CA_KEY_PATH=/run/secrets/ca.key
-  ```
-  Credentials never in `docker-compose.yml` or Dockerfile.
-
- **Kubernetes Secrets** (operator responsibility) — Deploy control plane with:
-  ```yaml
-  env:
-    - name: CERTCTL_DATABASE_URL
-      valueFrom:
-        secretKeyRef:
-          name: certctl-secrets
-          key: database-url
-    - name: CERTCTL_API_KEY
-      valueFrom:
-        secretKeyRef:
-          name: certctl-secrets
-          key: api-key
-  ```
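The API key bullet above states that keys are stored as SHA-256 hashes and that plaintext is never persisted. A minimal sketch of that check, assuming hex-encoded digests and a constant-time comparison; `hashAPIKey` and `verifyAPIKey` are hypothetical names, not certctl's functions.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashAPIKey returns the hex-encoded SHA-256 digest of an API key.
// This digest is what would be stored; the plaintext key is not.
func hashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// verifyAPIKey hashes the presented key and compares it to the stored
// digest in constant time, so timing does not reveal how many leading
// characters matched.
func verifyAPIKey(presented, storedHash string) bool {
	computed := hashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(computed), []byte(storedHash)) == 1
}

func main() {
	stored := hashAPIKey("example-key") // placeholder value, not a real key
	fmt.Println(verifyAPIKey("example-key", stored)) // prints: true
	fmt.Println(verifyAPIKey("wrong-key", stored))   // prints: false
}
```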
-
-**Evidence You Can Provide**:
- Agent key directory listing (without keys): `ls -la /var/lib/certctl/keys` (shows file count, permissions, timestamps).
- Deployment manifest (`docker-compose.yml` or Kubernetes YAML) showing secrets via env var or Secret object (not inline).
- `.env` file (do not share contents, only confirm existence and git-ignore status).
- API key hash verification: `GET /api/v1/auth/check` with API key, verifying hash matching without plaintext exposure.
-
-**Operator Responsibility**:
- **Store `.env` and credential files outside version control.** Verify `.gitignore` includes `.env`, `*.key`, `ca.key`, etc.
- **Restrict file system access to `/var/lib/certctl/keys` on agents** via OS-level permissions (Linux: `chmod 0700`, owned by agent user).
- **Limit CA key file read access** — `CERTCTL_CA_KEY_PATH` should be readable only by certctl server process (OS permissions).
 **Rotate API keys periodically** (recommendation: annually or whenever personnel change). certctl keeps no audit trail for API key rotation (outside its scope).
- **Backup private key stores** (agent key dirs, CA key file) as part of disaster recovery. Encrypt backups at rest.
- **Monitor access logs** to `/var/lib/certctl/keys` and CA key file location (use OS audit or file integrity monitoring).
-
-**Status**: **Available** (v1.0 shipped)
-
-#### 3.7.3 — Key Rotation
-
-**Requirement**: Rotate cryptographic keys upon expiration or compromise.
-
-**certctl Support**:
-
- **Automated Certificate Renewal** — Renewal policies trigger certificate renewal automatically:
-  - Background scheduler checks every 60 minutes (configurable via `CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL`).
-  - For each policy, evaluates all managed certificates: if `(not-after - now) <= policy.renewal_threshold_days`, trigger renewal.
-  - Renewal job created in AwaitingCSR state; agent receives work, generates new key pair, submits new CSR.
-  - Issuer connector issues a certificate for the new CSR, which carries the new public key; the agent discards the old key after the new certificate is installed.
-  - New certificate deployed to target via deployment job.
-
- **Expiration-Based Rotation** — Certificate profiles (M11a) define `max_ttl_seconds` (e.g., 31536000 for 1 year, 3600 for short-lived certs):
-  - Short-lived certificates (TTL < 1 hour) rotate every deployment cycle, providing defense-in-depth (RFC 5280 revocation not needed).
-  - Longer-lived certs (90/180/365 days) rotated via renewal policy thresholds (30/14/7 day alerts).
-
- **Renewal Audit Trail** — Every renewal recorded:
-  - `GET /api/v1/audit?type=certificate_renewed&resource_id={cert_id}` shows each renewal, old serial, new serial, issuer, actor.
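The renewal trigger described above, `(not-after - now) <= policy.renewal_threshold_days`, can be sketched as follows. The function names are hypothetical, and treating a day as exactly 24 hours is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// remainingWithinThreshold reports whether the remaining lifetime is at or
// below the policy threshold. One day is treated as exactly 24 hours.
func remainingWithinThreshold(remaining time.Duration, thresholdDays int) bool {
	return remaining <= time.Duration(thresholdDays)*24*time.Hour
}

// dueForRenewal applies the threshold check to a certificate's notAfter.
// An already expired certificate (negative remaining time) is also due.
func dueForRenewal(notAfter, now time.Time, thresholdDays int) bool {
	return remainingWithinThreshold(notAfter.Sub(now), thresholdDays)
}

func main() {
	now := time.Date(2026, 3, 1, 0, 0, 0, 0, time.UTC)
	fmt.Println(dueForRenewal(now.AddDate(0, 0, 20), now, 30)) // prints: true
	fmt.Println(dueForRenewal(now.AddDate(0, 0, 60), now, 30)) // prints: false
}
```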
-
-**Evidence You Can Provide**:
- Renewal policy configuration: `GET /api/v1/policies` showing `renewal_threshold_days` and `alert_thresholds_days`.
- Renewal job history: `GET /api/v1/jobs?type=Renewal&status=Completed` with timestamp, before/after serial numbers.
- Certificate version history: `GET /api/v1/certificates/{id}/versions` showing all issued versions, dates, issuers.
- Audit trail: `GET /api/v1/audit?type=certificate_renewed` for trending and compliance reporting.
-
-**Operator Responsibility**:
- **Define renewal policies for all certificate profiles** with appropriate thresholds (typically 30 days before expiration for 90+ day certs, more aggressive for shorter-lived).
- **Monitor renewal job success** via dashboard (M14 charts show renewal success trends) and alerts.
- **Investigate renewal failures** (stuck AwaitingCSR, issuer connectivity, deployment errors) promptly to avoid expired certificates.
- **Test renewal workflow in staging environment** before rolling out to production.
- **Document key rotation schedule** for your organization (renewal policy thresholds, approval workflows if AwaitingApproval).
-
-**Status**: **Available** (v1.0 shipped)
-
-#### 3.7.4 — Key Destruction
-
-**Requirement**: Render cryptographic keys unreadable and unusable when they reach the end of their cryptographic lifetime.
-
-**certctl Support**:
-
- **Certificate Revocation API** (M15a) — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:
-  - `unspecified` — general revocation
-  - `keyCompromise` — suspected key compromise
-  - `caCompromise` — CA compromise
-  - `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn` — lifecycle management
-  - Revocation recorded in `certificate_revocations` table with timestamp and reason.
-  - Issuer notified (best-effort; ACME issuers are notified through the revocation flow in RFC 8555 Section 7.6, Local CA skips the issuer step).
-  - Revocation notifications sent to owner via email/webhook/Slack/Teams/PagerDuty.
-
- **CRL and OCSP Publication** (M15b, M-006) — Revoked certificates published in:
-  - CRL: `GET /.well-known/pki/crl/{issuer_id}` (DER X.509 signed by CA, 24h validity, RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`)
-  - OCSP: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (returns revoked status for clients validating certificate chain, RFC 6960, `Content-Type: application/ocsp-response`)
-  - Both endpoints are served unauthenticated so relying parties (browsers, TLS appliances) without certctl API keys can verify revocation — this is the RFC-compliant PKI model.
-  - Clients checking certificate status via OCSP or CRL see revoked status within 24 hours.
-
- **Bulk Revocation for Incident Response** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. PCI-DSS Req 4 requires rapid response to data transmission security incidents — bulk revocation enables operators to revoke an entire certificate set (e.g., all certs used by a compromised team or endpoint) in minutes rather than hours.
-
- **Private Key Destruction on Agent** — When certificate renewed or revoked:
-  - Agent removes the old private key file from `CERTCTL_KEY_DIR` once the new certificate is deployed.
-  - Job status tracking confirms the old key is no longer needed.
-  - Key deletion is not recorded in the control-plane audit trail (private keys never pass through the control plane).
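As a rough illustration of the revocation call and the two unauthenticated status endpoints described above, the following Python sketch builds the request and the lookup paths. The JSON body field name `reason` and the helper names are assumptions made for this example; only the endpoint paths and the reason codes come from this section.

```python
import json

# Reason codes listed in this section (RFC 5280 CRLReason names).
REVOCATION_REASONS = {
    "unspecified",
    "keyCompromise",
    "caCompromise",
    "affiliationChanged",
    "superseded",
    "cessationOfOperation",
    "certificateHold",
    "privilegeWithdrawn",
}


def build_revoke_request(cert_id: str, reason: str) -> dict:
    """Return method, path, and JSON body for a revocation call.

    The "reason" body field name is an assumption, not taken from the API docs.
    """
    if reason not in REVOCATION_REASONS:
        raise ValueError(f"unknown revocation reason: {reason!r}")
    return {
        "method": "POST",
        "path": f"/api/v1/certificates/{cert_id}/revoke",
        "body": json.dumps({"reason": reason}),
    }


def revocation_check_paths(issuer_id: str, serial: str) -> dict:
    """Return the unauthenticated CRL and OCSP paths for one certificate."""
    return {
        "crl": f"/.well-known/pki/crl/{issuer_id}",
        "ocsp": f"/.well-known/pki/ocsp/{issuer_id}/{serial}",
    }
```

Rejecting unknown reason codes on the client side keeps a typo from reaching the audit trail; the server remains the authority on which codes it accepts.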
-
-**Evidence You Can Provide**:
- Revocation requests: `GET /api/v1/audit?type=certificate_revoked` with RFC 5280 reason codes.
- CRL publication: HTTP GET `/.well-known/pki/crl/{issuer_id}` (unauthenticated) returns a DER X.509 CRL — parse with `openssl crl -inform der -noout -text` to show revoked serial numbers, reasons, and timestamps.
- OCSP responder validation: Query `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated) for a known-revoked cert; response includes `revoked` status and can be parsed with `openssl ocsp` tooling.
- Audit trail: Certificate status transitions (Active → Revoked) recorded in `audit_events`.
-
-**Operator Responsibility**:
- **Revoke certificates immediately upon key compromise suspicion** using reason code `keyCompromise`.
- **Revoke certificates at end of lifecycle** (host decommissioning, service sunset) using reason code `cessationOfOperation`.
- **Monitor CRL/OCSP availability** — ensure clients can check revocation status (test with TLS validator tools).
- **Establish certificate revocation procedure** (who can revoke, approval workflow if required, documentation).
- **Physically destroy backup private keys** (if offline backups are kept) when certificate is revoked or after archival period expires.
- **Test revocation workflow in staging** — issue test cert, revoke, verify OCSP/CRL reflects revocation within SLA.
-
-**Status**: **Available** (v1.0 shipped)
-
---
-
-## Requirement 8: Identify and Authenticate
-
-**Objective**: Limit access to system components and cardholder data by business need-to-know, and authenticate and manage all access.
-
-### 8.3 — Strong Authentication
-
-**Requirement**: Authentication mechanisms must use strong cryptography and render authentication credentials (passwords, passphrases, keys) unreadable during transmission and storage.
-
-**certctl Support**:
-
- **API Key Authentication** — All REST API endpoints require authentication (default):
-  - Bearer token format: `Authorization: Bearer sk-...`
-  - Keys are not stored in any database table; they are loaded from the environment and held in memory (certctl never persists the plaintext).
-  - Comparison uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks.
-  - Configuration: `CERTCTL_AUTH_TYPE=api-key` (enforced by default, no opt-out without explicit env var).
-
- **GUI Authentication Context** — Web dashboard login flow:
-  - Login page (`/login`) accepts API key entry.
-  - AuthProvider context stores API key in session (localStorage in browser, sent in Authorization header for all API calls).
-  - 401 Unauthorized responses trigger automatic redirect to login.
-  - Logout button clears session.
-  - No server-side session state (stateless API).
-
- **Credential Transmission** — All API traffic over TLS:
-  - HTTPS enforced at server level (no plaintext HTTP).
-  - API key transmitted in Authorization header (not URL parameter, not cookie).
-  - Browser to server: TLS.
-  - Agent to server: TLS.
-  - No credential logging (audit records the per-key actor `Name`, never the Bearer token; logs redact the `Authorization` header).
-
-**Evidence You Can Provide**:
- API configuration: `CERTCTL_AUTH_TYPE=api-key` in deployment manifest.
- Key inventory: `CERTCTL_API_KEYS_NAMED` env var (format `name:key:admin,...`) — seeds the in-memory `NamedAPIKey{Name, Key, Admin}` struct at `internal/api/middleware/middleware.go:29`. Keys are constant-time-compared (`subtle.ConstantTimeCompare`) against the Bearer token. No database table stores them; protect the env var contents at rest via a secrets manager (Vault / AWS Secrets Manager / Kubernetes Secrets / Docker Secrets).
- API audit log: `GET /api/v1/audit?action=api_call` showing per-key actor names (`Name` field of matched `NamedAPIKey`) on every call, with zero plaintext or hashed key material recorded.
- TLS certificate on control plane: `openssl s_client -connect {server}:8443` showing valid certificate, TLS 1.2+, strong cipher.
- GUI login flow: browser network tab showing Authorization header (token value redacted in compliance report).
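The key format and constant-time match described above can be sketched in Python as follows. This is an illustration only, not certctl's Go implementation: `hmac.compare_digest` stands in for `subtle.ConstantTimeCompare`, and the parsing rules (third field is the literal word `admin`, no `:` or `,` inside names or keys) are assumptions.

```python
import hmac
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NamedKey:
    name: str
    key: str
    admin: bool


def parse_named_keys(raw: str) -> List[NamedKey]:
    """Parse a `name:key:admin,...` string.

    Assumptions: the third field is the literal word "admin" for admin keys
    (anything else, or a missing field, means non-admin), and neither names
    nor keys contain ':' or ','.
    """
    keys = []
    for entry in filter(None, (e.strip() for e in raw.split(","))):
        parts = entry.split(":")
        if len(parts) < 2 or not parts[0] or not parts[1]:
            # Never echo the entry itself: it may contain key material.
            raise ValueError("malformed key entry")
        admin = len(parts) > 2 and parts[2] == "admin"
        keys.append(NamedKey(parts[0], parts[1], admin))
    return keys


def match_bearer(header: str, keys: List[NamedKey]) -> Optional[NamedKey]:
    """Return the key matching an `Authorization` header value, if any."""
    scheme, _, token = header.partition(" ")
    if scheme != "Bearer" or not token:
        return None
    matched = None
    for candidate in keys:
        # Constant-time comparison; the loop checks every key rather than
        # returning on the first match.
        if hmac.compare_digest(candidate.key.encode(), token.encode()):
            matched = candidate
    return matched
```

Only the matched `name` would be written to the audit trail; the token itself is discarded after comparison.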
-
-**Operator Responsibility**:
- **Issue API keys to users/systems** requiring API access (outside certctl; you maintain key registry).
- **Rotate API keys using zero-downtime rotation** — `CERTCTL_AUTH_SECRET` supports comma-separated keys (e.g., `new-key,old-key`). Add the new key, migrate clients, then remove the old key. Recommendation: rotate at least annually, or immediately when personnel changes.
- **Revoke API keys immediately** when a user leaves or a token is compromised (a per-key `enabled=false` toggle in API key management is not yet implemented in v1, so the key owner must track revocations manually).
- **Enforce strong TLS** on control plane: TLS 1.2+, modern ciphers (configure on reverse proxy or `CERTCTL_TLS_*` env vars if operator-controlled).
- **Protect `.env` and credential files** where API key is defined (restrict file system access, no version control).
- **Monitor API audit trail** for suspicious access patterns (many 401 errors, access from unexpected IPs, etc.).
-
-**Status**: **Available** (v1.0 shipped)
-
-### 8.6 — Application Account Management
-
-**Requirement**: Users' system access must be restricted to the minimum level of application functions or data needed to perform duties. Application accounts (non-human) must use strong authentication.
-
-**certctl Support**:
-
- **No Application Account Management in v1** — Certctl does not manage user accounts (no user directory, LDAP, OIDC).
-  - All authentication is via API key (service-to-service, or a human user holding an API key).
-  - No per-user roles or permissions (planned as a V3 RBAC feature).
-  - A single API key may be shared across a team, or one key issued per automation script (managing this is the operator's responsibility).
-
- **Credentials Not in Source Code** — Security hardening:
-  - API keys via `CERTCTL_API_KEY` env var (not in `main.go`, Dockerfile, `docker-compose.yml`).
-  - Database credentials via `CERTCTL_DATABASE_URL` in `.env` (git-ignored).
-  - CA private key path via `CERTCTL_CA_CERT_PATH`/`CERTCTL_CA_KEY_PATH` (not inline).
-
- **Service Account Isolation** (planned for V3) — Future RBAC will support:
-  - Automation script API keys with scoped permissions (e.g., read-only, renew-only, deploy-only).
-  - OIDC/SSO for human users with fine-grained role assignment (admin, operator, viewer).
-  - Audit trail showing which account/role performed each action.
-
-**Evidence You Can Provide**:
- Deployment manifest (Dockerfile, docker-compose.yml) showing no hardcoded API keys, database credentials, or CA key paths.
- `.env` file existence (confirm via CI or compliance check, without sharing contents).
- `.gitignore` configuration showing `.env`, `*.key`, secrets excluded.
- Code review: grep `main.go`, `config.go` for `CERTCTL_API_KEY` — should only see env var reference, not hardcoded values.
-
-**Operator Responsibility**:
- **Manage API keys externally** (issue, rotate, revoke).
- **Document who/what has API key access** (automation scripts, team members, third-party integrations).
- **Rotate application credentials** (API keys, database passwords) according to your organization's policy.
- **Segregate credentials** — one API key per automation script where possible, or use V3 RBAC scoping.
- **Monitor application account usage** via audit trail — `GET /api/v1/audit` filtered by action/actor.
-
-**Status**: **Available in part** (v1.0: credentials out of source code). **Planned V3**: scoped API keys and RBAC.
-
---
-
-## Requirement 10: Log and Monitor
-
-**Objective**: Log and monitor access to network resources and cardholder data.
-
-### 10.2 — Implement Automated Audit Logging
-
-**Requirement**: Automatically log and monitor all access to system components and records containing cardholder data.
-
-**certctl Support**:
-
- **Immutable API Audit Log** (M19) — Middleware captures every API call:
-  - `audit_events` table (append-only, no UPDATE/DELETE):
-    - `method`: HTTP method (GET, POST, PUT, DELETE)
-    - `path`: API endpoint path only, excluding query parameters (e.g., `/api/v1/certificates` — query strings intentionally omitted to prevent sensitive data persistence in the append-only audit trail)
-    - `actor`: authenticated user/service (extracted from API key or context)
-    - `body_hash`: SHA-256 hash of request body (truncated to 16 chars, first 8 chars shown in logs)
-    - `status_code`: HTTP response status (200, 201, 400, 401, 404, 500, etc.)
-    - `latency_ms`: request duration in milliseconds
-    - `timestamp`: RFC 3339 timestamp
-
- **Certificate Lifecycle Events** — Higher-level events logged separately:
-  - `certificate_issued` — new certificate created, issuer, profile, profile ID
-  - `certificate_renewed` — certificate renewed, old/new serial, renewal policy
-  - `certificate_revoked` — certificate revoked, RFC 5280 reason code
-  - `certificate_deployed` — certificate deployed to target, agent, target type
-  - `certificate_validated` — validation job result (success/failure reason)
-
- **Job Lifecycle Events** — Job status transitions:
-  - `job_created` — renewal/issuance/deployment/validation job created
-  - `job_status_updated` — job state change (Pending → AwaitingCSR → Running → Completed/Failed)
-
- **Policy and Configuration Events** — Administrative changes:
-  - `policy_created`, `policy_updated`, `policy_deleted` — renewal policy changes
-  - `profile_created`, `profile_updated`, `profile_deleted` — certificate profile changes
-  - `issuer_created`, `issuer_deleted` — CA connector registration changes
-
- **Excluded Paths** — Health/readiness probes not logged to reduce noise:
-  - `GET /health` (excluded by default)
-  - `GET /ready` (excluded by default)
-  - Configurable via `CERTCTL_AUDIT_EXCLUDE_PATHS` env var
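To make the `audit_events` fields above concrete, here is a minimal Python sketch of how one row could be derived from a request. It assumes the 16-character truncation applies to the hex digest and that the two default probe paths are matched exactly; the function name and row shape are hypothetical, not certctl's middleware.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit

EXCLUDED_PATHS = {"/health", "/ready"}  # defaults named in this section


def audit_record(method: str, url: str, actor: str, body: bytes,
                 status_code: int, latency_ms: int) -> Optional[dict]:
    """Build one audit row, or return None for excluded probe paths."""
    path = urlsplit(url).path  # drops the query string and fragment
    if path in EXCLUDED_PATHS:
        return None
    return {
        "method": method,
        "path": path,
        "actor": actor,
        # Assumption: "truncated to 16 chars" means 16 hex characters.
        "body_hash": hashlib.sha256(body).hexdigest()[:16],
        "status_code": status_code,
        "latency_ms": latency_ms,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Dropping the query string before the row is built means a filter value such as an owner email never reaches the append-only table.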
-
-**Evidence You Can Provide**:
- Audit trail export: `GET /api/v1/audit` or manual database query, showing sample events with timestamp, actor, action, resource.
- API call audit log: Query `audit_events` table showing method, path, actor, status code for last 24-48 hours.
- Configuration changes: `GET /api/v1/audit?type=policy_created,policy_updated,issuer_created` showing who changed what and when.
- Certificate lifecycle: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing complete issuance → deployment → renewal/revocation history.
-
-**Operator Responsibility**:
- **Enable audit logging** — it's on by default; verify `CERTCTL_AUDIT_EXCLUDE_PATHS` is not set to exclude certificate-related paths.
- **Monitor audit log growth** — `audit_events` table will grow with every API call. Recommend database maintenance (log rotation policy, archival after 90 days, etc.).
- **Export and archive audit logs** — periodically `SELECT * FROM audit_events WHERE timestamp > {date}` and export to secure storage (S3, syslog, SIEM).
- **Establish audit review procedure** — QSA may request sample of logs; have export process documented.
- **Test audit logging** — make API call, verify event appears in audit trail within seconds.
-
-**Status**: **Available** (M19 shipped)
-
-### 10.3 — Protect Audit Trail
-
-**Requirement**: Promptly protect audit trail files from unauthorized modifications.
-
-**certctl Support**:
-
- **Append-Only Database Design** — PostgreSQL triggers and constraints prevent modification:
-  - `audit_events` table has no `UPDATE` or `DELETE` triggers.
-  - Application code never executes UPDATE/DELETE on `audit_events`.
-  - Primary key is `id` (serial); new events always INSERT.
-
- **Read-Only API Access** — Audit events accessible only via read (`GET /api/v1/audit`):
-  - No `POST /api/v1/audit/{id}` endpoint (no creation from API).
-  - No `PUT /api/v1/audit/{id}` endpoint (no modification).
-  - No `DELETE /api/v1/audit/{id}` endpoint (no deletion).
-  - Only control plane can record events (via internal service layer, not exposed API).
-
- **Database Access Control** (operator responsibility) — PostgreSQL user permissions:
-  - `certctl` application user: INSERT, SELECT on `audit_events`.
-  - `certctl_read_only` user (for compliance/audit team): SELECT only on `audit_events`.
-  - `postgres` superuser: restricted to DBA operations, logged separately by PostgreSQL.
-
-**Evidence You Can Provide**:
- Database schema: `\d audit_events` showing columns, primary key, no UPDATE/DELETE triggers.
- Application code review: `internal/service/audit.go` showing `RecordEvent(...)` as only INSERT operation.
- API endpoint audit: grep `internal/api/handler/audit*.go` or `internal/api/router/router.go` — no PUT/DELETE routes for events.
- PostgreSQL permissions: `psql -d certctl -c "\dp audit_events"` showing INSERT/SELECT grants only.
-
-**Operator Responsibility**:
- **Restrict database access** — issue read-only PostgreSQL user for compliance/audit team (no write privileges).
- **Enable PostgreSQL query logging** — log all database connections and operations for DBA audit trail.
- **Backup audit logs** — regularly export `audit_events` to offsite storage (S3, archive tape, syslog aggregator) for long-term retention.
- **Monitor database modifications** — alert if any UPDATE/DELETE is attempted on `audit_events` (log-based alerting or PostgreSQL event triggers).
- **Encrypt audit exports** — if archiving to external storage, encrypt backups at rest.
-
-**Status**: **Available** (v1.0 shipped)
-
-### 10.4 — Promptly Review and Address Audit Trail Exceptions
-
-**Requirement**: Promptly review audit logs and investigate exceptions/anomalies.
-
-**certctl Support**:
-
- **Dashboard Charts** (M14) — Real-time observability:
-  - **Renewal Success Trends** (30-day line chart) — shows job success rate; spikes in failures warrant investigation.
-  - **Certificate Status Distribution** (donut chart) — shows Expiring/Expired counts; high Expired = missed renewals.
-  - **Expiration Timeline** (90-day weekly heatmap) — shows upcoming expirations; bunching = renewal policy tuning needed.
-  - **Issuance Rate** (30-day bar chart) — shows certificate creation/renewal activity; anomalies (zero issuances for weeks) indicate stopped automation.
-
- **Stats API** (M14) — Machine-readable trends:
-  - `GET /api/v1/stats/job-trends?days=30` — renewal/issuance/deployment success/failure counts per day.
-  - `GET /api/v1/stats/summary` — total certs, counts by status.
-  - `GET /api/v1/stats/expiration-timeline?days=90` — expiration buckets for forecasting.
-
- **Agent Fleet Overview** (M14) — Agent health visibility:
-  - Pie chart: agent status distribution (healthy, offline, error).
-  - Version breakdown: agent versions in use (identify outdated agents).
-  - Per-agent detail: last heartbeat timestamp, OS/architecture, IP address, recent jobs.
-
- **Alert Notifications** (M3, M16a) — Configurable escalation:
-  - Email alerts: certificate approaching expiration, renewal failure, revocation notification.
-  - Webhook: custom HTTP POST to your monitoring system (Slack, Teams, PagerDuty, OpsGenie, custom webhook).
-  - **Retry & Dead-Letter Queue** (I-005) — Transient notifier failures (SMTP timeout, webhook 5xx) are retried with exponential backoff (`2^n` minutes capped at 1h, 5-attempt budget) before landing in the terminal `dead` status. Operators monitor DLQ depth via the `certctl_notification_dead_total` Prometheus counter and requeue via the Notifications page Dead letter tab once the underlying outage is resolved. Closes the pre-I-005 silent-drop gap where a single 5xx could lose a compliance-relevant alert without evidence.
-  - Deduplication: one alert per threshold/certificate per day (avoid alert fatigue).
-
- **Audit Trail Filtering and Export** (M13) — Compliance reporting:
-  - `GET /api/v1/audit?actor={user}&timestamp_after={date}` — filter audit log by actor, timestamp, type.
-  - Export CSV/JSON via dashboard: audit page → select filters → "Export CSV" or "Export JSON".
-  - Can export full audit trail for QSA review.
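The retry policy above (`2^n` minutes capped at 1h, 5-attempt budget) can be written out as a short Python sketch. Where `n` starts is not stated in this section, so the starting value of 1 is an assumption, as are the function names.

```python
CAP_MINUTES = 60   # 1h cap from the text
MAX_ATTEMPTS = 5   # attempt budget from the text


def backoff_minutes(n: int) -> int:
    """Delay in minutes before retry n: 2^n, capped at one hour."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return min(2 ** n, CAP_MINUTES)


def retry_schedule(first_n: int = 1) -> list:
    """Delays between the attempts in the budget.

    Assumption: n starts at `first_n` and increases by one per failed
    attempt; five attempts leave four waits before the `dead` status.
    """
    return [backoff_minutes(first_n + i) for i in range(MAX_ATTEMPTS - 1)]
```

Under these assumptions the cap is not reached within the 5-attempt budget; it would only matter if the budget were raised to the point where `2^n` exceeds 60.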
-
-**Evidence You Can Provide**:
- Dashboard screenshots: expiration timeline, renewal success trends, status distribution.
- Job trend report: `GET /api/v1/stats/job-trends?days=90` showing success/failure rates.
- Agent fleet health: `GET /api/v1/agents` showing heartbeat status, version count distribution.
- Audit log sample: `GET /api/v1/audit?limit=100` showing certificate issuance/renewal/revocation activity.
- Alert configuration: screenshot of renewal policy `alert_thresholds_days` (30, 14, 7, 0) and notifier settings (email, Slack, etc.).
-
-**Operator Responsibility**:
- **Review dashboard charts weekly** — look for anomalies (high Expired count, failure spike, renewal stalled).
- **Respond to alerts promptly** — expiration alert = investigate renewal (check job logs, issuer connectivity, agent heartbeat).
- **Set alert thresholds appropriately** — default 30/14/7/0 days is a starting point; adjust per your SLA and staffing.
- **Maintain alert distribution list** — ensure alerts reach the right on-call engineer/team.
- **Archive and review audit logs** — export monthly/quarterly for compliance trending (e.g., "all certificate changes last quarter").
- **Test alert delivery** — trigger a test renewal failure or manual revocation, verify alert is sent.
-
-**Status**: **Available** (v1.0 shipped, M14 observable charts, M19 audit log)
-
-### 10.7 — Retain and Protect Audit Trail History
-
-**Requirement**: Retain audit trail history for at least one year and ensure it can be retrieved.
-
-**certctl Support**:
-
- **Immutable Audit Trail** (M19) — `audit_events` table stores all API calls and certificate lifecycle events with timestamps.
- **No Automatic Purge** — Certctl does not delete audit events. They remain in PostgreSQL indefinitely.
- **Queryable History** — All events accessible via `GET /api/v1/audit` with time range, actor, resource filters.
-
-**Evidence You Can Provide**:
- Database retention policy: confirm `audit_events` table has no DELETE triggers or maintenance jobs that purge events.
- Sample audit query: `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '365 days'` showing one year+ of events.
- Export procedure: documented process for exporting audit logs to cold storage (S3, archive tape, syslog).
-
-**Operator Responsibility**:
- **Configure PostgreSQL backup/retention** — certctl relies on database backups for audit trail protection.
-  - Backup `audit_events` table daily or per your RPO/RTO.
-  - Retain backups for at least 1 year (configure retention policy on backup system).
-  - Test restore procedure annually.
-
- **Export and archive audit logs** — periodically export `SELECT * FROM audit_events WHERE timestamp > {start_date}` to offsite storage.
-  - Recommendation: monthly exports to S3 with versioning enabled.
-  - Encrypt exports at rest.
-  - Retain archives for at least 3 years (adjust per your compliance requirements).
-
- **Monitor audit log growth** — `audit_events` table will grow ~1-5 MB/day depending on API call volume.
-  - Estimate: 10,000 API calls/day = ~50 MB/month.
-  - Plan PostgreSQL storage and backup capacity accordingly.
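The growth estimate above can be turned into a small capacity calculation. The Python sketch below assumes decimal megabytes, a 30-day month, and a row size that excludes index and WAL overhead; the helper names are hypothetical.

```python
def implied_bytes_per_event(calls_per_day: int, mb_per_month: float,
                            days_per_month: int = 30) -> float:
    """Average row size implied by a monthly growth figure."""
    return mb_per_month * 1_000_000 / (calls_per_day * days_per_month)


def audit_growth_mb(calls_per_day: int, days: int,
                    bytes_per_event: float) -> float:
    """Projected table growth in MB over `days` days."""
    return calls_per_day * days * bytes_per_event / 1_000_000
```

With the figures from the text (10,000 calls/day and about 50 MB/month), the implied row size is roughly 167 bytes, and one year of retention comes to roughly 600 MB before indexes.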
-
-**Status**: **Available** (v1.0 shipped)
-
---
-
-## Requirement 6: Develop and Maintain Secure Systems and Applications
-
-**Objective**: Develop and maintain secure systems and applications.
-
-### 6.3.1 — Security Coding Practices
-
-**Requirement**: Develop all custom application code in accordance with secure coding practices and include authentication, access control, input validation, and error handling.
-
-**certctl Support**:
-
- **Input Validation** — Centralized validators enforce strong input constraints:
-  - Common name: max 253 chars, DNS-safe characters only, no leading/trailing hyphens.
-  - CSR PEM: must be valid PEM format (regex validation).
-  - Policy type: whitelist enum (Issuance, Renewal, Revocation, etc.).
-  - API key: alphanumeric + hyphens only.
-  - Implemented in `internal/domain/validation.go` and called from all handler layer inputs.
-
- **Error Handling** — No sensitive data leakage in error responses:
-  - HTTP 500 errors return generic "Internal Server Error" message, not stack trace.
-  - Database errors logged internally (structured slog), not exposed to client.
-  - 404 responses do not reveal whether a resource exists: the same "Not Found" response is returned whether the caller lacks access or the resource is absent.
-
- **No Hardcoded Credentials** — All secrets via environment variables:
-  - `CERTCTL_API_KEY`, `CERTCTL_DATABASE_URL`, `CERTCTL_CA_KEY_PATH` — env vars only.
-  - Credentials not in `main.go`, Dockerfile, `docker-compose.yml`, or Git history.
-  - `.env` file git-ignored and excluded from version control.
-
- **Dependency Management** — Go module pinning (`go.mod`):
-  - All external dependencies pinned to specific versions.
-  - No wildcard versions or `latest` tags.
-  - CI runs `go mod verify` to detect tampering.
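The common-name rule listed above (max 253 chars, DNS-safe characters, no leading/trailing hyphens) might look like the following in Python. The real validators live in Go in `internal/domain/validation.go` and may differ; this sketch assumes the hyphen rule applies per DNS label and that wildcard names are out of scope.

```python
import re

MAX_COMMON_NAME_LEN = 253
_LABEL = re.compile(r"[A-Za-z0-9-]+")  # DNS-safe characters only


def is_valid_common_name(name: str) -> bool:
    """Check length, character set, and hyphen placement for each label."""
    if not name or len(name) > MAX_COMMON_NAME_LEN:
        return False
    for label in name.split("."):
        if not _LABEL.fullmatch(label):
            return False  # empty label or a character outside the set
        if label.startswith("-") or label.endswith("-"):
            return False
    return True
```

For example, `is_valid_common_name("api.example.com")` passes, while names with an underscore, an empty label, or a label starting with a hyphen are rejected.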
-
-**Evidence You Can Provide**:
- Code review: `internal/domain/validation.go` showing input validation functions (Common name length, CSR PEM, policy type, etc.).
- Error handling audit: `internal/api/handler/certificates.go` showing HTTP error responses (no stack traces).
- Credentials in source code check: `grep -r "CERTCTL_API_KEY\|DATABASE_URL\|CA_KEY" cmd/ internal/ | grep -v ".env"` (should only show env var references, not values).
- `go.mod` review: no wildcard versions, all pinned.
- CI workflow: `.github/workflows/ci.yml` showing `go mod verify` step.
-
-**Operator Responsibility**:
- **Review dependency updates** — keep Go version current, update certctl dependencies regularly (security patches).
- **Scan container images** — use Trivy, Clair, or similar to scan Docker images for known vulnerabilities.
- **Maintain secure coding practices** in any custom issuer/target connectors you deploy (scripts for OpenSSL, Bash/PowerShell for IIS/F5).
-
-**Status**: **Available** (v1.0 shipped)
-
-### 6.5.10 — Broken Authentication and Cryptography Prevention
-
-**Requirement**: Prevent broken authentication and cryptography weaknesses.
-
-**certctl Support**:
-
- **Authentication** — API key with SHA-256 hashing, constant-time comparison (`crypto/subtle.ConstantTimeCompare`).
- **Cryptography** — Go's `crypto/*` standard library (no weak ciphers). ECDSA P-256, RSA 2048+.
- **TLS** — HTTPS enforced (no plaintext HTTP endpoints).
- **No Sessions** — Stateless API (no session cookies, no session fixation risk).
-
-**Status**: **Available** (v1.0 shipped)
-
---
-
-## Requirement 7: Restrict Access by Business Need-to-Know
-
-**Objective**: Limit access to system components and cardholder data by business need-to-know and ensure users are authenticated and authorized.
-
-### 7.2 — Implement Access Control
-
-**Requirement**: Ensure proper user identity management and implement access controls based on business need-to-know.
-
-**certctl v1 Support** (limited):
- **Certificate Ownership** (M11b) — Each certificate assigned to owner (person + email) and optional team. Ownership is metadata; access control is not enforced at API level.
- **Agent Groups** (M11b) — Renewal policies target specific agent groups (OS, architecture, CIDR, version). Groups are used for policy targeting, not user access control.
- **Interactive Approval** (M11b) — `AwaitingApproval` job state allows manual approval/rejection of renewals (enforcement of business workflows, not user access control).
-
-**certctl v3 Support** (planned):
- **OIDC/SSO** — Okta, Azure AD, Google integration. Users log in via identity provider.
- **Role-Based Access Control (RBAC)** — Three roles: admin (all operations), operator (issue/renew/deploy), viewer (read-only). Roles assigned via OIDC claims or group membership.
- **Profile/Owner Gating** — Operator can renew only certificates assigned to their team; viewer cannot modify anything.
- **Audit Trail Attribution** — Every action shows which user/role performed it.
-
-**Evidence You Can Provide** (v1):
- Certificate ownership mapping: `GET /api/v1/certificates` showing owner, team fields (metadata only; access not controlled).
- Agent group targeting: `GET /api/v1/policies` showing `agent_group_id` field.
- Interactive approval workflow: job detail showing `AwaitingApproval` state, approve/reject endpoints in API docs.
-
-**Operator Responsibility** (v1):
- **Manage API key distribution** externally — only issue API keys to authorized users/systems.
- **Implement reverse proxy auth** (Nginx, Apache, Okta proxy) in front of certctl to enforce OIDC/LDAP (outside certctl).
- **Plan for V3 RBAC** — budget for upgrade when finer-grained access control is needed.
-
-**Planned** (V3):
- Upgrade to certctl Pro with OIDC/RBAC and per-role audit trail.
-
-**Status**: **Available in part** (v1.0: ownership metadata, agent group targeting). **Planned V3**: OIDC/RBAC enforcement.
-
---
-
-## Evidence Summary Table
-
-| PCI-DSS Requirement | certctl Feature | API/UI Evidence | Database/Config | Audit Trail | Status |
-|---|---|---|---|---|---|
-| **4.2.1** Strong Crypto | TLS cert issuance, ACME/step-ca/Local CA, RSA 2048+/ECDSA P-256 | `GET /api/v1/certificates` (key_type, key_size) | Certificate profiles | `GET /api/v1/audit?type=certificate_issued` | Available |
-| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /.well-known/pki/crl/{issuer_id}`, `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (both unauthenticated, RFC 5280 / RFC 6960) | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
-| **3.6** Key Documentation | Profiles, owner/team tracking, issuer config, audit trail | `GET /api/v1/profiles`, `GET /api/v1/issuers`, certificate detail with owner/team | Profiles, certificate owner/team fields, issuer config | `GET /api/v1/audit?resource_type=certificate` | Available |
-| **3.7.1** Key Generation | Agent-side ECDSA P-256, server keygen (demo only) | Agent logs, renewal job detail, CSR audit | `CERTCTL_KEYGEN_MODE=agent` (config), job state `AwaitingCSR` | `GET /api/v1/audit?type=certificate_issued` with CSR hash | Available |
-| **3.7.2** Key Storage | Agent `/var/lib/certctl/keys` (0600), env var secrets, .env excluded | Deployment manifest (env var refs), agent key dir listing | `.env` file (git-ignored), `CERTCTL_KEY_DIR`, `CERTCTL_CA_KEY_PATH` | No API audit (keys off-platform) | Available |
-| **3.7.3** Key Rotation | Auto renewal, expiration thresholds, renewal jobs | Dashboard renewal trends, `GET /api/v1/jobs?type=Renewal`, certificate versions | Renewal policies, certificate version history | `GET /api/v1/audit?type=certificate_renewed` | Available |
-| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, unauthenticated `GET /.well-known/pki/crl/{issuer_id}` and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
-| **8.3** Strong Authentication | API key (SHA-256 hash, TLS), GUI login, 401 redirect | GUI login screenshot, API key auth header, TLS cert | `CERTCTL_API_KEYS_NAMED` env var (in-memory, no database table) | `GET /api/v1/audit` showing API calls | Available |
-| **8.6** Acct Management | Credentials out of source, .env excluded, env var config | Code review (no hardcoded secrets), `.gitignore` check | Deployment manifests showing env var refs only | No account lifecycle audit (outside scope) | Available in part |
-| **10.2** Audit Logging | API audit middleware (M19), certificate lifecycle events | `GET /api/v1/audit` with filter/pagination | `audit_events` table (every API call) | Real-time via API | Available |
-| **10.3** Audit Protection | Append-only table design, read-only API, DB permissions | API endpoint audit (no PUT/DELETE on events), DB schema | `audit_events` table, PostgreSQL GRANT SELECT | Immutable by design | Available |
-| **10.4** Review & Alert | Dashboard charts, stats API, notifier integrations | Dashboard (renewal trends, status pie, expiration heatmap), `GET /api/v1/stats/*` | Job results, alert config in policies | `GET /api/v1/audit?type=job_*` | Available |
-| **10.7** Retention | 1+ year in PostgreSQL, export/archive procedures | Database query `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '1 year'` | `audit_events` table retention (no auto-delete) | Manual export/archival (operator) | Available |
-| **6.3.1** Secure Coding | Input validation, error handling, no hardcoded secrets, dependency pinning | Code review (validation.go, handlers), error responses | `go.mod` with pinned versions, `.gitignore` | GitHub Actions CI with `go mod verify` | Available |
-| **7.2** Access Control | Ownership metadata, agent groups, interactive approval | `GET /api/v1/certificates` (owner/team), `GET /api/v1/agent-groups` | Certificate owner/team fields, agent group criteria | User identity from auth context | Available in part (V3: RBAC) |
-
---
-
-## Operator Responsibilities
-
-The following control objectives are **outside certctl's scope** and must be managed by your organization:
-
-| Control Objective | Responsibility | Example Actions |
-|---|---|---|
-| **Network Segmentation** | Isolate certctl control plane from cardholder network | Place certctl on separate VLAN, firewall rules |
-| **Physical Security** | Restrict access to servers/databases | Data center access controls, logging |
-| **Personnel Screening** | Background checks for staff with access | HR/employment verification |
-| **Access Control Enforcement** | User authentication & authorization outside API | Implement reverse proxy with OIDC (V3: use certctl Pro RBAC) |
-| **Incident Response** | Procedures for certificate compromise or breach | Document key revocation process, alert escalation |
-| **Disaster Recovery** | Backup and restore procedures | Database backup schedule, offsite replication |
-| **Change Management** | Approval process for config/cert changes | CAB meetings, documented procedures |
-| **Vulnerability Scanning** | ASV scanning, penetration testing, code review | Annual PCI-DSS penetration test |
-| **Key Backup & Escrow** | Secure offline storage of CA private keys (if required) | Hardware security module (HSM) or encrypted vault |
-| **Audit Log Retention** | Long-term archival and protection of audit logs | Export to S3/syslog, retain 3+ years |
-| **QSA Engagement** | Schedule and coordination of compliance assessment | Annual audit with qualified security assessor |
-
---
-
-## V3 Enhancements for PCI-DSS
-
-Certctl v3 (Pro) adds paid features that strengthen PCI-DSS compliance posture:
-
-| Feature | PCI-DSS Benefit |
-|---|---|
-| **OIDC/SSO Authentication** | Centralized identity management, audit integration with corporate directory |
-| **Role-Based Access Control (RBAC)** | Least-privilege enforcement: admin, operator, viewer roles with profile/team gating |
-| **Bulk Revocation by Profile/Owner/Agent** | Rapid incident response (revoke all certs in cardholder network in minutes) |
-| **NATS Event Bus with JetStream Audit Streaming** | Real-time event streaming to SIEM (Splunk, ELK, Datadog) for centralized audit trail |
-| **Certificate Health Scores** | Proactive risk identification (composite scoring: expiration proximity, rotation age, key strength) |
-| **Advanced Search DSL** | Complex audit queries (POST /search with nested AND/OR, regex, field projection) for compliance reporting |
-| **CT Log Monitoring** | Detect unauthorized certificate issuance (security vulnerability detection) |
-| **DigiCert Issuer Connector** | Enterprise CA integration for compliance audits |
-
---
-
-## Next Steps for Compliance
-
-1. **Review this mapping with your QSA** — Confirm which requirements apply to your cardholder data environment.
-
-2. **Configure certctl for your environment**:
-   - Set `CERTCTL_KEYGEN_MODE=agent` in production.
-   - Define certificate profiles with approved key types.
-   - Configure renewal policies with appropriate thresholds (e.g., 30 days for 90-day certs).
-   - Enable notifier integrations (email, Slack, PagerDuty) for alerts.
-   - Plan `CERTCTL_DISCOVERY_DIRS` on agents to scan all certificate locations.
-
-3. **Implement operator controls**:
-   - Document certificate management procedures (issuance, renewal, revocation, archival).
-   - Establish API key rotation schedule.
-   - Set up audit log export and archival (monthly to S3, retain 1+ year).
-   - Configure PostgreSQL backups (daily, 1+ year retention).
-   - Plan incident response (who revokes certs, escalation process, timeline).
-
-4. **Test compliance readiness**:
-   - Trigger a test renewal and verify CRL/OCSP publication.
-   - Export audit trail and verify it shows expected events.
-   - Test revocation workflow and confirm OCSP reflects status within 24 hours.
-   - Run discovery scan and verify unknown certs are detected and triaged.
-
-5. **Prepare evidence for QSA**:
-   - API endpoint documentation (OpenAPI spec: `api/openapi.yaml`).
-   - Audit log sample (last 90 days of events).
-   - Configuration export (profiles, policies, issuer/target definitions).
-   - Deployment manifest (showing env var config, no hardcoded secrets).
-   - Test certificates and CRL/OCSP query results.
-
-6. **Plan for V3** (if RBAC/centralized audit required):
-   - Evaluate certctl Pro for OIDC/SSO and NATS audit streaming.
-   - Assess integration with existing identity provider (Okta, Azure AD, etc.).
-
---
-
-## Questions?
-
-For additional guidance on certctl features and PCI-DSS mapping:
- Review the [Architecture Guide](architecture.md) for system design.
- Check [Connectors Documentation](connectors.md) for issuer/target/notifier capabilities.
- Run the [Quick Start Guide](quickstart.md) to see features in action.
- Consult your QSA for final compliance determination.
-
-**Last Updated**: March 24, 2026 (certctl v1.0 with M18b discovery and M19 audit logging)
@@ -1,587 +0,0 @@
-# SOC 2 Type II Compliance Mapping
-
-This guide maps certctl's implemented features to AICPA SOC 2 Trust Service Criteria (TSC). It is **not a SOC 2 certification claim** — rather, it helps security engineers, auditors, and evaluators understand how certctl supports your organization's SOC 2 compliance posture. Use this as evidence input for your own control assessment during SOC 2 audits.
-
-## How to Use This Guide
-
-SOC 2 audits require evidence that your infrastructure meets specific Trust Service Criteria. Auditors ask: "Does your certificate management tooling support CC6.1 logical access controls?" This guide answers by mapping certctl's features to specific criteria and pointing to evidence (API endpoints, configuration, audit trail).
-
-Each section includes:
-
- **The TSC requirement** — what the auditor is looking for
- **certctl's implementation** — which features address it
- **Evidence location** — where to find proof (API endpoint, config variable, source code, audit events)
- **V2 vs V3 status** — whether the feature is in the free community edition (V2) or paid Pro edition (V3)
- **Operator responsibility** — aspects your organization must handle outside of certctl
-
-## Contents
-
-1. [How to Use This Guide](#how-to-use-this-guide)
-2. [CC6: Logical and Physical Access Controls](#cc6-logical-and-physical-access-controls)
-   - [CC6.1 — Logical Access Security](#cc61--logical-access-security)
-   - [CC6.2 — Prior to Issuing System Credentials](#cc62--prior-to-issuing-system-credentials)
-   - [CC6.3 — Authentication Policies](#cc63--authentication-policies)
-   - [CC6.7 — Information Transmission Protection](#cc67--information-transmission-protection)
-3. [CC7: System Operations](#cc7-system-operations)
-   - [CC7.1 — System Monitoring](#cc71--system-monitoring)
-   - [CC7.2 — Anomaly Detection](#cc72--anomaly-detection)
-   - [CC7.3 — Incident Response](#cc73--incident-response)
-   - [CC7.4 — Identify and Develop Risk Mitigation Activities](#cc74--identify-and-develop-risk-mitigation-activities)
-4. [A1: Availability](#a1-availability)
-   - [A1.1/A1.2 — Availability and Recovery](#a11a12--availability-and-recovery)
-5. [CC8: Change Management](#cc8-change-management)
-   - [CC8.1 — Change Control](#cc81--change-control)
-6. [Evidence Summary Table](#evidence-summary-table)
-7. [What Requires Operator Action](#what-requires-operator-action)
-8. [V3 Enhancements](#v3-enhancements)
-9. [Conclusion](#conclusion)
-
-## CC6: Logical and Physical Access Controls
-
-### CC6.1 — Logical Access Security
-
-**Requirement**: The entity restricts logical access to digital and information assets and related facilities by applying user identity authentication, registration, access rights, and usage policies.
-
-**certctl Implementation** (V2 — Community Edition):
-
- **API Key Authentication** — All `/api/v1/*` calls require a Bearer token (hashed with SHA-256, stored securely, validated with constant-time comparison) or are rejected with 401 Unauthorized. Environment: `CERTCTL_AUTH_TYPE` (default `api-key`; `none` requires explicit opt-in with log warning)
- **Standards-based enrollment and PKI distribution endpoints** — EST (`/.well-known/est/*`, RFC 7030), SCEP (`/scep`, `/scep/*`, RFC 8894), and CRL/OCSP (`/.well-known/pki/crl/{issuer_id}`, `/.well-known/pki/ocsp/{issuer_id}/{serial}`, RFC 5280 §5 / RFC 6960 / RFC 8615) are served unauthenticated at the HTTP layer because these protocols cannot present certctl Bearer tokens. Authentication is enforced in-protocol: EST relies on CSR signature verification plus profile policy (RFC 7030 §3.2.3 says EST auth is deployment-specific; §4.1.1 makes `/cacerts` explicitly anonymous); SCEP requires a shared `challengePassword` in the PKCS#10 CSR attributes (OID 1.2.840.113549.1.9.7, RFC 8894 §3.2), validated with `crypto/subtle.ConstantTimeCompare`; CRL and OCSP are intentionally anonymous for relying-party accessibility. CWE-306 (missing authentication for a critical function) is closed for SCEP by `preflightSCEPChallengePassword` in `cmd/server/main.go`, which refuses to start the control plane when `CERTCTL_SCEP_ENABLED=true` is set without `CERTCTL_SCEP_CHALLENGE_PASSWORD`. The HTTP dispatch is implemented in `cmd/server/main.go:buildFinalHandler`, which routes these prefixes through `noAuthHandler` (RequestID + structuredLogger + Recovery only, no auth or rate-limit middleware) and is pinned by the 27-subtest regression harness at `cmd/server/finalhandler_test.go`.
- **GUI Authentication** — Web dashboard includes login screen requiring API key entry. Failed auth redirects to login on 401. Auth context persists across page navigation. Logout clears session.
- **Configurable CORS** — API restricts cross-origin requests via `CERTCTL_CORS_ORIGINS` allowlist or wildcard. Preflight caching prevents chatty browser auth flows.
- **Token Bucket Rate Limiting** — Per-IP rate limiting (configurable via `CERTCTL_RATE_LIMIT_RPS` / `CERTCTL_RATE_LIMIT_BURST`) returns 429 Too Many Requests with Retry-After header. Prevents credential stuffing and brute-force attacks.
- **No Password Storage** — certctl does not store user passwords. API keys are the sole authentication mechanism. Your API key generation, distribution, and rotation policies are your responsibility (see "Operator Responsibility" below).
- **Zero-Downtime Key Rotation** — `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `new-key,old-key`). All listed keys are validated with constant-time comparison. Operators can add a new key, migrate clients, then remove the old key — no service restart required for the client migration phase. A single-key warning is logged at startup to encourage rotation configuration.
-
-**Evidence Locations**:
-
- API auth implementation: `internal/api/middleware/auth.go`
- Auth check endpoint: `GET /api/v1/auth/check` (validates credentials)
- Auth info endpoint: `GET /api/v1/auth/info` (returns current auth mode, served without auth so GUI detects mode)
- Rate limiting middleware: `internal/api/middleware/rate_limit.go`
- CORS configuration: `cmd/server/main.go`, search for `CERTCTL_CORS_ORIGINS`
- Final handler dispatch (authenticated vs. unauthenticated routing): `cmd/server/main.go:buildFinalHandler`
- SCEP preflight gate (CWE-306 closure): `cmd/server/main.go:preflightSCEPChallengePassword`
- SCEP service-layer defense-in-depth (rejects enrollment on empty challenge password, `crypto/subtle.ConstantTimeCompare`): `internal/service/scep.go`
- Final handler dispatch regression harness (27 subtests): `cmd/server/finalhandler_test.go`
- OpenAPI spec `security: []` overrides on unauthenticated paths: `api/openapi.yaml` (EST `/cacerts`, `/simpleenroll`, `/simplereenroll`, `/csrattrs`; SCEP `/scep` GET+POST; PKI `/crl/{issuer_id}`, `/ocsp/{issuer_id}/{serial}`)
-
-**V3 Enhancement**:
-
- **OIDC / SSO Integration** — Optional OIDC providers (Okta, Azure AD, Google) with multi-tenant support. API key fallback for service accounts.
- **API Key Scoping** — Per-resource or per-action permissions (e.g., "read certificates from production only" or "issue certs, no revoke")
-
-**Operator Responsibility**:
-
- Generate and securely distribute API keys to authorized users and systems
- Rotate API keys regularly (recommend quarterly)
- Revoke API keys immediately upon employee departure
- Do not commit API keys to version control (use `.env` or secrets management)
- Implement your own IP allowlisting at the firewall if needed (certctl enforces CORS at the HTTP layer, not at the network layer)
-
---
-
-### CC6.2 — Prior to Issuing System Credentials
-
-**Requirement**: The entity provisions, modifies, disables, and removes user identities and rights based on an authorization process that considers user responsibility level and changes in those responsibilities.
-
-**certctl Implementation** (V2):
-
- **Ownership Attribution** — Certificates can be assigned to an owner (email + name). Owner information is stored and audited (see CC7.2). Ownership is tracked through the lifecycle (issuance, renewal, deployment, revocation). Ownership reassignment is audited via the immutable audit trail.
- **Team Assignment** — Owners can be organized into teams. Certificate policies can route notifications to team email addresses.
- **Audit Trail Attribution** — Every API call records the actor (extracted from the API key or auth context). The audit trail is immutable — no retroactive modification of who did what.
-
-**Evidence Locations**:
-
- Ownership domain model: `internal/domain/certificate.go` (OwnerID field)
- Owner CRUD API: `GET /api/v1/owners`, `POST /api/v1/owners`, `DELETE /api/v1/owners/{id}`
- Team CRUD API: `GET /api/v1/teams`, `POST /api/v1/teams`, `DELETE /api/v1/teams/{id}`
- Audit trail API: `GET /api/v1/audit` (actor field in every record)
-
-**V3 Enhancement**:
-
- **RBAC (Role-Based Access Control)** — Predefined roles (Admin, Operator, Viewer) with profile-gated permissions. Administrators manage role assignments.
-
-**Operator Responsibility**:
-
- Map certctl's ownership model to your organizational structure (departments, teams, on-call rotations)
- Establish a formal access request and approval process
- Remove ownership access when team members depart
- Document your access review process (audit trail shows *who* made changes, but you must justify *why*)
-
---
-
-### CC6.3 — Authentication Policies
-
-**Requirement**: The entity determines, documents, communicates, and enforces authentication policies that support the identification and authentication of authorized internal and external users and the transmission of user credentials.
-
-**certctl Implementation** (V2):
-
- **API Key Policy** — All `/api/v1/*` access requires an API key or explicit opt-out. Opt-out (`CERTCTL_AUTH_TYPE=none`) logs a warning: "WARNING: Auth disabled (CERTCTL_AUTH_TYPE=none) — this is insecure and only for development". Configuration choice is logged at startup. The standards-based enrollment and PKI distribution endpoints (EST, SCEP, CRL, OCSP) are served unauthenticated at the HTTP layer per their respective RFCs; see CC6.1 for the full authentication contract and CWE-306 closure via `preflightSCEPChallengePassword`.
- **Agent Authentication** — Agents authenticate to the server via API keys (same mechanism as users). Agent credentials are separate from user API keys.
- **Private Key Policy** — Agent-side key generation is the default (`CERTCTL_KEYGEN_MODE=agent`). Server-side keygen (`CERTCTL_KEYGEN_MODE=server`) requires explicit configuration and logs a warning: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only".
- **Password Policy** — Not applicable; certctl uses API keys exclusively. Password management is delegated to your organization's IAM system if you integrate OIDC/SSO (V3).
-
-**Evidence Locations**:
-
- Auth type configuration: `internal/config/config.go`, `CERTCTL_AUTH_TYPE` env var
- Startup logging: `cmd/server/main.go` (logs auth mode at server startup)
- Keygen mode configuration: `internal/config/config.go`, `CERTCTL_KEYGEN_MODE` env var
- Keygen mode warning: `cmd/server/main.go` and `cmd/agent/main.go`
-
-**V3 Enhancement**:
-
- **OIDC Policy** — Mandatory MFA when OIDC is enabled
- **API Key Expiration** — Automatic key rotation policies (e.g., 90-day expiration for user keys, no expiration for long-lived service account keys)
-
-**Operator Responsibility**:
-
- Document your API key generation and distribution policy
- Establish a formal change control process for auth configuration changes
- Test authentication failures (e.g., expired keys, malformed tokens) in a non-production environment
- Integrate certctl authentication into your organization's IAM audit reports (who holds API keys, when they were issued, and who revoked them)
-
---
-
-### CC6.7 — Information Transmission Protection
-
-**Requirement**: The entity restricts the transmission, movement, and removal of information in a manner that prevents unauthorized disclosure, whether through digital or non-digital means.
-
-**certctl Implementation** (V2):
-
- **TLS for Control Plane** — All API communication occurs over HTTPS (TLS 1.2+). Server uses `tls.Dial()` for outbound connections to issuers and targets. Configuration: `CERTCTL_SERVER_HOST` (default `127.0.0.1`) + `CERTCTL_SERVER_PORT` (default `8080`; Docker Compose maps to `8443`).
- **Agent-to-Server Communication** — Agents submit CSRs and heartbeats over HTTPS to the server using the same TLS stack.
- **Private Key Isolation** — Agents generate ECDSA P-256 private keys locally (`crypto/ecdsa` + `crypto/elliptic`). Private keys are never transmitted to the server — agents submit CSRs only. Private keys are stored on agent filesystem (`CERTCTL_KEY_DIR`, default `/var/lib/certctl/keys`) with 0600 (owner read/write only) permissions. Server-side keygen mode logs a development warning; production must use agent-side keygen.
- **Certificate Storage** — Signed certificates are stored in PostgreSQL as PEM text (along with metadata). Certificates are not secrets and may be transmitted in plaintext. Private keys are never stored on the control plane in production (agent-side keygen mode).
- **Deployment via Target Connectors** — Target connectors write certificates and keys to local filesystem or network appliance APIs. For NGINX/Apache httpd, files are written with restrictive permissions (0600 for keys). For F5/IIS (V3+), credentials are scoped to a proxy agent in the same network zone — the server never holds network appliance credentials.
-
-**Evidence Locations**:
-
- TLS configuration: deploy certctl behind a TLS-terminating reverse proxy (NGINX, HAProxy, or cloud load balancer) or use a TLS sidecar
- Agent keygen mode: `cmd/agent/main.go` (ECDSA key generation, filesystem storage with 0600)
- Private key handling: `internal/connector/target/nginx/nginx.go` and similar (cert/key file write)
- Server-side keygen deprecation: `internal/service/renewal.go` (log warning when enabled)
-
-**V3 Enhancement**:
-
- **Hardware Security Module (HSM) Support** — Optional HSM backend for CA key storage (SubCA and Local CA modes)
- **Secrets Rotation** — Encrypted key rotation without server restart
-
-**Operator Responsibility**:
-
- Enable TLS on the control plane in production (deploy behind a TLS-terminating reverse proxy or load balancer with valid certificates)
- Enforce TLS on agent-to-server communication via firewall rules (no cleartext HTTP)
- Protect agent filesystem key storage with:
-  - File-level permissions (already 0600)
-  - Encrypted filesystems (LUKS, BitLocker, or cloud provider equivalents)
-  - Backup encryption (keys backed up to vault or HSM, never in cleartext backups)
- Restrict PostgreSQL access to authorized services only (network isolation, authentication)
- For target systems, ensure network traffic from agents to targets is encrypted (TLS, IPsec, or VPN)
-
---
-
-## CC7: System Operations
-
-### CC7.1 — System Monitoring
-
-**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.
-
-**certctl Implementation** (V2):
-
- **Health Endpoint** — `GET /health` returns 200 OK with service status. Consumed by Docker health checks and Kubernetes probes.
- **Readiness Endpoint** — `GET /ready` returns 200 OK when the database is connected and migrations are applied.
- **Background Scheduler Monitoring** — 12 background loops (8 always-on + 4 opt-in) run on a fixed schedule. Authoritative topology in `docs/architecture.md`:
-  - Renewal loop (always-on, 1 hour): scans for certificates approaching renewal threshold
-  - Job processor loop (always-on, 30 seconds): picks up pending/waiting jobs and advances their state
-  - Job retry loop (always-on, 5 minutes, `CERTCTL_SCHEDULER_RETRY_INTERVAL`): retries Failed jobs (I-001)
-  - Job timeout reaper loop (always-on, 10 minutes, `CERTCTL_JOB_TIMEOUT_INTERVAL`): fails AwaitingCSR/AwaitingApproval jobs past timeout (I-003)
-  - Agent health check loop (always-on, 2 minutes): pings agents to detect downtime
-  - Notification dispatcher loop (always-on, 1 minute): sends queued alerts
-  - Notification retry loop (always-on, 2 minutes, `CERTCTL_NOTIFICATION_RETRY_INTERVAL`): exponential backoff retry for failed notifications; promote to dead-letter after 5 attempts (I-005)
-  - Short-lived cert expiry loop (always-on, 30 seconds): marks expired short-lived credentials
-  - Network scanner loop (opt-in, 6 hours, `CERTCTL_NETWORK_SCAN_ENABLED`): scans enabled TLS endpoints for certificate discovery
-  - Digest emailer loop (opt-in, 24 hours, `CERTCTL_DIGEST_INTERVAL`): sends scheduled certificate digest email to configured recipients
-  - Endpoint health loop (opt-in, 60 seconds, `CERTCTL_HEALTH_CHECK_INTERVAL`): continuous TLS health probes (M48)
-  - Cloud discovery loop (opt-in, 6 hours, `CERTCTL_CLOUD_DISCOVERY_INTERVAL`): cloud secret manager certificate discovery (M50)
-  Each loop includes `atomic.Bool` idempotency guards, error handling, and structured slog failure logs.
- **Metrics Endpoints** — Two formats for monitoring integration:
-  - `GET /api/v1/metrics` — JSON object with gauges, counters, and uptime for custom dashboards
-  - `GET /api/v1/metrics/prometheus` — Prometheus exposition format (`text/plain; version=0.0.4`) for native scraping by Prometheus, Grafana Agent, Datadog, and other OpenMetrics-compatible collectors
-  - **Gauges** — `certctl_certificate_total`, `certctl_certificate_active`, `certctl_certificate_expiring`, `certctl_certificate_expired`, `certctl_certificate_revoked`, `certctl_agent_total`, `certctl_agent_active`, `certctl_job_pending`
-  - **Counters** — `certctl_job_completed_total`, `certctl_job_failed_total`
-  - **Uptime** — `certctl_uptime_seconds` (seconds since server start)
-  All values are point-in-time snapshots computed from database tables.
- **Structured Logging** — All scheduler operations, API calls, and connector actions log via `slog` (Go's structured logger). Logs include timestamp, level (DEBUG/INFO/WARN/ERROR), structured fields (e.g., `actor`, `resource_id`, `latency_ms`), and request IDs for tracing.
- **Request ID Propagation** — Each HTTP request gets a unique ID (`X-Request-ID` header). The ID is included in all correlated logs, making it easy to trace a single request through multiple service layers.
-
-**Evidence Locations**:
-
- Health/readiness endpoints: `internal/api/handler/health.go`
- Background scheduler: `internal/scheduler/scheduler.go` (Start method)
- Metrics endpoint: `internal/api/handler/metrics.go`
- Stats API endpoints (for detailed time-series): `internal/api/handler/stats.go`
-  - `GET /api/v1/stats/summary` — dashboard KPIs
-  - `GET /api/v1/stats/certificates-by-status` — cert counts by status
-  - `GET /api/v1/stats/expiration-timeline?days=N` — cert expiry distribution
-  - `GET /api/v1/stats/job-trends?days=N` — job completion/failure rates
-  - `GET /api/v1/stats/issuance-rate?days=N` — cert issuance volume
- Structured logging middleware: `internal/api/middleware/middleware.go`
-
-**Operator Responsibility**:
-
- Configure log aggregation (e.g., ELK, Datadog, Splunk) to centralize certctl logs
- Set up alerting on scheduler loop failures (e.g., "renewal loop failed to complete within 2h")
- Configure health check monitoring (e.g., Prometheus scrape of `/health` and `/ready`)
- Establish thresholds for metrics (e.g., alert if `certctl_job_pending > 50` or `certctl_agent_active < certctl_agent_total`)
- Document your log retention policy (audit requirement often mandates 1+ years)
- Integrate certctl metrics into your broader observability stack (Grafana dashboards, SLO tracking)
-
---
-
-### CC7.2 — Anomaly Detection
-
-**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.
-
-(This criterion overlaps CC7.1 and extends it to specific anomaly response mechanisms.)
-
-**certctl Implementation** (V2):
-
- **Immutable API Audit Trail** (M19) — Every API call is recorded to the `audit_events` table (append-only, no update/delete). Recorded: HTTP method, URL path (query parameters intentionally excluded — see security note), actor (user/agent ID), SHA-256 hash of request body (truncated to 16 characters for brevity), response status code, latency in milliseconds. Excluded paths (health, ready) are configurable. Audit records are written asynchronously (non-blocking) and include a timestamp. **Security: Query parameters are excluded from the audit path** because they may contain cursor tokens, API keys, or sensitive filter values; since the audit trail is append-only with no deletion, any sensitive data recorded would persist permanently.
- **Audit Trail API** — `GET /api/v1/audit?actor=...&action=...&resource_id=...&created_after=...&created_before=...` allows searching for anomalous patterns (e.g., "who accessed certificate XYZ and when?", "did anyone revoke certs at 2 AM?").
- **Expiration Threshold Alerting** — Certificate renewal policies define alert thresholds (days before expiry): default `[30, 14, 7, 0]`. When a certificate approaches a threshold, a notification is enqueued. Deduplication prevents duplicate alerts for the same cert at the same threshold. Auto status transition: cert moves to `Expiring` status at 30 days, `Expired` at 0 days.
- **Certificate Status Auto-Transitions** — When a cert is issued, it's `Active`. As expiry approaches, status auto-transitions to `Expiring` (at 30d threshold). At expiry, status becomes `Expired`. Revoked certs move to `Revoked`. These transitions are recorded in the audit trail.
- **Notification Routing** — Alerts are sent via configured notifiers (Email, Slack, Teams, PagerDuty, OpsGenie). Certificates are routed to their owner's email address (or team email if no individual owner). This allows on-call teams to react to anomalies (e.g., "your production cert will expire in 7 days, request renewal now").
- **Deployment Rollback** — If a deployment fails or an older certificate needs to be reactivated, operators can trigger a "rollback" via the GUI. This redeploys a previous certificate version to the target. Rollback actions are audited.
-
-**Evidence Locations**:
-
- Audit middleware: `internal/api/middleware/audit.go`
- Audit trail API: `internal/api/handler/audit.go`, `GET /api/v1/audit`
- Expiration alerting: `internal/service/renewal.go` (CheckRenewal method)
- Notification dispatcher: `internal/scheduler/scheduler.go` (notificationTicker)
- Status transitions: `internal/service/certificate.go` (auto status update logic)
- Audit trail CLI export: `certctl-cli audit export --format csv` / `--format json`
-
-**V3 Enhancement**:
-
- **SIEM Export** — Real-time audit event streaming to SIEM systems (via NATS event bus with JetStream sink)
- **Anomaly Rules Engine** — Configurable rules (e.g., "alert if certificate revoked by non-admin", "alert if >10 certs issued in < 1 hour")
-
-**Operator Responsibility**:
-
- Integrate audit trail into your SIEM / log analysis platform
- Define alerting rules and thresholds for anomalies (e.g., "revocation of critical cert", "mass issuance")
- Establish a formal incident response workflow (audit trail shows *what* happened; you must decide *what to do* about it)
- Regularly review audit logs (e.g., monthly compliance audit of who accessed what)
- Configure email/Slack/Teams integration so on-call teams are notified of cert expirations immediately
- Encrypt audit trail backups (ACID guarantees don't prevent theft of database backups)
-
---
-
-### CC7.3 — Incident Response
-
-**Requirement**: The entity detects, investigates, and responds to incidents by executing a defined incident response and management process that includes preparation, detection and analysis, containment, eradication, recovery, and post-incident activities.
-
-**certctl Implementation** (V2):
-
- **Revocation API** — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:
-  - `unspecified` — catch-all
-  - `keyCompromise` — private key was exposed
-  - `caCompromise` — CA itself was compromised (rare)
-  - `affiliationChanged` — certificate no longer applies to the organization
-  - `superseded` — newer cert is in use
-  - `cessationOfOperation` — service is shutting down
-  - `certificateHold` — temporary revocation (the hold can be released by reissue)
-  - `privilegeWithdrawn` — access rights revoked
-  Revocation is **immediate** (no approval workflow). The certificate is marked `Revoked` in inventory, an audit event is logged, and optional issuer notification is best-effort. All revoked certs are excluded from active deployments.
- **CRL Endpoint** — `GET /.well-known/pki/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`), served unauthenticated for relying parties that don't hold certctl API credentials.
- **OCSP Responder** — `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown (RFC 6960, `Content-Type: application/ocsp-response`). Also unauthenticated. Clients (browsers, TLS libraries) query this endpoint to verify cert validity in real-time.
- **Revocation Notifications** — When a cert is revoked, notifications are sent to:
-  - Certificate owner (email)
-  - Configured webhooks (if you have a SIEM that subscribes)
-  - Slack/Teams channels (if notifiers are configured)
- **Bulk Revocation for Fleet-Wide Incidents** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. Essential for incident response: key compromise affecting multiple certs, CA distrust events, decommissioning a team's infrastructure. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring audit trail and notifications for every certificate.
- **Short-Lived Cert Exemption** — Certificates with TTL < 1 hour (configured in profile) skip CRL/OCSP publication. Expiry is the revocation mechanism for short-lived certs (e.g., Kubernetes pod certs, session tokens).
- **Deployment Rollback** — If a revoked cert is still deployed (shouldn't happen, but race conditions exist), operators can manually redeploy a previous version via the GUI. Rollback is audited.
-
-**Evidence Locations**:
-
- Revocation API: `internal/api/handler/certificates.go`, `POST /api/v1/certificates/{id}/revoke`
- Revocation domain model: `internal/domain/revocation.go` (RevocationReason type with RFC 5280 mapping)
- CRL generation: `internal/service/certificate.go` (GenerateDERCRL method)
- OCSP signing: `internal/service/certificate.go` (GetOCSPResponse method)
- Revocation notifications: `internal/service/notification.go` (SendRevocationNotification)
- Short-lived exemption: `internal/domain/revocation.go` (IsShortLivedCert check)
-
-**V3 Enhancement**:
-
- **Revocation Automation** — Trigger revocation based on external events (e.g., employee termination, security breach alert from CT Log monitoring)
-
-**Operator Responsibility**:
-
- Establish an incident response policy (e.g., "keyCompromise → immediately deploy a replacement cert + notify CISO")
- Ensure CRL/OCSP are accessible to all systems using the certs (e.g., CDN or highly-available endpoints if you host on-premises)
- Test revocation workflow in staging (verify that revoked certs are actually blocked by clients)
- Document justification for revocation (audit trail records *that* a cert was revoked, but not *why* — you must document it separately)
- Integrate revocation notifications into your on-call rotation (don't let revocation alerts get lost)
-
---
-
-### CC7.4 — Identify and Develop Risk Mitigation Activities
-
-**Requirement**: The entity identifies, develops, and implements risk mitigation activities for risks arising from potential business disruptions.
-
-**certctl Implementation** (V2):
-
- **Renewal Job Tracking** — Renewal jobs track the certificate, target agents, and issuance outcome. Failed renewals are retried (configurable backoff). Job state diagram: Pending → Running → Completed (or Failed). Failed jobs trigger notifications.
- **Agent Health Monitoring** — Health check loop (every 2m) pings all agents via heartbeat. If an agent misses 3 consecutive heartbeats, it's marked as `Unhealthy`. Unhealthy agents are excluded from new deployments.
- **Job Cancellation** — Operators can cancel pending jobs via `POST /api/v1/jobs/{id}/cancel`. Useful when a renewal is already in progress elsewhere (multi-instance deployments) or when a certificate is being phased out.
- **Interactive Approval** — Renewal/issuance jobs can be put in `AwaitingApproval` status. An authorized operator reviews the pending cert and approves or rejects it. Rejection records a reason in the audit trail. This provides a separation of duty between requestor and approver.
- **Scheduled Scanning** — Agents scan configured directories for existing certs (M18b discovery). Operators triage discovered certs (claim = "we manage this now", dismiss = "this is unmanaged and we're OK with that"). Triage decisions are audited.
-
-**Evidence Locations**:
-
- Job state machine: `internal/domain/job.go` (JobStatus enum)
- Job retry logic: `internal/scheduler/scheduler.go` (jobProcessorTicker)
- Agent health check: `internal/scheduler/scheduler.go` (healthCheckTicker)
- Job cancellation: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/cancel`
- Approval workflow: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/approve` / `reject`
- Discovery scan results: `internal/api/handler/discovery.go`, `GET /api/v1/discovered-certificates`
-
-**Operator Responsibility**:
-
- Monitor renewal job success rate (are certs being renewed before expiry?)
- Set up alert for unhealthy agents (missing 3+ heartbeats = broken agent, take action)
- Establish a formal approval policy (who can approve certs? do they need to involve CISO?)
- Test job cancellation and recovery flows in staging
- Review discovered certs regularly (are there unmanaged certs that should be managed?)
- Document your disaster recovery process (what if control plane database is corrupted?)
-
---
-
-## A1: Availability
-
-### A1.1/A1.2 — Availability and Recovery
-
-**Requirement**: The entity obtains or generates, uses, retains, and disposes of information to enable the entity to meet its objectives and respond to its responsibility to provide information.
-
-**certctl Implementation** (V2):
-
- **Health Probes** — `/health` and `/ready` endpoints support container orchestration (Docker Compose, Kubernetes, etc.). Docker Compose defines health checks for the server and database. Kubernetes would use liveness/readiness probes pointing to these endpoints.
- **Database Migrations (Idempotent)** — PostgreSQL migrations use `IF NOT EXISTS` and `ON CONFLICT ... DO NOTHING` patterns. Migrations can be safely reapplied — no risk of doubling data or dropping tables mid-migration.
- **Agent Panic Recovery** — Agent binary includes panic recovery in job execution loops. If an agent crashes during a deployment, the control plane marks the job as failed and can retry on a healthy agent.
- **Exponential Backoff** — Agent-to-server communication uses exponential backoff (starting at 1s, capped at 5m) to handle transient network failures. This prevents thundering herd when the control plane is temporarily down.
- **Docker Compose Deployment** — Includes health checks for server and database. Services auto-restart on failure.
- **PostgreSQL Connection Pooling** — Server uses `database/sql` with configurable `MaxOpenConns` and `MaxIdleConns` (default 25/5). Prevents connection exhaustion.
-
-**Evidence Locations**:
-
- Health endpoints: `internal/api/handler/health.go`
- Database migrations: `migrations/` directory (all use `IF NOT EXISTS`, idempotent patterns)
- Agent panic recovery: `cmd/agent/main.go` (defer recover() in job execution)
- Exponential backoff: `cmd/agent/main.go` (heartbeat and work poll backoff logic)
- Connection pooling: `cmd/server/main.go` (SetMaxOpenConns, SetMaxIdleConns)
-
-**V3 Enhancement**:
-
- **Multi-Region HA** — Control plane federation with etcd consensus (operator can run N replicas)
- **PostgreSQL HA** — Replication standby with automatic failover (operator responsibility to configure)
-
-**Operator Responsibility**:
-
- Configure PostgreSQL backups (e.g., WAL archiving, daily full backups). Certctl stores certificates but *also* stores renewal policies, audit trail, deployment history.
- Test backup/restore process in staging (broken backups are discovered during incidents)
- Monitor disk usage (PostgreSQL will fail if `/var` fills up)
- Plan capacity (how many certs, agents, jobs can your PostgreSQL handle? Certctl is tested with 10k+ certs, 100+ agents, but your infra may differ)
- Set up high-availability PostgreSQL if you need zero-downtime upgrades
- Implement network segmentation (only authorized services can reach certctl API and database)
-
---
-
-## CC8: Change Management
-
-### CC8.1 — Change Control
-
-**Requirement**: The entity identifies, selects, and develops risk mitigation activities for risks arising from potential business disruptions.
-
-**certctl Implementation** (V2):
-
- **Certificate Profiles** — Named profiles define allowed key types, max TTL, required SANs, and permitted EKUs. Changes to profiles are common (e.g., "increase max TTL from 1 year to 3 years"). All profile changes are audited (who changed what, when). Profile updates are versioned.
- **Policy Engine** — Renewal policies define alert thresholds and approval workflows. Policy changes (e.g., "lower alert threshold from 30 days to 14 days") are audited. Policies have violation rules (e.g., "flag certs longer than 3 years") — violations are recorded in the audit trail.
- **Target Configuration** — When a new target (NGINX server, HAProxy load balancer) is added, it's registered with a name and configuration (JSON). Target deletions require confirmation (to prevent accidental removal). All target changes are audited.
- **Immutable Audit Trail** — Every change (profile, policy, target, cert, agent, owner, team, approval, revocation, deployment) is recorded in `audit_events`. Audit records are append-only; no retroactive modification is possible. Audit trail is encrypted at rest (operator responsibility).
- **GitHub Actions CI** — Pull requests must pass:
-  - Go unit tests (`go test ./...`) with coverage gates (service layer ≥30%, handler layer ≥50%)
-  - Go vet (static analysis)
-  - Frontend TypeScript type checking (`tsc`)
-  - Frontend Vitest unit tests
-  - Frontend Vite build (ensures no broken imports)
-  Only after all checks pass can the PR be merged and deployed.
-
-**Evidence Locations**:
-
- Profile CRUD: `internal/api/handler/profiles.go`, `GET /api/v1/profiles` / `POST` / `PUT` / `DELETE`
- Policy CRUD: `internal/api/handler/policies.go`
- Target CRUD: `internal/api/handler/targets.go`
- Audit trail: `internal/api/handler/audit.go`, `GET /api/v1/audit` (records action, actor, resource_id, timestamp)
- CI configuration: `.github/workflows/ci.yml` (test, vet, coverage gates, build checks)
-
-**V3 Enhancement**:
-
- **Change Approval Workflow** — Optional approval gate before profile/policy changes go live
- **Feature Flags** — Enable/disable new features without redeployment (backward compatibility during rolling upgrades)
-
-**Operator Responsibility**:
-
- Implement formal change control (ticket system, approval, peer review)
- Document the business justification for profile/policy changes
- Test changes in a non-production environment before deploying to production
- Have a rollback plan (can you revert a profile change instantly if it breaks issuance?)
- Include certctl configuration changes in your change log (for audits and incident investigations)
- Version control your certctl configuration (Docker Compose file, environment variables) so you can track changes
-
---
-
-## Evidence Summary Table
-
-| SOC 2 Criterion | certctl Feature | Evidence Location | V2 (Free) | V3 (Pro) | Operator Responsibility |
-|---|---|---|---|---|---|
-| **CC6.1** Logical Access Security | API Key Authentication (SHA-256 hashed, constant-time comparison) | `internal/api/middleware/auth.go` | ✅ | Enhanced | API key generation, distribution, rotation |
-| | GUI Login with API Key | `web/src/pages/LoginPage.tsx` | ✅ | Enhanced (OIDC) | NA |
-| | CORS Allowlist | `CERTCTL_CORS_ORIGINS` env var | ✅ | ✅ | Configure appropriately |
-| | Token Bucket Rate Limiting | `internal/api/middleware/rate_limit.go` | ✅ | ✅ | Monitor for brute-force attempts |
-| **CC6.2** Prior to Issuing System Credentials | Ownership Attribution | `GET /api/v1/owners`, audit trail records owner assignment | ✅ | Enhanced (RBAC) | Map to org structure, remove on departure |
-| | Team Assignment | `GET /api/v1/teams` | ✅ | ✅ | NA |
-| | Actor Attribution in Audit Trail | `GET /api/v1/audit` (actor field) | ✅ | ✅ | Justify all changes via separate documentation |
-| **CC6.3** Authentication Policies | API Key Enforcement | `CERTCTL_AUTH_TYPE=api-key` (default) | ✅ | Enhanced (OIDC, MFA) | Document policy, test failures, integrate into IAM audit |
-| | Agent Authentication | Separate API keys for agents | ✅ | ✅ | Rotate agent keys, monitor compromise |
-| | Agent-Side Key Generation | `CERTCTL_KEYGEN_MODE=agent` (default) | ✅ | ✅ | Protect agent filesystem keys via encryption/backup |
-| | Private Key Policy | Server-side keygen logs warning, disabled in production | ✅ | ✅ | Never use server-side keygen in production |
-| **CC6.7** Information Transmission Protection | TLS for Control Plane | Deploy behind TLS-terminating reverse proxy | ✅ | ✅ | Enable TLS in production via reverse proxy |
-| | Agent-to-Server HTTPS | Agents use HTTPS for all API calls | ✅ | ✅ | Enforce TLS via firewall rules |
-| | Private Key Isolation | Agent-side keygen (ECDSA P-256), keys stored 0600 on agent FS | ✅ | ✅ | Encrypt agent filesystems, backup securely |
-| | Pull-Only Deployment | Server never initiates outbound to agents/targets | ✅ | Enhanced (HSM, proxy agents) | Encrypt agent↔target comms, isolate proxy agents |
-| **CC7.1** System Monitoring | Health Endpoint | `GET /health`, `GET /ready` | ✅ | ✅ | Integrate into monitoring (Prometheus, DataDog) |
-| | Metrics JSON Endpoint | `GET /api/v1/metrics` (gauges, counters, uptime) | ✅ | ✅ | Set thresholds, configure alerting |
-| | Stats API (time-series) | `GET /api/v1/stats/*` (summary, status, expiration, jobs, issuance) | ✅ | ✅ | Integrate into dashboards, SLO tracking |
-| | Structured Logging | `slog` middleware with request IDs | ✅ | ✅ | Aggregate logs to SIEM, define retention policy |
-| | Background Scheduler | 12 loops (8 always-on: renewal 1h, jobs 30s, job retry 5m I-001, job timeout 10m I-003, health 2m, notifications 1m, notif retry 2m I-005, short-lived 30s; 4 opt-in: network scan 6h, digest 24h, endpoint health 60s M48, cloud discovery 6h M50) | ✅ | ✅ | Alert on scheduler loop failures |
-| **CC7.2** Anomaly Detection | Immutable API Audit Trail | `internal/api/middleware/audit.go`, `GET /api/v1/audit` | ✅ | Enhanced (SIEM export) | Integrate into SIEM, search for anomalies, archive long-term |
-| | Expiration Threshold Alerting | Configurable per-policy (default 30/14/7/0 days) | ✅ | ✅ | Configure thresholds, integrate notifications |
-| | Status Auto-Transitions | Active → Expiring (30d) → Expired (0d) | ✅ | ✅ | Monitor status changes in audit trail |
-| | Notification Routing | Email, Slack, Teams, PagerDuty, OpsGenie | ✅ | ✅ | Configure notifiers, on-call integration |
-| | Deployment Rollback | Redeploy previous cert version via GUI | ✅ | ✅ | Audit rollback decisions |
-| **CC7.3** Incident Response | Revocation API (RFC 5280 reasons) | `POST /api/v1/certificates/{id}/revoke` | ✅ | Enhanced (bulk revocation) | Establish incident response policy |
-| | CRL Endpoint (DER, RFC 5280 §5) | `GET /.well-known/pki/crl/{issuer_id}` (unauthenticated, `application/pkix-crl`) | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients without API keys |
-| | OCSP Responder (RFC 6960) | `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated, `application/ocsp-response`) | ✅ | ✅ | Test revocation in staging |
-| | Revocation Notifications | Email, webhook, Slack/Teams on revocation | ✅ | ✅ | Integrate into on-call, document justification separately |
-| | Short-Lived Cert Exemption | TTL < 1h skip CRL/OCSP | ✅ | ✅ | Configure profiles appropriately |
-| **CC7.4** Risk Mitigation | Renewal Job Tracking | Job state machine (Pending → Running → Completed/Failed) | ✅ | ✅ | Monitor renewal success rate |
-| | Agent Health Monitoring | Health check loop (ping every 2m, mark unhealthy after 3 misses) | ✅ | ✅ | Alert on unhealthy agents, investigate |
-| | Job Cancellation | `POST /api/v1/jobs/{id}/cancel` | ✅ | ✅ | Test in staging |
-| | Interactive Approval | AwaitingApproval state, `POST /api/v1/jobs/{id}/approve\|reject` | ✅ | ✅ | Define approval policy, audit decisions |
-| | Certificate Discovery | Agents scan directories, triage (claim/dismiss) | ✅ | ✅ | Review discovered certs regularly |
-| **A1.1/A1.2** Availability and Recovery | Health Probes (Docker, Kubernetes) | `/health` and `/ready` endpoints | ✅ | ✅ | Use in container orchestration |
-| | Idempotent Migrations | `IF NOT EXISTS`, `ON CONFLICT ... DO NOTHING` | ✅ | ✅ | Test migration replay in staging |
-| | Agent Panic Recovery | Panic recovery in job loops | ✅ | ✅ | Monitor agent crashes in logs |
-| | Exponential Backoff | Agent heartbeat/work poll backoff (1s → 5m) | ✅ | ✅ | Monitor for control plane downtime |
-| | PostgreSQL Connection Pooling | MaxOpenConns=25, MaxIdleConns=5 (configurable) | ✅ | ✅ | Monitor connection usage |
-| **CC8.1** Change Control | Certificate Profiles | CRUD API + GUI, profile changes audited | ✅ | ✅ | Formal change control, test in staging |
-| | Policy Engine + Violations | CRUD API + GUI, policy changes audited | ✅ | ✅ | Document justification, implement approval workflow |
-| | Target Registration | CRUD API + GUI, changes audited | ✅ | ✅ | Confirm deletions, version control config |
-| | Immutable Audit Trail | Append-only `audit_events` table | ✅ | ✅ | Encrypt at rest, archive long-term, no manual edits |
-| | GitHub Actions CI | Unit tests, vet, coverage gates, build checks | ✅ | ✅ | Review PRs before merge, maintain test quality |
-
---
-
-## What Requires Operator Action
-
-**certctl is a tool, not a complete compliance solution.** Your organization must handle:
-
-1. **Physical Security** — Protect the infrastructure (servers, network) running certctl. Certctl can't control who has physical access to your datacenter.
-
-2. **Personnel Background Checks** — Before granting anyone API key access, conduct background checks per your policy. Certctl records *who* accessed *what*, but doesn't verify that people are trustworthy.
-
-3. **Formal Incident Response Plan** — Certctl provides incident detection (anomalies in audit trail) and tools for response (revocation, rollback), but you must define *when* to use them and *who* decides.
-
-4. **Access Review and Removal** — Certctl stores ownership, teams, and API keys. You must:
-   - Regularly review who has access (quarterly or semi-annually)
-   - Immediately revoke API keys for departing employees
-   - Audit that removed access is actually removed (test that old keys fail)
-
-5. **Log Retention and Archival** — Certctl logs to stdout (Docker) and stores audit events in PostgreSQL. You must:
-   - Ship logs to a long-term archive (SIEM, S3, or equivalent)
-   - Define retention policy (often 1-7 years per industry regulation)
-   - Encrypt archived logs
-   - Test that you can retrieve logs from archive (restoration drills)
-
-6. **Encryption at Rest** — PostgreSQL data (including audit trail) is stored on disk. You must:
-   - Enable transparent data encryption (TDE) on your database VM
-   - Encrypt container persistent volumes (if using Kubernetes)
-   - Encrypt database backups
-
-7. **Network Segmentation** — Certctl API and database must be protected by network access controls. You must:
-   - Firewall the control plane (only authorized services can connect)
-   - Use VPN or private networks for agent-to-server communication
-   - Isolate proxy agents (for F5, IIS, etc.) in the same network zone as their targets
-
-8. **Capacity Planning** — Certctl's performance scales with your PostgreSQL. You must:
-   - Estimate certificate inventory size (10k, 100k, 1M certs?)
-   - Test Certctl with your expected scale in staging
-   - Monitor disk usage, CPU, memory
-   - Plan for growth (add PostgreSQL replicas, increase connection pool, etc.)
-
-9. **Disaster Recovery** — Certctl data lives in PostgreSQL. You must:
-   - Back up PostgreSQL regularly (daily or hourly, depending on RPO)
-   - Test restore process in staging (broken backups discovered during incidents)
-   - Have a runbook for failover to replica or recovery from backup
-   - Document RTO/RPO targets (how long can cert management be down? how much data can you afford to lose?)
-
-10. **Integration with Your IAM** — If using OIDC/SSO (V3), you must:
-    - Configure your OIDC provider (Okta, Azure AD, Google)
-    - Map user groups to Certctl roles (Admin, Operator, Viewer)
-    - Manage MFA policy (enforce MFA if required)
-    - Audit user provisioning/deprovisioning
-
-11. **Documentation and Runbooks** — Certctl documents *what it does* (this guide), but you must document:
-    - Your organization's certificate lifecycle policy (who requests, who approves, who deploys)
-    - How to respond to specific incidents (cert compromise, CA compromise, agent down, renewal failed)
-    - How to operate certctl (day-to-day tasks, escalation procedures)
-    - Contact info for on-call teams
-
---
-
-## V3 Enhancements
-
-**certctl Pro (V3, paid edition) adds features that significantly strengthen SOC 2 evidence:**
-
- **OIDC / SSO Integration** — Integrate with Okta, Azure AD, Google to replace API keys with federated identity. Enables MFA enforcement and centralized access management. Auditors love federated identity (easier to remove access at source).
-
- **Role-Based Access Control (RBAC)** — Predefined roles (Admin: full access; Operator: issue/renew/revoke, no policy changes; Viewer: read-only) with profile-gated enforcement. Allows separation of duties (e.g., junior operator can't change global policy).
-
- **NATS Event Bus** — Real-time audit streaming to your SIEM. Hybrid model: HTTP for synchronous APIs, NATS for async events (cert.issued, cert.expiring, agent.heartbeat, job.completed). JetStream persistence for replay and durability.
-
- **SIEM Export** — Automated export of audit trail to Splunk, ELK, DataDog, etc. (webhooks, syslog, or pull-based APIs). Makes it easy for security teams to hunt for anomalies.
-
- **Advanced Search DSL** — `POST /api/v1/search` with tree-based filters (nested AND/OR, regex, field projection). Enables complex compliance queries (e.g., "all certs issued in the last 30 days by team X that are longer than 1 year").
-
- **Bulk Revocation** — Revoke all certs issued by a profile, owner, or agent in one operation. Critical for large-scale incidents (e.g., "a team's CA key was compromised, revoke all their certs").
-
- **Certificate Health Scores** — Composite risk scoring (e.g., "this cert has no short-lived TTL enforcement, extends past your policy max, and hasn't been renewed in 2 years" → health=30%). Helps prioritize remediation.
-
- **Compliance Scoring** — Audit readiness reporting per certificate (e.g., "compliance=95% — missing only a 3-year max-TTL constraint"). Exportable compliance report.
-
- **DigiCert Issuer Connector** — OV/EV certificate issuance for public-facing services (web servers, CDNs). Complements Local CA for internal use.
-
- **CT Log Monitoring** — Passive detection of unauthorized cert issuance. Monitors public CT logs for certs matching your domains and alerts if unexpected certs appear (e.g., attacker obtained a cert for your domain).
-
- **F5 BIG-IP Implementation** — Full target connector with iControl REST API. Agents can deploy certs to F5 load balancers.
-
- **IIS Implementation** — Dual-mode: agent-local PowerShell (default) for servers with agents, or proxy agent WinRM (agentless targets). Full Windows Server integration.
-
---
-
-## Conclusion
-
-certctl provides a strong foundation for SOC 2 compliance with API key authentication, immutable audit logging, automated alerting, and revocation capabilities. However, SOC 2 audits require evidence across your entire infrastructure — certctl is one piece. Use this guide to map certctl features to your audit questionnaire, then work with your auditors to identify gaps that must be filled by your own organizational policies and controls.
-
-For a deeper SOC 2 discussion or a mock audit against this guide, contact your certctl Pro support team.
@@ -1,122 +0,0 @@
-# Compliance Mapping Guides
-
-certctl is a certificate lifecycle management tool, not a compliance product. It doesn't make you compliant — your organization, policies, and processes do that. What certctl provides is tooling that supports the technical controls auditors and evaluators look for when assessing certificate and key management practices.
-
-These guides map certctl's features to three widely referenced compliance frameworks. They're designed for security engineers, IT auditors, and procurement teams evaluating certctl for environments with regulatory requirements.
-
-## What's Covered
-
-**[SOC 2 Type II](compliance-soc2.md)** — Maps certctl features to AICPA Trust Service Criteria. Covers logical access controls (CC6), system operations and monitoring (CC7), change management (CC8), and availability (A1). Most relevant for organizations undergoing SOC 2 audits where certificate management is in scope.
-
-**[PCI-DSS 4.0](compliance-pci-dss.md)** — Maps certctl features to PCI Data Security Standard version 4.0 requirements. Covers data-in-transit protection (Req 4), cryptographic key management (Req 3), authentication (Req 8), audit logging (Req 10), secure development (Req 6), and access control (Req 7). Most relevant for organizations handling cardholder data where TLS certificates protect transmission channels.
-
-**[NIST SP 800-57](compliance-nist.md)** — Maps certctl's key management practices to NIST Special Publication 800-57 Part 1 Rev 5 (2020). Covers key generation, storage, cryptoperiods, key state lifecycle, algorithm selection, key transport, and revocation. Most relevant for organizations aligning with US federal cryptographic guidance or using NIST as a key management baseline.
-
-## What These Guides Are Not
-
-These are mapping guides, not certification claims. certctl is not SOC 2 certified, PCI-DSS validated, or NIST-assessed. The guides document how certctl's technical implementation supports the controls these frameworks require — they do not replace your auditor's assessment, your organization's policies, or your security team's judgment.
-
-The guides also clearly identify gaps where certctl's current implementation doesn't fully align with a framework's recommendations, features planned for future versions, and areas where operator action is required regardless of what certctl provides.
-
-## How to Use These Guides
-
-If you're evaluating certctl for a regulated environment, start with the framework your auditor cares about. Each guide includes an evidence summary table mapping specific compliance criteria to certctl features, API endpoints, and configuration — the kind of specifics your auditor will ask for.
-
-If you're preparing for an audit and certctl is already deployed, use the "Operator Responsibilities" section of each guide to identify what your organization must manage beyond what certctl provides.
-
-## Quick Reference
-
-| Framework | Primary Concern | Key certctl Features |
-|---|---|---|
-| SOC 2 Type II | Trust service criteria for SaaS/infrastructure | API audit trail, auth controls, monitoring, change management |
-| PCI-DSS 4.0 | Cardholder data protection | TLS lifecycle, key management, immutable logging, access control |
-| NIST SP 800-57 | Cryptographic key management | Agent-side keygen, key isolation, algorithm selection, revocation |
-
-## Audit-Trail Integrity & Privacy (Bundle 6)
-
-Two complementary controls protect the `audit_events` table against tampering and minimize PII exposure. Both apply automatically — no operator action is required at install time, but operators must understand the contract before responding to a legal-hold or retention request.
-
-### Append-Only Enforcement (HIPAA §164.312(b))
-
-<!-- Source: migrations/000018_audit_events_worm.up.sql -->
-
-`audit_events` rows cannot be modified or deleted by the application role. Two layers:
-
-| Layer | Mechanism | Surface |
-|---|---|---|
-| **DB trigger** | `audit_events_block_modification()` raises `check_violation` on `BEFORE UPDATE OR DELETE` | Catches any UPDATE / DELETE — including direct `psql` from the app role |
-| **App-role grant** | `REVOKE UPDATE, DELETE ON audit_events FROM certctl` | Defence-in-depth; the app role can't even attempt the modification |
-
-**Verification.** From a `psql` session connected as the `certctl` app role:
-
-```sql
-UPDATE audit_events SET actor = 'tampered' WHERE id = 'audit-001';
-- ERROR:  audit_events is append-only (Bundle-6 / M-017 / HIPAA §164.312(b))
-- HINT:   Use a compliance superuser role for legitimate retention operations.
-```
-
-**Compliance superuser pattern.** Legitimate retention work (legal hold, GDPR right-to-be-forgotten, statutory purges) requires a separate PostgreSQL role provisioned out-of-band that bypasses the trigger. Certctl does NOT auto-create this role — operators provision it per their compliance policy. Suggested shape:
-
-```sql
-- One-time setup by a DBA. Stored procedure pattern keeps the
-- compliance superuser audit-able too: every invocation should
-- itself land in audit_events.
-CREATE ROLE certctl_compliance LOGIN PASSWORD '<strong-secret>';
-GRANT UPDATE, DELETE ON audit_events TO certctl_compliance;
-- (optional) provision SECURITY DEFINER stored procedures that
-- (a) record the retention reason in audit_events as the FIRST step
-- (b) then perform the UPDATE/DELETE
-- (c) all under the certctl_compliance role's grants.
-```
-
-### Body Redaction (GDPR Art. 32, CWE-532)
-
-<!-- Source: internal/service/audit_redact.go -->
-
-`AuditService.RecordEvent` routes every `details` map through `RedactDetailsForAudit` BEFORE marshaling to the JSONB column. Two deny-lists:
-
-| Category | Match | Replacement | Examples |
-|---|---|---|---|
-| **Credentials** | case-insensitive key match | `"[REDACTED:CREDENTIAL]"` | `api_key`, `password`, `token`, `*_pem`, `eab_secret`, `acme_account_key`, `signature` |
-| **PII** | case-insensitive key match | `"[REDACTED:PII]"` | `email`, `phone`, `ssn`, `dob`, `name`, `address`, `postal_code`, `ip_address` |
-
-Nested maps and arrays are walked recursively — sensitive keys at any depth get scrubbed. The redactor is mutation-free (the caller's original map is unchanged) so service-layer code that reuses the map elsewhere is safe.
-
-**Operator visibility — `redacted_keys` array.** The redacted map includes a `redacted_keys` array listing every dotted-path that was scrubbed. This surfaces the redaction footprint to compliance auditors without exposing values. Example before/after:
-
-```jsonc
-// Caller's input map (e.g., from a service handler):
-{
-  "action": "create_issuer",
-  "issuer_id": "iss-acme-prod",
-  "config": {
-    "endpoint": "https://acme.example.com",
-    "eab_secret": "abc123secret",
-    "contact": { "email": "ops@example.com", "role": "admin" }
-  }
-}
-
-// Persisted in audit_events.details:
-{
-  "action": "create_issuer",
-  "issuer_id": "iss-acme-prod",
-  "config": {
-    "endpoint": "https://acme.example.com",
-    "eab_secret": "[REDACTED:CREDENTIAL]",
-    "contact": { "email": "[REDACTED:PII]", "role": "admin" }
-  },
-  "redacted_keys": ["config.eab_secret", "config.contact.email"]
-}
-```
-
-**Maintenance.** When introducing a new credential-bearing field anywhere in the codebase, add the key name to `credentialKeys` (or `piiKeys`) in `internal/service/audit_redact.go`. The unit test suite in `audit_redact_test.go` exercises every entry and proves case-insensitivity + JSON round-trip safety.
-
-## certctl Pro (V3) Enhancements
-
-Several compliance-relevant features are planned for certctl Pro:
-
- **OIDC/SSO** — Enterprise identity provider integration (SOC 2 CC6.1, PCI-DSS 8.3)
- **RBAC** — Role-based access control with admin/operator/viewer roles (SOC 2 CC6.3, PCI-DSS 7.2)
- **NATS Audit Streaming** — Real-time audit event streaming to SIEM systems (SOC 2 CC7.2, PCI-DSS 10.2)
- **Bulk Revocation** — Fleet-wide incident response capability (NIST SP 800-57 Section 5.4)
- **Health/Compliance Scoring** — Automated compliance posture assessment per certificate
@@ -1,5 +1,7 @@
 # Advanced Demo: Certificate Lifecycle End-to-End

+> Last reviewed: 2026-05-05
+
 This demo goes beyond browsing pre-loaded data. You'll create a team, register an owner, set up an issuer, create a certificate, trigger renewal, and watch everything appear in the dashboard in real time. Each step includes a technical explanation of what's happening inside certctl and why the system is designed that way.

 **Time**: 15-20 minutes
@@ -363,7 +365,7 @@ curl -s -X POST $API/api/v1/certificates \
 | `issuer_id` | Links to the issuer connector that will sign this certificate. Determines which CA backend is used. |
 | `renewal_policy_id` | Links to a `renewal_policies` row that defines: how many days before expiry to renew (`renewal_window_days`), whether auto-renewal is enabled (`auto_renew`), max retries, and retry interval. The default policy (`rp-default`) renews 30 days before expiry. |
 | `status` | Set to `Pending` because the certificate hasn't been issued yet. The scheduler will pick it up, or you can trigger renewal manually. |
-| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"pci": "true"` for compliance scoping). |
+| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"environment": "production"` for fleet scoping). |

 **Check the dashboard now.** Click "Certificates" in the sidebar. You'll see your new "Demo API Certificate" with status "Pending" alongside the pre-loaded demo certificates. Click on it to see the full details.

@@ -603,7 +605,7 @@ curl -s "$API/api/v1/audit?created_after=2026-03-24T09:00:00Z" | jq '.data | len

 The audit middleware (M19) records every HTTP request: method, path, status code, actor, request body SHA-256 hash, and latency. This creates a complete API audit trail without blocking responses (logging happens asynchronously).

-**Why immutable audit:** Compliance frameworks (SOC 2 Type II, PCI-DSS, ISO 27001) require tamper-evident audit logs. By making the repository interface append-only and recording API calls, even a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.
+**Why immutable audit:** Tamper-evident audit logs are a hard requirement for the case where an attacker has compromised the API server. Because the repository interface is append-only and API calls are recorded, even a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.

 **Check the dashboard.** The "Audit" view shows the full timeline of all actions across the system with filtering and CSV/JSON export.

@@ -701,7 +703,7 @@ curl -s -X POST $API/api/v1/certificates \

 **Why `environment` matters:** The environment field isn't just metadata — it feeds the policy engine. A policy rule with type `AllowedEnvironments` can restrict which environments are valid. If someone tries to create a certificate with `environment: "yolo"`, the policy engine flags a violation. In a mature deployment, you'd enforce policies strictly: production certificates must use a trusted CA (not Local CA), staging certificates can use Let's Encrypt staging, and development certificates can use the Local CA.

-**Why `pci: true` in tags:** Tags are free-form, but they enable powerful filtering and compliance scoping. A security team could query `GET /api/v1/certificates?tags.pci=true` (not implemented yet, but the JSONB column supports it) to find all PCI-scoped certificates and verify they meet compliance requirements.
+**Why arbitrary tags in metadata:** Tags are free-form, but they enable powerful filtering and fleet scoping. A security team could query `GET /api/v1/certificates?tags.regulated=true` (not implemented yet, but the JSONB column supports it) to find all certificates marked regulated and verify they meet whatever requirements that label maps to.

 **Refresh the dashboard** — you'll see the new payment gateway certificate. Try filtering by environment or status to see how both certificates appear alongside the demo data.

@@ -778,7 +780,7 @@ Check existing violations:
 curl -s "$API/api/v1/policies/pr-max-certificate-lifetime/violations" | jq .
 ```

-**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific non-compliance.
+**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific violation.

 **In the dashboard**, click "Policies" in the sidebar to see all active rules and which certificates are violating them.

@@ -844,7 +846,7 @@ curl -s -X POST $API/api/v1/profiles \

 **How it works:** Certificate profiles are stored in the `certificate_profiles` table with an `allowed_key_algorithms` JSONB column that defines which key types and minimum sizes are acceptable. When a certificate is assigned to a profile, the profile constraints are enforced during CSR validation. The `max_validity_days` field controls the maximum certificate lifetime — profiles with values translating to under 1 hour enable short-lived certificate mode, where certs are exempt from CRL/OCSP.

-**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce compliance requirements across the fleet.
+**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce policy across the fleet.

 **In the dashboard**, click "Profiles" in the sidebar to see and manage certificate profiles.

@@ -894,17 +896,17 @@ Approve or reject them:
 # Approve a job
 curl -s -X POST $API/api/v1/jobs/JOB_ID/approve \
  -H "Content-Type: application/json" \
-  -d '{"reason": "Verified key type meets compliance requirements"}' | jq .
+  -d '{"reason": "Verified key type meets policy"}' | jq .

 # Reject a job
 curl -s -X POST $API/api/v1/jobs/JOB_ID/reject \
  -H "Content-Type: application/json" \
-  -d '{"reason": "Key type does not meet PCI requirements"}' | jq .
+  -d '{"reason": "Key type does not meet policy"}' | jq .
 ```

 **How it works:** When a renewal policy has `auto_renew` set to false, renewal jobs enter the `AwaitingApproval` state instead of being processed immediately. An operator must explicitly approve or reject the job via the API or the GUI. Approved jobs transition to `Pending` and are picked up by the job processor. Rejected jobs move to `Cancelled` with the provided reason recorded in the audit trail.

-**Why interactive approval:** Not every certificate renewal should be automatic. PCI-scoped certificates, certs with specific compliance requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.
+**Why interactive approval:** Not every certificate renewal should be automatic. High-value certificates, certs with specific policy requirements, or certificates being migrated between issuers benefit from a human checkpoint. The `AwaitingApproval` state creates that checkpoint without blocking the entire job pipeline.

 **In the dashboard:** Click "Jobs" in the sidebar, filter by status "AwaitingApproval", and you'll see a list of renewal jobs waiting for approval. Each job shows the certificate, issuer, and requested validity period. Click a job to open its detail view and see the Approve / Reject buttons with a reason text field. After approval or rejection, the job status updates in real-time and the audit trail records the decision.
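
The approval transitions described above can be sketched as a small state function. This is an illustrative model only: the function name `next_job_state` is hypothetical, and only the three states named in the text (`AwaitingApproval`, `Pending`, `Cancelled`) are covered.

```shell
# Hypothetical sketch of the approval transitions described above.
# Usage: next_job_state CURRENT_STATE ACTION
next_job_state() {
  case "$1:$2" in
    AwaitingApproval:approve) echo "Pending" ;;    # picked up by the job processor
    AwaitingApproval:reject)  echo "Cancelled" ;;  # reason goes to the audit trail
    *)
      # Any other combination is treated as an invalid transition here.
      echo "invalid transition: $1 + $2" >&2
      return 1
      ;;
  esac
}

next_job_state AwaitingApproval approve   # prints "Pending"
```

Treating every other state/action pair as an error is a choice made for this sketch; the server may allow transitions that the text does not mention.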

@@ -987,7 +989,7 @@ export CERTCTL_API_KEY="test-key-123"

 ## Part 15: MCP Server for AI Integration (M18a)

-certctl exposes the full REST API via the Model Context Protocol (MCP), enabling seamless integration with Claude, Cursor, and other AI assistants:
+certctl exposes the full REST API via the Model Context Protocol (MCP), enabling seamless integration with any MCP-compatible AI client:

 ```bash
 # Build the MCP server
@@ -1008,19 +1010,19 @@ export CERTCTL_API_KEY="test-key-123"
 - **Binary support** — handles DER-encoded CRL and OCSP responses without mangling
 - **Error translation** — converts HTTP errors to user-readable messages

-**Example usage from Claude:**
+**Example usage:**

 ```
 User: What certificates are expiring in the next 30 days?

-Claude uses the MCP tools to:
+The AI client uses the MCP tools to:
  1. Call tools.listCertificates with filters: {status: "Expiring"}
  2. Parse the response
  3. Display: "mc-api-prod expires in 12 days. mc-cdn-prod expires in 8 days..."

 User: Revoke mc-payments due to key compromise

-Claude uses the MCP tools to:
+The AI client uses the MCP tools to:
  1. Call tools.revokeCertificate with id="mc-payments" reason="keyCompromise"
  2. Return the audit trail entry showing revocation recorded
 ```
@@ -1,5 +1,7 @@
 # Understanding Certificates: A Beginner's Guide

+> Last reviewed: 2026-05-05
+
 If you've never worked with TLS certificates before, this guide will get you up to speed. By the end, you'll understand what certificates are, why they matter, and why the industry's move toward shorter certificate lifespans — down to 47 days by 2029 — makes automated lifecycle management essential.

 ## Contents
@@ -123,7 +125,7 @@ At no point does the private key leave the agent. This is a fundamental security

 Agents also report **metadata** about themselves — their operating system, CPU architecture, IP address, hostname, and version — with every heartbeat. This gives ops teams fleet-wide visibility (e.g., "how many agents are running on ARM?", "which agents are still on v1.0.0?") and powers **agent groups** — dynamic device grouping where policies can be scoped to specific agent criteria like OS type, architecture, or network subnet.

-**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for compliance reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
+**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for audit reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.

 ### Deployment Targets

@@ -220,7 +222,7 @@ certctl implements revocation using three complementary mechanisms:

 **Certificate Revocation List (CRL)**: certctl serves DER-encoded X.509 CRLs per issuer at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 wire format, RFC 8615 well-known namespace). The endpoint is unauthenticated so any relying party — browser, TLS client, hardware appliance — can fetch it without a certctl API key. The CRL is signed by the issuing CA's key and has 24-hour validity; clients can download it periodically to check revocation status offline. The response carries `Content-Type: application/pkix-crl`. The CRL is **pre-generated** by a scheduler-driven loop (`crlGenerationLoop`, default interval 1 hour, configurable via `CERTCTL_CRL_GENERATION_INTERVAL`) and persisted in the `crl_cache` table — HTTP fetches read from the cache rather than rebuilding per request, so a busy CA does not DOS itself at scale. Concurrent regeneration requests for the same issuer are coalesced via an in-tree singleflight gate.

-**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder serving both forms RFC 6960 §A.1.1 defines: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (URL-path lookup, useful for ops curl-debugging) and `POST /.well-known/pki/ocsp/{issuer_id}` with a binary `application/ocsp-request` body (the form most production clients use — Firefox, OpenSSL `s_client -status`, cert-manager, Intune device-state validators). Both forms are unauthenticated and return signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`. OCSP responses are signed by a **dedicated per-issuer OCSP responder cert** (RFC 6960 §2.6 / §4.2.2.2) — NOT by the CA private key directly — that carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP clients do not recursively check the responder cert's own revocation status. The responder cert auto-rotates within 7 days of expiry (configurable via `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE`), letting the responder key live on disk or rotate frequently while the CA key stays cold. See [`crl-ocsp.md`](crl-ocsp.md) for endpoint examples (curl, OpenSSL, Firefox, Intune) and the responder cert lifecycle.
+**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder serving both forms RFC 6960 §A.1.1 defines: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (URL-path lookup, useful for ops curl-debugging) and `POST /.well-known/pki/ocsp/{issuer_id}` with a binary `application/ocsp-request` body (the form most production clients use — Firefox, OpenSSL `s_client -status`, cert-manager, Intune device-state validators). Both forms are unauthenticated and return signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`. OCSP responses are signed by a **dedicated per-issuer OCSP responder cert** (RFC 6960 §2.6 / §4.2.2.2) — NOT by the CA private key directly — that carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP clients do not recursively check the responder cert's own revocation status. The responder cert auto-rotates within 7 days of expiry (configurable via `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE`), letting the responder key live on disk or rotate frequently while the CA key stays cold. See [`crl-ocsp.md`](../reference/protocols/crl-ocsp.md) for endpoint examples (curl, OpenSSL, Firefox, Intune) and the responder cert lifecycle.

 Short-lived certificates (those assigned to profiles with TTL under 1 hour) are exempt from CRL and OCSP — their rapid expiry is considered sufficient revocation. This is a deliberate design choice to reduce infrastructure overhead for ephemeral machine-to-machine credentials.
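
A minimal sketch of the short-lived exemption rule, assuming the threshold is "strictly under 1 hour" and that the TTL is expressed in seconds. The names `SHORT_LIVED_MAX_SECONDS` and `is_short_lived` are hypothetical and do not come from the certctl codebase.

```shell
# Hypothetical check: is a certificate with this TTL exempt from CRL/OCSP?
# Assumes the cutoff is strictly under 1 hour, measured in seconds.
SHORT_LIVED_MAX_SECONDS=3600

is_short_lived() {
  [ "$1" -lt "$SHORT_LIVED_MAX_SECONDS" ]
}

# A 15-minute credential falls under the cutoff.
if is_short_lived 900; then
  echo "exempt from CRL/OCSP"
else
  echo "published via CRL/OCSP"
fi
```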

@@ -242,7 +244,7 @@ Every action in certctl — issuing a certificate, renewing one, deploying to a

 ### Audit Trail

-Every action is logged: who did it, what changed, when, and why. This is essential for compliance (SOC 2, PCI-DSS, ISO 27001) and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.
+Every action is logged: who did it, what changed, when, and why. This is essential for audit and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.

 ### Notifications

@@ -256,7 +258,7 @@ The CLI supports both table and JSON output formats (`--format table` or `--form

 ### MCP Server (AI Integration)

-certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants like Claude, Cursor, and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
+certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"

 The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via stdio transport and acts as a stateless HTTP proxy to the certctl REST API. It requires no additional infrastructure — just point it at your certctl server URL and API key.

@@ -279,7 +281,7 @@ This gives you a three-step triage workflow:

 Network scan targets are managed from the **Network Scans** dashboard page — create CIDR ranges and ports to probe, enable/disable targets, trigger on-demand scans, and view results. Discovered certificates from network scans appear in the same Discovery triage page alongside filesystem discoveries.

-This is a prerequisite for multi-CA migration, compliance audits, and building confidence that you've found all the certificates that matter.
+This is a prerequisite for multi-CA migration, audit reviews, and building confidence that you've found all the certificates that matter.

 ### Observability

@@ -291,4 +293,4 @@ The agent fleet overview page groups agents by OS, architecture, and version, sh

 Now that you understand the concepts, head to the [Quick Start Guide](quickstart.md) to get certctl running locally in under 5 minutes. You'll see a pre-loaded dashboard with demo certificates, explore the API, and understand how everything fits together.

-For a deeper look at the system design, see the [Architecture Guide](architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the [MCP Server Guide](mcp.md). For the full API reference, see the [OpenAPI Spec Guide](openapi.md).
+For a deeper look at the system design, see the [Architecture Guide](../reference/architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the [MCP Server Guide](../reference/mcp.md). For the full API reference, see the [OpenAPI Spec Guide](../reference/api.md).
@@ -1,5 +1,7 @@
 # Deployment Examples

+> Last reviewed: 2026-05-05
+
 Five turnkey docker-compose scenarios, each runnable in under 5 minutes. Pick the one closest to your setup.

 ## Which Example Should I Use?
@@ -30,9 +32,9 @@ cp .env.example .env   # Edit with your domain and email
 docker compose up -d
 ```

-The full walkthrough — including how HTTP-01 challenges work, adding multiple domains, switching to staging for testing, and a production checklist — is in the [example README](../examples/acme-nginx/acme-nginx.md).
+The full walkthrough — including how HTTP-01 challenges work, adding multiple domains, switching to staging for testing, and a production checklist — is in the [example README](../../examples/acme-nginx/acme-nginx.md).

-**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](migrate-from-certbot.md).
+**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](../migration/from-certbot.md).

 ---

@@ -50,9 +52,9 @@ cp .env.example .env   # Edit with domain, email, DNS provider credentials
 docker compose up -d
 ```

-The full walkthrough — including DNS-PERSIST-01 (set a TXT record once, never touch DNS again on renewals), adapting scripts for other providers, and propagation troubleshooting — is in the [example README](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md).
+The full walkthrough — including DNS-PERSIST-01 (set a TXT record once, never touch DNS again on renewals), adapting scripts for other providers, and propagation troubleshooting — is in the [example README](../../examples/acme-wildcard-dns01/acme-wildcard-dns01.md).

-**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](migrate-from-acmesh.md).
+**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](../migration/from-acmesh.md).

 ---

@@ -69,7 +71,7 @@ cd examples/private-ca-traefik
 docker compose up -d    # Self-signed mode (no .env needed for demo)
 ```

-The full walkthrough — including sub-CA setup with `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, creating certificates via the API, monitoring deployments, and production hardening — is in the [example README](../examples/private-ca-traefik/private-ca-traefik.md).
+The full walkthrough — including sub-CA setup with `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, creating certificates via the API, monitoring deployments, and production hardening — is in the [example README](../../examples/private-ca-traefik/private-ca-traefik.md).

 ---

@@ -86,7 +88,7 @@ cd examples/step-ca-haproxy
 docker compose up -d
 ```

-The full walkthrough — including step-ca provisioner configuration, integrating with an existing step-ca instance, HAProxy PEM format details, and advanced features (approval workflows, policy-based renewal, multi-instance HAProxy) — is in the [example README](../examples/step-ca-haproxy/step-ca-haproxy.md).
+The full walkthrough — including step-ca provisioner configuration, integrating with an existing step-ca instance, HAProxy PEM format details, and advanced features (approval workflows, policy-based renewal, multi-instance HAProxy) — is in the [example README](../../examples/step-ca-haproxy/step-ca-haproxy.md).

 ---

@@ -103,9 +105,9 @@ cd examples/multi-issuer
 docker compose up -d
 ```

-The full walkthrough — including profile-based issuer assignment, testing with ACME staging, Local CA enterprise sub-CA mode, and scaling beyond Docker Compose — is in the [example README](../examples/multi-issuer/multi-issuer.md).
+The full walkthrough — including profile-based issuer assignment, testing with ACME staging, Local CA enterprise sub-CA mode, and scaling beyond Docker Compose — is in the [example README](../../examples/multi-issuer/multi-issuer.md).

-**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](certctl-for-cert-manager-users.md).
+**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](../migration/cert-manager-coexistence.md).

 ---

@@ -117,4 +119,4 @@ These 5 scenarios cover the most common deployment patterns, but certctl support

 **Targets:** NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell or WinRM proxy), Postfix, Dovecot, F5 BIG-IP (coming soon).

-See [Connector Reference](connectors.md) for configuration details on every issuer and target.
+See [Connector Reference](../reference/connectors/index.md) for configuration details on every issuer and target.
@@ -1,5 +1,7 @@
 # Quick Start Guide

+> Last reviewed: 2026-05-05
+
 Certificate lifespans are dropping to **47 days by 2029**. At that cadence, a team managing 100 certificates is processing 7+ renewals per week — every week, forever. Manual processes break. certctl automates the entire lifecycle: issuance, renewal, deployment, revocation, and audit — with zero human intervention.

 This guide gets you running in 5 minutes and walks you through everything certctl does.
@@ -120,7 +122,7 @@ curl --cacert "$CA" https://localhost:8443/health
 {"status":"healthy"}
 ```

-If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/tls.md`](tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/upgrade-to-tls.md`](upgrade-to-tls.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.
+If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/operator/tls.md`](../operator/tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/archive/upgrades/to-tls-v2.2.md`](../archive/upgrades/to-tls-v2.2.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.

 ## Open the Dashboard

@@ -130,7 +132,7 @@ Open **https://localhost:8443** in your browser. Your browser will warn about th
 >
 > **Key rotation:** `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `CERTCTL_AUTH_SECRET=new-key,old-key`). Both keys are valid simultaneously, enabling zero-downtime rotation: add the new key, roll clients over, then remove the old key.

-The dashboard comes pre-loaded with 35 demo certificates across 5 issuers, 8 agents, and 90 days of job history — expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. A realistic snapshot of what certificate management looks like in a real organization.
+The dashboard comes pre-loaded with demo data covering certificates across multiple issuers, agents, and 90 days of job history: expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. It is a realistic snapshot of what certificate management looks like in a real organization. (Re-derive exact counts via `grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l`.)

 ### What you're looking at

@@ -322,7 +324,7 @@ curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/approve
 # Reject a pending job
 curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/reject \
  -H "Content-Type: application/json" \
-  -d '{"reason": "Key type does not meet compliance requirements"}' | jq .
+  -d '{"reason": "Key type does not meet policy requirements"}' | jq .
 ```

 ## Certificate Discovery
@@ -436,7 +438,7 @@ export CERTCTL_SERVER_CA_BUNDLE_PATH="$CA"   # MCP is env-vars-only; no CLI flag
 ./mcp-server
 ```

-Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
+Exposes the full REST API via MCP over stdio transport. Ask your MCP client: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."

 ## Demo Data Reference

@@ -447,7 +449,7 @@ Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certif
 | Issuers | 5 | Local Dev CA, Let's Encrypt Staging, step-ca Internal, ZeroSSL (EAB), Custom OpenSSL CA |
 | Agents | 9 | 8 real agents (linux/darwin/windows, amd64/arm64) + server-scanner (network discovery) |
 | Targets | 8 | NGINX prod, NGINX staging, NGINX data, HAProxy, Apache, IIS, Traefik, Caddy |
-| Certificates | 35 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
+| Certificates | 32 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
 | Jobs | 50+ | 90 days of issuance, renewal, deployment jobs + 2 AwaitingApproval |
 | Discovered Certs | 12 | Unmanaged (filesystem + network), Managed (linked), Dismissed |
 | Discovery Scans | 8 | Historical + recent agent filesystem scans + network TLS scans |
@@ -480,7 +482,7 @@ A suggested 5-minute flow:
 6. **Agent fleet** — "Agents handle key generation locally (ECDSA P-256). Private keys never leave your infrastructure."
 7. **Discovery** — "Agents scan filesystems, server probes TLS endpoints. We find what you're not managing yet."
 8. **Bulk operations** — "Select multiple certs, renew or revoke in bulk. At 47-day lifespans with hundreds of certs, this is essential."
-9. **Audit trail** — "Every action recorded. Export to CSV/JSON for compliance."
+9. **Audit trail** — "Every action recorded. Export to CSV/JSON for review."
 10. **CLI + MCP** — "Terminal users get `certctl-cli`. AI assistants get MCP integration. Everything is API-first."

 ## Tear Down
@@ -496,7 +498,7 @@ The `-v` flag removes the PostgreSQL data volume for a clean slate.
 **Ready to deploy with your stack?** The [Deployment Examples](examples.md) page has 5 turnkey docker-compose scenarios — pick the one closest to your setup and have it running in minutes. It also covers migration paths from Certbot, acme.sh, and cert-manager.

 - **[Deployment Examples](examples.md)** — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer
- **[Advanced Demo](demo-advanced.md)** — Issue a real certificate via the Local CA end-to-end
- **[Architecture](architecture.md)** — How the control plane, agents, and connectors work together
- **[Connector Reference](connectors.md)** — Configuration for all 7 issuers and 10 targets
+- **[Advanced Demo](advanced-demo.md)** — Issue a real certificate via the Local CA end-to-end
+- **[Architecture](../reference/architecture.md)** — How the control plane, agents, and connectors work together
+- **[Connector Reference](../reference/connectors/index.md)** — Configuration for all 7 issuers and 10 targets
 - **[Concepts Guide](concepts.md)** — TLS certificates, CAs, and private keys explained from scratch
@@ -1,5 +1,7 @@
 # Why certctl?

+> Last reviewed: 2026-05-05
+
 Certificate management is broken at every scale between "one domain on Let's Encrypt" and "Fortune 500 budget for Venafi." certctl fills that gap: a self-hosted platform that automates the entire certificate lifecycle, works with any CA, deploys to any server, and keeps private keys on your infrastructure. It's free, source-available, and you own everything.

 ## The Math That Forces the Decision
@@ -32,17 +34,22 @@ This isn't a premium feature. It's the default behavior, free. Most alternatives

 ### 2. CA-Agnostic Issuer Architecture

-certctl works with any certificate authority, not just ACME providers. Nine issuer connectors ship today, all free:
+certctl works with any certificate authority, not just ACME providers. Twelve issuer connectors ship today, all free:

 - **ACME v2** (Let's Encrypt, ZeroSSL, Google Trust Services, Buypass) — HTTP-01, DNS-01, DNS-PERSIST-01 challenges, External Account Binding, ACME Renewal Information (RFC 9773), certificate profile selection
 - **HashiCorp Vault PKI** — `/v1/{mount}/sign/{role}` API, token auth
 - **DigiCert CertCentral** — async order model, OV/EV support
 - **Sectigo SCM** — async order model, DV/OV/EV support, 3-header auth
 - **Google Cloud CAS** — Certificate Authority Service, OAuth2 service account auth, CA pool selection
+- **AWS ACM Private CA** — managed private CA on AWS, IAM-authenticated, SDK-waiter for issuance
+- **Entrust Certificate Services** — Entrust CA Gateway with mTLS auth, approval-pending support
+- **GlobalSign Atlas HVCA** — region-pinned commercial CA with dual mTLS + API key/secret auth
+- **EJBCA / Keyfactor** — self-hosted open-source / Keyfactor enterprise CA, mTLS or OAuth2
 - **step-ca** (Smallstep) — native /sign API with JWK provisioner auth
- **Local CA** — self-signed or sub-CA mode (chain to ADCS or any enterprise root)
+- **Local CA** — self-signed or sub-CA mode (chain to ADCS or any enterprise root); supports multi-level CA tree mode
 - **OpenSSL / Custom CA** — delegate signing to any shell script
- **EST enrollment** (RFC 7030) — device certs for WiFi/802.1X, MDM, IoT
+
+EST (RFC 7030) and SCEP (RFC 8894) are protocol surfaces, not separate issuers — they dispatch to whichever issuer above is configured for the EST/SCEP profile.

 Every connector implements the same interface. Running multiple CAs in parallel — Let's Encrypt for public certs, Vault for internal services, your enterprise CA for legacy systems — is configuration, not code.

@@ -56,19 +63,19 @@ A reload command can exit 0 while the certificate doesn't take effect — wrong

 The three differentiators above get the headlines, but the feature surface is wider than most paid platforms:

-**13 deployment targets** — NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell + remote WinRM), F5 BIG-IP (proxy agent + iControl REST), Postfix, Dovecot, SSH (agentless), Windows Certificate Store, and Java Keystore. All use a pluggable connector model. The control plane never initiates outbound connections — agents poll for work, meaning certctl works behind firewalls, across network zones, and in air-gapped environments.
+**15 deployment targets** — NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell + remote WinRM), F5 BIG-IP (proxy agent + iControl REST), Postfix/Dovecot (dual-mode), SSH (agentless), Windows Certificate Store, Java Keystore, Kubernetes Secrets, AWS Certificate Manager, and Azure Key Vault. All use a pluggable connector model. The control plane never initiates outbound connections — agents poll for work, meaning certctl works behind firewalls, across network zones, and in air-gapped environments.

 **Network certificate discovery** — active TLS scanning of CIDR ranges finds certificates you didn't know existed. Agents also scan local filesystems for PEM/DER files. Everything feeds into a triage workflow where you claim, dismiss, or import discovered certs into management.

-**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete. Mapped to SOC 2, PCI-DSS 4.0, and NIST SP 800-57 compliance frameworks with published evidence guides.
+**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete.

 **Policy engine** — 5 rule types (allowed issuers, allowed domains, required metadata, allowed environments, renewal lead time) with violation tracking and severity levels.

-**PKI compliance** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.
+**Revocation infrastructure** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.

 **Prometheus metrics** — `/api/v1/metrics/prometheus` in standard exposition format. Works with Prometheus, Grafana Agent, Datadog Agent, Victoria Metrics.

-**MCP server** — the entire REST API is exposed via MCP for AI-assisted certificate management via Claude, Cursor, or any MCP-compatible client. No other certificate platform offers this.
+**MCP server** — the entire REST API is exposed via MCP for AI-assisted certificate management via any MCP-compatible client. No other certificate platform offers this.

 **Full REST API** — OpenAPI 3.1-documented operations covering the entire platform. CLI tool with 10 subcommands. Helm chart for Kubernetes deployment. Scheduled certificate digest emails. Certificate export in PEM and PKCS#12. S/MIME support with EKU-aware issuance.

@@ -82,7 +89,7 @@ ACME clients solve one slice of the problem — issuance and renewal from ACME C

 ### vs. Agent-Based SaaS

-The closest architectural competitors use the same agent model — local key generation, CSR submission, push-based deployment. Where certctl differs: it supports 9 issuer types (not just ACME), provides CRL/OCSP/revocation infrastructure (not just issuance), includes a policy engine and network discovery, and is source-available with no certificate limit. SaaS alternatives are typically proprietary, priced per certificate ($2+/cert/month), and cap their free tiers at 3-5 certificates. certctl is free for any number of certificates, forever.
+The closest architectural competitors use the same agent model — local key generation, CSR submission, push-based deployment. Where certctl differs: it supports 12 issuer types (not just ACME), provides CRL/OCSP/revocation infrastructure (not just issuance), includes a policy engine and network discovery, and is source-available with no certificate limit. SaaS alternatives are typically proprietary, priced per certificate ($2+/cert/month), and cap their free tiers at 3-5 certificates. certctl is free for any number of certificates, forever.

 ### vs. Commercial PKI Platforms

@@ -110,7 +117,7 @@ cd certctl/deploy && docker compose up -d
 # Dashboard at https://localhost:8443 (self-signed cert — pin deploy/test/certs/ca.crt)
 ```

-See the [Quickstart Guide](quickstart.md) for a full walkthrough, or explore the [5 turnkey examples](../examples/) for specific scenarios (ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer).
+See the [Quickstart Guide](quickstart.md) for a full walkthrough, or explore the [5 turnkey examples](../../examples/) for specific scenarios (ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer).

 ## License

@@ -0,0 +1,97 @@
+# Git history normalization — 2026-05-13
+
+> Last reviewed: 2026-05-13
+
+This page documents a one-time normalization of certctl's git history
+that landed on `master` on 2026-05-13. If you are reading this because
+your clone failed to fast-forward, or because a commit SHA you bookmarked
+no longer resolves, this is the explanation.
+
+## What changed
+
+Every commit's `author` and `committer` metadata was rewritten to a
+single canonical identity (`shankar0123 <skreddy040@gmail.com>`). The
+14 pre-rewrite author identities — operator name variants plus
+AI/automation identities (Claude, Copilot, cowork agent, certctl-bot,
+etc.) — collapsed to that one canonical author.
+
+No source-code content was changed by the rewrite. Every line of code
+in every commit is byte-for-byte identical to its pre-rewrite version.
+Only the `author` and `committer` metadata fields were touched; commit
+messages, subject lines, milestone IDs (M49, L-1, etc.), and every
+other line of every commit's body are preserved verbatim.
+
+## Why
+
+Two reasons:
+
+1. **LLC ownership transfer.** The codebase is now legally owned by
+   **certctl LLC**, which the operator incorporated to hold rights in
+   the project. The BSL 1.1 Licensor field in `LICENSE` flipped from a
+   natural-person name to `certctl LLC` in the same change set. Uniform
+   per-commit authorship under one canonical operator identity makes
+   the chain of title between the codebase and the LLC unambiguous.
+
+2. **Pre-traction cleanup.** The rewrite cost of git-history
+   normalization scales with how many external clones and references
+   have calcified against specific commit SHAs. Doing it now, before
+   the project has a large external surface, minimizes disruption to
+   downstream consumers.
+
+## What is preserved
+
+A complete off-platform bundle backup of the pre-rewrite tree is held
+by the operator (off-repo, not pushed). It contains every original
+commit SHA, every original author identity, and the full ref graph as
+it existed before the rewrite. The bundle is the immutable
+preservation record and is recoverable forever.
+
+An `archive/pre-author-normalization-2026-05-13` tag briefly existed
+on origin pointing at the pre-rewrite tip but was removed when the
+operator opted to clean the contributor graph of pre-rewrite
+authorship signal. The bundle remains as the canonical archive — any
+forensic question about pre-rewrite state can be answered by loading
+the bundle into a fresh clone (`git clone pre-rewrite-2026-05-13.bundle`).
+
+## Recovering after the rewrite
+
+If you had a clone of certctl from before 2026-05-13, your local
+history diverged from origin's at the rewrite. Easiest recovery:
+
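+```bash
+cd certctl
+git fetch origin
+# Release tags were re-pointed, so existing local tags must be overwritten.
+git fetch origin --tags --force
+git checkout master
+# Discards uncommitted changes and local-only commits on master.
+git reset --hard origin/master
+```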
+
+This force-aligns your local tree with the new origin. Any local
+branches you had based on pre-rewrite history will need rebasing onto
+the new master.
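+
+A minimal sketch of that rebase, not project tooling: the helper name
+`rebase_onto_rewritten` and the branch name `my-feature` are
+hypothetical, and it assumes the branch's own commits still apply
+cleanly, which should hold because file contents were not changed by
+the rewrite.
+
+```bash
+# Replay a local branch onto the rewritten history.
+#   $1  pre-rewrite commit the branch was based on (e.g. the old master tip)
+#   $2  local branch to move
+#   $3  rewritten base to land on (defaults to origin/master)
+rebase_onto_rewritten() {
+  old_base=$1
+  branch=$2
+  new_base=${3:-origin/master}
+  # Only the commits in old_base..branch are replayed, so the
+  # pre-rewrite history is not dragged along.
+  git rebase --onto "$new_base" "$old_base" "$branch"
+}
+
+# Example (hypothetical branch name):
+# rebase_onto_rewritten 'master@{1}' my-feature
+```
+
+`master@{1}` resolves to the pre-rewrite tip only if the hard reset
+above was the most recent movement of your local `master`; otherwise
+look the old tip up with `git reflog show master` first.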
+
+If you need to inspect the pre-rewrite state for a forensic or
+diligence question, contact the operator directly — the off-platform
+bundle is the canonical archive and is available on request.
+
+## Container images and release tarballs
+
+ghcr.io container images that were published before the rewrite
+(`ghcr.io/certctl-io/certctl-{server,agent}:<old-tag>`) remain pullable
+indefinitely. Their OCI source-SHA labels reference commit SHAs that
+no longer resolve in the public origin — the images themselves still
+work; only the source-SHA back-reference is now orphaned. New release
+images published after the rewrite reference current SHAs normally.
+
+If you downloaded a release tarball before the rewrite, the tarball's
+contents are unchanged; only its associated `git` SHA differs from the
+current `v2.x.y` tag (which has been re-pointed to the rewritten
+commit at the same logical point in history).
+
+## Operational note for contributors
+
+Future contributions to certctl should be authored under the
+operator's canonical git identity. Pull requests from external
+contributors will need a Contributor License Agreement (CLA) workflow,
+which the project will set up before accepting external PRs. Until
+then, the project does not solicit or accept external code
+contributions.
@@ -1,5 +1,14 @@
 # Caddy Integration Walkthrough

+> Last reviewed: 2026-05-05
+
+> **Use this walkthrough when** you're already running Caddy 2.7+ and
+> want it to ACME-issue from certctl (your internal CA, your private
+> PKI, or a local sub-CA chained under an enterprise root) instead of
+> Let's Encrypt. The Caddyfile changes are minimal; the load-bearing
+> piece is trusting certctl's bootstrap CA so Caddy's ACME client can
+> talk to certctl over HTTPS.
+
 End-to-end recipe for issuing certs from a certctl-server deployment
 through Caddy 2.7+. Target audience: operator running Caddy on a VM
 or container who wants Caddy to ACME-issue from certctl instead of
@@ -10,7 +19,7 @@ Let's Encrypt.
 - A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
  and at least one profile whose `acme_auth_mode` is set. Profile
  setup is identical to the cert-manager walkthrough — see
-  [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
+  [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md)
  Step 2.
 - Caddy 2.7.x or later. `caddy version` should show 2.7.0+.
 - Network reachability: Caddy → certctl-server's HTTPS listener (port
@@ -149,7 +158,7 @@ psql -c "SELECT actor, action, resource_id FROM audit_events
  legitimately high throughput.
 - **Caddy logs `urn:ietf:params:acme:error:rejectedIdentifier`** →
  the SAN list includes an identifier the certctl profile policy
-  rejects. Cross-reference [`docs/acme-server.md` § Troubleshooting](./acme-server.md#certificate-readyfalse-with-rejectedidentifier).
+  rejects. Cross-reference [`docs/acme-server.md` § Troubleshooting](../reference/protocols/acme-server.md#certificate-readyfalse-with-rejectedidentifier).
 - **`badNonce` in Caddy logs** → clock skew or multi-replica certctl
  without sticky sessions; same fix as the cert-manager walkthrough.

@@ -165,8 +174,8 @@ rm -rf ~/.local/share/caddy/certificates/certctl.example.com-*

 ## See also

-- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
-- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
+- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
+- [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md) —
  K8s-native equivalent.
 - [Caddy upstream ACME docs](https://caddyserver.com/docs/automatic-https#acme-issuer)
  — verify behavior pinned here against Caddy 2.7.x semantics.
@@ -1,11 +1,22 @@
 # cert-manager Integration Walkthrough

+> Last reviewed: 2026-05-05
+
+> **Use this walkthrough when** you're already running cert-manager
+> 1.15+ in Kubernetes and want it to issue certs from certctl (your
+> internal CA, your private PKI, or a local sub-CA chained under an
+> enterprise root) via the standard ACME `ClusterIssuer` model. If
+> you want certctl to coexist with cert-manager rather than replace
+> its issuer backend, see
+> [`docs/migration/cert-manager-coexistence.md`](cert-manager-coexistence.md)
+> instead.
+
 End-to-end recipe for issuing certs from a certctl-server deployment
 through cert-manager 1.15+. Target audience: Kubernetes operator who
 has never deployed certctl before and wants a working
 `Certificate` → `Secret` flow on their cluster in under 30 minutes.

-The Phase 5 integration test (`make acme-cert-manager-test`) automates
+The cert-manager integration test (`make acme-cert-manager-test`) automates
 exactly the recipe below. The YAML snippets in this doc are byte-equal
 to the files under `deploy/test/acme-integration/` — re-running the
 test from a fresh clone produces the same results documented here.
@@ -13,7 +24,7 @@ test from a fresh clone produces the same results documented here.
 ## Prereqs

 - A Kubernetes cluster (kind / k3d / EKS / GKE / AKS / on-prem). For
-  local trial, `kind v0.20+` works exactly the way the Phase 5 test
+  local trial, `kind v0.20+` works exactly the way the integration test
  uses it. The kind config lives at
  [`deploy/test/acme-integration/kind-config.yaml`](../deploy/test/acme-integration/kind-config.yaml).
 - `kubectl` v1.27+, `helm` v3.13+.
@@ -26,7 +37,7 @@ test from a fresh clone produces the same results documented here.

  which is the same idempotent installer the integration test uses.
 - A certctl Helm chart published to a registry your cluster can pull
-  from. The Phase 5 test uses an `image.tag=test` placeholder; production
+  from. The integration test uses an `image.tag=test` placeholder; production
  deployments use the actual image tag for your release line.

 ## Step 1 — Deploy certctl-server
@@ -64,7 +75,7 @@ curl -X POST https://certctl-test.default.svc.cluster.local:8443/api/profiles \
 ```

 Auth-mode tradeoffs are covered in
-[`docs/acme-server.md` § Auth-mode decision tree](./acme-server.md#auth-mode-decision-tree).
+[`docs/acme-server.md` § Auth-mode decision tree](../reference/protocols/acme-server.md#auth-mode-decision-tree).
 For first-time deployments, `trust_authenticated` is the right default.

 ## Step 3 — Capture the certctl bootstrap CA
@@ -83,12 +94,12 @@ cat deploy/test/certs/ca.crt | base64 -w0
 Capture the output for Step 4. This is **the** single biggest first-
 time-deploy footgun on the cert-manager integration path. The reference
 recipe lives in
-[`docs/acme-server.md` § TLS trust bootstrap](./acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).
+[`docs/acme-server.md` § TLS trust bootstrap](../reference/protocols/acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).

 ## Step 4 — Apply the ClusterIssuer

 ```yaml
-# Phase 5 — sample ClusterIssuer for the certctl trust_authenticated
+# sample ClusterIssuer for the certctl trust_authenticated
 # auth mode (RFC 8555 §6 + certctl auth_mode=trust_authenticated, where
 # the JWS-authenticated ACME account is trusted to issue any identifier
 # the profile policy permits — no per-identifier ownership challenges).
@@ -158,7 +169,7 @@ HTTP-01 to work.
 ## Step 5 — Apply the Certificate

 ```yaml
-# Phase 5 — Certificate resource the integration test applies and
+# Certificate resource the integration test applies and
 # waits for. The certctl-test-trust ClusterIssuer (trust_authenticated
 # mode) issues the cert without any solver round-trip; the resulting
 # Secret 'test-com-tls' is asserted to carry tls.crt + tls.key.
@@ -218,7 +229,7 @@ psql -c "SELECT created_at, action, resource_type, resource_id
 ## Common failure modes

 These are operator-side; full troubleshooting reference is in
-[`docs/acme-server.md` § Troubleshooting](./acme-server.md#troubleshooting).
+[`docs/acme-server.md` § Troubleshooting](../reference/protocols/acme-server.md#troubleshooting).

 - `400 Bad Request: badNonce` → clock skew between certctl-server and
  cert-manager, or a multi-replica certctl fleet without sticky
@@ -243,12 +254,12 @@ helm uninstall certctl-test

 ## See also

-- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
-- [`docs/acme-server-threat-model.md`](./acme-server-threat-model.md) —
+- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
+- [`docs/acme-server-threat-model.md`](../reference/protocols/acme-server-threat-model.md) —
  security posture.
-- [`docs/acme-caddy-walkthrough.md`](./acme-caddy-walkthrough.md) —
+- [`docs/acme-caddy-walkthrough.md`](./acme-from-caddy.md) —
  Caddy-side recipe.
-- [`docs/acme-traefik-walkthrough.md`](./acme-traefik-walkthrough.md) —
+- [`docs/acme-traefik-walkthrough.md`](./acme-from-traefik.md) —
  Traefik-side recipe.
 - [`deploy/test/acme-integration/`](../deploy/test/acme-integration/) —
-  Phase 5 integration test (the same recipe, automated).
+  cert-manager integration test (the same recipe, automated).
@@ -1,5 +1,14 @@
 # Traefik Integration Walkthrough

+> Last reviewed: 2026-05-05
+
+> **Use this walkthrough when** you're already running Traefik 3.0+
+> (Kubernetes or VM) and want it to ACME-issue from certctl (your
+> internal CA, your private PKI, or a local sub-CA chained under an
+> enterprise root) instead of Let's Encrypt. The Traefik static config
+> changes are minimal; the load-bearing piece is `serversTransport.rootCAs`
+> so Traefik trusts certctl's bootstrap CA on every outbound ACME call.
+
 End-to-end recipe for issuing certs from a certctl-server deployment
 through Traefik 3.0+. Target audience: operator running Traefik (in
 Kubernetes or on a VM) who wants to use certctl as their ACME source
@@ -10,7 +19,7 @@ of truth instead of Let's Encrypt.
 - A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
  and at least one profile whose `acme_auth_mode` is set. Profile
  setup is identical to the cert-manager walkthrough — see
-  [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
+  [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md)
  Step 2.
 - Traefik 3.0+ (the v2 API surface for ACME is also supported but the
  `serversTransport.rootCAs` reference below is v3-shaped).
@@ -191,8 +200,8 @@ sudo rm /etc/traefik/acme-certctl.json

 ## See also

-- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
-- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
+- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
+- [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md) —
  cert-manager equivalent.
 - [Traefik upstream ACME docs](https://doc.traefik.io/traefik/https/acme/#caserver) —
  verify behavior pinned here against Traefik 3.0+ semantics.
@@ -0,0 +1,294 @@
+# Migrating API keys to RBAC (v2.0.x → v2.1.0)
+
+> Last reviewed: 2026-05-09
+
+This is the upgrade guide for an existing certctl deployment moving
+from v2.0.x's "every API key is admin or not" model to v2.1.0's
+RBAC primitive. Everything keeps working through the upgrade - the
+migration backfills every existing API key to the
+`r-admin` role on first boot, so the pre-existing automation that
+was using those keys does not change behavior. **However**, most
+keys do not need full admin power; this guide walks the operator
+through the post-upgrade scope-down flow.
+
+## ⚠️ SECURITY: AUDIT YOUR API KEYS
+
+v2.1.0 maps **every** existing `CERTCTL_API_KEYS_NAMED` entry
+(and every legacy `CERTCTL_AUTH_SECRET`-synthesized key) to the
+`r-admin` role on the first boot after migration 000029 applies.
+This is the safe-for-back-compat default - your CI / agents / scripts
+keep working without changes - but if you don't downgrade keys, every
+key in your fleet has full admin permissions including bulk-revoke,
+CRL admin, and CA hierarchy management.
+
+**Run the scope-down flow before tagging the next release.** The
+release notes for v2.1.0 lead with this callout for a reason.
+
+## Upgrade flow
+
+### 1. Apply the migration
+
+The migration runner is idempotent. Re-applying is a no-op if the
+schema is already at the target version. The five RBAC migrations
+that ship in v2.1.0:
+
+| Migration | What it does |
+|---|---|
+| `000029_rbac.up.sql` | Creates `tenants`, `roles`, `permissions`, `role_permissions`, `actor_roles`. Seeds 7 default roles + 33-permission catalogue + the synthetic `actor-demo-anon` admin grant. Backfills every named API key into `actor_roles` with the `r-admin` role. |
+| `000030_rbac_admin_perms.up.sql` | Seeds 5 admin-only fine-grained permissions (`cert.bulk_revoke`, `crl.admin`, `scep.admin`, `est.admin`, `ca.hierarchy.manage`) into `r-admin` only. |
+| `000031_api_keys.up.sql` | Creates the `api_keys` table for runtime-minted keys (day-0 bootstrap path). |
+| `000032_audit_category.up.sql` | Adds `event_category` column to `audit_events` with the closed enum (`cert_lifecycle` / `auth` / `config`). |
+| `000033_approval_kinds.up.sql` | Adds `approval_kind` + `payload` to `issuance_approval_requests` for the approval-bypass closure. |
+
+The v2.1.0 server applies these on first boot. No operator
+action is required other than running the upgrade.
+
+### 2. Verify the backfill landed
+
+```bash
+# Inspect the seeded actor_roles rows. You should see one row per
+# entry in CERTCTL_API_KEYS_NAMED (Admin=true keys → r-admin,
+# Admin=false keys → r-viewer) plus the seeded actor-demo-anon
+# admin row.
+psql -d certctl -c "SELECT actor_id, role_id, granted_by, granted_at FROM actor_roles ORDER BY granted_at;"
+```
+
+If the table is empty, the boot-loader hook in
+`cmd/server/auth_backfill.go::backfillNamedKeyActorRoles` did not
+run; re-check that `CERTCTL_AUTH_TYPE` is `api-key` (the boot
+hook is gated on `cfg.Auth.Type != none`).
+
+### 3. List + scope-down keys
+
+The `certctl-cli` ships a four-mode scope-down command. Pick the
+mode that matches your fleet size + automation posture.
+
+#### Interactive walk
+
+```bash
+certctl-cli auth keys scope-down
+```
+
+Walks every actor (skips the synthetic `actor-demo-anon`) and
+prompts for a target role. Empty input keeps the existing role.
+Type one of `admin`, `operator`, `viewer`, `agent`, `mcp`, `cli`,
+`auditor` to replace.
+
+#### Non-interactive JSON config (Helm post-upgrade hook)
+
+```bash
+cat > scope-down.json <<EOF
+{
+  "ci-bot":         "operator",
+  "agent-prod-1":   "agent",
+  "agent-prod-2":   "agent",
+  "monitoring-bot": "viewer",
+  "compliance-bot": "auditor"
+}
+EOF
+
+certctl-cli auth keys scope-down --non-interactive ./scope-down.json
+```
+
+An empty role value revokes every current grant for that actor
+without granting a replacement; assign roles selectively afterwards
+with `certctl-cli auth keys assign`.
+
+#### Audit-driven suggestion
+
+```bash
+# Preview suggestions based on the last 30 days of audit history
+certctl-cli auth keys scope-down --suggest
+
+# Apply the suggestions
+certctl-cli auth keys scope-down --suggest --apply
+```
+
+The classifier (pure function in `internal/cli/auth_scope_down.go::SuggestRoleFromAuditEvents`)
+walks the actor's audit events and emits one of:
+
+| Suggestion | Trigger |
+|---|---|
+| `admin` | Any auth.role.* / auth.key.* / ca.hierarchy.* / *.bulk_revoke / *.admin action |
+| `mcp` | All observed actions are MCP-shaped (`mcp.*`) |
+| `viewer` | All observed actions are read-only (`*.read` or `*.list`) |
+| `agent` | All observed actions are agent-shaped (`agent.*`, `cert.read`, `cert.issue`) |
+| `operator` | Cert / profile / target lifecycle mutations without admin signals |
+
+The classifier is conservative - when in doubt, it prefers the
+narrower role. The operator confirms each suggestion before any
+mutation lands (unless `--apply` is set).
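+
+The table above can be sketched as a small shell function. This is
+an illustration, not the Go implementation in
+`SuggestRoleFromAuditEvents`: the function name `suggest_role` is
+hypothetical, the evaluation order (top of the table first) is an
+assumption, and so is the choice to return no suggestion for an actor
+with no audit history.
+
+```bash
+# Print a suggested role for the audit actions passed as arguments.
+suggest_role() {
+  # No observed actions: this sketch declines to guess (assumption).
+  [ "$#" -gt 0 ] || return 1
+
+  all_mcp=1
+  all_read=1
+  all_agent=1
+  for action in "$@"; do
+    # A single admin-shaped action is enough to suggest admin.
+    case "$action" in
+      auth.role.*|auth.key.*|ca.hierarchy.*|*.bulk_revoke|*.admin)
+        echo admin
+        return 0 ;;
+    esac
+    # The remaining rows require every action to match the pattern.
+    case "$action" in mcp.*) ;; *) all_mcp=0 ;; esac
+    case "$action" in *.read|*.list) ;; *) all_read=0 ;; esac
+    case "$action" in agent.*|cert.read|cert.issue) ;; *) all_agent=0 ;; esac
+  done
+
+  if [ "$all_mcp" -eq 1 ]; then echo mcp
+  elif [ "$all_read" -eq 1 ]; then echo viewer
+  elif [ "$all_agent" -eq 1 ]; then echo agent
+  else echo operator
+  fi
+}
+```
+
+For example, `suggest_role cert.read cert.list` prints `viewer`,
+while adding `cert.bulk_revoke` to the same list prints `admin`.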
+
+### 4. Mint a fresh admin via bootstrap (optional, for fresh deployments)
+
+If you're standing up a fresh deployment instead of upgrading an
+existing one, the bootstrap path mints the first admin key without
+needing the operator to know the env-var format:
+
+```bash
+# Set the bootstrap token in the server environment.
+export CERTCTL_BOOTSTRAP_TOKEN=$(openssl rand -hex 32)
+
+# Boot the server. Logs include "bootstrap endpoint enabled".
+docker compose up -d
+
+# Mint the first admin key. URL must hold the server's base URL.
+curl -X POST "$URL/api/v1/auth/bootstrap" \
+  -H 'Content-Type: application/json' \
+  -d "{\"token\":\"$CERTCTL_BOOTSTRAP_TOKEN\",\"actor_name\":\"first-admin\"}"
+```
+
+The response carries the plaintext `key_value` once. Capture it
+and use it as the Bearer token for subsequent calls. Subsequent
+bootstrap calls return HTTP 410 Gone.
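+
+Because the key is shown only once, it helps to capture it straight
+into a variable rather than copy it from the terminal. A minimal
+sketch, assuming the response is a flat JSON object whose
+`key_value` string contains no escaped quotes; the helper name
+`extract_key_value` and the variable `CERTCTL_API_KEY` are
+hypothetical, and `jq -r .key_value` is the sturdier choice where
+`jq` is installed.
+
+```bash
+# Read a JSON response on stdin and print the value of "key_value".
+extract_key_value() {
+  sed -n 's/.*"key_value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
+}
+
+# Usage: pipe the bootstrap call shown above through the helper.
+# CERTCTL_API_KEY=$(curl -sS -X POST "$URL/api/v1/auth/bootstrap" \
+#   -H 'Content-Type: application/json' \
+#   -d "{\"token\":\"$CERTCTL_BOOTSTRAP_TOKEN\",\"actor_name\":\"first-admin\"}" \
+#   | extract_key_value)
+# curl -sS -H "Authorization: Bearer $CERTCTL_API_KEY" \
+#   "$URL/api/v1/audit?category=auth"
+```
+
+An empty `CERTCTL_API_KEY` after the call means the field was not
+found, for example because bootstrap had already been used.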
+
+See [`docs/operator/rbac.md`](../operator/rbac.md) for the full
+bootstrap flow + the threat model.
+
+## What changes for code that called `IsAdmin`
+
+In v2.0.x, the five admin handlers checked `auth.IsAdmin(ctx)`
+directly in the body. v2.1.0 moved those checks to
+the router via the `auth.RequirePermission` middleware (wrapped
+through the `rbacGate` helper in
+`internal/api/router/router.go`). The behavior contract is
+unchanged: `r-admin`-roled callers reach the handler, anyone else
+gets HTTP 403 BEFORE the body runs.
+
+If your code consumed `auth.IsAdmin` directly (it shouldn't - 
+the helper is internal), the new convention is:
+
+1. Wrap the route in `rbacGate(reg.Checker, "<perm>", handler)`
+   in `router.go`.
+2. Add the perm to `migrations/000030_rbac_admin_perms.up.sql`
+   (or `migrations/000029_rbac.up.sql`'s catalogue).
+3. Grant the perm to the right default roles.
+
+The five admin-only fine-grained perms stay on `r-admin` only by
+default. Operators delegate by creating custom roles with the
+specific perm.
+
+## Helm-specific upgrade
+
+The certctl Helm chart applies migrations on container start via
+the standard migrations runner. No chart changes are required;
+the `helm upgrade` command runs identically:
+
+```bash
+helm upgrade certctl certctl/certctl \
+  --version <new-version> \
+  --reuse-values
+```
+
+Post-upgrade, the boot loader runs the named-key actor-role
+backfill against the `CERTCTL_API_KEYS_NAMED` env var injected
+into the deployment. The "AUDIT YOUR API KEYS" callout applies - 
+add a post-upgrade Job to your release pipeline that runs
+`certctl-cli auth keys scope-down --non-interactive` against a
+checked-in JSON config, so the role narrowing is deterministic
+across upgrade rollouts.
+
+Example post-upgrade Job:
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: certctl-scope-down
+spec:
+  template:
+    spec:
+      containers:
+      - name: scope-down
+        image: ghcr.io/certctl-io/certctl-cli:<tag>
+        command:
+          - certctl-cli
+          - auth
+          - keys
+          - scope-down
+          - --non-interactive
+          - /config/scope-down.json
+        envFrom:
+          - secretRef:
+              name: certctl-cli-credentials
+        volumeMounts:
+          - name: scope-down-config
+            mountPath: /config
+      volumes:
+        - name: scope-down-config
+          configMap:
+            name: certctl-scope-down-config
+      restartPolicy: OnFailure
+```
+
+The ConfigMap holds the `{actor_id: role_id}` map; the Secret
+holds the API key the Job uses to call `/v1/auth/keys/.../roles`.
+
+## Docker Compose-specific upgrade
+
+For `deploy/docker-compose.yml` deployments:
+
+1. Pull the new images: `docker compose pull`
+2. Verify your `CERTCTL_AUTH_TYPE` value before restarting. If it
+   was `none` (the demo path), the post-upgrade server will boot
+   in demo mode again - the synthetic `actor-demo-anon` admin
+   covers every request, so no scope-down is meaningful. If you're
+   moving from `none` to `api-key` mode, set
+   `CERTCTL_API_KEYS_NAMED` first, then restart.
+3. `docker compose up -d` to apply.
+4. `docker compose logs certctl-server | grep -i 'loaded persisted api_keys'`
+   to verify the boot loader ran. The first-boot log line includes
+   the count of keys loaded into the runtime keystore.
+5. Run `certctl-cli auth keys scope-down` against the running
+   server.
+
+The five examples in `examples/` (acme-nginx, private-ca-traefik,
+step-ca-haproxy, multi-issuer, acme-wildcard-dns01) all run in
+demo mode (`CERTCTL_AUTH_TYPE=none`) and are unaffected by the
+RBAC migration - the synthetic actor-demo-anon admin grant covers
+every request.
+
+## Verifying the upgrade landed
+
+After the scope-down flow completes:
+
+1. `certctl-cli auth me` while authenticated as each named key
+   confirms the right `effective_permissions` for that role.
+2. `psql -d certctl -c "SELECT actor_id, array_agg(role_id ORDER BY role_id) FROM actor_roles GROUP BY actor_id;"`
+   gives the full picture in one query.
+3. The audit trail
+   (`GET /api/v1/audit?category=auth`)
+   shows the `auth.role.assign` and `auth.role.revoke` rows for
+   every change you made - confirm via the GUI's
+   `/audit?category=auth` view.
+4. Read the updated [`docs/operator/rbac.md`](../operator/rbac.md)
+   for day-2 RBAC management.
+
+## Rollback
+
+If the upgrade goes wrong, the down migrations exist in lockstep:
+
+```bash
+# Roll back via your migration runner (golang-migrate, Atlas, etc.).
+# Migrations 000029-000033 each have a .down.sql that reverses the
+# .up.sql. Down migrations are destructive on data added by the up
+# migration (api_keys rows, role grants on actors, profile-edit
+# approvals); take a backup first.
+```
+
+After rollback, the v2.0.x binary works against the v2.0.x
+schema unchanged. The operator's API keys still authenticate (the
+in-memory hash table is rebuilt from `CERTCTL_API_KEYS_NAMED` on
+boot regardless of schema version).
+
+## Cross-references
+
+- [`docs/operator/rbac.md`](../operator/rbac.md) - the operator
+  how-to for the new RBAC primitive
+- [`docs/operator/auth-threat-model.md`](../operator/auth-threat-model.md) - 
+  what the new controls defend against
+- [`docs/reference/profiles.md`](../reference/profiles.md) - the
+  approval-bypass closure on `RequiresApproval` profile edits
+- [`docs/operator/security.md`](../operator/security.md) - the
+  full security posture
+- `CHANGELOG.md` - the v2.1.0 release notes lead with this guide
@@ -1,5 +1,7 @@
 # certctl for cert-manager Users

+> Last reviewed: 2026-05-05
+
 You run cert-manager inside Kubernetes and it works well for in-cluster certificates. But you also have VMs, bare-metal servers, network appliances, and legacy systems outside the cluster. cert-manager can't reach those. This guide shows how certctl complements cert-manager to give you unified certificate visibility and automation across your entire infrastructure.

 ## Not a Replacement
@@ -96,7 +98,7 @@ Go to **Policies** → **+ New Policy** to create enforcement rules:
 - **Severity:** `high`
 - **Config:** set your enforcement parameters

-Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other compliance rules across your fleet.
+Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other policy rules across your fleet.

 ### 6. View Unified Inventory

@@ -139,7 +141,7 @@ For now: cert-manager handles Kubernetes, certctl handles everything else. They

 ## Next Steps

-1. Run through the [Quick Start](./quickstart.md) for a 5-minute demo
-2. Try the [Multi-Issuer example](../examples/multi-issuer/multi-issuer.md) — manages public and internal certs from one dashboard
-3. Explore [Architecture](./architecture.md#agents) for deployment patterns
+1. Run through the [Quick Start](../getting-started/quickstart.md) for a 5-minute demo
+2. Try the [Multi-Issuer example](../../examples/multi-issuer/multi-issuer.md) — manages public and internal certs from one dashboard
+3. Explore [Architecture](../reference/architecture.md#agents) for deployment patterns
 4. Check the [Helm Chart](../deploy/helm/certctl/) for production Kubernetes deployment
@@ -1,5 +1,7 @@
 # Migrate from acme.sh to certctl

+> Last reviewed: 2026-05-05
+
 You use acme.sh to automate Let's Encrypt renewal across multiple servers. It works — but without centralized visibility, deployment verification, or policy enforcement.

 This guide walks through moving your acme.sh workload to certctl while keeping your existing DNS provider setup.
@@ -269,7 +271,7 @@ certctl automatically falls back to DNS-01 if the CA doesn't support dns-persist

 ## Next Steps

- Try the [Wildcard DNS-01 example](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md) — a working docker-compose with Cloudflare hooks you can adapt for your DNS provider
- See [Connector Reference](connectors.md) for advanced ACME options (EAB, ARI, custom timeouts)
+- Try the [Wildcard DNS-01 example](../../examples/acme-wildcard-dns01/acme-wildcard-dns01.md) — a working docker-compose with Cloudflare hooks you can adapt for your DNS provider
+- See [Connector Reference](../reference/connectors/index.md) for advanced ACME options (EAB, ARI, custom timeouts)
 - See [Discovery Guide](concepts.md#certificate-discovery) for managing discovered certificates at scale
- See all [Deployment Examples](./examples.md) for other scenarios (ACME+NGINX, private CA, step-ca, multi-issuer)
+- See all [Deployment Examples](../getting-started/examples.md) for other scenarios (ACME+NGINX, private CA, step-ca, multi-issuer)
@@ -1,5 +1,7 @@
 # Migrating from Certbot to certctl

+> Last reviewed: 2026-05-05
+
 You have 50 Let's Encrypt certificates across 10 servers, managed by a mix of Certbot cron jobs and manual renewals. Certbot handles issuance, but you lack inventory visibility, centralized alerting, and audit trails. This guide walks you through moving to certctl while keeping your existing certificates and ACME account.

 ## Why Migrate
@@ -167,7 +169,7 @@ certctl will stop renewing that cert when the policy is disabled. Certbot resume

 ## Next Steps

- Try the [ACME + NGINX example](../examples/acme-nginx/acme-nginx.md) — a working docker-compose you can run locally before deploying to production
- Review the [Concepts Guide](./concepts.md) for terminology (profiles, policies, agents, jobs)
- Explore [Network Discovery](./quickstart.md#network-discovery-agentless) to find certificates you didn't know about
- See all [Deployment Examples](./examples.md) for other scenarios (wildcard DNS-01, private CA, step-ca, multi-issuer)
+- Try the [ACME + NGINX example](../../examples/acme-nginx/acme-nginx.md) — a working docker-compose you can run locally before deploying to production
+- Review the [Concepts Guide](../getting-started/concepts.md) for terminology (profiles, policies, agents, jobs)
+- Explore [Network Discovery](../getting-started/quickstart.md#network-discovery-agentless) to find certificates you didn't know about
+- See all [Deployment Examples](../getting-started/examples.md) for other scenarios (wildcard DNS-01, private CA, step-ca, multi-issuer)
@@ -0,0 +1,261 @@
+# Enable OIDC SSO
+
+> Last reviewed: 2026-05-10
+
+This guide walks an operator already running certctl with API-key auth + RBAC through enabling OIDC SSO. The path is additive: API-key auth keeps working unchanged; OIDC sits alongside as a second authentication surface for human users.
+
+If you are upgrading from a pre-RBAC (v2.0.x) deployment, finish [`api-keys-to-rbac.md`](api-keys-to-rbac.md) first. If you have not deployed certctl at all, start with [`getting-started/quickstart.md`](../getting-started/quickstart.md). For the canonical mental model + per-flow threat coverage, see [`security.md`](../operator/security.md) and [`auth-threat-model.md`](../operator/auth-threat-model.md).
+
+## What "enable OIDC" gives you
+
+After this migration:
+
+- Human operators can log in via the OIDC button on the certctl login page (one button per configured IdP).
+- The IdP authenticates the user; certctl validates the returned ID token, mints a session cookie, and redirects to the dashboard.
+- The IdP group → certctl role mappings are operator-configured (e.g. `engineering@example.com` → `r-operator`).
+- Every login emits an audit row (`auth.oidc_login_succeeded`) attributing the action to the federated user, NOT to a shared API key.
+- The first user from a configured admin group (when `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` is set) becomes admin per tenant; one-shot per the admin-existence probe.
+
+What does NOT change:
+
+- API keys keep working. Existing automation continues to authenticate via `Authorization: Bearer` exactly as before.
+- The break-glass admin path stays default-OFF.
+- The auditor split + approval workflow + RBAC primitive are unchanged.
+
+## Pre-requisites
+
+**On certctl side:**
+
+- Server build ≥ v2.1.0. Confirm via `curl https://<your-host>:8443/api/v1/version`.
+- `CERTCTL_CONFIG_ENCRYPTION_KEY` set in the server environment. This is the passphrase that encrypts the OIDC `client_secret` at rest. Use a stable, secrets-manager-stored value at least 32 random bytes long. **The server refuses to start if the key is missing AND any source='database' rows already exist** (CWE-311 fail-closed gate). Set this before doing anything else.
+- An admin actor available to drive the configuration. The actor needs the `auth.oidc.create` + `auth.oidc.edit` permissions; `r-admin` carries both by default. Get one via the day-0 bootstrap path if you don't have one yet.
+- HTTPS-only control plane (post-v2.2 milestone — this is the default). The OIDC redirect URI MUST be `https://`.
+
+**On IdP side:**
+
+- A Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace tenant where you can register an OIDC application. Free dev tiers work for evaluation. See the per-IdP runbook at [`oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md).
+- Network reachability from certctl-server to the IdP's `/.well-known/openid-configuration` discovery endpoint. The certctl service fetches discovery + JWKS at provider creation and at every `RefreshKeys` call.
+
+## Step-by-step
+
+### 1. Pin `CERTCTL_CONFIG_ENCRYPTION_KEY`
+
+If your deployment already has it set (the CWE-311 fail-closed gate enforces this for any source='database' issuer/target row), skip this step. If you don't:
+
+```bash
+# Generate a 32-byte random key + base64-encode it.
+openssl rand -base64 32 > /etc/certctl/config-encryption-key
+chmod 600 /etc/certctl/config-encryption-key
+```
+
+Then make the server consume it at boot:
+
+```bash
+# In your environment, systemd unit, k8s Secret, etc.
+export CERTCTL_CONFIG_ENCRYPTION_KEY="$(cat /etc/certctl/config-encryption-key)"
+```
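
Before restarting, it can be worth confirming that the key file really holds at least 32 bytes of material. The sketch below assumes the file was produced by the `openssl rand -base64 32` command shown above and that GNU `base64` is on the PATH; `key_len_bytes` is a hypothetical helper name, not something certctl ships.

```shell
# Hypothetical helper: print how many raw bytes a base64-encoded key file
# decodes to. Assumes GNU coreutils `base64 -d`; older macOS releases
# spell the flag `base64 -D`.
key_len_bytes() {
  base64 -d < "$1" | wc -c | tr -d '[:space:]'
}

# Usage (path taken from the step above):
#   [ "$(key_len_bytes /etc/certctl/config-encryption-key)" -ge 32 ] \
#     || echo "config encryption key is shorter than 32 bytes" >&2
```

This only measures how much random material the file carries; it says nothing about how the server derives its encryption key from the passphrase.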
+
+Restart the server. Confirm the boot log does NOT show the `ErrEncryptionKeyRequired` warning. If it does, the server refuses to start because there's pre-existing source='database' material that needs to be re-sealed; see [`docs/operator/security.md`](../operator/security.md) for the re-encryption flow.
+
+### 2. Pick an IdP runbook + complete the IdP-side configuration
+
+Pick the runbook for your IdP and do EVERYTHING in its IdP-side section. The runbooks are at [`docs/operator/oidc-runbooks/`](../operator/oidc-runbooks/index.md). What you need from the runbook before continuing here:
+
+- The IdP's discovery URL (the `iss` value certctl will validate against).
+- An OIDC client ID + client secret. Save the secret; you'll paste it into certctl in step 3.
+- At least one IdP group with the users who should be allowed to log in. The runbook walks the group-claim mapper config.
+- The IdP-side group claim shape — most IdPs emit `string-array` under a `groups` key, but Auth0 uses namespaced URL keys (`https://your-namespace/groups`) and Entra ID emits group OBJECT IDs (GUIDs) instead of names. The runbook calls out the per-IdP shape.
+
+### 3. Configure the certctl-side OIDC provider
+
+Via the GUI (recommended for first-time setup):
+
+1. Sign in as an admin actor.
+2. Navigate to **Auth → OIDC Providers** in the sidebar.
+3. Click **Configure provider**.
+4. Fill in the form using the values from step 2's runbook.
+5. Click **Save**.
+
+If the discovery doc fetch fails, the modal surfaces the error inline. Most-common cause: a typo in the issuer URL.
+
+Or via the API (an MCP equivalent is noted below):
+
+```bash
+curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
+  -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "Keycloak",
+    "issuer_url": "https://keycloak.example.com/realms/certctl",
+    "client_id": "certctl",
+    "client_secret": "<paste-the-secret>",
+    "redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
+    "groups_claim_path": "groups",
+    "groups_claim_format": "string-array",
+    "scopes": ["openid", "profile", "email"],
+    "iat_window_seconds": 300,
+    "jwks_cache_ttl_seconds": 3600
+  }'
+```
+
+The MCP equivalent (`certctl_auth_create_oidc_provider`) accepts the same JSON shape.
+
+### 4. Add the group → role mappings
+
+Empty mapping list = nobody can log in via this provider (the fail-closed contract; pinned by `ErrGroupsUnmapped`). Add at least one mapping BEFORE announcing the SSO endpoint to users.
+
+Via the GUI: **Auth → OIDC Providers → <provider> → Group → role mappings → Add**.
+
+Via the API:
+
+```bash
+curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/group-mappings \
+  -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "provider_id": "<provider-id-from-step-3>",
+    "group_name": "engineering@example.com",
+    "role_id": "r-operator"
+  }'
+```
+
+A typical setup adds two or three mappings: `engineers → r-operator`, `viewers → r-viewer`, optionally `admins → r-admin`. For Entra ID, use group object IDs (GUIDs) NOT names; for Auth0, use the bare group name from inside the namespaced claim array.
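
When there are several mappings to add, the API call above can be wrapped in a small loop. This is a minimal sketch, not a certctl-provided tool: `mapping_payload` and `add_mappings` are hypothetical helpers, `CERTCTL_HOST` and `PROVIDER_ID` are placeholder variables you would set yourself, and the JSON field names are copied from the request shown above.

```shell
# Hypothetical helper: build the JSON body for one group -> role mapping.
# Values are assumed to contain no double quotes or backslashes, because
# no JSON escaping is performed here.
mapping_payload() {
  printf '{"provider_id":"%s","group_name":"%s","role_id":"%s"}' "$1" "$2" "$3"
}

# Hypothetical helper: read "<group> <role>" pairs from stdin and POST each
# one. Stops at the first failed request (-f makes curl fail on HTTP errors).
add_mappings() {
  while read -r group role; do
    curl -fsS -X POST "https://${CERTCTL_HOST}:8443/api/v1/auth/oidc/group-mappings" \
      -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
      -H "Content-Type: application/json" \
      -d "$(mapping_payload "$PROVIDER_ID" "$group" "$role")" || return 1
  done
}

# Usage:
#   printf '%s\n' 'engineers r-operator' 'viewers r-viewer' | add_mappings
```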
+
+### 5. (Optional) Configure first-admin bootstrap
+
+If your deployment has no admin actor yet AND you want the first OIDC-authenticated user from a specific group to become admin (instead of using the env-var-token bootstrap path), set:
+
+```bash
+export CERTCTL_BOOTSTRAP_ADMIN_GROUPS=admins
+export CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID=<provider-id-from-step-3>
+```
+
+Restart the server. The first user with the `admins` group claim from that provider becomes admin on login per tenant. Subsequent logins go through normal group-role mapping. Every grant emits an audit row (`bootstrap.oidc_first_admin`).
+
+If you already have an admin actor (likely — you needed one to run step 3), the bootstrap hook silently falls through to normal mapping; no harm done. The probe is one-shot per tenant and can't double-grant.
+
+### 6. Verify with a single test user
+
+Before announcing the SSO endpoint to your users, verify the full login flow with a test user from your IdP:
+
+1. Open `https://<your-certctl-host>:8443/login` in a fresh incognito window.
+2. The page should render `Sign in with <provider>` button(s) above the API-key form. If not, check that `getAuthInfo` is returning the `oidc_providers` field — `curl https://<your-host>:8443/api/v1/auth/info` should show the configured provider(s).
+3. Click the provider button. The browser redirects to the IdP, you authenticate, and the IdP redirects back. You should land on the certctl dashboard.
+4. Navigate to **Auth → Sessions**. You should see a row with your own actor ID and the current timestamp.
+5. Confirm the audit row:
+
+   ```bash
+   curl "https://<your-host>:8443/api/v1/audit?category=auth" \
+     -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
+     | jq '.events[] | select(.action == "auth.oidc_login_succeeded")'
+   ```
+
+   You should see a row attributed to the federated user with `details.provider_id` matching your configuration.
+
+If any step fails, see the **Troubleshooting** section below.
+
+### 7. Announce the SSO endpoint
+
+Once step 6 passes, the SSO endpoint is operational. Tell your users to log in via `https://<your-host>:8443/login` and click the provider button. API-key auth continues to work for automation; the two paths coexist.
+
+Optional GUI hardening:
+
+- If you want the API-key form hidden once OIDC is configured, add a frontend feature flag in a follow-on commit. Default behavior keeps both paths visible (the API-key form stays for break-glass + Bearer-mode deploys).
+- If you want to revoke a user's session immediately (e.g. an employee left), use **Auth → Sessions → All actors (admin) → <user> → Revoke**. The next request from that user's browser fails 401.
+
+## Rollback
+
+If you need to disable OIDC:
+
+1. Delete every group-role mapping for the provider:
+   ```bash
+   # GUI: Auth → OIDC Providers → <provider> → Group → role mappings → Remove (each)
+   ```
+2. Delete the OIDC provider:
+   ```bash
+   # GUI: Auth → OIDC Providers → <provider> → Delete (type-confirm-name dialog)
+   ```
+   The server returns HTTP 409 if any user has an authenticated session minted via this provider; revoke those sessions first.
+3. The `Sign in with <provider>` button disappears from the login page on the next `getAuthInfo` round-trip (typically the next page load).
+4. Existing sessions continue to work until idle/absolute expiry. To force-revoke them, **Auth → Sessions → All actors (admin) → revoke each row**.
+
+API-key auth continues to work throughout this rollback; you do not need to re-bootstrap or change any other configuration.
+
+## Troubleshooting
+
+**"Discovery doc fetch failed" at provider creation.**
+The most common cause is a typo in the issuer URL. Curl the URL manually:
+```bash
+curl -v https://<idp-host>/<path>/.well-known/openid-configuration
+```
+If that returns 404, fix the issuer URL.
+
+**"IdP downgrade-attack defense" rejected provider creation.**
+Your IdP advertises HS256/HS384/HS512 or `none` in `id_token_signing_alg_values_supported`. Configure the IdP to advertise only RS256 / RS512 / ES256 / ES384 / EdDSA before re-creating the provider in certctl. The relevant runbook section walks this.
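
To see which of the advertised algorithms would trip this check, the list can be filtered locally. A minimal sketch, assuming the rejected set is exactly the one named above (HS256/HS384/HS512 and `none`); `weak_algs` is a hypothetical helper and does not reproduce certctl's own validation code.

```shell
# Hypothetical helper: print every algorithm name from the argument list
# that falls in the rejected set described above (HMAC family or `none`).
weak_algs() {
  for alg in "$@"; do
    case "$alg" in
      HS256|HS384|HS512|none) printf '%s\n' "$alg" ;;
    esac
  done
}

# Usage, assuming curl and jq are installed and the placeholders are filled in:
#   weak_algs $(curl -fsS "https://<idp-host>/<path>/.well-known/openid-configuration" \
#     | jq -r '.id_token_signing_alg_values_supported[]')
```

Empty output means none of the advertised algorithms is in the rejected set.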
+
+**Login redirects to IdP, user authenticates, but the callback redirects back to `/login` with "no roles assigned".**
+The user authenticated successfully but their groups didn't match any configured mapping (`ErrGroupsUnmapped`). Check:
+- The user is a member of the IdP group you mapped.
+- The group-claim mapper is configured correctly at the IdP (the runbook walks per-IdP).
+- The group name in your certctl mapping exactly matches what the IdP emits — case-sensitive, no leading slash for Keycloak full-path-OFF.
+
+Decode the ID token at jwt.io against the IdP's JWKS to see exactly what's in the `groups` claim.
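
The payload can also be inspected locally without leaving the terminal. The sketch below only base64url-decodes the middle segment of the token; it does not verify the signature, so treat it as a debugging aid. `jwt_payload` is a hypothetical helper name, and GNU `base64` is assumed.

```shell
# Hypothetical helper: print the JSON payload of a JWT. Inspection only;
# the signature is NOT checked.
# Assumes GNU coreutils `base64 -d` (older macOS releases use `base64 -D`).
jwt_payload() {
  # Take the second dot-separated segment and map base64url to plain base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage, assuming the raw token is in $ID_TOKEN and jq is installed:
#   jwt_payload "$ID_TOKEN" | jq '.groups'
```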
+
+**`ErrIssuerMismatch` even though the discovery doc looks correct.**
+The `iss` claim in the ID token must match `OIDCProvider.IssuerURL` byte-for-byte. Some IdPs include / omit a trailing slash; check the per-IdP runbook section on `iss` formatting.
+
+**`oidc: pre-login session not found or already consumed`.**
+The user clicked the OIDC login button, then the browser tab idled past the 10-minute pre-login TTL OR the user opened the IdP login in a new tab and consumed the row from the first one. Have them retry from the login page.
+
+**`oidc: state parameter mismatch (replay or forgery)`.**
+Either the user double-submitted a callback URL (clicked it twice from email or browser history), or this was a CSRF attempt. The pre-login row is single-use; a second consumption returns `ErrPreLoginNotFound`. Have them retry from the login page.
+
+**"Sessions revoked but the user can still hit the API."**
+Check the session contract: the cookie is HMAC-validated on every request, and `Revoke` deletes the session's database row. Even if the `__Host-certctl_session` cookie was never cleared on the client, it still hits the server's session middleware, which returns 401 on the missing-row lookup. The middleware never serves stale data, so if the user still gets successful responses, the issue is upstream of certctl, for example a reverse proxy caching responses.
+
+**JWKS rotation: an IdP rotated its signing key and existing users start failing login.**
+Click **Refresh discovery cache** on the OIDC provider detail page (or `POST /api/v1/auth/oidc/providers/<id>/refresh`). The certctl service re-fetches discovery + JWKS. New tokens validate immediately. The Keycloak integration test exercises this drill end to end.
+
+**Database row count drift.**
+After OIDC is live, expect to see new rows under:
+- `oidc_providers` (one per configured provider)
+- `group_role_mappings` (one per configured mapping)
+- `users` (one per OIDC-authenticated user, created on first login; certctl auto-upserts on login)
+- `sessions` (one per logged-in browser session; idle 1h / absolute 8h GC)
+- `session_signing_keys` (one active + retained-history rows post rotation)
+- `oidc_pre_login_sessions` (transient; 10-minute TTL, scheduler-GC'd)
+
+All of these tables are tenant-scoped (`tenant_id` column); single-tenant deployments use the seeded `t-default` tenant.
+
+## What you can do next
+
+- Run [`docs/operator/oidc-runbooks/<your-idp>.md`](../operator/oidc-runbooks/index.md) end to end to fill in the validation checklist + sign-off line.
+- Read [`docs/operator/auth-benchmarks.md`](../operator/auth-benchmarks.md) for the steady-state + cold-cache performance baselines.
+- Review the [`auth-threat-model.md`](../operator/auth-threat-model.md) OIDC + sessions + break-glass sections to understand the failure modes the federated-identity surface defends against.
+- Schedule a rotation reminder for the OIDC `client_secret` (typically 6-12 months; the IdP doesn't auto-rotate it). Edit the provider via the GUI when the time comes: leaving `client_secret` blank in the edit form preserves the existing ciphertext, while providing a value rotates it.
+
+## `__Host-` cookie rename (BREAKING)
+
+v2.1.0 carries a wire-format change to the three auth cookies: they now carry the `__Host-` prefix. The cookie names are:
+
+- `__Host-certctl_session` (was `certctl_session`)
+- `__Host-certctl_csrf` (was `certctl_csrf`)
+- `__Host-certctl_oidc_pending` (was `certctl_oidc_pending`)
+
+The rename gains browser-enforced subdomain-takeover defense: a `__Host-*` cookie can only be set with `Path=/` + `Secure` + no `Domain` attribute, and the browser rejects any subdomain attempt to overwrite it. The protection is free (the existing cookies already met the prerequisites) but the wire-format change means:
+
+- **Every active session is invalidated by the deploy that lands this change.** Operators see one re-authentication prompt; subsequent logins issue the new `__Host-*`-prefixed cookie.
+- **The pre-login cookie's Path widens from `/auth/oidc/` to `/`** — required by the `__Host-` prefix. The cookie lifetime is unchanged (10 minutes) and is only ever consumed by the callback handler; the wider path scope is harmless.
+- **No operator action required beyond accepting the one-time re-login window.** The GUI's CSRF cookie reader was updated in lockstep; existing bookmarked deep links work without modification.
+
+If you have GUI customizations that read `document.cookie` directly, update them to look for `__Host-certctl_csrf` (the lookup in `web/src/api/client.ts` is the in-tree reference).
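
If you want to confirm from the outside that a captured `Set-Cookie` value meets the `__Host-` prerequisites listed above, a check along these lines may help. It is a sketch under stated assumptions: `host_cookie_ok` is a hypothetical helper, it inspects only the attributes named in this section (`Secure`, `Path=/`, no `Domain`), and it expects the header value without the `Set-Cookie:` name.

```shell
# Hypothetical helper: succeed only if a Set-Cookie value carries the
# `__Host-` name prefix plus Secure, Path=/ and no Domain attribute.
host_cookie_ok() {
  cookie=$(printf '%s' "$1" | tr -d '\r')
  # certctl's cookie names use this exact spelling of the prefix.
  case "$cookie" in
    __Host-*) ;;
    *) return 1 ;;
  esac
  # Attribute names are case-insensitive, so compare in lower case.
  attrs=$(printf '%s' "$cookie" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "$attrs" | grep -Eq ';[[:space:]]*secure[[:space:]]*(;|$)' || return 1
  printf '%s\n' "$attrs" | grep -Eq ';[[:space:]]*path=/[[:space:]]*(;|$)' || return 1
  if printf '%s\n' "$attrs" | grep -Eq ';[[:space:]]*domain='; then
    return 1
  fi
  return 0
}

# Usage (value copied from browser dev tools or a proxy log):
#   host_cookie_ok '__Host-certctl_session=abc; Path=/; Secure; HttpOnly' && echo ok
```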
+
+## Cross-references
+
+- [`docs/operator/oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md) — per-IdP setup guides.
+- [`docs/operator/security.md`](../operator/security.md) — overall auth surface including this OIDC layer.
+- [`docs/operator/auth-threat-model.md`](../operator/auth-threat-model.md) — threat model.
+- [`docs/operator/auth-benchmarks.md`](../operator/auth-benchmarks.md) — performance baselines.
+- [`docs/reference/auth-standards-implemented.md`](../reference/auth-standards-implemented.md) — RFC + CWE evidence list.
+- `internal/auth/oidc/` — OIDC service implementation.
+- `internal/auth/session/` — session minting + middleware + signing-key rotation.