7 commits across Phases 0-7:
a31cef3 chore(ci): start bundle — baseline counts
0ab6bc4 feat(ci): item-1 complete-path config-coverage guard
e3a9317 feat(ci): item-2 cross-surface contract parity (internal/ciparity)
3fe5111 feat(ci): item-5 doc rot detector (90d warn / 120d fail)
3ede1b7 feat(ci): item-6 cold-DB compose smoke script
255f61e ci(workflows): wire bundle guards into ci.yml
9f7b5d8 docs(contributor): document the bundle's guards
What this closes:
Item 1 (complete-path config-coverage):
- scripts/ci-guards/complete-path-config-coverage.sh
- internal/config/coverage_test.go (Go-side)
- scripts/ci-guards/complete-path-config-coverage-exceptions.yaml
Pins every CERTCTL_* env var defined in config.go to have at least
one consumer outside internal/config/. Closes the lying-field bug
class (canonical: 2026-04-29 SCEP MustStaple Phase 5.6).
Item 2 (cross-surface contract parity):
- internal/ciparity/ (new stdlib-only package, 4 tests)
- scripts/ci-guards/surface-parity-mcp-exemptions.yaml
Pins the MCP tool catalogue floor (150) + naming convention + no
duplicates. CLI verb sweep is informational only per decision 0.9.
Router ↔ OpenAPI parity stays at the existing
TestRouter_OpenAPIParity in internal/api/router/.
Item 5 (doc rot detector):
- scripts/ci-guards/doc-rot-detector.sh
- scripts/ci-guards/doc-rot-detector-exceptions.yaml
90-day warn, 120-day fail (vs HEAD commit timestamp for
reproducibility). docs/archive/ allowlisted in bulk. No bootstrap
sweep needed — all 90 docs were ≤ 7 days old at branch creation.
Item 6 (cold-DB compose smoke):
- scripts/ci-guards/cold-db-compose-smoke.sh
- New .github/workflows/ci.yml job 'cold-db-compose-smoke'
- 15-min wall-clock cap; dumps service logs on failure
Catches the 2026-05-09 migration 000045 broken-INSERT bug class
that the warm-DB integration suite missed (commit def4be9).
Verification in sandbox:
- 32 of 33 shell guards green; cold-DB skipped (no Docker — runs
in its dedicated GH Actions job)
- gofmt clean across all new Go files
- go vet clean for internal/ciparity/ + internal/config/
- go test -short -count=1 PASS: ciparity 0.027s, config 0.664s
- YAML lint clean on ci.yml
- All 7 commits authored by shankar0123 <skreddy040@gmail.com>
Operator follow-up (sandbox couldn't run):
- 'make verify' from workstation (golangci-lint full pass)
- 'go test -race -count=10' parity
- First successful 'cold-db-compose-smoke' job run + add it to
master branch-protection required-checks list
- Phase 6 negative-test ladder pushed to GH Actions (4 branches:
one per guard introducing the regression)
Spec: cowork/auditable-codebase-bundle-prompt.md
Per-phase results: cowork/auditable-codebase-bundle/RESULTS.md
Audit-Closes: post-v2.1.0-anti-rot/item-1
Audit-Closes: post-v2.1.0-anti-rot/item-2
Audit-Closes: post-v2.1.0-anti-rot/item-5
Audit-Closes: post-v2.1.0-anti-rot/item-6
Three doc changes for the bundle's discoverability:
1. New docs/contributor/ci-guards.md (185 lines)
Entry-point doc for new contributors. Explains the four categories
of guards (code-shape, contract-parity, build/dep, operational),
the discipline that keeps them honest (allowlist + expiration),
and how to add a new one. Cross-references scripts/ci-guards/README.md
for the exhaustive list.
2. scripts/ci-guards/README.md — added a 'Forward-looking guards'
subsection naming complete-path-config-coverage, doc-rot-detector,
and cold-db-compose-smoke with their item references + a
one-sentence description of what each catches. Replaced the
stale '22 guards' header with 'Count: re-derive via ls' per the
no-version-stamped-numbers convention from CLAUDE.md.
3. docs/README.md — wired ci-guards.md into the Contributor section
navigation table.
Bumped 'Last reviewed:' to 2026-05-12 on the two docs touched
(docs/README.md, docs/contributor/ci-pipeline.md).
Verified: doc-rot-detector.sh green at 91 docs scanned, 89 dated, 0
warns, 0 fails.
Audit-Closes: post-v2.1.0-anti-rot/item-1
Audit-Closes: post-v2.1.0-anti-rot/item-2
Audit-Closes: post-v2.1.0-anti-rot/item-5
Audit-Closes: post-v2.1.0-anti-rot/item-6
Three changes to .github/workflows/ci.yml:
1. Add internal/ciparity/... to the Go Test with Coverage package
list. The four surface-parity tests run alongside everything else
and contribute to the coverage report.
2. Skip cold-db-compose-smoke.sh in the existing generic
regression-guards loop (under go-build-and-test). The script needs
Docker + a fresh postgres volume; including it here would always
fail because that job doesn't bring up compose.
The other two new Bundle guards
(complete-path-config-coverage.sh, doc-rot-detector.sh) are
plain-shell + Python and need no Docker — the existing
'for g in scripts/ci-guards/*.sh' loop auto-picks them up.
3. New top-level job: 'cold-db-compose-smoke'
- needs: go-build-and-test (don't waste compute if the basics are red)
- 15-min wall-clock cap (image pull + compose-up + probe + teardown)
- Dumps compose logs on failure for postgres + certctl-server +
certctl-agent + certctl-tls-init so the failure is actionable
without a re-run.
Validated:
- python3 -c 'import yaml; yaml.safe_load(...)' → yaml ok
Operator follow-up:
- Add 'cold-db-compose-smoke' to the master branch-protection
required-checks list once the first successful run lands.
Audit-Closes: post-v2.1.0-anti-rot/item-6
scripts/ci-guards/cold-db-compose-smoke.sh — wipes the postgres
volume (docker compose down -v), brings the stack up cold, mints a
day-0 admin via /api/v1/auth/bootstrap, issues + renews + revokes a
test certificate, asserts the three audit rows exist, tears down.
Catches the bug class fixed by commit def4be9 (the 2026-05-09
migration 000045 broken INSERT that the warm-DB integration suite
missed). The 2026-04-30 migration regression class generally.
Tunables via environment:
- COLD_DB_SMOKE_STARTUP_TIMEOUT (default 300s/svc)
- COLD_DB_SMOKE_PROBE_TIMEOUT (default 180s)
- COLD_DB_SMOKE_SERVER_URL (default https://localhost:8443)
- COLD_DB_SMOKE_CACERT (default deploy/test/certs/ca.crt)
On failure: dumps `docker compose logs --tail 200` for postgres,
certctl-server, certctl-agent, certctl-tls-init so the CI failure is
actionable without a re-run.
Sandbox VERIFICATION: bash syntax-check (bash -n) passes. Full smoke
run NOT executed in the sandbox — no Docker available here. The
operator runs it from their workstation as the Phase 6 negative-test
ladder (introducing a broken migration; confirming the script fails
with the migration error in the dumped logs).
CI wiring (.github/workflows/ci.yml::cold-db-compose-smoke job)
lands in the next commit (Phase 5).
Audit-Closes: post-v2.1.0-anti-rot/item-6
scripts/ci-guards/doc-rot-detector.sh — walks every *.md under docs/,
parses the '> Last reviewed: YYYY-MM-DD' blockquote convention
established by the 2026-05-04 docs overhaul, emits:
- ::warning:: GitHub annotation when a doc is >= 90 days old
(heads-up; non-blocking).
- ::error:: + exit 1 when >= 120 days (build-blocking).
Uses HEAD commit timestamp (git log -1 --format=%cs) as 'now' rather
than wall clock — keeps the guard reproducible on a release that's
been on a shelf.
Verified in sandbox:
- Clean run: 90 docs scanned, 88 dated (2 in docs/archive/
allowlisted in bulk), 0 missing field, 0 warns, 0 fails.
- Negative test (backdated docs/README.md to 2025-12-01, 162d):
fires with '::error::Docs older than 120 days (build-blocking)'
+ three remediation paths listed.
Allowlist at scripts/ci-guards/doc-rot-detector-exceptions.yaml:
- 'docs/archive/' bulk-allowlisted (intentionally frozen content)
- Per-doc entries require name + justification + expiration date;
expired entries fail the guard.
Bootstrap sweep NOT required — baseline survey at branch creation
shows oldest doc is 7 days old (2026-05-05); zero docs over either
threshold today. Forward-looking insurance only.
Audit-Closes: post-v2.1.0-anti-rot/item-5
internal/ciparity/ — new stdlib-only package with four tests:
1. TestSurfaceParity_MCPToolCatalogue (HARD GATE):
- Every MCP tool name conforms to certctl_<word>(_<word>)*
- No duplicate names across the five tools*.go files
- Total tools ≥ mcpBaselineFloor (150; current count 155)
Catches accidental tool deletions + naming-convention drift.
2. TestSurfaceParity_CLICommandCatalogue (INFORMATIONAL):
Walks cmd/cli/main.go's switch-case dispatcher. Logs the 31
distinct verbs. Per frozen decision 0.9, warn-only until the CLI
surface stabilizes.
3. TestSurfaceParity_OpenAPI_MCPHeuristicCoverage (INFORMATIONAL):
Reports the fraction of OpenAPI ops whose path tokens overlap
with MCP tool name tokens. Trend metric; current coverage 92%.
4. TestSurfaceParity_Summary (INFORMATIONAL):
One-glance count of router routes / OpenAPI ops / MCP tools / CLI
verbs. Easy eyeball for a PR reviewer.
Verified in sandbox:
- gofmt clean
- go vet clean
- go test -short -count=1: all four PASS in 0.017s
Stdlib-only by design — the tests read source files with os.ReadFile +
regexp + go/ast. Keeps the test runnable without pulling in the rest
of the codebase's transitive deps; fast self-contained signal.
Router ↔ OpenAPI parity (TestRouter_OpenAPIParity) stays in
internal/api/router/openapi_parity_test.go where it already lives.
This bundle does not duplicate it.
Allowlist scaffold at scripts/ci-guards/surface-parity-mcp-exemptions.yaml
for the day TestSurfaceParity_OpenAPI_MCP* is promoted from
informational to hard gate.
Audit-Closes: post-v2.1.0-anti-rot/item-2
Shell guard verified working in sandbox:
- Green on clean repo: 'OK — every CERTCTL_* env var (194) has at least
one non-config-package consumer.'
- Red on injected orphan: '::error::Orphan env vars — defined in
config.go but no consumer found outside internal/config/' with three
remediation paths listed.
Go test internal/config/coverage_test.go written but NOT verified —
sandbox Go 1.25.9 < go.mod's 1.25.10 requirement; toolchain
auto-download fails (disk full). Operator must run `make verify` from
workstation before merge.
Allowlist scaffold at scripts/ci-guards/complete-path-config-coverage-exceptions.yaml.
Every entry requires name + justification + expires fields; expired
entries fail the guard.
Catches the lying-field bug class — env var defined in config.go that no
business-logic code reads. The 2026-04-29 SCEP MustStaple Phase 5.6 gap
(domain field shipped, service layer never read profile.MustStaple) is
the canonical case this guard would have caught at commit time.
Audit-Closes: post-v2.1.0-anti-rot/item-1
Earlier versions were either link-soup or so tight they read as
boilerplate. This pass aims for CMO-grade copy:
- Paragraph 1: lede that combines the early-access label with the
design-partner ask — sets the tone in one line.
- Paragraph 2: what's production-quality today, with the RBAC + OIDC
doc links inline (no bold, no link-soup). Names the v2.1.0 layer
on top.
- Paragraph 3: the ask — production deployments wanted, framed
explicitly as 'we can't manufacture this exposure in CI'. Honest
about the federated-identity surface being where the new exposure
lives. Mutual-value framing.
- Paragraph 4: the actionable bit — file issues liberally, with the
why ('how the platform earns the right to drop early-access').
Three inline doc links (RBAC, OIDC runbook index, file-issues).
Same factual content, warmer voice, paragraph cadence with
breathing room between.
Quieter version of the Status block — single blockquote, three short
sentences, three inline links (RBAC, OIDC, file-issues). Drops:
- The Local-CA / ACME / agent-deployment / CRUD / audit feature pile
(those live in the doc table immediately below)
- The 6-IdP enumeration (Keycloak / Authentik / Okta / Auth0 / Entra
ID / Google Workspace) — operators find that in the OIDC runbook
index, now linked inline
- The double 'in early-access' phrasing
- 'HMAC-signed server-side sessions with __Host- cookies and CSRF
rotation; OIDC Back-Channel Logout; Argon2id break-glass admin' —
the spec details belong in the auth-threat-model + security docs,
not the front-page status
Same early-access framing, same issue-link CTA, far more readable.
The previous version crammed 5 bold-emphasized inline links plus
inline code into a single paragraph — visually loud and hard to
scan. Rewrite as two short paragraphs:
- First paragraph: what's production-quality + what's still
maturing. No links, em-dash cadence for breathing room.
- Second paragraph: v2.1.0 OIDC + sessions + break-glass slice
with a single issue-link tail. Drops the bold-link sandwich
in favor of plain prose; the doc-nav table directly below
handles per-doc routing.
Same content, same early-access framing, far less visual noise.
Phase-9 docker compose smoke surfaced a latent production-breaking
bug introduced by commit 89b910a (H-6 atomic pending-job claim). The
ClaimPendingByAgentID query in internal/repository/postgres/job.go
combined UNION ALL with FOR UPDATE SKIP LOCKED in a single statement.
Postgres rejects this with:
ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT
Every agent work-poll returns HTTP 500 in any real deployment where
an agent is actually polling. From the compose log:
request_id=6da47015-... GET /api/v1/agents/agent-demo-1/work
status=500 duration_ms=2
The schema-per-test unit harness in internal/repository/postgres/
*_test.go never inserted jobs and polled, so the SQL execution path
was never exercised. The bug has been latent in master since 89b910a
landed.
Fix: split the UNION ALL into two separate FOR UPDATE SKIP LOCKED
queries within the existing transaction. The H-6 atomicity invariant
(concurrent pollers never see the same Pending row) is preserved
because:
1. The two queries run inside the same transaction (tx).
2. Each query independently locks its result rows with
FOR UPDATE SKIP LOCKED.
3. The subsequent UPDATE that flips Pending -> Running runs in
the same transaction, so the rows stay invisible to concurrent
callers from initial SELECT through final COMMIT.
4. The transaction is the unit of consistency, not the single
SQL statement.
Two queries:
- Branch 1 (direct): jobs.agent_id = + status='Pending' +
type='Deployment'. ORDER BY created_at ASC, FOR UPDATE SKIP LOCKED.
- Branch 2 (fallback): jobs.agent_id IS NULL + INNER JOIN
deployment_targets dt ON jobs.target_id = dt.id WHERE
dt.agent_id = . ORDER BY j.created_at ASC, FOR UPDATE OF j
SKIP LOCKED (FOR UPDATE OF needed because the join brings in dt).
Branch 3 (AwaitingCSR) is unchanged — already a single SELECT,
not affected by the UNION restriction.
Inline comment explains the fix's load-bearing-ness so a future
refactor doesn't merge them back into one UNION query.
Verify (sandbox): go vet clean; go test -short -count=1 PASS on
internal/repository/postgres/. Workstation re-runs 'docker compose
up' to confirm the agent's GET /work returns 200 with the next
pending-deployment claim.
Note: this is NOT a regression introduced by Auth Bundle 2 or the
2026-05-11 audit fixes; it's a pre-existing latent defect from H-6.
Including in v2.1.0 because shipping with a broken agent work-poll
would block the demo path on day one of release.
The v2.1.0 release-gate Phase-9 docker compose smoke run against a
fresh Postgres surfaced two real defects in the migration files that
testcontainers schema-per-test never exercised. Both reproduce by
running 'docker compose down -v && docker compose up --build'
against the current master tree.
Bug A — migration 000045_users_deactivated_at.up.sql is malformed.
The 000029 schema defines:
permissions (id TEXT PRIMARY KEY, name TEXT NOT NULL UNIQUE,
namespace TEXT NOT NULL)
role_permissions (..., permission_id TEXT NOT NULL REFERENCES ..., ...)
But 000045 was written as:
INSERT INTO permissions (name) VALUES ... -- missing id + namespace
INSERT INTO role_permissions (role_id, permission, ...) VALUES ...
^^ wrong column name
On a cold-DB run this fails immediately with:
pq: null value in column "id" of relation "permissions"
violates not-null constraint
Fix: provide id + namespace columns, use permission_id (the actual
column name), ON CONFLICT (id) DO NOTHING. The new permission ids
follow the existing 'p-auth-*' prefix convention (p-auth-user-read +
p-auth-user-deactivate) used by 000029.
Bug B — migration 000029_rbac.up.sql is not idempotent post-000043.
000029 originally created actor_roles with:
UNIQUE (actor_id, actor_type, role_id, tenant_id)
Audit 2026-05-10 HIGH-10 closure / migration 000043 drops that
constraint and re-creates it WITH scope columns:
UNIQUE (actor_id, actor_type, role_id, scope_type, scope_id, tenant_id)
The migration runner (internal/repository/postgres/db.go::RunMigrations)
is naive — no tracker table — and re-runs every *.up.sql file on
every server boot. On the second-and-later boots, 000029's seed
INSERT for actor-demo-anon-admin still references the
pre-000043 constraint name in its ON CONFLICT clause:
ON CONFLICT (actor_id, actor_type, role_id, tenant_id) DO NOTHING
Postgres errors out with:
pq: there is no unique or exclusion constraint matching the
ON CONFLICT specification
Fix: pin the conflict target to the row's primary key 'id' column
(always present, never altered). The seed row's deterministic id
'ar-demo-anon-admin' makes ON CONFLICT (id) work under both pre-
and post-000043 schemas.
Why testcontainers schema-per-test missed these:
Each test in internal/repository/postgres/*_test.go spins up a
fresh schema and applies every .up.sql in order ONCE. The full
'000029 -> 000043 -> retry 000029' cascade never happens because
migrations don't re-run within a test. Phase-9 docker compose
smoke is the only test path that exercises the server-restart-
on-error retry, which is exactly the missing coverage.
Verify (sandbox): go test ./internal/repository/postgres/ PASS.
Workstation re-runs 'docker compose down -v && docker compose up'
to confirm both bugs are closed.
Phase-10 live-IdP smoke (post-iss-param fix landing in 360e744) advanced
4 of 6 integration tests to green. The remaining 2 — the realm-key
rotation tests — failed with:
admin-cli token: HTTP 401
at the master-realm token endpoint. Root cause: Keycloak 26.x has TWO
admin-bootstrap env-var pairs and the right pair depends on the launch
command:
- 'start' (production): KC_BOOTSTRAP_ADMIN_USERNAME +
KC_BOOTSTRAP_ADMIN_PASSWORD
- 'start-dev': KEYCLOAK_ADMIN + KEYCLOAK_ADMIN_PASSWORD
The fixture sets KC_BOOTSTRAP_ADMIN_USERNAME + KC_BOOTSTRAP_ADMIN_PASSWORD
but runs 'start-dev'. The bootstrap pair is silently ignored in dev-mode,
leaving the master realm with no admin user → admin-cli token endpoint
returns 401 → RotateRealmKeys can't authenticate to the Admin API.
The 4 auth-code flow tests passed because they authenticate the engineer /
viewer test users INSIDE the certctl realm (created by the realm import),
which doesn't need a master admin.
Fix: set BOTH pairs as belt-and-braces. The legacy KEYCLOAK_ADMIN pair
covers start-dev today; the KC_BOOTSTRAP_ADMIN_* pair keeps a future flip
to 'start' working. Inline comment in the fixture explains the why so a
future reader doesn't drop one back.
Verify (sandbox): go vet -tags=integration clean; gofmt clean. Workstation
re-runs 'make keycloak-integration-test' to confirm the 2 rotation tests
now reach + execute the Admin API successfully.
Phase-10 live-IdP smoke (post-Enabled-true fix landing in 1b52998)
surfaced the next layer: 5 of 6 testcontainers-Keycloak integration
tests failed with 'oidc: provider advertises iss-parameter support
but callback omitted it'.
Root cause: Keycloak's discovery doc advertises
authorization_response_iss_parameter_supported=true. The Audit
2026-05-10 MED-17 closure (RFC 9207) gates the callback path:
when the IdP advertises iss-param support, HandleCallback requires
a non-empty callbackIss arg that matches the provider's IssuerURL,
else ErrIssParamMissing. The 7 HandleCallback call sites in the
integration tests were passing '' for the callbackIss arg — the
synthetic test code never simulated the real browser's
'?iss=<issuer>' query param.
Fix: replace '' with fx.IssuerURL at all 7 sites:
- integration_keycloak_test.go: 5 sites
(TestKeycloakIntegration_AuthCodeFlow_HappyPath,
TestKeycloakIntegration_LogoutRevokesSession,
TestKeycloakIntegration_JWKSRotation_RefreshKeysPicksUpNewKey
pre+post HandleCallback,
TestKeycloakIntegration_UnmappedGroupsFailsClosed)
- integration_keycloak_rotate_test.go: 2 sites
(TestKeycloakIntegration_MED6_AutoRefreshOnKidMiss pre+post)
Inline note on the first site explains the rationale so future
test-writers don't drop back to ''.
Verify (sandbox): go vet -tags=integration ./internal/auth/oidc/...
clean; gofmt clean; grep for remaining empty-iss callsites returns
0 matches. Workstation re-runs 'make keycloak-integration-test' to
confirm the 5 affected tests advance past the iss-param check
against a real Keycloak 26.x.
Phase-10 live-IdP smoke re-run (after the alg-downgrade relax landed in
fefeccf) surfaced the next layer: 5 of 6 testcontainers-Keycloak
integration tests failed with 'oidc: provider is disabled'.
Root cause: the OIDCProvider struct literal in
internal/auth/oidc/testfixtures/keycloak.go omits the Enabled field.
Enabled was added by Audit 2026-05-11 MED-9 (Bundle 2 Fix 13 Phase B);
pre-fix the field didn't exist and HandleAuthRequest always proceeded.
Post-fix the default zero-value false gates every integration test
behind ErrProviderDisabled at service.go L478.
Fix: add Enabled: true to the struct literal + inline comment explaining
why the field is required for integration tests. The check is the right
behavior for production (operator-driven disable kill-switch); just
needed to be reflected in the testfixture.
Verify (sandbox): go vet -tags=integration ./internal/auth/oidc/...
clean. Workstation re-runs 'make keycloak-integration-test' to confirm
the 5 affected tests now pass against a real Keycloak 26.x.
Phase-10 live-IdP smoke (Keycloak 26.x via testcontainers-go) revealed
the IdP-bind alg-downgrade check was too strict for real-world IdPs.
6 of the integration tests in internal/auth/oidc/integration_keycloak*_test.go
were failing with:
oidc: IdP advertises weak signing algorithms (HS*/none);
refusing to use as defense against downgrade attacks: HS256
Keycloak 26.x (and several other real-world IdPs — Auth0 when HS-mode is
enabled, some Authentik configs) advertise EVERY alg they're capable of
in the discovery doc's id_token_signing_alg_values_supported field, even
when the realm only signs with RS256 in practice. Pre-fix the IdP-bind
check refused on ANY HS* or 'none' advertisement → no real Keycloak deploy
could ever bind a provider row, hence the integration-test failures.
The strict-deny check was defense-in-depth on top of the load-bearing
per-token alg-pin at sig-verify time (isDisallowedAlg, service.go L1177):
that check rejects every ID token whose JWS header carries an alg outside
DefaultAllowedAlgs, regardless of what the discovery doc advertises.
A forged HS256 token signed with the IdP's RS256 pubkey as HMAC secret
is rejected at sig-verify time → the actual algorithm-confusion attack
is closed by the per-token pin, NOT by the discovery-doc check.
Fix: relax the IdP-bind check to refuse only when the intersection of
advertised vs DefaultAllowedAlgs is EMPTY (the pathological all-weak-alg
IdP case). Keycloak (RS256 + HS256 advertised) now binds successfully;
an HS-only IdP still fails closed.
Changes:
- internal/auth/oidc/service.go: rewrite the alg-check loop at L1067 in
getOrLoad / RefreshKeys to compute the intersection set; refuse only
when no acceptable alg is advertised. ErrIdPDowngradeAdvertised
docstring updated to reflect new contract. DefaultAllowedAlgs
docstring + the package-level design-comment block at L40-72 updated
with v2.1.0-relaxed semantics callouts.
- internal/auth/oidc/test_discovery.go: TestDiscovery dry-run validator
rewritten to surface HS*/none alongside RS* as an informational note
('note: IdP advertises weak algorithms %v alongside acceptable ones')
rather than a hard-fail error. HS-only / none-only still hard-fails.
- internal/auth/oidc/service_test.go: TestService_IdPDowngradeDefense_*
tests updated. Renamed:
- RejectsHSAdvertised → RS256PlusHS256_BindsSuccessfully (positive)
- RejectsNoneAdvertised → RejectsHSOnlyAdvertised (intersection-empty)
- RefreshKeys_CatchesPostLoadDowngrade rotated to HS-only post-load
- internal/auth/oidc/coverage_fill_test.go: TestTestDiscovery_AlgDowngradeDetected
split into _HS256AlongsideRS256_BindsWithNote (positive, asserts note
but no hard-fail) + _HSOnly_StillTrips_HardFail (intersection-empty).
- docs/operator/auth-threat-model.md: OIDC token-validation alg-allow-list
section rewritten to call out the load-bearing-defense hierarchy
(per-token pin first, IdP-bind check defense-in-depth) and document
the v2.1.0 relaxation rationale.
- CHANGELOG.md: ### Security entry under Unreleased.
Verify: go test ./internal/auth/oidc/ -short PASS; gofmt clean; go vet
clean. The Keycloak integration tests should now pass when the operator
re-runs 'make keycloak-integration-test'.
Audit 2026-05-10 HIGH-8 closure landed a parseWWWAuthenticateCause()
call in api/client.ts (line 144) that reads res.headers.get(...) on the
401 path. The two test files in web/src/api/ both provide a Response
mock with no headers property, so every 401 test threw 'Cannot read
properties of undefined (reading get)' instead of the expected
'Authentication required'.
13 tests fail without this fix: 12 in client.error.test.ts (one per
401-mapped endpoint helper) + 1 in client.test.ts (the auth-required
event-dispatch test).
Fix: add headers: { get: () => null } to both mockErrorResponse helpers.
The null return short-circuits parseWWWAuthenticateCause to the default
'Authentication required' message, so every existing 401 assertion
keeps passing.
Four scripts/ci-guards/*.sh trips on dev/auth-bundle-2 vs master:
1. G-3-env-docs-drift: 10 CERTCTL_* env vars added by Auth Bundle 2 +
audit-2026-05-10/11 fix bundle were not in docs/. Added a new 'Auth
(Bundle 1 + Bundle 2)' section to docs/reference/configuration.md
covering CERTCTL_SESSION_BIND_USER_AGENT, CERTCTL_SESSION_GC_INTERVAL,
CERTCTL_OIDC_BCL_MAX_AGE_SECONDS, CERTCTL_OIDC_PRELOGIN_REQUIRE_UA/IP,
CERTCTL_DEMO_MODE_ACK, CERTCTL_TRUSTED_PROXIES + _COUNT (synthesised),
CERTCTL_BOOTSTRAP_* set, CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD. Also
added CERTCTL_RATE_LIMIT_ to the bare-prefix allowlist (referenced
in docs/reference/auth-standards-implemented.md prose).
2. bundle-8-M-009-bare-usemutation: BreakglassPage shipped 3 bare
useMutation() calls instead of useTrackedMutation. Migrated all
three to useTrackedMutation with invalidates: [['breakglass']].
3. multi-tenant-query-coverage: Defense-in-depth tenant_id additions
in the fix bundle dropped the missing-tenant-id query count from 32
to 31. Ratcheted baseline 32 -> 31 (forward-only invariant).
4. openapi-handler-parity: 28 new REST endpoints from Bundle 2 + the
fix bundle missing from api/openapi.yaml. Added them to
api/openapi-handler-exceptions.yaml with per-route 'why:'
justifications. OpenAPI schema generation deferred to pre-v2.2.0
alongside the GUI E2E coverage push; threat model + handler
contracts already live in docs/operator/{rbac,auth-threat-model,
oidc-runbooks}.md.
After this commit every script in scripts/ci-guards/*.sh exits 0.
Five golangci-lint v2 findings surfaced when running the v2.1.0 release
gate (auth-bundle-2 → master pre-flight). Each is mechanical:
1. govet/printf-style misuse — internal/auth/oidc/service_test.go used
integer literal 501 in http.Error; switched to http.StatusNotImplemented.
2. staticcheck SA1019 — internal/auth/breakglass/reflect_helper_test.go
referenced reflect.Ptr; the canonical name since Go 1.18 is
reflect.Pointer.
3. staticcheck ST1020 — internal/repository/postgres/auth.go
ActorRoleRepository.Revoke had a doc comment that did not begin with
the method name. Prepended 'Revoke drops actor_roles rows.' to the
comment so it now starts with the method name.
4. staticcheck ST1022 — internal/api/handler/auth_session_oidc.go
DefaultBCLVerifierMaxAge docstring was attached to the DefaultBCLVerifier
type docstring. Moved the const docstring directly above the const
declaration, separated by a blank line.
5. unused — internal/auth/session/bench_test.go declared
benchSessionMinSamples and never referenced it; the bench loop relies
on Go's default b.N scaling. Replaced the const block with a comment
describing the rationale.
Lint clean (golangci-lint v2.12.2 with the .golangci.yml config) on the
five edited packages.
Phase 1 (make verify) of cowork/v2.1.0-release-gate.md surfaced three
files with pre-existing gofmt drift that pre-dated the 2026-05-11 fix
bundle work:
internal/auth/oidc/domain/types.go
internal/auth/oidc/integration_keycloak_rotate_test.go
internal/auth/oidc/test_discovery.go
The 2026-05-11 Fix 08 fmt-cleanup commit (b8fac59) fixed four files
that the merge introduced; these three were noted as pre-existing
master drift and intentionally left untouched at the time. The
v2.1.0 release-gate spec's Phase 1 requires zero gofmt output from
'go fmt ./...' (Makefile::verify form), so the drift must close
before tagging.
Pure whitespace alignment, no semantic change.
Audit 2026-05-11 Fix 13 closure. The HIGH-2 closure on
dev/auth-bundle-2 documented four RotateCSRFTokenForActor call
sites — login completion (fresh by construction), Assign/Revoke
RoleToKey (wired at internal/api/handler/auth.go:498 + 546),
Logout, and an explicit operator endpoint. The 2026-05-11
adversarial review observed only 3 of the 4: Logout did NOT
rotate the actor's sibling sessions post-revoke.
Threat closed: a token captured pre-logout (browser DevTools,
malicious extension, session-storage leak) could be replayed
against the user's other-device/other-browser sessions until
those sessions hit their own idle/absolute expiry. Rotation on
logout defeats this — the captured token is dead the moment
the user clicks 'Sign out' anywhere.
What this changes:
* internal/api/handler/auth_session_oidc.go::SessionMinter
interface gains RotateCSRFTokenForActor(ctx, actorID,
actorType string) int. Nil-safe semantics by convention —
the production wiring is *session.Service which already
implements the method; rotation NEVER errors (returns int
count, swallows per-row failures via the underlying
Service.RotateCSRFToken) so it can't block the surrounding
Revoke that triggered it.
* internal/api/handler/auth_session_oidc.go::Logout calls
RotateCSRFTokenForActor after Revoke(sess.ID) succeeds. The
auth.session_revoked audit row gains a csrf_rotated detail
key carrying the count so SOC/SIEM can correlate logout
events with CSRF churn on sibling sessions.
* The no-cookie + invalid-cookie 204 short-circuit paths
skip rotation. No session row exists to rotate against;
the caller is already unauthenticated. Rotation on those
paths would do nothing useful and pollute the audit log.
Test coverage in internal/api/handler/auth_session_oidc_test.go:
* TestLogout_RotatesCSRFForActor — happy path. Mocks
rotateCSRFReturnCount=2; asserts Revoke fires before
rotation, rotation fires exactly once with caller's
(actor_id, actor_type), audit details carry csrf_rotated=2.
* TestLogout_NoCookie_SkipsCSRFRotation — pins the 204
short-circuit branch when there's no cookie. Rotation count
stays at 0.
* TestLogout_InvalidCookie_SkipsCSRFRotation — pins the 204
short-circuit branch when Validate rejects the cookie.
Same rationale: no session row, no rotation.
The stubSession test fake gains RotateCSRFTokenForActor with
call-recording fields; the phase5StubAudit gains a details
slice append-aligned 1:1 with events so the happy-path test
can index into the latest entry and assert the count.
Spec Phase 3 (explicit operator endpoint) — intentionally
NOT shipped. The three automatic triggers (login + role-
mutation + logout) cover the HIGH-2 threat model; operators
who want a nuclear option can use the existing
RevokeAllForActor flow which forces re-login → fresh session
→ fresh CSRF. Adding a dedicated POST /api/v1/auth/sessions/
rotate-csrf admin endpoint would be defense-in-depth without
new attack-surface coverage. Documented in the audit-doc
annotation.
Verify gate:
* gofmt -l — clean
* go vet ./internal/api/handler/... — clean
* go build ./cmd/server/... ./internal/... — clean (production
*session.Service satisfies the extended interface
out of the box)
* go test -short -count=1 ./internal/api/handler/...
./internal/auth/session/... — all green; 3 new Logout
cases + the 2 pre-existing Logout cases all pass.
Audit doc annotation at cowork/auth-bundles-audit-2026-05-10.md
flips the HIGH-2 row from 'CLOSED 2026-05-10 (3/4 call sites
wired)' to 'A-B-3 verified 2026-05-11: HIGH-2 fully closed
across all four documented call sites.'
Refs cowork/auth-bundles-fixes-2026-05-11/13-verify-logout-csrf-rotation.md.
Audit 2026-05-11 Fix 12 closure. The original GUI-batch commit
191384c claimed 'npx tsc --noEmit PASS' but shipped no Vitest
cases for the new surfaces, leaving the regression-prevention
layer wide open. This closure backfills 35 cases across five
files; the next refactor of KeysPage's assign modal that drops
scope_type, or the AuthProvider demo-banner predicate that
gets flipped to !authRequired, surfaces in CI instead of
silently shipping.
What's added:
* web/src/pages/auth/UsersPage.test.tsx (NEW, 8 cases) — pins
the MED-11 closure's UsersPage flow: active rows render the
Active status pill, deactivated rows render dimmed with the
Deactivated <timestamp> status, Deactivate button fires the
API call after confirm() returns true and is a no-op on
false, Reactivate button works inversely, provider filter
narrows the underlying authListUsers call (undefined vs
provider-id), empty list renders the placeholder, loading
renders 'Loading users…'.
* web/src/pages/auth/AuthSettingsPage.test.tsx (EXTENDED, +4
cases) — the pre-existing 2 cases only exercised identity +
bootstrap status; the runtime-config panel (MED-12 closure)
had no test. New cases cover: per-key row rendering,
alphabetical sort (stable for log-scraping correlation),
empty-value '(empty)' placeholder, 403 rejected query
silently hides the panel (non-admins shouldn't see the
shell).
* web/src/pages/auth/KeysPage.test.tsx (EXTENDED, +8 cases) —
the HIGH-10 GUI half added scope picker + scope_id input +
expires_at datetime-local to the assign modal but the
pre-existing test only asserted (actor, role). New cases
pin the third opts arg shape: global hides scope_id input,
profile/issuer scope reveal scope_id + mark required,
trimmed scope_id round-trips into the body, global omits
scope_id (undefined NOT empty string), empty expires_at
omits the field, filled expires_at gets :00Z appended for
RFC3339 promotion, whitespace-only scope_id fires the
'scope_id is required' typed error WITHOUT calling the
API, actor-demo-anon row hides both assign and revoke
affordances.
* web/src/pages/auth/RoleDetailPage.test.tsx (NEW, 9 cases) —
no test file pre-Fix 12. Pins the MED-8 scope picker for
AddPermissionForm: global hides scope_id, profile reveals +
gates the Add button until scope_id is filled, submit POSTs
{permission, scope_type: profile, scope_id} with whitespace
trimming, global submit omits scope keys entirely, issuer
scope path, Add button stays disabled without a permission
selection. Plus the LOW-11 default-role delete-button hide:
r-admin renders the role-delete-disabled-tooltip + NO
role-delete-button, r-auditor same, custom role renders the
delete button. The DEFAULT_ROLE_IDS set tracking the
migration-seeded role ids is the load-bearing client-side
decision so a future drift between migrations and the GUI
set surfaces here too.
* web/src/components/AuthProvider.test.tsx (NEW, 5 cases) —
the LOW-1 demo banner had no test for its visibility
predicate. Pins all four authType branches (none → visible,
api-key → hidden, oidc → hidden, loading → hidden to avoid
flash) plus the rejected-getAuthInfo branch: the catch
treats failure as an old-server-fallback to demo mode (no
authType mutation, loading flips false), so the banner
SHOWS — that's the actual behavior, and pinning it prevents
a future change from silently hiding the banner when the
/auth/info endpoint is unreachable.
Spec deviations: Phase 6 (Layout.test.tsx users-nav) and
Phase 7 (per-Fix tests for Fixes 03/05/07/09/10) live on those
fixes' own branches — already authored there. Including them
here would have produced merge conflicts.
Verify gate:
* tsc --noEmit — clean
* vitest run touched files — 40/40 pass (8 + 6 + 12 + 9 + 5,
including the 2 + 4 + 4 pre-existing cases in the extended
AuthSettingsPage + KeysPage files)
* full suite (162 tests across 15 files) green — no regression
from the panel-mount-in-existing-page setup or the new
mocked-module entries.
Refs cowork/auth-bundles-fixes-2026-05-11/12-test-vitest-gui-coverage.md.
Audit 2026-05-11 Fix 11 closure. The MED-11 closure shipped
web/src/pages/auth/UsersPage.tsx and wired the /auth/users route
in web/src/main.tsx, but the sidebar nav never gained a
corresponding entry. Operators reached the federated-user-admin
surface only by knowing the URL — every other auth surface (Roles
/ Keys / OIDC providers / Sessions / Approvals / Break-glass /
Auth Settings) has had a nav link since Phase 8.
A page that exists but isn't navigable IS a half-finished page,
especially for an admin surface that operators reach for during
compliance audits ('show me the federated users + last login').
30 minutes closes the inconsistency.
What this changes:
* web/src/components/Layout.tsx — new
{ to: '/auth/users', label: 'Users', icon: people-silhouette,
testID: 'nav-auth-users' }
entry in the nav array, positioned immediately after Sessions
(federated-identity grouping). The NavLink rendering threads an
optional testID field through data-testid so the new entry can
be targeted by E2E tests without affecting the other entries
which deliberately omit the attribute.
* Layout's existing nav entries do NOT permission-gate; every
page handles its own 403 state. UsersPage already returns an
ErrorState directing the user to auth.user.read for callers
without the perm. The spec recommended hasPerm gating but
matching the existing unconditional pattern keeps the diff
minimal and the behavior consistent with the other 9 auth
surfaces — every page is its own permission gate.
Tests added in web/src/components/Layout.test.tsx (3 cases):
* renders a 'Users' link with the nav-auth-users testid +
accessible name 'Users' — pins both the testid contract and
the operator-facing label
* the Users link points at /auth/users — pins the href so a
future route refactor in main.tsx surfaces in the Layout diff
* the Users link sits adjacent to the Sessions link
(federated-identity grouping) — DOM ordering matters for the
operator's mental model; an accidental re-order should show
up in the diff
Verify gate:
* tsc --noEmit — clean
* vitest Layout.test.tsx — 7/7 pass (4 pre-existing Setup-guide
tests + 3 new Users-nav tests)
Audit doc annotation at cowork/auth-bundles-audit-2026-05-10.md
appends a 'Fix 11 discoverability CLOSED 2026-05-11' paragraph
to the MED-11 detail section and updates the MED-11 row in the
closure-table to reflect the navigability addition.
Refs cowork/auth-bundles-fixes-2026-05-11/11-med-users-sidebar-nav.md.
Audit 2026-05-11 Fix 10 closure. MED-7's backend endpoint
GET /api/v1/auth/oidc/providers/{id}/jwks-status (commit 172b30b)
shipped the per-provider verifier counters on dev/auth-bundle-2
but the GUI never called it — authOIDCJWKSStatus in the API
client was dead code. The audit doc had prematurely flipped the
MED-7 row to CLOSED; this closure makes the claim true.
Operator gap before this fix: operators investigating 'why is
login failing for this IdP?' could not see last_refresh_at,
rejected_jws_count, or last_error from the GUI. They had to drop
to curl.
New shared component web/src/pages/auth/OIDCJWKSStatusPanel.tsx
queries the endpoint via TanStack Query and renders six dt/dd
rows with operator-readable sentinels for each empty case:
* Last refresh — RFC 3339 timestamp; '(never — cold cache)'
sentinel when the IdP has never been hit.
* Refresh count — cumulative since process boot.
* Rejected JWS count — number of ID tokens that failed signature
verification. Step-changes correlate to IdP key rotations.
* Last error — most recent JWKS-refresh failure (sanitized — no
token content). Red treatment when non-empty; '(none)' sentinel
for healthy state.
* RFC 9207 iss param — 'supported by IdP' / 'not advertised'.
Informational only; the operator-side verifier still demands
the param by default.
* Current KIDs — cache contents; '(not exposed — query jwks_uri
directly)' sentinel when the backend declines to expose the
list (the backend may withhold them for opacity).
Refresh-now button:
* Calls POST /api/v1/auth/oidc/providers/{id}/refresh
(RefreshKeys path), then invalidates the panel's query so the
freshly-updated counters render without a page reload.
* Refresh failures surface as an inline red rectangle and do NOT
hide the existing snapshot — partial visibility is better than
no visibility.
* Hidden when the optional canRefresh prop is false. The
OIDCProviderDetailPage mount wires canRefresh to
useAuthMe().hasPerm('auth.oidc.edit') so viewer-class callers
see the read-only panel.
Permission gating:
* The backend endpoint is gated auth.oidc.list. Callers without
the permission get HTTP 403; the panel's TanStack query is
configured with retry: 0 so a 403 doesn't drown the page in
retries, and the panel returns null when the query errors —
hiding silently for callers who can't see the data.
* The Refresh-now button is hidden for callers without
auth.oidc.edit. Read-only callers still see the panel +
counters.
Mount: OIDCProviderDetailPage.tsx between the read-only field
display section and the Actions section. canRefresh wired to
the canEdit boolean already computed at the page level.
9 Vitest tests in OIDCJWKSStatusPanel.test.tsx:
* LoadingState — query in flight, Loading… visible.
* HappyPath — all six dt/dd pairs visible with operator-readable
values; current KIDs joined comma-separated.
* 403 — authOIDCJWKSStatus errors, panel returns null, no DOM
artifacts left behind.
* RefreshNow — calls refreshOIDCProvider('op-okta'), invalidates
the status query, the panel re-fetches and re-renders with the
new refresh_count (mock returns different snapshots on the
two calls).
* RefreshNow surfaces refresh-failure inline without hiding the
panel (preserves the existing snapshot so the operator can
read pre-failure state).
* NeverRefreshed — last_refresh_at='' renders the cold-cache
sentinel rather than a blank cell.
* CurrentKIDsEmpty — empty list renders the 'not exposed'
sentinel rather than a blank cell.
* LastError — non-empty last_error renders with red treatment.
* CanRefreshFalse — panel + counters render; Refresh-now button
is gone.
Verify gate:
* tsc --noEmit — clean
* vitest OIDCJWKSStatusPanel.test.tsx — 9/9 pass
* vitest OIDCProviderDetailPage.test.tsx — 19/19 pass (panel
mount does not break existing tests because the unmocked
authOIDCJWKSStatus call in those tests rejects, the panel
returns null, and the rest of the page renders normally)
Audit doc annotation at cowork/auth-bundles-audit-2026-05-10.md
flips MED-7 from the premature CLOSED claim to a properly-staged
'Backend CLOSED 2026-05-10 + GUI half CLOSED 2026-05-11'
annotation describing the panel + tests.
Refs cowork/auth-bundles-fixes-2026-05-11/10-med-jwks-status-panel.md.
Audit 2026-05-11 Fix 09 closure. MED-5's backend dry-run endpoint
(POST /api/v1/auth/oidc/test, gated auth.oidc.create) shipped on
dev/auth-bundle-2 (commit b4b9879) but the GUI never called it —
authOIDCTestProvider in web/src/api/client.ts was dead code.
Operator gap before this fix: complete the create form blind, save,
then click 'Refresh' to discover whether the issuer URL worked.
Discovery failures left a broken provider row in the DB that had
to be deleted before retrying. The MED-5 backend exists to short-
circuit this — surface the dry-run result before commit.
New shared component web/src/pages/auth/OIDCTestConnectionPanel.tsx
calls authOIDCTestProvider against the live form state (issuer URL
+ client ID + parsed scopes) and renders a four-row status panel
inline:
* ✓/✗ Discovery fetched (with issuer-echo from the well-known doc)
* ✓/✗ JWKS reachable (with the discovered jwks_uri)
* ✓/⚠ Supported algs (warning glyph when the IdP advertises none —
distinct from a discovery failure)
* ✓/· RFC 9207 iss-parameter advertised (informational · glyph
rather than ✗ because the spec is SHOULD, not MUST)
Backend per-leg errors[] flow into an inline bullet list. A
top-level rectangle catches network/fetch failures separately.
The Run button is disabled when the issuer URL is empty or
whitespace-only. The component does NOT persist anything — safe
to run repeatedly before the operator clicks Save.
The panel is mounted in two places:
* OIDCProvidersPage create modal (between the form fields and the
Create button) — short-circuits the blind-save footgun for new
provider configs.
* OIDCProviderDetailPage edit form (between the field grid and
the Save button) — load-bearing for verifying IdP rotations
(Keycloak realm rename, Okta tenant move, certctl side-by-side
hostname change) without committing first.
A testIDSuffix prop (default 'create' / 'edit') gives each mount
point a distinct data-testid namespace so both panels can coexist
on a hypothetical page that uses both without DOM-id collisions.
8 Vitest tests in OIDCTestConnectionPanel.test.tsx:
* RunButton — disabled until issuer URL is non-empty
* RunButton — also disabled when issuer URL is whitespace-only
* RunButton — enabled when issuer URL is non-empty
* HappyPath — all four primary checks render green with detail
rows for authorization_url / token_url / userinfo_endpoint
(asserts both the glyph contract AND the mocked POST body shape)
* FailurePath — discovery=false renders ✗ on discovery + ✗ on
JWKS + ⚠ on empty supported algs + error list with backend
per-leg messages
* IssParamFalse — load-bearing UX claim that the iss-parameter
row renders · (informational), not ✗; body must contain the
word 'informational' so operators understand it's not a failure
* FetchError — top-level error rectangle when the POST throws
* TestIDSuffix — same component mounted twice with different
suffixes renders both without DOM-id collision
Verify gate:
* tsc --noEmit — clean
* vitest OIDCTestConnectionPanel.test.tsx — 8/8 pass
* vitest OIDCProvidersPage.test.tsx + OIDCProviderDetailPage.test.tsx
— 38/38 pass (panel-mount in both pages does not regress
existing tests because they don't trigger the test button)
Operator runbook: the four glyph meanings are documented inline on
the panel's subtitle. Audit doc annotation at
cowork/auth-bundles-audit-2026-05-10.md flips MED-5 from
'BACKEND CLOSED' to 'CLOSED' with the GUI-half annotation.
Refs cowork/auth-bundles-fixes-2026-05-11/09-med-oidc-test-connection-button.md.
Audit 2026-05-11 A-8 closure. Closes the deferred Phase 2 leg of the
2026-05-10 HIGH-12 closure (2e97cc1) — production-startup observability
for actor-demo-anon residual grants + CI guard banning new synthetic-
admin code paths.
What this changes:
* cmd/server/preflight_demo_residual.go (new) runs after the DB pool +
audit service are constructed and before the HTTPS listener starts.
Under any non-'none' auth type it queries actor_roles for the
synthetic actor-demo-anon and emits a WARN log + a categorized audit
row (auth.demo_residual_grants_detected) listing every grant
present. Migration 000029 unconditionally seeds the ar-demo-anon-admin
row at install time, so EVERY production deploy will see this WARN
on first boot; the intended cutover workflow is cleanup-once at
production handover.
* CERTCTL_DEMO_MODE_RESIDUAL_STRICT (new env var on AuthConfig,
default false) pivots the WARN to fail-closed startup refusal for
operators who want a paranoid posture against re-seeding.
* POST /api/v1/auth/demo-residual/cleanup (new handler at
internal/api/handler/demo_residual.go) is an admin-class
(auth.role.assign) endpoint that removes every actor-demo-anon row
from actor_roles and returns {removed: int64}. Idempotent; refuses
503 under Auth.Type=none (deleting the row would break the demo
path); audit-logs every invocation including no-op zero-removed
calls so the admin's action is always recorded.
* scripts/ci-guards/no-new-synthetic-admin.sh pins the 17-entry
allowlist of source files that legitimately reference the
actor-demo-anon literal. New runtime code paths that resolve to the
synthetic actor (the same pattern that produced the original CRIT
class) are rejected at PR time. CI workflow auto-picks the script
via the existing scripts/ci-guards/*.sh loop in .github/workflows/
ci.yml; no workflow edit needed.
Regression matrix:
* cmd/server/preflight_demo_residual_test.go — 7 tests covering the
4 main behaviour branches (testcontainers-backed, testing.Short()-
skipped: DemoModeActive_Skips, NoResidue_Passes, HasResidue_LogsAnd
Audits, StrictMode_RefusesStartup, DeleteDemoAnonResidue_Idempotent)
plus 3 pure-Go stdlib unit tests for the row-string formatter +
nil-safety contracts on both helpers.
* internal/api/handler/demo_residual_test.go — 7 stdlib+httptest
cases: HappyPath, Idempotent_ReturnsZero, RejectsInDemoMode (503),
CleanupError_Surfaces500, NilCleanupFn (defensive 500),
NilAuditWriter_DoesNotPanic, MissingActorContext (falls back to
'unknown' actor in the audit row).
* internal/api/router/openapi_parity_test.go — new
POST /api/v1/auth/demo-residual/cleanup entry plus 6 pre-existing
pre-A-8 entries (oidc/test, jwks-status, users CRUD, runtime-config)
that had drifted out of SpecParityExceptions; the parity test was
red on dev/auth-bundle-2 before my work; this commit returns it to
green with full per-entry justifications + parity-debt notes.
Docs:
* docs/operator/security.md — new 'Demo-to-production cutover (Audit
2026-05-11 A-8)' section explaining the WARN message, the cleanup
curl one-liner, the equivalent SQL, the strict-mode env var, and
the CI guard.
* docs/operator/rbac.md — Last-reviewed bump + pointer to the new
env var + the security.md section.
* cowork/auth-bundles-audit-2026-05-10.md — HIGH-12 row gains an
'A-8 follow-on CLOSED 2026-05-11' annotation describing the
deferred Phase 2 leg now landed.
* CHANGELOG.md — Unreleased ### Security entry summarizing the four
legs (detector + cleanup + strict-mode flag + CI guard) and the
acquisition-readiness narrative this closes.
Operator-facing impact: this closes a credibility gap, not an
exploitable vulnerability. The residue requires a regression
elsewhere in the middleware chain to be exploitable. After this
fix, the canonical narrative ('RBAC primitive with no synthetic-
admin fallback') is fully true.
Refs cowork/auth-bundles-fixes-2026-05-11/08-high-demo-mode-residual-
cleanup.md.
Whitespace alignment drift surfaced by gofmt -l after merging 7 fix branches.
Pure formatting, no semantic change. Pre-existing master drift in
internal/auth/oidc/{domain/types.go, integration_keycloak_rotate_test.go,
test_discovery.go} left untouched — that's separate tech debt.