certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-08 05:18:52 +00:00

Author	SHA1	Message	Date
shankar0123	41706cc0fb	Merge dev/auditable-codebase-bundle into master: Auditable Codebase Bundle (post-v2.1.0 anti-rot items 1+2+5+6) 7 commits across Phases 0-7: `a31cef3` chore(ci): start bundle — baseline counts `0ab6bc4` feat(ci): item-1 complete-path config-coverage guard `e3a9317` feat(ci): item-2 cross-surface contract parity (internal/ciparity) `3fe5111` feat(ci): item-5 doc rot detector (90d warn / 120d fail) `3ede1b7` feat(ci): item-6 cold-DB compose smoke script `255f61e` ci(workflows): wire bundle guards into ci.yml `9f7b5d8` docs(contributor): document the bundle's guards What this closes: Item 1 (complete-path config-coverage): - scripts/ci-guards/complete-path-config-coverage.sh - internal/config/coverage_test.go (Go-side) - scripts/ci-guards/complete-path-config-coverage-exceptions.yaml Pins every CERTCTL_* env var defined in config.go to have at least one consumer outside internal/config/. Closes the lying-field bug class (canonical: 2026-04-29 SCEP MustStaple Phase 5.6). Item 2 (cross-surface contract parity): - internal/ciparity/ (new stdlib-only package, 4 tests) - scripts/ci-guards/surface-parity-mcp-exemptions.yaml Pins the MCP tool catalogue floor (150) + naming convention + no duplicates. CLI verb sweep is informational only per decision 0.9. Router ↔ OpenAPI parity stays at the existing TestRouter_OpenAPIParity in internal/api/router/. Item 5 (doc rot detector): - scripts/ci-guards/doc-rot-detector.sh - scripts/ci-guards/doc-rot-detector-exceptions.yaml 90-day warn, 120-day fail (vs HEAD commit timestamp for reproducibility). docs/archive/ allowlisted in bulk. No bootstrap sweep needed — all 90 docs were ≤ 7 days old at branch creation. 
Item 6 (cold-DB compose smoke): - scripts/ci-guards/cold-db-compose-smoke.sh - New .github/workflows/ci.yml job 'cold-db-compose-smoke' - 15-min wall-clock cap; dumps service logs on failure Catches the 2026-05-09 migration 000045 broken-INSERT bug class that the warm-DB integration suite missed (commit `def4be9`). Verification in sandbox: - 32 of 33 shell guards green; cold-DB skipped (no Docker — runs in its dedicated GH Actions job) - gofmt clean across all new Go files - go vet clean for internal/ciparity/ + internal/config/ - go test -short -count=1 PASS: ciparity 0.027s, config 0.664s - YAML lint clean on ci.yml - All 7 commits authored by shankar0123 <skreddy040@gmail.com> Operator follow-up (sandbox couldn't run): - 'make verify' from workstation (golangci-lint full pass) - 'go test -race -count=10' parity - First successful 'cold-db-compose-smoke' job run + add it to master branch-protection required-checks list - Phase 6 negative-test ladder pushed to GH Actions (4 branches: one per guard introducing the regression) Spec: cowork/auditable-codebase-bundle-prompt.md Per-phase results: cowork/auditable-codebase-bundle/RESULTS.md Audit-Closes: post-v2.1.0-anti-rot/item-1 Audit-Closes: post-v2.1.0-anti-rot/item-2 Audit-Closes: post-v2.1.0-anti-rot/item-5 Audit-Closes: post-v2.1.0-anti-rot/item-6	2026-05-12 14:16:39 +00:00
shankar0123	9f7b5d89a5	docs(contributor): document the Auditable Codebase Bundle guards Three doc changes for the bundle's discoverability: 1. New docs/contributor/ci-guards.md (185 lines) Entry-point doc for new contributors. Explains the four categories of guards (code-shape, contract-parity, build/dep, operational), the discipline that keeps them honest (allowlist + expiration), and how to add a new one. Cross-references scripts/ci-guards/README.md for the exhaustive list. 2. scripts/ci-guards/README.md — added a 'Forward-looking guards' subsection naming complete-path-config-coverage, doc-rot-detector, and cold-db-compose-smoke with their item references + a one-sentence description of what each catches. Replaced the stale '22 guards' header with 'Count: re-derive via ls' per the no-version-stamped-numbers convention from CLAUDE.md. 3. docs/README.md — wired ci-guards.md into the Contributor section navigation table. Bumped 'Last reviewed:' to 2026-05-12 on the two docs touched (docs/README.md, docs/contributor/ci-pipeline.md). Verified: doc-rot-detector.sh green at 91 docs scanned, 89 dated, 0 warns, 0 fails. Audit-Closes: post-v2.1.0-anti-rot/item-1 Audit-Closes: post-v2.1.0-anti-rot/item-2 Audit-Closes: post-v2.1.0-anti-rot/item-5 Audit-Closes: post-v2.1.0-anti-rot/item-6	2026-05-12 14:15:13 +00:00
shankar0123	255f61e6c5	ci(workflows): wire Auditable Codebase Bundle guards into ci.yml Three changes to .github/workflows/ci.yml: 1. Add internal/ciparity/... to the Go Test with Coverage package list. The four surface-parity tests run alongside everything else and contribute to the coverage report. 2. Skip cold-db-compose-smoke.sh in the existing generic regression-guards loop (under go-build-and-test). The script needs Docker + a fresh postgres volume; including it here would always fail because that job doesn't bring up compose. The other two new Bundle guards (complete-path-config-coverage.sh, doc-rot-detector.sh) are plain-shell + Python and need no Docker — the existing 'for g in scripts/ci-guards/*.sh' loop auto-picks them up. 3. New top-level job: 'cold-db-compose-smoke' - needs: go-build-and-test (don't waste compute if the basics are red) - 15-min wall-clock cap (image pull + compose-up + probe + teardown) - Dumps compose logs on failure for postgres + certctl-server + certctl-agent + certctl-tls-init so the failure is actionable without a re-run. Validated: - python3 -c 'import yaml; yaml.safe_load(...)' → yaml ok Operator follow-up: - Add 'cold-db-compose-smoke' to the master branch-protection required-checks list once the first successful run lands. Audit-Closes: post-v2.1.0-anti-rot/item-6	2026-05-12 14:12:39 +00:00
shankar0123	3ede1b726f	feat(ci): item-6 cold-DB compose smoke script (CI wiring in Phase 5) scripts/ci-guards/cold-db-compose-smoke.sh — wipes the postgres volume (docker compose down -v), brings the stack up cold, mints a day-0 admin via /api/v1/auth/bootstrap, issues + renews + revokes a test certificate, asserts the three audit rows exist, tears down. Catches the bug class fixed by commit `def4be9` (the 2026-05-09 migration 000045 broken INSERT that the warm-DB integration suite missed) and, more generally, the 2026-04-30 migration regression class. Tunables via environment: - COLD_DB_SMOKE_STARTUP_TIMEOUT (default 300s/svc) - COLD_DB_SMOKE_PROBE_TIMEOUT (default 180s) - COLD_DB_SMOKE_SERVER_URL (default https://localhost:8443) - COLD_DB_SMOKE_CACERT (default deploy/test/certs/ca.crt) On failure: dumps `docker compose logs --tail 200` for postgres, certctl-server, certctl-agent, certctl-tls-init so the CI failure is actionable without a re-run. Sandbox VERIFICATION: bash syntax-check (bash -n) passes. Full smoke run NOT executed in the sandbox — no Docker available here. The operator runs it from their workstation as the Phase 6 negative-test ladder (introducing a broken migration; confirming the script fails with the migration error in the dumped logs). CI wiring (.github/workflows/ci.yml::cold-db-compose-smoke job) lands in the next commit (Phase 5). Audit-Closes: post-v2.1.0-anti-rot/item-6	2026-05-12 14:11:32 +00:00
shankar0123	3fe511189f	feat(ci): item-5 doc rot detector (90d warn / 120d fail) scripts/ci-guards/doc-rot-detector.sh — walks every *.md under docs/, parses the '> Last reviewed: YYYY-MM-DD' blockquote convention established by the 2026-05-04 docs overhaul, emits: - ::warning:: GitHub annotation when a doc is >= 90 days old (heads-up; non-blocking). - ::error:: + exit 1 when >= 120 days (build-blocking). Uses HEAD commit timestamp (git log -1 --format=%cs) as 'now' rather than wall clock — keeps the guard reproducible on a release that's been on a shelf. Verified in sandbox: - Clean run: 90 docs scanned, 88 dated (2 in docs/archive/ allowlisted in bulk), 0 missing field, 0 warns, 0 fails. - Negative test (backdated docs/README.md to 2025-12-01, 162d): fires with '::error::Docs older than 120 days (build-blocking)' + three remediation paths listed. Allowlist at scripts/ci-guards/doc-rot-detector-exceptions.yaml: - 'docs/archive/' bulk-allowlisted (intentionally frozen content) - Per-doc entries require name + justification + expiration date; expired entries fail the guard. Bootstrap sweep NOT required — baseline survey at branch creation shows oldest doc is 7 days old (2026-05-05); zero docs over either threshold today. Forward-looking insurance only. Audit-Closes: post-v2.1.0-anti-rot/item-5	2026-05-12 14:10:27 +00:00
shankar0123	e3a9317693	feat(ci): item-2 cross-surface contract parity (stdlib-only package) internal/ciparity/ — new stdlib-only package with four tests: 1. TestSurfaceParity_MCPToolCatalogue (HARD GATE): - Every MCP tool name conforms to certctl_<word>(_<word>)* - No duplicate names across the five tools.go files - Total tools ≥ mcpBaselineFloor (150; current count 155) Catches accidental tool deletions + naming-convention drift. 2. TestSurfaceParity_CLICommandCatalogue (INFORMATIONAL): Walks cmd/cli/main.go's switch-case dispatcher. Logs the 31 distinct verbs. Per frozen decision 0.9, warn-only until the CLI surface stabilizes. 3. TestSurfaceParity_OpenAPI_MCPHeuristicCoverage (INFORMATIONAL): Reports the fraction of OpenAPI ops whose path tokens overlap with MCP tool name tokens. Trend metric; current coverage 92%. 4. TestSurfaceParity_Summary (INFORMATIONAL): One-glance count of router routes / OpenAPI ops / MCP tools / CLI verbs. Easy eyeball for a PR reviewer. Verified in sandbox: - gofmt clean - go vet clean - go test -short -count=1: all four PASS in 0.017s Stdlib-only by design — the tests read source files with os.ReadFile + regexp + go/ast. Keeps the test runnable without pulling in the rest of the codebase's transitive deps; fast self-contained signal. Router ↔ OpenAPI parity (TestRouter_OpenAPIParity) stays in internal/api/router/openapi_parity_test.go where it already lives. This bundle does not duplicate it. Allowlist scaffold at scripts/ci-guards/surface-parity-mcp-exemptions.yaml for the day TestSurfaceParity_OpenAPI_MCP is promoted from informational to hard gate. Audit-Closes: post-v2.1.0-anti-rot/item-2	2026-05-12 14:09:32 +00:00
shankar0123	0ab6bc4a73	feat(ci): item-1 complete-path config-coverage guard (PARTIAL — sandbox could not verify Go test) Shell guard verified working in sandbox: - Green on clean repo: 'OK — every CERTCTL_* env var (194) has at least one non-config-package consumer.' - Red on injected orphan: '::error::Orphan env vars — defined in config.go but no consumer found outside internal/config/' with three remediation paths listed. Go test internal/config/coverage_test.go written but NOT verified — sandbox Go 1.25.9 < go.mod's 1.25.10 requirement; toolchain auto-download fails (disk full). Operator must run `make verify` from workstation before merge. Allowlist scaffold at scripts/ci-guards/complete-path-config-coverage-exceptions.yaml. Every entry requires name + justification + expires fields; expired entries fail the guard. Catches the lying-field bug class — env var defined in config.go that no business-logic code reads. The 2026-04-29 SCEP MustStaple Phase 5.6 gap (domain field shipped, service layer never read profile.MustStaple) is the canonical case this guard would have caught at commit time. Audit-Closes: post-v2.1.0-anti-rot/item-1	2026-05-12 14:02:04 +00:00
shankar0123	a31cef34c5	chore(ci): start Auditable Codebase Bundle — record baseline counts Branch: dev/auditable-codebase-bundle off master @ `ee2d6d3`. Baseline counts (workspace: cowork/auditable-codebase-bundle/baseline-2026-05-12.md): - 216 env vars defined in internal/config/config.go - 158 OpenAPI operations - 230 router routes registered - 161 MCP tools across tools*.go - 90 docs files, all carrying "> Last reviewed:" (oldest 2026-05-05) - 30 existing CI guards under scripts/ci-guards/ Spec: cowork/auditable-codebase-bundle-prompt.md Audit-Closes: post-v2.1.0-anti-rot/item-1 Audit-Closes: post-v2.1.0-anti-rot/item-2 Audit-Closes: post-v2.1.0-anti-rot/item-5 Audit-Closes: post-v2.1.0-anti-rot/item-6	2026-05-12 13:56:29 +00:00
shankar0123	ee2d6d3a7c	chore: routine maintenance	2026-05-12 04:57:29 +00:00
shankar0123	7b3a57dfdf	docs(readme): revert Status block to 4-paragraph form (over-split was too choppy)	2026-05-11 22:18:38 +00:00
shankar0123	a103ccfe5c	docs(readme): one sentence per blockquote in Status block — full breathing room	2026-05-11 22:17:44 +00:00
shankar0123	c029875196	docs(readme): Status block rewrite — design-partner CTA, paragraph cadence Earlier versions were either link-soup or so tight they read as boilerplate. This pass aims for CMO-grade copy: - Paragraph 1: lede that combines the early-access label with the design-partner ask — sets the tone in one line. - Paragraph 2: what's production-quality today, with the RBAC + OIDC doc links inline (no bold, no link-soup). Names the v2.1.0 layer on top. - Paragraph 3: the ask — production deployments wanted, framed explicitly as 'we can't manufacture this exposure in CI'. Honest about the federated-identity surface being where the new exposure lives. Mutual-value framing. - Paragraph 4: the actionable bit — file issues liberally, with the why ('how the platform earns the right to drop early-access'). Three inline doc links (RBAC, OIDC runbook index, file-issues). Same factual content, warmer voice, paragraph cadence with breathing room between.	2026-05-11 22:16:32 +00:00
shankar0123	ed833e80f6	docs(readme): space out the Status block — three separate blockquotes	2026-05-11 22:14:50 +00:00
shankar0123	0eb3d0310c	docs(readme): tighten Status block; add RBAC + OIDC runbook links Quieter version of the Status block — single blockquote, three short sentences, three inline links (RBAC, OIDC, file-issues). Drops: - The Local-CA / ACME / agent-deployment / CRUD / audit feature pile (those live in the doc table immediately below) - The 6-IdP enumeration (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace) — operators find that in the OIDC runbook index, now linked inline - The double 'in early-access' phrasing - 'HMAC-signed server-side sessions with __Host- cookies and CSRF rotation; OIDC Back-Channel Logout; Argon2id break-glass admin' — the spec details belong in the auth-threat-model + security docs, not the front-page status Same early-access framing, same issue-link CTA, far more readable.	2026-05-11 22:13:34 +00:00
shankar0123	46769fc7fa	docs(readme): audit pass — fix 7 stale/inaccurate claims Each claim ground-truthed against the live repo, not memory. Numeric drift (claims rotted since they were written): - Screenshot caption 'Catalog with 10 CA types' → 12 (matches internal/connector/issuerfactory/factory.go enumeration). - '33-permission canonical catalogue' → dropped the number. 33 was the base in migration 000029; across all 45 migrations 82 unique perms are seeded (+5 admin / +7 OIDC / +2 break-glass / +33 audit-CRIT-1 / +2 user). 'Fine-grained permission catalogue' is monotonic prose. - 'PostgreSQL 16 backend (35+ tables, idempotent migrations)' → '…backend with idempotent migrations'. Actual table count is 49 across 45 migrations; bare 'idempotent migrations' is drift-proof. - Demo overlay seeds '32 certificates across 10 issuers, 8 agents, 180 days' → '180 days of realistic history across 13 issuers, 8 agents, managed + discovered certs, jobs, deploys, audit, and notification events'. seed_demo.sql actually seeds 14 managed certs + 16 cert versions + 12 discovered, 13 issuers (not 10), 8 agents ✓, 23 INTERVAL '180 days' refs ✓. - 'golangci-lint (11 linters)' → '(govet + staticcheck + contextcheck + unused)'. .golangci.yml lists exactly 4 active linters; 6 others are commented-out 'temporarily disabled' so neither 4 nor 10 explains 11. Broken Helm one-liner (silently no-ops because --set against a nonexistent path doesn't error): - '--set server.apiKey=…' → 'server.auth.apiKey' (deploy/helm/certctl/values.yaml:147 + templates/server-secret.yaml:16). - '--set postgres.password=…' → 'postgresql.password' (top-level key is 'postgresql', not 'postgres'; password sits at postgresql.password per values.yaml:315). Verified accurate (no change): - 12 issuers / 15 targets / 6 notifiers (factory + dir listings). - 7 default roles seeded in migration 000029. 
- Coverage thresholds (service 70 / handler 75 / crypto 88 / auth packages 85-95) against .github/coverage-thresholds.yml. - All 6 OIDC runbooks present (auth0 / authentik / azure-ad / google-workspace / keycloak / okta). - 4 referenced screenshots all exist on disk. - 8 agents in demo seed, 180 days of history. - RFC 9700 §4.7.1 / 9207 / 8555 / 9773 / 8894 / 9266 / 5280 / 6960 citations match source. - ChromeOS in SCEP description matches source. - install-agent.sh uses uname for OS / arch detection + systemd (Linux) / launchd (macOS). v2.1.0	2026-05-11 17:29:18 +00:00
shankar0123	12705efe36	docs(readme): split Status block into two blockquotes for breathing room	2026-05-11 17:09:20 +00:00
shankar0123	de53847f51	docs(readme): quiet the Status block The previous version crammed 5 bold-emphasized inline links plus inline code into a single paragraph — visually loud and hard to scan. Rewrite as two short paragraphs: - First paragraph: what's production-quality + what's still maturing. No links, em-dash cadence for breathing room. - Second paragraph: v2.1.0 OIDC + sessions + break-glass slice with a single issue-link tail. Drops the bold-link sandwich in favor of plain prose; the doc-nav table directly below handles per-doc routing. Same content, same early-access framing, far less visual noise.	2026-05-11 17:08:21 +00:00
shankar0123	56e2ea1ad7	docs: v2.1.0 release polish — strip internal bundle/phase tags, update status for OIDC ship README: - Rewrite Status block: drop the stale 'federated identity not yet shipped' line; flag v2.1.0 OIDC + sessions + back-channel logout + break-glass as early-access; encourage GitHub issues for IdP rough edges. (A1 framing — keep early-access umbrella, no SAML/WebAuthn/JIT roadmap teaser.) - Add OIDC SSO bullet to 'What it does' covering per-IdP runbooks, group-claim → role mapping, AES-256-GCM client_secret encryption, JWKS auto-refresh, PKCE-S256, RFC 9700 §4.7.1 pre-login binding, RFC 9207 iss check, __Host- cookies, CSRF rotation, idle+absolute expiry, BCL, break-glass admin. - Update Security paragraph: three auth paths (API keys / OIDC / break-glass), HMAC-signed sessions, CSRF rotation, RFC OIDC BCL. - Correct CI coverage thresholds against .github/coverage-thresholds.yml (service 70%, handler 75%, crypto 88%, auth packages 85-95%); 'static analysis' replaces the inflated '11 linters' claim (actual count is 4 active). Docs B3 sweep — strip operator-facing 'Bundle N' / 'Phase N' tags: - docs/operator/auth-threat-model.md — rewrite intro; rename 5 H2 sections (API-key + RBAC defenses / OIDC + sessions + break-glass defenses / OIDC + sessions threat catalogue / Closed federated-identity threats / Future-work threats); clean ~12 H3/prose hits. - docs/operator/rbac.md — strip Bundle 1 framing from intro, scope_id deferral note, MCP tools section, day-0 bootstrap, and 'Where to look next'. - docs/operator/auth-benchmarks.md — drop 'Phase 14' framing from title intro, hardware floor caption, result table caption, methodology, and pre-merge audit section. - docs/operator/security.md — already cleaned earlier this session (RBAC / day-0 / approval-bypass / OIDC federation / sessions / OIDC first-admin / break-glass H3s). 
- docs/operator/oidc-runbooks/{index,keycloak,authentik,okta,azure-ad}.md — strip Auth Bundle 2 framing + Phase 10/3/4 references; replace with feature-name prose. - docs/operator/legacy-clients-tls-1.2.md — drop Bundle F / M-023 audit-reference framing; keep CWE-326. - docs/operator/database-tls.md — drop Bundle B / M-018 framing from intro + Helm section. - docs/operator/runbooks/disaster-recovery.md — drop 'Production hardening II Phase 10' status callout. - docs/migration/oidc-enable.md — retitle 'Enable OIDC SSO'; strip Bundle 1/2 framing from prereqs, troubleshooting, related docs; update __Host- cookie callout from 'audit MED-14' to v2.1.0-BREAKING. - docs/migration/api-keys-to-rbac.md — strip Bundle 1 framing from intro, migration table, IsAdmin section, and cross-references. - docs/migration/acme-from-cert-manager.md — strip residual 'Phase 5' tags from cert-manager integration test references. - docs/reference/configuration.md — retitle Auth section. - docs/reference/profiles.md — strip Bundle 1 Phase 9 framing from RequiresApproval section + Related list. - docs/reference/auth-standards-implemented.md — rewrite intro (API-key + RBAC + OIDC + sessions + back-channel logout + break-glass); rename 'Bundle 1 (RBAC) standards covered separately' H2; clean per-row Phase references. - docs/README.md — rewrite nav-table entries to drop Bundle 1/2 parentheticals; retitle 'Enable OIDC SSO' migration entry. No code or test changes; pure operator-facing prose polish for the v2.1.0 tag.	2026-05-11 16:54:07 +00:00
shankar0123	1b03d0c594	fix(repo/job): split UNION ALL + FOR UPDATE into two queries (Postgres-correctness) Phase-9 docker compose smoke surfaced a latent production-breaking bug introduced by commit `89b910a` (H-6 atomic pending-job claim). The ClaimPendingByAgentID query in internal/repository/postgres/job.go combined UNION ALL with FOR UPDATE SKIP LOCKED in a single statement. Postgres rejects this with: ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT Every agent work-poll returns HTTP 500 in any real deployment where an agent is actually polling. From the compose log: request_id=6da47015-... GET /api/v1/agents/agent-demo-1/work status=500 duration_ms=2 The schema-per-test unit harness in internal/repository/postgres/*_test.go never inserted jobs and polled, so the SQL execution path was never exercised. The bug has been latent in master since `89b910a` landed. Fix: split the UNION ALL into two separate FOR UPDATE SKIP LOCKED queries within the existing transaction. The H-6 atomicity invariant (concurrent pollers never see the same Pending row) is preserved because: 1. The two queries run inside the same transaction (tx). 2. Each query independently locks its result rows with FOR UPDATE SKIP LOCKED. 3. The subsequent UPDATE that flips Pending -> Running runs in the same transaction, so the rows stay invisible to concurrent callers from initial SELECT through final COMMIT. 4. The transaction is the unit of consistency, not the single SQL statement. Two queries: - Branch 1 (direct): jobs.agent_id = <agent ID> + status='Pending' + type='Deployment'. ORDER BY created_at ASC, FOR UPDATE SKIP LOCKED. - Branch 2 (fallback): jobs.agent_id IS NULL + INNER JOIN deployment_targets dt ON jobs.target_id = dt.id WHERE dt.agent_id = <agent ID>. ORDER BY j.created_at ASC, FOR UPDATE OF j SKIP LOCKED (FOR UPDATE OF needed because the join brings in dt). Branch 3 (AwaitingCSR) is unchanged — already a single SELECT, not affected by the UNION restriction. 
Inline comment explains the fix's load-bearing-ness so a future refactor doesn't merge them back into one UNION query. Verify (sandbox): go vet clean; go test -short -count=1 PASS on internal/repository/postgres/. Workstation re-runs 'docker compose up' to confirm the agent's GET /work returns 200 with the next pending-deployment claim. Note: this is NOT a regression introduced by Auth Bundle 2 or the 2026-05-11 audit fixes; it's a pre-existing latent defect from H-6. Including in v2.1.0 because shipping with a broken agent work-poll would block the demo path on day one of release.	2026-05-11 16:11:33 +00:00
shankar0123	def4be9b38	fix(migrations): two cold-DB regressions surfaced by Phase-9 docker compose smoke The v2.1.0 release-gate Phase-9 docker compose smoke run against a fresh Postgres surfaced two real defects in the migration files that testcontainers schema-per-test never exercised. Both reproduce by running 'docker compose down -v && docker compose up --build' against the current master tree. Bug A — migration 000045_users_deactivated_at.up.sql is malformed. The 000029 schema defines: permissions (id TEXT PRIMARY KEY, name TEXT NOT NULL UNIQUE, namespace TEXT NOT NULL) role_permissions (..., permission_id TEXT NOT NULL REFERENCES ..., ...) But 000045 was written as: INSERT INTO permissions (name) VALUES ... -- missing id + namespace INSERT INTO role_permissions (role_id, permission, ...) VALUES ... -- 'permission' is the wrong column name On a cold-DB run this fails immediately with: pq: null value in column "id" of relation "permissions" violates not-null constraint Fix: provide id + namespace columns, use permission_id (the actual column name), ON CONFLICT (id) DO NOTHING. The new permission ids follow the existing 'p-auth-' prefix convention (p-auth-user-read + p-auth-user-deactivate) used by 000029. Bug B — migration 000029_rbac.up.sql is not idempotent post-000043. 000029 originally created actor_roles with: UNIQUE (actor_id, actor_type, role_id, tenant_id) Audit 2026-05-10 HIGH-10 closure / migration 000043 drops that constraint and re-creates it WITH scope columns: UNIQUE (actor_id, actor_type, role_id, scope_type, scope_id, tenant_id) The migration runner (internal/repository/postgres/db.go::RunMigrations) is naive — no tracker table — and re-runs every .up.sql file on every server boot. 
On the second-and-later boots, 000029's seed INSERT for actor-demo-anon-admin still references the pre-000043 constraint name in its ON CONFLICT clause: ON CONFLICT (actor_id, actor_type, role_id, tenant_id) DO NOTHING Postgres errors out with: pq: there is no unique or exclusion constraint matching the ON CONFLICT specification Fix: pin the conflict target to the row's primary key 'id' column (always present, never altered). The seed row's deterministic id 'ar-demo-anon-admin' makes ON CONFLICT (id) work under both pre- and post-000043 schemas. Why testcontainers schema-per-test missed these: Each test in internal/repository/postgres/*_test.go spins up a fresh schema and applies every .up.sql in order ONCE. The full '000029 -> 000043 -> retry 000029' cascade never happens because migrations don't re-run within a test. Phase-9 docker compose smoke is the only test path that exercises the server-restart-on-error retry, which is exactly the missing coverage. Verify (sandbox): go test ./internal/repository/postgres/ PASS. Workstation re-runs 'docker compose down -v && docker compose up' to confirm both bugs are closed.	2026-05-11 16:06:20 +00:00
shankar0123	aa1efd0676	fix(oidc/testfixtures): set legacy KEYCLOAK_ADMIN* env vars for start-dev master-admin bootstrap Phase-10 live-IdP smoke (post-iss-param fix landing in `360e744`) advanced 4 of 6 integration tests to green. The remaining 2 — the realm-key rotation tests — failed with: admin-cli token: HTTP 401 at the master-realm token endpoint. Root cause: Keycloak 26.x has TWO admin-bootstrap env-var pairs and the right pair depends on the launch command: - 'start' (production): KC_BOOTSTRAP_ADMIN_USERNAME + KC_BOOTSTRAP_ADMIN_PASSWORD - 'start-dev': KEYCLOAK_ADMIN + KEYCLOAK_ADMIN_PASSWORD The fixture sets KC_BOOTSTRAP_ADMIN_USERNAME + KC_BOOTSTRAP_ADMIN_PASSWORD but runs 'start-dev'. The bootstrap pair is silently ignored in dev-mode, leaving the master realm with no admin user → admin-cli token endpoint returns 401 → RotateRealmKeys can't authenticate to the Admin API. The 4 auth-code flow tests passed because they authenticate the engineer / viewer test users INSIDE the certctl realm (created by the realm import), which doesn't need a master admin. Fix: set BOTH pairs as belt-and-braces. The legacy KEYCLOAK_ADMIN pair covers start-dev today; the KC_BOOTSTRAP_ADMIN_* pair keeps a future flip to 'start' working. Inline comment in the fixture explains the why so a future reader doesn't drop one back. Verify (sandbox): go vet -tags=integration clean; gofmt clean. Workstation re-runs 'make keycloak-integration-test' to confirm the 2 rotation tests now reach + execute the Admin API successfully.	2026-05-11 15:49:25 +00:00
shankar0123	360e7449ad	fix(oidc/integration): pass fx.IssuerURL as callbackIss arg in 7 HandleCallback call sites Phase-10 live-IdP smoke (post-Enabled-true fix landing in `1b52998`) surfaced the next layer: 5 of 6 testcontainers-Keycloak integration tests failed with 'oidc: provider advertises iss-parameter support but callback omitted it'. Root cause: Keycloak's discovery doc advertises authorization_response_iss_parameter_supported=true. The Audit 2026-05-10 MED-17 closure (RFC 9207) gates the callback path: when the IdP advertises iss-param support, HandleCallback requires a non-empty callbackIss arg that matches the provider's IssuerURL, else ErrIssParamMissing. The 7 HandleCallback call sites in the integration tests were passing '' for the callbackIss arg — the synthetic test code never simulated the real browser's '?iss=<issuer>' query param. Fix: replace '' with fx.IssuerURL at all 7 sites: - integration_keycloak_test.go: 5 sites (TestKeycloakIntegration_AuthCodeFlow_HappyPath, TestKeycloakIntegration_LogoutRevokesSession, TestKeycloakIntegration_JWKSRotation_RefreshKeysPicksUpNewKey pre+post HandleCallback, TestKeycloakIntegration_UnmappedGroupsFailsClosed) - integration_keycloak_rotate_test.go: 2 sites (TestKeycloakIntegration_MED6_AutoRefreshOnKidMiss pre+post) Inline note on the first site explains the rationale so future test-writers don't drop back to ''. Verify (sandbox): go vet -tags=integration ./internal/auth/oidc/... clean; gofmt clean; grep for remaining empty-iss callsites returns 0 matches. Workstation re-runs 'make keycloak-integration-test' to confirm the 5 affected tests advance past the iss-param check against a real Keycloak 26.x.	2026-05-11 15:44:39 +00:00
shankar0123	1b529985be	fix(oidc/testfixtures): set Enabled=true on Keycloak integration-test provider Phase-10 live-IdP smoke re-run (after the alg-downgrade relax landed in `fefeccf`) surfaced the next layer: 5 of 6 testcontainers-Keycloak integration tests failed with 'oidc: provider is disabled'. Root cause: the OIDCProvider struct literal in internal/auth/oidc/testfixtures/keycloak.go omits the Enabled field. Enabled was added by Audit 2026-05-11 MED-9 (Bundle 2 Fix 13 Phase B); pre-fix the field didn't exist and HandleAuthRequest always proceeded. Post-fix the default zero-value false gates every integration test behind ErrProviderDisabled at service.go L478. Fix: add Enabled: true to the struct literal + inline comment explaining why the field is required for integration tests. The check is the right behavior for production (operator-driven disable kill-switch); just needed to be reflected in the testfixture. Verify (sandbox): go vet -tags=integration ./internal/auth/oidc/... clean. Workstation re-runs 'make keycloak-integration-test' to confirm the 5 affected tests now pass against a real Keycloak 26.x.	2026-05-11 15:39:07 +00:00
shankar0123	fefeccfa59	harden(oidc): relax alg-downgrade IdP-bind check to intersection-empty (Keycloak compat) Phase-10 live-IdP smoke (Keycloak 26.x via testcontainers-go) revealed the IdP-bind alg-downgrade check was too strict for real-world IdPs. 6 of the integration tests in internal/auth/oidc/integration_keycloak_test.go were failing with: oidc: IdP advertises weak signing algorithms (HS/none); refusing to use as defense against downgrade attacks: HS256 Keycloak 26.x (and several other real-world IdPs — Auth0 when HS-mode is enabled, some Authentik configs) advertise EVERY alg they're capable of in the discovery doc's id_token_signing_alg_values_supported field, even when the realm only signs with RS256 in practice. Pre-fix the IdP-bind check refused on ANY HS* or 'none' advertisement → no real Keycloak deploy could ever bind a provider row, hence the integration-test failures. The strict-deny check was defense-in-depth on top of the load-bearing per-token alg-pin at sig-verify time (isDisallowedAlg, service.go L1177): that check rejects every ID token whose JWS header carries an alg outside DefaultAllowedAlgs, regardless of what the discovery doc advertises. A forged HS256 token signed with the IdP's RS256 pubkey as HMAC secret is rejected at sig-verify time → the actual algorithm-confusion attack is closed by the per-token pin, NOT by the discovery-doc check. Fix: relax the IdP-bind check to refuse only when the intersection of advertised vs DefaultAllowedAlgs is EMPTY (the pathological all-weak-alg IdP case). Keycloak (RS256 + HS256 advertised) now binds successfully; an HS-only IdP still fails closed. Changes: - internal/auth/oidc/service.go: rewrite the alg-check loop at L1067 in getOrLoad / RefreshKeys to compute the intersection set; refuse only when no acceptable alg is advertised. ErrIdPDowngradeAdvertised docstring updated to reflect new contract. 
DefaultAllowedAlgs docstring + the package-level design-comment block at L40-72 updated with v2.1.0-relaxed semantics callouts. - internal/auth/oidc/test_discovery.go: TestDiscovery dry-run validator rewritten to surface HS/none alongside RS as an informational note ('note: IdP advertises weak algorithms %v alongside acceptable ones') rather than a hard-fail error. HS-only / none-only still hard-fails. - internal/auth/oidc/service_test.go: TestService_IdPDowngradeDefense_* tests updated. Renamed: - RejectsHSAdvertised → RS256PlusHS256_BindsSuccessfully (positive) - RejectsNoneAdvertised → RejectsHSOnlyAdvertised (intersection-empty) - RefreshKeys_CatchesPostLoadDowngrade rotated to HS-only post-load - internal/auth/oidc/coverage_fill_test.go: TestTestDiscovery_AlgDowngradeDetected split into _HS256AlongsideRS256_BindsWithNote (positive, asserts note but no hard-fail) + _HSOnly_StillTrips_HardFail (intersection-empty). - docs/operator/auth-threat-model.md: OIDC token-validation alg-allow-list section rewritten to call out the load-bearing-defense hierarchy (per-token pin first, IdP-bind check defense-in-depth) and document the v2.1.0 relaxation rationale. - CHANGELOG.md: ### Security entry under Unreleased. Verify: go test ./internal/auth/oidc/ -short PASS; gofmt clean; go vet clean. The Keycloak integration tests should now pass when the operator re-runs 'make keycloak-integration-test'.	2026-05-11 15:34:59 +00:00
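The intersection-empty rule described in the commit above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in service.go: the contents of `allowedAlgs` are assumed (the commit does not list DefaultAllowedAlgs), and `checkAdvertisedAlgs` and `errNoAcceptableAlg` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedAlgs is an assumed stand-in for DefaultAllowedAlgs; the real
// contents of that set are not shown in the commit message.
var allowedAlgs = map[string]bool{"RS256": true, "ES256": true, "PS256": true}

// errNoAcceptableAlg is a hypothetical error for the intersection-empty case.
var errNoAcceptableAlg = errors.New("oidc: IdP advertises no acceptable signing algorithm")

// checkAdvertisedAlgs splits the advertised list into acceptable algorithms
// and everything else (HS*, none, or unknown). It refuses only when the
// acceptable set is empty; weak entries alongside acceptable ones are
// returned so the caller can surface them as an informational note.
func checkAdvertisedAlgs(advertised []string) (acceptable, weak []string, err error) {
	for _, alg := range advertised {
		if allowedAlgs[alg] {
			acceptable = append(acceptable, alg)
		} else {
			weak = append(weak, alg)
		}
	}
	if len(acceptable) == 0 {
		return nil, weak, errNoAcceptableAlg
	}
	return acceptable, weak, nil
}

func main() {
	for _, advertised := range [][]string{{"RS256", "HS256"}, {"HS256", "none"}} {
		ok, weak, err := checkAdvertisedAlgs(advertised)
		fmt.Println(ok, weak, err)
	}
}
```

The per-token alg pin at signature-verification time is a separate check and is not modeled here.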
shankar0123	1cfa9f2e2a	Merge dev/auth-bundle-2 → master (v2.1.0): Auth Bundle 2 + 2026-05-11 audit fixes	2026-05-11 15:24:24 +00:00
shankar0123	70ebef5d3a	test(client): mock headers.get() so 401 tests survive HIGH-8 WWW-Authenticate read Audit 2026-05-10 HIGH-8 closure landed a parseWWWAuthenticateCause() call in api/client.ts (line 144) that reads res.headers.get(...) on the 401 path. The two test files in web/src/api/ both provide a Response mock with no headers property, so every 401 test threw 'Cannot read properties of undefined (reading get)' instead of the expected 'Authentication required'. 13 tests fail without this fix: 12 in client.error.test.ts (one per 401-mapped endpoint helper) + 1 in client.test.ts (the auth-required event-dispatch test). Fix: add headers: { get: () => null } to both mockErrorResponse helpers. The null return short-circuits parseWWWAuthenticateCause to the default 'Authentication required' message, so every existing 401 assertion keeps passing.	2026-05-11 14:37:36 +00:00
shankar0123	eee124efb6	chore(ci-guards): close 4 CI-guard regressions surfaced by v2.1.0 release-gate Phase 5 Four scripts/ci-guards/*.sh trips on dev/auth-bundle-2 vs master: 1. G-3-env-docs-drift: 10 CERTCTL_* env vars added by Auth Bundle 2 + audit-2026-05-10/11 fix bundle were not in docs/. Added a new 'Auth (Bundle 1 + Bundle 2)' section to docs/reference/configuration.md covering CERTCTL_SESSION_BIND_USER_AGENT, CERTCTL_SESSION_GC_INTERVAL, CERTCTL_OIDC_BCL_MAX_AGE_SECONDS, CERTCTL_OIDC_PRELOGIN_REQUIRE_UA/IP, CERTCTL_DEMO_MODE_ACK, CERTCTL_TRUSTED_PROXIES + _COUNT (synthesised), CERTCTL_BOOTSTRAP_* set, CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD. Also added CERTCTL_RATE_LIMIT_ to the bare-prefix allowlist (referenced in docs/reference/auth-standards-implemented.md prose). 2. bundle-8-M-009-bare-usemutation: BreakglassPage shipped 3 bare useMutation() calls instead of useTrackedMutation. Migrated all three to useTrackedMutation with invalidates: [['breakglass']]. 3. multi-tenant-query-coverage: Defense-in-depth tenant_id additions in the fix bundle dropped the missing-tenant-id query count from 32 to 31. Ratcheted baseline 32 -> 31 (forward-only invariant). 4. openapi-handler-parity: 28 new REST endpoints from Bundle 2 + the fix bundle missing from api/openapi.yaml. Added them to api/openapi-handler-exceptions.yaml with per-route 'why:' justifications. OpenAPI schema generation deferred to pre-v2.2.0 alongside the GUI E2E coverage push; threat model + handler contracts already live in docs/operator/{rbac,auth-threat-model,oidc-runbooks}.md. After this commit every script in scripts/ci-guards/*.sh exits 0.	2026-05-11 14:19:35 +00:00
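The forward-only baseline mentioned in item 3 above can be pictured as a ratchet: the count may never rise above the committed baseline, and when it falls the baseline has to be lowered to match. The sketch below is an assumption about how such a guard behaves, written in Go for illustration; the actual guard is a shell script whose logic is not shown in the commit, and `checkRatchet` is a hypothetical name.

```go
package main

import "fmt"

// checkRatchet is a hypothetical forward-only guard. It fails on a regression
// (current above baseline) and also on an unrecorded improvement (current
// below baseline), so the committed baseline always tracks the best value.
func checkRatchet(baseline, current int) error {
	switch {
	case current > baseline:
		return fmt.Errorf("regression: count %d exceeds baseline %d", current, baseline)
	case current < baseline:
		return fmt.Errorf("improvement: lower the baseline from %d to %d", baseline, current)
	}
	return nil
}

func main() {
	fmt.Println(checkRatchet(32, 31)) // improvement not yet recorded
	fmt.Println(checkRatchet(31, 31)) // baseline ratcheted: passes
	fmt.Println(checkRatchet(31, 33)) // regression
}
```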
shankar0123	80cbd2db59	test(coverage): backfill 5 packages to clear v2.1.0 release-gate Phase 3 floors Phase 3 of /Users/shankar/Desktop/cowork/v2.1.0-release-gate.md surfaced four packages below their coverage floors. All four are regressions from new code shipped in the audit-2026-05-10/11 fix bundles that didn't get per-function tests: internal/auth/breakglass 87.5% -> 93.3% (floor: 90%) + List (was 0%) — 3 tests (disabled, empty+populated, repo err) + RemoveCredential, Unlock disabled-branch tests internal/auth/oidc 89.4% -> 95.4% (floor: 90%) + JWKSStatus (was 0%) — 2 tests (unknown provider, after AuthRequest) + TestDiscovery (was 0%) — 5 tests (discovery failure, happy path, HS256 alg-downgrade detected, missing jwks_uri, JWKS 500 fetch) internal/auth/session 89.9% -> 94.4% (floor: 90%) + SetTrustedProxies (was 0%) — round-trip + clear + ComputeCookieHMAC (was 0%) — determinism + key/inputs differ + DecryptKeyMaterial (was 0%) — round-trip + wrong-passphrase internal/api/handler 73.2% -> 75.5% (floor: 75%) + 6 auth_breakglass handler funcs (were all 0%) — 14 tests (disabled/404, invalid JSON, empty fields, service err, happy path with cookies, admin endpoints, ListCredentials no password_hash on the wire) + WithPermissionChecker setter test (was 0%, Bundle 2 MED-2) + NewAdminCRLCacheServiceImpl + CacheRows (were 0%) — 3 tests + itoaForRetryAfter + challengeURLBuilder ACME helpers (were 0%) — 4 tests All five coverage gates green: internal/service 72.7% (floor: 70%) internal/api/handler 75.5% (floor: 75%) internal/api/middleware 67.9% (floor: 30%) internal/auth 93.3% (floor: 85%) internal/service/auth 91.8% (floor: 85%) internal/auth/oidc 95.4% (floor: 90%) internal/auth/oidc/groupclaim 100.0% (floor: 95%) internal/auth/oidc/domain 97.6% (floor: 90%) internal/auth/session 94.4% (floor: 90%) internal/auth/session/domain 98.3% (floor: 90%) internal/auth/breakglass 93.3% (floor: 90%) internal/auth/breakglass/domain 100.0% (floor: 90%) internal/auth/user/domain 
96.2% (floor: 90%) (and 6 more — all green) Per CLAUDE.md operating rule: 'Lowering a floor REQUIRES corresponding code-side test work — never lower the gate to make CI green.' The floors stay at their committed values; the new tests close the gap.	2026-05-11 14:12:11 +00:00
shankar0123	8aeeec93c0	chore(lint): close 5 golangci-lint v2 findings surfaced by v2.1.0 release-gate Phase 1.3 Five golangci-lint v2 findings surfaced when running the v2.1.0 release gate (auth-bundle-2 → master pre-flight). Each is mechanical: 1. govet/printf-style misuse — internal/auth/oidc/service_test.go used integer literal 501 in http.Error; switched to http.StatusNotImplemented. 2. staticcheck SA1019 — internal/auth/breakglass/reflect_helper_test.go referenced reflect.Ptr; the canonical name since Go 1.18 is reflect.Pointer. 3. staticcheck ST1020 — internal/repository/postgres/auth.go ActorRoleRepository.Revoke had a doc comment that did not begin with the method name. Prepended 'Revoke drops actor_roles rows.' to the comment so it now starts with the method name. 4. staticcheck ST1022 — internal/api/handler/auth_session_oidc.go DefaultBCLVerifierMaxAge docstring was attached to the DefaultBCLVerifier type docstring. Moved the const docstring directly above the const declaration, separated by a blank line. 5. unused — internal/auth/session/bench_test.go declared benchSessionMinSamples and never referenced it; the bench loop relies on Go's default b.N scaling. Replaced the const block with a comment describing the rationale. Lint clean (golangci-lint v2.12.2 with the .golangci.yml config) on the five edited packages.	2026-05-11 13:31:13 +00:00
shankar0123	09bea664d5	chore(fmt): gofmt cleanup on three pre-bundle drift files surfaced by v2.1.0 release-gate Phase 1 Phase 1 (make verify) of cowork/v2.1.0-release-gate.md surfaced three files with pre-existing gofmt drift that pre-dated the 2026-05-11 fix bundle work: internal/auth/oidc/domain/types.go internal/auth/oidc/integration_keycloak_rotate_test.go internal/auth/oidc/test_discovery.go The 2026-05-11 Fix 08 fmt-cleanup commit (`b8fac59`) fixed four files that the merge introduced; these three were noted as pre-existing master drift and intentionally left untouched at the time. The v2.1.0 release-gate spec's Phase 1 requires zero gofmt output from 'go fmt ./...' (Makefile::verify form), so the drift must close before tagging. Pure whitespace alignment, no semantic change.	2026-05-11 13:18:25 +00:00
shankar0123	a4b2919f59	Merge Fix 13 (HIGH-2 fourth call site): CSRF rotation on Logout # Conflicts: # CHANGELOG.md	2026-05-11 13:01:56 +00:00
shankar0123	9f617add29	Merge Fix 12: Vitest coverage for the 2026-05-10/11 GUI batch	2026-05-11 13:00:25 +00:00
shankar0123	ecba4112b7	Merge Fix 11 (MED-11 discoverability): UsersPage sidebar nav entry # Conflicts: # CHANGELOG.md	2026-05-11 13:00:19 +00:00
shankar0123	54f535a007	Merge Fix 10 (MED-7 GUI half): JWKS health panel + Refresh-now button # Conflicts: # CHANGELOG.md # web/src/pages/auth/OIDCProviderDetailPage.tsx	2026-05-11 12:59:41 +00:00
shankar0123	f1219f8cd3	Merge Fix 09 (MED-5 GUI half): Test Connection panel on OIDC create + edit forms # Conflicts: # CHANGELOG.md	2026-05-11 12:58:48 +00:00
shankar0123	d5522debfb	Merge Fix 08 (HIGH A-8): demo-mode residual-grants detector + cleanup endpoint + CI guard	2026-05-11 12:57:35 +00:00
shankar0123	9a8130de32	harden(auth/sessions): CSRF rotation on logout closes HIGH-2 fourth call site Audit 2026-05-11 Fix 13 closure. The HIGH-2 closure on dev/auth-bundle-2 documented four RotateCSRFTokenForActor call sites — login completion (fresh by construction), Assign/Revoke RoleToKey (wired at internal/api/handler/auth.go:498 + 546), Logout, and an explicit operator endpoint. The 2026-05-11 adversarial review observed only 3 of the 4: Logout did NOT rotate the actor's sibling sessions post-revoke. Threat closed: a token captured pre-logout (browser DevTools, malicious extension, session-storage leak) could be replayed against the user's other-device/other-browser sessions until those sessions hit their own idle/absolute expiry. Rotation on logout defeats this — the captured token is dead the moment the user clicks 'Sign out' anywhere. What this changes: * internal/api/handler/auth_session_oidc.go::SessionMinter interface gains RotateCSRFTokenForActor(ctx, actorID, actorType string) int. Nil-safe semantics by convention — the production wiring is session.Service which already implements the method; rotation NEVER errors (returns int count, swallows per-row failures via the underlying Service.RotateCSRFToken) so it can't block the surrounding Revoke that triggered it. internal/api/handler/auth_session_oidc.go::Logout calls RotateCSRFTokenForActor after Revoke(sess.ID) succeeds. The auth.session_revoked audit row gains a csrf_rotated detail key carrying the count so SOC/SIEM can correlate logout events with CSRF churn on sibling sessions. * The no-cookie + invalid-cookie 204 short-circuit paths skip rotation. No session row exists to rotate against; the caller is already unauthenticated. Rotation on those paths would do nothing useful and pollute the audit log. Test coverage in internal/api/handler/auth_session_oidc_test.go: * TestLogout_RotatesCSRFForActor — happy path. 
Mocks rotateCSRFReturnCount=2; asserts Revoke fires before rotation, rotation fires exactly once with caller's (actor_id, actor_type), audit details carry csrf_rotated=2. * TestLogout_NoCookie_SkipsCSRFRotation — pins the 204 short-circuit branch when there's no cookie. Rotation count stays at 0. * TestLogout_InvalidCookie_SkipsCSRFRotation — pins the 204 short-circuit branch when Validate rejects the cookie. Same rationale: no session row, no rotation. The stubSession test fake gains RotateCSRFTokenForActor with call-recording fields; the phase5StubAudit gains a details slice append-aligned 1:1 with events so the happy-path test can index into the latest entry and assert the count. Spec Phase 3 (explicit operator endpoint) — intentionally NOT shipped. The three automatic triggers (login + role-mutation + logout) cover the HIGH-2 threat model; operators who want a nuclear option can use the existing RevokeAllForActor flow which forces re-login → fresh session → fresh CSRF. Adding a dedicated POST /api/v1/auth/sessions/rotate-csrf admin endpoint would be defense-in-depth without new attack-surface coverage. Documented in the audit-doc annotation. Verify gate: * gofmt -l — clean * go vet ./internal/api/handler/... — clean * go build ./cmd/server/... ./internal/... — clean (production session.Service satisfies the extended interface out of the box) go test -short -count=1 ./internal/api/handler/... ./internal/auth/session/... — all green; 3 new Logout cases + the 2 pre-existing Logout cases all pass. Audit doc annotation at cowork/auth-bundles-audit-2026-05-10.md flips the HIGH-2 row from 'CLOSED 2026-05-10 (3/4 call sites wired)' to 'A-B-3 verified 2026-05-11: HIGH-2 fully closed across all four documented call sites.' Refs cowork/auth-bundles-fixes-2026-05-11/13-verify-logout-csrf-rotation.md.	2026-05-11 12:24:41 +00:00
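The ordering this commit pins (Revoke first, then a rotation that cannot fail, with the count recorded in the audit details) might look roughly like the sketch below. Only the RotateCSRFTokenForActor signature comes from the commit message. The Revoke signature, the `session` type, the `logout` helper, and the `fakeMinter` recorder are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// sessionMinter is a hypothetical slice of the SessionMinter interface. The
// Revoke signature is an assumption; RotateCSRFTokenForActor follows the
// signature quoted in the commit message.
type sessionMinter interface {
	Revoke(ctx context.Context, sessionID string) error
	RotateCSRFTokenForActor(ctx context.Context, actorID, actorType string) int
}

// session is a hypothetical, minimal view of a validated session row.
type session struct{ ID, ActorID, ActorType string }

// logout revokes the caller's session and then rotates CSRF tokens on the
// actor's sibling sessions. A nil session models the no-cookie and
// invalid-cookie short-circuit: nothing to revoke, nothing to rotate.
func logout(ctx context.Context, m sessionMinter, sess *session) (map[string]any, error) {
	if sess == nil {
		return nil, nil
	}
	if err := m.Revoke(ctx, sess.ID); err != nil {
		return nil, err
	}
	// Rotation returns a count and no error, so it cannot block the logout.
	rotated := m.RotateCSRFTokenForActor(ctx, sess.ActorID, sess.ActorType)
	return map[string]any{"csrf_rotated": rotated}, nil
}

// fakeMinter records call order, in the spirit of the stubSession test fake.
type fakeMinter struct {
	calls       []string
	rotateCount int
}

func (f *fakeMinter) Revoke(_ context.Context, id string) error {
	f.calls = append(f.calls, "revoke:"+id)
	return nil
}

func (f *fakeMinter) RotateCSRFTokenForActor(_ context.Context, actorID, actorType string) int {
	f.calls = append(f.calls, "rotate:"+actorID+"/"+actorType)
	return f.rotateCount
}

// runLogoutDemo exercises the happy path with a rotation count of 2.
func runLogoutDemo() ([]string, map[string]any, error) {
	f := &fakeMinter{rotateCount: 2}
	details, err := logout(context.Background(), f, &session{ID: "s1", ActorID: "u1", ActorType: "user"})
	return f.calls, details, err
}

func main() {
	calls, details, err := runLogoutDemo()
	fmt.Println(calls, details, err)
}
```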
shankar0123	dfdba5b260	test(gui): Vitest coverage for the 2026-05-10/11 GUI batch (Fix 12) Audit 2026-05-11 Fix 12 closure. The original GUI-batch commit `191384c` claimed 'npx tsc --noEmit PASS' but shipped no Vitest cases for the new surfaces, leaving the regression-prevention layer wide open. This closure backfills 35 cases across five files; the next refactor of KeysPage's assign modal that drops scope_type, or the AuthProvider demo-banner predicate that gets flipped to !authRequired, surfaces in CI instead of silently shipping. What's added: * web/src/pages/auth/UsersPage.test.tsx (NEW, 8 cases) — pins the MED-11 closure's UsersPage flow: active rows render the Active status pill, deactivated rows render dimmed with the Deactivated <timestamp> status, Deactivate button fires the API call after confirm() returns true and is a no-op on false, Reactivate button works inversely, provider filter narrows the underlying authListUsers call (undefined vs provider-id), empty list renders the placeholder, loading renders 'Loading users…'. * web/src/pages/auth/AuthSettingsPage.test.tsx (EXTENDED, +4 cases) — the pre-existing 2 cases only exercised identity + bootstrap status; the runtime-config panel (MED-12 closure) had no test. New cases cover: per-key row rendering, alphabetical sort (stable for log-scraping correlation), empty-value '(empty)' placeholder, 403 rejected query silently hides the panel (non-admins shouldn't see the shell). * web/src/pages/auth/KeysPage.test.tsx (EXTENDED, +8 cases) — the HIGH-10 GUI half added scope picker + scope_id input + expires_at datetime-local to the assign modal but the pre-existing test only asserted (actor, role). 
New cases pin the third opts arg shape: global hides scope_id input, profile/issuer scope reveal scope_id + mark required, trimmed scope_id round-trips into the body, global omits scope_id (undefined NOT empty string), empty expires_at omits the field, filled expires_at gets :00Z appended for RFC3339 promotion, whitespace-only scope_id fires the 'scope_id is required' typed error WITHOUT calling the API, actor-demo-anon row hides both assign and revoke affordances. * web/src/pages/auth/RoleDetailPage.test.tsx (NEW, 9 cases) — no test file pre-Fix 12. Pins the MED-8 scope picker for AddPermissionForm: global hides scope_id, profile reveals + gates the Add button until scope_id is filled, submit POSTs {permission, scope_type: profile, scope_id} with whitespace trimming, global submit omits scope keys entirely, issuer scope path, Add button stays disabled without a permission selection. Plus the LOW-11 default-role delete-button hide: r-admin renders the role-delete-disabled-tooltip + NO role-delete-button, r-auditor same, custom role renders the delete button. The DEFAULT_ROLE_IDS set tracking the migration-seeded role ids is the load-bearing client-side decision so a future drift between migrations and the GUI set surfaces here too. * web/src/components/AuthProvider.test.tsx (NEW, 5 cases) — the LOW-1 demo banner had no test for its visibility predicate. Pins all four authType branches (none → visible, api-key → hidden, oidc → hidden, loading → hidden to avoid flash) plus the rejected-getAuthInfo branch: the catch treats failure as an old-server-fallback to demo mode (no authType mutation, loading flips false), so the banner SHOWS — that's the actual behavior, and pinning it prevents a future change from silently hiding the banner when the /auth/info endpoint is unreachable. Spec deviations: Phase 6 (Layout.test.tsx users-nav) and Phase 7 (per-Fix tests for Fixes 03/05/07/09/10) live on those fixes' own branches — already authored there. 
Including them here would have produced merge conflicts. Verify gate: * tsc --noEmit — clean * vitest run touched files — 40/40 pass (8 + 6 + 12 + 9 + 5, including the 2 + 4 + 4 pre-existing cases in the extended AuthSettingsPage + KeysPage files) * full suite (162 tests across 15 files) green — no regression from the panel-mount-in-existing-page setup or the new mocked-module entries. Refs cowork/auth-bundles-fixes-2026-05-11/12-test-vitest-gui-coverage.md.	2026-05-11 12:18:08 +00:00
shankar0123	90c7b5813f	feat(gui/nav): UsersPage sidebar nav entry under Auth section (MED-11) Audit 2026-05-11 Fix 11 closure. The MED-11 closure shipped web/src/pages/auth/UsersPage.tsx and wired the /auth/users route in web/src/main.tsx, but the sidebar nav never gained a corresponding entry. Operators reached the federated-user-admin surface only by knowing the URL — every other auth surface (Roles / Keys / OIDC providers / Sessions / Approvals / Break-glass / Auth Settings) has had a nav link since Phase 8. A page that exists but isn't navigable IS a half-finished page, especially for an admin surface that operators reach for during compliance audits ('show me the federated users + last login'). 30 minutes closes the inconsistency. What this changes: * web/src/components/Layout.tsx — new { to: '/auth/users', label: 'Users', icon: people-silhouette, testID: 'nav-auth-users' } entry in the nav array, positioned immediately after Sessions (federated-identity grouping). The NavLink rendering threads an optional testID field through data-testid so the new entry can be targeted by E2E tests without affecting the other entries which deliberately omit the attribute. * Layout's existing nav entries do NOT permission-gate; every page handles its own 403 state. UsersPage already returns an ErrorState directing the user to auth.user.read for callers without the perm. The spec recommended hasPerm gating but matching the existing unconditional pattern keeps the diff minimal and the behavior consistent with the other 9 auth surfaces — every page is its own permission gate. 
Tests added in web/src/components/Layout.test.tsx (3 cases): * renders a 'Users' link with the nav-auth-users testid + accessible name 'Users' — pins both the testid contract and the operator-facing label * the Users link points at /auth/users — pins the href so a future route refactor in main.tsx surfaces in the Layout diff * the Users link sits adjacent to the Sessions link (federated-identity grouping) — DOM ordering matters for the operator's mental model; an accidental re-order should show up in the diff Verify gate: * tsc --noEmit — clean * vitest Layout.test.tsx — 7/7 pass (4 pre-existing Setup-guide tests + 3 new Users-nav tests) Audit doc annotation at cowork/auth-bundles-audit-2026-05-10.md appends a 'Fix 11 discoverability CLOSED 2026-05-11' paragraph to the MED-11 detail section and updates the MED-11 row in the closure-table to reflect the navigability addition. Refs cowork/auth-bundles-fixes-2026-05-11/11-med-users-sidebar-nav.md.	2026-05-11 12:05:08 +00:00
shankar0123	e92af14a22	feat(gui/oidc): JWKS health panel + Refresh-now button on OIDCProviderDetailPage (MED-7 GUI half) Audit 2026-05-11 Fix 10 closure. MED-7's backend endpoint GET /api/v1/auth/oidc/providers/{id}/jwks-status (commit `172b30b`) shipped the per-provider verifier counters on dev/auth-bundle-2 but the GUI never called it — authOIDCJWKSStatus in the API client was dead code. The audit doc had prematurely flipped the MED-7 row to CLOSED; this closure makes the claim true. Operator gap before this fix: operators investigating 'why is login failing for this IdP?' could not see last_refresh_at, rejected_jws_count, or last_error from the GUI. They had to drop to curl. New shared component web/src/pages/auth/OIDCJWKSStatusPanel.tsx queries the endpoint via TanStack Query and renders six dt/dd rows with operator-readable sentinels for each empty case: * Last refresh — RFC 3339 timestamp; '(never — cold cache)' sentinel when the IdP has never been hit. * Refresh count — cumulative since process boot. * Rejected JWS count — number of ID tokens that failed signature verification. Step-changes correlate to IdP key rotations. * Last error — most recent JWKS-refresh failure (sanitized — no token content). Red treatment when non-empty; '(none)' sentinel for healthy state. * RFC 9207 iss param — 'supported by IdP' / 'not advertised'. Informational only; the operator-side verifier still demands the param by default. * Current KIDs — cache contents; '(not exposed — query jwks_uri directly)' sentinel when the backend declines to expose the list (the backend may withhold them for opacity). Refresh-now button: * Calls POST /api/v1/auth/oidc/providers/{id}/refresh (RefreshKeys path), then invalidates the panel's query so the freshly-updated counters render without a page reload. * Refresh failures surface as an inline red rectangle and do NOT hide the existing snapshot — partial visibility is better than no visibility. * Hidden when the optional canRefresh prop is false. 
The OIDCProviderDetailPage mount wires canRefresh to useAuthMe().hasPerm('auth.oidc.edit') so viewer-class callers see the read-only panel. Permission gating: * The backend endpoint is gated auth.oidc.list. Callers without the permission get HTTP 403; the panel's TanStack query is configured with retry: 0 so a 403 doesn't drown the page in retries, and the panel returns null when the query errors — hiding silently for callers who can't see the data. * The Refresh-now button is hidden for callers without auth.oidc.edit. Read-only callers still see the panel + counters. Mount: OIDCProviderDetailPage.tsx between the read-only field display section and the Actions section. canRefresh wired to the canEdit boolean already computed at the page level. 9 Vitest tests in OIDCJWKSStatusPanel.test.tsx: * LoadingState — query in flight, Loading… visible. * HappyPath — all six dt/dd pairs visible with operator-readable values; current KIDs joined comma-separated. * 403 — authOIDCJWKSStatus errors, panel returns null, no DOM artifacts left behind. * RefreshNow — calls refreshOIDCProvider('op-okta'), invalidates the status query, the panel re-fetches and re-renders with the new refresh_count (mock returns different snapshots on the two calls). * RefreshNow surfaces refresh-failure inline without hiding the panel (preserves the existing snapshot so the operator can read pre-failure state). * NeverRefreshed — last_refresh_at='' renders the cold-cache sentinel rather than a blank cell. * CurrentKIDsEmpty — empty list renders the 'not exposed' sentinel rather than a blank cell. * LastError — non-empty last_error renders with red treatment. * CanRefreshFalse — panel + counters render; Refresh-now button is gone. 
Verify gate: * tsc --noEmit — clean * vitest OIDCJWKSStatusPanel.test.tsx — 9/9 pass * vitest OIDCProviderDetailPage.test.tsx — 19/19 pass (panel mount does not break existing tests because the unmocked authOIDCJWKSStatus call in those tests rejects, the panel returns null, and the rest of the page renders normally) Audit doc annotation at cowork/auth-bundles-audit-2026-05-10.md flips MED-7 from the premature CLOSED claim to a properly-staged 'Backend CLOSED 2026-05-10 + GUI half CLOSED 2026-05-11' annotation describing the panel + tests. Refs cowork/auth-bundles-fixes-2026-05-11/10-med-jwks-status-panel.md.	2026-05-11 11:57:38 +00:00
shankar0123	64ad8e525c	feat(gui/oidc): Test Connection panel on create + edit forms (MED-5 GUI half) Audit 2026-05-11 Fix 09 closure. MED-5's backend dry-run endpoint (POST /api/v1/auth/oidc/test, gated auth.oidc.create) shipped on dev/auth-bundle-2 (commit `b4b9879`) but the GUI never called it — authOIDCTestProvider in web/src/api/client.ts was dead code. Operator gap before this fix: complete the create form blind, save, then click 'Refresh' to discover whether the issuer URL worked. Discovery failures left a broken provider row in the DB that had to be deleted before retrying. The MED-5 backend exists to short-circuit this — surface the dry-run result before commit. New shared component web/src/pages/auth/OIDCTestConnectionPanel.tsx calls authOIDCTestProvider against the live form state (issuer URL + client ID + parsed scopes) and renders a four-row status panel inline: * ✓/✗ Discovery fetched (with issuer-echo from the well-known doc) * ✓/✗ JWKS reachable (with the discovered jwks_uri) * ✓/⚠ Supported algs (warning glyph when the IdP advertises none — distinct from a discovery failure) * ✓/· RFC 9207 iss-parameter advertised (informational · glyph rather than ✗ because the spec is SHOULD, not MUST) Backend per-leg errors[] flow into an inline bullet list. A top-level rectangle catches network/fetch failures separately. The Run button is disabled when the issuer URL is empty or whitespace-only. The component does NOT persist anything — safe to run repeatedly before the operator clicks Save. The panel is mounted in two places: * OIDCProvidersPage create modal (between the form fields and the Create button) — short-circuits the blind-save footgun for new provider configs. * OIDCProviderDetailPage edit form (between the field grid and the Save button) — load-bearing for verifying IdP rotations (Keycloak realm rename, Okta tenant move, certctl side-by-side hostname change) without committing first. 
A testIDSuffix prop (default 'create' / 'edit') gives each mount point a distinct data-testid namespace so both panels can coexist on a hypothetical page that uses both without DOM-id collisions. 8 Vitest tests in OIDCTestConnectionPanel.test.tsx: * RunButton — disabled until issuer URL is non-empty * RunButton — also disabled when issuer URL is whitespace-only * RunButton — enabled when issuer URL is non-empty * HappyPath — all four primary checks render green with detail rows for authorization_url / token_url / userinfo_endpoint (asserts both the glyph contract AND the mocked POST body shape) * FailurePath — discovery=false renders ✗ on discovery + ✗ on JWKS + ⚠ on empty supported algs + error list with backend per-leg messages * IssParamFalse — load-bearing UX claim that the iss-parameter row renders · (informational), not ✗; body must contain the word 'informational' so operators understand it's not a failure * FetchError — top-level error rectangle when the POST throws * TestIDSuffix — same component mounted twice with different suffixes renders both without DOM-id collision Verify gate: * tsc --noEmit — clean * vitest OIDCTestConnectionPanel.test.tsx — 8/8 pass * vitest OIDCProvidersPage.test.tsx + OIDCProviderDetailPage.test.tsx — 38/38 pass (panel-mount in both pages does not regress existing tests because they don't trigger the test button) Operator runbook: the four glyph meanings are documented inline on the panel's subtitle. Audit doc annotation at cowork/auth-bundles-audit-2026-05-10.md flips MED-5 from 'BACKEND CLOSED' to 'CLOSED' with the GUI-half annotation. Refs cowork/auth-bundles-fixes-2026-05-11/09-med-oidc-test-connection-button.md.	2026-05-11 11:52:26 +00:00
shankar0123	a923cf697c	harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8) Audit 2026-05-11 A-8 closure. Closes the deferred Phase 2 leg of the 2026-05-10 HIGH-12 closure (`2e97cc1`) — production-startup observability for actor-demo-anon residual grants + CI guard banning new synthetic-admin code paths. What this changes: * cmd/server/preflight_demo_residual.go (new) runs after the DB pool + audit service are constructed and before the HTTPS listener starts. Under any non-'none' auth type it queries actor_roles for the synthetic actor-demo-anon and emits a WARN log + a categorized audit row (auth.demo_residual_grants_detected) listing every grant present. Migration 000029 unconditionally seeds the ar-demo-anon-admin row at install time, so EVERY production deploy will see this WARN on first boot; the intended cutover workflow is cleanup-once at production handover. * CERTCTL_DEMO_MODE_RESIDUAL_STRICT (new env var on AuthConfig, default false) pivots the WARN to fail-closed startup refusal for operators who want a paranoid posture against re-seeding. * POST /api/v1/auth/demo-residual/cleanup (new handler at internal/api/handler/demo_residual.go) is an admin-class (auth.role.assign) endpoint that removes every actor-demo-anon row from actor_roles and returns {removed: int64}. Idempotent; refuses 503 under Auth.Type=none (deleting the row would break the demo path); audit-logs every invocation including no-op zero-removed calls so the admin's action is always recorded. * scripts/ci-guards/no-new-synthetic-admin.sh pins the 17-entry allowlist of source files that legitimately reference the actor-demo-anon literal. New runtime code paths that resolve to the synthetic actor (the same pattern that produced the original CRIT class) are rejected at PR time. CI workflow auto-picks the script via the existing scripts/ci-guards/*.sh loop in .github/workflows/ci.yml; no workflow edit needed. 
Regression matrix: * cmd/server/preflight_demo_residual_test.go — 7 tests covering the 4 main behaviour branches (testcontainers-backed, testing.Short()-skipped: DemoModeActive_Skips, NoResidue_Passes, HasResidue_LogsAndAudits, StrictMode_RefusesStartup, DeleteDemoAnonResidue_Idempotent) plus 3 pure-Go stdlib unit tests for the row-string formatter + nil-safety contracts on both helpers. * internal/api/handler/demo_residual_test.go — 7 stdlib+httptest cases: HappyPath, Idempotent_ReturnsZero, RejectsInDemoMode (503), CleanupError_Surfaces500, NilCleanupFn (defensive 500), NilAuditWriter_DoesNotPanic, MissingActorContext (falls back to 'unknown' actor in the audit row). * internal/api/router/openapi_parity_test.go — new POST /api/v1/auth/demo-residual/cleanup entry plus 6 pre-existing pre-A-8 entries (oidc/test, jwks-status, users CRUD, runtime-config) that had drifted out of SpecParityExceptions; the parity test was red on dev/auth-bundle-2 before my work; this commit returns it to green with full per-entry justifications + parity-debt notes. Docs: * docs/operator/security.md — new 'Demo-to-production cutover (Audit 2026-05-11 A-8)' section explaining the WARN message, the cleanup curl one-liner, the equivalent SQL, the strict-mode env var, and the CI guard. * docs/operator/rbac.md — Last-reviewed bump + pointer to the new env var + the security.md section. * cowork/auth-bundles-audit-2026-05-10.md — HIGH-12 row gains an 'A-8 follow-on CLOSED 2026-05-11' annotation describing the deferred Phase 2 leg now landed. * CHANGELOG.md — Unreleased ### Security entry summarizing the four legs (detector + cleanup + strict-mode flag + CI guard) and the acquisition-readiness narrative this closes. Operator-facing impact: this closes a credibility gap, not an exploitable vulnerability. The residue requires a regression elsewhere in the middleware chain to be exploitable. After this fix, the canonical narrative ('RBAC primitive with no synthetic-admin fallback') is fully true. 
Refs cowork/auth-bundles-fixes-2026-05-11/08-high-demo-mode-residual-cleanup.md.	2026-05-11 11:45:54 +00:00
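The startup decision described in the commit above (skip in demo mode, WARN on residue, refuse to start under the strict flag) can be condensed into a small sketch. `checkDemoResidue` is a hypothetical distillation, not the code in preflight_demo_residual.go. The database query and the audit row are left out, and the grant list is passed in directly.

```go
package main

import (
	"fmt"
	"log"
)

// checkDemoResidue is a hypothetical distillation of the preflight decision.
// residualGrants stands in for the actor_roles rows found for actor-demo-anon;
// strict stands in for the CERTCTL_DEMO_MODE_RESIDUAL_STRICT setting.
func checkDemoResidue(authType string, residualGrants []string, strict bool) error {
	if authType == "none" {
		// Demo mode is active: the synthetic actor's grant is expected.
		return nil
	}
	if len(residualGrants) == 0 {
		return nil
	}
	if strict {
		// Fail closed: refuse startup until the residue is cleaned up.
		return fmt.Errorf("refusing to start: %d residual actor-demo-anon grant(s): %v",
			len(residualGrants), residualGrants)
	}
	// Default posture: warn and continue (the audit row is omitted here).
	log.Printf("WARN: residual actor-demo-anon grants detected: %v", residualGrants)
	return nil
}

func main() {
	grants := []string{"ar-demo-anon-admin"}
	fmt.Println(checkDemoResidue("none", grants, true))  // demo mode: skipped
	fmt.Println(checkDemoResidue("oidc", grants, false)) // WARN, startup continues
	fmt.Println(checkDemoResidue("oidc", grants, true))  // strict: startup refused
}
```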
shankar0123	b8fac59200	chore(fmt): gofmt cleanup on files touched by audit-2026-05-11 fix bundle Whitespace alignment drift surfaced by gofmt -l after merging 7 fix branches. Pure formatting, no semantic change. Pre-existing master drift in internal/auth/oidc/{domain/types.go, integration_keycloak_rotate_test.go, test_discovery.go} left untouched — that's separate tech debt.	2026-05-11 11:29:48 +00:00
shankar0123	ad69158405	Merge Fix 07 (HIGH A-7): editable Advanced form on OIDCProviderDetailPage (MED-4) # Conflicts: # CHANGELOG.md # web/src/pages/auth/OIDCProviderDetailPage.test.tsx # web/src/pages/auth/OIDCProviderDetailPage.tsx	2026-05-11 11:27:43 +00:00
shankar0123	11b145b641	Merge Fix 06 (HIGH A-6): strict UA/IP binding — close request-empty bypass in MED-16 # Conflicts: # CHANGELOG.md # internal/api/handler/auth_session_oidc.go # internal/api/handler/auth_session_oidc_test.go	2026-05-11 11:19:04 +00:00
shankar0123	4e31568d3d	Merge Fix 05 (HIGH A-5): approval payload preview with profile-edit diff + cert-issuance preview # Conflicts: # CHANGELOG.md	2026-05-11 11:17:14 +00:00
shankar0123	68af18d081	Merge Fix 04 (HIGH A-4): scope-aware ActorRole revoke	2026-05-11 11:16:24 +00:00
shankar0123	df53b80cb6	Merge Fix 03 (CRIT A-3): expose AllowedEmailDomains on create + edit forms	2026-05-11 11:16:16 +00:00
shankar0123	11a1f0babd	Merge Fix 02 (CRIT A-2): close MED-11 lying field — DeactivatedAt loaded + enforced on login	2026-05-11 11:16:07 +00:00
shankar0123	027a5a1468	Merge Fix 01 (CRIT A-1): close HIGH-10 lying field — EffectivePermissions reads actor-role scope	2026-05-11 11:16:00 +00:00


895 Commits