certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 23:31:39 +00:00

Author	SHA1	Message	Date
shankar0123	ff3f1cd864	harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8) Audit 2026-05-11 A-8 closure. Closes the deferred Phase 2 leg of the 2026-05-10 HIGH-12 closure (`b81588e`) — production-startup observability for actor-demo-anon residual grants + CI guard banning new synthetic- admin code paths. What this changes: * cmd/server/preflight_demo_residual.go (new) runs after the DB pool + audit service are constructed and before the HTTPS listener starts. Under any non-'none' auth type it queries actor_roles for the synthetic actor-demo-anon and emits a WARN log + a categorized audit row (auth.demo_residual_grants_detected) listing every grant present. Migration 000029 unconditionally seeds the ar-demo-anon-admin row at install time, so EVERY production deploy will see this WARN on first boot; the intended cutover workflow is cleanup-once at production handover. * CERTCTL_DEMO_MODE_RESIDUAL_STRICT (new env var on AuthConfig, default false) pivots the WARN to fail-closed startup refusal for operators who want a paranoid posture against re-seeding. * POST /api/v1/auth/demo-residual/cleanup (new handler at internal/api/handler/demo_residual.go) is an admin-class (auth.role.assign) endpoint that removes every actor-demo-anon row from actor_roles and returns {removed: int64}. Idempotent; refuses 503 under Auth.Type=none (deleting the row would break the demo path); audit-logs every invocation including no-op zero-removed calls so the admin's action is always recorded. * scripts/ci-guards/no-new-synthetic-admin.sh pins the 17-entry allowlist of source files that legitimately reference the actor-demo-anon literal. New runtime code paths that resolve to the synthetic actor (the same pattern that produced the original CRIT class) are rejected at PR time. CI workflow auto-picks the script via the existing scripts/ci-guards/.sh loop in .github/workflows/ ci.yml; no workflow edit needed. Regression matrix: cmd/server/preflight_demo_residual_test.go — 7 tests covering the 4 main behaviour branches (testcontainers-backed, testing.Short()- skipped: DemoModeActive_Skips, NoResidue_Passes, HasResidue_LogsAnd Audits, StrictMode_RefusesStartup, DeleteDemoAnonResidue_Idempotent) plus 3 pure-Go stdlib unit tests for the row-string formatter + nil-safety contracts on both helpers. * internal/api/handler/demo_residual_test.go — 7 stdlib+httptest cases: HappyPath, Idempotent_ReturnsZero, RejectsInDemoMode (503), CleanupError_Surfaces500, NilCleanupFn (defensive 500), NilAuditWriter_DoesNotPanic, MissingActorContext (falls back to 'unknown' actor in the audit row). * internal/api/router/openapi_parity_test.go — new POST /api/v1/auth/demo-residual/cleanup entry plus 6 pre-existing pre-A-8 entries (oidc/test, jwks-status, users CRUD, runtime-config) that had drifted out of SpecParityExceptions; the parity test was red on dev/auth-bundle-2 before my work; this commit returns it to green with full per-entry justifications + parity-debt notes. Docs: * docs/operator/security.md — new 'Demo-to-production cutover (Audit 2026-05-11 A-8)' section explaining the WARN message, the cleanup curl one-liner, the equivalent SQL, the strict-mode env var, and the CI guard. * docs/operator/rbac.md — Last-reviewed bump + pointer to the new env var + the security.md section. * cowork/auth-bundles-audit-2026-05-10.md — HIGH-12 row gains an 'A-8 follow-on CLOSED 2026-05-11' annotation describing the deferred Phase 2 leg now landed. * CHANGELOG.md — Unreleased ### Security entry summarizing the four legs (detector + cleanup + strict-mode flag + CI guard) and the acquisition-readiness narrative this closes. Operator-facing impact: this closes a credibility gap, not an exploitable vulnerability. The residue requires a regression elsewhere in the middleware chain to be exploitable. After this fix, the canonical narrative ('RBAC primitive with no synthetic- admin fallback') is fully true. Refs cowork/auth-bundles-fixes-2026-05-11/08-high-demo-mode-residual- cleanup.md.	2026-05-11 11:45:54 +00:00
shankar0123	ddad647ee7	fix(auth/rbac): scope-aware ActorRole revoke (A-4) HIGH-10's UNIQUE (actor, role, scope_type, scope_id, tenant) uniqueness extension lets an operator grant the same role to the same actor at multiple scopes (e.g. r-operator on profile=p-acme AND profile=p-globex). But ActorRoleRepository.Revoke's WHERE clause omitted (scope_type, scope_id) — a single call deleted every variant. Selective revoke was unrepresentable; operators had to drop all and re-grant N-1, opening a race window where the actor's access was briefly different. Closure across all layers (handler → service → repo → MCP → GUI client), preserving the legacy "revoke all variants" contract for unmodified callers: internal/repository/auth.go - New ActorRoleRevokeOptions struct. Zero value = legacy semantic; non-empty ScopeType narrows to one variant. - New ErrActorRoleNotFound sentinel for scoped no-match (HTTP 404). internal/repository/postgres/auth.go - Revoke signature extended with opts. Empty opts.ScopeType uses the legacy SQL (no scope WHERE), zero-row delete = no error. - Non-empty narrows with `scope_type = $5 AND scope_id IS NOT DISTINCT FROM $6` — the IS-NOT-DISTINCT-FROM is load-bearing, vanilla `=` would silently miss the (global, NULL) case because NULL ≠ NULL in standard SQL. - Selective revoke with zero matching rows returns ErrActorRoleNotFound; operators get feedback on typos. internal/service/auth/actor_role_service.go - Revoke takes opts. Audit row's details map records the scope so SIEMs can distinguish wide-vs-selective revokes: `scope: "all_variants"` for the legacy path, or `scope_type` + `scope_id` for selective. Privilege check (auth.role.assign) and reserved-actor guard unchanged. internal/api/handler/auth.go - RevokeRoleFromKey parses optional `?scope_type=` / `?scope_id=` query params via new parseRevokeScope helper. - Validation mirrors AssignRoleToKey: scope_id forbidden with scope_type=global, required with profile/issuer, invalid scope_type → 400. scope_id without scope_type also → 400. - writeAuthError maps ErrActorRoleNotFound to 404. internal/mcp/tools_auth.go + types.go - AuthRevokeKeyRoleInput gains optional ScopeType + ScopeID with jsonschema descriptions explaining the dual-mode contract. - Tool call site appends URL-encoded query params when ScopeType is set; legacy callers (no scope_type) emit the bare DELETE path unchanged. web/src/api/client.ts - authRevokeKeyRole signature: optional 3rd argument `{ scope_type?, scope_id? }`. Pre-A-4 call sites (no opts arg) keep firing the bare DELETE — fully backward compatible. The GUI KeysPage's per-row revoke button (still one row per role, pre-Fix-12) continues to use the legacy shape; future GUI work can pass scope params for per-variant rows. docs/operator/rbac.md - New "Revoke: legacy 'all variants' vs scope-selective" subsection under "From the HTTP API" with curl examples for both modes plus the audit-row payload shape that lets SOC/SIEM tell them apart. Regression coverage: Repository (testcontainers, skipped under -short — 6 tests in internal/repository/postgres/auth_revoke_scope_test.go): TestRevokeActorRole_NoOpts_RemovesAllVariants TestRevokeActorRole_WithScope_RemovesOnlyMatching TestRevokeActorRole_WithGlobalScope_RemovesOnlyGlobal — pins the IS-NOT-DISTINCT-FROM branch (global, NULL) TestRevokeActorRole_NoMatch_ReturnsNotFound — pins the new sentinel TestRevokeActorRole_NoOpts_NoMatch_IsNoOp — pins the legacy idempotence contract TestRevokeActorRole_IssuerScope_RemovesOnlyMatching — pin the issuer-scope half (profile + issuer are symmetric scope types) Handler (7 new tests in auth_test.go): TestAuthHandler_RevokeRoleFromKey — extended to assert no scope filter is forwarded when query string is empty (legacy behaviour) TestAuthHandler_RevokeRoleFromKey_A4_ScopedProfile TestAuthHandler_RevokeRoleFromKey_A4_ScopedGlobal TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithGlobal TestAuthHandler_RevokeRoleFromKey_A4_RejectsMissingScopeID TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithoutScopeType TestAuthHandler_RevokeRoleFromKey_A4_RejectsInvalidScopeType TestAuthHandler_RevokeRoleFromKey_A4_ScopedNotFoundReturns404 MCP (2 new table rows in tools_per_tool_test.go): Scoped revoke with scope_type=profile + scope_id=p-acme → `?scope_type=profile&scope_id=p-acme` Scoped revoke with scope_type=global (no scope_id) → `?scope_type=global` Service-layer test plumbing (service_test.go) updated for new opts arg: 4 existing call sites pass repository.ActorRoleRevokeOptions{} to keep their pre-A-4 semantics; the fakeActorRoleRepo.Revoke implementation now mirrors the postgres scope-aware behaviour (legacy zero-value vs scoped narrowing + ErrActorRoleNotFound on no-match). Verify gate green: gofmt clean, go vet clean, go test -short across repository/postgres, service/auth, api/handler, and mcp. The pre-existing KeysPage.test.tsx failure observed on the baseline commit (reproduced via `git stash` earlier in Fix 03) is unrelated; my client.ts change adds an optional third argument and is fully backward-compatible. Spec at cowork/auth-bundles-fixes-2026-05-11/04-high-actor-role-revoke-scope.md. Audit doc updated: new row A-4 (2026-05-11) CLOSED appended to the status table at the bottom of cowork/auth-bundles-audit-2026-05-10.md. Operator-visible advisory in CHANGELOG.md v2.1.0 release notes under Security (non-BREAKING — legacy callers are unchanged). Depends on Fix 01 (the scope-aware EffectivePermissions read path on branch fix/audit-2026-05-11/crit-actor-role-scope-reads). This fix makes the inverse op selectively reversible; without Fix 01 the read side would mis-evaluate scoped grants anyway, making selective revoke moot at runtime.	2026-05-11 10:50:34 +00:00
shankar0123	77860fbcc3	harden(auth): LOW + Nit batch — bootstrap audit, crypto/rand, XFF trust, CSRF check, protocol-prefix unify (Batch 1) Audit 2026-05-10 — close 8 LOWs + 2 Nits in-bundle. Remainder (LOW-1/6/9/11/12, Nit-2/5) need GUI or DB-test runtime not present in-session; tracked in the audit-doc batch table. LOW-2: bootstrap.ValidateAndMint now emits 'bootstrap.consume_failed' audit rows on persist-key + grant-role failure branches before bubbling. Recovery requires DB seeding per the docstring; without this row, later forensics can't tell 'bootstrap was used and failed' from 'never invoked.' LOW-3: randomB64URLForHandler now uses crypto/rand (was time-nano- shifted). Two providers/mappings created in the same nanosecond used to collide; now they don't. Time-nano fallback retained for the unlikely crypto/rand-broken path. LOW-4: breakglass.verifyDummy uses s.readRand(salt) for the dummy Argon2id verify. Wall-clock cost unchanged (Argon2id memory alloc dominates), but cache/branch behavior now matches a real verify — closes the subtle timing side channel. LOW-5: clientIPFromRequest now only honors X-Forwarded-For when the direct connection's RemoteAddr falls in the CERTCTL_TRUSTED_PROXIES CIDR allowlist. Default-deny: empty list means XFF is ignored. SetTrustedProxies wired in cmd/server/main.go from cfg.Auth.TrustedProxies. LOW-7: internal/auth/protocol_endpoints.go::ProtocolEndpointPrefixes now carries /scep-mtls + /.well-known/est-mtls (previously only in router.AuthExemptDispatchPrefixes; the two lists had drifted). The canonical-prefix coverage test in Phase 12 still pins the set. LOW-8: docs/operator/rbac.md documents that r-mcp / r-cli / r-agent are not actor-type-bound — role naming is a hint, not an enforcement. Operators wanting hard binding must apply periodic audit queries. Native binding is on the v2 roadmap. LOW-10: Session.Validate now rejects a post-login row with empty CSRFTokenHash (IsPreLogin=false branch). validSession test fixture updated with a valid 64-hex CSRF hash. Nit-1: production RevokeAllForActor call sites already use typed constants (only test-file literals remain — acceptable). Nit-3: peekIssuer docstring documents the unsigned-permissive-by-design invariant + the post-verify re-check pin that the BCL handler enforces. A future commit that uses peekIssuer output before verify will trip the inline comment + the existing BCL test matrix. Status table updated in cowork/auth-bundles-audit-2026-05-10.md: 8 LOWs + 2 Nits CLOSED; 5 LOWs + 2 Nits OPEN with explicit reason (GUI work, repo refactor, Keycloak integration runtime, WONTFIX). Refs: cowork/auth-bundles-audit-2026-05-10.md LOW-2/3/4/5/7/8/10 cowork/auth-bundles-audit-2026-05-10.md Nit-1/3	2026-05-10 22:26:12 +00:00
shankar0123	457962f21a	fix(auth): apply rbacGate to every state-changing + read handler (CRIT-1 closure) Closes the wire-layer authorization gap surfaced by the 2026-05-10 audit (CRIT-1). Before this commit only ~24 of ~140 routes carried rbacGate enforcement — all of them admin-only fine-grained perms (auth.session., auth.oidc., auth.breakglass.admin, cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage). Every catalogued legacy-CRUD perm (cert.read/issue/revoke/delete, profile.edit/delete, issuer.edit/delete, target., agent., plus role-mgmt verbs) was declared in internal/domain/auth/validate.go but never wired at the router. A r-viewer Bearer was essentially r-admin minus five verbs at the wire layer (CWE-862). This commit: - Adds rbacGateScoped(checker, perm, scopeType, scopeFn, h) helper to internal/api/router/router.go for path-bound scope resolution. Per-profile and per-issuer grants (Decision 2) now reach the wire layer. - Wraps every state-changing route AND every read endpoint in router.go with rbacGate (global) or rbacGateScoped (path-bound). The auth-management routes (POST /api/v1/auth/roles, etc.) gain router-level enforcement in addition to the existing service-layer Authorizer check — defense in depth (HIGH-9 of the same audit collapses into this closure). - Auth-exempt surfaces stay un-gated by design: login, callback, BCL, logout, breakglass-login, bootstrap, health, auth-info, version. Allowlist is documented in TestRouterRBACGateCoverage. - Extends internal/domain/auth/validate.go CanonicalPermissions with 30 new perms across 12 namespaces: cert.edit; job.read, job.cancel; approval.read, approval.approve, approval.reject; policy.read/edit/delete; team.read/edit/delete; owner.read/edit/delete; notification.read/edit; discovery.read/run/claim; network_scan.read/edit/run; healthcheck.read/edit/delete/acknowledge; digest.read, digest.send; verification.read, verification.run; stats.read; metrics.read. - Updates DefaultRoles for r-admin / r-operator / r-viewer / r-mcp / r-cli / r-agent. r-auditor gets NOTHING new — the auditor pin (TestAuditorRoleHoldsExactlyAuditReadAndExport) stays invariant. - Migration 000039_audit_crit1_perms seeds the new perm rows + role grants per the updated DefaultRoles map. Idempotent ON CONFLICT DO NOTHING. Reverse migration removes role_permissions before permissions (ON DELETE RESTRICT on the FK). - AST-level CI guard TestRouterRBACGateCoverage in internal/api/router/router_rbac_coverage_test.go walks router.go and asserts every state-changing + read route is wrapped (or in the documented allowlist). Adding a new ungated route fails CI. - Updates docs/operator/rbac.md permission-catalogue table with the new namespaces + footer link to the AST CI guard. - Updates certctl/CHANGELOG.md v2.1.0 section with the closure narrative. Audit doc cowork/auth-bundles-audit-2026-05-10.md CRIT-1 row annotated CLOSED 2026-05-10. Bundle's exit-gate spec lives at cowork/auth-bundles-fixes-2026-05-10/01-crit-1-rbac-gates.md. CRIT-2 / CRIT-3 / CRIT-4 / CRIT-5 of the same audit remain open and continue to block the v2.1.0 tag. Verification gate green: - gofmt -d (no diff after gofmt -w on the touched files) - go vet ./... - go test -short -count=1 ./... (all packages pass including auditor pin) - go build ./... HIGH-9 of the audit closes via this commit's router-layer rbacGate on POST /api/v1/auth/keys/{id}/roles + DELETE /api/v1/auth/keys/{id}/roles/{role_id} (defense-in-depth on top of the existing service-layer privilege check). Refs: cowork/auth-bundles-audit-2026-05-10.md CRIT-1 HIGH-9	2026-05-10 19:58:26 +00:00
shankar0123	a581e2d222	auth-bundle-2 Phase 16: docs updates (security.md OIDC + sessions + break-glass + auditor split sections; new migration/oidc-enable.md; CHANGELOG.md v2.1.0 Bundle 2 release notes) Closes Phase 16 of cowork/auth-bundle-2-prompt.md. Three operator- facing docs updated, one new migration guide ships, README nav row added. Files ===== docs/operator/security.md (MODIFIED, Last reviewed bumped to 2026-05-10): * Added 5 new Bundle 2 subsections under '## Authentication surface' after the Bundle 1 approval-bypass-closure entry: - 'OIDC federation (Bundle 2 Phases 1-7)' — alg allow-list, IdP-downgrade defense, iss/aud/azp/at_hash, single-use state+nonce, PKCE-S256 mandatory, JWKS rotation handling, encrypted client_secret at rest with the v3 blob format pinned by an integration test, pointer to oidc-runbooks/ for per-IdP setup. - 'Sessions + back-channel logout (Bundle 2 Phases 4-6)' — length-prefixed HMAC cookie wire format, HttpOnly + Secure + SameSite cookie hardening, idle/absolute timeouts, CSRF defense, signing-key rotation primitive, fail-fatal EnsureInitialSigningKey at server boot, OpenID Connect Back-Channel Logout 1.0 (NOT RFC 8414). - 'OIDC first-admin bootstrap (Bundle 2 Phase 7)' — coexists with Bundle 1's env-var-token bootstrap, group-scoped via CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID, one-shot per tenant. - 'Break-glass admin (Bundle 2 Phase 7.5)' — default-OFF, surface invisibility via 404-not-403, Argon2id with OWASP 2024 params, lockout state machine, constant-time-via- verifyDummy, WARN log at boot, runbook pointer for operator drill. - 'Migrating an existing deployment to OIDC' — pointer to the new migration/oidc-enable.md walkthrough. docs/migration/oidc-enable.md (NEW, Last reviewed 2026-05-10): * Step-by-step migration guide for an operator on a Bundle-1-merged deployment to enable OIDC SSO. Pre-reqs (CERTCTL_CONFIG_ENCRYPTION_KEY, admin actor with auth.oidc.create + auth.oidc.edit, IdP tenant) + 7 numbered steps (pin encryption key, complete IdP-side per runbook, configure certctl-side OIDCProvider, add group→role mappings with fail-closed warning, optional first-admin bootstrap, verify with single test user, announce SSO endpoint). * Rollback section covering the 4-step disable flow + the 409 Conflict on provider-delete-while-sessions-exist + the existing-sessions-keep-working-until-expiry semantics. * Troubleshooting section pinning 8 most-common failure modes (discovery doc fetch fails / IdP downgrade defense rejects / no roles assigned / iss mismatch / pre-login expired / state mismatch / sessions revoked but user can hit API / JWKS rotation breaks login). * Database row count drift documented so operators know what to expect after OIDC is live (10 Bundle 2 tables enumerated). * Cross-references to oidc-runbooks/ + security.md + auth-threat-model.md + auth-benchmarks.md + auth-standards-implemented.md. CHANGELOG.md (MODIFIED): * v2.1.0 section title bumped from 'Auth Bundle 1: RBAC primitive' to 'Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions'. * Replaced the Bundle 1 closing-bullet ('Bundle 2 starts after Bundle 1 lands on master') with 18 new Bundle 2 entries: - OIDC + sessions + back-channel logout + break-glass overview. - OIDC token validation pinned at three layers (alg allow-list, IdP-downgrade defense, OIDC Core §3.1.3.7 re-verification). - Length-prefixed HMAC session cookies. - CSRF double-submit + hashed-token-on-row. - OIDC client_secret AES-256-GCM v3 blob at rest + integration-test invariant. - OIDC first-admin bootstrap. - Default-OFF break-glass admin (Argon2id + lockout + constant-time + surface invisibility). - GUI: 4 new pages + login-page IdP buttons + sidebar logout. - 11 new MCP tools for OIDC + session management. - 6 per-IdP runbooks (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace). - Threat model extended with 5 new defense subsections + 8 new threat-catalogue subsections. - Performance baselines documented (4 benchmarks; 3 measured + 1 operator-runs). - Standards-and-RFC implementation table (13 RFCs + 14 CWEs; NOT a compliance-mapping doc). - Coverage gates held at floor 90 across all 4 Bundle 2 packages (anti-Bundle-1-mistake invariant). - Multi-tenant query CI guard (ratchet baseline 32). - Phase 10 Keycloak testcontainers integration test + optional Okta smoke test. - OpenAPI cookieAuth security scheme + 13 new endpoints + 4 break-glass endpoints. - Bundle-1-only compat regression CI guard + Bundle-1-to-2-upgrade regression CI guard. * Final paragraph updated to point at oidc-enable.md alongside api-keys-to-rbac.md as the two migration walkthroughs. docs/README.md (MODIFIED): * Added the new oidc-enable.md migration row under '## Migration' alongside the existing api-keys-to-rbac.md entry, with a one-line description flagging it as the Bundle 2 OIDC onboarding walkthrough. Verification ============ * Last-reviewed on security.md + oidc-enable.md: 2026-05-10. * Internal-link sweep on oidc-enable.md: 0 broken (every relative link resolves via shell-loop verification). * Internal-link sweep on docs/README.md: 0 broken (all .md references resolve). * No Go-side impact, make verify gate unchanged. Bundle 2 documentation deliverables now complete: security.md + auth-threat-model.md + oidc-runbooks/ + auth-benchmarks.md + auth-standards-implemented.md + api-keys-to-rbac.md + oidc-enable.md + CHANGELOG.md v2.1.0. The full Bundle 2 surface is operator- discoverable from docs/README.md root nav.	2026-05-10 17:07:27 +00:00
shankar0123	263dee4264	auth-bundle-2 Phase 14: session + OIDC validation benchmarks (steady-state + cold paths) + auth-benchmarks.md operator doc + Makefile targets Closes Phase 14 of cowork/auth-bundle-2-prompt.md. Ships four benchmarks producing four numbers + the operator-doc table; three default-tag benchmarks runnable on every CI runner, the fourth (cold-cache OIDC) runnable on operator-side Docker hosts via the new make target. Files ===== internal/auth/session/bench_test.go (NEW): * BenchmarkSession_SteadyState (target p99 < 1ms; measured 5µs). Warm in-memory repo + warm session row. Pure CPU: parseCookie + HMAC verify + map lookup + sentinel checks. * BenchmarkSession_ColdProcess (target p99 < 10ms; measured 7.1ms). Same pipeline but with a configurable per-call delay simulating a 1ms Postgres RTT on each repo call. Two repo calls per Validate (signing-key fetch + session-row fetch) = 2ms minimum; Go time.Sleep granularity adds ~1-2ms jitter. Documented why testcontainers Postgres isn't viable inside b.N: 30+ second container boot incompatible with per-iteration timing. * slowSessionRepo + slowKeyRepo wrappers add the per-call delay via time.Sleep; they delegate to the existing in-memory stubs. * reportPercentiles helper sorts + reports p50/p95/p99/max via b.ReportMetric (Go testing.B doesn't surface percentiles natively). internal/auth/oidc/bench_test.go (NEW): * BenchmarkOIDC_SteadyState (target p99 < 5ms; measured 1.5ms). Drives full HandleCallback against an in-process mockIdP (httptest.Server localhost loopback). Pre-warmed JWKS cache via RefreshKeys at setup. Pipeline: pre-login consume + state compare + token exchange (localhost ~50-200µs) + go-oidc Verify (RSA-2048 sig verify + alg pin) + service-layer iss/ aud/azp/at_hash/exp/iat/nonce re-checks + group-claim resolution + group→role mapping + user upsert + session mint. * The localhost-loopback /token call adds ~100-500µs of TCP overhead vs pure crypto; the prompt's "no network calls" steady-state framing accommodates this since the localhost loopback is the closest practical proxy for a same-region IdP /token call (which adds 5-15ms in production). internal/auth/oidc/bench_keycloak_test.go (NEW, //go:build integration): * BenchmarkOIDC_ColdCache (target p99 < 200ms; operator-runs). Drives RefreshKeys against a live Keycloak container from the Phase 10 testfixtures harness. Each iteration evicts the in-process cache + re-fetches discovery + re-fetches JWKS over real HTTP + re-runs the IdP-downgrade-attack defense. * Network-bounded: the cold path is dominated by HTTPS RTT to the IdP discovery endpoint, NOT crypto. The 200ms cap accommodates a geographically-distant IdP (~150ms RTT) plus the in-process JWKS fetch + downgrade-defense logic (~5ms locally). * Reuses the sharedKeycloak fixture from integration_keycloak_test.go (Phase 10) so the benchmark doesn't pay the 60-90s container boot cost separately. Skips with a clear message if invoked without the integration test setup. * Reports p50/p95/p99/max in MILLISECONDS (vs the microsecond-granularity steady-state benchmarks) since the cold path is two orders of magnitude slower. internal/auth/oidc/service_test.go (MODIFIED): * Refactored newMockIdP(t testing.T) to delegate to a new newMockIdPWithTB(t testing.TB) sibling. Standard Go pattern for sharing test fixtures between testing.T and testing.B. No behavior change for existing service_test.go tests; the benchmark file in bench_test.go calls newMockIdPWithTB(b) to get the same fixture. docs/operator/auth-benchmarks.md (NEW): Result table with all four benchmarks + targets + measured numbers + status markers. Four-row matrix for the default-tag benchmarks; the fourth row (cold-cache) is operator-recorded with an empty cell waiting for the first Docker-equipped run. * Hardware floor section pinning the 4 vCPU / 8 GiB RAM / Postgres 16 / Go 1.25 baseline. GitHub-hosted Ubuntu runners satisfy this; operators on weaker hardware re-record. * "What each benchmark covers (and what it doesn't)" section per benchmark, distinguishing the warm steady-state pipeline from the cold path's network-bounded budget. * "Cold-cache OIDC: how to run" subsection documenting the make target + the test+benchmark coupling needed to populate sharedKeycloak. Operator-recorded baseline table seeded empty for first runs. * "Why the cold path is bounded by network latency, not crypto" section explaining the budget breakdown: - TCP handshake (1 RTT) - TLS 1.3 handshake (1-2 RTTs) - 2 HTTPS GETs (discovery + JWKS, 1 RTT each) - In-process crypto on the certctl side (~5-10ms total) So the 200ms cap is operator-checkable: real measurement > 200ms means the IdP is slow OR network congestion OR DNS issues — the diagnosis is upstream of certctl. Real measurement < 200ms means the IdP is on a fast same-region link. * Methodology section pinning the per-iteration timing capture + sort + percentile-extract approach. * Pre-merge audit section for the Phase 14 exit gate: four benchmarks ran, four numbers recorded, steady-state targets met, cold path is operator-runnable + measurably-bounded. Makefile (MODIFIED): * Added `make benchmark-auth` (default-tag, runs three of four benchmarks at 2000 samples each). * Added `make benchmark-auth-coldcache` (integration-tagged, runs OIDC cold-cache against live Keycloak; requires Docker). * Both targets carry explanatory comment blocks. docs/README.md (MODIFIED): * Added the auth-benchmarks.md doc to the Operator nav table alongside performance-baselines.md. Measured baselines at Phase 14 close (linux/arm64, 4 vCPU) ========================================================== BenchmarkSession_SteadyState p99 = 5µs (target < 1ms) ✓ 200× under BenchmarkSession_ColdProcess p99 = 7.1ms (target < 10ms) ✓ BenchmarkOIDC_SteadyState p99 = 1.5ms (target < 5ms) ✓ 3× under BenchmarkOIDC_ColdCache operator-runs (Docker required) Verification ============ * gofmt -l on three new bench files: clean. * go vet ./internal/auth/session/... ./internal/auth/oidc/...: clean (default tag). * go vet -tags integration ./internal/auth/oidc/...: clean (integration tag covers the bench_keycloak_test.go file). * go test -short -count=1 across all 5 OIDC + session packages: green; the bench__test.go files compile but don't run under -short (testing.Short() guards + benchmarks are not selected by -run pattern). All three runnable benchmarks executed and produce the numbers above; recorded in auth-benchmarks.md.	2026-05-10 16:51:28 +00:00
shankar0123	944ce8e710	auth-bundle-2 Phase 12: extend auth-threat-model.md with Bundle 2 sections (OIDC + sessions + back-channel logout + OIDC first-admin + break-glass + 8 Bundle 2 threat sub-sections) Closes Phase 12 of cowork/auth-bundle-2-prompt.md. The single canonical operator-facing threat model (one doc per topic per the docs convention) now covers both Bundle 1 (RBAC) AND Bundle 2 (OIDC + sessions + back-channel logout + OIDC first-admin + break-glass) in one place. File: docs/operator/auth-threat-model.md (MODIFIED, +485 LOC) Conventions held ================ * The Bundle 1 sections ("Threat actors", "Defenses Bundle 1 ships", "Threats Bundle 1 does NOT close", "Compliance mapping", "Operator-facing checks", "Cross-references") stay structurally intact. Bundle 2 EXTENDS them; nothing is rewritten in place. * `Last reviewed:` header bumped 2026-05-09 → 2026-05-10. * Per the prompt's explicit instruction: "do NOT create a separate auth-threat-model-bundle-2.md companion." This commit is a single-file extension. Changes ======= Intro paragraph rewritten: * From "Bundle 1 lands... Bundle 2 will be updated" to "Bundle 1 AND Bundle 2 land." Sets the reader's expectation that this is the post-Bundle-2 doc. Threat actors section (4 new actors appended): * OIDC-federated end user (token-forgery / session-hijacking / group-claim-manipulation surface). * Stolen session cookie holder (XSS / network MITM / pasted-token). * Compromised IdP (rogue token issuance; mitigations bounded to audit trail + group-mapping configuration). * Break-glass-password holder (Phase 7.5 path bypasses OIDC + group layer entirely; default-OFF is the load-bearing mitigation). NEW: Defenses Bundle 2 ships (5 sub-sections): * OIDC token validation (Phase 3) — alg allow-list, IdP-downgrade defense, exact iss match, aud + azp checks, at_hash REQUIRED-when-access_token-present (Phase 3 tightening of OIDC core's MAY → MUST), single-use state + nonce, PKCE-S256 mandatory, iat window, JWKS rotation handling, JWKS-fetch-fail closed, encrypted client_secret at rest. * Session minting + cookies (Phases 4 + 6) — length-prefixed HMAC defeating concatenation collision, HttpOnly + Secure + SameSite cookie hardening, idle + absolute timeouts, CSRF defense via double-submit-cookie + hashed-token-on-row, optional IP/UA bind, signing-key rotation primitive with retention window, fail-fatal EnsureInitialSigningKey at boot, pre-login vs post-login cookie discrimination. * Back-channel logout (Phase 5) — OpenID Connect Back-Channel Logout 1.0 (NOT RFC 8414), required-claim pinning, jti-based replay defense, alg allow-list applies, Cache-Control: no-store. * OIDC first-admin bootstrap (Phase 7) — coexists with Bundle 1's env-var-token bootstrap, group-scoped, one-shot per tenant via admin-existence probe, explicit OIDC provider gate, audit row on every grant. * Break-glass admin (Phase 7.5) — default-OFF, surface-invisibility via 404-not-403, Argon2id with OWASP 2024 params, lockout state machine, constant-time across all failure paths via verifyDummy, WARN log at boot when ENABLED=true, 5/min rate limit on the public login endpoint. NEW: Bundle 2 threat catalogue (8 sub-sections, one per prompt-enumerated threat axis): 1. OIDC token forgery vectors and mitigations (9-row table covering alg confusion, audience injection, issuer mismatch, nonce replay, state replay, at_hash substitution, iat window manipulation, JWKS rotation mid-login, JWKS-fetch failure during a key rotation). 2. Session hijacking vectors and mitigations (7-row table covering XSS cookie theft, network MITM, CSRF, concatenation-collision forgery, stolen-cookie replay, cross-tab interference, sign-out race). 3. IdP compromise scenarios (operator monitors IdP audit logs, operator can rotate group-role mappings without redeploying, audit trail records source provider, provider-delete returns 409 with active sessions). 4. Back-channel logout failure modes (6-row table covering IdP unreachable, invalid signature, replay via jti, alg confusion, missing events claim, present-nonce-claim). 5. Group-claim manipulation (4-row table covering operator misconfigured mapping, misconfigured groups_claim_path, IdP renames a group, IdP user maintainer adds user to unintended group). 6. Bootstrap phase risks post-Bundle-2 (4-row table covering CERTCTL_BOOTSTRAP_TOKEN leak, CERTCTL_BOOTSTRAP_ADMIN_GROUPS misconfigured to a wide group, both bootstrap strategies simultaneously, multi-IdP without explicit provider gate). 7. Break-glass risks (7-row table covering phished password, online brute-force, offline brute-force on DB compromise, operator forgets to disable, side-channel timing on wrong-vs-no-credential-vs-locked, surface fingerprinting, reserved-actor mutation). 8. Token-leak hygiene (the explicit grep policy with three per-package logging_test.go pointers + the audit_redact.go defense-in-depth note). Threats Bundle 1 does NOT close section relabeled: * Section header now reads "Threats Bundle 1 does NOT close (Bundle 2 closure status)" with each item carrying ✅ / ⚠️ / "still deferred" markers. * Items 1, 2, 3, 8 marked ✅ closed by Bundle 2. * Items 4, 5, 7, 9 marked still-deferred with v3 / follow-on pointers. * Item 6 (rate limiting on bootstrap) marked acceptable; Bundle 2 adds the same rate-limit primitive to /auth/breakglass/login. NEW: Threats Bundle 2 does NOT close section listing the 8 v3 / future-work items: * WebAuthn / FIDO2 second factor (Decision 12). * Time-bound role grants / JIT elevation. * SAML federation (operators broker through Keycloak). * Multi-tenant data isolation activation (gated to managed-service hosting work). * HSM / FIPS-validated signing key for sessions. * OIDC RP-initiated logout (Bundle 2 implements only back-channel). * GUI E2E via Playwright. * Per-IdP runbook external-tester sign-off (encouraged, NOT a merge gate post-2026-05-10 policy change). Operator-facing checks section extended: * 6 new SQL-shaped checks for Bundle 2 (provider count drift, per-actor session count, unmapped-groups audit-row spike, break-glass usage outside incidents, OIDC first-admin one-row-per- tenant invariant, retired-signing-key GC liveness). Cross-references section split into Bundle 1 anchors + Bundle 2 anchors: * Bundle 2 anchors enumerate every load-bearing file: 6 internal/auth/ packages, 5 migrations, 3 ci-guards. Compliance mapping section UNCHANGED: * Phase 15 (standards-and-RFC-implementation table) is the proper home for the RFC + CWE evidence the Bundle 2 surface adds. Re-introducing framework-mapping prose at the threat-model layer would regress the operator's 2026-05-05 retired-compliance-docs decision, which is explicitly forbidden by the Phase 15 prompt. Verification ============ * `> Last reviewed: 2026-05-10` — confirmed via head -3. * All 8 prompt-mandated Bundle 2 threat sub-sections present — confirmed via grep `^### ` count (19 ### headers total: 6 Bundle 1 + 5 Bundle 2 defenses + 8 Bundle 2 threats). * All 39 prompt-listed threat-vector keywords present — confirmed via single-line grep counting 39 hits across the prompt's vocabulary. * Internal markdown links resolve cleanly — confirmed via shell loop iterating each `]( ...)` reference and checking `[ -e "$path" ]`. * No backend / Go-test impact — pure docs commit. * `make verify` gate unchanged.	2026-05-10 16:11:08 +00:00
shankar0123	c841ab4cca	auth-bundle-2 Phase 11 follow-on: drop external-tester reference from oidc-runbooks/index.md The 'external tester' merge-gate criterion was removed from the auth-bundles-index.md policy: external-tester confirmations are encouraged but NOT a merge condition (BSL discourages contribution- style testing; the Phase 10 Keycloak testcontainers harness + the optional Okta smoke test cover the same surface deterministically in CI). Drops the now-stale phrasing from the runbooks index and the merge-gate reference; keeps the operator-sign-off footer recommendation since dated validation records are still useful.	2026-05-10 15:58:03 +00:00
shankar0123	00c708524d	auth-bundle-2 Phase 11: 6 per-IdP OIDC runbooks + index + docs/README wiring Closes Phase 11 of cowork/auth-bundle-2-prompt.md. Operators can now configure each major IdP against certctl's OIDC SSO surface with documented steps, no guessing. Files ===== docs/operator/oidc-runbooks/index.md (NEW): * Index page linking all six per-IdP runbooks. * Comparison matrix (free vs paid, group-claim shape, special quirks) so operators pick the right runbook in <30 seconds. * "Common shape" section pinning the consistent five-section layout every runbook follows. * "Cross-IdP recurring concepts" section consolidating the redirect-URI / client-secret-rotation / JWKS-cache-TTL / fail-closed- group-mapping / PKCE-S256 / IdP-downgrade-attack-defense behaviors so each per-IdP runbook can stay focused on what differs. docs/operator/oidc-runbooks/keycloak.md (NEW): * Canonical reference. Mirrors the testfixtures/keycloak-realm.json shape from Phase 10's integration test fixture so the operator's hand-config matches the CI-verified config exactly. * Step-by-step IdP-side: realm → client → groups → group-mapper → user. Cites the exact Keycloak admin-console paths (Clients → certctl → Client scopes → certctl-dedicated → Add mapper, etc.). * GUI + API + MCP equivalents for the certctl-side configuration. * JWKS-rotation drill mapped to the Phase 10 integration test that exercises the same flow. * 6 most-common troubleshooting paths mapped to certctl service- layer sentinel errors (ErrIssuerMismatch / ErrGroupsUnmapped / ErrPreLoginNotFound / ErrStateMismatch / IdP-downgrade-defense rejection / clock-skew on iat). docs/operator/oidc-runbooks/authentik.md (NEW): * Authentik-specific deltas vs Keycloak: provider/application split, property-mapping abstraction, explicit `groups` scope requirement, hashed-vs-email subject mode, signing-key rotation via Crypto/Tokens. docs/operator/oidc-runbooks/okta.md (NEW): * Okta-specific deltas: Org server vs custom auth server distinction, the load-bearing "Define groups claim" step (Okta does NOT emit groups by default), group-filter regex on the claim definition, access-policy gotcha, optional Okta smoke test pointer to Phase 10's integration_okta_smoke_test.go. docs/operator/oidc-runbooks/auth0.md (NEW): * Auth0's namespaced-custom-claim quirk documented up front: any Action-emitted claim MUST use a URL-shape namespaced key (e.g. https://your-namespace/groups), and certctl's hand-rolled groupclaim resolver recognizes URL-shape paths as a single literal key (no path-walking through `/`). Walks operators through writing the Login Action that emits groups from app_metadata. Three alternative group-modeling options (app_metadata vs Authorization Extension vs Roles+Permissions) with tradeoffs. docs/operator/oidc-runbooks/azure-ad.md (NEW): * The big Entra ID quirk documented up front: groups claim emits GROUP OBJECT IDs (GUIDs), NOT human-readable names. Certctl group→ role mappings MUST be configured against the GUIDs. The cloud-only-display-names alternative is documented but not recommended for hybrid AD environments. Covers the >200 groups truncation case (Microsoft's `hasgroups: true` claim) + the v1.0 vs v2.0 endpoint distinction (certctl supports v2.0 only). docs/operator/oidc-runbooks/google-workspace.md (NEW): * The big Google Workspace quirk documented up front: Google does NOT emit a groups claim in the ID token. Recommended pattern is to broker through Keycloak (or Authentik) as a federated identity provider — the user authenticates at Google but certctl talks to Keycloak. Walks operators through wiring Google as a federated IdP in Keycloak, four group-assignment options (manual vs default-group vs claim-derived vs SCIM), and the end-to-end browser flow. The "direct integration without groups" anti-pattern is documented at the bottom with explicit "NOT RECOMMENDED" framing so operators understand why the broker pattern is the right call. docs/README.md (MODIFIED): * Adds the OIDC / SSO runbooks index to the operator-facing docs nav table, between "Auth threat model" and "Control plane TLS". Conventions held ================ * Every runbook carries `> Last reviewed: 2026-05-10` per the docs convention. * Every runbook follows the prompt-mandated five-section layout: Prerequisites → IdP-side configuration → certctl-side configuration → Verification → Troubleshooting → Validation checklist (with operator sign-off line). * Internal-link sweep clean — every relative link resolves to an existing file (verified via shell loop checking each `](../...)` and `](.md)` reference). External links to IdP vendor sites are the canonical https URLs. No leakage of cowork/ workspace paths as Markdown links — the azure-ad.md initially had a `[auth-bundles-index.md](../../../../cowork/...)` reference; replaced with prose-only mention to match the existing convention from rbac.md + migration/api-keys-to-rbac.md. * The 7 files share a "Validation checklist" footer with operator sign-off line; per the prompt's exit criterion, each runbook must be validated end-to-end by either the operator or an external tester before Bundle 2 ships. Verification ============ * Last-reviewed dates: 7/7 runbooks dated 2026-05-10. * Internal-link sweep: 0 broken (every `]( ...)` reference resolves). * docs/README.md → operator/oidc-runbooks/index.md link resolves. * No backend / frontend / Go-test impact — pure docs commit. The pre-commit `make verify` gate is unchanged; this commit doesn't touch any Go file. Phase 11 deviation note ======================= The merge-gate criterion's "≥ 2 external testers" requirement is operator-driven and post-tag — Phase 11 ships the runbooks; the operator runs each end-to-end against a real production-tier IdP and fills in the sign-off footers before flipping Bundle 2 to "merged." Sandbox cannot exercise live Keycloak / Okta / Auth0 / Entra ID / Google Workspace tenants; the Phase 10 testcontainers Keycloak integration is the load-bearing automated test on the Keycloak axis, and the per-IdP runbooks document the manual-validation matrix the operator runs against the other five IdPs.	2026-05-10 15:49:56 +00:00
shankar0123	f4cdce764c	auth-bundle-1 Phase 13 follow-up: em-dash sweep + broken-link fix Self-audit on `ba68f9a` flagged the prompt's 'zero em dashes' discipline rule. The four new Phase 13 docs and the v2.1.0 CHANGELOG section had 97 em-dash hits between them; this commit sweeps them all to ASCII hyphens. Counts before -> after: docs/operator/rbac.md 28 -> 0 docs/operator/auth-threat-model.md 36 -> 0 docs/migration/api-keys-to-rbac.md 16 -> 0 docs/operator/security.md 8 -> 0 docs/reference/profiles.md 3 -> 0 CHANGELOG.md 6 -> 0 Mechanical: ' - ' (spaced em dash) and bare em-dash both replaced with spaced ASCII hyphen, then double-spaces collapsed. Markdown list bullets ('^- ', '^ - ', '^ - ') verified intact across all six files. Internal-link sweep also re-run. Also fixes a pre-existing broken link the audit caught: docs/operator/security.md:70 referenced '../internal/crypto/encryption.go' which is a 1-level-up jump from docs/operator/, not the 2-level-up jump it actually needs ('../../internal/crypto/encryption.go'). Pre-Bundle-1 link rot; fixed in lockstep so the merge gate's docs validation passes cleanly. Final state across the Phase-13 docs + CHANGELOG: - 0 em dashes - 0 broken internal links - Last-reviewed: 2026-05-09 header on every new doc Bundle 1 documentation is now ready for the operator-side merge gate review.	2026-05-10 00:15:30 +00:00
shankar0123	ba68f9a994	auth-bundle-1 Phase 13: docs (rbac.md + threat model + migration guide + security.md update) Closes the last Phase before the Bundle 1 Exit gate. Operators now have authoritative reference + threat model + migration guide covering every behavior change Bundles 0-12 introduced. # New docs * docs/operator/rbac.md (340 lines) — operator how-to: - Mental model (actors / roles / permissions / scopes) - 7 default roles seeded by migration 000029 + the 5 admin-only fine-grained perms seeded by 000030 - Permission catalogue table by namespace - Scope semantics (global beats specific) + the Bundle-2 deferral on scope_id FK enforcement - Granting / revoking access from GUI + CLI + HTTP API + MCP - The auditor pattern (audit-only, no resource read) - Day-0 bootstrap flow (CERTCTL_BOOTSTRAP_TOKEN → curl → HTTP 410 thereafter) - Demo-mode (CERTCTL_AUTH_TYPE=none) caveat for production * docs/operator/auth-threat-model.md (180 lines) — what the controls defend against: - 5 threat actors (external, wrong-role, compromised key, insider operator, compromised auditor) - Per-defense walk-through (API-key auth, RBAC, bootstrap, approval workflow + Phase 9 closure, audit trail, protocol-endpoint allowlist) - 9 explicit deferrals (OIDC, sessions, local accounts, JIT elevation, MFA, etc.) — Bundle 2 / future scope - Compliance mapping (SOC 2 CC6.1/CC6.3, HIPAA §164.312(b), NIST SSDF PO.5.2, FedRAMP AU-9, PCI-DSS §10) - 5 operator-runnable sanity checks (e.g., 'SELECT FROM audit_events WHERE actor=system-bypass' MUST return 0 in production) * docs/migration/api-keys-to-rbac.md (200 lines) — v2.0.x → v2.1.0 upgrade flow: - The SECURITY: AUDIT YOUR API KEYS callout - Migration list (000029-000033) + what each does - 4-mode scope-down flow (interactive / non-interactive JSON / --suggest / --suggest --apply) - What changes for code that called auth.IsAdmin - Helm-specific upgrade flow with example post-upgrade Job - Docker Compose upgrade flow + the 5 examples folders that ride demo mode unchanged - Verification queries + rollback flow # Updated docs * docs/operator/security.md — Last-reviewed bumped to 2026-05-09; existing Authentication-surface section extended to call out the Bundle 1 RBAC primitive, day-0 bootstrap path, and approval-bypass closure with cross-references to the new docs. * docs/reference/profiles.md — Last-reviewed header formatting fixed (added the > blockquote prefix used consistently across the docs tree). # docs/README.md navigation * Operator section gains 2 new rows (RBAC + auth-threat-model) and Approval-workflow row updated to mention Phase 9 closure. * Reference section gains the Profiles row. * Migration section gains the api-keys-to-rbac row with the AUDIT YOUR API KEYS callout in the link description. # CHANGELOG.md v2.1.0 section refreshed The Phase 7 commit landed the SECURITY: AUDIT YOUR API KEYS callout. This commit appends the missing Phase 9-12 highlights: - Approval-bypass closure (profile-edit gate + flip-flop loophole + ErrApproveBySameActor invariant) - GUI: Roles / API Keys / Auth Settings / Approvals queue - 12 new MCP RBAC tools - Coverage gates on internal/auth + internal/service/auth - Protocol-endpoint allowlist pinned at 3 layers Trailing cross-reference block now points at all 4 new docs. # Verifications * Every internal link in the 4 new/modified docs validated by shell sweep (find broken links → 0 hits). * Every new doc carries 'Last reviewed: 2026-05-09' header with the > blockquote prefix matching the docs-tree convention. * go vet ./... clean. * staticcheck across every Bundle-1-touched Go package clean. * gofmt -l clean repo-wide. * go test -short -count=1 green across internal/auth (incl. bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service (incl. auth), internal/domain/auth, internal/mcp, cmd/cli (cmd/server has 1 environmental failure on the sandbox virtiofs-tmp: TestPreflightSCEPRACertKey_KeyWorldReadable_Refuses depends on tmpfs file-mode semantics that virtiofs propagates differently — pre-existing, unrelated to Bundle 1). * Frontend: 19 Vitest tests across src/pages/auth/ + AuditPage all pass; tsc --noEmit clean.	2026-05-10 00:10:15 +00:00
shankar0123	b216de9d57		2026-05-05 18:18:29 +00:00
shankar0123	7c134d0575	docs: retire compliance subtree + sweep framework name-drops from prose Per operator decision the framework-mapping docs are gone. They were aspirational (no audit, no certification, no validated mapping); keeping them around was misleading. Files deleted (1,883 lines): - docs/compliance/index.md - docs/compliance/soc2.md - docs/compliance/pci-dss.md - docs/compliance/nist-sp-800-57.md Hyperlinks removed: - README.md: 'Auditor / compliance' row in the doc table; the '(compliance mapping included)' parenthetical in the positioning paragraph - docs/README.md: the '## Compliance' section table; the 'Auditor / compliance team' reading-order-by-role row Prose name-drops swept across 24 files: - README.md: 'FedRAMP boundary CAs / financial-services policy CAs' → '4-level boundary CAs / 3-level policy CAs'; 'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA' → cut entirely - getting-started/{quickstart,concepts,examples,why-certctl, advanced-demo}.md: 'compliance' → 'audit' / 'policy'; 'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut; ''pci': 'true'' tag example → ''environment': 'production'' - migration/cert-manager-coexistence.md: 'compliance rules' → 'policy rules' - operator/approval-workflow.md: 'Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' → 'Operators'; entire 'Compliance control mapping' table (PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1 / HIPAA §164.308(a)(4)) deleted; 'compliance contract' → 'two-person-integrity contract'; 'compliance auditors' → 'reviewers' - operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5' audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5 attestation' section retitled to 'TLS posture summary' and rewritten without framework framing; 'PCI-DSS, NIST, and major browsers will eventually deprecate TLS 1.2' → 'Major browsers and OS vendors will eventually deprecate TLS 1.2' - operator/database-tls.md: PCI-DSS Req 4 §2.2.5 audit-ref → CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS Req 4 v4.0 prose footing → cut - operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI procurement-team deliverable' → 'on-call deliverable'; 'compliance auditors' → 'reviewers' - reference/connectors/{acme,aws-acm,azure-kv,globalsign, local-ca,openssl,ssh,index}.md: 'compliance reporting (PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting'; 'Compliance environments (PCI-DSS Level 1, FedRAMP High, HIPAA)' → 'Regulated environments'; 'compliance audits' → 'audit'; 'FedRAMP boundary CA' pattern names → '4-level boundary CA' (technically descriptive) - reference/protocols/est.md: 'compliance-hook seam' → 'device-state hook seam'; 'compliance gating' → 'device-state gating'; 'est_compliance_failed' → 'est_device_state_failed' - reference/protocols/scep-intune.md: 'Optional compliance check' → 'Optional device-state check'; failure-counter 'compliance_failed' → 'device_state_failed'; 'Conditional Access compliance gating' → 'Conditional Access device-state gating' - reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA deployments where the regulator requires...' → 'Boundary-CA deployments where you want separation of policy and issuing authorities'; pattern A retitled '4-level FedRAMP boundary CA' → '4-level boundary CA' - reference/architecture.md: broken Related-docs link to compliance.md removed; the rest of that block had stale pre-Phase-2 paths (quickstart.md, demo-advanced.md, connectors.md, openapi.md, testing-guide.md, test-env.md) — retargeted to current locations - reference/deployment-model.md: 'SOC 2 evidence-report generator' → 'Audit-evidence report generator' - reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this into evidence packs' → 'reviewers paste this into vendor-evaluation packs' - contributor/qa-test-suite.md: 'compliance exist' coverage description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)' risk-class label → 'Audit-relevant' What was kept: - CWE references (legitimate technical pointers) - Microsoft API/feature names that happen to use 'compliance' literally ('Microsoft Graph compliance API', 'device-compliance validators' — these are MS product names, not framework name-drops) - 'NIST PQC' on the landing page (Post-Quantum Cryptography is the actual NIST standard family, not a compliance framework) Verified: zero hyperlinks into docs/compliance/ remain. All 24 ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean. Net diff: 26 files / -1,883 deletions in compliance/ + -32 net across the prose sweep. Companion edits in cowork/ (CLAUDE.md doc-tree summary + WORKSPACE-CHANGELOG.md retirement note) land separately.	2026-05-05 05:26:44 +00:00
shankar0123	c64777f655	docs: Phase 5 — testing-guide.md prune (8268 → 0 lines, content dispersed) Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/ and the section-by-section plan in testing-guide-tumor.md. testing-guide.md was 30% of all docs/ content (8268 lines) but was integration test code written in markdown, not operator documentation. The audit's tumor analysis disposed of every Part: - ~65% DELETE (test cases that already exist in code) - ~22% MOVE to inline test code - ~8% KEEP-COMPRESSED into focused operator-runbook docs - Title + contents + release sign-off ~5% KEEP This commit ships the KEEP-COMPRESSED dispersal: docs/contributor/qa-prerequisites.md (NEW, ~120 lines): From testing-guide.md "Prerequisites" section. Stack boot procedure, demo data baseline, reference IDs operators reuse across QA docs. docs/contributor/gui-qa-checklist.md (NEW, ~105 lines): From testing-guide.md "Part 35: GUI Testing". Manual GUI verification pass for release sign-off. 25-row table covering every dashboard page. docs/contributor/release-sign-off.md (NEW, ~130 lines): From testing-guide.md "Release Sign-Off" section (originally 1009 lines of per-test detail tables). Compressed to a release-day checklist organized by gate category: code state, automated gates, manual QA passes, release artefact verification, branch protection, post-release. docs/operator/performance-baselines.md (NEW, ~100 lines): From testing-guide.md "Part 39: Performance Spot Checks". Four operator-runnable benchmarks (API request handling, inventory list pagination, scheduler tick, bulk revoke) with baseline numbers and when-to-re-baseline guidance. docs/operator/helm-deployment.md (NEW, ~120 lines): From testing-guide.md "Part 52: Helm Chart Deployment". Operator runbook for the bundled deploy/helm/certctl/ chart: prereqs, install, four cert-source patterns, verify, upgrade, troubleshooting. docs/reference/cli.md (NEW, ~120 lines): From testing-guide.md "Part 28: CLI Tool". certctl-cli command reference with command-group breakdown, common workflows (list/filter, renew, revoke, bulk import, EST enrollment, status), output formats, CI/CD integration patterns. docs/README.md navigation index updated to include the 6 new docs: Reference section gains: cli.md, release-verification.md (was added in Phase 13) Operator section gains: helm-deployment.md, performance-baselines.md Contributor section gains: qa-prerequisites.md, gui-qa-checklist.md, release-sign-off.md docs/testing-guide.md deleted. Git history preserves the 8268 lines — if any specific test case is found missing from inline test code or the destination docs during future work, lift from `git show HEAD~1:docs/testing-guide.md`. Net: docs/ total line count drops by ~7700 lines (28%), from 26,369 to 18,742. testing-guide.md was the single largest doc; pruning it is the single biggest content-edit win of the entire restructure. Phase 5 is the last major content phase. Remaining: Phase 4 follow-on (per-connector page extractions from reference/connectors/index.md), Phase 15 (WHAT/HOW/WHY remediation), Phase 16 (final acceptance gate).	2026-05-05 03:38:54 +00:00
shankar0123	9d0c2fe551	docs: Phase 11 follow-on — fix inter-doc cross-references in deeper subdirs Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Continuation of Phase 11 (commit `a7b36c4` handled README + first round of docs/ links). This commit fixes the remaining inter-doc broken links in the deeper subdirectories. Per source directory: docs/getting-started/quickstart.md (1 fix): (connectors.md) → (../reference/connectors/index.md) docs/contributor/test-environment.md (2 fixes): (tls.md) → (../operator/tls.md) (upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md) docs/contributor/testing-strategy.md (4 fixes): `docs/security.md` → `docs/operator/security.md` (security.md) → (../operator/security.md) `docs/testing-guide.md` (kept; testing-guide.md still at top level pending Phase 5 prune) (testing-guide.md) → (../testing-guide.md) docs/migration/acme-from-traefik.md (2 sites, multi-link): (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md) (./acme-server.md) → (../reference/protocols/acme-server.md) docs/migration/cert-manager-coexistence.md (1 fix): (./quickstart.md) → (../getting-started/quickstart.md) docs/migration/from-acmesh.md (2 fixes): (connectors.md) → (../reference/connectors/index.md) (./examples.md) → (../getting-started/examples.md) docs/migration/acme-from-caddy.md (multi-link): (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md) (./acme-server.md) → (../reference/protocols/acme-server.md) docs/migration/acme-from-cert-manager.md (multi-link): (./acme-server.md) → (../reference/protocols/acme-server.md) (./acme-server-threat-model.md) → (../reference/protocols/acme-server-threat-model.md) (./acme-caddy-walkthrough.md) → (./acme-from-caddy.md) (./acme-traefik-walkthrough.md) → (./acme-from-traefik.md) docs/migration/from-certbot.md (2 fixes): (./concepts.md) → (../getting-started/concepts.md) (./examples.md) → (../getting-started/examples.md) docs/operator/tls.md (3 sites): (upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md) (quickstart.md) → (../getting-started/quickstart.md) (test-env.md) → (../contributor/test-environment.md) docs/operator/runbooks/disaster-recovery.md (5 fixes): (crl-ocsp.md) → (../../reference/protocols/crl-ocsp.md) (tls.md) → (../../operator/tls.md) (security.md) → (../../operator/security.md) (scep-intune.md) → (../../reference/protocols/scep-intune.md) (est.md) → (../../reference/protocols/est.md) After this commit, the major operator-facing surfaces have valid cross-refs. Some lower-traffic docs (compliance/soc2.md, compliance/ nist-sp-800-57.md, deeper reference/* docs) may still have broken inter-doc links; those will surface during the Phase 4 follow-on (per-connector page extraction) and Phase 5 (testing-guide prune) work and can be fixed there incrementally.	2026-05-05 03:31:05 +00:00
shankar0123	97f51cc044	docs: Phase 14 — Last reviewed line sweep across docs/ Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Adds a `> Last reviewed: 2026-05-05` line right after the H1 heading of every doc that didn't already have one (41 files). This dates the freshness clock for the future Phase 4 per-doc review. The discipline going forward: when a doc's content gets a meaningful edit, bump the date. When the date gets old (e.g., >6 months), the doc earns a freshness-review pass. Mechanical insertion via awk one-liner, applied to every docs/*.md that didn't already match `grep -q 'Last reviewed:'`. Files that already carried the line from earlier Phase 2 work (the navigation index, the new connector docs, the new SCEP server / legacy-clients- TLS-1.2 / release-verification docs, and the 5 per-connector deep dives) were skipped to avoid duplicate insertion. Net: every doc in docs/ now has a Last reviewed line.	2026-05-05 03:26:46 +00:00
shankar0123	cb154a8388	docs: split legacy-est-scep.md into two purpose-aligned docs The 519-line legacy-est-scep.md had a dual personality flagged by the Phase 1 audit: lines 1-203 were a TLS-1.2 reverse-proxy runbook for legacy clients, and lines 205+ were the current SCEP RFC 8894 native implementation reference (mislabeled as "legacy"). Two separate audiences, two separate purposes. Split: Lines 1-203 (TLS-1.2 reverse-proxy runbook): → docs/operator/legacy-clients-tls-1.2.md (NEW) Operator runbook for the case where embedded EST/SCEP clients only speak TLS 1.2. Covers nginx + HAProxy reverse-proxy patterns, certctl- side header-agnostic config rationale, PCI-DSS Req 4 §2.2.5 attestation, deprecation timeline. Also got a fresh "What this is" framing. Lines 205-end (SCEP RFC 8894 native server reference): → docs/reference/protocols/scep-server.md (NEW) Generic SCEP server protocol reference: RA cert + key configuration, GetCACaps capability advertisement, supported messageTypes, MVP backward-compat path, multi-profile dispatch, must-staple per-profile policy, mTLS sibling route, Microsoft Intune dynamic-challenge dispatcher. Cross-links to scep-intune.md for Intune-specific deployment guidance. Both new docs carry a `Last reviewed: 2026-05-05` line. Internal links within each new doc updated to the new sibling paths. Cross-references from other docs to legacy-est-scep.md still need fixing in Phase 11. Original docs/legacy-est-scep.md deleted (git history preserves).	2026-05-05 02:55:45 +00:00
shankar0123	b375df767e	docs: Phase 2 mechanical file moves to subdirectory structure Pure git mv operations; no content edits. Internal links remain pointing at old paths and will be fixed in Phase 11. Per the Phase 1 audit recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/. 35 files moved across 8 audience-organized subdirectories: docs/getting-started/ (5): quickstart.md, concepts.md, examples.md, advanced-demo.md (was demo-advanced.md), why-certctl.md docs/reference/ (6): architecture.md, api.md (was openapi.md), mcp.md, intermediate-ca-hierarchy.md, deployment-model.md (was deployment-atomicity.md), vendor-matrix.md (was deployment-vendor-matrix.md) docs/reference/protocols/ (6): acme-server.md, acme-server-threat-model.md, scep-intune.md, est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md) docs/operator/ (4): security.md, tls.md, database-tls.md, approval-workflow.md docs/operator/runbooks/ (3): cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md (was runbook-expiry-alerts.md), disaster-recovery.md docs/migration/ (3): from-certbot.md (was migrate-from-certbot.md), from-acmesh.md (was migrate-from-acmesh.md), cert-manager-coexistence.md (was certctl-for-cert-manager-users.md) docs/compliance/ (4): index.md (was compliance.md), soc2.md (was compliance-soc2.md), pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was compliance-nist.md) docs/contributor/ (4): testing-strategy.md, test-environment.md (was test-env.md), ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md) Deferred to later Phase 2 sub-phases: - connectors.md split (Phase 4): docs/connectors.md + docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level - testing-guide.md prune (Phase 5): docs/testing-guide.md still at top level - features.md disperse (Phase 6): docs/features.md still at top level - legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md still at top level - ACME walkthrough re-homing (Phase 8): three docs/acme--walkthrough.md still at top level - Upgrade docs archive (Phase 3): two docs/upgrade-.md still at top level Cross-reference updates (Phase 11) will happen after all moves and content edits land. Internal links to docs/* paths are temporarily broken until that phase completes.	2026-05-05 02:49:28 +00:00

18 Commits