certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-07-28 12:18:59 +00:00

Author	SHA1	Message	Date
shankar0123	a975ccfca0	docs(b6): secret-custody reference + config-encryption upgrade runbook + private-key CI guard Closes acquisition-diligence Bundle 6 findings on secret custody, config encryption, and local artifact hygiene. Source IDs: S6, R4, SEC-M2, RT-M1, RT-M2, RT-L1. Surgical closures (artifact-only audit-framed memos stay out of the public repo per the Bundle 5 lesson): R4 / RT-L1 — local EC private key artifact rm cmd/agent/mc-001.key (gitignored, never in git history, leftover from a 2025-era agent dev run on the operator's workstation). Added scripts/ci-guards/B6-no-private-keys-in-tree.sh that fails the build if any TRACKED non-test file contains a PEM private-key block, so the next attempt to commit similar material gets caught at CI. Allowlist: _test.go (hermetic-test PEMs), examples/.md (sample walkthroughs), internal/scep/intune/testdata/ (certificates, not keys). RT-M1 — landing-page HSM implication certctl.io/index.html: 'their hardware' / 'your hardware' colloquial comparisons rephrased to 'their custody' / 'your servers'. The phrase 'Your keys. Your hardware. Your data. Your terms.' becomes 'Your keys. Your servers. Your data. Your terms.' to remove any inferred HSM-backed key-storage claim. The technical disclosure now lives in docs/operator/secret-custody.md (linked below); the landing page no longer makes a claim it cannot back. S6 + SEC-M2 + RT-M2 (composite documentation closure) Added docs/operator/secret-custody.md — public operator reference enumerating every secret material on the control plane and on agents: - Local CA private key (FileDriver, file-on-disk, heap-resident with the L-014 carve-out documented in internal/connector/issuer/local/local.go). - Agent ECDSA P-256 keys (file on agent host, never transmitted). - OIDC client secret (AES-256-GCM v3, PBKDF2 600k). - Session signing key (same encryption regime). - Break-glass credential (Argon2id, never encrypted). - API-key bearer tokens (SHA-256 hash only; plaintext shown once). - CSR private keys mid-issuance (agent memory only). - Issuer-connector backend secrets (encrypted_config column, fail-closed for source='database', plaintext-by-design for source='env' with rationale). The Env-seeded-vs-DB-seeded plaintext policy is explained in plain text so a buyer review can independently verify the startup guard at cmd/server/main.go:222-262 makes sense. Added docs/operator/runbooks/config-encryption-upgrade.md — the procedural arm: how to force v1/v2 -> v3 re-seal across the database, plus the passphrase-rotation order. Documents the AEAD-driven read fallback (v3 -> v2 -> v1) and the fact that re-sealing happens passively on UPDATE. Open roadmap item: a certctl admin reseal --all command (tracked in WORKSPACE-ROADMAP.md). Both docs wired into docs/README.md Operator + Runbooks tables. Verification: rg -n 'CONFIG_ENCRYPTION\|encrypt\|v1\|private key\|HSM\|PKCS11\|mc-001.key\|\.key\|Local CA' \ internal cmd docs .gitignore README.md # ambient (no NEW leaks) find . -name '.key' \ -not -path './.git/' -not -path './web/node_modules/' # empty git ls-files \| xargs grep -lE 'BEGIN . PRIVATE KEY' \ \| grep -vE '_test\.go$\|^examples/\|^internal/scep/intune/testdata/' # empty bash scripts/ci-guards/B6-no-private-keys-in-tree.sh # PASS bash scripts/ci-guards/G-3-env-docs-drift.sh # PASS bash scripts/ci-guards/doc-rot-detector.sh # PASS Residual roadmap (deliberately deferred): - signer.PKCS11Driver (HSM-token-backed CA-key custody). - signer.CloudKMSDriver (AWS/GCP/Azure KMS-backed CA-key custody). - FIPS 140-3 mode for the whole control plane. - HSM-backed session signing key. - Built-in 'certctl admin reseal --all' command. All five tracked in WORKSPACE-ROADMAP.md, not retracted.	2026-05-13 01:48:40 +00:00
shankar0123	d92a0c98ac	docs: remove audit-bundle-flavored docs from public repo Three docs added in Bundle 4 + Bundle 5 closure commits (`709e1c9`, `36840dd`) were framed around acquisition-diligence audit findings and don't belong in the public-facing operator docs tree: - docs/operator/scheduler-ha.md (Bundle 4 D2 per-loop HA truth table) - docs/operator/rate-limit-scope.md (Bundle 4 D3 scope statement) - docs/operator/security-bundle-5-audit-closure.md (Bundle 5 closure receipt) Audit-bundle artifacts live in the operator's local cowork/ scratchpad, not in docs/. The underlying code closures (advisory-lock migrations, SSRF-guarded notifier transports, break-glass login limiter, MCP gating, etc.) stand — only the audit-framed documentation surface is removed. docs/README.md: drop the two table rows that pointed at the now-deleted scheduler-ha.md + rate-limit-scope.md (added in `709e1c9`, lines 77-78).	2026-05-13 01:35:24 +00:00
shankar0123	f680b35121	ci(guards): fix G-3 (CERTCTL_MCP_READ_ONLY phantom) + S-1 (hardcoded 45) Two CI guards tripped on the B4 + B5 closure commits: 1. G-3 env-docs-drift caught `CERTCTL_MCP_READ_ONLY` mentioned in docs/operator/security-bundle-5-audit-closure.md (Bundle 5 S8 row) without a corresponding entry in internal/config/config.go. The env var is a v3 idea, not a shipped feature — the doc now describes the future gate without naming the literal env var, matching the G-3 phantom-env-var contract. 2. S-1 hardcoded-source-counts caught "all 45 migrations" in docs/operator/scheduler-ha.md (Bundle 4 D8 closure prose). Per the CLAUDE.md operating rule "Numeric claims about current state rot", swapped the literal count for the rebuild command `ls migrations/*.up.sql \| wc -l`. Both fixes are doc-only — no code change, no test change. The underlying Bundle 4 + Bundle 5 closures stand. Verification: bash scripts/ci-guards/G-3-env-docs-drift.sh # clean bash scripts/ci-guards/S-1-hardcoded-source-counts.sh # clean	2026-05-13 01:24:06 +00:00
shankar0123	36840ddd01	fix(security): close BUNDLE 5 — auth, OIDC, MCP, API + browser security edges Bundle 5 closure (2026-05-13 acquisition diligence audit). 13-finding security audit pass across the auth / OIDC / MCP / API / browser- security surface. Five real closures shipped in code, two false-as- stated findings annotated with the existing implementation, three operator-decision items documented for v3 follow-up, three doc-only fixes (auth architecture narrative aligned with shipped OIDC). Source findings closed (code): S1 break-glass /auth/breakglass/login lacked the documented 5/min per-source-IP rate limit; handler now owns its own SlidingWindowLimiter wired at startup. Doc claim turns true. R6 OIDC test_discovery JWKS probe ran on http.DefaultClient; now uses an http.Client whose transport wraps validation.SafeHTTPDialContext. JWKS URI can no longer pivot into reserved-address ranges via DNS rebinding. R7 Slack + Teams notifiers built http.Client without the SSRF dial-time guard. Both New() constructors now install validation.SafeHTTPDialContext; webhook URLs (operator- configured via dynamic-config GUI) cannot dial 169.254.x or in-cluster reserved ranges. Test seam: newForTest bypasses the guard for httptest's 127.0.0.1 binds, mirroring the existing internal/connector/notifier/webhook pattern. RT-L2 CERTCTL_ACME_INSECURE=true now emits a prominent logger.Warn at server boot. Pre-Bundle-5 the knob silently disabled ACME directory TLS verification. Source findings closed (doc): finding 1 + HIGH-5 Architecture doc claimed no in-process JWT/ OIDC/mTLS/SAML and pointed everyone at the authenticating-gateway pattern. Auth Bundle 2 (commit dea5053) shipped native OIDC + sessions + break-glass. New §"In-process authentication surface" table (api-key / oidc / none) supersedes the old framing; "Authenticating-gateway pattern (SAML, mTLS-as-auth, LDAP)" section retained for protocols certctl still doesn't ship natively. Source findings verified false (existing implementation): S4 OIDC email-domain allowlist — `email_domain_test.go` already pins the strict-equality semantics (subdomain not auto-accepted, multi-entry no-match path, empty allowlist accepts all by-design per RFC 9700 §4.1.1). SEC-L1 CSP / HSTS / referrer-policy headers — already shipped at internal/api/middleware/securityheaders.go and wired at cmd/server/main.go L2003+L2027+L2115. Operator-decision / deferred (tracked in bundle-5 closure doc): S3 CERTCTL_API_KEYS_NAMED parsing is wired, end-to-end validation is partial. Operator decides: complete the named-key middleware path or deprecate the syntax. S5 Audit-middleware best-effort for read paths; security-critical writes use WithinTx. Operator decides per-path escalation. S8 MCP threat model — the binary is a thin protocol bridge, no privileges of its own; every tool call carries CERTCTL_API_KEY and is auth'd + RBAC-gated server-side. Optional CERTCTL_MCP_READ_ONLY gate tracked as v3. SEC-H1 2026-05-10 audit CRIT-1/2/4 already closed on master; CRIT-3/5 status against the spec folder is operator- workstation-validation-only. Documented for follow-up. SEC-L2 WebAuthn / FIDO2 / step-up — already documented in docs/operator/auth-threat-model.md "Threats Bundle 2 does NOT close". v3 work item per CLAUDE.md decision 12. Full per-finding rationale + receipts at docs/operator/security-bundle-5-audit-closure.md. Verification: gofmt -l # clean go vet ./internal/connector/notifier/slack ./internal/connector/notifier/teams ./internal/auth/oidc ./internal/api/handler ./cmd/server # clean go build ./cmd/server [...] # clean go test -short -count=1 ./internal/connector/notifier/slack ./internal/connector/notifier/teams ./internal/api/handler ./internal/auth/oidc ./internal/config # PASS # (slack 0.028s + teams # 0.023s + handler 11.0s; # newForTest seam keeps # httptest tests green) Audit-Closes: BUNDLE-5 S1 R6 R7 RT-L2 finding-1 HIGH-5 Audit-Verifies-False: S4 SEC-L1 Audit-Defers: S3 S5 S8 SEC-H1 SEC-L2	2026-05-13 01:18:45 +00:00
shankar0123	709e1c9292	fix(scale): close BUNDLE 4 — migrations, scheduler HA, rate-limits, scale receipts Bundle 4 closure (2026-05-13 acquisition diligence audit). Closes the "what happens under multi-replica" question cluster: migration runner had no concurrency control + no applied-version ledger, 15 scheduler loops had per-process idempotency but no cross-replica documentation, rate limits were process-local without an operator-facing scope statement, load-test scope explicitly omitted four hot paths without linking them to a roadmap. Source findings closed: HIGH-1 + D4 + finding 4 (migration tracking) D8 (scheduler loop ownership) MED-1 + MED-2 (rate-limit scope) T9 + LOW-7 + finding 7 (load-test receipt scope) Closures by source ID: HIGH-1 + D4 + finding 4 — Migration tracking + advisory lock. internal/repository/postgres/db.go::RunMigrations now wraps every migration execution in: 1. A dedicated *sql.Conn pinned to one connection for the entire scan + apply lifecycle (pg_advisory_lock is connection-scoped). 2. pg_advisory_lock(migrationAdvisoryLockID) — fixed int64 key derived from "certctl-migrations" so the same constant resolves across deployments without colliding with operator advisory locks. Blocks the second replica until the first finishes. 3. CREATE TABLE IF NOT EXISTS schema_migrations(version TEXT PK, applied_at TIMESTAMPTZ DEFAULT NOW()) — audit ledger. 4. Skip-applied loop: SELECT version FROM schema_migrations → map[string]struct{} → skip every .up.sql whose filename is in the map. INSERT after successful execute, ON CONFLICT (version) DO NOTHING for defense in depth. Pre-Bundle-4 every server boot re-ran all 45 .up.sql files. The "idempotency via IF NOT EXISTS / ON CONFLICT" contract in CLAUDE.md held per-migration but offered no protection when two Helm replicas raced on schema DDL. Post-Bundle-4 single-replica deploys see zero behavior change beyond the audit-table population; multi-replica deploys get HA-safe schema bootstrap. D8 — Scheduler HA semantics documented. New docs/operator/scheduler-ha.md with per-loop inventory of all 15 loops in internal/scheduler/scheduler.go. Classification: - HA-safe (jobProcessorLoop, jobRetryLoop) — FOR UPDATE SKIP LOCKED via ClaimPendingJobs (Bundle 1 H-6 closure, `6cb4414`). - HA-safe-ish (jobTimeoutLoop) — atomic UPDATE-WHERE-status. - Idempotent under N>1 replicas (renewalCheckLoop, agentHealthCheckLoop, shortLivedExpiryCheckLoop, networkScanLoop, healthCheckLoop, acmeGCLoop, sessionGCLoop) — duplicate ticks produce idempotent side effects. - Side-effect-duplicating under N>1 replicas (notificationProcessLoop, notificationRetryLoop, digestLoop, cloudDiscoveryLoop, crlGenerationLoop) — duplicate webhook/email/AWS-API/CRL-signing operations. Operators running multi-replica accept N× side effects or pin to server.replicas: 1. Leader-election work tracked in WORKSPACE-ROADMAP.md as v3. MED-1 + MED-2 — Rate-limit scope. New docs/operator/rate-limit-scope.md states the contract verbatim: process-local sync.Mutex-guarded sliding-window log, effective cluster-wide cap = configured-per-replica × server.replicas, restart-safe (no persistent state, no shared store), bounded (50k/100k key cap with eviction). Five call sites documented: ocspLimiter (1m/IP), exportLimiter (1h/actor), EST per-principal (24h/CN), EST failed-auth (1h/IP), Intune dispatcher (24h/Subject+Issuer), plus the HTTP middleware token-bucket (RPS+Burst per replica). Cluster-wide shared limits via Redis or Postgres-backed bucket are tracked in WORKSPACE-ROADMAP.md as v3. T9 + LOW-7 + finding 7 — Load-test receipt scope. The existing harness at deploy/test/loadtest/ already self-documents the gap ("What it explicitly does NOT measure"). No code change needed for this finding; Bundle 4 cross-references scheduler-ha.md and rate-limit-scope.md from those gap callouts so the four deferred coverage classes (issuer connector, scheduler throughput, agent fleet, DB p99) land in the same place an acquirer reads about HA semantics and rate limits. Tests: internal/repository/postgres/migrations_test.go (new, 4 tests): - TestRunMigrations_PopulatesSchemaMigrations: audit table exists and is non-empty after the first migration run. - TestRunMigrations_SkipsAppliedOnSecondCall: second call is observable no-op on row count. - TestRunMigrations_ConcurrentCallsSerialized: two goroutines racing the migrator both return without error; row count unchanged; no duplicate versions. - TestRunMigrations_FreshDatabaseHappyPath: ≥ 30 migrations land on a fresh schema. Gated by testcontainers via the existing repo_test.go getTestDB pattern; skipped under -short. The integration lane runs them. Verification: gofmt -l # clean go vet ./internal/repository/postgres ./cmd/server # clean go build ./cmd/server ./internal/repository/postgres # clean go test -short -count=1 ./internal/repository/postgres ./internal/ratelimit # PASS Operator follow-up: full integration run on workstation: go test -count=1 ./internal/repository/postgres -run TestRunMigrations_ Receipts (paths for the audit packet): Migration runner evidence: internal/repository/postgres/db.go L135-340 (advisory-lock + ledger + skip-applied loop) + internal/repository/postgres/migrations_test.go (4 tests). Scheduler loop inventory: docs/operator/scheduler-ha.md (15-loop table with HA classification per loop). Rate-limit storage matrix: docs/operator/rate-limit-scope.md. Load-test baseline: deploy/test/loadtest/README.md (already self-documenting), cross-linked from scheduler-ha.md. Remaining operator warnings (deferred, tracked in WORKSPACE-ROADMAP.md): - Leader election for the four duplicate-side-effect loops (notificationProcessLoop, notificationRetryLoop, digestLoop, cloudDiscoveryLoop, crlGenerationLoop). v3 work item. - Shared rate-limits across replicas (Redis / Postgres token bucket). v3 work item. - Issuer-connector + scheduler-throughput + agent-fleet + DB-p99 load-test coverage. Tracked separately; per-issuer Prometheus histograms already capture issuer round-trip latency in production runs. Audit-Closes: BUNDLE-4 HIGH-1 D4 D8 MED-1 MED-2 T9 LOW-7 finding-4 finding-7	2026-05-13 01:00:39 +00:00
shankar0123	30034085e6	docs: v2.1.0 release polish — strip internal bundle/phase tags, update status for OIDC ship README: - Rewrite Status block: drop the stale 'federated identity not yet shipped' line; flag v2.1.0 OIDC + sessions + back-channel logout + break-glass as early-access; encourage GitHub issues for IdP rough edges. (A1 framing — keep early-access umbrella, no SAML/WebAuthn/JIT roadmap teaser.) - Add OIDC SSO bullet to 'What it does' covering per-IdP runbooks, group-claim → role mapping, AES-256-GCM client_secret encryption, JWKS auto-refresh, PKCE-S256, RFC 9700 §4.7.1 pre-login binding, RFC 9207 iss check, __Host- cookies, CSRF rotation, idle+absolute expiry, BCL, break-glass admin. - Update Security paragraph: three auth paths (API keys / OIDC / break-glass), HMAC-signed sessions, CSRF rotation, RFC OIDC BCL. - Correct CI coverage thresholds against .github/coverage-thresholds.yml (service 70%, handler 75%, crypto 88%, auth packages 85-95%); 'static analysis' replaces the inflated '11 linters' claim (actual count is 4 active). Docs B3 sweep — strip operator-facing 'Bundle N' / 'Phase N' tags: - docs/operator/auth-threat-model.md — rewrite intro; rename 5 H2 sections (API-key + RBAC defenses / OIDC + sessions + break-glass defenses / OIDC + sessions threat catalogue / Closed federated- identity threats / Future-work threats); clean ~12 H3/prose hits. - docs/operator/rbac.md — strip Bundle 1 framing from intro, scope_id deferral note, MCP tools section, day-0 bootstrap, and 'Where to look next'. - docs/operator/auth-benchmarks.md — drop 'Phase 14' framing from title intro, hardware floor caption, result table caption, methodology, and pre-merge audit section. - docs/operator/security.md — already cleaned earlier this session (RBAC / day-0 / approval-bypass / OIDC federation / sessions / OIDC first-admin / break-glass H3s). - docs/operator/oidc-runbooks/{index,keycloak,authentik,okta, azure-ad}.md — strip Auth Bundle 2 framing + Phase 10/3/4 references; replace with feature-name prose. - docs/operator/legacy-clients-tls-1.2.md — drop Bundle F / M-023 audit-reference framing; keep CWE-326. - docs/operator/database-tls.md — drop Bundle B / M-018 framing from intro + Helm section. - docs/operator/runbooks/disaster-recovery.md — drop 'Production hardening II Phase 10' status callout. - docs/migration/oidc-enable.md — retitle 'Enable OIDC SSO'; strip Bundle 1/2 framing from prereqs, troubleshooting, related docs; update __Host- cookie callout from 'audit MED-14' to v2.1.0-BREAKING. - docs/migration/api-keys-to-rbac.md — strip Bundle 1 framing from intro, migration table, IsAdmin section, and cross-references. - docs/migration/acme-from-cert-manager.md — strip residual 'Phase 5' tags from cert-manager integration test references. - docs/reference/configuration.md — retitle Auth section. - docs/reference/profiles.md — strip Bundle 1 Phase 9 framing from RequiresApproval section + Related list. - docs/reference/auth-standards-implemented.md — rewrite intro (API-key + RBAC + OIDC + sessions + back-channel logout + break-glass); rename 'Bundle 1 (RBAC) standards covered separately' H2; clean per-row Phase references. - docs/README.md — rewrite nav-table entries to drop Bundle 1/2 parentheticals; retitle 'Enable OIDC SSO' migration entry. No code or test changes; pure operator-facing prose polish for the v2.1.0 tag.	2026-05-11 16:54:07 +00:00
shankar0123	92c50d9e19	harden(oidc): relax alg-downgrade IdP-bind check to intersection-empty (Keycloak compat) Phase-10 live-IdP smoke (Keycloak 26.x via testcontainers-go) revealed the IdP-bind alg-downgrade check was too strict for real-world IdPs. 6 of the integration tests in internal/auth/oidc/integration_keycloak_test.go were failing with: oidc: IdP advertises weak signing algorithms (HS/none); refusing to use as defense against downgrade attacks: HS256 Keycloak 26.x (and several other real-world IdPs — Auth0 when HS-mode is enabled, some Authentik configs) advertise EVERY alg they're capable of in the discovery doc's id_token_signing_alg_values_supported field, even when the realm only signs with RS256 in practice. Pre-fix the IdP-bind check refused on ANY HS* or 'none' advertisement → no real Keycloak deploy could ever bind a provider row, hence the integration-test failures. The strict-deny check was defense-in-depth on top of the load-bearing per-token alg-pin at sig-verify time (isDisallowedAlg, service.go L1177): that check rejects every ID token whose JWS header carries an alg outside DefaultAllowedAlgs, regardless of what the discovery doc advertises. A forged HS256 token signed with the IdP's RS256 pubkey as HMAC secret is rejected at sig-verify time → the actual algorithm-confusion attack is closed by the per-token pin, NOT by the discovery-doc check. Fix: relax the IdP-bind check to refuse only when the intersection of advertised vs DefaultAllowedAlgs is EMPTY (the pathological all-weak-alg IdP case). Keycloak (RS256 + HS256 advertised) now binds successfully; an HS-only IdP still fails closed. Changes: - internal/auth/oidc/service.go: rewrite the alg-check loop at L1067 in getOrLoad / RefreshKeys to compute the intersection set; refuse only when no acceptable alg is advertised. ErrIdPDowngradeAdvertised docstring updated to reflect new contract. DefaultAllowedAlgs docstring + the package-level design-comment block at L40-72 updated with v2.1.0-relaxed semantics callouts. - internal/auth/oidc/test_discovery.go: TestDiscovery dry-run validator rewritten to surface HS/none alongside RS as an informational note ('note: IdP advertises weak algorithms %v alongside acceptable ones') rather than a hard-fail error. HS-only / none-only still hard-fails. - internal/auth/oidc/service_test.go: TestService_IdPDowngradeDefense_* tests updated. Renamed: - RejectsHSAdvertised → RS256PlusHS256_BindsSuccessfully (positive) - RejectsNoneAdvertised → RejectsHSOnlyAdvertised (intersection-empty) - RefreshKeys_CatchesPostLoadDowngrade rotated to HS-only post-load - internal/auth/oidc/coverage_fill_test.go: TestTestDiscovery_AlgDowngradeDetected split into _HS256AlongsideRS256_BindsWithNote (positive, asserts note but no hard-fail) + _HSOnly_StillTrips_HardFail (intersection-empty). - docs/operator/auth-threat-model.md: OIDC token-validation alg-allow-list section rewritten to call out the load-bearing-defense hierarchy (per-token pin first, IdP-bind check defense-in-depth) and document the v2.1.0 relaxation rationale. - CHANGELOG.md: ### Security entry under Unreleased. Verify: go test ./internal/auth/oidc/ -short PASS; gofmt clean; go vet clean. The Keycloak integration tests should now pass when the operator re-runs 'make keycloak-integration-test'.	2026-05-11 15:34:59 +00:00
shankar0123	ff3f1cd864	harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8) Audit 2026-05-11 A-8 closure. Closes the deferred Phase 2 leg of the 2026-05-10 HIGH-12 closure (`b81588e`) — production-startup observability for actor-demo-anon residual grants + CI guard banning new synthetic- admin code paths. What this changes: * cmd/server/preflight_demo_residual.go (new) runs after the DB pool + audit service are constructed and before the HTTPS listener starts. Under any non-'none' auth type it queries actor_roles for the synthetic actor-demo-anon and emits a WARN log + a categorized audit row (auth.demo_residual_grants_detected) listing every grant present. Migration 000029 unconditionally seeds the ar-demo-anon-admin row at install time, so EVERY production deploy will see this WARN on first boot; the intended cutover workflow is cleanup-once at production handover. * CERTCTL_DEMO_MODE_RESIDUAL_STRICT (new env var on AuthConfig, default false) pivots the WARN to fail-closed startup refusal for operators who want a paranoid posture against re-seeding. * POST /api/v1/auth/demo-residual/cleanup (new handler at internal/api/handler/demo_residual.go) is an admin-class (auth.role.assign) endpoint that removes every actor-demo-anon row from actor_roles and returns {removed: int64}. Idempotent; refuses 503 under Auth.Type=none (deleting the row would break the demo path); audit-logs every invocation including no-op zero-removed calls so the admin's action is always recorded. * scripts/ci-guards/no-new-synthetic-admin.sh pins the 17-entry allowlist of source files that legitimately reference the actor-demo-anon literal. New runtime code paths that resolve to the synthetic actor (the same pattern that produced the original CRIT class) are rejected at PR time. CI workflow auto-picks the script via the existing scripts/ci-guards/.sh loop in .github/workflows/ ci.yml; no workflow edit needed. Regression matrix: cmd/server/preflight_demo_residual_test.go — 7 tests covering the 4 main behaviour branches (testcontainers-backed, testing.Short()- skipped: DemoModeActive_Skips, NoResidue_Passes, HasResidue_LogsAnd Audits, StrictMode_RefusesStartup, DeleteDemoAnonResidue_Idempotent) plus 3 pure-Go stdlib unit tests for the row-string formatter + nil-safety contracts on both helpers. * internal/api/handler/demo_residual_test.go — 7 stdlib+httptest cases: HappyPath, Idempotent_ReturnsZero, RejectsInDemoMode (503), CleanupError_Surfaces500, NilCleanupFn (defensive 500), NilAuditWriter_DoesNotPanic, MissingActorContext (falls back to 'unknown' actor in the audit row). * internal/api/router/openapi_parity_test.go — new POST /api/v1/auth/demo-residual/cleanup entry plus 6 pre-existing pre-A-8 entries (oidc/test, jwks-status, users CRUD, runtime-config) that had drifted out of SpecParityExceptions; the parity test was red on dev/auth-bundle-2 before my work; this commit returns it to green with full per-entry justifications + parity-debt notes. Docs: * docs/operator/security.md — new 'Demo-to-production cutover (Audit 2026-05-11 A-8)' section explaining the WARN message, the cleanup curl one-liner, the equivalent SQL, the strict-mode env var, and the CI guard. * docs/operator/rbac.md — Last-reviewed bump + pointer to the new env var + the security.md section. * cowork/auth-bundles-audit-2026-05-10.md — HIGH-12 row gains an 'A-8 follow-on CLOSED 2026-05-11' annotation describing the deferred Phase 2 leg now landed. * CHANGELOG.md — Unreleased ### Security entry summarizing the four legs (detector + cleanup + strict-mode flag + CI guard) and the acquisition-readiness narrative this closes. Operator-facing impact: this closes a credibility gap, not an exploitable vulnerability. The residue requires a regression elsewhere in the middleware chain to be exploitable. After this fix, the canonical narrative ('RBAC primitive with no synthetic- admin fallback') is fully true. Refs cowork/auth-bundles-fixes-2026-05-11/08-high-demo-mode-residual- cleanup.md.	2026-05-11 11:45:54 +00:00
shankar0123	ddad647ee7	fix(auth/rbac): scope-aware ActorRole revoke (A-4) HIGH-10's UNIQUE (actor, role, scope_type, scope_id, tenant) uniqueness extension lets an operator grant the same role to the same actor at multiple scopes (e.g. r-operator on profile=p-acme AND profile=p-globex). But ActorRoleRepository.Revoke's WHERE clause omitted (scope_type, scope_id) — a single call deleted every variant. Selective revoke was unrepresentable; operators had to drop all and re-grant N-1, opening a race window where the actor's access was briefly different. Closure across all layers (handler → service → repo → MCP → GUI client), preserving the legacy "revoke all variants" contract for unmodified callers: internal/repository/auth.go - New ActorRoleRevokeOptions struct. Zero value = legacy semantic; non-empty ScopeType narrows to one variant. - New ErrActorRoleNotFound sentinel for scoped no-match (HTTP 404). internal/repository/postgres/auth.go - Revoke signature extended with opts. Empty opts.ScopeType uses the legacy SQL (no scope WHERE), zero-row delete = no error. - Non-empty narrows with `scope_type = $5 AND scope_id IS NOT DISTINCT FROM $6` — the IS-NOT-DISTINCT-FROM is load-bearing, vanilla `=` would silently miss the (global, NULL) case because NULL ≠ NULL in standard SQL. - Selective revoke with zero matching rows returns ErrActorRoleNotFound; operators get feedback on typos. internal/service/auth/actor_role_service.go - Revoke takes opts. Audit row's details map records the scope so SIEMs can distinguish wide-vs-selective revokes: `scope: "all_variants"` for the legacy path, or `scope_type` + `scope_id` for selective. Privilege check (auth.role.assign) and reserved-actor guard unchanged. internal/api/handler/auth.go - RevokeRoleFromKey parses optional `?scope_type=` / `?scope_id=` query params via new parseRevokeScope helper. - Validation mirrors AssignRoleToKey: scope_id forbidden with scope_type=global, required with profile/issuer, invalid scope_type → 400. scope_id without scope_type also → 400. - writeAuthError maps ErrActorRoleNotFound to 404. internal/mcp/tools_auth.go + types.go - AuthRevokeKeyRoleInput gains optional ScopeType + ScopeID with jsonschema descriptions explaining the dual-mode contract. - Tool call site appends URL-encoded query params when ScopeType is set; legacy callers (no scope_type) emit the bare DELETE path unchanged. web/src/api/client.ts - authRevokeKeyRole signature: optional 3rd argument `{ scope_type?, scope_id? }`. Pre-A-4 call sites (no opts arg) keep firing the bare DELETE — fully backward compatible. The GUI KeysPage's per-row revoke button (still one row per role, pre-Fix-12) continues to use the legacy shape; future GUI work can pass scope params for per-variant rows. docs/operator/rbac.md - New "Revoke: legacy 'all variants' vs scope-selective" subsection under "From the HTTP API" with curl examples for both modes plus the audit-row payload shape that lets SOC/SIEM tell them apart. Regression coverage: Repository (testcontainers, skipped under -short — 6 tests in internal/repository/postgres/auth_revoke_scope_test.go): TestRevokeActorRole_NoOpts_RemovesAllVariants TestRevokeActorRole_WithScope_RemovesOnlyMatching TestRevokeActorRole_WithGlobalScope_RemovesOnlyGlobal — pins the IS-NOT-DISTINCT-FROM branch (global, NULL) TestRevokeActorRole_NoMatch_ReturnsNotFound — pins the new sentinel TestRevokeActorRole_NoOpts_NoMatch_IsNoOp — pins the legacy idempotence contract TestRevokeActorRole_IssuerScope_RemovesOnlyMatching — pin the issuer-scope half (profile + issuer are symmetric scope types) Handler (7 new tests in auth_test.go): TestAuthHandler_RevokeRoleFromKey — extended to assert no scope filter is forwarded when query string is empty (legacy behaviour) TestAuthHandler_RevokeRoleFromKey_A4_ScopedProfile TestAuthHandler_RevokeRoleFromKey_A4_ScopedGlobal TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithGlobal TestAuthHandler_RevokeRoleFromKey_A4_RejectsMissingScopeID TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithoutScopeType TestAuthHandler_RevokeRoleFromKey_A4_RejectsInvalidScopeType TestAuthHandler_RevokeRoleFromKey_A4_ScopedNotFoundReturns404 MCP (2 new table rows in tools_per_tool_test.go): Scoped revoke with scope_type=profile + scope_id=p-acme → `?scope_type=profile&scope_id=p-acme` Scoped revoke with scope_type=global (no scope_id) → `?scope_type=global` Service-layer test plumbing (service_test.go) updated for new opts arg: 4 existing call sites pass repository.ActorRoleRevokeOptions{} to keep their pre-A-4 semantics; the fakeActorRoleRepo.Revoke implementation now mirrors the postgres scope-aware behaviour (legacy zero-value vs scoped narrowing + ErrActorRoleNotFound on no-match). Verify gate green: gofmt clean, go vet clean, go test -short across repository/postgres, service/auth, api/handler, and mcp. The pre-existing KeysPage.test.tsx failure observed on the baseline commit (reproduced via `git stash` earlier in Fix 03) is unrelated; my client.ts change adds an optional third argument and is fully backward-compatible. Spec at cowork/auth-bundles-fixes-2026-05-11/04-high-actor-role-revoke-scope.md. Audit doc updated: new row A-4 (2026-05-11) CLOSED appended to the status table at the bottom of cowork/auth-bundles-audit-2026-05-10.md. Operator-visible advisory in CHANGELOG.md v2.1.0 release notes under Security (non-BREAKING — legacy callers are unchanged). Depends on Fix 01 (the scope-aware EffectivePermissions read path on branch fix/audit-2026-05-11/crit-actor-role-scope-reads). This fix makes the inverse op selectively reversible; without Fix 01 the read side would mis-evaluate scoped grants anyway, making selective revoke moot at runtime.	2026-05-11 10:50:34 +00:00
shankar0123	77860fbcc3	harden(auth): LOW + Nit batch — bootstrap audit, crypto/rand, XFF trust, CSRF check, protocol-prefix unify (Batch 1) Audit 2026-05-10 — close 8 LOWs + 2 Nits in-bundle. Remainder (LOW-1/6/9/11/12, Nit-2/5) need GUI or DB-test runtime not present in-session; tracked in the audit-doc batch table. LOW-2: bootstrap.ValidateAndMint now emits 'bootstrap.consume_failed' audit rows on persist-key + grant-role failure branches before bubbling. Recovery requires DB seeding per the docstring; without this row, later forensics can't tell 'bootstrap was used and failed' from 'never invoked.' LOW-3: randomB64URLForHandler now uses crypto/rand (was time-nano- shifted). Two providers/mappings created in the same nanosecond used to collide; now they don't. Time-nano fallback retained for the unlikely crypto/rand-broken path. LOW-4: breakglass.verifyDummy uses s.readRand(salt) for the dummy Argon2id verify. Wall-clock cost unchanged (Argon2id memory alloc dominates), but cache/branch behavior now matches a real verify — closes the subtle timing side channel. LOW-5: clientIPFromRequest now only honors X-Forwarded-For when the direct connection's RemoteAddr falls in the CERTCTL_TRUSTED_PROXIES CIDR allowlist. Default-deny: empty list means XFF is ignored. SetTrustedProxies wired in cmd/server/main.go from cfg.Auth.TrustedProxies. LOW-7: internal/auth/protocol_endpoints.go::ProtocolEndpointPrefixes now carries /scep-mtls + /.well-known/est-mtls (previously only in router.AuthExemptDispatchPrefixes; the two lists had drifted). The canonical-prefix coverage test in Phase 12 still pins the set. LOW-8: docs/operator/rbac.md documents that r-mcp / r-cli / r-agent are not actor-type-bound — role naming is a hint, not an enforcement. Operators wanting hard binding must apply periodic audit queries. Native binding is on the v2 roadmap. LOW-10: Session.Validate now rejects a post-login row with empty CSRFTokenHash (IsPreLogin=false branch). validSession test fixture updated with a valid 64-hex CSRF hash. Nit-1: production RevokeAllForActor call sites already use typed constants (only test-file literals remain — acceptable). Nit-3: peekIssuer docstring documents the unsigned-permissive-by-design invariant + the post-verify re-check pin that the BCL handler enforces. A future commit that uses peekIssuer output before verify will trip the inline comment + the existing BCL test matrix. Status table updated in cowork/auth-bundles-audit-2026-05-10.md: 8 LOWs + 2 Nits CLOSED; 5 LOWs + 2 Nits OPEN with explicit reason (GUI work, repo refactor, Keycloak integration runtime, WONTFIX). Refs: cowork/auth-bundles-audit-2026-05-10.md LOW-2/3/4/5/7/8/10 cowork/auth-bundles-audit-2026-05-10.md Nit-1/3	2026-05-10 22:26:12 +00:00
shankar0123	457962f21a	fix(auth): apply rbacGate to every state-changing + read handler (CRIT-1 closure) Closes the wire-layer authorization gap surfaced by the 2026-05-10 audit (CRIT-1). Before this commit only ~24 of ~140 routes carried rbacGate enforcement — all of them admin-only fine-grained perms (auth.session., auth.oidc., auth.breakglass.admin, cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage). Every catalogued legacy-CRUD perm (cert.read/issue/revoke/delete, profile.edit/delete, issuer.edit/delete, target., agent., plus role-mgmt verbs) was declared in internal/domain/auth/validate.go but never wired at the router. A r-viewer Bearer was essentially r-admin minus five verbs at the wire layer (CWE-862). This commit: - Adds rbacGateScoped(checker, perm, scopeType, scopeFn, h) helper to internal/api/router/router.go for path-bound scope resolution. Per-profile and per-issuer grants (Decision 2) now reach the wire layer. - Wraps every state-changing route AND every read endpoint in router.go with rbacGate (global) or rbacGateScoped (path-bound). The auth-management routes (POST /api/v1/auth/roles, etc.) gain router-level enforcement in addition to the existing service-layer Authorizer check — defense in depth (HIGH-9 of the same audit collapses into this closure). - Auth-exempt surfaces stay un-gated by design: login, callback, BCL, logout, breakglass-login, bootstrap, health, auth-info, version. Allowlist is documented in TestRouterRBACGateCoverage. - Extends internal/domain/auth/validate.go CanonicalPermissions with 30 new perms across 12 namespaces: cert.edit; job.read, job.cancel; approval.read, approval.approve, approval.reject; policy.read/edit/delete; team.read/edit/delete; owner.read/edit/delete; notification.read/edit; discovery.read/run/claim; network_scan.read/edit/run; healthcheck.read/edit/delete/acknowledge; digest.read, digest.send; verification.read, verification.run; stats.read; metrics.read. - Updates DefaultRoles for r-admin / r-operator / r-viewer / r-mcp / r-cli / r-agent. r-auditor gets NOTHING new — the auditor pin (TestAuditorRoleHoldsExactlyAuditReadAndExport) stays invariant. - Migration 000039_audit_crit1_perms seeds the new perm rows + role grants per the updated DefaultRoles map. Idempotent ON CONFLICT DO NOTHING. Reverse migration removes role_permissions before permissions (ON DELETE RESTRICT on the FK). - AST-level CI guard TestRouterRBACGateCoverage in internal/api/router/router_rbac_coverage_test.go walks router.go and asserts every state-changing + read route is wrapped (or in the documented allowlist). Adding a new ungated route fails CI. - Updates docs/operator/rbac.md permission-catalogue table with the new namespaces + footer link to the AST CI guard. - Updates certctl/CHANGELOG.md v2.1.0 section with the closure narrative. Audit doc cowork/auth-bundles-audit-2026-05-10.md CRIT-1 row annotated CLOSED 2026-05-10. Bundle's exit-gate spec lives at cowork/auth-bundles-fixes-2026-05-10/01-crit-1-rbac-gates.md. CRIT-2 / CRIT-3 / CRIT-4 / CRIT-5 of the same audit remain open and continue to block the v2.1.0 tag. Verification gate green: - gofmt -d (no diff after gofmt -w on the touched files) - go vet ./... - go test -short -count=1 ./... (all packages pass including auditor pin) - go build ./... HIGH-9 of the audit closes via this commit's router-layer rbacGate on POST /api/v1/auth/keys/{id}/roles + DELETE /api/v1/auth/keys/{id}/roles/{role_id} (defense-in-depth on top of the existing service-layer privilege check). Refs: cowork/auth-bundles-audit-2026-05-10.md CRIT-1 HIGH-9	2026-05-10 19:58:26 +00:00
shankar0123	a581e2d222	auth-bundle-2 Phase 16: docs updates (security.md OIDC + sessions + break-glass + auditor split sections; new migration/oidc-enable.md; CHANGELOG.md v2.1.0 Bundle 2 release notes) Closes Phase 16 of cowork/auth-bundle-2-prompt.md. Three operator- facing docs updated, one new migration guide ships, README nav row added. Files ===== docs/operator/security.md (MODIFIED, Last reviewed bumped to 2026-05-10): * Added 5 new Bundle 2 subsections under '## Authentication surface' after the Bundle 1 approval-bypass-closure entry: - 'OIDC federation (Bundle 2 Phases 1-7)' — alg allow-list, IdP-downgrade defense, iss/aud/azp/at_hash, single-use state+nonce, PKCE-S256 mandatory, JWKS rotation handling, encrypted client_secret at rest with the v3 blob format pinned by an integration test, pointer to oidc-runbooks/ for per-IdP setup. - 'Sessions + back-channel logout (Bundle 2 Phases 4-6)' — length-prefixed HMAC cookie wire format, HttpOnly + Secure + SameSite cookie hardening, idle/absolute timeouts, CSRF defense, signing-key rotation primitive, fail-fatal EnsureInitialSigningKey at server boot, OpenID Connect Back-Channel Logout 1.0 (NOT RFC 8414). - 'OIDC first-admin bootstrap (Bundle 2 Phase 7)' — coexists with Bundle 1's env-var-token bootstrap, group-scoped via CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID, one-shot per tenant. - 'Break-glass admin (Bundle 2 Phase 7.5)' — default-OFF, surface invisibility via 404-not-403, Argon2id with OWASP 2024 params, lockout state machine, constant-time-via- verifyDummy, WARN log at boot, runbook pointer for operator drill. - 'Migrating an existing deployment to OIDC' — pointer to the new migration/oidc-enable.md walkthrough. docs/migration/oidc-enable.md (NEW, Last reviewed 2026-05-10): * Step-by-step migration guide for an operator on a Bundle-1-merged deployment to enable OIDC SSO. Pre-reqs (CERTCTL_CONFIG_ENCRYPTION_KEY, admin actor with auth.oidc.create + auth.oidc.edit, IdP tenant) + 7 numbered steps (pin encryption key, complete IdP-side per runbook, configure certctl-side OIDCProvider, add group→role mappings with fail-closed warning, optional first-admin bootstrap, verify with single test user, announce SSO endpoint). * Rollback section covering the 4-step disable flow + the 409 Conflict on provider-delete-while-sessions-exist + the existing-sessions-keep-working-until-expiry semantics. * Troubleshooting section pinning 8 most-common failure modes (discovery doc fetch fails / IdP downgrade defense rejects / no roles assigned / iss mismatch / pre-login expired / state mismatch / sessions revoked but user can hit API / JWKS rotation breaks login). * Database row count drift documented so operators know what to expect after OIDC is live (10 Bundle 2 tables enumerated). * Cross-references to oidc-runbooks/ + security.md + auth-threat-model.md + auth-benchmarks.md + auth-standards-implemented.md. CHANGELOG.md (MODIFIED): * v2.1.0 section title bumped from 'Auth Bundle 1: RBAC primitive' to 'Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions'. * Replaced the Bundle 1 closing-bullet ('Bundle 2 starts after Bundle 1 lands on master') with 18 new Bundle 2 entries: - OIDC + sessions + back-channel logout + break-glass overview. - OIDC token validation pinned at three layers (alg allow-list, IdP-downgrade defense, OIDC Core §3.1.3.7 re-verification). - Length-prefixed HMAC session cookies. - CSRF double-submit + hashed-token-on-row. - OIDC client_secret AES-256-GCM v3 blob at rest + integration-test invariant. - OIDC first-admin bootstrap. - Default-OFF break-glass admin (Argon2id + lockout + constant-time + surface invisibility). - GUI: 4 new pages + login-page IdP buttons + sidebar logout. - 11 new MCP tools for OIDC + session management. - 6 per-IdP runbooks (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace). - Threat model extended with 5 new defense subsections + 8 new threat-catalogue subsections. - Performance baselines documented (4 benchmarks; 3 measured + 1 operator-runs). - Standards-and-RFC implementation table (13 RFCs + 14 CWEs; NOT a compliance-mapping doc). - Coverage gates held at floor 90 across all 4 Bundle 2 packages (anti-Bundle-1-mistake invariant). - Multi-tenant query CI guard (ratchet baseline 32). - Phase 10 Keycloak testcontainers integration test + optional Okta smoke test. - OpenAPI cookieAuth security scheme + 13 new endpoints + 4 break-glass endpoints. - Bundle-1-only compat regression CI guard + Bundle-1-to-2-upgrade regression CI guard. * Final paragraph updated to point at oidc-enable.md alongside api-keys-to-rbac.md as the two migration walkthroughs. docs/README.md (MODIFIED): * Added the new oidc-enable.md migration row under '## Migration' alongside the existing api-keys-to-rbac.md entry, with a one-line description flagging it as the Bundle 2 OIDC onboarding walkthrough. Verification ============ * Last-reviewed on security.md + oidc-enable.md: 2026-05-10. * Internal-link sweep on oidc-enable.md: 0 broken (every relative link resolves via shell-loop verification). * Internal-link sweep on docs/README.md: 0 broken (all .md references resolve). * No Go-side impact, make verify gate unchanged. Bundle 2 documentation deliverables now complete: security.md + auth-threat-model.md + oidc-runbooks/ + auth-benchmarks.md + auth-standards-implemented.md + api-keys-to-rbac.md + oidc-enable.md + CHANGELOG.md v2.1.0. The full Bundle 2 surface is operator- discoverable from docs/README.md root nav.	2026-05-10 17:07:27 +00:00
shankar0123	263dee4264	auth-bundle-2 Phase 14: session + OIDC validation benchmarks (steady-state + cold paths) + auth-benchmarks.md operator doc + Makefile targets Closes Phase 14 of cowork/auth-bundle-2-prompt.md. Ships four benchmarks producing four numbers + the operator-doc table; three default-tag benchmarks runnable on every CI runner, the fourth (cold-cache OIDC) runnable on operator-side Docker hosts via the new make target. Files ===== internal/auth/session/bench_test.go (NEW): * BenchmarkSession_SteadyState (target p99 < 1ms; measured 5µs). Warm in-memory repo + warm session row. Pure CPU: parseCookie + HMAC verify + map lookup + sentinel checks. * BenchmarkSession_ColdProcess (target p99 < 10ms; measured 7.1ms). Same pipeline but with a configurable per-call delay simulating a 1ms Postgres RTT on each repo call. Two repo calls per Validate (signing-key fetch + session-row fetch) = 2ms minimum; Go time.Sleep granularity adds ~1-2ms jitter. Documented why testcontainers Postgres isn't viable inside b.N: 30+ second container boot incompatible with per-iteration timing. * slowSessionRepo + slowKeyRepo wrappers add the per-call delay via time.Sleep; they delegate to the existing in-memory stubs. * reportPercentiles helper sorts + reports p50/p95/p99/max via b.ReportMetric (Go testing.B doesn't surface percentiles natively). internal/auth/oidc/bench_test.go (NEW): * BenchmarkOIDC_SteadyState (target p99 < 5ms; measured 1.5ms). Drives full HandleCallback against an in-process mockIdP (httptest.Server localhost loopback). Pre-warmed JWKS cache via RefreshKeys at setup. Pipeline: pre-login consume + state compare + token exchange (localhost ~50-200µs) + go-oidc Verify (RSA-2048 sig verify + alg pin) + service-layer iss/ aud/azp/at_hash/exp/iat/nonce re-checks + group-claim resolution + group→role mapping + user upsert + session mint. * The localhost-loopback /token call adds ~100-500µs of TCP overhead vs pure crypto; the prompt's "no network calls" steady-state framing accommodates this since the localhost loopback is the closest practical proxy for a same-region IdP /token call (which adds 5-15ms in production). internal/auth/oidc/bench_keycloak_test.go (NEW, //go:build integration): * BenchmarkOIDC_ColdCache (target p99 < 200ms; operator-runs). Drives RefreshKeys against a live Keycloak container from the Phase 10 testfixtures harness. Each iteration evicts the in-process cache + re-fetches discovery + re-fetches JWKS over real HTTP + re-runs the IdP-downgrade-attack defense. * Network-bounded: the cold path is dominated by HTTPS RTT to the IdP discovery endpoint, NOT crypto. The 200ms cap accommodates a geographically-distant IdP (~150ms RTT) plus the in-process JWKS fetch + downgrade-defense logic (~5ms locally). * Reuses the sharedKeycloak fixture from integration_keycloak_test.go (Phase 10) so the benchmark doesn't pay the 60-90s container boot cost separately. Skips with a clear message if invoked without the integration test setup. * Reports p50/p95/p99/max in MILLISECONDS (vs the microsecond-granularity steady-state benchmarks) since the cold path is two orders of magnitude slower. internal/auth/oidc/service_test.go (MODIFIED): * Refactored newMockIdP(t testing.T) to delegate to a new newMockIdPWithTB(t testing.TB) sibling. Standard Go pattern for sharing test fixtures between testing.T and testing.B. No behavior change for existing service_test.go tests; the benchmark file in bench_test.go calls newMockIdPWithTB(b) to get the same fixture. docs/operator/auth-benchmarks.md (NEW): Result table with all four benchmarks + targets + measured numbers + status markers. Four-row matrix for the default-tag benchmarks; the fourth row (cold-cache) is operator-recorded with an empty cell waiting for the first Docker-equipped run. * Hardware floor section pinning the 4 vCPU / 8 GiB RAM / Postgres 16 / Go 1.25 baseline. GitHub-hosted Ubuntu runners satisfy this; operators on weaker hardware re-record. * "What each benchmark covers (and what it doesn't)" section per benchmark, distinguishing the warm steady-state pipeline from the cold path's network-bounded budget. * "Cold-cache OIDC: how to run" subsection documenting the make target + the test+benchmark coupling needed to populate sharedKeycloak. Operator-recorded baseline table seeded empty for first runs. * "Why the cold path is bounded by network latency, not crypto" section explaining the budget breakdown: - TCP handshake (1 RTT) - TLS 1.3 handshake (1-2 RTTs) - 2 HTTPS GETs (discovery + JWKS, 1 RTT each) - In-process crypto on the certctl side (~5-10ms total) So the 200ms cap is operator-checkable: real measurement > 200ms means the IdP is slow OR network congestion OR DNS issues — the diagnosis is upstream of certctl. Real measurement < 200ms means the IdP is on a fast same-region link. * Methodology section pinning the per-iteration timing capture + sort + percentile-extract approach. * Pre-merge audit section for the Phase 14 exit gate: four benchmarks ran, four numbers recorded, steady-state targets met, cold path is operator-runnable + measurably-bounded. Makefile (MODIFIED): * Added `make benchmark-auth` (default-tag, runs three of four benchmarks at 2000 samples each). * Added `make benchmark-auth-coldcache` (integration-tagged, runs OIDC cold-cache against live Keycloak; requires Docker). * Both targets carry explanatory comment blocks. docs/README.md (MODIFIED): * Added the auth-benchmarks.md doc to the Operator nav table alongside performance-baselines.md. Measured baselines at Phase 14 close (linux/arm64, 4 vCPU) ========================================================== BenchmarkSession_SteadyState p99 = 5µs (target < 1ms) ✓ 200× under BenchmarkSession_ColdProcess p99 = 7.1ms (target < 10ms) ✓ BenchmarkOIDC_SteadyState p99 = 1.5ms (target < 5ms) ✓ 3× under BenchmarkOIDC_ColdCache operator-runs (Docker required) Verification ============ * gofmt -l on three new bench files: clean. * go vet ./internal/auth/session/... ./internal/auth/oidc/...: clean (default tag). * go vet -tags integration ./internal/auth/oidc/...: clean (integration tag covers the bench_keycloak_test.go file). * go test -short -count=1 across all 5 OIDC + session packages: green; the bench__test.go files compile but don't run under -short (testing.Short() guards + benchmarks are not selected by -run pattern). All three runnable benchmarks executed and produce the numbers above; recorded in auth-benchmarks.md.	2026-05-10 16:51:28 +00:00
shankar0123	944ce8e710	auth-bundle-2 Phase 12: extend auth-threat-model.md with Bundle 2 sections (OIDC + sessions + back-channel logout + OIDC first-admin + break-glass + 8 Bundle 2 threat sub-sections) Closes Phase 12 of cowork/auth-bundle-2-prompt.md. The single canonical operator-facing threat model (one doc per topic per the docs convention) now covers both Bundle 1 (RBAC) AND Bundle 2 (OIDC + sessions + back-channel logout + OIDC first-admin + break-glass) in one place. File: docs/operator/auth-threat-model.md (MODIFIED, +485 LOC) Conventions held ================ * The Bundle 1 sections ("Threat actors", "Defenses Bundle 1 ships", "Threats Bundle 1 does NOT close", "Compliance mapping", "Operator-facing checks", "Cross-references") stay structurally intact. Bundle 2 EXTENDS them; nothing is rewritten in place. * `Last reviewed:` header bumped 2026-05-09 → 2026-05-10. * Per the prompt's explicit instruction: "do NOT create a separate auth-threat-model-bundle-2.md companion." This commit is a single-file extension. Changes ======= Intro paragraph rewritten: * From "Bundle 1 lands... Bundle 2 will be updated" to "Bundle 1 AND Bundle 2 land." Sets the reader's expectation that this is the post-Bundle-2 doc. Threat actors section (4 new actors appended): * OIDC-federated end user (token-forgery / session-hijacking / group-claim-manipulation surface). * Stolen session cookie holder (XSS / network MITM / pasted-token). * Compromised IdP (rogue token issuance; mitigations bounded to audit trail + group-mapping configuration). * Break-glass-password holder (Phase 7.5 path bypasses OIDC + group layer entirely; default-OFF is the load-bearing mitigation). NEW: Defenses Bundle 2 ships (5 sub-sections): * OIDC token validation (Phase 3) — alg allow-list, IdP-downgrade defense, exact iss match, aud + azp checks, at_hash REQUIRED-when-access_token-present (Phase 3 tightening of OIDC core's MAY → MUST), single-use state + nonce, PKCE-S256 mandatory, iat window, JWKS rotation handling, JWKS-fetch-fail closed, encrypted client_secret at rest. * Session minting + cookies (Phases 4 + 6) — length-prefixed HMAC defeating concatenation collision, HttpOnly + Secure + SameSite cookie hardening, idle + absolute timeouts, CSRF defense via double-submit-cookie + hashed-token-on-row, optional IP/UA bind, signing-key rotation primitive with retention window, fail-fatal EnsureInitialSigningKey at boot, pre-login vs post-login cookie discrimination. * Back-channel logout (Phase 5) — OpenID Connect Back-Channel Logout 1.0 (NOT RFC 8414), required-claim pinning, jti-based replay defense, alg allow-list applies, Cache-Control: no-store. * OIDC first-admin bootstrap (Phase 7) — coexists with Bundle 1's env-var-token bootstrap, group-scoped, one-shot per tenant via admin-existence probe, explicit OIDC provider gate, audit row on every grant. * Break-glass admin (Phase 7.5) — default-OFF, surface-invisibility via 404-not-403, Argon2id with OWASP 2024 params, lockout state machine, constant-time across all failure paths via verifyDummy, WARN log at boot when ENABLED=true, 5/min rate limit on the public login endpoint. NEW: Bundle 2 threat catalogue (8 sub-sections, one per prompt-enumerated threat axis): 1. OIDC token forgery vectors and mitigations (9-row table covering alg confusion, audience injection, issuer mismatch, nonce replay, state replay, at_hash substitution, iat window manipulation, JWKS rotation mid-login, JWKS-fetch failure during a key rotation). 2. Session hijacking vectors and mitigations (7-row table covering XSS cookie theft, network MITM, CSRF, concatenation-collision forgery, stolen-cookie replay, cross-tab interference, sign-out race). 3. IdP compromise scenarios (operator monitors IdP audit logs, operator can rotate group-role mappings without redeploying, audit trail records source provider, provider-delete returns 409 with active sessions). 4. Back-channel logout failure modes (6-row table covering IdP unreachable, invalid signature, replay via jti, alg confusion, missing events claim, present-nonce-claim). 5. Group-claim manipulation (4-row table covering operator misconfigured mapping, misconfigured groups_claim_path, IdP renames a group, IdP user maintainer adds user to unintended group). 6. Bootstrap phase risks post-Bundle-2 (4-row table covering CERTCTL_BOOTSTRAP_TOKEN leak, CERTCTL_BOOTSTRAP_ADMIN_GROUPS misconfigured to a wide group, both bootstrap strategies simultaneously, multi-IdP without explicit provider gate). 7. Break-glass risks (7-row table covering phished password, online brute-force, offline brute-force on DB compromise, operator forgets to disable, side-channel timing on wrong-vs-no-credential-vs-locked, surface fingerprinting, reserved-actor mutation). 8. Token-leak hygiene (the explicit grep policy with three per-package logging_test.go pointers + the audit_redact.go defense-in-depth note). Threats Bundle 1 does NOT close section relabeled: * Section header now reads "Threats Bundle 1 does NOT close (Bundle 2 closure status)" with each item carrying ✅ / ⚠️ / "still deferred" markers. * Items 1, 2, 3, 8 marked ✅ closed by Bundle 2. * Items 4, 5, 7, 9 marked still-deferred with v3 / follow-on pointers. * Item 6 (rate limiting on bootstrap) marked acceptable; Bundle 2 adds the same rate-limit primitive to /auth/breakglass/login. NEW: Threats Bundle 2 does NOT close section listing the 8 v3 / future-work items: * WebAuthn / FIDO2 second factor (Decision 12). * Time-bound role grants / JIT elevation. * SAML federation (operators broker through Keycloak). * Multi-tenant data isolation activation (gated to managed-service hosting work). * HSM / FIPS-validated signing key for sessions. * OIDC RP-initiated logout (Bundle 2 implements only back-channel). * GUI E2E via Playwright. * Per-IdP runbook external-tester sign-off (encouraged, NOT a merge gate post-2026-05-10 policy change). Operator-facing checks section extended: * 6 new SQL-shaped checks for Bundle 2 (provider count drift, per-actor session count, unmapped-groups audit-row spike, break-glass usage outside incidents, OIDC first-admin one-row-per- tenant invariant, retired-signing-key GC liveness). Cross-references section split into Bundle 1 anchors + Bundle 2 anchors: * Bundle 2 anchors enumerate every load-bearing file: 6 internal/auth/ packages, 5 migrations, 3 ci-guards. Compliance mapping section UNCHANGED: * Phase 15 (standards-and-RFC-implementation table) is the proper home for the RFC + CWE evidence the Bundle 2 surface adds. Re-introducing framework-mapping prose at the threat-model layer would regress the operator's 2026-05-05 retired-compliance-docs decision, which is explicitly forbidden by the Phase 15 prompt. Verification ============ * `> Last reviewed: 2026-05-10` — confirmed via head -3. * All 8 prompt-mandated Bundle 2 threat sub-sections present — confirmed via grep `^### ` count (19 ### headers total: 6 Bundle 1 + 5 Bundle 2 defenses + 8 Bundle 2 threats). * All 39 prompt-listed threat-vector keywords present — confirmed via single-line grep counting 39 hits across the prompt's vocabulary. * Internal markdown links resolve cleanly — confirmed via shell loop iterating each `]( ...)` reference and checking `[ -e "$path" ]`. * No backend / Go-test impact — pure docs commit. * `make verify` gate unchanged.	2026-05-10 16:11:08 +00:00
shankar0123	c841ab4cca	auth-bundle-2 Phase 11 follow-on: drop external-tester reference from oidc-runbooks/index.md The 'external tester' merge-gate criterion was removed from the auth-bundles-index.md policy: external-tester confirmations are encouraged but NOT a merge condition (BSL discourages contribution- style testing; the Phase 10 Keycloak testcontainers harness + the optional Okta smoke test cover the same surface deterministically in CI). Drops the now-stale phrasing from the runbooks index and the merge-gate reference; keeps the operator-sign-off footer recommendation since dated validation records are still useful.	2026-05-10 15:58:03 +00:00
shankar0123	00c708524d	auth-bundle-2 Phase 11: 6 per-IdP OIDC runbooks + index + docs/README wiring Closes Phase 11 of cowork/auth-bundle-2-prompt.md. Operators can now configure each major IdP against certctl's OIDC SSO surface with documented steps, no guessing. Files ===== docs/operator/oidc-runbooks/index.md (NEW): * Index page linking all six per-IdP runbooks. * Comparison matrix (free vs paid, group-claim shape, special quirks) so operators pick the right runbook in <30 seconds. * "Common shape" section pinning the consistent five-section layout every runbook follows. * "Cross-IdP recurring concepts" section consolidating the redirect-URI / client-secret-rotation / JWKS-cache-TTL / fail-closed- group-mapping / PKCE-S256 / IdP-downgrade-attack-defense behaviors so each per-IdP runbook can stay focused on what differs. docs/operator/oidc-runbooks/keycloak.md (NEW): * Canonical reference. Mirrors the testfixtures/keycloak-realm.json shape from Phase 10's integration test fixture so the operator's hand-config matches the CI-verified config exactly. * Step-by-step IdP-side: realm → client → groups → group-mapper → user. Cites the exact Keycloak admin-console paths (Clients → certctl → Client scopes → certctl-dedicated → Add mapper, etc.). * GUI + API + MCP equivalents for the certctl-side configuration. * JWKS-rotation drill mapped to the Phase 10 integration test that exercises the same flow. * 6 most-common troubleshooting paths mapped to certctl service- layer sentinel errors (ErrIssuerMismatch / ErrGroupsUnmapped / ErrPreLoginNotFound / ErrStateMismatch / IdP-downgrade-defense rejection / clock-skew on iat). docs/operator/oidc-runbooks/authentik.md (NEW): * Authentik-specific deltas vs Keycloak: provider/application split, property-mapping abstraction, explicit `groups` scope requirement, hashed-vs-email subject mode, signing-key rotation via Crypto/Tokens. docs/operator/oidc-runbooks/okta.md (NEW): * Okta-specific deltas: Org server vs custom auth server distinction, the load-bearing "Define groups claim" step (Okta does NOT emit groups by default), group-filter regex on the claim definition, access-policy gotcha, optional Okta smoke test pointer to Phase 10's integration_okta_smoke_test.go. docs/operator/oidc-runbooks/auth0.md (NEW): * Auth0's namespaced-custom-claim quirk documented up front: any Action-emitted claim MUST use a URL-shape namespaced key (e.g. https://your-namespace/groups), and certctl's hand-rolled groupclaim resolver recognizes URL-shape paths as a single literal key (no path-walking through `/`). Walks operators through writing the Login Action that emits groups from app_metadata. Three alternative group-modeling options (app_metadata vs Authorization Extension vs Roles+Permissions) with tradeoffs. docs/operator/oidc-runbooks/azure-ad.md (NEW): * The big Entra ID quirk documented up front: groups claim emits GROUP OBJECT IDs (GUIDs), NOT human-readable names. Certctl group→ role mappings MUST be configured against the GUIDs. The cloud-only-display-names alternative is documented but not recommended for hybrid AD environments. Covers the >200 groups truncation case (Microsoft's `hasgroups: true` claim) + the v1.0 vs v2.0 endpoint distinction (certctl supports v2.0 only). docs/operator/oidc-runbooks/google-workspace.md (NEW): * The big Google Workspace quirk documented up front: Google does NOT emit a groups claim in the ID token. Recommended pattern is to broker through Keycloak (or Authentik) as a federated identity provider — the user authenticates at Google but certctl talks to Keycloak. Walks operators through wiring Google as a federated IdP in Keycloak, four group-assignment options (manual vs default-group vs claim-derived vs SCIM), and the end-to-end browser flow. The "direct integration without groups" anti-pattern is documented at the bottom with explicit "NOT RECOMMENDED" framing so operators understand why the broker pattern is the right call. docs/README.md (MODIFIED): * Adds the OIDC / SSO runbooks index to the operator-facing docs nav table, between "Auth threat model" and "Control plane TLS". Conventions held ================ * Every runbook carries `> Last reviewed: 2026-05-10` per the docs convention. * Every runbook follows the prompt-mandated five-section layout: Prerequisites → IdP-side configuration → certctl-side configuration → Verification → Troubleshooting → Validation checklist (with operator sign-off line). * Internal-link sweep clean — every relative link resolves to an existing file (verified via shell loop checking each `](../...)` and `](.md)` reference). External links to IdP vendor sites are the canonical https URLs. No leakage of cowork/ workspace paths as Markdown links — the azure-ad.md initially had a `[auth-bundles-index.md](../../../../cowork/...)` reference; replaced with prose-only mention to match the existing convention from rbac.md + migration/api-keys-to-rbac.md. * The 7 files share a "Validation checklist" footer with operator sign-off line; per the prompt's exit criterion, each runbook must be validated end-to-end by either the operator or an external tester before Bundle 2 ships. Verification ============ * Last-reviewed dates: 7/7 runbooks dated 2026-05-10. * Internal-link sweep: 0 broken (every `]( ...)` reference resolves). * docs/README.md → operator/oidc-runbooks/index.md link resolves. * No backend / frontend / Go-test impact — pure docs commit. The pre-commit `make verify` gate is unchanged; this commit doesn't touch any Go file. Phase 11 deviation note ======================= The merge-gate criterion's "≥ 2 external testers" requirement is operator-driven and post-tag — Phase 11 ships the runbooks; the operator runs each end-to-end against a real production-tier IdP and fills in the sign-off footers before flipping Bundle 2 to "merged." Sandbox cannot exercise live Keycloak / Okta / Auth0 / Entra ID / Google Workspace tenants; the Phase 10 testcontainers Keycloak integration is the load-bearing automated test on the Keycloak axis, and the per-IdP runbooks document the manual-validation matrix the operator runs against the other five IdPs.	2026-05-10 15:49:56 +00:00
shankar0123	f4cdce764c	auth-bundle-1 Phase 13 follow-up: em-dash sweep + broken-link fix Self-audit on `ba68f9a` flagged the prompt's 'zero em dashes' discipline rule. The four new Phase 13 docs and the v2.1.0 CHANGELOG section had 97 em-dash hits between them; this commit sweeps them all to ASCII hyphens. Counts before -> after: docs/operator/rbac.md 28 -> 0 docs/operator/auth-threat-model.md 36 -> 0 docs/migration/api-keys-to-rbac.md 16 -> 0 docs/operator/security.md 8 -> 0 docs/reference/profiles.md 3 -> 0 CHANGELOG.md 6 -> 0 Mechanical: ' - ' (spaced em dash) and bare em-dash both replaced with spaced ASCII hyphen, then double-spaces collapsed. Markdown list bullets ('^- ', '^ - ', '^ - ') verified intact across all six files. Internal-link sweep also re-run. Also fixes a pre-existing broken link the audit caught: docs/operator/security.md:70 referenced '../internal/crypto/encryption.go' which is a 1-level-up jump from docs/operator/, not the 2-level-up jump it actually needs ('../../internal/crypto/encryption.go'). Pre-Bundle-1 link rot; fixed in lockstep so the merge gate's docs validation passes cleanly. Final state across the Phase-13 docs + CHANGELOG: - 0 em dashes - 0 broken internal links - Last-reviewed: 2026-05-09 header on every new doc Bundle 1 documentation is now ready for the operator-side merge gate review.	2026-05-10 00:15:30 +00:00
shankar0123	ba68f9a994	auth-bundle-1 Phase 13: docs (rbac.md + threat model + migration guide + security.md update) Closes the last Phase before the Bundle 1 Exit gate. Operators now have authoritative reference + threat model + migration guide covering every behavior change Bundles 0-12 introduced. # New docs * docs/operator/rbac.md (340 lines) — operator how-to: - Mental model (actors / roles / permissions / scopes) - 7 default roles seeded by migration 000029 + the 5 admin-only fine-grained perms seeded by 000030 - Permission catalogue table by namespace - Scope semantics (global beats specific) + the Bundle-2 deferral on scope_id FK enforcement - Granting / revoking access from GUI + CLI + HTTP API + MCP - The auditor pattern (audit-only, no resource read) - Day-0 bootstrap flow (CERTCTL_BOOTSTRAP_TOKEN → curl → HTTP 410 thereafter) - Demo-mode (CERTCTL_AUTH_TYPE=none) caveat for production * docs/operator/auth-threat-model.md (180 lines) — what the controls defend against: - 5 threat actors (external, wrong-role, compromised key, insider operator, compromised auditor) - Per-defense walk-through (API-key auth, RBAC, bootstrap, approval workflow + Phase 9 closure, audit trail, protocol-endpoint allowlist) - 9 explicit deferrals (OIDC, sessions, local accounts, JIT elevation, MFA, etc.) — Bundle 2 / future scope - Compliance mapping (SOC 2 CC6.1/CC6.3, HIPAA §164.312(b), NIST SSDF PO.5.2, FedRAMP AU-9, PCI-DSS §10) - 5 operator-runnable sanity checks (e.g., 'SELECT FROM audit_events WHERE actor=system-bypass' MUST return 0 in production) * docs/migration/api-keys-to-rbac.md (200 lines) — v2.0.x → v2.1.0 upgrade flow: - The SECURITY: AUDIT YOUR API KEYS callout - Migration list (000029-000033) + what each does - 4-mode scope-down flow (interactive / non-interactive JSON / --suggest / --suggest --apply) - What changes for code that called auth.IsAdmin - Helm-specific upgrade flow with example post-upgrade Job - Docker Compose upgrade flow + the 5 examples folders that ride demo mode unchanged - Verification queries + rollback flow # Updated docs * docs/operator/security.md — Last-reviewed bumped to 2026-05-09; existing Authentication-surface section extended to call out the Bundle 1 RBAC primitive, day-0 bootstrap path, and approval-bypass closure with cross-references to the new docs. * docs/reference/profiles.md — Last-reviewed header formatting fixed (added the > blockquote prefix used consistently across the docs tree). # docs/README.md navigation * Operator section gains 2 new rows (RBAC + auth-threat-model) and Approval-workflow row updated to mention Phase 9 closure. * Reference section gains the Profiles row. * Migration section gains the api-keys-to-rbac row with the AUDIT YOUR API KEYS callout in the link description. # CHANGELOG.md v2.1.0 section refreshed The Phase 7 commit landed the SECURITY: AUDIT YOUR API KEYS callout. This commit appends the missing Phase 9-12 highlights: - Approval-bypass closure (profile-edit gate + flip-flop loophole + ErrApproveBySameActor invariant) - GUI: Roles / API Keys / Auth Settings / Approvals queue - 12 new MCP RBAC tools - Coverage gates on internal/auth + internal/service/auth - Protocol-endpoint allowlist pinned at 3 layers Trailing cross-reference block now points at all 4 new docs. # Verifications * Every internal link in the 4 new/modified docs validated by shell sweep (find broken links → 0 hits). * Every new doc carries 'Last reviewed: 2026-05-09' header with the > blockquote prefix matching the docs-tree convention. * go vet ./... clean. * staticcheck across every Bundle-1-touched Go package clean. * gofmt -l clean repo-wide. * go test -short -count=1 green across internal/auth (incl. bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service (incl. auth), internal/domain/auth, internal/mcp, cmd/cli (cmd/server has 1 environmental failure on the sandbox virtiofs-tmp: TestPreflightSCEPRACertKey_KeyWorldReadable_Refuses depends on tmpfs file-mode semantics that virtiofs propagates differently — pre-existing, unrelated to Bundle 1). * Frontend: 19 Vitest tests across src/pages/auth/ + AuditPage all pass; tsc --noEmit clean.	2026-05-10 00:10:15 +00:00
shankar0123	b216de9d57		2026-05-05 18:18:29 +00:00
shankar0123	7c134d0575	docs: retire compliance subtree + sweep framework name-drops from prose Per operator decision the framework-mapping docs are gone. They were aspirational (no audit, no certification, no validated mapping); keeping them around was misleading. Files deleted (1,883 lines): - docs/compliance/index.md - docs/compliance/soc2.md - docs/compliance/pci-dss.md - docs/compliance/nist-sp-800-57.md Hyperlinks removed: - README.md: 'Auditor / compliance' row in the doc table; the '(compliance mapping included)' parenthetical in the positioning paragraph - docs/README.md: the '## Compliance' section table; the 'Auditor / compliance team' reading-order-by-role row Prose name-drops swept across 24 files: - README.md: 'FedRAMP boundary CAs / financial-services policy CAs' → '4-level boundary CAs / 3-level policy CAs'; 'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA' → cut entirely - getting-started/{quickstart,concepts,examples,why-certctl, advanced-demo}.md: 'compliance' → 'audit' / 'policy'; 'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut; ''pci': 'true'' tag example → ''environment': 'production'' - migration/cert-manager-coexistence.md: 'compliance rules' → 'policy rules' - operator/approval-workflow.md: 'Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' → 'Operators'; entire 'Compliance control mapping' table (PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1 / HIPAA §164.308(a)(4)) deleted; 'compliance contract' → 'two-person-integrity contract'; 'compliance auditors' → 'reviewers' - operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5' audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5 attestation' section retitled to 'TLS posture summary' and rewritten without framework framing; 'PCI-DSS, NIST, and major browsers will eventually deprecate TLS 1.2' → 'Major browsers and OS vendors will eventually deprecate TLS 1.2' - operator/database-tls.md: PCI-DSS Req 4 §2.2.5 audit-ref → CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS Req 4 v4.0 prose footing → cut - operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI procurement-team deliverable' → 'on-call deliverable'; 'compliance auditors' → 'reviewers' - reference/connectors/{acme,aws-acm,azure-kv,globalsign, local-ca,openssl,ssh,index}.md: 'compliance reporting (PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting'; 'Compliance environments (PCI-DSS Level 1, FedRAMP High, HIPAA)' → 'Regulated environments'; 'compliance audits' → 'audit'; 'FedRAMP boundary CA' pattern names → '4-level boundary CA' (technically descriptive) - reference/protocols/est.md: 'compliance-hook seam' → 'device-state hook seam'; 'compliance gating' → 'device-state gating'; 'est_compliance_failed' → 'est_device_state_failed' - reference/protocols/scep-intune.md: 'Optional compliance check' → 'Optional device-state check'; failure-counter 'compliance_failed' → 'device_state_failed'; 'Conditional Access compliance gating' → 'Conditional Access device-state gating' - reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA deployments where the regulator requires...' → 'Boundary-CA deployments where you want separation of policy and issuing authorities'; pattern A retitled '4-level FedRAMP boundary CA' → '4-level boundary CA' - reference/architecture.md: broken Related-docs link to compliance.md removed; the rest of that block had stale pre-Phase-2 paths (quickstart.md, demo-advanced.md, connectors.md, openapi.md, testing-guide.md, test-env.md) — retargeted to current locations - reference/deployment-model.md: 'SOC 2 evidence-report generator' → 'Audit-evidence report generator' - reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this into evidence packs' → 'reviewers paste this into vendor-evaluation packs' - contributor/qa-test-suite.md: 'compliance exist' coverage description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)' risk-class label → 'Audit-relevant' What was kept: - CWE references (legitimate technical pointers) - Microsoft API/feature names that happen to use 'compliance' literally ('Microsoft Graph compliance API', 'device-compliance validators' — these are MS product names, not framework name-drops) - 'NIST PQC' on the landing page (Post-Quantum Cryptography is the actual NIST standard family, not a compliance framework) Verified: zero hyperlinks into docs/compliance/ remain. All 24 ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean. Net diff: 26 files / -1,883 deletions in compliance/ + -32 net across the prose sweep. Companion edits in cowork/ (CLAUDE.md doc-tree summary + WORKSPACE-CHANGELOG.md retirement note) land separately.	2026-05-05 05:26:44 +00:00
shankar0123	c64777f655	docs: Phase 5 — testing-guide.md prune (8268 → 0 lines, content dispersed) Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/ and the section-by-section plan in testing-guide-tumor.md. testing-guide.md was 30% of all docs/ content (8268 lines) but was integration test code written in markdown, not operator documentation. The audit's tumor analysis disposed of every Part: - ~65% DELETE (test cases that already exist in code) - ~22% MOVE to inline test code - ~8% KEEP-COMPRESSED into focused operator-runbook docs - Title + contents + release sign-off ~5% KEEP This commit ships the KEEP-COMPRESSED dispersal: docs/contributor/qa-prerequisites.md (NEW, ~120 lines): From testing-guide.md "Prerequisites" section. Stack boot procedure, demo data baseline, reference IDs operators reuse across QA docs. docs/contributor/gui-qa-checklist.md (NEW, ~105 lines): From testing-guide.md "Part 35: GUI Testing". Manual GUI verification pass for release sign-off. 25-row table covering every dashboard page. docs/contributor/release-sign-off.md (NEW, ~130 lines): From testing-guide.md "Release Sign-Off" section (originally 1009 lines of per-test detail tables). Compressed to a release-day checklist organized by gate category: code state, automated gates, manual QA passes, release artefact verification, branch protection, post-release. docs/operator/performance-baselines.md (NEW, ~100 lines): From testing-guide.md "Part 39: Performance Spot Checks". Four operator-runnable benchmarks (API request handling, inventory list pagination, scheduler tick, bulk revoke) with baseline numbers and when-to-re-baseline guidance. docs/operator/helm-deployment.md (NEW, ~120 lines): From testing-guide.md "Part 52: Helm Chart Deployment". Operator runbook for the bundled deploy/helm/certctl/ chart: prereqs, install, four cert-source patterns, verify, upgrade, troubleshooting. docs/reference/cli.md (NEW, ~120 lines): From testing-guide.md "Part 28: CLI Tool". certctl-cli command reference with command-group breakdown, common workflows (list/filter, renew, revoke, bulk import, EST enrollment, status), output formats, CI/CD integration patterns. docs/README.md navigation index updated to include the 6 new docs: Reference section gains: cli.md, release-verification.md (was added in Phase 13) Operator section gains: helm-deployment.md, performance-baselines.md Contributor section gains: qa-prerequisites.md, gui-qa-checklist.md, release-sign-off.md docs/testing-guide.md deleted. Git history preserves the 8268 lines — if any specific test case is found missing from inline test code or the destination docs during future work, lift from `git show HEAD~1:docs/testing-guide.md`. Net: docs/ total line count drops by ~7700 lines (28%), from 26,369 to 18,742. testing-guide.md was the single largest doc; pruning it is the single biggest content-edit win of the entire restructure. Phase 5 is the last major content phase. Remaining: Phase 4 follow-on (per-connector page extractions from reference/connectors/index.md), Phase 15 (WHAT/HOW/WHY remediation), Phase 16 (final acceptance gate).	2026-05-05 03:38:54 +00:00
shankar0123	9d0c2fe551	docs: Phase 11 follow-on — fix inter-doc cross-references in deeper subdirs Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Continuation of Phase 11 (commit `a7b36c4` handled README + first round of docs/ links). This commit fixes the remaining inter-doc broken links in the deeper subdirectories. Per source directory: docs/getting-started/quickstart.md (1 fix): (connectors.md) → (../reference/connectors/index.md) docs/contributor/test-environment.md (2 fixes): (tls.md) → (../operator/tls.md) (upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md) docs/contributor/testing-strategy.md (4 fixes): `docs/security.md` → `docs/operator/security.md` (security.md) → (../operator/security.md) `docs/testing-guide.md` (kept; testing-guide.md still at top level pending Phase 5 prune) (testing-guide.md) → (../testing-guide.md) docs/migration/acme-from-traefik.md (2 sites, multi-link): (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md) (./acme-server.md) → (../reference/protocols/acme-server.md) docs/migration/cert-manager-coexistence.md (1 fix): (./quickstart.md) → (../getting-started/quickstart.md) docs/migration/from-acmesh.md (2 fixes): (connectors.md) → (../reference/connectors/index.md) (./examples.md) → (../getting-started/examples.md) docs/migration/acme-from-caddy.md (multi-link): (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md) (./acme-server.md) → (../reference/protocols/acme-server.md) docs/migration/acme-from-cert-manager.md (multi-link): (./acme-server.md) → (../reference/protocols/acme-server.md) (./acme-server-threat-model.md) → (../reference/protocols/acme-server-threat-model.md) (./acme-caddy-walkthrough.md) → (./acme-from-caddy.md) (./acme-traefik-walkthrough.md) → (./acme-from-traefik.md) docs/migration/from-certbot.md (2 fixes): (./concepts.md) → (../getting-started/concepts.md) (./examples.md) → (../getting-started/examples.md) docs/operator/tls.md (3 sites): (upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md) (quickstart.md) → (../getting-started/quickstart.md) (test-env.md) → (../contributor/test-environment.md) docs/operator/runbooks/disaster-recovery.md (5 fixes): (crl-ocsp.md) → (../../reference/protocols/crl-ocsp.md) (tls.md) → (../../operator/tls.md) (security.md) → (../../operator/security.md) (scep-intune.md) → (../../reference/protocols/scep-intune.md) (est.md) → (../../reference/protocols/est.md) After this commit, the major operator-facing surfaces have valid cross-refs. Some lower-traffic docs (compliance/soc2.md, compliance/ nist-sp-800-57.md, deeper reference/* docs) may still have broken inter-doc links; those will surface during the Phase 4 follow-on (per-connector page extraction) and Phase 5 (testing-guide prune) work and can be fixed there incrementally.	2026-05-05 03:31:05 +00:00
shankar0123	97f51cc044	docs: Phase 14 — Last reviewed line sweep across docs/ Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Adds a `> Last reviewed: 2026-05-05` line right after the H1 heading of every doc that didn't already have one (41 files). This dates the freshness clock for the future Phase 4 per-doc review. The discipline going forward: when a doc's content gets a meaningful edit, bump the date. When the date gets old (e.g., >6 months), the doc earns a freshness-review pass. Mechanical insertion via awk one-liner, applied to every docs/*.md that didn't already match `grep -q 'Last reviewed:'`. Files that already carried the line from earlier Phase 2 work (the navigation index, the new connector docs, the new SCEP server / legacy-clients- TLS-1.2 / release-verification docs, and the 5 per-connector deep dives) were skipped to avoid duplicate insertion. Net: every doc in docs/ now has a Last reviewed line.	2026-05-05 03:26:46 +00:00
shankar0123	cb154a8388	docs: split legacy-est-scep.md into two purpose-aligned docs The 519-line legacy-est-scep.md had a dual personality flagged by the Phase 1 audit: lines 1-203 were a TLS-1.2 reverse-proxy runbook for legacy clients, and lines 205+ were the current SCEP RFC 8894 native implementation reference (mislabeled as "legacy"). Two separate audiences, two separate purposes. Split: Lines 1-203 (TLS-1.2 reverse-proxy runbook): → docs/operator/legacy-clients-tls-1.2.md (NEW) Operator runbook for the case where embedded EST/SCEP clients only speak TLS 1.2. Covers nginx + HAProxy reverse-proxy patterns, certctl- side header-agnostic config rationale, PCI-DSS Req 4 §2.2.5 attestation, deprecation timeline. Also got a fresh "What this is" framing. Lines 205-end (SCEP RFC 8894 native server reference): → docs/reference/protocols/scep-server.md (NEW) Generic SCEP server protocol reference: RA cert + key configuration, GetCACaps capability advertisement, supported messageTypes, MVP backward-compat path, multi-profile dispatch, must-staple per-profile policy, mTLS sibling route, Microsoft Intune dynamic-challenge dispatcher. Cross-links to scep-intune.md for Intune-specific deployment guidance. Both new docs carry a `Last reviewed: 2026-05-05` line. Internal links within each new doc updated to the new sibling paths. Cross-references from other docs to legacy-est-scep.md still need fixing in Phase 11. Original docs/legacy-est-scep.md deleted (git history preserves).	2026-05-05 02:55:45 +00:00
shankar0123	b375df767e	docs: Phase 2 mechanical file moves to subdirectory structure Pure git mv operations; no content edits. Internal links remain pointing at old paths and will be fixed in Phase 11. Per the Phase 1 audit recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/. 35 files moved across 8 audience-organized subdirectories: docs/getting-started/ (5): quickstart.md, concepts.md, examples.md, advanced-demo.md (was demo-advanced.md), why-certctl.md docs/reference/ (6): architecture.md, api.md (was openapi.md), mcp.md, intermediate-ca-hierarchy.md, deployment-model.md (was deployment-atomicity.md), vendor-matrix.md (was deployment-vendor-matrix.md) docs/reference/protocols/ (6): acme-server.md, acme-server-threat-model.md, scep-intune.md, est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md) docs/operator/ (4): security.md, tls.md, database-tls.md, approval-workflow.md docs/operator/runbooks/ (3): cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md (was runbook-expiry-alerts.md), disaster-recovery.md docs/migration/ (3): from-certbot.md (was migrate-from-certbot.md), from-acmesh.md (was migrate-from-acmesh.md), cert-manager-coexistence.md (was certctl-for-cert-manager-users.md) docs/compliance/ (4): index.md (was compliance.md), soc2.md (was compliance-soc2.md), pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was compliance-nist.md) docs/contributor/ (4): testing-strategy.md, test-environment.md (was test-env.md), ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md) Deferred to later Phase 2 sub-phases: - connectors.md split (Phase 4): docs/connectors.md + docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level - testing-guide.md prune (Phase 5): docs/testing-guide.md still at top level - features.md disperse (Phase 6): docs/features.md still at top level - legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md still at top level - ACME walkthrough re-homing (Phase 8): three docs/acme--walkthrough.md still at top level - Upgrade docs archive (Phase 3): two docs/upgrade-.md still at top level Cross-reference updates (Phase 11) will happen after all moves and content edits land. Internal links to docs/* paths are temporarily broken until that phase completes.	2026-05-05 02:49:28 +00:00

25 Commits