certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 17:22:07 +00:00

Author	SHA1	Message	Date
shankar0123	a923cf697c	harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8) Audit 2026-05-11 A-8 closure. Closes the deferred Phase 2 leg of the 2026-05-10 HIGH-12 closure (`2e97cc1`) — production-startup observability for actor-demo-anon residual grants + CI guard banning new synthetic-admin code paths. What this changes: * cmd/server/preflight_demo_residual.go (new) runs after the DB pool + audit service are constructed and before the HTTPS listener starts. Under any non-'none' auth type it queries actor_roles for the synthetic actor-demo-anon and emits a WARN log + a categorized audit row (auth.demo_residual_grants_detected) listing every grant present. Migration 000029 unconditionally seeds the ar-demo-anon-admin row at install time, so EVERY production deploy will see this WARN on first boot; the intended cutover workflow is cleanup-once at production handover. * CERTCTL_DEMO_MODE_RESIDUAL_STRICT (new env var on AuthConfig, default false) pivots the WARN to fail-closed startup refusal for operators who want a paranoid posture against re-seeding. * POST /api/v1/auth/demo-residual/cleanup (new handler at internal/api/handler/demo_residual.go) is an admin-class (auth.role.assign) endpoint that removes every actor-demo-anon row from actor_roles and returns {removed: int64}. Idempotent; refuses 503 under Auth.Type=none (deleting the row would break the demo path); audit-logs every invocation including no-op zero-removed calls so the admin's action is always recorded. * scripts/ci-guards/no-new-synthetic-admin.sh pins the 17-entry allowlist of source files that legitimately reference the actor-demo-anon literal. New runtime code paths that resolve to the synthetic actor (the same pattern that produced the original CRIT class) are rejected at PR time. CI workflow auto-picks the script via the existing scripts/ci-guards/.sh loop in .github/workflows/ci.yml; no workflow edit needed. 
Regression matrix: cmd/server/preflight_demo_residual_test.go — 7 tests covering the 4 main behaviour branches (testcontainers-backed, testing.Short()-skipped: DemoModeActive_Skips, NoResidue_Passes, HasResidue_LogsAndAudits, StrictMode_RefusesStartup, DeleteDemoAnonResidue_Idempotent) plus 3 pure-Go stdlib unit tests for the row-string formatter + nil-safety contracts on both helpers. * internal/api/handler/demo_residual_test.go — 7 stdlib+httptest cases: HappyPath, Idempotent_ReturnsZero, RejectsInDemoMode (503), CleanupError_Surfaces500, NilCleanupFn (defensive 500), NilAuditWriter_DoesNotPanic, MissingActorContext (falls back to 'unknown' actor in the audit row). * internal/api/router/openapi_parity_test.go — new POST /api/v1/auth/demo-residual/cleanup entry plus 6 pre-existing pre-A-8 entries (oidc/test, jwks-status, users CRUD, runtime-config) that had drifted out of SpecParityExceptions; the parity test was red on dev/auth-bundle-2 before my work; this commit returns it to green with full per-entry justifications + parity-debt notes. Docs: * docs/operator/security.md — new 'Demo-to-production cutover (Audit 2026-05-11 A-8)' section explaining the WARN message, the cleanup curl one-liner, the equivalent SQL, the strict-mode env var, and the CI guard. * docs/operator/rbac.md — Last-reviewed bump + pointer to the new env var + the security.md section. * cowork/auth-bundles-audit-2026-05-10.md — HIGH-12 row gains an 'A-8 follow-on CLOSED 2026-05-11' annotation describing the deferred Phase 2 leg now landed. * CHANGELOG.md — Unreleased ### Security entry summarizing the four legs (detector + cleanup + strict-mode flag + CI guard) and the acquisition-readiness narrative this closes. Operator-facing impact: this closes a credibility gap, not an exploitable vulnerability. The residue requires a regression elsewhere in the middleware chain to be exploitable. After this fix, the canonical narrative ('RBAC primitive with no synthetic-admin fallback') is fully true. 
Refs cowork/auth-bundles-fixes-2026-05-11/08-high-demo-mode-residual-cleanup.md.	2026-05-11 11:45:54 +00:00
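The branches of the startup detector described in a923cf697c (skip in demo mode, pass with no residue, WARN with residue, refuse under strict mode) can be sketched as a pure decision function. This is a minimal illustration, not the code in cmd/server/preflight_demo_residual.go: the names `Outcome`, `checkDemoResidual`, and `ErrResidualGrants`, and the `"api-key"` auth-type string, are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Outcome is a hypothetical label for the preflight result.
type Outcome string

const (
	Skipped Outcome = "skipped" // demo mode is active, so the grant is expected
	Passed  Outcome = "passed"  // no residual grants found
	Warned  Outcome = "warned"  // residue found: WARN log + audit row, startup continues
)

// ErrResidualGrants is a hypothetical sentinel for the strict-mode refusal.
var ErrResidualGrants = errors.New("residual actor-demo-anon grants present")

// checkDemoResidual mirrors the four branches named in the commit message.
// grants stands in for the rows found in actor_roles for actor-demo-anon.
func checkDemoResidual(authType string, grants []string, strict bool) (Outcome, error) {
	switch {
	case authType == "none":
		return Skipped, nil
	case len(grants) == 0:
		return Passed, nil
	case strict:
		// Strict mode turns the WARN into a fail-closed startup refusal.
		return "", fmt.Errorf("%w: %v", ErrResidualGrants, grants)
	default:
		return Warned, nil
	}
}

func main() {
	out, err := checkDemoResidual("api-key", []string{"ar-demo-anon-admin"}, false)
	fmt.Println(out, err)
}
```

Keeping the decision separate from the database query and the logger is one way to make each branch testable without a container-backed database.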
shankar0123	ba0959ddc7	feat(auth/sessions): list-all gate + revoke-all-except-current (MED-1/2/3) Audit 2026-05-10 Fix 13 Phase A — close MED-1, MED-2, MED-3. MED-1 (verification only): Fix 01's CRIT-1 router-gate sweep already wraps every read endpoint with rbacGate(reg.Checker, '<resource>.read', ...). Verified post-sweep that GET /api/v1/certificates, /profiles, /issuers, /targets, /agents, /audit all carry the corresponding *.read permission gate. MED-2: ListSessions now gates ?actor_id=<other> on auth.session.list.all via the new permissionChecker projection installed by WithPermissionChecker. cmd/server/main.go threads the existing authCheckerAdapter into the handler. When caller's actor_id != caller.ActorID AND the handler has a checker, an inline CheckPermission(..., 'auth.session.list.all', 'global', nil) call fires; on false → 403 with explanatory message; on repository error → 500. Defense-in-depth: the router-level rbacGate enforces auth.session.list as the floor; the .list.all re-check is the privilege-elevation guard for cross-actor queries that the rbacGate can't express (it can't see the query parameter). MED-3: ship DELETE /api/v1/auth/sessions?except=current — the 'sign out all other sessions' flow. Gated by auth.session.revoke; the handler reads the caller's current session ID from session.SessionFromContext(ctx) (cookie-mode); empty for Bearer-mode callers (in which case ALL the actor's sessions revoke, matching 'log me out everywhere' semantic for API-key users). New repository method SessionRepository.RevokeAllExceptForActor: UPDATE sessions SET revoked_at = NOW() WHERE actor_id = AND actor_type = AND tenant_id = AND revoked_at IS NULL AND id != returning rowcount. Added to the interface in internal/repository/session.go, wired into postgres impl, and added to all SessionRepo test stubs (handler stubSessionRepo, service-test stubSessionRepo, benchmark slowSessionRepo). 
The session.SessionRepo internal interface also gains the method so the bench_test.go forwarder compiles. Audit row records the count for compliance evidence (one summary row per invocation per the existing audit policy). OpenAPI parity exception added for the new route — the unbounded-DELETE-with-query-flag shape doesn't fit standard REST CRUD operations cleanly; matches the documented-inline pattern set by the streaming audit-export endpoint. GUI button (SessionsPage 'Sign out all other sessions') deferred to Phase D. Refs: cowork/auth-bundles-audit-2026-05-10.md MED-1, MED-2, MED-3 Spec: cowork/auth-bundles-fixes-2026-05-10/13-med-bundle.md Phase A	2026-05-10 21:49:35 +00:00
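The semantics of RevokeAllExceptForActor in ba0959ddc7 can be illustrated with an in-memory stand-in for the UPDATE statement. This is only a sketch under assumptions: the `Session` struct and the free function below are hypothetical, and the real method runs as SQL against the sessions table.

```go
package main

import (
	"fmt"
	"time"
)

// Session is a hypothetical in-memory stand-in for a row in the sessions table.
type Session struct {
	ID, ActorID, ActorType, TenantID string
	RevokedAt                        *time.Time
}

// revokeAllExceptForActor marks every live session of the actor as revoked,
// skipping exceptID, and returns the number of sessions it revoked.
// An empty exceptID matches no session, so every live session is revoked,
// which is the Bearer-mode behaviour the commit message describes.
func revokeAllExceptForActor(sessions []*Session, actorID, actorType, tenantID, exceptID string) int {
	now := time.Now()
	revoked := 0
	for _, s := range sessions {
		if s.ActorID != actorID || s.ActorType != actorType || s.TenantID != tenantID {
			continue // belongs to someone else
		}
		if s.RevokedAt != nil || s.ID == exceptID {
			continue // already revoked, or the caller's current session
		}
		s.RevokedAt = &now
		revoked++
	}
	return revoked
}

func main() {
	sessions := []*Session{
		{ID: "ses-1", ActorID: "alice", ActorType: "user", TenantID: "t1"},
		{ID: "ses-2", ActorID: "alice", ActorType: "user", TenantID: "t1"},
		{ID: "ses-3", ActorID: "bob", ActorType: "user", TenantID: "t1"},
	}
	fmt.Println(revokeAllExceptForActor(sessions, "alice", "user", "t1", "ses-1"))
}
```

Calling it a second time with the same arguments returns zero, because already-revoked sessions are skipped.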
shankar0123	912ec3f547	fix(audit): ship streaming NDJSON audit export endpoint (HIGH-9 / HIGH-11) Audit 2026-05-10 HIGH-9 + HIGH-11 closure. HIGH-10 deferred to v3. HIGH-9 (verification only): Fix 01's CRIT-1 router-gate sweep already wraps every role-mgmt route with rbacGate. Verified via grep: - GET /api/v1/auth/roles → auth.role.list - POST /api/v1/auth/roles → auth.role.create - GET /api/v1/auth/roles/{id} → auth.role.list - PUT /api/v1/auth/roles/{id} → auth.role.edit - DELETE /api/v1/auth/roles/{id} → auth.role.delete - POST /api/v1/auth/roles/{id}/permissions → auth.role.edit - DELETE /api/v1/auth/roles/{id}/permissions/{perm} → auth.role.edit - POST /api/v1/auth/keys/{id}/roles → auth.role.assign - DELETE /api/v1/auth/keys/{id}/roles/{role_id} → auth.role.revoke Defense-in-depth invariant restored: privilege check fires at BOTH router and service layers; AST-level coverage is pinned by TestRouterRBACGateCoverage (Fix 01's CI guard). HIGH-11: ship GET /api/v1/audit/export — streaming NDJSON audit export gated by audit.export. Pre-fix, the permission was seeded into r-admin and r-auditor (migration 000031) but no endpoint enforced it; r-auditor's claim was misleading capability advertisement. Post-fix: - internal/api/handler/audit.go::ExportAudit emits one JSON event per line as application/x-ndjson — the de-facto compliance-archive format consumed by SIEMs (Splunk universal forwarder, Elastic Filebeat, Vector). - Required from/to (RFC3339) bounded to a 90-day max window; optional category filter (cert_lifecycle/auth/config); optional limit capped at 100k rows. - Content-Disposition: attachment; filename="certctl-audit-<from>_to_<to>.ndjson" so curl + browser downloads land with a sensible filename. - Recursively self-audits: every successful export emits an audit.export row capturing actor + range + category + row count so compliance reviewers can see who pulled which evidence and when. 
- Service layer: AuditService.ExportEventsByFilter reuses the existing repository.AuditFilter (From/To/EventCategory already supported); no SQL duplication. - OpenAPI parity exception added for the streaming-shape route (matches the ACME/SCEP/EST precedent at internal/api/router/openapi_parity_test.go::SpecParityExceptions). Regression matrix in audit_export_test.go (7 cases): - TestExportAudit_StreamsNDJSONLines (happy path; pins content-type + content-disposition + JSON-per-line shape + recursive self-audit) - TestExportAudit_RejectsRangeBeyond90Days (100-day window → 400) - TestExportAudit_RejectsMissingFromOrTo (3 cases) - TestExportAudit_RejectsInvalidCategory (unknown enum → 400) - TestExportAudit_AcceptsValidCategoryFilter (auth filter passes through) - TestExportAudit_RejectsNonGET (POST → 405) - TestExportAudit_RejectsToBeforeFrom (inverted range → 400) The auditor role's surface is now complete (read + export). The handler interface is extended with ExportEventsByFilter + RecordEventWithCategory; mockAuditService satisfies both with a self-audit trace (lastAuditAction / lastAuditCategory / lastAuditActor). HIGH-10 (scope + expiry on assignRoleRequest): DEFERRED to v3. Schema column already exists (ActorRole.ExpiresAt); load-bearing wire remains v3 work. Documented carve-out at HIGH-10's annotation. Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-9 HIGH-11 Spec: cowork/auth-bundles-fixes-2026-05-10/12-high-9-10-11-role-mgmt-cleanup.md	2026-05-10 21:36:01 +00:00
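The from/to validation that 912ec3f547 describes for GET /api/v1/audit/export (both required, RFC3339, not inverted, at most 90 days apart) might look roughly like the following. The function name, the error texts, and the choice to accept a window of exactly 90 days are assumptions for this sketch; the handler in internal/api/handler/audit.go may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// maxExportWindow is the 90-day bound named in the commit message.
const maxExportWindow = 90 * 24 * time.Hour

// parseExportRange validates the from/to query parameters.
// Every error returned here would map to HTTP 400 in a handler.
func parseExportRange(from, to string) (time.Time, time.Time, error) {
	var zero time.Time
	if from == "" || to == "" {
		return zero, zero, errors.New("from and to are required")
	}
	f, err := time.Parse(time.RFC3339, from)
	if err != nil {
		return zero, zero, fmt.Errorf("invalid from: %w", err)
	}
	t, err := time.Parse(time.RFC3339, to)
	if err != nil {
		return zero, zero, fmt.Errorf("invalid to: %w", err)
	}
	if t.Before(f) {
		return zero, zero, errors.New("to must not be before from")
	}
	if t.Sub(f) > maxExportWindow {
		return zero, zero, errors.New("range exceeds the 90-day maximum")
	}
	return f, t, nil
}

func main() {
	// A 100-day window, as in TestExportAudit_RejectsRangeBeyond90Days.
	_, _, err := parseExportRange("2026-01-01T00:00:00Z", "2026-04-11T00:00:00Z")
	fmt.Println(err)
}
```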
shankar0123	f1d97710e1	feat(gui+auth): break-glass admin GUI surface (CRIT-4 closure) Closes CRIT-4 of the 2026-05-10 audit. Bundle 2 Phase 7.5 shipped the break-glass backend (Argon2id + lockout + 4 endpoints) but no GUI surface. Operators recovering during an SSO outage had to hand-craft curl commands — operationally hostile and the opposite of what docs/operator/security.md advertised. This commit closes the gap. Three GUI surfaces: 1. LoginPage.tsx — inline "Use break-glass account (SSO outage recovery)" toggle below the API-key form. Clicking reveals an amber-bordered inline form (actor-id + password, autocomplete=off). Calls breakglassLogin(actor_id, password); on success navigates to "/" where AuthProvider re-validates via the session-cookie path. Intentionally low-visibility (text-amber-600 small text) — this is the deliberate-bypass path, not the everyday-login path. 2. web/src/pages/auth/BreakglassPage.tsx — admin page at /auth/breakglass (permission-gated by auth.breakglass.admin). Three sections: - Sticky security banner ("every action audited; use only during incidents"). - Set/rotate-password form (≥12-char + confirm-match). - Credentialed-actor table with rotate / unlock (disabled when not locked) / remove per row. Remove requires type-the-actor-id confirmation. 3. Layout.tsx nav — "Break-glass" entry under the auth section. Visible to all callers; the page itself permission-gates (server-side 403 is the load-bearing defense). Cosmetic hide-when-no-perm is deferred to fix 14's LOW bundle. Backend support (new endpoint required to enumerate credentialed actors): - internal/repository/breakglass.go — BreakglassCredentialRepository gains List(ctx, tenantID) method. - internal/repository/postgres/breakglass.go — postgres impl; reuses the existing breakglassColumns / scanBreakglass helpers. 
- internal/auth/breakglass/service.go — Service.List(ctx) method; returns ErrDisabled when CERTCTL_BREAKGLASS_ENABLED=false (handler maps to 404 for surface invisibility). - internal/api/handler/auth_breakglass.go — ListCredentials handler; password_hash field NEVER serialized to the wire (response shape is intentionally limited to actor_id + timestamps + failure_count + locked_until). - internal/api/router/router.go — registers GET /api/v1/auth/breakglass/credentials gated by auth.breakglass.admin. - internal/api/router/openapi_parity_test.go — SpecParityExceptions entry for the new endpoint (full OpenAPI row rides along with the next OpenAPI sweep). GUI api/client.ts gains breakglassListCredentials() + the BreakglassCredentialRow type matching the wire shape. Six Vitest cases in BreakglassPage.test.tsx pin the contract: permission gate (forbidden state when caller lacks the perm; admin surface when they have it), set-password mismatch rejection, set-password below-threshold-length rejection, unlock-disabled-when-not-locked, remove-modal type-confirm. Verification gate green: - gofmt -l clean on all touched files - go vet clean - go test -short -count=1 on internal/api/router (TestRouter_OpenAPIParity + TestRouterRBACGateCoverage + TestRouter_AuthExemptAllowlist), internal/api/handler (all BCL tests + ListCredentials), internal/auth/breakglass (Service.List + stubRepo.List), internal/repository/postgres, internal/domain/auth (auditor pin) — all pass. CRIT-1 + CRIT-2 + CRIT-3 from the same audit are already closed on this branch (commits `68ca42f`, `ca1e135`, `00eace8`). CRIT-5 (AllowedEmail-Domains lying field) remains the last Critical blocker for v2.1.0. Spec: cowork/auth-bundles-fixes-2026-05-10/04-crit-4-breakglass-gui.md. Refs: cowork/auth-bundles-audit-2026-05-10.md CRIT-4	2026-05-10 20:24:52 +00:00
shankar0123	1d01c87663	auth-bundle-2 Phase 7 + Phase 7.5: OIDC first-admin bootstrap + break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility) Phase 7 — OIDC first-admin bootstrap (Decision 3): - Optional AdminBootstrapHook closure on oidc.Service. When wired, HandleCallback consults the hook AFTER group resolution + user upsert and BEFORE the empty-mapping fail-closed check. Hook receives (providerID, groups, userID); returns grantAdmin=true when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no admin exists yet in the tenant. - cmd/server/main.go wires the hook as a closure that: Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured). * Probes AdminExists via authActorRoleRepo (admin-already-exists silently returns false; bootstrap mode is one-shot per tenant). * Walks group intersection. * On match: grants r-admin via authActorRoleRepo.Grant + emits the bootstrap.oidc_first_admin audit row with event_category=auth + INFO log. - Coexists with the Bundle 1 env-var-token bootstrap. Both paths can be configured; first match wins (admin-existence probe short-circuits the second). - HandleCallback's empty-mapping fail-closed check moved AFTER the hook so a fresh deployment with zero group_role_mappings can still mint the first admin. - 5 tests in service_test.go: hook grants admin on match, hook returns false preserves empty-mapping fail-closed, admin-already- exists silently falls through to normal mapping, hook-error wraps + bubbles, idempotent when admin is already in the mapped role set. Phase 7.5 — Break-glass admin (Decision 4, default-OFF): Migration 000038 ships: - breakglass_credentials table — at-most-one-credential-per-actor (UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout state machine (failure_count, locked_until, last_failure_at). FK CASCADE on users(id) so deleting a user atomically removes their credential. 
- Two new permissions seeded into r-admin only: auth.breakglass.admin — set/rotate/unlock/remove credentials. auth.breakglass.login — actor uses break-glass to log in. CanonicalPermissions extended in lockstep. internal/auth/breakglass/service.go (~580 LOC): - Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED. - SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4, salt=16 random bytes, output=32 bytes); per-password random salt; PHC-format hash output. Min 12 / max 256 byte input. - Authenticate: constant-time-compare via subtle.ConstantTimeCompare on every code path. Identical 401 + identical timing across the wrong-password / locked-account / non-existent-actor paths so an attacker cannot probe whether a given actor has break-glass configured. Non-existent-actor + locked-account paths run a verifyDummy() Argon2id pass for timing parity. Lockout state machine: failure_count++ on every wrong attempt; threshold (default 5) trips locked_until = NOW() + duration (default 15m). Successful Authenticate resets the counter. Reset-window: failures aged out after CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h) auto-reset on next attempt. - Unlock + RemoveCredential: admin-only (auth.breakglass.admin gated at the router via rbacGate). Audit rows on every operation. - All public methods refuse to act when Enabled()==false (returns ErrDisabled; the handler maps to HTTP 404 — surface invisibility). internal/repository/postgres/breakglass.go ships the 5-method postgres impl with atomic single-statement IncrementFailure (so concurrent racing wrong-password attempts can't observe an intermediate state and slip past the threshold) and idempotent ResetFailureCount. internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP surface: - POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per source IP via the existing rate limiter; returns 404 when disabled). 
On success sets the post-login session cookie + CSRF cookie via SessionService.Create + 204. On any failure: uniform 401 + identical timing (the service has already audited the specific failure category). - POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin) - POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock (auth.breakglass.admin) - DELETE /api/v1/auth/breakglass/credentials/{actor_id} (auth.breakglass.admin) Admin endpoints share the surface-invisibility property: when CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns 404 (not 403) so probing via the admin surface gets the same signal as probing the login endpoint. Tests (internal/auth/breakglass/service_test.go): All 8 Phase 7.5 spec-mandated negative cases: 1. Service.Enabled()==false → all ops return ErrDisabled. 2. Wrong password → ErrInvalidCredentials, failure_count++, audit row with event_category=auth. 3. Failure_count exceeds threshold → locked, subsequent attempts (including with the CORRECT password) return identical-shape 401 while the lockout window holds. 4. Lockout window expires → next attempt with correct password succeeds + resets the counter. 5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword. 6. Password leak hygiene — the service has zero slog calls; the audit-row map literal never includes the password plaintext. 7. Argon2id hash never appears in logs OR API responses — pinned by `json:"-"` tag on BreakglassCredential.PasswordHash + a belt-and-braces json.Marshal probe asserting the hash bytes never appear in the marshaled output. 8. Constant-time-compare verified via timing-statistical test — wrong-password vs no-credential paths take statistically indistinguishable time (within 5x ratio). The verifyDummy() hash compute on the no-credential + locked paths is what keeps timing parity; absent that, an attacker could side- channel "actor doesn't have a credential" via timing. 
Plus coverage-lift batch covering: SetPassword first-time vs rotate, no-caller-id rejection, no-target-id rejection, RNG failure surface, Authenticate happy-path mints session, no-credential audit row, session-mint-failure surface, FailureResetInterval recycle, Unlock + RemoveCredential happy paths, hash-format unit tests (round-trip, mismatch, malformed/wrong-version/bad-base64 formats), nil-audit + nil-session pass-through. Coverage on internal/auth/breakglass/ at 91.5% per-statement (above the Phase 7.5 spec ≥ 90% floor). cmd/server/main.go wiring: - Constructs breakglassRepo + breakglassService + breakglassHandler after the OIDC service block. - breakglassSessionMinterAdapter shim bridges *session.Service.Create to the breakglass.SessionMinter port. - Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator visibility for the deliberate SSO-bypass). internal/config/config.go gains: - AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID). - AuthConfig.Breakglass nested struct with 4 env vars (CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION + LOCKOUT_RESET_INTERVAL). Router wiring: - 4 new breakglass routes registered when reg.AuthBreakglass != nil; public login route via direct r.mux.Handle (auth-exempt), 3 admin routes via r.Register + rbacGate(auth.breakglass.admin). - POST /auth/breakglass/login pinned in AuthExemptRouterRoutes allowlist with Phase 7.5 justification. - SpecParityExceptions extended with 4 new entries documenting the Phase 7.5 deferral of full per-endpoint OpenAPI rows (handler doc-block at the top of auth_breakglass.go is the operator-facing reference). Threat model (encoded in service.go + auth_breakglass.go doc-blocks + migration 000038 docstrings, to be promoted to docs/operator/auth- threat-model.md in Phase 12): - Break-glass is a deliberate bypass of the SSO security boundary. 
An attacker who phishes the password OR finds it in a compromised password manager bypasses MFA, OIDC, and every group-claim gate. - Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-state. Enable only during SSO-broken incidents. Disable after recovery. - WebAuthn pairing (v3 per Decision 12) is the load-bearing second factor. Without it, break-glass is best treated as an emergency-only path. - Audit trail surfaces every break-glass action under event_category=auth; the auditor role can monitor for unexpected break-glass logins. Verifications: gofmt clean, go vet clean across all touched packages, go test -short -count=1 green across internal/auth/oidc (3.0s; new Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives), internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage batch passing), internal/config + internal/domain/auth + internal/api/router + internal/api/handler all green, no regressions in Bundle 1 packages.	2026-05-10 06:51:41 +00:00
shankar0123	9c679a5960	auth-bundle-2 Phase 5: OIDC + session HTTP surface (13 endpoints), pre-login store, OpenID Connect Back-Channel Logout 1.0, cookieAuth scheme, 7 new auth permissions, CI guard, handler tests Phase 5 of the bundle puts the Phase 3 OIDC service + Phase 4 session service on the wire. 13 HTTP endpoints split into three logical groups: Public OIDC handshake (auth-exempt; protocol-mediated): GET /auth/oidc/login?provider=<id> -> 302 to IdP authorization URL + sets certctl_oidc_pending cookie (10-min TTL, Path=/auth/oidc/, SameSite=Lax) GET /auth/oidc/callback?code=...&state=... -> consume pre-login row, run Phase 3's 11-step token validation, mint post-login session, 302 to dashboard POST /auth/oidc/back-channel-logout -> OpenID Connect BCL 1.0 — IdP POSTs logout_token JWT; certctl validates signature against IdP JWKS via Phase 3 alg allow-list, required claims (iss/aud/iat/jti/ events; exactly one of sub/sid; nonce ABSENT per spec §2.4), revokes matching sessions, returns 200 with Cache-Control: no-store POST /auth/logout -> revoke caller's session Session management (RBAC-gated auth.session.): GET /api/v1/auth/sessions -> auth.session.list (own / all) DELETE /api/v1/auth/sessions/{id} -> auth.session.revoke (own bypass) OIDC provider + group-mapping CRUD (RBAC-gated auth.oidc.): GET /api/v1/auth/oidc/providers -> auth.oidc.list POST /api/v1/auth/oidc/providers -> auth.oidc.create (client_secret encrypted at rest via internal/crypto.EncryptIfKeySet) PUT /api/v1/auth/oidc/providers/{id} -> auth.oidc.edit DELETE /api/v1/auth/oidc/providers/{id} -> auth.oidc.delete (refused via ErrOIDCProviderInUse → 409 when users authenticated via this provider) POST /api/v1/auth/oidc/providers/{id}/refresh -> auth.oidc.edit (re-runs IdP downgrade defense via OIDCService.RefreshKeys) GET /api/v1/auth/oidc/group-mappings -> auth.oidc.list POST /api/v1/auth/oidc/group-mappings -> auth.oidc.edit DELETE /api/v1/auth/oidc/group-mappings/{id} -> auth.oidc.edit Migration 
000037 ships: - oidc_pre_login_sessions table (10-min absolute TTL, FK CASCADE on oidc_provider_id, FK RESTRICT on signing_key_id; index on absolute_expires_at for the GC sweep); - 7 new permissions seeded into r-admin only: auth.session.list, auth.session.list.all, auth.session.revoke, auth.oidc.list, auth.oidc.create, auth.oidc.edit, auth.oidc.delete CanonicalPermissions extended in lockstep at internal/domain/auth/ validate.go. Pre-login machinery: - internal/repository/oidc.go gains PreLoginRepository interface + PreLoginSession struct + ErrPreLoginNotFound / ErrPreLoginExpired sentinels. - internal/repository/postgres/oidc_prelogin.go ships the impl; LookupAndConsume uses DELETE ... RETURNING for atomic single-use. - internal/auth/oidc/prelogin.go is the PreLoginAdapter that bridges the OIDC service's Phase 3 PreLoginStore interface to the new repository, signing the cookie value under the active SessionSigningKey via the same v1.<id>.<key>.<HMAC> wire format Phase 4 uses for post-login cookies. Defense-in-depth: the pre-login `pl-` prefix is enforced by ParseCookieValue(prefix); a stolen pre-login cookie cannot be replayed against the post-login Validate path (pinned by TestService_Validate_RejectsPreLoginCookieAtPostLoginGate). Session package extension: - internal/auth/session/service.go gains exported SignCookieValue, ParseCookieValue (with caller-supplied id-1 prefix), ComputeCookieHMAC, DecryptKeyMaterial wrappers so the OIDC pre-login adapter shares the same length-prefixed HMAC math without code duplication. - parseCookie no longer hardcodes the `ses-` prefix check (moved to Validate as defense-in-depth; pre-login cookie verification uses the `pl-` prefix via ParseCookieValue). 
Cookie attributes (all Phase 5 endpoints honor CERTCTL_SESSION_SAMESITE + Secure=true via SessionCookieAttrs from Phase 4 config): - certctl_oidc_pending: Path=/auth/oidc/, MaxAge=600s, SameSite=Lax (cannot be Strict because the IdP-initiated callback is a top-level navigation from a different origin). - certctl_session: Path=/, Expires=8h, SameSite=Lax|Strict, HttpOnly. - certctl_csrf: Path=/, Expires=8h, HttpOnly=false (intentional — GUI must read it to echo into X-CSRF-Token header). Audit logging on every mutating operation (event_category="auth"): auth.oidc_login_succeeded / failed / unmapped_groups auth.oidc_back_channel_logout / failed auth.session_revoked auth.oidc_provider_{created,updated,deleted,refreshed} auth.group_mapping_{added,removed} OpenAPI updates: - cookieAuth security scheme added to api/openapi.yaml under components.securitySchemes (apiKey / cookie / certctl_session). - The 13 Phase 5 routes are added to SpecParityExceptions with a deferral note: full per-endpoint OpenAPI rows land in a follow-on commit alongside the GUI work (Phase 8) so the ergonomic shape can be validated against the live GUI client. CI guard: scripts/ci-guards/N-bundle-2-security-empty-preserved.sh asserts api/openapi.yaml has ≥ 14 'security: []' occurrences (the pre-Bundle-2 baseline). Reducing the count below 14 would silently force a Bearer-or-cookie requirement onto an endpoint that legitimately runs without certctl-issued credentials; the guard fires before that regression lands. 
Handler tests (internal/api/handler/auth_session_oidc_test.go): - All 6 prompt-mandated negative cases: BCL with missing events claim -> 400 BCL with nonce present -> 400 (per spec §2.4) BCL with sig signed by an unknown key -> 400 Callback with replayed state -> 400 Callback with PKCE verifier mismatch -> 400 Callback with expired pre-login row -> 400 - Plus happy paths for every endpoint, edge cases (missing-cookie, duplicate-name, in-use-409, wrong-tenant), and the Helper-function coverage (peekIssuer, classifyOIDCFailure, defaultIfBlank, defaultIntIfZero, clientIPFromRequest, encryptClientSecret). Coverage on internal/api/handler/auth_session_oidc.go: 80.9% per-function (above the Phase 5 spec's ≥ 80% floor). Server wiring (cmd/server/main.go): Wired AFTER sessionService (Phase 4) so the OIDC PreLoginAdapter can sign pre-login cookies under the active SessionSigningKey: oidcProviderRepo + oidcMappingRepo + oidcUserRepo + oidcPreLoginRepo -> preLoginAdapter -> oidcService -> authSessionOIDCHandler. sessionMinterAdapter shim bridges *session.Service.Create to the oidcsvc.SessionMinter port the OIDC service consumes. Router wiring (internal/api/router/router.go): 4 public OIDC routes via direct r.mux.Handle (auth-exempt; pinned in AuthExemptRouterRoutes); 9 RBAC-gated routes via r.Register + rbacGate(checker, perm, h). Routes only register when reg.AuthSessionOIDC != nil so pre-Phase-5 builds skip the block entirely. Verifications: gofmt clean, go vet clean across all touched packages, go test -short -count=1 green across internal/api/handler (74 tests + new Phase 5 batch), internal/api/router (parity + auth-exempt allowlist), internal/auth/oidc + session (no regressions), full domain + scheduler + config sweeps green, ci-guard N-bundle-2-security-empty-preserved.sh green (17 ≥ 14 baseline).	2026-05-10 06:08:27 +00:00
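The logout-token claim checks that 9c679a5960 lists for POST /auth/oidc/back-channel-logout (required iss/aud/iat/jti/events, exactly one of sub/sid, nonce absent) can be sketched over an already-decoded claim set. Signature verification against the IdP JWKS is deliberately left out. The function name and error texts are assumptions; the check that the events claim carries the back-channel-logout member follows the OpenID Connect Back-Channel Logout 1.0 specification and is not stated in the commit message itself.

```go
package main

import (
	"errors"
	"fmt"
)

// bclEvent is the member the events claim must contain per the BCL 1.0 spec.
const bclEvent = "http://schemas.openid.net/event/backchannel-logout"

// validateLogoutClaims checks the claim rules listed in the commit message
// on an already signature-verified, decoded logout token.
func validateLogoutClaims(c map[string]any) error {
	for _, k := range []string{"iss", "aud", "iat", "jti", "events"} {
		if _, ok := c[k]; !ok {
			return fmt.Errorf("missing required claim %q", k)
		}
	}
	events, ok := c["events"].(map[string]any)
	if !ok {
		return errors.New("events claim must be a JSON object")
	}
	if _, ok := events[bclEvent]; !ok {
		return errors.New("events claim lacks the back-channel-logout member")
	}
	_, hasSub := c["sub"]
	_, hasSid := c["sid"]
	if hasSub == hasSid {
		// Mirrors the "exactly one of sub/sid" rule as the commit states it.
		return errors.New("exactly one of sub or sid is required")
	}
	if _, ok := c["nonce"]; ok {
		return errors.New("nonce must be absent")
	}
	return nil
}

func main() {
	claims := map[string]any{
		"iss": "https://idp.example", "aud": "certctl", "iat": 1778392107,
		"jti": "abc", "sid": "s-1",
		"events": map[string]any{bclEvent: map[string]any{}},
	}
	fmt.Println(validateLogoutClaims(claims))
}
```

In a handler, each of these errors would map to the 400 responses that the negative test cases pin.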
shankar0123	60a589ab96	auth-bundle-1 Phase 0-5 closure: demo-mode wire, named-key backfill, AuthCheck enrichment, OpenAPI schema, intermediate-ca comment refresh Closes the 5 gaps the post-Phase-5 audit flagged on dev/auth-bundle-1. C1: cmd/server/main.go now selects auth.NewDemoModeAuth() when CERTCTL_AUTH_TYPE=none and falls back to auth.NewAuthWithNamedKeys otherwise. Pre-closure, the no-op pass-through that NewAuthWithNamedKeys returns for empty keys would have left ActorIDKey / ActorTypeKey / TenantIDKey unpopulated and 401'd every Phase-3.5 rbacGate-wrapped admin route + every Phase-4 RBAC handler in demo deployments. NewDemoModeAuth injects the synthetic 'actor-demo-anon' actor seeded by migration 000029, which holds r-admin at global scope. C2: backfillNamedKeyActorRoles startup hook (cmd/server/auth_backfill.go) iterates CERTCTL_API_KEYS_NAMED entries (and legacy CERTCTL_AUTH_SECRET synthesized fallbacks) and grants r-admin or r-viewer to each via authActorRoleRepo.Grant before the HTTP server starts accepting requests. Idempotent via ON CONFLICT DO NOTHING in the repo. Failures log a warning but are non-fatal — the server still starts and the operator can fix grants via /v1/auth/keys. Helper extracted from main.go so the role-mapping invariant is pinned by 4 focused unit tests (admin->r-admin, non-admin->r-viewer, empty no-op, grant-error non-fatal, nil-logger safe). M1: HealthHandler.AuthCheck now returns actor_id, actor_type, tenant_id, roles, effective_permissions, and admin_via_role when the optional AuthCheckResolver is wired (production path: authCheckResolverAdapter wraps the postgres ActorRoleRepository in main.go). Nil resolver preserves the legacy {status, user, admin} contract for back-compat with pre-Bundle-1 GUIs and test fixtures. Adds 2 regression tests + 1 fake resolver shim. 
M2: refreshes the stale 'Admin gate: every method calls auth.IsAdmin first' comment on IntermediateCAHandler — the gate moved to router.go::rbacGate via auth.RequirePermission middleware in Phase 3.5; the new comment block points readers there. M4: 11 RBAC routes (auth/me, auth/permissions, 5 role lifecycle, 2 role-permission grant/revoke, 2 actor-role grant/revoke) added to api/openapi.yaml under the [Auth] tag with operationIds and shared AuthRole / AuthRolePermission schemas. AuthCheck path extended with the Bundle-1 enrichment fields. The 11 entries removed from openapi_parity_test.go::SpecParityExceptions. Tests: go vet + staticcheck + go test -short -count=1 green across cmd/server/, internal/auth/, internal/api/router/, and internal/api/handler/. New tests: 4 backfill unit tests, 2 AuthCheck M1 enrichment tests, 1 demo-mode + rbacGate chain integration test (TestRBACGate_DemoModeChainReachesHandler). Branch SECURITY.md (cowork/auth-bundle-1-SECURITY.md, not part of this commit) captures the full posture of dev/auth-bundle-1 as of this closure for the operator's pre-merge review.	2026-05-09 19:33:07 +00:00
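The named-key backfill in 60a589ab96 (admin keys receive r-admin, other keys receive r-viewer, and grant failures warn without aborting startup) can be sketched as follows. `NamedKey`, `Granter`, `memGranter`, and the `backfill` signature are hypothetical; the real helper lives in cmd/server/auth_backfill.go and grants through authActorRoleRepo.Grant.

```go
package main

import (
	"errors"
	"fmt"
)

// NamedKey is a hypothetical view of one CERTCTL_API_KEYS_NAMED entry.
type NamedKey struct {
	Name  string
	Admin bool
}

// Granter is a hypothetical port over the actor-role repository.
type Granter interface {
	Grant(actorID, roleID string) error
}

// roleFor maps a key to the role named in the commit message.
func roleFor(k NamedKey) string {
	if k.Admin {
		return "r-admin"
	}
	return "r-viewer"
}

// backfill grants a role per key. A failed grant is reported through warn
// (which may be nil) and skipped, so one bad key never blocks startup.
func backfill(keys []NamedKey, g Granter, warn func(string)) int {
	granted := 0
	for _, k := range keys {
		if err := g.Grant(k.Name, roleFor(k)); err != nil {
			if warn != nil {
				warn(fmt.Sprintf("backfill grant for %q failed: %v", k.Name, err))
			}
			continue
		}
		granted++
	}
	return granted
}

// memGranter is an in-memory Granter; storing grants in a set makes
// repeated grants idempotent, like ON CONFLICT DO NOTHING.
type memGranter struct {
	grants map[string]bool
	failOn string
}

func (m *memGranter) Grant(actorID, roleID string) error {
	if m.failOn != "" && actorID == m.failOn {
		return errors.New("grant failed")
	}
	m.grants[actorID+"/"+roleID] = true
	return nil
}

func main() {
	g := &memGranter{grants: map[string]bool{}}
	keys := []NamedKey{{Name: "ops", Admin: true}, {Name: "ci", Admin: false}}
	fmt.Println(backfill(keys, g, nil), g.grants)
}
```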
shankar0123	7ff2e2de08	auth-bundle-1 Phase 3.5: handler IsAdmin -> router-wrapped RequirePermission Phase 3.5 atomic conversion. The five legacy admin-gated handlers (bulk_revocation, admin_crl_cache, admin_scep_intune, admin_est, intermediate_ca) had their in-body auth.IsAdmin checks removed; the gate moved to router.go via auth.RequirePermission middleware wrapping each route. Non-admin operators with the right scoped permission can now reach these endpoints; legacy in-body admin checks no longer block them. Migration 000030_rbac_admin_perms.up.sql ships five admin-only fine-grained permissions: cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage. All five are seeded into r-admin only; operator/viewer/agent/mcp/cli/auditor do not receive them by default. Operators can grant any of these to a custom role via the Phase 4 RBAC API. Idempotent + transaction-wrapped. internal/domain/auth/validate.go::CanonicalPermissions extended with the five new entries so RoleService.AddPermission accepts them. internal/api/router/router.go: HandlerRegistry gains a Checker field (auth.PermissionChecker). New rbacGate(checker, perm, handler) helper wraps a handler with auth.RequirePermission middleware; nil-checker fall-through preserves test/demo deployments without the RBAC stack. 12 admin routes wrapped: cert.bulk_revoke (POST /api/v1/certificates/bulk-revoke + POST /api/v1/est/certificates/bulk-revoke), crl.admin (GET /api/v1/admin/crl/cache), scep.admin (GET /api/v1/admin/scep/profiles + GET /api/v1/admin/scep/intune/stats + POST /api/v1/admin/scep/intune/reload-trust), est.admin (GET /api/v1/admin/est/profiles + POST /api/v1/admin/est/reload-trust), ca.hierarchy.manage (POST /api/v1/issuers/{id}/intermediates + GET /api/v1/issuers/{id}/intermediates + POST /api/v1/intermediates/{id}/retire + GET /api/v1/intermediates/{id}). cmd/server/main.go: HandlerRegistry.Checker wired with the same authPermissionCheckerAdapter shim Phase 4 introduced for AuthHandler. 
Same adapter; one source of truth. Handler bodies: removed eight in-body auth.IsAdmin checks across the 5 files. bulk_revocation.go's BulkRevoke + BulkRevokeEST, admin_crl_cache.go::ListCache, admin_scep_intune.go's three methods, admin_est.go's two methods, intermediate_ca.go's four methods. Replaced each with a comment naming the new gate location. Unused 'github.com/certctl-io/certctl/internal/auth' imports removed. Test triplet rewrite: deleted obsolete _NonAdmin_Returns403 and _AdminExplicitFalse_Returns403 tests across 6 test files (5 handler tests + bulk_revocation_est_test.go) — they tested the now-removed in-body gate. _AdminPermitted_ForwardsActor tests stay intact: they pin the actor-passthrough invariant which is still relevant. Added internal/api/router/rbac_gate_integration_test.go with four router-level integration tests pinning the new gate: deny → 403 + handler not reached, permit → 200 + handler reached, nil-checker → fall-through, no-actor → 401. M-008 admin-gate registry: AdminGatedHandlers map now empty (Phase 3.5 invariant: zero in-handler auth.IsAdmin call sites; only health.go's informational caller remains). m008_admin_gate_test.go retains the scan to enforce the invariant going forward; new admin-gated routes must wrap at router.go::rbacGate, not gate in-handler. Updated error message to direct future contributors to the new pattern. Verifications: gofmt clean across all touched files; go vet ./... clean; go test -short across internal/auth, internal/service/auth, internal/api/handler, internal/api/router, cmd/server all green. Branch: dev/auth-bundle-1. Commit chain: `99a012e` (Phase 0 extract) -> `19497ee` (Phase 1 schema + repo) -> `bd54d5f` (Phase 2 service) -> `d473398` (Phase 3 primitive) -> `b169f25` (Phase 4 + 5) -> THIS (Phase 3.5 conversion). Phase 6+ (bootstrap, scope-down, auditor, approval-bypass closure, GUI, docs) on subsequent sessions.	2026-05-09 17:00:30 +00:00
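A rough sketch of the router-level gate described in the Phase 3.5 commit follows. The `permissionChecker` interface, the `actorIDKey` context key, and `staticChecker` are assumptions made for illustration; the real auth.PermissionChecker and auth.RequirePermission signatures are not shown in this log. The four behaviors exercised (permit, deny, no actor, nil checker) mirror the integration tests the commit lists.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ctxKey string

// actorIDKey is a hypothetical context key standing in for ActorIDKey.
const actorIDKey ctxKey = "actor_id"

// permissionChecker is an assumed shape for auth.PermissionChecker.
type permissionChecker interface {
	HasPermission(ctx context.Context, actorID, perm string) bool
}

// rbacGate wraps next with a permission check. A nil checker falls through,
// so deployments without the RBAC stack keep working.
func rbacGate(checker permissionChecker, perm string, next http.Handler) http.Handler {
	if checker == nil {
		return next
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		actorID, _ := r.Context().Value(actorIDKey).(string)
		if actorID == "" {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		if !checker.HasPermission(r.Context(), actorID, perm) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// staticChecker grants a fixed permission set per actor.
type staticChecker map[string]map[string]bool

func (s staticChecker) HasPermission(_ context.Context, actorID, perm string) bool {
	return s[actorID][perm]
}

// okHandler stands in for an admin handler body that no longer gates itself.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// status sends one request through h as the given actor and returns the status code.
func status(h http.Handler, actorID string) int {
	req := httptest.NewRequest(http.MethodGet, "/api/v1/admin/crl/cache", nil)
	if actorID != "" {
		req = req.WithContext(context.WithValue(req.Context(), actorIDKey, actorID))
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	checker := staticChecker{"alice": {"crl.admin": true}}
	gated := rbacGate(checker, "crl.admin", okHandler)
	fmt.Println(status(gated, "alice"), status(gated, "bob"), status(gated, ""))
	fmt.Println(status(rbacGate(nil, "crl.admin", okHandler), ""))
}
```

Because the gate lives outside the handler, a direct handler call in a unit test bypasses it, which is why the deny and permit cases have to be asserted through a router-level setup.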
shankar0123	b169f258de	auth-bundle-1 Phase 4 + 5: RBAC HTTP API + CLI surface Phase 4 (HTTP API): * internal/api/handler/auth.go: AuthHandler with 11 endpoints under /api/v1/auth/* — ListRoles, GetRole, CreateRole, UpdateRole, DeleteRole, ListPermissions, AddRolePermission, RemoveRolePermission, AssignRoleToKey, RevokeRoleFromKey, Me. callerFromRequest builds an authsvc.Caller from the Phase 3 ActorIDKey/ActorTypeKey/TenantIDKey context values. writeAuthError translates service + repository sentinels into HTTP status codes (401/403/404/409/400/500). 14 handler tests with in-memory fakes pin the HTTP shape + error mapping. * internal/api/router/router.go: HandlerRegistry gains an Auth field; 11 new routes registered. openapi_parity_test SpecParityExceptions extended with the new auth routes (OpenAPI YAML schema lands in a Phase 4 follow-up commit so the schema review is its own atomic change; the route shape is fully documented inline via the Go type definitions until then). * cmd/server/main.go: wires the postgres auth repos (RoleRepository, PermissionRepository, ActorRoleRepository) + the Authorizer + RoleService/PermissionService/ActorRoleService into the new AuthHandler. Adds authPermissionCheckerAdapter to bridge the typed-string Authorizer signature to the auth.PermissionChecker interface (avoids an internal/auth → internal/service/auth import cycle). Phase 5 (CLI): * cmd/cli/main.go: adds 'auth' command dispatch with subcommands roles/permissions/keys/me. * internal/cli/auth.go: AuthMe, AuthListRoles, AuthGetRole, AuthListPermissions, AuthAssignRoleToKey, AuthRevokeRoleFromKey methods on Client. Mirrors the Phase 4 HTTP surface. Phase 3.5 (handler IsAdmin → middleware-wrapped RequirePermission) DEFERRED. Honest reasoning: (1) The 5 admin handlers (bulk_revocation, admin_crl_cache, admin_scep_intune, admin_est, intermediate_ca) currently gate via auth.IsAdmin checks INSIDE the handler bodies. 
Converting cleanly requires moving the gate to the router (auth.RequirePermission middleware wrap) AND removing the in-handler check AND rewriting the existing 3-test triplets per handler (M-008 pinned: _NonAdmin_Returns403 / _AdminExplicitFalse_Returns403 / _AdminPermitted_ForwardsActor) because the existing tests call the handler function directly, bypassing middleware. After conversion, those tests would pass without 403'ing because the gate moved away — the test invariants need to flow through a router-level integration setup instead. (2) Picking the right permission per handler is a security-review-worthy decision. Using existing operator-class perms (cert.revoke, issuer.edit) widens access from admin-only to operator-class; adding new admin-only perms (cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage) requires a migration 000030 plus a coordinated catalogue update in internal/domain/auth/validate.go. Both options are defensible but warrant a focused commit, not a 5-handler sweep mixed in with the API + CLI work. (3) The conversion can be done now without functional regressions IF we leave the in-handler IsAdmin checks in place AND add middleware wraps as defense-in-depth — but that's the worst of both worlds (legacy gate still blocks non-admin operators, defeating the point of RBAC; new gate adds runtime cost with no semantic change). A clean conversion needs the in-handler check removed. 
Concrete plan for Phase 3.5 (separate commit, next session): (a) add new admin-only perms via migration 000030 OR document the widening to operator-class; (b) wrap each of the 5 admin routes with auth.RequirePermission(checker, perm, nil) in router.go; (c) remove auth.IsAdmin checks from the 5 handler bodies; (d) move the M-008 _NonAdmin/_AdminExplicitFalse tests to router-level integration tests, keep _AdminPermitted as a direct handler test for actor-passthrough; (e) update m008_admin_gate_test.go registry to track auth.RequirePermission middleware wraps in router.go instead of auth.IsAdmin call sites in handler files. Verifications: go vet ./... clean; gofmt clean across all touched files; go test -short -count=1 across internal/auth, internal/service/auth, internal/api/handler, internal/api/router, internal/cli, cmd/server, cmd/cli all green (one transient too-many-open-files retry on internal/cli + internal/api/router; second run clean). Branch: dev/auth-bundle-1. Commit chain: `99a012e` (Phase 0 extract) -> `19497ee` (Phase 1 schema + repo) -> `bd54d5f` (Phase 2 service) -> `d473398` (Phase 3 primitive) -> THIS (Phase 4 + 5).	2026-05-09 16:43:48 +00:00
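The sentinel-to-status translation that writeAuthError performs in Phase 4 can be illustrated roughly as below. The sentinel names are hypothetical (the log does not show the real service or repository errors); only the set of status codes, 401/403/404/409/400/500, is taken from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinels; the real ones live in the service and repository packages.
var (
	errUnauthenticated = errors.New("unauthenticated")
	errForbidden       = errors.New("forbidden")
	errNotFound        = errors.New("not found")
	errConflict        = errors.New("conflict")
	errInvalidInput    = errors.New("invalid input")
)

// statusFor maps an error chain to an HTTP status. errors.Is walks wrapped
// errors, so callers may add context with %w without breaking the mapping.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errUnauthenticated):
		return http.StatusUnauthorized
	case errors.Is(err, errForbidden):
		return http.StatusForbidden
	case errors.Is(err, errNotFound):
		return http.StatusNotFound
	case errors.Is(err, errConflict):
		return http.StatusConflict
	case errors.Is(err, errInvalidInput):
		return http.StatusBadRequest
	default:
		// Unknown errors are treated as server faults.
		return http.StatusInternalServerError
	}
}

// wrap adds operation context while keeping the sentinel matchable.
func wrap(op string, err error) error {
	return fmt.Errorf("%s: %w", op, err)
}

func main() {
	fmt.Println(statusFor(wrap("get role", errNotFound)))
	fmt.Println(statusFor(errors.New("unexpected")))
}
```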
shankar0123	f6ba5634fd	ci: fix Phase 4 post-push gofmt failure (map-literal alignment) CI on commit `4dc8d3f` (Phase 4) failed gofmt on internal/api/router/openapi_parity_test.go. The 6 new SpecParityExceptions entries I added for the Phase 4 routes had over-padded whitespace between key and value; the longest new key is '"GET /acme/profile/{id}/renewal-info/{cert_id}":' which sets the gofmt-canonical column width for the surrounding block, but my hand-aligned values used the wider Phase-2 column width (set by the even-longer 'POST /acme/profile/{id}/order/{ord_id}/finalize' key in that block). gofmt aligns map-literal columns per contiguous run between blank lines / structural breaks, not file-globally. The Phase 4 entries form their own run because they're separated from the Phase 2 block by the '// Phase 4 — key rollover + revocation + ARI.' comment. Fix: 'gofmt -w' on the file, which rewrote the 6 lines with the correct (narrower) intra-block alignment. No semantic change — just whitespace. Confirmed: 'gofmt -l .' clean; 'go vet ./internal/api/router/' clean (the test still passes after the formatting change).	2026-05-03 18:58:00 +00:00
shankar0123	4dc8d3fa5b	acme-server: key rollover + revocation + ARI (Phase 4/7) Closes the RFC 8555 + RFC 9773 surface beyond the issuance happy-path: - POST /acme/profile/<id>/key-change (RFC 8555 §7.3.5) - POST /acme/profile/<id>/revoke-cert (RFC 8555 §7.6) - GET /acme/profile/<id>/renewal-info/<cert-id> (RFC 9773 ARI) After this commit, ACME clients can rotate account keys, revoke certs through the ACME surface (rather than only via the certctl GUI/API), and fetch ARI for proactive renewal scheduling. Architecture: - Key rollover: outer JWS verified against the registered account key (existing kid path); the inner JWS — embedded as the outer's payload — verified against the embedded NEW jwk in a new dedicated routine (ParseAndVerifyKeyChangeInner) that enforces RFC 8555 §7.3.5 inner-only invariants: MUST use jwk + MUST NOT use kid, payload.account == outer.kid, payload.oldKey thumbprint-equals registered. A single WithinTx swaps the stored thumbprint+pem and writes the audit row. Concurrent-rollover safety via SELECT…FOR UPDATE on the conflicting account row in UpdateAccountJWKWithTx; the loser observes the winner's new thumbprint and is told to retry (409). - Revocation: two auth paths. kid → AccountOwnsCertificate single-indexed COUNT lookup over acme_orders. jwk → constant-time RFC 7638 thumbprint compare against the cert's pubkey. Both paths route through service.RevocationSvc.RevokeCertificateWithActor so the existing CRL/OCSP refresh + audit + metrics pipeline applies. RFC 5280 §5.3.1 numeric reason codes clamp to certctl's domain.ValidRevocationReasons; codes 8 (removeFromCRL) + 10 (aACompromise) clamp to 'unspecified' since they aren't in the set. - ARI is GET-only and unauth per RFC 9773 §4. Cert-id wire shape is base64url(AKI).base64url(serial); ParseARICertID strict-decodes, SerialHex emits the canonical certctl-shape lowercase-no-leading-zeros hex used in certificate_versions.serial_number. 
ComputeRenewalWindow has 3 branches: bound RenewalPolicy → [notAfter - days, notAfter - days/2]; no policy → last 33% of validity; past expiry → [now, now + 1d] (renew immediately). Retry-After honors CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL. What ships: - internal/api/acme/{keychange,ari}.go (+ phase4_test.go: 15 tests). - internal/api/acme/order.go: RevokeCertRequest wire shape. - internal/api/handler/acme.go: KeyChange, RevokeCert, RenewalInfo + 11 new writeServiceError mappings. - internal/repository/postgres/acme.go: UpdateAccountJWKWithTx (FOR UPDATE + expectedOldThumbprint precondition; ErrACMEAccountKeyConcurrentUpdate sentinel) + AccountOwnsCertificate. - internal/service/acme.go: RotateAccountKey + RevokeCert + RenewalInfo; CertificateRevoker + RenewalPolicyLookup interfaces; SetRevocationDelegate + SetRenewalPolicyLookup wiring; 11 new sentinels; 6 new metrics. - internal/service/acme_phase4_test.go: service-layer tests for RotateAccountKey (happy + duplicate-key) + RevokeCert (kid mismatch + jwk mismatch + jwk happy + already-revoked + reason-clamping) + RenewalInfo (disabled + bad cert-id). - internal/api/router/router.go: 6 new register calls (3 per-profile + 3 shorthand). Router parity exceptions extended in lockstep (in-tree SpecParityExceptions + CI-only openapi-handler-exceptions.yaml). - cmd/server/main.go: SetRevocationDelegate(revocationSvc) + SetRenewalPolicyLookup(renewalPolicyRepo) at startup. - internal/config/config.go: CERTCTL_ACME_SERVER_ARI_ENABLED (default true) + CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL (default 6h); BuildDirectory's ariEnabled flag now flips on under cfg.ARIEnabled. - docs/acme-server.md: phase status flipped to Phase 4; endpoints table grows 6 rows (3 per-profile + 3 shorthand); FAQ section appended explaining how to rotate keys, revoke certs, and consume ARI. Tests: - 'go vet ./...' clean across the repo. - 'go test -short -count=1 ./...' green across every package. 
- phase4_test.go covers: keychange happy-path + 5 negatives + MapKeyChangeErrorToProblem coverage; ARI cert-id round-trip + 6 malformed cases + BuildARICertID from a generated cert; window-math 3 branches. - service-layer tests confirm: RotateAccountKey atomically swaps the thumbprint (verifies persisted state) and rejects duplicate keys; RevokeCert routes through the stub RevocationSvc with the right actor string + reason on the jwk path, rejects mismatched keys, rejects already-revoked certs, clamps reason codes correctly; RenewalInfo respects ARIEnabled + cert-id format. Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-4'.	2026-05-03 16:51:06 +00:00
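The three ComputeRenewalWindow branches named in the Phase 4 commit can be sketched as follows. This is an illustration under stated assumptions, not the shipped function: the signature, the `window` type, treating `policyDays <= 0` as "no policy", checking expiry first, and ending the no-policy window at notAfter are all guesses. Only the three interval formulas come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// window is a hypothetical carrier for the suggested renewal interval.
type window struct{ Start, End time.Time }

// computeRenewalWindow sketches the three branches:
//   - past expiry:  [now, now + 1d]
//   - policy bound: [notAfter - days, notAfter - days/2]
//   - no policy:    the last 33% of the validity period (end assumed to be notAfter)
func computeRenewalWindow(now, notBefore, notAfter time.Time, policyDays int) window {
	switch {
	case !now.Before(notAfter):
		return window{Start: now, End: now.Add(day)}
	case policyDays > 0:
		lead := time.Duration(policyDays) * day
		return window{Start: notAfter.Add(-lead), End: notAfter.Add(-lead / 2)}
	default:
		validity := notAfter.Sub(notBefore)
		// Divide before multiplying so long validity periods cannot overflow.
		tail := validity / 100 * 33
		return window{Start: notAfter.Add(-tail), End: notAfter}
	}
}

// date builds a UTC midnight timestamp for the examples.
func date(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	notBefore := date(2026, 1, 1)
	notAfter := notBefore.Add(90 * day)
	now := notBefore.Add(10 * day)

	fmt.Println(computeRenewalWindow(now, notBefore, notAfter, 30))
	fmt.Println(computeRenewalWindow(now, notBefore, notAfter, 0))
	fmt.Println(computeRenewalWindow(notAfter.Add(day), notBefore, notAfter, 30))
}
```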
shankar0123	9bc845304e	acme-server: HTTP-01 + DNS-01 + TLS-ALPN-01 challenge validation (Phase 3/7) Wires up the actual challenge-validation machinery so profiles in acme_auth_mode='challenge' resolve end-to-end. After this commit, cert-manager 1.15+ with `solver: http01: ingress` against a challenge-mode profile completes a real HTTP-01 flow and gets a cert. DNS-01 + TLS-ALPN-01 share the same code path with the appropriate validator selection. Architecture (the load-bearing parts): - 3 separate semaphore-bounded worker pools (one per challenge type), so HTTP-01 and DNS-01 can't starve each other under load. Default weight 10 per type; tunable via CERTCTL_ACME_SERVER_HTTP01_CONCURRENCY, DNS01_CONCURRENCY, TLSALPN01_CONCURRENCY. - 30s per-challenge timeout (configurable via PoolConfig.PerChallengeTimeout). - HTTP-01 validator runs validation.IsReservedIPForDial (newly exported wrapper preserving the existing private impl byte-for-byte for the network scanner + ValidateSafeURL paths) on the resolved IP — both at the initial dial and every redirect hop. SSRF probes into private IP space are refused before the connect. - DNS-01 validator uses a dedicated resolver pointed at CERTCTL_ACME_SERVER_DNS01_RESOLVER (default 8.8.8.8:53) — does NOT use the system resolver to keep behavior deterministic across deployments. Wildcard handling: `*.example.com` queries _acme-challenge.example.com. - TLS-ALPN-01 validator (RFC 8737) connects with ALPN `acme-tls/1`, inspects the id-pe-acmeIdentifier extension (OID 1.3.6.1.5.5.7.1.31), asserts the ASN.1 OCTET STRING value equals SHA-256 of the key authorization. Cert chain is intentionally NOT validated (InsecureSkipVerify=true is correct per RFC 8737 — the proof is in the extension, not the chain). Documented in docs/tls.md L-001 table + the //nolint:gosec comment carries the justification. SSRF guard: same posture as HTTP-01. 
- Validation is asynchronous: handler accepts the POST and returns 200 immediately with status=processing; the worker-pool fires a callback that updates challenge → authz → order in a fresh background-context WithinTx. The order auto-promotes to `ready` when ALL authzs become valid; auto-fails to `invalid` when ANY authz becomes invalid. What ships: - internal/api/acme/challenge.go: KeyAuthorization (RFC 8555 §8.1) + DNS01TXTRecordValue (§8.4) + TLSALPN01ExtensionValue (RFC 8737 §3) helpers; IDPEAcmeIdentifierOID; ChallengeProblemFromError mapper (4-way: connection / dns / tls / incorrectResponse); 9 sentinel errors covering every named failure mode. - internal/api/acme/validators.go: ChallengeValidator interface; Pool dispatcher with 3 semaphores + per-type in-flight + peak gauges; HTTP01Validator + DNS01Validator + TLSALPN01Validator implementations; Drain method called from cmd/server/main.go's shutdown sequence. - internal/api/acme/validators_test.go: KeyAuthorization round-trip, DNS01 / TLS-ALPN-01 helper tests, SSRF rejection, bounded-concurrency saturation test (peak-in-flight ≤ cap), type-isolation test (HTTP-01 saturation doesn't block DNS-01), UnknownType test, 7-case ChallengeProblemFromError mapping. - internal/repository/postgres/acme.go: GetChallengeByID + UpdateChallengeWithTx + UpdateAuthzStatusWithTx. - internal/service/acme.go: SetValidatorPool wires the acme.Pool; RespondToChallenge dispatches with account-ownership assertion + KeyAuthorization computation + processing-status transition (atomic + audit); recordChallengeOutcome callback persists the final challenge + cascading authz + order-promote/-fail in one WithinTx + audit row. 4 new metrics. - internal/api/handler/acme.go: Challenge handler; round-trips account.JWKPEM through ParseJWKFromPEM to recover the *jose.JSONWebKey the validator pool needs. 
- internal/api/router/router.go + openapi_parity_test.go + api/openapi-handler-exceptions.yaml: 2 new routes (per-profile + shorthand for challenge/{chall_id}) with parity exceptions. - cmd/server/main.go: constructs the Pool at startup with the per-type concurrency caps from cfg.ACMEServer; ACMEService.ValidatorPool() accessor exposed for the shutdown drain sequence. - internal/validation/ssrf.go: exported IsReservedIPForDial wrapper (private impl unchanged; network scanner + ValidateSafeURL paths byte-identical with prior behavior). - docs/tls.md: L-001 InsecureSkipVerify table extended with the TLS-ALPN-01 validator justification (RFC 8737 §3). - docs/acme-server.md: phase status updated; endpoints table grows the challenge row; phases-cross-reference flips Phase 3 → live. Tests: - 80%+ coverage on the new files. - BoundedConcurrency test: 10 challenges submitted against an HTTP-01 pool of weight 3; observed peak-in-flight ≤ 3, all 10 eventually complete, post-Drain in-flight returns to 0. - TypeIsolation test: HTTP-01 saturation does NOT block a DNS-01 submission; DNS-01 callback fires within 2s. - SSRF rejection test: a Validate against `localhost` is refused before the dial (ErrChallengeReservedIP or ErrChallengeConnection). Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-3".	2026-05-03 14:09:00 +00:00
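The challenge helper values named in the Phase 3 commit follow directly from RFC 8555 §8.1, §8.4 and RFC 8737 §3, and a stdlib-only sketch looks like this. The function names are lowercase illustrations rather than the exported helpers in internal/api/acme/challenge.go, and the thumbprint is taken as an opaque string instead of being computed from a JWK.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// keyAuthorization joins the challenge token and the account key thumbprint
// with a dot (RFC 8555 §8.1).
func keyAuthorization(token, thumbprint string) string {
	return token + "." + thumbprint
}

// dns01TXTValue is the unpadded base64url encoding of SHA-256(keyAuthorization)
// (RFC 8555 §8.4).
func dns01TXTValue(keyAuth string) string {
	sum := sha256.Sum256([]byte(keyAuth))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// tlsALPN01Digest is the raw SHA-256 digest that the acmeIdentifier extension
// carries as an OCTET STRING (RFC 8737 §3).
func tlsALPN01Digest(keyAuth string) [32]byte {
	return sha256.Sum256([]byte(keyAuth))
}

// dns01QueryName strips a leading wildcard label, so *.example.com is
// validated at _acme-challenge.example.com.
func dns01QueryName(identifier string) string {
	return "_acme-challenge." + strings.TrimPrefix(identifier, "*.")
}

func main() {
	// Placeholder token and thumbprint, for illustration only.
	ka := keyAuthorization("example-token", "example-thumbprint")
	fmt.Println(ka)
	fmt.Println(dns01TXTValue(ka))
	fmt.Printf("%x\n", tlsALPN01Digest(ka))
	fmt.Println(dns01QueryName("*.example.com"))
}
```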
shankar0123	c351bba41a	acme-server: orders + authorizations + finalize + cert download (Phase 2/7) Closes the issuance loop in trust_authenticated mode (commits `ec88a61` + `44a85d6` wired the foundation + JWS-verified account resource). After this commit, an ACME client running against a profile with acme_auth_mode='trust_authenticated' end-to-end-issues a real cert: POST /acme/profile/<id>/new-order → 201 + order URL (status=ready) POST /acme/profile/<id>/order/<oid> → POST-as-GET fetch POST /acme/profile/<id>/order/<oid>/finalize → 200 + status=valid + cert URL POST /acme/profile/<id>/cert/<cid> → 200 + PEM chain Profiles with acme_auth_mode='challenge' get the same code path with authz/challenge rows in `pending` state until Phase 3's validators wire up. The mode is read from the bound profile's column at request time, NOT cached at server start — operators flipping the column via SQL take effect on the next order without restart. Architecture (the load-bearing part): - Finalize routes through service.CertificateService.Create — the canonical certctl issuance entry point that wraps the managed_certificates row insert + audit row in s.tx.WithinTx. RenewalPolicy / CertificateProfile / per-issuer-type Prometheus metrics / audit rows all apply uniformly to ACME-issued certs via the same code path that already serves EST/SCEP/agent/REST issuance. - Identifier validation runs BEFORE order creation. Rejected identifiers return RFC 7807 with per-identifier subproblems and create no order row. - Source stamp on managed_certificates: domain.CertificateSourceACME. Operators bulk-revoke ACME-issued certs by filtering on Source=ACME. - 3-step atomicity boundary documented in code + this commit msg: (A) WithinTx-A marks order processing + audit row. (B) IssuerConnector.IssueCertificate + CertificateService.Create (each in its own WithinTx — Create wraps cert row + audit atomically). 
(C) WithinTx-C creates certificate_versions row + transitions order to valid + sets certificate_id + audit row. The brief window between B and C can leave a managed_certificates row whose order is still in `processing`. Phase 5's GC scheduler reconciles. Documented inline. What ships: - internal/api/acme/order.go: OrderResponseJSON + AuthorizationResponseJSON + ChallengeResponseJSON + NewOrderRequest + FinalizeRequest wire shapes; ValidateIdentifiers (Phase 2 syntactic checks, dns-only); CSRMatchesIdentifiers (RFC 8555 §7.4 strict equality, case-folded). - internal/domain/acme.go: ACMEOrder + ACMEAuthorization + ACMEChallenge + ACMEIdentifier + ACMEProblem domain types + closed status enums for each (order: pending|ready|processing|valid|invalid; authz: pending|valid|invalid|deactivated|expired|revoked; challenge: pending|processing|valid|invalid; challenge type: http-01|dns-01|tls-alpn-01). - internal/domain/profile.go: new ACMEAuthMode field reading from certificate_profiles.acme_auth_mode (added in migration 25). - internal/domain/certificate.go: new CertificateSourceACME enum value. - internal/repository/postgres/profile.go: extended SELECT/scanProfile to read the per-profile acme_auth_mode column with a COALESCE default of trust_authenticated. - internal/repository/postgres/acme.go: full order/authz/challenge CRUD (CreateOrderWithTx + GetOrderByID + UpdateOrderWithTx + CreateAuthzWithTx + GetAuthzByID + ListAuthzsByOrder + ListChallengesByAuthz + CreateChallengeWithTx) with proper sql.NullTime + JSONB handling. scanACMEOrder / scanACMEAuthz / scanACMEChallenge helpers. - internal/service/acme.go: extended ACMERepo interface; new SetIssuancePipeline wires certificateService + certificateRepo + issuerRegistry. CreateOrder (auth-mode-dispatched: trust_authenticated auto-marks order ready + authz valid + 1 placeholder http-01 challenge valid; challenge mode keeps everything pending). LookupOrder (with account-ownership assertion). LookupAuthz. 
ListAuthzsByOrder. FinalizeOrder (3-step atomicity boundary as above; CSR-vs-order SAN strict-equality check before issuance; persists FinalizeOrderResult {Order, CertID}). LookupCertificate. randIDSuffix + base32encode helpers for the human-readable acme-ord-* / acme-authz-* / acme-chall-* prefixes (CLAUDE.md "TEXT primary keys with human-readable prefixes" architecture decision). 8 new per-op metrics. - internal/service/acme_test.go: extended fakeACMERepo with Phase 2 interface stubs; new orderTrackingRepo for observable persistence; 2 new tests asserting trust_authenticated → auto-ready/valid and challenge → stays-pending. - internal/api/handler/acme.go: NewOrder + Order + OrderFinalize + Authz + Cert handler methods. orderURL / authzURL / certURL / challengeURLBuilder helpers; marshalOrderForResponse fetches per-order authzs to populate the URL list. parseOptionalTime for notBefore / notAfter. - internal/api/handler/acme_handler_test.go: extended mockACMEService with Phase 2 method stubs; 4 new handler tests (NewOrder happy + rejected-identifier + OrderFinalize bad-CSR + Cert happy). - internal/api/router/router.go: 10 new Register calls (5 per-profile + 5 shorthand) for new-order, order/{ord_id}, order/{ord_id}/finalize, authz/{authz_id}, cert/{cert_id}. - internal/api/router/openapi_parity_test.go + api/openapi-handler-exceptions.yaml: 10 new exception entries. - cmd/server/main.go: SetIssuancePipeline at startup, threading certificateService + certificateRepo + issuerRegistry into ACMEService. - docs/acme-server.md: phase status updated; endpoints table grows 5 rows for new-order/order/finalize/authz/cert (per-profile + shorthand variants); new section "Finalize routing through CertificateService.Create" documenting the 3-step atomicity boundary + the actor-string convention `acme:<account-id>`. Tests: ACME package + service + handler + router + config + domain all green under -short. 
New cases: - TestCreateOrder_TrustAuthenticated_AutoReady (asserts auto-ready transition + valid-status authz/challenge + audit row + metric bump). - TestCreateOrder_ChallengeMode_StaysPending (asserts pending-status cascading authz/challenge for challenge mode). - TestACMEHandler_NewOrder_HappyPath (asserts 201 + Location + finalize URL shape). - TestACMEHandler_NewOrder_RejectedIdentifier (asserts 400 + RFC 7807 rejectedIdentifier + per-identifier subproblems for type=ip). - TestACMEHandler_OrderFinalize_BadCSR (asserts 400 + badCSR for non-base64 CSR field). - TestACMEHandler_Cert_HappyPath (asserts 200 + PEM content-type + PEM chain in body). Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-2".	2026-05-03 13:46:10 +00:00
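The CSR-versus-order check described in the Phase 2 commit (strict equality, case-folded) amounts to a set comparison. The sketch below is one way to write it and is not the shipped CSRMatchesIdentifiers: it takes plain string slices rather than an x509.CertificateRequest, and collapsing duplicate names before comparing is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalize lower-cases, de-duplicates, and sorts a list of DNS names so two
// lists can be compared position by position.
func normalize(names []string) []string {
	seen := make(map[string]bool, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		n = strings.ToLower(n)
		if !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

// csrMatchesIdentifiers reports whether the CSR names and the order
// identifiers are exactly the same set: no extras and no omissions.
func csrMatchesIdentifiers(csrDNSNames, orderIdentifiers []string) bool {
	a, b := normalize(csrDNSNames), normalize(orderIdentifiers)
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	order := []string{"example.com", "www.example.com"}
	fmt.Println(csrMatchesIdentifiers([]string{"WWW.example.com", "example.com"}, order))
	fmt.Println(csrMatchesIdentifiers([]string{"example.com"}, order))
	fmt.Println(csrMatchesIdentifiers([]string{"example.com", "www.example.com", "evil.test"}, order))
}
```

Rejecting both supersets and subsets matters here: a CSR that adds a name would obtain a certificate for an identifier that was never authorized, and one that drops a name would not satisfy the order as placed.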
shankar0123	44a85d6f85	acme-server: account resource + JWS verifier (Phase 1b/7) Layers JWS-authenticated POST machinery onto the Phase 1a foundation (commit `ec88a61`). After this commit, an ACME client can run POST /acme/profile/<id>/new-account against certctl and successfully register an account. Account update + deactivation via POST /acme/profile/<id>/account/<acc-id> work. Orders + challenges remain Phase 2 / 3. Background: Two prior dispatch attempts at the original Phase 1 ("skeleton + directory + new-nonce + new-account" as a single commit) failed on go-jose v4 API speculation (jws.GetPayload, sig.Algorithm, jose.SHA256, etc. — none of those exist in v4). Splitting Phase 1 into 1a (foundation, no go-jose) and 1b (this commit, all go-jose in one place) concentrated the JWS work where attention pays off. The verifier reads the actual go-jose v4 surface — ParseSigned with closed alg allow-list, Header struct fields (Algorithm, KeyID, JSONWebKey, Nonce, ExtraHeaders[HeaderKey]), JWK.Thumbprint with stdlib crypto.SHA256. What ships: - internal/api/acme/jws.go: 487-line verifier + sentinel error family. Enforces RFC 8555 §6.2 + §6.4 + §6.5 invariants: - alg in {RS256, ES256, EdDSA} (closed allow-list passed to jose.ParseSigned — HS256 / none / etc. 
rejected at parse time) - exactly one of `kid` / `jwk` in protected header (per endpoint policy — new-account demands jwk, others demand kid) - protected `url` matches request URL exactly - protected `nonce` consumed against acme_nonces (badNonce on miss/replay/expiry per RFC 8555 §6.5.1) - kid round-trips against canonical AccountKID(accountID) URL (catches cross-profile / cross-host replay) - kid path: account exists + status=valid (deactivated / revoked accounts cannot authenticate) - signature verifies; post-Verify payload bytes equal UnsafePayloadWithoutVerification (defense in depth) + JWK persistence helpers (JWKToPEM / ParseJWKFromPEM round-trip a public-only JWK as a PEM-wrapped JSON envelope; stored as TEXT in acme_accounts.jwk_pem for diff-friendliness) + JWKThumbprint per RFC 7638. - internal/api/acme/jws_test.go: 16 cases covering happy paths (RS256 kid, ES256 jwk, EdDSA kid) + every named failure mode (alg-not-allowed, bad-sig, missing-nonce, unknown-nonce, replay, url-mismatch, mixed kid+jwk, deactivated-account, cross-host kid). Uses real keypairs + real go-jose Signer to build JWS objects. - internal/api/acme/account.go: NewAccountRequest / AccountUpdateRequest payload shapes (RFC 8555 §7.3 + §7.3.2 + §7.3.6) + AccountResponseJSON wire shape + MarshalAccount helper. - internal/domain/acme.go: ACMEAccount struct + ACMEAccountStatus closed enum (valid / deactivated / revoked). - internal/repository/postgres/acme.go: full account CRUD path (CreateAccountWithTx with 23505-unique-violation sentinel translation, GetAccountByID, GetAccountByThumbprint, UpdateAccountContactWithTx, UpdateAccountStatusWithTx) + sql.ErrNoRows-wrapped repository.ErrNotFound on lookup misses. 
- internal/service/acme.go: ACMERepo interface extended; SetTransactor + SetAuditService wires; NewAccount (idempotent re-registration per RFC 8555 §7.3.1 — same JWK returns existing row without an update or new audit event); LookupAccount; UpdateAccount; DeactivateAccount; VerifyJWS adapter that bridges api/acme.VerifierConfig to the service-layer ACMERepo; per-op metrics extended (new_account_total + _failures_total + _idempotent_total + update_account_total + _failures_total + deactivate_account_total). - internal/service/acme_test.go: 8 new tests covering new-account happy path / idempotent re-registration / only-return-existing match + no-match / contact update / deactivate / lookup-not-found / requires-transactor. - internal/api/handler/acme.go: NewAccount + Account handlers. Account dispatches POST-as-GET (RFC 8555 §6.3 — empty body or {} payload returns the account row), contact update, and deactivation from the same endpoint. Defense-in-depth check that the kid path-segment matches the URL path-segment (the verifier already round-tripped the kid against canonical URL, but the handler re-asserts to catch any future verifier refactor). - internal/api/handler/acme_handler_test.go: 7 new cases covering happy-create, idempotent-200, only-return-existing-no-match-400, malformed-JWS-400, kid-URL-mismatch-401, deactivate, contact-update, POST-as-GET. - internal/api/router/router.go: 4 new Register calls (per-profile + shorthand for new-account and account/{acc_id}). - internal/api/router/openapi_parity_test.go: SpecParityExceptions extended with the 4 new routes (RFC 8555 wire-protocol surface, not OpenAPI-shaped — same precedent as Phase 1a). - cmd/server/main.go: SetTransactor + SetAuditService on acmeService at startup so the WithinTx-based new-account / update / deactivate paths run with the same transactor instance shared across CertificateService / RevocationSvc / RenewalService. 
- docs/acme-server.md: Phase status updated; endpoints table grows new-account + account/<acc_id> rows; new "JWS verification (Phase 1b)" section enumerates the 7 invariants the verifier enforces; phases-cross-reference table marks 1b live. - go.mod / go.sum: github.com/go-jose/go-jose/v4 v4.0.4 added. Atomicity: every account-state mutation writes its acme_accounts row + its audit_events row inside one repository.Transactor.WithinTx call — the canonical certctl atomicity contract (matches CertificateService.Create at internal/service/certificate.go:131). Idempotent re-registration explicitly does NOT write an audit row (RFC 8555 §7.3.1 returns the existing row unmodified). Tests: 16 jws_test.go cases + 11 service tests + 11 handler tests all pass under -short. Bad-signature test uses a real registered account whose stored JWK is a different keypair from the signer's, so the JWS parses cleanly but jose.Verify rejects — exercises the ErrJWSSignatureInvalid path directly. Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-1b".	2026-05-03 13:21:56 +00:00
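Two of the protected-header invariants from the Phase 1b commit (the closed alg allow-list and the kid/jwk endpoint policy) are simple enough to sketch without go-jose. Everything below is illustrative: the `protectedHeader` struct and the sentinel names are hypothetical, and the real verifier enforces the allow-list inside jose.ParseSigned rather than in a separate function.

```go
package main

import (
	"errors"
	"fmt"
)

// protectedHeader is a hypothetical, already-parsed view of a JWS protected header.
type protectedHeader struct {
	Alg    string
	KeyID  string // the "kid" field; empty when absent
	HasJWK bool   // true when an embedded "jwk" is present
}

// Hypothetical sentinels for the failure modes sketched here.
var (
	errAlgNotAllowed = errors.New("alg not in allow-list")
	errKeyAmbiguous  = errors.New("exactly one of kid or jwk is required")
	errWantJWK       = errors.New("endpoint requires jwk")
	errWantKID       = errors.New("endpoint requires kid")
)

// allowedAlgs is the closed allow-list named in the commit message.
var allowedAlgs = map[string]bool{"RS256": true, "ES256": true, "EdDSA": true}

// checkHeaderPolicy applies the allow-list and the per-endpoint key policy:
// new-account demands jwk, every other endpoint demands kid.
func checkHeaderPolicy(h protectedHeader, isNewAccount bool) error {
	hasKID := h.KeyID != ""
	switch {
	case !allowedAlgs[h.Alg]:
		return errAlgNotAllowed
	case hasKID == h.HasJWK: // both present or both absent
		return errKeyAmbiguous
	case isNewAccount && !h.HasJWK:
		return errWantJWK
	case !isNewAccount && !hasKID:
		return errWantKID
	}
	return nil
}

func main() {
	// The kid URL below is a placeholder, not a real account URL.
	kid := "https://certctl.example/acme/profile/p1/account/acc-1"
	fmt.Println(checkHeaderPolicy(protectedHeader{Alg: "ES256", HasJWK: true}, true))
	fmt.Println(checkHeaderPolicy(protectedHeader{Alg: "HS256", KeyID: kid}, false))
	fmt.Println(checkHeaderPolicy(protectedHeader{Alg: "RS256", KeyID: kid, HasJWK: true}, false))
}
```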
shankar0123	ec88a61274	acme-server: foundation — directory + new-nonce + per-profile routing (Phase 1a/7) First slice of the RFC 8555 ACME server endpoint (master plan at cowork/acme-server-endpoint-prompt.md, per-phase prompts at cowork/acme-server-prompts/). This commit lands the smallest viable end-to-end deployable slice: an ACME client running curl -sk https://certctl/acme/profile/<id>/directory curl -sk -I https://certctl/acme/profile/<id>/new-nonce successfully fetches the directory document and a Replay-Nonce. Account creation, JWS verification, orders, challenges, and revocation are all out of scope for this phase and arrive in Phases 1b–4. Closes the Rank 1 LHF from the 2026-05-03 Infisical deep-research (cowork/infisical-deep-research-results.md). Pre-fix, certctl was an ACME consumer only — no /acme/directory endpoint, no JWS verifier, no challenge validators. K8s customers running cert-manager could not point at certctl as an ACME issuer; they had to deploy a certctl agent on every node. What ships: - internal/api/acme/{directory,nonce,errors}.go (+ tests). - internal/api/handler/acme.go + acme_handler_test.go. - internal/repository/postgres/acme.go (nonce ops only — Phase 1b extends with account CRUD; Phases 2-4 extend with order / authz / challenge CRUD). - internal/service/acme.go (BuildDirectory + IssueNonce stubs; Phase 1b adds VerifyJWS / NewAccount / etc.). - migrations/000025_acme_server.{up,down}.sql ships the full 5-table ACME schema (acme_accounts / acme_orders / acme_authorizations / acme_challenges / acme_nonces) PLUS the per-profile certificate_profiles.acme_auth_mode column. Phase 1a actively uses only acme_nonces; remaining tables are empty until Phases 1b-4 plug in. - internal/config/config.go: ACMEServerConfig struct + ACMEServer field on Config. Env vars use CERTCTL_ACME_SERVER_* prefix to avoid colliding with the existing consumer-side ACMEConfig at config.go:1746 (CERTCTL_ACME_DIRECTORY_URL / PROFILE / CHALLENGE_TYPE etc.). 
Phase 1a wires Enabled + DefaultAuthMode + DefaultProfileID + NonceTTL + DirectoryMeta; Order/Authz TTLs + per-challenge-type concurrency caps + DNS01 resolver are reserved fields parsed in 1a so operators can set them ahead of Phases 2/3. - cmd/server/main.go: wire ACMEHandler into the HandlerRegistry literal alongside the existing certificate / EST / SCEP / etc. handlers. - internal/api/router/router.go: HandlerRegistry.ACME field + 6 Register calls (3 per-profile + 3 shorthand). - internal/api/router/openapi_parity_test.go: 6 new entries in SpecParityExceptions. ACME is a wire-protocol surface (JWS-signed JSON over HTTPS per RFC 7515) whose semantics are dictated by RFC 8555 + RFC 9773 rather than by an OpenAPI document, same precedent as SCEP/EST. The canonical reference is docs/acme-server.md. - docs/acme-server.md: Phase-1a-shaped reference. Configuration table for every CERTCTL_ACME_SERVER_* env var. Per-profile auth-mode decision tree skeleton. TLS trust bootstrap section flagging cert-manager's ClusterIssuer.spec.acme.caBundle requirement (the single biggest first-time-deploy footgun; the full cert-manager walkthrough lands in Phase 6 but the requirement is documented up front). Architecture decisions baked in: - URL family is /acme/profile/<id>/* (per-profile, canonical) with /acme/* shorthand active when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID is set. Path matches existing per-profile precedent in EST + SCEP. - Auth mode is per-profile (acme_auth_mode column on certificate_profiles), NOT server-wide. One certctl-server can serve trust_authenticated for an internal-PKI profile and challenge for a public-trust-style profile simultaneously. The column is read at request time, not cached at server start — operators flipping a profile's mode via SQL take effect on the next order without restart. - Nonces are DB-backed (acme_nonces table). Survive server restart. 
The RFC 8555 §6.5 replay defense requires the store to outlast the client's nonce caching window; an in-memory-only nonce store would lose every in-flight order on restart. - Per-op atomic counters on service.ACMEService.Metrics() — certctl_acme_directory_total, certctl_acme_directory_failures_total, certctl_acme_new_nonce_total, certctl_acme_new_nonce_failures_total. Naming follows certctl frozen decision 0.10 cardinality discipline. Phase 1b will extend with new_account counters; Phase 2 with order / finalize / cert; Phase 3 with per-challenge-type counters. Audit fixes #11 + #12 (cowork/acme-server-prompts/audit-additions.md) applied: - #11: CERTCTL_ACME_SERVER_* prefix avoids the consumer-side CERTCTL_ACME_* namespace collision. - #12: prior-attempt WIP from two failed Phase-1 dispatches was discarded at phase start; this commit starts from a clean tree. Tests: - 14 unit tests in internal/api/acme/ (directory, nonce, errors). - 7 handler-level tests via httptest.NewServer + mockACMEService (mirrors the mockSCEPService pattern at scep_handler_test.go). - 7 service-layer tests with mocked repo + injected profileLookup. - All pass under -race -count=1 -short. Deferred to Phase 1b: - JWS verification (go-jose v4 — see master-prompt §8a for the API surface and audit doc for the speculation pitfalls). - new-account / account/<id> endpoints + AccountService. - Nonce consumption path (issue path is in this commit; consume is only invoked by JWS-verified POSTs which Phase 1b adds). Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-1a". Per-phase implementation plan: cowork/acme-server-prompts/. Master plan + audit fixes: cowork/acme-server-endpoint-prompt.md + cowork/acme-server-prompt-audit.md + cowork/acme-server-prompts/audit-additions.md.	2026-05-03 12:55:40 +00:00
shankar0123	a12a437664	feat(scep): mTLS sibling route /scep-mtls/<pathID> (opt-in) SCEP RFC 8894 + Intune master bundle — Phase 6.5 of 14 (opt-in, enterprise-procurement-checkbox). Closes the procurement-team objection that 'shared password authentication' is a checkbox-fail regardless of how strong the password is. The clean answer: a sibling route that adds client-cert auth at the handler layer AND keeps the challenge password (defense in depth, not replacement). Devices present a bootstrap cert from a trusted CA (e.g. a manufacturing-time cert), then SCEP-enroll for their long-lived cert. Same model Apple's MDM and Cisco's BRSKI use. internal/config/config.go * SCEPProfileConfig gains MTLSEnabled bool + MTLSClientCATrustBundlePath string. Indexed env-var loader reads CERTCTL_SCEP_PROFILE_<NAME>_MTLS_ENABLED + CERTCTL_SCEP_PROFILE_<NAME>_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH. * Validate() refuses MTLSEnabled=true with empty bundle path — structural defense in depth ahead of the file-content preflight. cmd/server/main.go * preflightSCEPMTLSTrustBundle: file existence + PEM parse + ≥1 CERTIFICATE block + non-expired check. Returns the parsed x509.CertPool ready to inject into the per-profile SCEPHandler. Failures os.Exit(1) with the offending PathID in the structured log. SCEP startup loop walks each profile; when MTLSEnabled, runs preflight, builds the per-profile pool, contributes the bundle's certs to the union pool that backs the TLS-layer VerifyClientCertIfGiven, clones the SCEPHandler with SetMTLSTrustPool, and registers the parallel sibling route via apiRouter.RegisterSCEPMTLSHandlers. * Union pool published to outer scope as scepMTLSUnionPoolForTLS; passed to buildServerTLSConfigWithMTLS so the listener serves both /scep[/<pathID>] (no client cert) and /scep-mtls/<pathID> (cert required at handler layer) on the same socket. 
* Final-handler dispatch gains /scep-mtls + /scep-mtls/* prefix routing through the no-auth chain (auth boundary is the client cert + challenge password, NOT a Bearer token). cmd/server/tls.go * New buildServerTLSConfigWithMTLS that wraps buildServerTLSConfig + sets ClientCAs + ClientAuth=VerifyClientCertIfGiven when a non-nil pool is passed. nil pool = identical TLS shape to the pre-Phase-6.5 builder (no behavior change for deploys without mTLS profiles). * Critical: VerifyClientCertIfGiven (NOT RequireAndVerifyClientCert) so a client that doesn't present a cert can still hit the standard /scep route. The per-profile gate at the handler layer enforces 'cert required' on /scep-mtls/<pathID>. internal/api/handler/scep.go * SCEPHandler gains mtlsTrustPool x509.CertPool field + SetMTLSTrustPool method. Per-profile pool injected by cmd/server/main.go after preflight. HandleSCEPMTLS wrapper: gates on r.TLS.PeerCertificates non-empty + per-profile cert.Verify against THIS profile's pool. Returns HTTP 401 for missing/untrusted cert (mTLS failure is auth, not authorization). Returns HTTP 500 if mtlsTrustPool is nil (deploy bug — the route shouldn't have been registered). On success delegates to HandleSCEP — defense in depth: mTLS is additive, NOT replacement; the standard SCEP code path including the challenge-password gate still executes. * Per-profile re-verification via cert.Verify(...) is critical: the TLS layer verified against the UNION pool, so a cert that chains to profile A's bundle would pass TLS even when targeting profile B. The handler-layer gate prevents cross-profile bleed-through. internal/api/router/router.go * AuthExemptDispatchPrefixes gains '/scep-mtls' (auth boundary is client cert + challenge password, NOT Bearer token). * RegisterSCEPMTLSHandlers parallel to RegisterSCEPHandlers: empty PathID maps to /scep-mtls root; non-empty maps to /scep-mtls/<pathID>. Each handler in the map MUST have had SetMTLSTrustPool called. 
internal/api/router/openapi_parity_test.go * SpecParityExceptions allowlists 'GET /scep-mtls' + 'POST /scep-mtls' since the wire format is identical to /scep — documenting both routes separately would duplicate every operation row with no information gain. Documented alternative in docs/legacy-est-scep.md. internal/api/handler/scep_mtls_test.go (new, ~210 LoC) * 6 tests + 2 helpers covering the auth contract: 1. RejectsMissingClientCert — request with r.TLS=nil → 401 2. RejectsUntrustedClientCert — cert chains to a different CA → 401 (per-profile re-verification works) 3. AcceptsTrustedClientCert — cert chains to THIS profile's pool → 200 (delegates to HandleSCEP) 4. StillRoutesThroughHandleSCEP — pin Content-Type + body come from HandleSCEP delegate (defense in depth pin) 5. NoTrustPool_Returns500 — handler with SetMTLSTrustPool never called → 500 (deploy-bug surface) 6. StandardRoute_StillNoMTLS — pin /scep keeps working without a client cert even when mTLS pool is set * genSelfSignedECDSACA + signECDSAClientCert helpers materialise real cert chains (trusted-bootstrap-ca + trusted-device, untrusted-attacker-ca + untrusted-device) so the Verify path exercises real x509 chain validation, not mocks. docs/features.md * SCEP env-vars table extended with the two new MTLS env vars (CERTCTL_SCEP_PROFILE_<NAME>_MTLS_ENABLED, CERTCTL_SCEP_PROFILE_<NAME>_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH). Closes the G-3 'env var defined in Go but never documented' gate. docs/legacy-est-scep.md * New 'mTLS sibling route (Phase 6.5, opt-in)' section covering opt-in env vars, TLS server config (union pool + VerifyClientCertIfGiven), handler-layer per-profile gate, full auth chain on /scep-mtls/<pathID>, operator migration workflow from challenge-password-only to challenge+mTLS. cowork/CLAUDE.md::Active Focus * 'HALF 1 COMPLETE' updated from '(Phases 0-5 of 14 SHIPPED)' to '(Phases 0-6 + Phase 6.5 of 14 SHIPPED)'. 
Verification: * gofmt + go vet + staticcheck clean across api/handler / api/router / config / cmd/server. * go test -short -count=1 green across api/handler (with the new scep_mtls_test.go) / api/router / service / config / pkcs7 / cmd/server / connector/issuer/local. * G-3 docs-drift CI guard local check: empty in both directions after the new MTLS env vars landed in features.md. * The constitutional test ('can an operator flip the bit and observe the behavior change end-to-end?') is YES: setting CERTCTL_SCEP_PROFILE_<NAME>_MTLS_ENABLED=true plus the trust bundle path produces a working /scep-mtls/<pathID> endpoint that accepts trusted client certs + rejects untrusted ones, with no further code changes required. Phase 6.5 of 14 in SCEP RFC 8894 + Intune master bundle. Half 1 (Phases 0-6 + 6.5) is now FEATURE-COMPLETE for the ChromeOS / general-MDM use case. Half 2 (Phases 7-12) adds the Microsoft Intune dynamic-challenge layer.	2026-04-29 13:58:18 +00:00
shankar0123	e720474fb7	Bundle D: Documentation & transparency sweep — 8 findings closed Closes H-009 + L-001 + L-007 + L-008 + L-016 + L-017 + L-018 + M-027 from comprehensive-audit-2026-04-25. H-009 — README JWT verified-already-clean README has zero JWT mentions at audit time. docs/architecture.md correctly documents JWT/OIDC integration via authenticating-gateway pattern (line 905-912). .github/workflows/ci.yml: new step 'Forbidden README JWT advertising regression guard (H-009)' greps README for JWT-as-supported phrasing; passes verbatim (gateway / pre-G-1) but fails build on net-new advertising. L-001 (CWE-295) — InsecureSkipVerify per-site justification Audit count was 8; recon found 13 production sites. docs/tls.md: new 'InsecureSkipVerify justifications' table enumerates each site by file:line with per-site rationale. cmd/agent/verify.go:78, internal/tlsprobe/probe.go:54, internal/service/network_scan.go:460: each previously-bare InsecureSkipVerify: true now carries //nolint:gosec. .github/workflows/ci.yml: new step 'Forbidden bare InsecureSkipVerify regression guard (L-001)' fails build if any net-new ISV lands in non-test .go without nolint:gosec on the same or preceding line. L-007 — README dependency-audit commands README.md: new Dependencies section with go list -m all \| wc -l, go mod why, govulncheck ./.... Honors operating-rules invariant. L-008 — Release-time govulncheck gate .github/workflows/release.yml: new 'Install govulncheck' + 'Run govulncheck (release gate)' steps in the matrix job. Pinned to same install path as ci.yml. Default exit code semantics (fail on called-vuln only, deferred-call advisories tracked on master via L-021) keeps the gate appropriate. 
L-016 — architecture.md drift fixes docs/architecture.md: system-components diagram's '21 tables' annotation removed (current 23; replaced with TEXT-keys descriptor); connector-architecture '9 connectors' prose replaced with grep ref + current 12-issuer list (added Entrust/GlobalSign/EJBCA which were missing); API-design '97 operations / 107 total' replaced with grep commands. Connector subgraphs verified-current at 12/13/6. L-017 — workspace CLAUDE.md verified-already-clean Bundle B's pre-commit-gate refactor already converted current- state numeric claims to grep commands. Phase 0 recon confirmed zero remaining hardcoded counts. L-018 — Defect age table cowork/comprehensive-audit-2026-04-25/defect-age.md (NEW): Tabulates all 9 High findings with first-mentioned commit, closing bundle, days-open. Methodology snippet for re-running. Key finding: 8 of 9 closed within 24h of audit publication. M-027 — OpenAPI parity verified-already-clean Audit's 'router 121 vs OpenAPI 125 — 4-op gap' was wrong methodology. The 4-op 'gap' was exactly the 4 routes registered via r.mux.Handle (auth-exempt allowlist) instead of r.Register. When you count both dispatch shapes the totals match exactly. internal/api/router/openapi_parity_test.go (NEW): TestRouter_OpenAPIParity AST-walks router.go for both Register and mux.Handle calls + walks api/openapi.yaml's path/method nesting + asserts the sets match. Adding a route without updating the spec fails CI permanently. Audit deliverables: audit-report.md: score 38/55 -> 46/55 closed (High 7/9 -> 8/9; Medium 20/27 -> 21/27; Low 8/19 -> 14/19) findings.yaml: 8 status flips open -> closed defect-age.md: new file certctl/CHANGELOG.md: Bundle D section Verification: TestRouter_OpenAPIParity PASS L-001 grep guard self-test (after //nolint:gosec adds) PASS H-009 grep guard self-test PASS go test -count=1 -short on changed packages green	2026-04-27 00:47:15 +00:00
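The M-027 parity check (AST-walk the router for both Register and mux.Handle calls, then compare against the spec) might be sketched as follows. The routerSrc snippet, the "METHOD /path" string argument shape, and the hardcoded spec list are all assumptions for illustration; the real TestRouter_OpenAPIParity reads router.go and api/openapi.yaml, and its internals are not shown in the commit message.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
)

// routerSrc is a hypothetical stand-in for router.go.
const routerSrc = `package router

func (r *Router) routes() {
	r.Register("GET /api/v1/certificates", nil)
	r.Register("POST /api/v1/certificates", nil)
	r.mux.Handle("GET /health", nil)
}
`

// routesFromSource collects the first string-literal argument of every
// call whose selector is Register or Handle, covering both dispatch shapes.
func routesFromSource(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "router.go", src, 0)
	if err != nil {
		return nil, err
	}
	var routes []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) == 0 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || (sel.Sel.Name != "Register" && sel.Sel.Name != "Handle") {
			return true
		}
		lit, ok := call.Args[0].(*ast.BasicLit)
		if !ok || lit.Kind != token.STRING {
			return true
		}
		if s, err := strconv.Unquote(lit.Value); err == nil {
			routes = append(routes, s)
		}
		return true
	})
	sort.Strings(routes)
	return routes, nil
}

// missing returns the entries of a that do not appear in b.
func missing(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, s := range b {
		seen[s] = true
	}
	var out []string
	for _, s := range a {
		if !seen[s] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	routes, err := routesFromSource(routerSrc)
	if err != nil {
		panic(err)
	}
	// Stand-in for the operations parsed out of the OpenAPI document.
	spec := []string{"GET /api/v1/certificates", "GET /health"}
	fmt.Println("in router, not in spec:", missing(routes, spec))
	fmt.Println("in spec, not in router:", missing(spec, routes))
}
```

Counting only Register calls would miss the mux.Handle route here, which mirrors the methodology error the commit message describes.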

17 Commits