certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 18:21:32 +00:00

Author	SHA1	Message	Date
shankar0123	5d7bc86451	fix(oidc): SEC-020 — wrap fetchUserinfoGroups via SafeOIDCContext Acquisition-audit Sprint 1 follow-up to SEC-001 (2026-05-16). The original SEC-001 sweep routed two OIDC discovery legs (test_discovery.go dry-run + service.go runtime provider load) through validation.SafeHTTPDialContext via the SafeOIDCContext(ctx) helper. This commit closes one of the two adjacent call sites the sweep missed: the userinfo-fallback path at service.go::fetchUserinfoGroups. Pre-fix: func (s Service) fetchUserinfoGroups(ctx, entry, token, path) { ... ts := entry.oauthConfig.TokenSource(ctx, token) uinfo, err := entry.provider.UserInfo(ctx, ts) ... } go-oidc/v3 Provider.UserInfo (oidc.go:351-374) derives its http.Client from ctx via getClient(ctx) (oidc.go:61-65). Without an override, the internal doRequest (oidc.go:87-92) falls through to http.DefaultClient — no SSRF guard, no DNS-rebinding re-resolve at dial time. An IdP whose discovery doc advertises a userinfo_endpoint pointing at a reserved address (loopback / link-local / 169.254.169.254 cloud-metadata) would trigger an unguarded HTTPS egress at userinfo-fetch time. Operator opt-in to fetch_userinfo=true turns the gap on; the leg fires whenever the ID token doesn't surface the configured groups claim. Post-fix: safeCtx := SafeOIDCContext(ctx) ts := entry.oauthConfig.TokenSource(safeCtx, token) uinfo, err := entry.provider.UserInfo(safeCtx, ts) Context-key shape: gooidc.ClientContext is implemented as context.WithValue(ctx, oauth2.HTTPClient, client) (go-oidc v3.18.0 oidc.go:57-59). Both go-oidc's getClient AND golang.org/x/oauth2's internal.ContextClient read the same oauth2.HTTPClient key, so the SINGLE SafeOIDCContext wrap covers go-oidc-driven HTTP calls (Provider.UserInfo / Verifier JWKS) AND oauth2-driven HTTP calls (Config.TokenSource refresh / Exchange). No additional context.WithValue(ctx, oauth2.HTTPClient, ...) is required. Files touched: internal/auth/oidc/service.go — wrap ctx in fetchUserinfoGroups internal/auth/oidc/safehttp.go — extend SEC-001 header comment block to enumerate the two newly-patched sites (SEC-020 here + SEC-021 in the next commit) and the oauth2.HTTPClient key-sharing rationale, so future audits don't re-flag the design as confused internal/auth/oidc/service_test.go — new test TestFetchUserinfoGroups_SSRF_BlocksReservedAddress that stands up a loopback discovery server whose discovery doc advertises userinfo_endpoint = http://169.254.169.254/userinfo, constructs gooidc.Provider via the test-bypassed oidcDiscoveryClient (setup_test.go's init() pattern), then RESTORES the production SafeHTTPDialContext-backed client just before the fetchUserinfoGroups call. Asserts the error wraps SafeHTTPDialContext's 'refusing to dial reserved address' rejection rather than a generic connect-refused. Companion to the TestDefaultBCLVerifier_SSRF_BlocksReservedAddress that SEC-021 (next commit) adds. Verified: gofmt -l internal/ docs/ (clean) go vet ./... (clean) go test -race -short ./internal/auth/oidc/... (all green) TestFetchUserinfoGroups_SSRF_BlocksReservedAddress (new; green) All 4 cited CI guards pass (openapi-handler-parity, openapi-codegen-drift, no-sh-c-in-connectors, skip-inventory-drift) Acceptance grep: internal/auth/oidc/service.go:963: uinfo, err := entry.provider.UserInfo(safeCtx, ts) internal/auth/oidc/service.go:1084: provider, err := gooidc.NewProvider(SafeOIDCContext(ctx), cfgRow.IssuerURL) No bare-ctx UserInfo / NewProvider remains in service.go. Closes acquisition-audit SEC-020. SEC-021 (BCL discovery re-fetch) lands in the next commit.	2026-05-16 16:41:05 +00:00
shankar0123	e6cfd756ac	fix(auth): SEC-001 — gate OIDC discovery through SafeHTTPDialContext + ValidateSafeURL Sprint 1 unified-master-audit closure. Two OIDC discovery call sites passed the bare request context to gooidc.NewProvider: - internal/auth/oidc/test_discovery.go:65 (dry-run validator) - internal/auth/oidc/service.go:1066 (runtime cache load) gooidc.NewProvider derives its HTTP client from the context via oidc.ClientContext; with no override it falls through to http.DefaultClient — no SSRF guard. An admin with auth.oidc.create could induce server-side HTTPS egress to loopback (127.0.0.1, ::1), RFC 1918, link-local (169.254.169.254 — cloud-instance metadata), and IPv6 link-local (fe80::/10). The companion JWKS reachability probe was already routed through SafeHTTPDialContext via the Bundle 5 R6 closure; the discovery + claims path bypassed that. Fix: - New internal/auth/oidc/safehttp.go: oidcDiscoveryClient (Transport DialContext = validation.SafeHTTPDialContext) + SafeOIDCContext helper. Both call sites now wrap ctx through SafeOIDCContext before NewProvider runs. - Defense-in-depth: OIDCProvider.Validate calls validation.ValidateSafeURL on the IssuerURL after the existing https/parse checks, refusing reserved-address issuers at provider-creation time. - TestDiscovery surfaces the SSRF policy error via the result's Errors slice up-front (early-fail UX rail) before invoking NewProvider. Test seams: - setup_test.go swaps oidcDiscoveryClient + validateIssuerSSRF for httptest loopback compatibility, mirroring the existing jwksProbeClient pattern. Regression coverage: - internal/auth/oidc/domain/types_test.go: 5-case table pinning loopback v4/v6, cloud metadata, link-local v4/v6 rejection. - internal/auth/oidc/coverage_fill_test.go: same 5 cases against Service.TestDiscovery via temporarily restoring the production gate. Closes SEC-001.	2026-05-16 03:31:42 +00:00
shankar0123	21aeed4f4e	legal: addlicense headers + normalize legacy variants (Phase 0 RED-4) Phase 0 closure (Path B2, post-rewrite): addlicense sweep — adds the canonical certctl LLC copyright + BUSL-1.1 SPDX header to every production Go file. Template: // Copyright 2026 certctl LLC. All rights reserved. // SPDX-License-Identifier: BUSL-1.1 Coverage: 338 / 338 production Go files (cmd/ + internal/, excluding _test.go and /testdata/). Pre-sweep coverage was 22 / 338 (6.5%); post-sweep is 338 / 338 (100%). Normalized 22 pre-existing legacy headers (`// Copyright (c) certctl` + `// SPDX-License-Identifier: BSL-1.1`) and 1 file using a `Certctl Contributors` attribution. The legacy SPDX ID `BSL-1.1` is non-standard; the official SPDX identifier for Business Source License 1.1 is `BUSL-1.1` (capital U). All 338 files now share the canonical form. Generated via: addlicense -c "certctl LLC" -y 2026 \ -f cowork/legal/copyright-header.tpl \ -ignore '/testdata/' -ignore '/_test.go' \ cmd/ internal/ Verification: find cmd internal -name '.go' -not -name '_test.go' \ -not -path '/testdata/' \ -exec grep -L '^// Copyright 2026 certctl LLC' {} \; \| wc -l Returns: 0 gofmt clean. Header additions are comments only, no compile impact. Closes: cowork/certctl-architecture-diligence-audit.html#fix-RED-4	2026-05-13 21:23:35 +00:00
shankar0123	fefeccfa59	harden(oidc): relax alg-downgrade IdP-bind check to intersection-empty (Keycloak compat) Phase-10 live-IdP smoke (Keycloak 26.x via testcontainers-go) revealed the IdP-bind alg-downgrade check was too strict for real-world IdPs. 6 of the integration tests in internal/auth/oidc/integration_keycloak_test.go were failing with: oidc: IdP advertises weak signing algorithms (HS/none); refusing to use as defense against downgrade attacks: HS256 Keycloak 26.x (and several other real-world IdPs — Auth0 when HS-mode is enabled, some Authentik configs) advertise EVERY alg they're capable of in the discovery doc's id_token_signing_alg_values_supported field, even when the realm only signs with RS256 in practice. Pre-fix the IdP-bind check refused on ANY HS* or 'none' advertisement → no real Keycloak deploy could ever bind a provider row, hence the integration-test failures. The strict-deny check was defense-in-depth on top of the load-bearing per-token alg-pin at sig-verify time (isDisallowedAlg, service.go L1177): that check rejects every ID token whose JWS header carries an alg outside DefaultAllowedAlgs, regardless of what the discovery doc advertises. A forged HS256 token signed with the IdP's RS256 pubkey as HMAC secret is rejected at sig-verify time → the actual algorithm-confusion attack is closed by the per-token pin, NOT by the discovery-doc check. Fix: relax the IdP-bind check to refuse only when the intersection of advertised vs DefaultAllowedAlgs is EMPTY (the pathological all-weak-alg IdP case). Keycloak (RS256 + HS256 advertised) now binds successfully; an HS-only IdP still fails closed. Changes: - internal/auth/oidc/service.go: rewrite the alg-check loop at L1067 in getOrLoad / RefreshKeys to compute the intersection set; refuse only when no acceptable alg is advertised. ErrIdPDowngradeAdvertised docstring updated to reflect new contract. DefaultAllowedAlgs docstring + the package-level design-comment block at L40-72 updated with v2.1.0-relaxed semantics callouts. - internal/auth/oidc/test_discovery.go: TestDiscovery dry-run validator rewritten to surface HS/none alongside RS as an informational note ('note: IdP advertises weak algorithms %v alongside acceptable ones') rather than a hard-fail error. HS-only / none-only still hard-fails. - internal/auth/oidc/service_test.go: TestService_IdPDowngradeDefense_* tests updated. Renamed: - RejectsHSAdvertised → RS256PlusHS256_BindsSuccessfully (positive) - RejectsNoneAdvertised → RejectsHSOnlyAdvertised (intersection-empty) - RefreshKeys_CatchesPostLoadDowngrade rotated to HS-only post-load - internal/auth/oidc/coverage_fill_test.go: TestTestDiscovery_AlgDowngradeDetected split into _HS256AlongsideRS256_BindsWithNote (positive, asserts note but no hard-fail) + _HSOnly_StillTrips_HardFail (intersection-empty). - docs/operator/auth-threat-model.md: OIDC token-validation alg-allow-list section rewritten to call out the load-bearing-defense hierarchy (per-token pin first, IdP-bind check defense-in-depth) and document the v2.1.0 relaxation rationale. - CHANGELOG.md: ### Security entry under Unreleased. Verify: go test ./internal/auth/oidc/ -short PASS; gofmt clean; go vet clean. The Keycloak integration tests should now pass when the operator re-runs 'make keycloak-integration-test'.	2026-05-11 15:34:59 +00:00
shankar0123	11b145b641	Merge Fix 06 (HIGH A-6): strict UA/IP binding — close request-empty bypass in MED-16 # Conflicts: # CHANGELOG.md # internal/api/handler/auth_session_oidc.go # internal/api/handler/auth_session_oidc_test.go	2026-05-11 11:19:04 +00:00
shankar0123	92519436a1	harden(oidc): strict UA/IP binding (A-6) — close request-empty bypass in MED-16 The MED-16 closure (`2a1a0b3`) added the RFC 9700 §4.7.1 pre-login UA/IP binding but the consume-side compare at internal/auth/oidc/service.go was gated by: if s.preLoginRequireUA && storedUA != "" && userAgent != "" { ... constant-time compare ... } if s.preLoginRequireIP && storedIP != "" && ip != "" { ... constant-time compare ... } The `userAgent != ""` and `ip != ""` arms were intended as rolling-deploy / headless-proxy compat ("if the request didn't supply a value, don't try to compare against nothing"). They achieve that — and they ALSO short-circuit the compare whenever the attacker controls the request side, which is always at /auth/oidc/callback. Threat model: 1. Attacker acquires a pre-login cookie (HMAC-protected; requires RNG break OR transit leak — not implausible, that's why the binding exists in the first place). 2. Attacker replays the cookie at /auth/oidc/callback from their own user-agent. 3. Attacker OMITS the User-Agent header. curl doesn't send one by default. Many programmatic HTTP clients omit it. Pre-A-6, step 3 trivially bypassed the binding check. The whole RFC 9700 §4.7.1 defense was theatre against the realistic threat — silent-allow when the attacker abandons the header they don't want checked. Fix: flipped to strict-when-stored. When the pre-login row carries a binding value (storedUA != "" or storedIP != ""), the request MUST present a matching value. An empty request side with a non-empty stored side now rejects with two new sentinels: ErrPreLoginUAMissing — request omitted User-Agent header ErrPreLoginIPMissing — request had no resolvable client IP Distinguished from the existing Mismatch sentinels so the audit row can tell apart "binding violation" (operator mis-configured the proxy) from "missing-header bypass attempt" (active exploit indicator). The handler-side classifyOIDCFailure adds typed errors.Is dispatch: ErrPreLoginUAMissing → "prelogin_ua_missing" ErrPreLoginIPMissing → "prelogin_ip_missing" SIEM rules can now alert specifically on the bypass-attempt category distinctly from operator config drift. Legacy-row compat preserved: pre-migration rows where storedUA == "" / storedIP == "" still pass through unchecked. That window is bounded by the 10-minute pre-login TTL — within 10 minutes of the MED-16 deploy every legacy row has expired and the strict path is universal. Operator escape hatches preserved: CERTCTL_OIDC_PRELOGIN_REQUIRE_UA=false (symmetric for IP) bypasses both the Mismatch AND the new Missing reject paths. Required for environments where a proxy strips the User-Agent header in transit (rare but documented in the operator advisory). Regression coverage: service_test.go (5 new tests under `Audit 2026-05-11 A-6 — strict-when-stored` block): TestService_HandleCallback_MED16_A6_UAStoredButRequestEmpty_Rejects — the load-bearing bypass-closure leg TestService_HandleCallback_MED16_A6_IPStoredButRequestEmpty_Rejects — symmetric for IP TestService_HandleCallback_MED16_A6_LegacyRowEmptyStoredStillPasses — legacy-row compat preserved TestService_HandleCallback_MED16_A6_ToggleOff_AllowsBypass — UA toggle off allows the bypass (operator escape hatch) TestService_HandleCallback_MED16_A6_ToggleOff_IP_AllowsBypass — IP toggle off allows the bypass auth_session_oidc_test.go::TestClassifyOIDCFailure extended: ErrPreLoginUAMismatch → prelogin_ua_mismatch (new explicit pin) ErrPreLoginIPMismatch → prelogin_ip_mismatch (new explicit pin) ErrPreLoginUAMissing → prelogin_ua_missing ErrPreLoginIPMissing → prelogin_ip_missing fmt.Errorf wrapped variants of the Missing sentinels round-trip through errors.Is (defense against future context-wrapping in the service layer) Verify gate green: gofmt clean, go vet clean, all 10 MED-16 tests + extended TestClassifyOIDCFailure pass; full short-mode test run across internal/auth/oidc + internal/api/handler also green. Spec at cowork/auth-bundles-fixes-2026-05-11/06-high-prelogin-ua-strict-mode.md. Audit doc: MED-16 row in cowork/auth-bundles-audit-2026-05-10.md appended with the A-6 follow-up closure annotation; status table row updated to "CLOSED + A-6 follow-up CLOSED 2026-05-11". Operator advisory in CHANGELOG.md v2.1.0 release notes covers the two operator-visible behaviour changes: (1) callback requests without User-Agent now reject when a binding was stored, and (2) the CERTCTL_OIDC_PRELOGIN_REQUIRE_UA=false escape hatch is the documented path for environments where the proxy strips the header.	2026-05-11 11:03:31 +00:00
shankar0123	78485f7429	fix(auth/users): close MED-11 lying field — DeactivatedAt loaded + enforced on login (A-2) The MED-11 closure shipped users.deactivated_at + DELETE /api/v1/auth/users/{id} + cascade-revoke, but the federated-user soft-delete was reversible: the next OIDC login under the same (provider, subject) tuple re-minted a session and re-elevated the user. Three legs of the chain were severed (each independently CRIT-shaped): Leg A — postgres/user.go::userColumns omitted `deactivated_at`, so scanUser never populated User.DeactivatedAt. Every Get / GetByOIDCSubject / ListAll returned DeactivatedAt = nil regardless of the column value. Leg B — postgres/user.go::Update SQL omitted `deactivated_at = $X`, so the handler's `u.DeactivatedAt = now()` mutation was a no-op write at the SQL level. Even with leg A closed, no row ever flipped. Leg C — oidc/service.go::upsertUser did not inspect DeactivatedAt on the existing-user path. Even with legs A + B closed, the OIDC login would still proceed normally. The cascade-session-revoke half of the original closure remained correct, but only for the duration of the user's current cookie. SOC 2 CC6.3 + ISO 27001 A.9.2.6 "user access removal" controls require both immediate revoke AND persistent block — this fix restores the persistent-block leg. Closure across layers: internal/repository/postgres/user.go - userColumns adds `deactivated_at` - scanUser reads via sql.NullTime intermediate (column is nullable) - Create writes deactivated_at explicitly (NULL for new active users; forward-compat for future seed-data flows that pre-populate the column) - Update writes deactivated_at on every call; nil DeactivatedAt → NULL (supports reactivation) internal/auth/oidc/service.go - New sentinel ErrUserDeactivated - upsertUser checks existing.DeactivatedAt != nil BEFORE mutating email / display_name / last_login_at — preserves last_login_at forensics on rejected login attempts (defense-in-depth pin against future "performance optimization" that reorders the gate) internal/api/handler/auth_session_oidc.go - classifyOIDCFailure adds typed errors.Is dispatch for ErrUserDeactivated → audit category "user_deactivated" (SOC/SIEM observability surface) internal/api/handler/auth_users.go - Self-deactivate guard on Deactivate: HTTP 409 + audit row auth.user_deactivate_self_rejected when caller targets own User row. Prevents an admin from one-way-door locking themselves out via the standard handler; break-glass remains the recovery path. - New Reactivate handler: inverse of Deactivate. Clears DeactivatedAt via Update; emits auth.user_reactivated audit row. Idempotent on already-active rows. Sessions revoked at deactivation stay revoked (cascade irreversible by design — user must complete fresh OIDC login). internal/api/router/router.go - POST /api/v1/auth/users/{id}/reactivate wired with auth.user.deactivate gate (reactivation is the inverse op, not a separate privilege) web/src/api/client.ts + web/src/pages/auth/UsersPage.tsx - authReactivateUser() client function - Reactivate button on deactivated rows in UsersPage Regression coverage: Postgres (testcontainers, skipped under -short): TestUserRepository_DeactivatedAt_RoundTrip — Create → set DeactivatedAt → Update → Get / GetByOIDCSubject / ListAll round-trip the value TestUserRepository_DeactivatedAt_CreateWritesNullForActive — new active user reads back DeactivatedAt = nil TestUserRepository_DeactivatedAt_CreatePersistsPreDeactivated — Create with non-nil DeactivatedAt round-trips (forward-compat path) OIDC service: TestService_HandleCallback_RejectsDeactivatedUser — errors.Is ErrUserDeactivated; CallbackResult nil; persisted email / last_login_at / deactivated_at NOT mutated by the rejected attempt TestService_HandleCallback_AllowsReactivatedUser — DeactivatedAt = nil → happy path resumes TestService_HandleCallback_DeactivatedUserPreservesForensics — defense-in-depth pin against future regressions that reorder the gate-vs-mutation sequence Classifier: TestClassifyOIDCFailure extended — typed dispatch + wrapped variant round-trip through errors.Is Handler: TestAuthUsers_Deactivate_RejectsSelfDeactivate — HTTP 409 + audit row + cascade-revoke NOT fired + row stays active TestAuthUsers_Deactivate_OtherUser_HappyPath — HTTP 204 + cascade fires + row soft-deleted TestAuthUsers_Reactivate_HappyPath / _IdempotentOnActiveUser / _UnknownID / _MissingID / _UpdateError Phase 6 verify gate green on the targeted packages: gofmt clean, go vet clean, go test -short pass across internal/auth/oidc, internal/api/handler, internal/api/router, internal/repository/postgres, internal/auth/..., internal/service/..., internal/tlsprobe/..., internal/trustanchor/..., internal/validation/... Spec at cowork/auth-bundles-fixes-2026-05-11/02-crit-deactivated-at-enforcement.md Closure annotation at cowork/auth-bundles-audit-2026-05-10.md MED-11 row. Operator advisory in CHANGELOG.md v2.1.0 release notes.	2026-05-11 02:21:05 +00:00
shankar0123	172b30b8f1	feat(auth): backend endpoints for MED-7 + MED-11 + MED-12 Audit 2026-05-10 MED-7 + MED-11 + MED-12 backend halves. WHAT. Three new admin-gated endpoints: GET /api/v1/auth/oidc/providers/{id}/jwks-status (auth.oidc.list) — MED-7 GET /api/v1/auth/users (auth.user.read) — MED-11 DELETE /api/v1/auth/users/{id} (auth.user.deactivate) — MED-11 GET /api/v1/auth/runtime-config (auth.role.assign) — MED-12 MED-7 — JWKS health surface - providerEntry gains 4 counters (statsMu, lastRefreshAt, refreshCount, lastError, rejectedJWSCount) updated under sync.Mutex - RefreshKeys increments refreshCount + records lastRefreshAt - New JWKSStatus(ctx, providerID) returns *JWKSStatusSnapshot — surfaced via the new endpoint - CurrentKIDs intentionally empty (go-oidc's internal JWKS cache isn't exposed); shape kept for forward compat MED-11 — federated-user admin - AuthUsersHandler.List with optional ?oidc_provider_id filter - AuthUsersHandler.Deactivate sets users.deactivated_at + cascade- revokes sessions via UserSessionsRevoker (best-effort; revoke failure does NOT roll back the deactivation) - Idempotent: re-deactivating an already-deactivated user is a no-op MED-12 — runtime config - AuthRuntimeConfigHandler.Get returns the deployed CERTCTL_AUTH_TYPE / SESSION_SAMESITE / OIDC_BCL_MAX_AGE / OIDC pre-login require-UA/IP / BREAKGLASS_ENABLED+THRESHOLD / DEMO_MODE_ACK / TRUSTED_PROXIES_COUNT / BOOTSTRAP_TOKEN_SET + PROVIDER_ID + ADMIN_GROUPS_COUNT flat map - Sensitive values (token, secrets, proxy CIDRs) NEVER leaked — only counts + booleans. Token presence surfaced as 'set/unset' - Gated auth.role.assign (admin-class) so non-admins can't enumerate the deployment's auth knobs cmd/server/main.go wires all three handlers into HandlerRegistry. internal/api/router/router.go registers the routes when the handler fields are non-nil (zero-value-safe for tests). VERIFY. - go vet ./internal/api/... ./internal/auth/... ./internal/repository/... PASS - go build ./cmd/server/... PASS - go test -short -count=1 ./internal/auth/oidc/... PASS (4.1s) - go test -short -count=1 ./internal/api/handler/... PASS (4.1s) GUI halves for MED-7 + MED-11 + MED-12 are the GUI batch (pending). Refs: cowork/auth-bundles-audit-2026-05-10.md MED-7, MED-11, MED-12 cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md items 11 14 15	2026-05-11 00:11:07 +00:00
shankar0123	e005c004e1	harden(oidc): JWKS auto-refresh on kid-not-in-cache (MED-6) Audit 2026-05-10 MED-6 closure. WHAT. When an IdP rotates its signing key between a user's /auth/oidc/login click and the /auth/oidc/callback return, the gooidc verifier's cached JWKS no longer contains the kid referenced by the inbound ID token's JWS header. Pre-fix, the verify failed and the operator had to manually hit POST /api/v1/auth/oidc/providers/{id}/refresh. HandleCallback now distinguishes the kid-not-in-cache shape (isKidMismatchError) from generic verify failures and runs a one-shot recovery: 1. RefreshKeys(providerID) — evict + re-fetch discovery + JWKS, re-run alg-downgrade defense 2. getOrLoad(providerID) — refresh the cached providerEntry 3. verifier.Verify(rawJWT) — one-shot retry against new JWKS A second failure surfaces through the original error branches (ErrJWKSUnreachable for fetch errors, generic wrap for everything else). NO retry loop — bounded recovery only. WHY. Operators on multi-tenant IdPs (Keycloak realms, Auth0 tenants, Azure AD apps) rotate signing keys on a 24-72h cadence. Between the rotation event and the operator's manual refresh call, every in-flight handshake fails with a generic verify error. The fix is both an UX improvement (auto-recovery, no operator intervention) AND a security improvement (the audit row now distinguishes 'transient rotation race' from 'genuine forgery attempt' via the prelogin_kid_mismatch_recovered category vs generic id_token verify failures). HOW. internal/auth/oidc/service.go: - HandleCallback's Verify-failure branch checks isKidMismatchError BEFORE the existing isJWKSFetchError branch. On match, runs RefreshKeys + getOrLoad + verifier.Verify exactly once. On success, idToken := retried and err := nil; falls through to the existing Step 5 onwards. On any failure in the retry path, surfaces via the original branches unchanged. - isKidMismatchError matcher: pinned go-oidc/v3 v3.18.0 substrings ('kid .* not found', 'signing key .* not found', 'no matching key', 'key with id .* not found'). Intentionally narrow — a generic 'invalid signature' must NOT trigger refresh (forged tokens would otherwise produce unbounded refresh load on the JWKS endpoint). internal/auth/oidc/service_test.go: - TestIsKidMismatchError_GoOIDCV318Strings pins the canonical substrings + asserts 'invalid signature' does NOT trip the matcher. - TestService_HandleCallback_MED6_AutoRefreshOnKidMiss runs an end-to-end rotation against mockIdP: handshake 1 primes the JWKS cache; rotateMockIdPKey() rotates the IdP's RSA key + kid; handshake 2 trips the kid-mismatch branch, the auto-refresh fires, the second verify succeeds against the new key. VERIFY. - go vet ./internal/auth/oidc/... PASS - go test -short -count=1 -run 'MED6\|KidMismatch' ./internal/auth/oidc/... PASS (2/2) - go test -short -count=1 ./internal/auth/oidc/... PASS (4.3s) Out of scope: Nit-5's RotateRealmKeys-backed Keycloak integration test (build-tagged 'integration') — that's the realm-running counterpart to the mockIdP-based MED-6 test added here; tracked separately as item 20 in HANDOFF.md. Refs: cowork/auth-bundles-audit-2026-05-10.md MED-6 cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md item 3	2026-05-10 23:28:57 +00:00
shankar0123	2a1a0b347c	harden(oidc): pre-login UA/IP binding (MED-16) — RFC 9700 §4.7.1 Audit 2026-05-10 MED-16 closure. WHAT. Binds the OIDC pre-login row to the (clientIP, userAgent) tuple of the /auth/oidc/login request, and enforces a constant-time compare against the /auth/oidc/callback request at consume time. Defeats replay of a stolen pre-login cookie by a different browser / source — the secondary defense layer recommended by RFC 9700 §4.7.1 when the primary layer (HMAC integrity + Path=/ + SameSite=Lax on the cookie) is bypassed via CSRF / XSS / TLS-termination leak. WHY. Pre-fix, the pre-login cookie's HMAC verified only that 'some' caller of /auth/oidc/login was talking to /auth/oidc/callback; it did not verify that the SAME browser / source was on both sides. An attacker who exfiltrated the cookie value via any vector could replay the bytes through their own user-agent and ride the victim's authorization. RFC 9700 §4.7.1 calls out the gap explicitly and recommends binding state to a user-agent fingerprint + source IP. HOW. Migration: migrations/000044_prelogin_uaip.up.sql ALTER TABLE oidc_pre_login_sessions ADD COLUMN IF NOT EXISTS client_ip TEXT, ADD COLUMN IF NOT EXISTS user_agent TEXT; Both nullable for in-flight rolling-deploy compat — the consume- side check only enforces when both row AND request carry non-empty values for the leg in question. Domain: internal/repository/oidc.go (PreLoginSession) — adds ClientIP + UserAgent fields. Repository: internal/repository/postgres/oidc_prelogin.go — Create persists via sql.NullString (empty → NULL); LookupAndConsume reads back. Re-uses package-local nullableString from discovery.go. Service: internal/auth/oidc/service.go - PreLoginStore.CreatePreLogin signature takes (clientIP, userAgent) as positions 5–6. - PreLoginStore.LookupAndConsume returns (clientIP, userAgent) as positions 5–6. - HandleAuthRequest signature gains (clientIP, userAgent), threaded to the store. - HandleCallback adds Step 1.5 — UA / IP constant-time compare between stored row and incoming request. Per-leg toggles via preLoginRequireUA / preLoginRequireIP service fields. Empty values on either side pass through (rolling-deploy + headless- proxy compat). - New sentinels ErrPreLoginUAMismatch, ErrPreLoginIPMismatch. - SetPreLoginBindingRequirements(requireUA, requireIP) helper for main.go config wiring. Adapter: internal/auth/oidc/prelogin.go — PreLoginAdapter passes the new fields through to the repo row. Handler: internal/api/handler/auth_session_oidc.go - OIDCAuthHandshaker.HandleAuthRequest signature updated. - LoginInitiate captures clientIPFromRequest + r.UserAgent() and passes to the service. - classifyOIDCFailure adds errors.Is dispatch for the two new sentinels → prelogin_ua_mismatch / prelogin_ip_mismatch audit categories. Config: internal/config/config.go + AuthConfig.OIDCPreLoginRequireUA (default true) env CERTCTL_OIDC_PRELOGIN_REQUIRE_UA + AuthConfig.OIDCPreLoginRequireIP (default true) env CERTCTL_OIDC_PRELOGIN_REQUIRE_IP cmd/server/main.go calls oidcService.SetPreLoginBindingRequirements from cfg.Auth.OIDCPreLoginRequire{UA,IP}. Tests (internal/auth/oidc/service_test.go): - TestService_HandleCallback_MED16_UAMismatchRejected - TestService_HandleCallback_MED16_IPMismatchRejected - TestService_HandleCallback_MED16_BothMatch_Succeeds - TestService_HandleCallback_MED16_LegacyRowEmptyValues (rolling- deploy compat — empty stored values pass through) - TestService_HandleCallback_MED16_RequireUAFalse_AllowsMismatch (operator escape-hatch — UA mismatch silently allowed) Mechanical fan-out: - stubPreLogin / stubPreLoginRepo signatures updated. - All existing call sites in service_test.go (~40), prelogin_test.go, bench_test.go, logging_test.go, provider_enabled_test.go, integration_keycloak_test.go, integration_okta_smoke_test.go, auth_session_oidc_test.go updated to pass empty strings for the new params — pre-existing tests do not exercise UA/IP binding semantics. VERIFY. - go vet ./internal/auth/oidc/... ./internal/api/handler/... ./internal/config/... PASS - go test -short -count=1 -run MED16 ./internal/auth/oidc/... PASS (5/5) - go test -short -count=1 ./internal/auth/oidc/... PASS (4.6s) - go test -short -count=1 ./internal/api/handler/... PASS (4.3s) - go test -short -count=1 ./internal/config/... PASS Refs: cowork/auth-bundles-audit-2026-05-10.md MED-16 cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md item 6 RFC 9700 §4.7.1 — OAuth 2.0 Security Best Current Practice	2026-05-10 23:18:23 +00:00
shankar0123	2cd2a5c52f	harden(oidc): RFC 9207 iss URL parameter check on callback (MED-17) Audit 2026-05-10 MED-17 closure. WHAT. When the matched IdP's discovery doc advertises authorization_response_iss_parameter_supported=true (RFC 9207 §3), HandleCallback now REQUIRES a non-empty `iss` query parameter on /auth/oidc/callback and enforces a constant-time compare against the configured provider's IssuerURL. Mismatch maps to two new sentinel errors (ErrIssParamMissing / ErrIssParamMismatch) that the handler's classifyOIDCFailure dispatches via errors.Is BEFORE the substring fall-through, so the audit failure_category remains distinguishable between the RFC 9207 leg (iss_param_missing / iss_param_mismatch) and the in-token iss claim leg (id_token_iss_mismatch). WHY. The RFC 9207 iss URL parameter is the load-bearing mix-up-attack defense for multi-tenant IdPs (Keycloak realms, Authentik tenants, Auth0 tenants, public-trust CAs). Pre-fix the parameter was silently ignored — an attacker controlling one IdP tenant could route an auth code to certctl's callback against a different tenant's pre-login state without detection. Modern Keycloak / Authentik / public-trust CAs ship the discovery flag by default; legacy IdPs that don't advertise are unaffected (back-compat preserved). HOW. - internal/auth/oidc/service.go - providerEntry gains issParamSupported bool. - getOrLoad extends the discovery-claims read to include authorization_response_iss_parameter_supported, alongside the existing id_token_signing_alg_values_supported defense. - HandleCallback's signature gains callbackIss string at position 5. Step 2.5 runs after the state compare + provider load: when issParamSupported is true, an empty callbackIss returns ErrIssParamMissing; a present-but-mismatched value returns ErrIssParamMismatch (constant-time compare). - Two new sentinels: ErrIssParamMissing, ErrIssParamMismatch. ErrIssuerMismatch's doc-string clarified to note it covers the in-token leg only. - internal/api/handler/auth_session_oidc.go - OIDCAuthHandshaker.HandleCallback signature updated. - LoginCallback reads r.URL.Query().Get("iss") (no TrimSpace — byte-strict compare upstream) and threads it through. - classifyOIDCFailure: typed errors.Is dispatch for the three iss-family sentinels BEFORE the substring fall-through, so the three cases stay distinguishable in the audit row. - internal/api/handler/auth_session_oidc_test.go - stubOIDCSvc.HandleCallback bumped to 7-arg signature. - TestClassifyOIDCFailure extended with 5 new cases pinning the iss-family dispatch + a wrapped-error round-trip. - internal/auth/oidc/service_test.go - mockIdP gains advertiseIssParameterSupported bool; the /.well-known/openid-configuration handler emits the claim only when set (so existing tests stay back-compat). - 4 new regression tests: * MED17_NoSupport_AnyIssAccepted — provider doesn't advertise; arbitrary callbackIss is ignored (back-compat). * MED17_SupportButMissing — provider advertises; missing iss → ErrIssParamMissing. * MED17_SupportButMismatch — provider advertises; wrong iss → ErrIssParamMismatch (load-bearing mix-up defense). * MED17_SupportAndCorrect — provider advertises; matching iss → success path proves the gate isn't over-eager. - internal/auth/oidc/bench_test.go, internal/auth/oidc/logging_test.go, internal/auth/oidc/integration_keycloak_test.go - Mechanical: all existing HandleCallback call sites updated to pass "" for callbackIss (matches pre-fix behavior for IdPs that don't advertise support — the Keycloak integration suite tests will be re-evaluated once the Keycloak fixture is run against a realm with the discovery flag enabled). VERIFY. - go vet ./internal/auth/oidc/... ./internal/api/handler/... PASS - go test -short -count=1 ./internal/auth/oidc/... PASS (3.4s) - go test -short -count=1 ./internal/api/handler/... PASS (5.4s) - 4 new MED-17 regression tests + extended TestClassifyOIDCFailure pass. Refs: cowork/auth-bundles-audit-2026-05-10.md MED-17 cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md item 7 RFC 9207 — OAuth 2.0 Authorization Server Issuer Identification	2026-05-10 23:05:52 +00:00
shankar0123	e7c4654b16	harden(auth/session+oidc): 503/401 split + go-oidc string pin (LOW-6 + Nit-2) Audit 2026-05-10 — close LOW-6 + Nit-2 from the HANDOFF.md backend batch (items 8 + 9). LOW-6: introduce ErrSessionTransient sentinel in session.Service. session.Validate now distinguishes: - errors.Is(err, repository.ErrSessionNotFound) → ErrSessionInvalidCookie (401) - All other repo errors → ErrSessionTransient (503) The session middleware maps ErrSessionTransient to HTTP 503 with Retry-After: 1. Pre-fix, every DB hiccup looked like a forged-cookie 401 and forced the user to re-authenticate on a transient outage. Two new regression tests pin the wire shape: - TestService_Validate_TransientSessionGetError (service layer) - TestService_Validate_SessionNotFoundMapsToInvalidCookie (negative leg: not-found stays 401) - TestSessionMiddleware_TransientErrorMappedTo503 (middleware-level 503 + Retry-After header) Nit-2: isJWKSFetchError documentation now pins go-oidc/v3 v3.18.0 as the source-of-truth string set. v3.18.0 exposes only *oidc.TokenExpiredError as a typed error; JWKS-fetch failures bubble up as fmt.Errorf-wrapped strings. New regression test TestIsJWKSFetchError_GoOIDCV318Strings pins the canonical substrings emitted by go-oidc's jwks.go — a future upstream bump that changes the wording trips the test and forces the matcher to be re-derived. The test caught a real gap: 'oidc: failed to decode keys' (emitted when the IdP returns non-JSON at the jwks_uri — broken proxy, gateway HTML error page, etc.) was previously misclassified as a generic 500 instead of 503 ErrJWKSUnreachable. Added 'decode keys' substring to the matcher. Status: LOW-6 + Nit-2 marked CLOSED in audit-doc table. Refs: cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md items 8, 9 cowork/auth-bundles-audit-2026-05-10.md LOW-6, Nit-2	2026-05-10 22:41:19 +00:00
shankar0123	925523e06e	feat(oidc): Enabled toggle on OIDCProvider (MED-9) Audit 2026-05-10 Fix 13 Phase B — close MED-9. MED-4/5/6/7 deferred to v3. MED-9: ship the OIDCProvider.Enabled boolean. Pre-fix, the only way to take a provider offline during an incident was DELETE, which breaks active user_oidc_provider FK references and orphans any session that minted under the provider. Post-fix: - Migration 000042 adds enabled BOOLEAN NOT NULL DEFAULT TRUE. Default-true means existing pre-migration rows are all enabled post-deploy; no breaking-change window. - internal/auth/oidc/domain/types.go::OIDCProvider.Enabled ships the domain field with JSON tag 'enabled'. - Repository read/write paths (List, Get, GetByName, Create, Update) all carry the column. - internal/auth/oidc/service.go::HandleAuthRequest rejects with the new ErrProviderDisabled sentinel when cfgRow.Enabled=false. - cmd/server/main.go::oidcProvidersListAdapter.List filters disabled providers before constructing OIDCProviderInfo so the LoginPage's 'Sign in with X' buttons never render for offline IdPs. - Defense-in-depth: the ErrProviderDisabled service-layer check is the guard for direct API / MCP / CLI callers that bypass the GUI. Regression test: internal/auth/oidc/provider_enabled_test.go warms the entry cache via a successful HandleAuthRequest, flips cfgRow.Enabled=false on the cached entry, then asserts the next call returns ErrProviderDisabled (errors.Is). Test fixtures (newValidProvider, makeProvider) updated to set Enabled: true so existing tests stay green. Operators can toggle Enabled today via the existing PUT /api/v1/auth/oidc/providers/{id} body field. A dedicated GUI toggle on OIDCProviderDetailPage and a single-purpose PUT-just-enabled endpoint are deferred to the v3 GUI-polish bundle — the load-bearing wire is in place now. MED-4 (GUI advanced fields on edit), MED-5 (POST .../test endpoint + button), MED-6 (JWKS auto-refresh on cache-miss), MED-7 (JWKS health endpoint + GUI panel): DEFERRED to v3 with explicit annotations in the audit doc. Workarounds: MED-4 fields are PUT-editable via curl/MCP; MED-5 → call refresh post-create; MED-6 → call refresh manually on key rotation. Refs: cowork/auth-bundles-audit-2026-05-10.md MED-4, MED-5, MED-6, MED-7, MED-9 Spec: cowork/auth-bundles-fixes-2026-05-10/13-med-bundle.md Phase B	2026-05-10 21:59:17 +00:00
shankar0123	739745e9fe	fix(oidc): enforce AllowedEmailDomains allowlist in HandleCallback Closes CRIT-5 of the 2026-05-10 audit — the LAST Critical blocker for v2.1.0. The OIDCProvider.AllowedEmailDomains field shipped persisted (internal/auth/oidc/domain/types.go:47), API-surfaced (internal/api/handler/auth_session_oidc.go), MCP-surfaced (internal/mcp/tools_auth_bundle2.go), and GUI-editable, but the verifier in internal/auth/oidc/service.go::HandleCallback NEVER read it. Operators filling allowed_email_domains: ["acme.com"] expected "users outside acme.com cannot log in" — the field had zero effect. Textbook lying-field shape per CLAUDE.md's "complete path" rule. This commit: - Adds Step 7.5 to HandleCallback (between profile-claim resolve and group-claim resolve): when the provider's AllowedEmailDomains slice is non-empty, the user's email-domain MUST match a list entry (case- insensitive exact match; subdomains NOT auto-accepted — operators who want dev.acme.com authorized must list it explicitly). - Two new sentinel errors at the package level: - ErrEmailDomainNotAllowed — email is set but domain not in list - ErrEmailMissingButRequired — allowlist set + ID token has no email - New extractEmailDomain helper: case-folds + trims whitespace + uses LastIndex for the @ split + rejects empty input / no-@ / empty local-part / empty domain-part. Returns the lowercase domain or an error. - 21 regression tests in internal/auth/oidc/email_domain_test.go: - 10 extractEmailDomain shape cases (plain, mixed-case input, leading/trailing whitespace, subdomain preserved, empty, no @, empty local-part, empty domain-part, multiple @ via LastIndex). - 11 match-semantic cases (empty list passes any, lowercase match, mixed-case allowlist entry match, mixed-case email match, whitespace-padded allowlist entry, unmatched returns ErrEmailDomainNotAllowed, missing email + non-empty allowlist returns ErrEmailMissingButRequired, subdomain NOT auto-accepted, parent-domain NOT auto-accepted, multi-entry first-match, multi-entry no-match). Subdomain matching (alice@dev.acme.com against allowlist=[acme.com]) is intentionally NOT auto-accepted. The audit's MED-line tracks the wildcard / suffix support story for v3; v2.1 ships strict. Verification gate green: - gofmt clean - go vet clean - go test -short -count=1 ./internal/auth/oidc/... ./internal/api/... ./internal/domain/auth/ — all pass (incl. existing OIDC service test suite, the 4 BCL tests, the auditor pin, and the AST RBAC-gate coverage guard). Branch dev/auth-bundle-2 status post-commit: CRIT-1 (`68ca42f`), CRIT-2 (`ca1e135`), CRIT-3 (`00eace8`), CRIT-4 (`f1d9771`), CRIT-5 (this) — all five Criticals from the 2026-05-10 audit closed. v2.1.0 is unblocked. HIGH-1..HIGH-12 + MEDs + LOWs are independently mergeable follow-ups (spec at cowork/auth-bundles-fixes-2026-05-10/). Refs: cowork/auth-bundles-audit-2026-05-10.md CRIT-5	2026-05-10 20:30:32 +00:00
shankar0123	1d01c87663	auth-bundle-2 Phase 7 + Phase 7.5: OIDC first-admin bootstrap + break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility) Phase 7 — OIDC first-admin bootstrap (Decision 3): - Optional AdminBootstrapHook closure on oidc.Service. When wired, HandleCallback consults the hook AFTER group resolution + user upsert and BEFORE the empty-mapping fail-closed check. Hook receives (providerID, groups, userID); returns grantAdmin=true when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no admin exists yet in the tenant. - cmd/server/main.go wires the hook as a closure that: Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured). * Probes AdminExists via authActorRoleRepo (admin-already-exists silently returns false; bootstrap mode is one-shot per tenant). * Walks group intersection. * On match: grants r-admin via authActorRoleRepo.Grant + emits the bootstrap.oidc_first_admin audit row with event_category=auth + INFO log. - Coexists with the Bundle 1 env-var-token bootstrap. Both paths can be configured; first match wins (admin-existence probe short-circuits the second). - HandleCallback's empty-mapping fail-closed check moved AFTER the hook so a fresh deployment with zero group_role_mappings can still mint the first admin. - 5 tests in service_test.go: hook grants admin on match, hook returns false preserves empty-mapping fail-closed, admin-already- exists silently falls through to normal mapping, hook-error wraps + bubbles, idempotent when admin is already in the mapped role set. Phase 7.5 — Break-glass admin (Decision 4, default-OFF): Migration 000038 ships: - breakglass_credentials table — at-most-one-credential-per-actor (UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout state machine (failure_count, locked_until, last_failure_at). FK CASCADE on users(id) so deleting a user atomically removes their credential. - Two new permissions seeded into r-admin only: auth.breakglass.admin — set/rotate/unlock/remove credentials. auth.breakglass.login — actor uses break-glass to log in. CanonicalPermissions extended in lockstep. internal/auth/breakglass/service.go (~580 LOC): - Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED. - SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4, salt=16 random bytes, output=32 bytes); per-password random salt; PHC-format hash output. Min 12 / max 256 byte input. - Authenticate: constant-time-compare via subtle.ConstantTimeCompare on every code path. Identical 401 + identical timing across the wrong-password / locked-account / non-existent-actor paths so an attacker cannot probe whether a given actor has break-glass configured. Non-existent-actor + locked-account paths run a verifyDummy() Argon2id pass for timing parity. Lockout state machine: failure_count++ on every wrong attempt; threshold (default 5) trips locked_until = NOW() + duration (default 15m). Successful Authenticate resets the counter. Reset-window: failures aged out after CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h) auto-reset on next attempt. - Unlock + RemoveCredential: admin-only (auth.breakglass.admin gated at the router via rbacGate). Audit rows on every operation. - All public methods refuse to act when Enabled()==false (returns ErrDisabled; the handler maps to HTTP 404 — surface invisibility). internal/repository/postgres/breakglass.go ships the 5-method postgres impl with atomic single-statement IncrementFailure (so concurrent racing wrong-password attempts can't observe an intermediate state and slip past the threshold) and idempotent ResetFailureCount. internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP surface: - POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per source IP via the existing rate limiter; returns 404 when disabled). On success sets the post-login session cookie + CSRF cookie via SessionService.Create + 204. On any failure: uniform 401 + identical timing (the service has already audited the specific failure category). - POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin) - POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock (auth.breakglass.admin) - DELETE /api/v1/auth/breakglass/credentials/{actor_id} (auth.breakglass.admin) Admin endpoints share the surface-invisibility property: when CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns 404 (not 403) so probing via the admin surface gets the same signal as probing the login endpoint. Tests (internal/auth/breakglass/service_test.go): All 8 Phase 7.5 spec-mandated negative cases: 1. Service.Enabled()==false → all ops return ErrDisabled. 2. Wrong password → ErrInvalidCredentials, failure_count++, audit row with event_category=auth. 3. Failure_count exceeds threshold → locked, subsequent attempts (including with the CORRECT password) return identical-shape 401 while the lockout window holds. 4. Lockout window expires → next attempt with correct password succeeds + resets the counter. 5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword. 6. Password leak hygiene — the service has zero slog calls; the audit-row map literal never includes the password plaintext. 7. Argon2id hash never appears in logs OR API responses — pinned by `json:"-"` tag on BreakglassCredential.PasswordHash + a belt-and-braces json.Marshal probe asserting the hash bytes never appear in the marshaled output. 8. Constant-time-compare verified via timing-statistical test — wrong-password vs no-credential paths take statistically indistinguishable time (within 5x ratio). The verifyDummy() hash compute on the no-credential + locked paths is what keeps timing parity; absent that, an attacker could side- channel "actor doesn't have a credential" via timing. Plus coverage-lift batch covering: SetPassword first-time vs rotate, no-caller-id rejection, no-target-id rejection, RNG failure surface, Authenticate happy-path mints session, no-credential audit row, session-mint-failure surface, FailureResetInterval recycle, Unlock + RemoveCredential happy paths, hash-format unit tests (round-trip, mismatch, malformed/wrong-version/bad-base64 formats), nil-audit + nil-session pass-through. Coverage on internal/auth/breakglass/ at 91.5% per-statement (above the Phase 7.5 spec ≥ 90% floor). cmd/server/main.go wiring: - Constructs breakglassRepo + breakglassService + breakglassHandler after the OIDC service block. - breakglassSessionMinterAdapter shim bridges *session.Service.Create to the breakglass.SessionMinter port. - Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator visibility for the deliberate SSO-bypass). internal/config/config.go gains: - AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID). - AuthConfig.Breakglass nested struct with 4 env vars (CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION + LOCKOUT_RESET_INTERVAL). Router wiring: - 4 new breakglass routes registered when reg.AuthBreakglass != nil; public login route via direct r.mux.Handle (auth-exempt), 3 admin routes via r.Register + rbacGate(auth.breakglass.admin). - POST /auth/breakglass/login pinned in AuthExemptRouterRoutes allowlist with Phase 7.5 justification. - SpecParityExceptions extended with 4 new entries documenting the Phase 7.5 deferral of full per-endpoint OpenAPI rows (handler doc-block at the top of auth_breakglass.go is the operator-facing reference). Threat model (encoded in service.go + auth_breakglass.go doc-blocks + migration 000038 docstrings, to be promoted to docs/operator/auth- threat-model.md in Phase 12): - Break-glass is a deliberate bypass of the SSO security boundary. An attacker who phishes the password OR finds it in a compromised password manager bypasses MFA, OIDC, and every group-claim gate. - Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady- state. Enable only during SSO-broken incidents. Disable after recovery. - WebAuthn pairing (v3 per Decision 12) is the load-bearing second factor. Without it, break-glass is best treated as an emergency- only path. - Audit trail surfaces every break-glass action under event_category=auth; the auditor role can monitor for unexpected break-glass logins. Verifications: gofmt clean, go vet clean across all touched packages, go test -short -count=1 green across internal/auth/oidc (3.0s; new Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives), internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage batch passing), internal/config + internal/domain/auth + internal/api/ router + internal/api/handler all green, no regressions in Bundle 1 packages.	2026-05-10 06:51:41 +00:00
shankar0123	854135dfb7	auth-bundle-2 Phase 3: OIDC service (HandleAuthRequest, HandleCallback, RefreshKeys), hand-rolled group-claim resolver, 21+ negative-test matrix, token-leak hygiene, IdP downgrade-attack defense Phase 3 of the bundle ships the business logic that turns the Phase 2 storage primitives into a working OpenID Connect 1.0 + RFC 7636 PKCE authorization-code flow against any enterprise IdP (Okta / Azure AD / Google Workspace / Keycloak / Authentik / Auth0). Service surface: - Service.HandleAuthRequest(providerID) -> authURL, cookie, preLoginID Builds the IdP redirect with PKCE-S256 (mandatory; RFC 9700 §2.1.1), server-generated 32-byte state + nonce, persisted to the pre-login row keyed by the cookie value. - Service.HandleCallback(cookie, code, state, ip, ua) -> CallbackResult 11-step validation: pre-login lookup-and-consume (single-use), constant-time state compare, code-for-token exchange with PKCE verifier, ID-token verify (alg pin via go-oidc/v3), service-layer re-checks of iss / aud / azp (multi-aud requires it; mismatch rejected) / at_hash (REQUIRED when access_token returned — Phase 3 lifts the OIDC core "MAY" to a service-level "MUST") / exp / iat-window / nonce, group-claim resolution with userinfo fallback, group->role mapping (fail-closed on no match), user upsert, session mint via SessionMinter port. - Service.RefreshKeys(providerID) — explicit cache eviction + re-load. Re-runs the IdP downgrade-attack defense so a provider that later rotates to advertising HS / none is caught BEFORE the next user login attempt. Security posture (every fail-closed branch is a sentinel error + test): - Algorithm pinning: allow-list {RS256, RS512, ES256, ES384, EdDSA}; deny-list {HS256, HS384, HS512, none}. Belt-and-braces re-check via isDisallowedAlg after go-oidc.Verify. - PKCE-S256 mandatory (oauth2.GenerateVerifier + S256ChallengeOption); `plain` rejection sentinel exists for defense-in-depth. - State + nonce: 32-byte crypto/rand, base64url-no-pad, constant-time compare, single-use. - IdP downgrade-attack defense: at provider creation / RefreshKeys, reject any IdP whose discovery doc advertises HS* / none in id_token_signing_alg_values_supported. - JWKS fail-closed: in-flight login fails 503; existing sessions untouched. isJWKSFetchError detects the gooidc verify-error shape; ErrJWKSUnreachable is the wire mapping. - Token-leak hygiene: ID tokens, access tokens, refresh tokens, authorization codes, PKCE verifiers, state, nonce, signing key bytes — NEVER logged at any level. logging_test.go pins the invariant via a slog buffer + grep-assert across HandleAuthRequest, HandleCallback, alg rejection, and provider-load paths. Group-claim resolver (internal/auth/oidc/groupclaim/): - Hand-rolled per Decision 10 (no JSON-path lib; ~150 LOC). - URL-shape paths (https:// / http://) treated as a single literal key — Auth0 namespaced claims like https://your-namespace/groups work without splitting on the dots in the URL. - Dot-separated paths walked through nested map[string]interface{}. - []interface{} / []string / single-string normalized to []string; bool / number / object / nil → fail closed. - 18 unit tests + sentinels (ErrPathEmpty, ErrSegmentMissing, ErrSegmentNotObject, ErrInvalidValueType). Test surface: - service_test.go: 57 test functions including all 21 prompt-mandated negative cases (wrong aud / wrong iss / expired / unknown alg / alg=none / HMAC alg / azp missing on multi-aud / azp mismatched / at_hash missing / at_hash mismatched / iat in future / iat too old / nonce mismatched / state mismatched / state replayed / PKCE plain sentinel / pre-login replay / forged cookie / IdP downgrade / group-claim missing / group-claim unmapped) plus the userinfo fallback matrix (happy path + endpoint-missing + endpoint-failing + userinfo-also-empty), HandleAuthRequest entry point + RNG-failure paths, upsertUser update + create + display-name fallback + Validate-error paths, decryptClientSecret real-encrypt round-trip + bad-passphrase, alg-parser malformed-header matrix. - logging_test.go: 4 hygiene tests pinning no token / code / verifier / state / cookie / client_secret / alg name appears in any captured log line. - groupclaim/resolver_test.go: 18 cases covering Okta string-array, Keycloak realm_access.roles, Auth0 namespaced URL claim, single-string normalization, deeply-nested 3-segment walks, and every fail-closed branch. Coverage: internal/auth/oidc 92.2% (floor: 90) internal/auth/oidc/groupclaim 100.0% (floor: 95) internal/auth/oidc/domain 96.2% (floor: 90) Coverage gates added at .github/coverage-thresholds.yml so a future regression in any fail-closed branch fails CI before the commit lands. Phase 3 of cowork/auth-bundle-2-prompt.md is closed. Next up: Phase 4 (Session service: cookies, revocation, sliding-vs-absolute expiry).	2026-05-10 04:56:03 +00:00

16 Commits