mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 12:41:30 +00:00
harden(oidc): JWKS auto-refresh on kid-not-in-cache (MED-6)
Audit 2026-05-10 MED-6 closure.
WHAT.
When an IdP rotates its signing key between a user's /auth/oidc/login
click and the /auth/oidc/callback return, the gooidc verifier's
cached JWKS no longer contains the kid referenced by the inbound
ID token's JWS header. Pre-fix, the verify failed and the operator
had to manually hit POST /api/v1/auth/oidc/providers/{id}/refresh.
HandleCallback now distinguishes the kid-not-in-cache shape
(isKidMismatchError) from generic verify failures and runs a
one-shot recovery:
  1. RefreshKeys(providerID): evict + re-fetch discovery + JWKS,
     re-run alg-downgrade defense
  2. getOrLoad(providerID): refresh the cached providerEntry
  3. verifier.Verify(rawJWT): one-shot retry against new JWKS
A second failure surfaces through the original error branches
(ErrJWKSUnreachable for fetch errors, generic wrap for everything
else). NO retry loop — bounded recovery only.
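
The three-step recovery above can be sketched roughly as follows. This is a minimal illustration, not the code in service.go: the `verifier` and `keyStore` interfaces, the `fakeStore`/`fakeVerifier` types, and the `verifyWithOneShotRefresh` and `isKidMismatch` names are hypothetical stand-ins, the toy treats the raw token as if it were its own kid, and the error mapping (ErrJWKSUnreachable vs. generic wrap) is simplified to a plain wrap.

```go
package main

import (
	"fmt"
	"strings"
)

// verifier and keyStore are hypothetical stand-ins, not the real
// certctl or go-oidc types.
type verifier interface {
	Verify(rawJWT string) (subject string, err error)
}

type keyStore interface {
	RefreshKeys(providerID string) error
	getOrLoad(providerID string) (verifier, error)
}

// isKidMismatch is a simplified placeholder for isKidMismatchError.
func isKidMismatch(err error) bool {
	msg := err.Error()
	return strings.Contains(msg, "kid ") && strings.Contains(msg, "not found")
}

// verifyWithOneShotRefresh verifies once, and on a kid mismatch refreshes
// the keys and retries exactly once. There is no loop.
func verifyWithOneShotRefresh(ks keyStore, providerID, rawJWT string) (string, error) {
	v, err := ks.getOrLoad(providerID)
	if err != nil {
		return "", err
	}
	subject, err := v.Verify(rawJWT)
	if err == nil {
		return subject, nil
	}
	if !isKidMismatch(err) {
		// Generic failures (e.g. a bad signature) never trigger a refresh.
		return "", fmt.Errorf("verify id_token: %w", err)
	}
	if rerr := ks.RefreshKeys(providerID); rerr != nil {
		return "", fmt.Errorf("refresh keys: %w", rerr)
	}
	if v, err = ks.getOrLoad(providerID); err != nil {
		return "", err
	}
	if subject, err = v.Verify(rawJWT); err != nil {
		return "", fmt.Errorf("verify id_token after refresh: %w", err)
	}
	return subject, nil
}

// fakeStore simulates a cached JWKS that lags behind the IdP's current key.
type fakeStore struct {
	cachedKid string // kid held in the cached JWKS
	idpKid    string // kid the IdP signs with right now
	refreshes int    // number of RefreshKeys calls observed
}

func (s *fakeStore) RefreshKeys(providerID string) error {
	s.refreshes++
	s.cachedKid = s.idpKid
	return nil
}

func (s *fakeStore) getOrLoad(providerID string) (verifier, error) {
	return fakeVerifier{kid: s.cachedKid}, nil
}

type fakeVerifier struct{ kid string }

// Verify treats rawJWT as the token's kid; "forged" simulates a bad signature.
func (v fakeVerifier) Verify(rawJWT string) (string, error) {
	if rawJWT == "forged" {
		return "", fmt.Errorf("invalid signature")
	}
	if rawJWT != v.kid {
		return "", fmt.Errorf("kid %q not found", rawJWT)
	}
	return "subject-for-" + rawJWT, nil
}

func main() {
	s := &fakeStore{cachedKid: "k1", idpKid: "k2"}
	subject, err := verifyWithOneShotRefresh(s, "provider-1", "k2")
	fmt.Println(subject, err, s.refreshes)
}
```

In this sketch a token signed with the rotated key succeeds after a single refresh, while a forged token fails without touching the refresh path.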
WHY.
Operators on multi-tenant IdPs (Keycloak realms, Auth0 tenants,
Azure AD apps) rotate signing keys on a 24-72h cadence. Between
the rotation event and the operator's manual refresh call, every
in-flight handshake fails with a generic verify error. The fix is
both a UX improvement (auto-recovery, no operator intervention)
AND a security improvement (the audit row now distinguishes
'transient rotation race' from 'genuine forgery attempt' via the
prelogin_kid_mismatch_recovered category vs generic id_token verify
failures).
HOW.
internal/auth/oidc/service.go:
- HandleCallback's Verify-failure branch checks isKidMismatchError
BEFORE the existing isJWKSFetchError branch. On match, runs
RefreshKeys + getOrLoad + verifier.Verify exactly once. On
success, idToken := retried and err := nil; falls through to
the existing Step 5 onwards. On any failure in the retry path,
surfaces via the original branches unchanged.
- isKidMismatchError matcher: pinned go-oidc/v3 v3.18.0 substrings
('kid .* not found', 'signing key .* not found', 'no matching
key', 'key with id .* not found'). Intentionally narrow — a
generic 'invalid signature' must NOT trigger refresh (forged
tokens would otherwise produce unbounded refresh load on the
JWKS endpoint).
internal/auth/oidc/service_test.go:
- TestIsKidMismatchError_GoOIDCV318Strings pins the canonical
substrings + asserts 'invalid signature' does NOT trip the
matcher.
- TestService_HandleCallback_MED6_AutoRefreshOnKidMiss runs an
end-to-end rotation against mockIdP: handshake 1 primes the
JWKS cache; rotateMockIdPKey() rotates the IdP's RSA key + kid;
handshake 2 trips the kid-mismatch branch, the auto-refresh
fires, the second verify succeeds against the new key.
VERIFY.
- go vet ./internal/auth/oidc/... PASS
- go test -short -count=1 -run 'MED6|KidMismatch'
./internal/auth/oidc/... PASS (2/2)
- go test -short -count=1 ./internal/auth/oidc/... PASS (4.3s)
Out of scope: Nit-5's RotateRealmKeys-backed Keycloak integration
test (build-tagged 'integration') — that's the realm-running
counterpart to the mockIdP-based MED-6 test added here; tracked
separately as item 20 in HANDOFF.md.
Refs: cowork/auth-bundles-audit-2026-05-10.md MED-6
cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md item 3
@@ -34,6 +34,20 @@
 RFC-9207 discovery. Providers that don't advertise support (the majority
 today) keep pre-fix behavior — back-compat is preserved.
 
+- **JWKS auto-refresh on cache-miss (Audit 2026-05-10 MED-6).** When
+the IdP rotates its signing key between pre-login + callback, the
+cached JWKS no longer contains the kid referenced by the inbound ID
+token's JWS header. Pre-fix, the verify failed with a generic error
+and the operator had to manually call `POST
+/api/v1/auth/oidc/providers/{id}/refresh`. The service now detects
+the kid-not-in-cache shape (`isKidMismatchError`) and runs a
+one-shot `RefreshKeys` (evict cache → re-fetch discovery + JWKS →
+re-run alg-downgrade defense) before retrying the verify exactly
+once. Bounded recovery: a second failure surfaces as
+`ErrJWKSUnreachable` per the original branches; no retry loop. A
+separate matcher (`isKidMismatchError`) is intentionally narrow
+so generic signature failures don't trigger refresh.
+
 - **OIDC provider test endpoint (Audit 2026-05-10 MED-5).** New
 `POST /api/v1/auth/oidc/test` dry-runs an OIDC provider configuration
 without persisting: fetches the discovery doc, runs the alg-downgrade