Phase-10 live-IdP smoke (Keycloak 26.x via testcontainers-go) revealed
the IdP-bind alg-downgrade check was too strict for real-world IdPs.
6 of the integration tests in internal/auth/oidc/integration_keycloak*_test.go
were failing with:
oidc: IdP advertises weak signing algorithms (HS*/none);
refusing to use as defense against downgrade attacks: HS256
Keycloak 26.x (and several other real-world IdPs — Auth0 when HS-mode is
enabled, some Authentik configs) advertise EVERY alg they're capable of
in the discovery doc's id_token_signing_alg_values_supported field, even
when the realm only signs with RS256 in practice. Pre-fix the IdP-bind
check refused on ANY HS* or 'none' advertisement → no real Keycloak deploy
could ever bind a provider row, hence the integration-test failures.
The strict-deny check was defense-in-depth on top of the load-bearing
per-token alg-pin at sig-verify time (isDisallowedAlg, service.go L1177):
that check rejects every ID token whose JWS header carries an alg outside
DefaultAllowedAlgs, regardless of what the discovery doc advertises.
A forged HS256 token signed with the IdP's RS256 pubkey as HMAC secret
is rejected at sig-verify time → the actual algorithm-confusion attack
is closed by the per-token pin, NOT by the discovery-doc check.
Fix: relax the IdP-bind check to refuse only when the intersection of
advertised vs DefaultAllowedAlgs is EMPTY (the pathological all-weak-alg
IdP case). Keycloak (RS256 + HS256 advertised) now binds successfully;
an HS-only IdP still fails closed.
Changes:
- internal/auth/oidc/service.go: rewrite the alg-check loop at L1067 in
getOrLoad / RefreshKeys to compute the intersection set; refuse only
when no acceptable alg is advertised. ErrIdPDowngradeAdvertised
docstring updated to reflect new contract. DefaultAllowedAlgs
docstring + the package-level design-comment block at L40-72 updated
with v2.1.0-relaxed semantics callouts.
- internal/auth/oidc/test_discovery.go: TestDiscovery dry-run validator
rewritten to surface HS*/none alongside RS* as an informational note
('note: IdP advertises weak algorithms %v alongside acceptable ones')
rather than a hard-fail error. HS-only / none-only still hard-fails.
- internal/auth/oidc/service_test.go: TestService_IdPDowngradeDefense_*
tests updated. Renamed:
  - RejectsHSAdvertised → RS256PlusHS256_BindsSuccessfully (positive)
  - RejectsNoneAdvertised → RejectsHSOnlyAdvertised (intersection-empty)
  - RefreshKeys_CatchesPostLoadDowngrade rotated to HS-only post-load
- internal/auth/oidc/coverage_fill_test.go: TestTestDiscovery_AlgDowngradeDetected
split into _HS256AlongsideRS256_BindsWithNote (positive, asserts note
but no hard-fail) + _HSOnly_StillTrips_HardFail (intersection-empty).
- docs/operator/auth-threat-model.md: OIDC token-validation alg-allow-list
section rewritten to call out the load-bearing-defense hierarchy
(per-token pin first, IdP-bind check defense-in-depth) and document
the v2.1.0 relaxation rationale.
- CHANGELOG.md: ### Security entry under Unreleased.
Verify: go test ./internal/auth/oidc/ -short PASS; gofmt clean; go vet
clean. The Keycloak integration tests should now pass when the operator
re-runs 'make keycloak-integration-test'.
Authentication & authorization threat model
Last reviewed: 2026-05-10
This document describes the attack surface around authentication and
authorization in certctl after Bundle 1 (the RBAC primitive) AND Bundle
2 (OIDC + sessions + back-channel logout + break-glass) land. It
complements rbac.md and the per-IdP runbooks at
oidc-runbooks/index.md - those docs
explain how to USE the controls; this one explains what those controls
defend against and which threats they explicitly do NOT close.
The post-Bundle-2 attack surface is meaningfully wider than Bundle 1's: Bundle 1 closed the API-key axis (one credential type, one validation path); Bundle 2 adds OIDC-federated humans, session cookies with length-prefixed HMAC + CSRF, back-channel logout, OIDC first-admin bootstrap, and a default-OFF break-glass admin path. Each surface brings its own threat catalogue + mitigations, documented below alongside the Bundle 1 ones.
Threat actors
- External attacker with no credential - probing the public HTTP surface. The default trust boundary for everything except the protocol-level endpoints (ACME / SCEP / EST / OCSP / CRL, which authenticate via embedded credentials per their own RFCs).
- Authenticated caller with the wrong role - has a valid API key but the role doesn't grant the requested operation. The primary RBAC threat model.
- Compromised API key - attacker holds a valid Bearer token that an honest operator originally provisioned. The key may carry any role.
- Insider operator - legitimate access; potentially trying to escalate privilege or bypass the approval workflow.
- Compromised audit reviewer (auditor role) - read-only access to audit events but otherwise untrusted.
The following actors are NEW with Bundle 2:
- OIDC-federated end user - authenticates via the organization's IdP (Keycloak / Okta / Auth0 / Entra ID / Authentik / Workspace-via-broker). The user's credential lives at the IdP; certctl never sees it. Attack vectors center on token forgery, session hijacking, and group-claim manipulation.
- Stolen session cookie holder - attacker holds a valid certctl_session cookie value (typically via XSS, network MITM, or a developer who pasted a token into a chat / pastebin). The attacker can make requests as the legitimate user until the cookie expires (idle 1h / absolute 8h defaults) or is revoked.
- Compromised IdP - the upstream IdP itself is rogue: signs tokens for arbitrary users, mints groups arbitrarily, etc. Largely out of certctl's control; mitigations are bounded to "the audit trail records the source provider on every login, blast radius is bounded by group_role_mapping configured for that provider."
- Break-glass-password holder (Phase 7.5 path) - operator with the local Argon2id password set up for SSO outages. Bypasses the OIDC + group-claim layer entirely. The default-OFF posture is the load-bearing mitigation; once enabled the password is the entire attack surface.
Defenses Bundle 1 ships
API-key authentication
- API keys live in CERTCTL_API_KEYS_NAMED (env-var) or api_keys (DB row, written by Bundle 1 Phase 6 bootstrap and the future role-management API). Keys hash via SHA-256; the middleware compares hashes via crypto/subtle.ConstantTimeCompare to defeat timing attacks.
- The auth middleware populates ActorIDKey / ActorTypeKey / TenantIDKey on every authenticated request context. Audit rows attribute every action to the named-key actor instead of the pre-Bundle-1 hardcoded api-key-user placeholder.
- Demo mode (CERTCTL_AUTH_TYPE=none) injects the synthetic actor-demo-anon actor with admin grants. Production deploys MUST NOT use demo mode.
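The hash-then-compare step described above can be sketched as follows. This is a minimal illustration, not certctl's middleware: the names hashKey and matchKey are hypothetical, and only the use of SHA-256 and crypto/subtle.ConstantTimeCompare comes from the text.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
)

// hashKey returns the SHA-256 digest of an API key (hypothetical helper).
func hashKey(key string) [sha256.Size]byte {
	return sha256.Sum256([]byte(key))
}

// matchKey hashes the presented key and compares it to the stored digest
// in constant time, so the comparison does not leak how many leading
// bytes matched.
func matchKey(stored [sha256.Size]byte, presented string) bool {
	got := hashKey(presented)
	return subtle.ConstantTimeCompare(stored[:], got[:]) == 1
}
```

Because both inputs to the comparison are fixed-length digests, the comparison time does not depend on the length of the presented key either.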
Authorization (RBAC)
- Every gated handler routes through auth.RequirePermission (or the router-level rbacGate wrap from Phase 3.5). The middleware resolves the actor's effective permissions via the Authorizer.CheckPermission service-layer call; on miss, the handler returns HTTP 403 BEFORE the body runs. This is the load-bearing gate.
- The five admin-only fine-grained perms (cert.bulk_revoke / crl.admin / scep.admin / est.admin / ca.hierarchy.manage) are seeded into r-admin only. To delegate one, an operator creates a custom role with the specific perm and grants it to the right actor.
- The auditor split: r-auditor holds only audit.read + audit.export. Pinned by the internal/domain/auth/auditor_test.go invariants. A regulator with the auditor key cannot read certificates, profiles, issuers, or any mutating surface.
- The privilege-escalation guard: granting or revoking a role requires the caller to hold auth.role.assign (enforced in internal/service/auth/actor_role_service.go). A non-admin cannot self-grant admin.
- The reserved-actor guard: mutations against actor-demo-anon return HTTP 409 from the service layer (ErrAuthReservedActor). The synthetic actor is operator-inaccessible.
Day-0 bootstrap
- CERTCTL_BOOTSTRAP_TOKEN is constant-time-compared by EnvTokenStrategy.Validate. The strategy is one-shot via sync.Mutex-guarded consumed bool; the second call returns ErrDisabled (HTTP 410), not ErrInvalidToken (HTTP 401), so a probing attacker cannot distinguish "wrong token, retry" from "already consumed".
- The strategy also re-probes admin existence on every Validate. If an admin actor lands during the gap between Available and Validate, the second caller still gets HTTP 410.
- The minted plaintext key is written to the response body once. It is NEVER logged. The token-leak hygiene test in internal/api/handler/auth_bootstrap_test.go redirects slog.Default to a buffer and grep-asserts that neither the bootstrap token nor the minted key appears in any log line, audit row, or HTTP header.
- The minted key is hashed before persistence. Lost key → rotate via the regular RBAC API; the plaintext is not recoverable from the DB.
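The one-shot behavior described above can be sketched as a small mutex-guarded type. This is an assumption-laden illustration: oneShotToken, errDisabled, errInvalidToken, and the adminExists callback are hypothetical stand-ins, not the real EnvTokenStrategy.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"sync"
)

// Hypothetical sentinels standing in for ErrInvalidToken / ErrDisabled.
var (
	errInvalidToken = errors.New("bootstrap: invalid token")
	errDisabled     = errors.New("bootstrap: disabled")
)

// oneShotToken is a sketch of a consume-once bootstrap token.
type oneShotToken struct {
	mu          sync.Mutex
	token       []byte
	consumed    bool
	adminExists func() bool // assumed probe; re-run on every Validate
}

// Validate accepts the token at most once. After consumption, or once an
// admin exists, every call returns errDisabled regardless of the input.
func (s *oneShotToken) Validate(presented string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.consumed || s.adminExists() {
		return errDisabled
	}
	if subtle.ConstantTimeCompare(s.token, []byte(presented)) != 1 {
		return errInvalidToken
	}
	s.consumed = true
	return nil
}
```

Holding the mutex across both the check and the flag flip is what makes the second of two concurrent callers see the consumed state.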
Approval workflow + Phase 9 loophole closure
- CertificateProfile.RequiresApproval=true gates two surfaces: (a) issuance + renewal of every cert pointing at the profile, (b) edits to the profile itself (Bundle 1 Phase 9). The Phase 9 closure prevents the flip-flop bypass where an admin disables approval, mutates, re-enables.
- Same-actor self-approve is rejected at the service layer with ErrApproveBySameActor for both cert_issuance and profile_edit kinds. Two-person integrity is the load-bearing invariant; pinned by tests in internal/service/approval_test.go.
Audit trail
- Every mutating operation flows through AuditService.RecordEvent or RecordEventWithCategory. Bundle 1 Phase 8 added the event_category column with a CHECK constraint enforcing the closed enum (cert_lifecycle / auth / config); the category surfaces the auth-mutation slice to the auditor view.
- The WORM trigger from migration 000018 (audit_events_worm_trigger) blocks UPDATE and DELETE at the database layer. Even an admin DB user cannot tamper with audit history without dropping the trigger.
- Bundle-6's redactor (internal/service/audit_redact.go) scrubs credentials + PII from the details JSONB before persistence; an _redacted_keys field surfaces what the redactor took out for compliance review.
Protocol-endpoint allowlist
ACME / SCEP / EST / OCSP / CRL endpoints authenticate via
embedded credentials defined by their own RFCs (JWS-signed,
challenge passwords, mTLS, public-by-RFC). The auth middleware
explicitly bypasses these via IsProtocolEndpoint. The Phase 12
internal/api/router/phase12_protocol_allowlist_test.go pins
the invariant at three layers (middleware bypass, allowlist
constant, router-level no-rbacGate-wraps-protocol-paths).
Defenses Bundle 2 ships
OIDC token validation (Phase 3)
- Algorithm allow-list, never none, never HMAC. The service-layer pinning lives in internal/auth/oidc/service.go (disallowedAlgs, isDisallowedAlg). The per-token alg check at sig-verify time (isDisallowedAlg, line ~1177) is the load-bearing defense: every ID token whose JWS header carries an alg outside the allow-list (RS256 / RS512 / ES256 / ES384 / EdDSA) is rejected with ErrAlgRejected. coreos/go-oidc additionally enforces the allow-list per-token at verify time as defense-in-depth against an upstream library regression. The IdP-downgrade-attack secondary defense at provider creation / RefreshKeys (v2.1.0-relaxed semantics) intersects the IdP's advertised id_token_signing_alg_values_supported with the allow-list and rejects only when the intersection is EMPTY, i.e., the IdP advertises NO acceptable alg. Pre-v2.1.0 the check strict-denied on ANY HS* / none advertisement; that broke against Keycloak 26.x (which lists every alg it's capable of in its discovery doc, including HS*, even when the realm only signs with RS256). The relaxation is safe because the per-token alg pin already prevents a real algorithm-confusion attack: a forged HS256 token using the IdP's RS256 pubkey as HMAC secret is rejected at sig-verify regardless of what the discovery doc advertises. Operators worried about a compromised IdP rotating to weak algs without rotating its certctl provider config get defense-in-depth from JWKSStatus + the alert hooks in the GUI panel.
- Exact iss match. ID-token iss claim must equal the configured OIDCProvider.IssuerURL byte-for-byte (sentinel ErrIssuerMismatch). A token from a different IdP - even one with the same aud - cannot ride a misconfigured provider row.
- aud + azp checks. Service-layer re-verification of the audience claim (must include client_id) plus the azp claim for multi-aud tokens (per OIDC core §3.1.3.7 step 5; sentinels ErrAudienceMismatch, ErrAZPRequired, ErrAZPMismatch). An attacker with a token issued for a different client cannot replay it against certctl.
- at_hash REQUIRED when access_token is present. OIDC core treats at_hash as a "MAY"; certctl tightens to "MUST" (ErrATHashRequired). A substituted access token cannot ride alongside a clean ID token through the verifier.
- Single-use state + nonce. Both 32-byte random server-generated values, persisted in the pre-login row keyed by the cookie. The pre-login row is consumed via DELETE...RETURNING on lookup (atomic single-use). subtle.ConstantTimeCompare on both. State replay returns ErrPreLoginNotFound; nonce mismatch returns ErrNonceMismatch.
- PKCE-S256 mandatory. RFC 9700 §2.1.1 requires PKCE on auth-code; certctl hard-codes S256 via oauth2.GenerateVerifier + oauth2.S256ChallengeOption. The plain method is not just unsupported - the ErrPKCEPlainRejected sentinel exists so a future regression that surfaces a plain path trips a test.
- iat window. Configurable per-provider (default 300s, capped at 600s by the domain validator). Defends against clock-skew attacks where an attacker submits a stale-but-valid token.
- JWKS rotation handled transparently by coreos/go-oidc's built-in cache, plus the operator-triggered Service.RefreshKeys for forced refresh (and the auto-refresh on JWKS-cache TTL expiry, default 3600s).
- JWKS-fetch failure during a key rotation: fail closed. The service maps go-oidc's network errors to ErrJWKSUnreachable (HTTP 503 to the in-flight login). Existing sessions are untouched. No exponential backoff, no auto-retry; the operator triages.
- Encrypted client_secret at rest. AES-256-GCM via internal/crypto.EncryptIfKeySet (the same v3-blob path issuer-target credentials use). The client_secret_encrypted column is json:"-" on the domain type so a misconfigured handler cannot wire-leak.
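The two alg checks in the list above (the relaxed bind-time intersection and the per-token pin) can be sketched as follows. This is a hedged illustration rather than the code in service.go: allowedAlgs, checkAdvertised, tokenAlgRejected, and errNoAcceptableAlg are hypothetical names, and the allow-list contents are taken from the bullet above.

```go
package main

import "errors"

// allowedAlgs mirrors the allow-list named in the text.
var allowedAlgs = map[string]bool{
	"RS256": true, "RS512": true, "ES256": true, "ES384": true, "EdDSA": true,
}

// errNoAcceptableAlg is a hypothetical sentinel for the empty-intersection case.
var errNoAcceptableAlg = errors.New("oidc: IdP advertises no acceptable signing algorithm")

// checkAdvertised splits the discovery doc's advertised algs into acceptable
// and weak sets. It fails only when the intersection with the allow-list is
// empty; weak algs alongside acceptable ones are returned for an
// informational note rather than treated as an error.
func checkAdvertised(advertised []string) (acceptable, weak []string, err error) {
	for _, alg := range advertised {
		if allowedAlgs[alg] {
			acceptable = append(acceptable, alg)
		} else {
			weak = append(weak, alg)
		}
	}
	if len(acceptable) == 0 {
		return nil, weak, errNoAcceptableAlg
	}
	return acceptable, weak, nil
}

// tokenAlgRejected is the per-token pin: any JWS header alg outside the
// allow-list is rejected, whatever the discovery doc advertised.
func tokenAlgRejected(headerAlg string) bool {
	return !allowedAlgs[headerAlg]
}
```

Under this sketch, an IdP advertising RS256 plus HS256 binds with HS256 reported as weak, an HS-only IdP fails closed, and an HS256 token is rejected at verify time in both cases.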
Session minting + cookies (Phases 4 + 6)
- Length-prefixed HMAC. Cookie wire format is v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>. HMAC input is length-prefixed as len(sid):sid:len(kid):kid - NOT bare-concat. The bare-concat form admits a collision attack: <a, bc> and <ab, c> produce identical HMAC inputs, letting a forger swap one byte across the boundary. Pinned by TestComputeHMAC_LengthPrefixDefeatsConcatCollision + TestService_Validate_ConcatenationCollisionDefeatedByLengthPrefix. The v1. version prefix is reserved; unknown prefixes are rejected with no fallback.
- Cookie hardening. HttpOnly=true (no JS access; defends XSS cookie theft), Secure=true (HTTPS-only; defends network MITM given HTTPS-Everywhere v2.2 milestone), SameSite=Lax default (configurable to Strict via CERTCTL_SESSION_SAMESITE), Path=/, no domain attribute (host-only).
- Idle + absolute timeouts. 1h idle / 8h absolute defaults (configurable via CERTCTL_SESSION_IDLE_TIMEOUT / _ABSOLUTE_TIMEOUT). The session row tracks last_seen_at, idle_expires_at, absolute_expires_at independently; the scheduler's sessionGCLoop (default 1h) sweeps expired rows.
- CSRF defense. Plaintext CSRF token in the JS-readable certctl_csrf cookie (intentionally HttpOnly=false so the GUI reads it for the X-CSRF-Token header). SHA-256 hash on the session row. CSRFMiddleware on state-changing methods uses subtle.ConstantTimeCompare against the hash. API-key actors (no session row) are CSRF-exempt - pinned by the bundle-1-compat CI guard.
- Optional defense-in-depth IP / UA bind (default OFF; CERTCTL_SESSION_BIND_IP / _BIND_USER_AGENT). Mismatch returns ErrSessionIPMismatch / ErrSessionUAMismatch. Use with care - mobile clients on changing networks fail closed.
- Signing-key rotation primitive. RotateSigningKey mints a new HMAC key; the old key stays valid for the configured retention window (default 24h via CERTCTL_SESSION_SIGNING_KEY_RETENTION) so existing cookies validate during the rollover. Past retention, the old key's row is dropped and any cookie still signed under it returns ErrSigningKeyNotFound.
- EnsureInitialSigningKey is fail-fatal at server boot. Wired in cmd/server/main.go via logger.Error + os.Exit(1) so a server with a broken DB or RNG cannot boot into a state where session validation is impossible.
- Pre-login cookie discriminated from post-login. Pre-login carries the pl- id prefix; post-login carries ses-. Defense-in-depth: Validate rejects pre-login cookies (pinned by TestService_Validate_RejectsPreLoginCookieAtPostLoginGate) so a stolen pre-login cookie cannot be replayed against the post-login gate.
Back-channel logout (Phase 5)
- OpenID Connect Back-Channel Logout 1.0 (NOT RFC 8414). Endpoint: POST /auth/oidc/back-channel-logout. The IdP signs a logout JWT and POSTs it to certctl when a user logs out at the IdP. The handler validates the JWT against the IdP's JWKS via the same alg allow-list as the login flow.
- Required claims pinned. iss / aud / iat / jti / events (with the spec-mandated logout event type); exactly one of sub / sid; nonce MUST be absent (per spec §2.4 - logout tokens MUST NOT carry a nonce). All four pinned by Phase 5 negative tests.
- jti-based replay defense. The Phase 5 implementation tracks recently-seen jti values to defeat logout-token replay attacks where an attacker captures a logout JWT and replays it.
- Cache-Control: no-store on the response per spec §2.5.
OIDC first-admin bootstrap (Phase 7)
- Coexists with Bundle 1's env-var-token bootstrap. Both can be configured; the admin-existence probe ensures only one wins.
- Group-scoped. CERTCTL_BOOTSTRAP_ADMIN_GROUPS is a comma-separated allowlist of IdP group names; users in any one of those groups become admins on FIRST login per tenant. Non-empty intersection with the user's resolved groups is required.
- One-shot per tenant via admin-existence probe. Once any actor holds r-admin in the tenant, the bootstrap hook silently falls through to normal mapping (no admin grant). Operators rely on this to avoid an "always-admin-on-login" backdoor.
- Explicit OIDC provider gate. CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID pins which provider's tokens are eligible. A multi-IdP deploy cannot have any provider's group claims become admin.
- Audit row on every grant. bootstrap.oidc_first_admin event with event_category=auth + INFO log; the auditor monitors.
Break-glass admin (Phase 7.5)
- Default-OFF. CERTCTL_BREAKGLASS_ENABLED=false is the default; the entire surface (4 endpoints) is disabled. Operators flip it on during SSO incidents and back off after recovery.
- Surface invisibility via 404-not-403. Every endpoint returns HTTP 404 when disabled - public login AND admin endpoints. A scanner cannot distinguish "endpoint disabled" from "endpoint doesn't exist." All five service-layer methods short-circuit with ErrDisabled before any DB lookup; the handler maps to http.NotFound.
- Argon2id with OWASP 2024 params. m=64MiB, t=3, p=4, 16-byte salt, 32-byte output, per-password random salt, PHC-format hash. The hash column is json:"-" so handlers cannot wire-leak.
- Lockout state machine. CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD (default 5) failures within CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h) trip a CERTCTL_BREAKGLASS_LOCKOUT_DURATION lock (default 30s; bumped from 100ms after the test discovered Argon2id verify itself takes ~80-200ms each, making a millisecond-scale lockout invisible). Atomic single-statement IncrementFailure defeats concurrent racing attempts. Idempotent ResetFailureCount.
- Constant-time across all failure paths. verifyDummy() runs a real Argon2id pass against an all-zeros throwaway salt on the no-credential and locked-account paths so all three failure modes (wrong password / locked / no actor) take statistically indistinguishable time. Pinned by TestPhase7_5_ConstantTimeAcrossWrongPasswordAndNoCredentialPaths (asserts within 5x ratio on durations).
- Audit row + WARN log at boot. auth.breakglass_login_* events with event_category=auth. cmd/server/main.go emits a WARN-level log when ENABLED=true so the operator's log review notices an over-long enablement.
- Rate limit on the public login endpoint. 5 attempts/minute via the existing middleware.NewRateLimiter.
Bundle 2 threat catalogue
The following sub-sections enumerate the threat surface introduced by Bundle 2 and the mitigations the platform ships. They are deliberately exhaustive - if a threat is listed here it has a concrete mitigation or a documented "operator-driven, out of scope" framing. New threats discovered post-2026-05-10 should be added here with a dated commit note.
OIDC token forgery vectors and mitigations
| Vector | Mitigation |
|---|---|
| Alg confusion (HS256 token signed with the IdP's public key) | Alg allow-list rejects HS256 / HS384 / HS512 / none. Service-layer + go-oidc enforce in two layers. IdP-downgrade-attack defense at provider-creation time. |
| Audience injection (token issued for a different client) | Service-layer aud re-check post-go-oidc verify; multi-aud tokens require matching azp. Sentinels ErrAudienceMismatch / ErrAZPRequired / ErrAZPMismatch. |
| Issuer mismatch (token from a different IdP with the same alg + key shape) | Exact iss string match (ErrIssuerMismatch). The 21-case Phase 3 negative-test matrix pins the byte-for-byte requirement. |
| Nonce replay (capturing a fresh token + replaying with the same nonce) | Single-use nonce stored in the pre-login row; LookupAndConsume is DELETE...RETURNING (atomic). Second use returns ErrPreLoginNotFound. |
| State replay (CSRF on the IdP redirect) | Same single-use mechanism as nonce. State is subtle.ConstantTimeCompared. |
| at_hash substitution (clean ID token with a swapped access token) | at_hash REQUIRED when access_token present (Phase 3 tightening of OIDC core's MAY → MUST). ErrATHashRequired if missing; ErrATHashMismatch if non-matching. |
| iat window manipulation (stale token replay) | iat_window_seconds configurable per-provider (default 300, cap 600). Future iat returns ErrIATInFuture; older-than-window returns ErrIATTooOld. |
| JWKS rotation mid-login | coreos/go-oidc's built-in cache + auto-refresh on TTL expiry. Operator-triggered Service.RefreshKeys for forced refresh. |
| JWKS-fetch failure during a key rotation | ErrJWKSUnreachable (HTTP 503 to in-flight login). Existing sessions untouched. Operator clicks "Refresh discovery cache" once IdP recovers. No exponential backoff. |
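The at_hash row in the table above relies on the OIDC core computation: hash the access token's ASCII octets, keep the left-most half of the digest, and base64url-encode it without padding. A minimal sketch for the SHA-256 family (RS256 / ES256) follows; atHashS256 and atHashMatches are hypothetical names, and tokens signed with RS512 or ES384 would need SHA-512 or SHA-384 instead.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
)

// atHashS256 computes the at_hash value for an access token when the ID
// token is signed with a SHA-256 based alg.
func atHashS256(accessToken string) string {
	sum := sha256.Sum256([]byte(accessToken))
	return base64.RawURLEncoding.EncodeToString(sum[:len(sum)/2])
}

// atHashMatches treats a missing claim as a failure (the MUST tightening
// described in the text) and compares in constant time otherwise.
func atHashMatches(claim, accessToken string) bool {
	if claim == "" {
		return false
	}
	want := atHashS256(accessToken)
	return subtle.ConstantTimeCompare([]byte(claim), []byte(want)) == 1
}
```

A swapped access token produces a different digest, so a clean ID token paired with a substituted access token fails the comparison.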
Session hijacking vectors and mitigations
| Vector | Mitigation |
|---|---|
| Cookie theft via XSS | HttpOnly on the session cookie; CSP headers from Bundle B's H-1 work prevent inline-script execution. |
| Cookie theft via network MITM | Secure flag + TLS 1.3-only control plane (HTTPS-Everywhere v2.2 milestone). |
| CSRF on state-changing methods | SameSite=Lax default + double-submit-cookie pattern with hashed CSRF token on the session row. CSRFMiddleware fires on POST/PUT/PATCH/DELETE for session-authenticated callers; API-key actors are exempt. |
| Session-cookie forgery via concatenation collision | Length-prefixed HMAC input (len(sid):sid:len(kid):kid). Pinned by two tests + a doc-block at the top of service.go. |
| Stolen-cookie replay (attacker uses a valid cookie until expiry) | Short idle timeout (1h default) + admin-revoke-all-for-actor + back-channel logout from IdP + GUI session revocation. |
| Cross-tab session interference | Cookie value is opaque + length-prefixed; tabs sharing the cookie share the session row. Sign-out in one tab calls POST /auth/logout; the next request from any tab gets a missing-row 401. |
| Session-row race on sign-out vs in-flight request | Validate is the single point that reads the row; missing row = 401. There is no "stale read" path because every request re-validates. |
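The CSRF row in the table above (plaintext token in the cookie, SHA-256 hash on the session row) can be sketched as follows. The function names are hypothetical and the sketch omits the API-key exemption and cookie handling; it only shows the method gate and the hashed constant-time comparison.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
)

// hashCSRF is what would be stored on the session row at mint time.
func hashCSRF(token string) [sha256.Size]byte {
	return sha256.Sum256([]byte(token))
}

// stateChanging reports whether the method is one the check fires on.
func stateChanging(method string) bool {
	switch method {
	case "POST", "PUT", "PATCH", "DELETE":
		return true
	}
	return false
}

// csrfOK passes safe methods through and otherwise hashes the value from
// the X-CSRF-Token header and compares it to the stored hash in constant time.
func csrfOK(method, headerToken string, stored [sha256.Size]byte) bool {
	if !stateChanging(method) {
		return true
	}
	got := hashCSRF(headerToken)
	return subtle.ConstantTimeCompare(got[:], stored[:]) == 1
}
```

Storing only the hash means a read of the session table does not yield a usable CSRF token.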
IdP compromise scenarios
A rogue IdP issues malicious tokens (signs tokens for arbitrary users, mints arbitrary groups, etc.). Mitigations are largely out of certctl's control - the trust root is the IdP. Documented behaviors:
- Operator should monitor IdP audit logs. Federated identity is
only as trustworthy as the IdP it federates from. The
issclaim on every certctl audit row points at the source IdP so the operator can correlate against IdP-side audit. - Operator can rotate group-role mappings from the GUI without
redeploying. If the IdP is compromised but not yet
decommissioned, the operator can dial down access via
Auth → OIDC Providers → <provider> → Group → role mappingsand remove every mapping. Subsequent logins fail closed (ErrGroupsUnmapped); existing sessions continue until expiry. - The audit trail records every OIDC login including the source
provider. Blast radius is bounded by the
group_role_mappingtable for that provider. A compromised provider configured with onlyengineers → r-operatorcannot escalate tor-adminvia any token forgery. - The provider-delete path returns 409 when sessions exist for it.
ErrOIDCProviderInUseforces the operator to revoke the provider's active sessions before deletion - prevents accidental loss of audit lineage on a hot incident.
Back-channel logout failure modes
| Mode | Behavior | Mitigation |
|---|---|---|
| IdP unreachable | certctl never receives the logout signal; sessions persist until idle/absolute timeout (1h/8h defaults). | Operator keeps absolute timeout short relative to risk tolerance. Manual revoke via GUI is always available. |
| Logout token signature invalid | certctl returns 400; no session revoked; auth.oidc_back_channel_logout_failed audit row. | Operator-monitored audit row surfaces forged-logout-token attempts. |
| Logout token replay (attacker captures + replays a valid logout JWT) | jti-based deduplication rejects the replay; first delivery succeeds, second returns 400. | Pinned by Phase 5 negative tests. |
| Logout token alg confusion | Same alg allow-list as the login flow; HS-family rejected. | Phase 3 alg allow-list applies to BCL too (same Provider.RemoteKeySet). |
| Missing events claim | Spec §2.4 requires the OIDC-defined logout event type; missing returns 400. | Pinned by negative test. |
| nonce claim present | Spec §2.4 requires nonce MUST NOT appear in logout tokens; presence returns 400. | Pinned by negative test. |
Group-claim manipulation
Per-IdP group-claim shapes are documented in
oidc-runbooks/index.md. Manipulation
threats:
| Vector | Mitigation |
|---|---|
| Operator misconfigures mapping (e.g. engineers → r-admin instead of r-operator) | auth.group_mapping_added / _removed audit row with event_category=auth. The auditor role monitors. |
| Operator misconfigures groups_claim_path (e.g. groups when Auth0 emits https://your-namespace/groups) | User's group claim is ignored, user lands at "no roles assigned" screen. The GUI's OIDC provider detail page surfaces the configured path so the operator can verify. |
| IdP renames a group (e.g. engineers → eng-team) | Mappings silently break; users get fewer roles than expected. auth.oidc_login_unmapped_groups audit row fires on every such login; auditor monitors for unexpected spikes. |
| IdP user maintainer adds a user to an unintended group | Group is mapped to a higher-privilege role than intended; user gets the role on next login. Bounded blast radius: the group→role mapping is what they got, not arbitrary admin. Defense-in-depth: review mappings periodically; the auditor role can pull auth.oidc_login_succeeded rows by details.subject to spot drift. |
Bootstrap phase risks (post-Bundle-2)
This section extends Bundle 1's bootstrap section with the OIDC first-admin path.
| Vector | Mitigation |
|---|---|
| CERTCTL_BOOTSTRAP_TOKEN (Bundle 1 fallback) leaks | One-shot via consumed bool + admin-existence probe. Both arms close the path the moment any admin lands. (Bundle 1.) |
| CERTCTL_BOOTSTRAP_ADMIN_GROUPS misconfigured to a wide group (e.g. everyone) | Unintended user becomes admin on first OIDC login. Mitigation: scope-down via certctl-cli auth keys scope-down --suggest. Operators configure narrow groups. The audit row on bootstrap.oidc_first_admin surfaces every grant. |
| Both bootstrap strategies enabled simultaneously | Whichever fires first wins; the second sees admin-already-exists and falls through to normal mapping. No double-admin landing. |
| CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID left unset with multi-IdP deploy | Hook fires on ANY provider's tokens. Mitigation: explicit gate documented in cmd/server/main.go startup logging; operator audit reviewed pre-tag. |
Break-glass risks (Phase 7.5)
| Vector | Mitigation |
|---|---|
| Phished password (operator gives password to attacker) | Bypasses OIDC + every group-claim gate. Mitigation: default-OFF posture; lockout after 5 failures; WebAuthn pairing (v3 / Decision 12) closes the gap properly. |
| Brute-force online | Lockout state machine + 5/min rate limit on /auth/breakglass/login. |
| Brute-force offline (DB compromise) | Argon2id with OWASP 2024 params (~80-200ms per verify). Cracking remains expensive even with GPU. |
| Operator forgets to disable post-incident | Break-glass becomes a permanent backdoor. Mitigation: WARN log at boot when ENABLED=true; audit row on every break-glass login; runbook prescribes "disable within 24h of SSO recovery." |
| Side-channel timing on no-credential vs wrong-password vs locked | All three paths take statistically indistinguishable time via verifyDummy(). Pinned by the timing-statistical test. |
| Surface fingerprinting (scanner identifies break-glass exists) | All four endpoints return 404 (NOT 403) when disabled. Surface-invisibility - identical to a non-existent route. |
| Reserved-actor actor-demo-anon mutation via break-glass admin | Service layer rejects with ErrAuthReservedActor (HTTP 409). Same gate as the Bundle 1 RBAC path. |
Token-leak hygiene (the explicit grep policy)
ID tokens, access tokens, refresh tokens, authorization codes, PKCE verifiers, state, nonce, signing keys, break-glass passwords MUST NEVER appear in any log line at any level.
The invariant is enforced by per-package logging_test.go files that
redirect slog.Default to a buffer, run the service paths, and
grep-assert the secret values are absent from every captured line.
Bundle 1's internal/auth/bootstrap/service_test.go is the pattern.
Phases 3, 4, and 7.5 follow the same shape:
- internal/auth/oidc/logging_test.go - token / code / verifier / state / nonce / cookie / client_secret / alg name absent from HandleAuthRequest, HandleCallback, alg-rejection, and provider-load paths.
- internal/auth/session/service_test.go - signing-key bytes absent from cookie-mint + validate paths.
- internal/auth/breakglass/service_test.go - plaintext password + Argon2id hash absent from every audit row + log line + HTTP-response shape (json:"-" probe via json.Marshal).
The details JSONB column on audit_events runs through
Bundle-6's redactor (internal/service/audit_redact.go) before
persistence; the redactor's allow-list is conservative enough that
adding a new token-shaped field to a new audit row defaults to
redacted, not leaked.
Threats Bundle 1 does NOT close (Bundle 2 closure status)
The list below was the Bundle-1-era deferred-threats catalogue. Status updated 2026-05-10 to reflect what Bundle 2 closed and what remains deferred. The label "Bundle 1 does NOT close" is preserved for historical traceability; readers should consult the marker at the end of each item for current status.
- OIDC / SAML / WebAuthn federation - ✅ OIDC closed (Bundle 2 Phases 1-7); SAML deferred to v3; WebAuthn deferred to v3 (Decision 12 - WebAuthn pairs with break-glass for hardware- token-MFA). The break-glass path (Phase 7.5) is a partial mitigation for the no-MFA case during SSO incidents.
- Session management - ✅ closed (Bundle 2 Phases 4 + 6). HMAC-signed certctl_session cookie with length-prefixed wire format, 1h idle / 8h absolute expiry, scheduler-driven GC, server-side revocation list (delete the row), GUI's "Sessions" page surfaces own + all-actor revocation, back-channel logout from the IdP.
- Local password accounts (break-glass) - ✅ closed (Bundle 2 Phase 7.5). Argon2id + lockout + default-OFF + 404-not-403 surface invisibility. NOT for general human auth - only the "SSO is broken, need admin access right now" path. WebAuthn pairing on the v3 roadmap.
- Time-bound role grants / JIT elevation - still deferred to v3. The schema still reserves actor_roles.expires_at with no UI/API to set it. Bundle 2 introduces session-level idle/absolute expiry but does not propagate that to role grants.
- MFA / hardware tokens for the operator console - ⚠️ partial closure. WebAuthn / FIDO2 second factor remains v3 (Decision 12). Bundle 2's break-glass (Phase 7.5) provides a separate password factor that operators can pair with OIDC, but it's not a true second factor on the OIDC login path - the OIDC IdP remains the sole token source on the federation path.
- Rate limiting on the bootstrap endpoint - acceptable (one-shot by construction; per-IP rate limiting on the broader API is in place via Bundle C's middleware.NewRateLimiter). Bundle 2 adds the same rate-limit primitive to the break-glass /auth/breakglass/login endpoint at 5/min.
- scope_id FK enforcement - still deferred. Operators can grant a permission at scope profile/p-bogus without the bogus profile existing. The gate still works (no rows match at request time) but a strict 404 on grant would be cleaner. The TODO(bundle-2) comment is now TODO(v3).
- OIDC-first-admin bootstrap - ✅ closed (Bundle 2 Phase 7). CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID env vars + group-scoped + admin-existence-probe.
- GUI E2E suite via Playwright - still deferred to a follow-on bundle. The Phase 8 GUI ships 28 new Vitest unit-test cases (5 new test files); full Playwright E2E for the 15 flow checks from the Bundle 2 prompt's Phase 8 (auth-code login + group-claim parsing + revoke-revokes-session + JWKS rotation + etc.) is the operator's call on whether to land before tag.
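To make the "length-prefixed wire format" in the session-management item concrete, here is a minimal sketch of why a length prefix matters when several fields are covered by one HMAC. The field layout (4-byte big-endian length per field) and the function names are assumptions for illustration; the actual cookie encoding in internal/auth/session/ may differ.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// encodeFields writes each field as a 4-byte big-endian length followed by
// the field bytes, so field boundaries are unambiguous in the MAC input.
// (Assumed layout, not the project's documented format.)
func encodeFields(fields ...string) []byte {
	var out []byte
	for _, f := range fields {
		out = binary.BigEndian.AppendUint32(out, uint32(len(f)))
		out = append(out, f...)
	}
	return out
}

// sign computes HMAC-SHA256 over the length-prefixed encoding.
func sign(key []byte, fields ...string) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(encodeFields(fields...))
	return mac.Sum(nil)
}

// verify recomputes the tag and compares it in constant time.
func verify(key, tag []byte, fields ...string) bool {
	return hmac.Equal(tag, sign(key, fields...))
}

func main() {
	key := []byte("demo-key-do-not-use-in-production")
	tag := sign(key, "sess-1", "actor-42")

	fmt.Println(verify(key, tag, "sess-1", "actor-42"))
	// Same bytes when naively concatenated, different field boundaries.
	fmt.Println(verify(key, tag, "sess-1a", "ctor-42"))
}
```

Without the prefix, both calls in main would hash the identical byte string "sess-1actor-42", so an attacker could shift bytes between fields without invalidating the tag.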
Threats Bundle 2 does NOT close
These are the v3 / future-work deferrals at the post-Bundle-2 mark:
- WebAuthn / FIDO2 second factor - operator console is OIDC (or break-glass password) only. No hardware-token requirement even on the admin path. Decision 12.
- Time-bound role grants / JIT elevation - the actor_roles.expires_at column exists, no UI/API yet.
- SAML federation - OIDC only. Operators on SAML-only IdPs use the broker pattern (run Keycloak as a SAML-to-OIDC bridge); see the Google Workspace runbook for the same broker shape.
- Multi-tenant data isolation activation - the schema and repository layer carry tenant_id columns + the Phase 13 query-coverage CI guard, but tenant ACLs are not enforced. Bundle 2 ships single-tenant only (t-default seeded). The managed-service hosting work (operator decision item) is where multi-tenant flips on.
- HSM / FIPS-validated signing key for sessions - the session signing key is software-only (HMAC-SHA256, in-memory key material, encrypted at rest via internal/crypto). Operators in FIPS 140-3 environments need to supply their own Signer implementation; the abstraction at internal/crypto/signer/ accommodates this but no PKCS#11 driver ships yet.
- OIDC RP-initiated logout (the "/end_session_endpoint" flow where certctl signs a logout token + redirects the browser to the IdP). Bundle 2 implements ONLY the back-channel flow (IdP → certctl). Operators wanting the full bidirectional logout pair wait on a follow-on bundle.
- GUI E2E via Playwright - tracked alongside the GUI E2E suite item (#9) in the Bundle 1 list above.
- Per-IdP runbook external-tester sign-off - encouraged via the operator-sign-off footers in oidc-runbooks/*.md but NOT a merge gate (operator decision 2026-05-10; the earlier "≥ 2 external testers" requirement was retired).
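For operators weighing the HSM / FIPS item above, the following sketch shows what a pluggable signer could look like. The Signer interface here is a HYPOTHETICAL shape invented for this example; the real interface in internal/crypto/signer/ is not quoted in this document and may have different method names or signatures. A PKCS#11-backed implementation would satisfy the same interface while keeping the key inside the device.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"errors"
	"fmt"
)

// Signer is a HYPOTHETICAL interface shape, not the project's actual API.
type Signer interface {
	Sign(msg []byte) ([]byte, error)
	Verify(msg, tag []byte) error
}

var errBadTag = errors.New("signer: tag mismatch")

// softwareHMAC keeps key material in process memory (the software-only case).
type softwareHMAC struct{ key []byte }

func newSoftwareHMAC(key []byte) (Signer, error) {
	if len(key) < sha256.Size {
		return nil, errors.New("signer: key shorter than 32 bytes")
	}
	// Copy so later mutation of the caller's slice cannot change the key.
	return &softwareHMAC{key: append([]byte(nil), key...)}, nil
}

func (s *softwareHMAC) Sign(msg []byte) ([]byte, error) {
	mac := hmac.New(sha256.New, s.key)
	mac.Write(msg)
	return mac.Sum(nil), nil
}

func (s *softwareHMAC) Verify(msg, tag []byte) error {
	want, err := s.Sign(msg)
	if err != nil {
		return err
	}
	if !hmac.Equal(want, tag) { // constant-time comparison
		return errBadTag
	}
	return nil
}

func main() {
	s, err := newSoftwareHMAC(make([]byte, 32)) // all-zero demo key only
	if err != nil {
		panic(err)
	}
	tag, _ := s.Sign([]byte("session-payload"))
	fmt.Println(s.Verify([]byte("session-payload"), tag) == nil)
	fmt.Println(s.Verify([]byte("tampered"), tag) == nil)
}
```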
Compliance mapping
The control set in this document supports the following framework requirements. This is a mapping; it is not a claim of formal certification.
- SOC 2 CC6.1 (logical access controls) - RBAC primitive with role-based gating on every mutating endpoint.
- SOC 2 CC6.3 (privileged access management) - r-admin role separation + role-grant audit trail with two-person integrity on approval-tier profile edits.
- HIPAA §164.312(b) (audit controls) - event_category column lets the auditor role review authentication / authorization changes specifically. WORM trigger keeps the audit table append-only at the database layer.
- NIST SSDF PO.5.2 (separation of duties) - two-person integrity for compliance-tier issuance via the RequiresApproval flow + Bundle 1 Phase 9's closure of the flip-flop bypass.
- FedRAMP AU-9 (audit information protection) - WORM enforcement + auditor-only read access (the auditor role cannot mutate, the WORM trigger blocks UPDATE/DELETE).
- PCI-DSS §10 (audit logging) - every mutating operation emits an audit row with actor + action + resource + timestamp + category. The audit table is append-only.
Operator-facing checks
Run these periodically to verify the controls are working.
- certctl-cli auth keys list - confirm no unexpected actor holds r-admin. Audit any new admin grants against the audit log.
- SELECT actor, action, COUNT(*) FROM audit_events WHERE action LIKE 'approval_%' AND timestamp > NOW() - INTERVAL '7 days' GROUP BY actor, action; - confirm approvals are happening and not concentrated in a single approver.
- SELECT COUNT(*) FROM audit_events WHERE actor = 'system-bypass'; - MUST return 0 in production. A non-zero count means CERTCTL_APPROVAL_BYPASS=true was set; production deploys MUST leave it unset.
- SELECT actor, COUNT(*) FROM audit_events WHERE action = 'bootstrap.consume' GROUP BY actor; - MUST return at most one row per tenant. Multiple rows means the bootstrap endpoint was called more than once, which the strategy's one-shot guard should have prevented; investigate.
- certctl-cli auth me while authenticated as the auditor key - effective_permissions must contain audit.read + audit.export ONLY. Any other permission means a role grant widened the auditor's surface; revoke immediately.
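The auditor-surface check in the last item above lends itself to automation. A minimal sketch, assuming the effective_permissions list has already been extracted from the CLI output into a string slice: diffPermissions is a hypothetical helper, and certs.issue is a made-up permission name used only to show a widened surface.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedAuditorPerms is the exact surface the auditor key should hold.
var expectedAuditorPerms = []string{"audit.export", "audit.read"}

// diffPermissions compares got against want and reports which permissions
// are unexpectedly present (extra) and which are absent (missing).
func diffPermissions(got, want []string) (extra, missing []string) {
	wantSet := make(map[string]bool, len(want))
	for _, p := range want {
		wantSet[p] = true
	}
	seen := make(map[string]bool, len(got))
	for _, p := range got {
		if seen[p] {
			continue // ignore duplicates in the input
		}
		seen[p] = true
		if !wantSet[p] {
			extra = append(extra, p)
		}
	}
	for _, p := range want {
		if !seen[p] {
			missing = append(missing, p)
		}
	}
	sort.Strings(extra)
	sort.Strings(missing)
	return extra, missing
}

func main() {
	// "certs.issue" is a hypothetical permission name for the example.
	got := []string{"audit.read", "audit.export", "certs.issue"}
	extra, missing := diffPermissions(got, expectedAuditorPerms)
	fmt.Println("extra:", extra, "missing:", missing)
}
```

A non-empty extra slice corresponds to the "revoke immediately" case; a non-empty missing slice suggests the auditor role itself was narrowed.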
The following checks are NEW with Bundle 2:
- SELECT COUNT(*) FROM oidc_providers; - confirm only the expected providers are configured. An unexpected row is a compromise indicator. Cross-check with the auth.oidc_provider_created audit row to find when + by whom.
- SELECT actor_id, COUNT(*) FROM sessions WHERE NOT revoked AND absolute_expires_at > NOW() GROUP BY actor_id ORDER BY 2 DESC; - confirm no actor has an unexpectedly large session count. Multi-session-per-actor is normal (laptop + phone), but a single actor with 50+ active sessions is a compromised-key signal.
- SELECT COUNT(*) FROM audit_events WHERE action LIKE 'auth.oidc_login_unmapped_groups' AND timestamp > NOW() - INTERVAL '7 days'; - non-zero rows mean users are completing IdP authentication but failing the group-mapping step. Either the IdP renamed a group, or an unauthorized user attempted access. Investigate.
- SELECT COUNT(*) FROM audit_events WHERE action LIKE 'auth.breakglass_%' AND timestamp > NOW() - INTERVAL '7 days'; - non-zero rows in steady state mean break-glass is being used outside an SSO incident OR was left enabled. Confirm CERTCTL_BREAKGLASS_ENABLED is false in non-incident windows.
- SELECT COUNT(*) FROM audit_events WHERE action = 'bootstrap.oidc_first_admin'; - MUST return at most one row per tenant. Multiple rows means the OIDC bootstrap hook fired more than once per tenant, which the admin-existence probe should have prevented; investigate.
- SELECT COUNT(*) FROM session_signing_keys WHERE retired_at IS NOT NULL AND retired_at < NOW() - INTERVAL '7 days'; - retired keys past the retention window should have been GC'd. Non-zero rows mean the scheduler's sessionGCLoop is wedged.
Cross-references
Bundle 1 (RBAC) anchors:
- rbac.md - the operator how-to
- security.md - the wider security posture
- approval-workflow.md - the two-person integrity gate
- docs/migration/api-keys-to-rbac.md - upgrade flow
- internal/auth/ - middleware + keystore + RequirePermission + bootstrap
- internal/service/auth/ - Authorizer + privilege-escalation guard + reserved-actor guard
- migrations/000029_rbac.up.sql - schema + seed
- migrations/000030_rbac_admin_perms.up.sql - five admin-only fine-grained perms
- migrations/000032_audit_category.up.sql - auditor surface
- migrations/000033_approval_kinds.up.sql - approval-bypass closure
Bundle 2 (OIDC + sessions + back-channel logout + break-glass) anchors:
- oidc-runbooks/index.md - per-IdP setup guides (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace) with cross-IdP recurring concepts at the top
- internal/auth/oidc/ - OIDC service (HandleAuthRequest / HandleCallback / RefreshKeys), hand-rolled group-claim resolver, alg allow-list, IdP downgrade-attack defense
- internal/auth/session/ - session service (length-prefixed HMAC, cookie minting, idle/absolute expiry, signing-key rotation, GC), CSRF middleware, chained-auth combinator
- internal/auth/breakglass/ - default-OFF break-glass admin (Argon2id + lockout + constant-time + surface-invisibility)
- internal/auth/oidc/testfixtures/ - Phase 10 Keycloak testcontainers harness (//go:build integration)
- migrations/000034_oidc_providers.up.sql - OIDC providers + group-role mappings tables
- migrations/000035_sessions.up.sql - sessions + session-signing-keys tables
- migrations/000036_users.up.sql - users (federated-human identity) table
- migrations/000037_oidc_pre_login.up.sql - pre-login table + 7 new auth permissions
- migrations/000038_breakglass_credentials.up.sql - break-glass credentials table + 2 new permissions
- scripts/ci-guards/N-bundle-2-security-empty-preserved.sh - OpenAPI security: [] count guard
- scripts/ci-guards/bundle-1-compat-regression.sh - Bundle-1-only-compat assertions (5 invariants)
- scripts/ci-guards/bundle-1-to-2-upgrade-regression.sh - upgrade-path assertions (6 invariants)