docs: v2.1.0 release polish — strip internal bundle/phase tags, update status for OIDC ship

README:
- Rewrite Status block: drop the stale 'federated identity not yet
  shipped' line; flag v2.1.0 OIDC + sessions + back-channel logout
  + break-glass as early-access; encourage GitHub issues for IdP
  rough edges. (A1 framing — keep early-access umbrella, no
  SAML/WebAuthn/JIT roadmap teaser.)
- Add OIDC SSO bullet to 'What it does' covering per-IdP runbooks,
  group-claim → role mapping, AES-256-GCM client_secret encryption,
  JWKS auto-refresh, PKCE-S256, RFC 9700 §4.7.1 pre-login binding,
  RFC 9207 iss check, __Host- cookies, CSRF rotation, idle+absolute
  expiry, BCL, break-glass admin.
- Update Security paragraph: three auth paths (API keys / OIDC /
  break-glass), HMAC-signed sessions, CSRF rotation, OIDC back-channel
  logout (BCL).
- Correct CI coverage thresholds against
  .github/coverage-thresholds.yml (service 70%, handler 75%,
  crypto 88%, auth packages 85-95%); 'static analysis' replaces
  the inflated '11 linters' claim (actual count is 4 active).

Docs B3 sweep — strip operator-facing 'Bundle N' / 'Phase N' tags:
- docs/operator/auth-threat-model.md — rewrite intro; rename 5 H2
  sections (API-key + RBAC defenses / OIDC + sessions + break-glass
  defenses / OIDC + sessions threat catalogue / Closed federated-
  identity threats / Future-work threats); clean ~12 H3/prose hits.
- docs/operator/rbac.md — strip Bundle 1 framing from intro,
  scope_id deferral note, MCP tools section, day-0 bootstrap, and
  'Where to look next'.
- docs/operator/auth-benchmarks.md — drop 'Phase 14' framing from
  title intro, hardware floor caption, result table caption,
  methodology, and pre-merge audit section.
- docs/operator/security.md — already cleaned earlier this session
  (RBAC / day-0 / approval-bypass / OIDC federation / sessions /
  OIDC first-admin / break-glass H3s).
- docs/operator/oidc-runbooks/{index,keycloak,authentik,okta,
  azure-ad}.md — strip Auth Bundle 2 framing + Phase 10/3/4
  references; replace with feature-name prose.
- docs/operator/legacy-clients-tls-1.2.md — drop Bundle F / M-023
  audit-reference framing; keep CWE-326.
- docs/operator/database-tls.md — drop Bundle B / M-018 framing
  from intro + Helm section.
- docs/operator/runbooks/disaster-recovery.md — drop 'Production
  hardening II Phase 10' status callout.
- docs/migration/oidc-enable.md — retitle 'Enable OIDC SSO';
  strip Bundle 1/2 framing from prereqs, troubleshooting, related
  docs; update __Host- cookie callout from 'audit MED-14' to
  v2.1.0-BREAKING.
- docs/migration/api-keys-to-rbac.md — strip Bundle 1 framing from
  intro, migration table, IsAdmin section, and cross-references.
- docs/migration/acme-from-cert-manager.md — strip residual
  'Phase 5' tags from cert-manager integration test references.
- docs/reference/configuration.md — retitle Auth section.
- docs/reference/profiles.md — strip Bundle 1 Phase 9 framing
  from RequiresApproval section + Related list.
- docs/reference/auth-standards-implemented.md — rewrite intro
  (API-key + RBAC + OIDC + sessions + back-channel logout +
  break-glass); rename 'Bundle 1 (RBAC) standards covered
  separately' H2; clean per-row Phase references.
- docs/README.md — rewrite nav-table entries to drop Bundle 1/2
  parentheticals; retitle 'Enable OIDC SSO' migration entry.

No code or test changes; pure operator-facing prose polish for
the v2.1.0 tag.
shankar0123
2026-05-11 16:54:07 +00:00
parent 1b03d0c594
commit 56e2ea1ad7
20 changed files with 260 additions and 292 deletions
+8 -8
@@ -2,7 +2,7 @@
> Last reviewed: 2026-05-10
This document records the four Auth Bundle 2 / Phase 14 performance benchmarks: session validation (steady-state and cold-process) plus OIDC token validation (steady-state and cold-cache). Numbers below are the as-measured baseline at the Bundle 2 close; future regressions are caught when the operator re-runs `make benchmark-auth` and the per-quantile values move outside the documented bounds.
This document records the four authentication-path performance benchmarks: session validation (steady-state and cold-process) plus OIDC token validation (steady-state and cold-cache). Numbers below are the as-measured baseline at v2.1.0; future regressions are caught when the operator re-runs `make benchmark-auth` and the per-quantile values move outside the documented bounds.
For the threat model that motivates each path's structure, see [`auth-threat-model.md`](auth-threat-model.md). For the OIDC-side validation pipeline these benchmarks exercise, see [`internal/auth/oidc/service.go`](../../internal/auth/oidc/service.go) and [`internal/auth/session/service.go`](../../internal/auth/session/service.go).
@@ -18,7 +18,7 @@ The numbers below are bounded by this configuration. Operators on weaker hardwar
| Go runtime | 1.25.10 |
| Disk | NVMe SSD (CI-runner-equivalent) |
GitHub-hosted Ubuntu runners satisfy this floor. The Phase 14 baselines below were captured on a `linux/arm64` 4-vCPU sandbox at 2026-05-10.
GitHub-hosted Ubuntu runners satisfy this floor. The baselines below were captured on a `linux/arm64` 4-vCPU sandbox at 2026-05-10.
## Result table
@@ -29,7 +29,7 @@ GitHub-hosted Ubuntu runners satisfy this floor. The Phase 14 baselines below we
| `BenchmarkOIDC_SteadyState` | < 5 ms | **1.5 ms** | 1.2 ms | 1.5 ms | 2.6 ms | ✓ 3× under target |
| `BenchmarkOIDC_ColdCache` | < 200 ms | operator-run | — | — | — | ⚠️ requires Docker; see [Cold-cache OIDC: how to run](#cold-cache-oidc-how-to-run) below |
The three default-tag benchmarks above were captured at `git rev-parse HEAD` = (Phase 14 close); re-run via `make benchmark-auth`. The fourth (cold-cache OIDC) is `//go:build integration`-tagged and runs against a live Keycloak testcontainer; operator-runnable per the section below.
The three default-tag benchmarks above were captured at v2.1.0; re-run via `make benchmark-auth`. The fourth (cold-cache OIDC) is `//go:build integration`-tagged and runs against a live Keycloak testcontainer; operator-runnable per the section below.
## What each benchmark covers (and what it doesn't)
@@ -91,7 +91,7 @@ go test -tags integration \
./internal/auth/oidc/
```
The `-run` flag is needed because `BenchmarkOIDC_ColdCache` reuses the `sharedKeycloak` package-level fixture set up by Phase 10's integration tests; running the benchmark in isolation (without the test's setup phase) skips with a clear message.
The `-run` flag is needed because `BenchmarkOIDC_ColdCache` reuses the `sharedKeycloak` package-level fixture set up by the OIDC Keycloak integration test; running the benchmark in isolation (without that test's setup phase) skips with a clear message.
Operator-recorded baselines welcome — append below as `Last measured: <date> / <hardware> / <operator>`:
@@ -122,7 +122,7 @@ So a "cold-cache p99 of 200 ms" reads as "the network round-trip dominates the b
If the operator's measurement comes in significantly lower (say 50 ms), the IdP is on a fast same-region link; certctl's contribution is the same ~5-10 ms in-process work in either case.
The Phase 14 prompt's exit criterion explicitly accepts "rationale must be measurable and falsifiable, not hand-waving." The 200 ms cap is operator-checkable: the operator runs `make benchmark-auth-coldcache` on their actual production hardware against their actual production IdP and either confirms the p99 is under 200 ms OR produces a measurement showing the cold path is bounded by something other than network (e.g. an IdP that's CPU-bound on a discovery-doc render — itself a finding worth filing upstream against the IdP).
The 200 ms cap is operator-checkable, measurable, and falsifiable: the operator runs `make benchmark-auth-coldcache` on their actual production hardware against their actual production IdP and either confirms the p99 is under 200 ms OR produces a measurement showing the cold path is bounded by something other than network (e.g. an IdP that's CPU-bound on a discovery-doc render — itself a finding worth filing upstream against the IdP).
## Methodology
@@ -149,9 +149,9 @@ make benchmark-auth-coldcache # oidc cold-cache (10x; requires Docker)
Both targets are documented in the project [`Makefile`](../../Makefile).
## Pre-merge audit (Phase 14 exit gate)
## Pre-merge audit
Per the Phase 14 prompt's exit criterion: **all four benchmarks ran, four numbers recorded.** Steady-state targets met (p99 < 1 ms for session, p99 < 5 ms for OIDC). Cold-process target met (p99 < 10 ms). Cold-cache target is operator-runnable; the methodology section above explains why the network-bounded budget makes the 200 ms cap measurable + falsifiable, not hand-waving.
**All four benchmarks ran, four numbers recorded.** Steady-state targets met (p99 < 1 ms for session, p99 < 5 ms for OIDC). Cold-process target met (p99 < 10 ms). Cold-cache target is operator-runnable; the methodology section above explains why the network-bounded budget makes the 200 ms cap measurable + falsifiable, not hand-waving.
## Cross-references
@@ -159,4 +159,4 @@ Per the Phase 14 prompt's exit criterion: **all four benchmarks ran, four number
- [`oidc-runbooks/index.md`](oidc-runbooks/index.md) — per-IdP setup that determines real-world JWKS-fetch latency.
- `internal/auth/session/service.go` — session validation pipeline.
- `internal/auth/oidc/service.go` — OIDC token validation pipeline.
- `internal/auth/oidc/testfixtures/keycloak.go` Phase 10 testcontainers fixture used by the cold-cache benchmark.
- `internal/auth/oidc/testfixtures/keycloak.go` — testcontainers fixture used by the cold-cache benchmark.
+85 -111
@@ -3,20 +3,18 @@
> Last reviewed: 2026-05-10
This document describes the attack surface around authentication and
authorization in certctl after Bundle 1 (the RBAC primitive) AND Bundle
2 (OIDC + sessions + back-channel logout + break-glass) land. It
complements [`rbac.md`](rbac.md) and the per-IdP runbooks at
authorization in certctl. It complements [`rbac.md`](rbac.md) and the
per-IdP runbooks at
[`oidc-runbooks/index.md`](oidc-runbooks/index.md) - those docs
explain how to USE the controls; this one explains what those controls
defend against and which threats they explicitly do NOT close.
The post-Bundle-2 attack surface is meaningfully wider than Bundle 1's:
Bundle 1 closed the API-key axis (one credential type, one validation
path); Bundle 2 adds OIDC-federated humans, session cookies with
length-prefixed HMAC + CSRF, back-channel logout, OIDC first-admin
bootstrap, and a default-OFF break-glass admin path. Each surface
brings its own threat catalogue + mitigations, documented below
alongside the Bundle 1 ones.
certctl ships two authentication paths plus a break-glass admin
fallback: API keys with SHA-256 hashing + role-based authorization,
and OIDC SSO with HMAC-signed server-side sessions, CSRF rotation,
OpenID Connect Back-Channel Logout, an OIDC first-admin bootstrap, and a
default-OFF Argon2id break-glass admin path. Each surface brings its
own threat catalogue + mitigations, documented below.
## Threat actors
@@ -35,7 +33,7 @@ alongside the Bundle 1 ones.
5. **Compromised audit reviewer (auditor role)** - read-only
access to audit events but otherwise untrusted.
The following actors are NEW with Bundle 2:
The following actors are added by the federated-identity surface:
6. **OIDC-federated end user** - authenticates via the
organization's IdP (Keycloak / Okta / Auth0 / Entra ID / Authentik
@@ -53,25 +51,25 @@ The following actors are NEW with Bundle 2:
out of certctl's control; mitigations are bounded to "the audit
trail records the source provider on every login, blast radius is
bounded by group_role_mapping configured for that provider."
9. **Break-glass-password holder (Phase 7.5 path)** - operator with
9. **Break-glass-password holder** - operator with
the local Argon2id password set up for SSO outages. Bypasses the
OIDC + group-claim layer entirely. The default-OFF posture is the
load-bearing mitigation; once enabled the password is the entire
attack surface.
## Defenses Bundle 1 ships
## API-key + RBAC defenses
### API-key authentication
- API keys live in `CERTCTL_API_KEYS_NAMED` (env-var) or
`api_keys` (DB row, written by Bundle 1 Phase 6 bootstrap and
`api_keys` (DB row, written by the day-0 admin bootstrap and
the future role-management API). Keys hash via SHA-256; the
middleware compares hashes via `crypto/subtle.ConstantTimeCompare`
to defeat timing attacks.
- The auth middleware populates `ActorIDKey` / `ActorTypeKey` /
`TenantIDKey` on every authenticated request context. Audit rows
attribute every action to the named-key actor instead of the
pre-Bundle-1 hardcoded `api-key-user` placeholder.
earlier hardcoded `api-key-user` placeholder.
- Demo mode (`CERTCTL_AUTH_TYPE=none`) injects the synthetic
`actor-demo-anon` actor with admin grants. Production deploys
MUST NOT use demo mode.
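
The hash-then-compare step described in the first bullet above can be sketched as follows. This is a minimal illustration, not certctl's middleware: `hashKey`, `matchKey`, and the key names are hypothetical, and only `crypto/sha256` and `crypto/subtle.ConstantTimeCompare` come from the text.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// hashKey returns the SHA-256 digest of an API key (hypothetical helper).
func hashKey(key string) [32]byte {
	return sha256.Sum256([]byte(key))
}

// matchKey hashes the presented key and compares it against every stored
// digest in constant time. It does not return early on a match, so the
// number of comparisons does not depend on which entry matched.
func matchKey(stored map[string][32]byte, presented string) (string, bool) {
	got := hashKey(presented)
	name, found := "", false
	for n, want := range stored {
		if subtle.ConstantTimeCompare(got[:], want[:]) == 1 {
			name, found = n, true
		}
	}
	return name, found
}

func main() {
	// Illustrative key names and values only.
	stored := map[string][32]byte{
		"ci-bot": hashKey("example-key-1"),
		"ops":    hashKey("example-key-2"),
	}
	name, ok := matchKey(stored, "example-key-2")
	fmt.Println(name, ok)
	_, ok = matchKey(stored, "wrong-key")
	fmt.Println(ok)
}
```

Comparing digests rather than raw keys means the stored side never holds plaintext, and the constant-time comparison avoids leaking how many leading bytes matched.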
@@ -79,7 +77,8 @@ The following actors are NEW with Bundle 2:
### Authorization (RBAC)
- Every gated handler routes through `auth.RequirePermission` (or
the router-level `rbacGate` wrap from Phase 3.5). The middleware
the router-level `rbacGate` wrap in `internal/api/router/router.go`).
The middleware
resolves the actor's effective permissions via the
`Authorizer.CheckPermission` service-layer call; on miss, the
handler returns HTTP 403 BEFORE the body runs. This is the
@@ -124,11 +123,11 @@ The following actors are NEW with Bundle 2:
rotate via the regular RBAC API; the plaintext is not
recoverable from the DB.
### Approval workflow + Phase 9 loophole closure
### Approval workflow + flip-flop loophole closure
- `CertificateProfile.RequiresApproval=true` gates two surfaces:
(a) issuance + renewal of every cert pointing at the profile,
(b) edits to the profile itself (Bundle 1 Phase 9). The Phase 9
  (b) edits to the profile itself. The loophole
closure prevents the flip-flop bypass where an admin disables
approval, mutates, re-enables.
- Same-actor self-approve is rejected at the service layer with
@@ -140,7 +139,7 @@ The following actors are NEW with Bundle 2:
### Audit trail
- Every mutating operation flows through `AuditService.RecordEvent`
or `RecordEventWithCategory`. Bundle 1 Phase 8 added the
or `RecordEventWithCategory`. The audit-category extension added the
`event_category` column with a `CHECK` constraint enforcing
the closed enum (`cert_lifecycle` / `auth` / `config`); the
category surfaces the auth-mutation slice to the auditor view.
@@ -148,7 +147,7 @@ The following actors are NEW with Bundle 2:
(`audit_events_worm_trigger`) blocks `UPDATE` and `DELETE` at
the database layer. Even an admin DB user cannot tamper with
audit history without dropping the trigger.
- Bundle-6's redactor (`internal/service/audit_redact.go`)
- The audit redactor (`internal/service/audit_redact.go`)
scrubs credentials + PII from the `details` JSONB before
persistence; an `_redacted_keys` field surfaces what the
redactor took out for compliance review.
@@ -158,14 +157,14 @@ The following actors are NEW with Bundle 2:
ACME / SCEP / EST / OCSP / CRL endpoints authenticate via
embedded credentials defined by their own RFCs (JWS-signed,
challenge passwords, mTLS, public-by-RFC). The auth middleware
explicitly bypasses these via `IsProtocolEndpoint`. The Phase 12
`internal/api/router/phase12_protocol_allowlist_test.go` pins
the invariant at three layers (middleware bypass, allowlist
explicitly bypasses these via `IsProtocolEndpoint`. The
`internal/api/router/phase12_protocol_allowlist_test.go` regression
test pins the invariant at three layers (middleware bypass, allowlist
constant, router-level no-rbacGate-wraps-protocol-paths).
## Defenses Bundle 2 ships
## OIDC + sessions + break-glass defenses
### OIDC token validation (Phase 3)
### OIDC token validation
- **Algorithm allow-list, never `none`, never HMAC.** The service-
layer pinning lives in `internal/auth/oidc/service.go::disallowedAlgs`
@@ -233,7 +232,7 @@ constant, router-level no-rbacGate-wraps-protocol-paths).
is `json:"-"` on the domain type so a misconfigured handler
cannot wire-leak.
### Session minting + cookies (Phases 4 + 6)
### Session minting + cookies
- **Length-prefixed HMAC.** Cookie wire format is
`v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`.
@@ -284,7 +283,7 @@ constant, router-level no-rbacGate-wraps-protocol-paths).
stolen pre-login cookie cannot be replayed against the post-login
gate.
### Back-channel logout (Phase 5)
### Back-channel logout
- **OpenID Connect Back-Channel Logout 1.0** (NOT RFC 8414).
Endpoint: `POST /auth/oidc/back-channel-logout`. The IdP signs a
@@ -295,15 +294,15 @@ constant, router-level no-rbacGate-wraps-protocol-paths).
`events` (with the spec-mandated logout event type); exactly
one of `sub` / `sid`; `nonce` MUST be absent (per spec §2.4
- logout tokens MUST NOT carry a nonce). All four pinned by
Phase 5 negative tests.
- **`jti`-based replay defense.** The Phase 5 implementation
the back-channel-logout negative-test matrix.
- **`jti`-based replay defense.** The handler
tracks recently-seen `jti` values to defeat logout-token replay
attacks where an attacker captures a logout JWT and replays it.
- **Cache-Control: no-store** on the response per spec §2.5.
### OIDC first-admin bootstrap (Phase 7)
### OIDC first-admin bootstrap
- **Coexists with Bundle 1's env-var-token bootstrap.** Both can be
- **Coexists with the env-var-token bootstrap path.** Both can be
configured; the admin-existence probe ensures only one wins.
- **Group-scoped.** `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` is a comma-
separated allowlist of IdP group names; users in any one of those
@@ -319,7 +318,7 @@ constant, router-level no-rbacGate-wraps-protocol-paths).
- **Audit row on every grant.** `bootstrap.oidc_first_admin` event
with `event_category=auth` + INFO log; the auditor monitors.
### Break-glass admin (Phase 7.5)
### Break-glass admin
- **Default-OFF.** `CERTCTL_BREAKGLASS_ENABLED=false` is the default;
the entire surface (4 endpoints) is disabled. Operators flip it
@@ -355,10 +354,10 @@ constant, router-level no-rbacGate-wraps-protocol-paths).
- **Rate limit on the public login endpoint.** 5 attempts/minute
via the existing `middleware.NewRateLimiter`.
## Bundle 2 threat catalogue
## OIDC + sessions threat catalogue
The following sub-sections enumerate the threat surface introduced by
Bundle 2 and the mitigations the platform ships. They are deliberately
the OIDC + sessions surface and the mitigations the platform ships. They are deliberately
exhaustive - if a threat is listed here it has a concrete mitigation
or a documented "operator-driven, out of scope" framing. New threats
discovered post-2026-05-10 should be added here with a dated commit
@@ -370,10 +369,10 @@ note.
|---|---|
| Alg confusion (HS256 token signed with the IdP's public key) | Alg allow-list rejects HS256 / HS384 / HS512 / `none`. Service-layer + go-oidc enforce in two layers. IdP-downgrade-attack defense at provider-creation time. |
| Audience injection (token issued for a different client) | Service-layer `aud` re-check post-go-oidc verify; multi-aud tokens require matching `azp`. Sentinels `ErrAudienceMismatch` / `ErrAZPRequired` / `ErrAZPMismatch`. |
| Issuer mismatch (token from a different IdP with the same alg + key shape) | Exact `iss` string match (`ErrIssuerMismatch`). The 21-case Phase 3 negative-test matrix pins the byte-for-byte requirement. |
| Issuer mismatch (token from a different IdP with the same alg + key shape) | Exact `iss` string match (`ErrIssuerMismatch`). The 21-case OIDC negative-test matrix pins the byte-for-byte requirement. |
| Nonce replay (capturing a fresh token + replaying with the same nonce) | Single-use nonce stored in the pre-login row; `LookupAndConsume` is `DELETE...RETURNING` (atomic). Second use returns `ErrPreLoginNotFound`. |
| State replay (CSRF on the IdP redirect) | Same single-use mechanism as nonce. State is `subtle.ConstantTimeCompare`d. |
| `at_hash` substitution (clean ID token with a swapped access token) | `at_hash` REQUIRED when access_token present (Phase 3 tightening of OIDC core's MAY → MUST). `ErrATHashRequired` if missing; `ErrATHashMismatch` if non-matching. |
| `at_hash` substitution (clean ID token with a swapped access token) | `at_hash` REQUIRED when access_token present (certctl tightens OIDC core's MAY → MUST). `ErrATHashRequired` if missing; `ErrATHashMismatch` if non-matching. |
| `iat` window manipulation (stale token replay) | `iat_window_seconds` configurable per-provider (default 300, cap 600). Future `iat` returns `ErrIATInFuture`; older-than-window returns `ErrIATTooOld`. |
| JWKS rotation mid-login | coreos/go-oidc's built-in cache + auto-refresh on TTL expiry. Operator-triggered `Service.RefreshKeys` for forced refresh. |
| JWKS-fetch failure during a key rotation | `ErrJWKSUnreachable` (HTTP 503 to in-flight login). Existing sessions untouched. Operator clicks "Refresh discovery cache" once IdP recovers. No exponential backoff. |
@@ -382,7 +381,7 @@ note.
| Vector | Mitigation |
|---|---|
| Cookie theft via XSS | `HttpOnly` on the session cookie; CSP headers from Bundle B's H-1 work prevent inline-script execution. |
| Cookie theft via XSS | `HttpOnly` on the session cookie; CSP headers from the security-hardening middleware prevent inline-script execution. |
| Cookie theft via network MITM | `Secure` flag + TLS 1.3-only control plane (HTTPS-Everywhere v2.2 milestone). |
| CSRF on state-changing methods | `SameSite=Lax` default + double-submit-cookie pattern with hashed CSRF token on the session row. CSRFMiddleware fires on POST/PUT/PATCH/DELETE for session-authenticated callers; API-key actors are exempt. |
| Session-cookie forgery via concatenation collision | Length-prefixed HMAC input (`len(sid):sid:len(kid):kid`). Pinned by two tests + a doc-block at the top of `service.go`. |
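
To show why the length prefix in the last row matters, here is a minimal sketch of the MAC input `len(sid):sid:len(kid):kid` and the `v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>` wire format. The function names are hypothetical and the exact byte encoding of the lengths (decimal ASCII here) is an assumption; the code in `service.go` may differ.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// macInput builds the length-prefixed string that gets signed.
// Assumption: lengths are rendered as decimal ASCII.
func macInput(sid, kid string) []byte {
	return []byte(fmt.Sprintf("%d:%s:%d:%s", len(sid), sid, len(kid), kid))
}

// sign returns the unpadded base64url HMAC-SHA256 tag over macInput.
func sign(key []byte, sid, kid string) string {
	m := hmac.New(sha256.New, key)
	m.Write(macInput(sid, kid))
	return base64.RawURLEncoding.EncodeToString(m.Sum(nil))
}

// cookieValue assembles the v1 wire format described in the doc.
func cookieValue(key []byte, sid, kid string) string {
	return "v1." + sid + "." + kid + "." + sign(key, sid, kid)
}

// verify recomputes the tag and compares it in constant time.
func verify(key []byte, sid, kid, tag string) bool {
	return hmac.Equal([]byte(sign(key, sid, kid)), []byte(tag))
}

func main() {
	key := []byte("example-signing-key") // illustrative only
	// Without length prefixes, ("ab","c") and ("a","bc") would both
	// concatenate to "abc" and share one tag. With prefixes they differ.
	fmt.Println(string(macInput("ab", "c")))
	fmt.Println(string(macInput("a", "bc")))
	fmt.Println(sign(key, "ab", "c") == sign(key, "a", "bc"))
	fmt.Println(cookieValue(key, "sess123", "k1"))
}
```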
@@ -422,8 +421,8 @@ control - the trust root is the IdP. Documented behaviors:
|---|---|---|
| IdP unreachable | certctl never receives the logout signal; sessions persist until idle/absolute timeout (1h/8h defaults). | Operator keeps absolute timeout short relative to risk tolerance. Manual revoke via GUI is always available. |
| Logout token signature invalid | certctl returns 400; no session revoked; `auth.oidc_back_channel_logout_failed` audit row. | Operator-monitored audit row surfaces forged-logout-token attempts. |
| Logout token replay (attacker captures + replays a valid logout JWT) | `jti`-based deduplication rejects the replay; first delivery succeeds, second returns 400. | Pinned by Phase 5 negative tests. |
| Logout token alg confusion | Same alg allow-list as the login flow; HS-family rejected. | Phase 3 alg allow-list applies to BCL too (same `Provider.RemoteKeySet`). |
| Logout token replay (attacker captures + replays a valid logout JWT) | `jti`-based deduplication rejects the replay; first delivery succeeds, second returns 400. | Pinned by back-channel-logout negative tests. |
| Logout token alg confusion | Same alg allow-list as the login flow; HS-family rejected. | The OIDC alg allow-list applies to BCL too (same `Provider.RemoteKeySet`). |
| Missing `events` claim | Spec §2.4 requires the OIDC-defined logout event type; missing returns 400. | Pinned by negative test. |
| `nonce` claim present | Spec §2.4 requires `nonce` MUST NOT appear in logout tokens; presence returns 400. | Pinned by negative test. |
@@ -440,19 +439,19 @@ threats:
| IdP renames a group (e.g. `engineers → eng-team`) | Mappings silently break; users get fewer roles than expected. `auth.oidc_login_unmapped_groups` audit row fires on every such login; auditor monitors for unexpected spikes. |
| IdP user maintainer adds a user to an unintended group | Group is mapped to a higher-privilege role than intended; user gets the role on next login. Bounded blast radius: the group→role mapping is what they got, not arbitrary admin. Defense-in-depth: review mappings periodically; the auditor role can pull `auth.oidc_login_succeeded` rows by `details.subject` to spot drift. |
### Bootstrap phase risks (post-Bundle-2)
### Bootstrap phase risks
This section extends Bundle 1's bootstrap section with the OIDC
This section extends the day-0 bootstrap section with the OIDC
first-admin path.
| Vector | Mitigation |
|---|---|
| `CERTCTL_BOOTSTRAP_TOKEN` (Bundle 1 fallback) leaks | One-shot via `consumed` bool + admin-existence probe. Both arms close the path the moment any admin lands. (Bundle 1.) |
| `CERTCTL_BOOTSTRAP_TOKEN` (env-var fallback path) leaks | One-shot via `consumed` bool + admin-existence probe. Both arms close the path the moment any admin lands. |
| `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` misconfigured to a wide group (e.g. `everyone`) | Unintended user becomes admin on first OIDC login. Mitigation: scope-down via `certctl-cli auth keys scope-down --suggest`. Operators configure narrow groups. The audit row on `bootstrap.oidc_first_admin` surfaces every grant. |
| Both bootstrap strategies enabled simultaneously | Whichever fires first wins; the second sees admin-already-exists and falls through to normal mapping. No double-admin landing. |
| `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` left unset with multi-IdP deploy | Hook fires on ANY provider's tokens. Mitigation: explicit gate documented in `cmd/server/main.go` startup logging; operator audit reviewed pre-tag. |
### Break-glass risks (Phase 7.5)
### Break-glass risks
| Vector | Mitigation |
|---|---|
@@ -462,7 +461,7 @@ first-admin path.
| Operator forgets to disable post-incident | Break-glass becomes a permanent backdoor. Mitigation: WARN log at boot when ENABLED=true; audit row on every break-glass login; runbook prescribes "disable within 24h of SSO recovery." |
| Side-channel timing on no-credential vs wrong-password vs locked | All three paths take statistically indistinguishable time via `verifyDummy()`. Pinned by the timing-statistical test. |
| Surface fingerprinting (scanner identifies break-glass exists) | All four endpoints return 404 (NOT 403) when disabled. Surface-invisibility - identical to a non-existent route. |
| Reserved-actor `actor-demo-anon` mutation via break-glass admin | Service layer rejects with `ErrAuthReservedActor` (HTTP 409). Same gate as the Bundle 1 RBAC path. |
| Reserved-actor `actor-demo-anon` mutation via break-glass admin | Service layer rejects with `ErrAuthReservedActor` (HTTP 409). Same gate as the RBAC path. |
### Token-leak hygiene (the explicit grep policy)
@@ -473,8 +472,8 @@ NEVER appear in any log line at any level.
The invariant is enforced by per-package `logging_test.go` files that
redirect `slog.Default` to a buffer, run the service paths, and
grep-assert the secret values are absent from every captured line.
Bundle 1's `internal/auth/bootstrap/service_test.go` is the pattern.
Phases 3, 4, and 7.5 follow the same shape:
The pattern is `internal/auth/bootstrap/service_test.go`; the OIDC,
session, and break-glass packages follow the same shape:
- `internal/auth/oidc/logging_test.go` - token / code / verifier /
state / nonce / cookie / client_secret / alg name absent from
@@ -486,68 +485,43 @@ Phases 3, 4, and 7.5 follow the same shape:
Argon2id hash absent from every audit row + log line +
HTTP-response shape (json:"-" probe via `json.Marshal`).
The `details` JSONB column on `audit_events` runs through
Bundle-6's redactor (`internal/service/audit_redact.go`) before
The `details` JSONB column on `audit_events` runs through the
audit redactor (`internal/service/audit_redact.go`) before
persistence; the redactor's allow-list is conservative enough that
adding a new token-shaped field to a new audit row defaults to
redacted, not leaked.
## Threats Bundle 1 does NOT close (Bundle 2 closure status)
## Closed federated-identity threats
The list below was the Bundle-1-era deferred-threats catalogue.
Status updated 2026-05-10 to reflect what Bundle 2 closed and what
remains deferred. **The label "Bundle 1 does NOT close" is preserved
for historical traceability**; readers should consult the marker at
the end of each item for current status.
Each item below was an open threat under the earlier API-key-only
deployment posture. Status reflects current closure as of v2.1.0.
1. **OIDC / SAML / WebAuthn federation** - ✅ OIDC closed (Bundle 2
Phases 1-7); SAML deferred to v3; WebAuthn deferred to v3
(Decision 12 - WebAuthn pairs with break-glass for hardware-
token-MFA). The break-glass path (Phase 7.5) is a partial
1. **OIDC federation** - ✅ closed. SAML and WebAuthn remain on the
future-work list (Decision 12 — WebAuthn pairs with break-glass
for hardware-token MFA). The break-glass path is a partial
mitigation for the no-MFA case during SSO incidents.
2. **Session management** - ✅ closed (Bundle 2 Phases 4 + 6). HMAC-
signed `certctl_session` cookie with length-prefixed wire format,
2. **Session management** - ✅ closed. HMAC-signed
`__Host-certctl_session` cookie with length-prefixed wire format,
1h idle / 8h absolute expiry, scheduler-driven GC, server-side
revocation list (delete the row), GUI's "Sessions" page surfaces
own + all-actor revocation, back-channel logout from the IdP.
3. **Local password accounts (break-glass)** - ✅ closed (Bundle 2
Phase 7.5). Argon2id + lockout + default-OFF + 404-not-403
surface invisibility. NOT for general human auth - only the
"SSO is broken, need admin access right now" path. WebAuthn
pairing on the v3 roadmap.
4. **Time-bound role grants / JIT elevation** - **still deferred to
v3.** The schema still reserves `actor_roles.expires_at` with no
UI/API to set it. Bundle 2 introduces session-level idle/absolute
expiry but does not propagate that to role grants.
5. **MFA / hardware tokens for the operator console** - ⚠️ partial
closure. WebAuthn / FIDO2 second factor remains v3 (Decision 12).
Bundle 2's break-glass (Phase 7.5) provides a separate password
factor that operators can pair with OIDC, but it's not a true
second factor on the OIDC login path - the OIDC IdP remains the
sole token source on the federation path.
6. **Rate limiting on the bootstrap endpoint** - acceptable
3. **Local password accounts (break-glass)** - ✅ closed. Argon2id
+ lockout + default-OFF + 404-not-403 surface invisibility. NOT
for general human auth - only the "SSO is broken, need admin
access right now" path. WebAuthn pairing on the future-work list.
4. **OIDC first-admin bootstrap** - ✅ closed.
`CERTCTL_BOOTSTRAP_ADMIN_GROUPS` +
`CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` env vars + group-scoped +
admin-existence-probe.
5. **Rate limiting on the bootstrap endpoint** - acceptable
(one-shot by construction; per-IP rate limiting on the broader
API is in place via Bundle C's `middleware.NewRateLimiter`).
Bundle 2 adds the same rate-limit primitive to the break-glass
`/auth/breakglass/login` endpoint at 5/min.
7. **`scope_id` FK enforcement** - **still deferred.** Operators can
grant a permission at scope `profile`/`p-bogus` without the
bogus profile existing. The gate still works (no rows match at
request time) but a strict 404 on grant would be cleaner.
`TODO(bundle-2)` comment is now `TODO(v3)`.
8. **OIDC-first-admin bootstrap** - ✅ closed (Bundle 2 Phase 7).
`CERTCTL_BOOTSTRAP_ADMIN_GROUPS` + `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID`
env vars + group-scoped + admin-existence-probe.
9. **GUI E2E suite via Playwright** - **still deferred** to a
follow-on bundle. The Phase 8 GUI ships 28 new Vitest unit-test
cases (5 new test files); full Playwright E2E for the 15 flow
checks from the Bundle 2 prompt's Phase 8 (auth-code login +
group-claim parsing + revoke-revokes-session + JWKS rotation +
etc.) is the operator's call on whether to land before tag.
API is in place via `middleware.NewRateLimiter`). The break-glass
`/auth/breakglass/login` endpoint carries the same rate-limit
primitive at 5/min.
## Threats Bundle 2 does NOT close
## Future-work threats
These are the v3 / future-work deferrals at the post-Bundle-2 mark:
The following are not yet closed:
1. **WebAuthn / FIDO2 second factor** - operator console is OIDC
(or break-glass password) only. No hardware-token requirement
@@ -558,11 +532,11 @@ These are the v3 / future-work deferrals at the post-Bundle-2 mark:
the broker pattern (run Keycloak as a SAML-to-OIDC bridge); see
the Google Workspace runbook for the same broker shape.
4. **Multi-tenant data isolation activation** - the schema and
repository layer carry tenant_id columns + the Phase 13 query-
coverage CI guard, but tenant ACLs are not enforced. Bundle 2
ships single-tenant only (`t-default` seeded). The managed-
service hosting work (operator decision item) is where multi-
tenant flips on.
repository layer carry tenant_id columns + a query-coverage CI
guard, but tenant ACLs are not enforced. v2.1.0 ships
single-tenant only (`t-default` seeded). The managed-service
hosting work (operator decision item) is where multi-tenant
flips on.
5. **HSM / FIPS-validated signing key for sessions** - the session
signing key is software-only (HMAC-SHA256, in-memory key
material, encrypted at rest via `internal/crypto`). Operators
@@ -572,9 +546,9 @@ These are the v3 / future-work deferrals at the post-Bundle-2 mark:
driver ships yet.
6. **OIDC RP-initiated logout** (the "/end_session_endpoint" flow
where certctl signs a logout token + redirects the browser to
the IdP). Bundle 2 implements ONLY the back-channel flow (IdP →
the IdP). v2.1.0 implements ONLY the back-channel flow (IdP →
certctl). Operators wanting the full bidirectional logout pair
wait on a follow-on bundle.
wait on a follow-on release.
7. **GUI E2E via Playwright** - tracked alongside #9 above.
8. **Per-IdP runbook external-tester sign-off** - encouraged via
the operator-sign-off footers in `oidc-runbooks/*.md` but NOT a
@@ -598,8 +572,8 @@ formal certification.
append-only at the database layer.
- **NIST SSDF PO.5.2** (separation of duties) - two-person
integrity for compliance-tier issuance via the
`RequiresApproval` flow + Bundle 1 Phase 9's closure of the
flip-flop bypass.
`RequiresApproval` flow + the approval-bypass closure on
profile edits.
- **FedRAMP AU-9** (audit information protection) - WORM
enforcement + auditor-only read access (the auditor role
cannot mutate, the WORM trigger blocks UPDATE/DELETE).
@@ -632,7 +606,7 @@ Run these periodically to verify the controls are working.
`audit.export` ONLY. Any other permission means a role grant
widened the auditor's surface; revoke immediately.
The following checks are NEW with Bundle 2:
The following checks were added with v2.1.0's federated-identity surface:
6. `SELECT COUNT(*) FROM oidc_providers;` - confirm only the
expected providers are configured. An unexpected row is a
@@ -666,7 +640,7 @@ The following checks are NEW with Bundle 2:
## Cross-references
Bundle 1 (RBAC) anchors:
API-key + RBAC anchors:
- [`rbac.md`](rbac.md) - the operator how-to
- [`security.md`](security.md) - the wider security posture
@@ -685,7 +659,7 @@ Bundle 1 (RBAC) anchors:
- `migrations/000033_approval_kinds.up.sql` - approval-bypass
closure
Bundle 2 (OIDC + sessions + back-channel logout + break-glass) anchors:
OIDC + sessions + back-channel logout + break-glass anchors:
- [`oidc-runbooks/index.md`](oidc-runbooks/index.md) - per-IdP setup
guides (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google
@@ -698,7 +672,7 @@ Bundle 2 (OIDC + sessions + back-channel logout + break-glass) anchors:
CSRF middleware, chained-auth combinator
- `internal/auth/breakglass/` - default-OFF break-glass admin
(Argon2id + lockout + constant-time + surface-invisibility)
- `internal/auth/oidc/testfixtures/` - Phase 10 Keycloak
- `internal/auth/oidc/testfixtures/` - Keycloak
testcontainers harness (`//go:build integration`)
- `migrations/000034_oidc_providers.up.sql` - OIDC providers +
group-role mappings tables
@@ -711,8 +685,8 @@ Bundle 2 (OIDC + sessions + back-channel logout + break-glass) anchors:
- `migrations/000038_breakglass_credentials.up.sql` - break-glass
credentials table + 2 new permissions
- `scripts/ci-guards/N-bundle-2-security-empty-preserved.sh` -
OpenAPI security: [] count guard
OpenAPI `security: []` count guard
- `scripts/ci-guards/bundle-1-compat-regression.sh` -
Bundle-1-only-compat assertions (5 invariants)
API-key-only compat assertions (5 invariants)
- `scripts/ci-guards/bundle-1-to-2-upgrade-regression.sh` -
upgrade-path assertions (6 invariants)
OIDC-upgrade-path assertions (6 invariants)
+8 -7
@@ -2,14 +2,15 @@
> Last reviewed: 2026-05-05
**Audit reference:** Bundle B / M-018. CWE-319 (Cleartext transmission of sensitive information).
**Audit reference:** CWE-319 (Cleartext transmission of sensitive information).
certctl talks to Postgres over a single connection-string URL controlled by the
`CERTCTL_DATABASE_URL` env var. The `sslmode` query parameter on that URL
selects the transport-encryption posture. Pre-Bundle-B all the bundled
deployment artifacts (Helm chart, docker-compose) hard-coded `sslmode=disable`.
Bundle B exposes that as an operator-facing knob with a documented default and
explicit opt-in / opt-out paths for the four real-world deployment shapes.
selects the transport-encryption posture. The bundled deployment artifacts
(Helm chart, docker-compose) historically hard-coded `sslmode=disable`;
current builds expose that as an operator-facing knob with a documented
default and explicit opt-in / opt-out paths for the four real-world
deployment shapes.
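As a rough illustration of what the `sslmode` knob changes, the sketch below rewrites that query parameter on a connection URL. The helper name `with_sslmode` and the sample URL are hypothetical and not part of certctl; the mode names are the standard libpq values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Standard libpq sslmode values, weakest to strongest.
SSLMODES = ("disable", "allow", "prefer", "require", "verify-ca", "verify-full")


def with_sslmode(database_url: str, mode: str) -> str:
    """Return database_url with its sslmode query parameter set to mode."""
    if mode not in SSLMODES:
        raise ValueError(f"unknown sslmode: {mode!r}")
    parts = urlsplit(database_url)
    # Drop any existing sslmode, keep every other parameter in order.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "sslmode"
    ]
    query.append(("sslmode", mode))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical URL of the shape CERTCTL_DATABASE_URL takes.
    url = "postgres://certctl:secret@db:5432/certctl?sslmode=disable"
    print(with_sslmode(url, "verify-full"))
```

The helper only edits the URL; whether a given mode succeeds still depends on the server certificate and the CA material available to the client.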
## Quick reference
@@ -26,9 +27,9 @@ explicit opt-in / opt-out paths for the four real-world deployment shapes.
is the floor for systems exposed to spoofing risk (it adds hostname
validation against the server cert's CN/SAN).
## Helm chart (Bundle B)
## Helm chart
Bundle B adds two values under `postgresql.tls`:
The chart exposes two values under `postgresql.tls`:
```yaml
postgresql:
+3 -3
@@ -2,7 +2,7 @@
> Last reviewed: 2026-05-05
**Audit reference:** Bundle F / M-023. CWE-326 (Inadequate encryption strength).
**Audit reference:** CWE-326 (Inadequate encryption strength).
## What this is
@@ -149,7 +149,7 @@ hop without server-side header trust.
**Why this is the correct default:** trusting a proxy-supplied header
for client identity opens a header-spoofing attack surface that requires
careful design (CIDR allowlist of trusted proxies, fail-closed defaults,
explicit operator opt-in). The Bundle F closure of M-023 ships the
explicit operator opt-in). The legacy-clients work ships the
TLS-bridge guidance as documentation only; a future commit can extend
certctl with proxy-header trust if and when an operator demonstrates a
deployment shape that requires it. Until that lands, the runbook above
@@ -204,6 +204,6 @@ own embedded-device vendors for deprecation notices.
- [`docs/operator/tls.md`](tls.md) — the certctl-internal TLS configuration (HTTPS-only control plane, MinVersion pin)
- [`docs/operator/security.md`](security.md) — overall security posture
- [`docs/operator/database-tls.md`](database-tls.md) — Postgres TLS opt-in (Bundle B / M-018)
- [`docs/operator/database-tls.md`](database-tls.md) — Postgres TLS opt-in
- [`docs/reference/protocols/scep-server.md`](../reference/protocols/scep-server.md) — SCEP RFC 8894 native server reference
- [`docs/reference/protocols/est.md`](../reference/protocols/est.md) — EST RFC 7030 server reference
+1 -1
@@ -14,7 +14,7 @@ For the canonical reference + mental model, read [keycloak.md](keycloak.md) firs
- Admin access to the Authentik admin console at `https://<authentik-host>/if/admin/`.
- Network reachability from certctl-server to `https://<authentik-host>/application/o/<application-slug>/.well-known/openid-configuration`.
**On the certctl side:** same as Keycloak — `CERTCTL_CONFIG_ENCRYPTION_KEY` set, an admin actor holding `auth.oidc.create` + `auth.oidc.edit`, Bundle 2 server build.
**On the certctl side:** same as Keycloak — `CERTCTL_CONFIG_ENCRYPTION_KEY` set, an admin actor holding `auth.oidc.create` + `auth.oidc.edit`, server build ≥ v2.1.0.
## IdP-side configuration
+1 -1
@@ -149,7 +149,7 @@ curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/group-mappings \
}'
```
Repeat for every group you want to map. **Document the GUID-to-name mapping in your operator runbook** — without it, the next operator looking at certctl's mappings page sees a wall of GUIDs with no way to know which is which. Consider naming the mapping descriptively if your group-mapping schema supports it (Bundle 2 doesn't yet — group-mapping descriptions are a parking-lot item for a follow-on bundle).
Repeat for every group you want to map. **Document the GUID-to-name mapping in your operator runbook** — without it, the next operator looking at certctl's mappings page sees a wall of GUIDs with no way to know which is which. Consider naming the mapping descriptively if your group-mapping schema supports it (v2.1.0 doesn't yet — group-mapping descriptions are a parking-lot item for a follow-on release).
## Verification
+4 -4
@@ -2,7 +2,7 @@
> Last reviewed: 2026-05-10
This is the index for the per-IdP setup runbooks that ship with Auth Bundle 2 (OIDC + sessions). Pick the runbook that matches your identity provider; each one walks you through the IdP-side configuration, the certctl-side configuration, end-to-end verification, and the most common troubleshooting paths.
This is the index for the per-IdP setup runbooks for certctl's OIDC SSO surface. Pick the runbook that matches your identity provider; each one walks you through the IdP-side configuration, the certctl-side configuration, end-to-end verification, and the most common troubleshooting paths.
For the threat model behind certctl's OIDC implementation, see [`auth-threat-model.md`](../auth-threat-model.md). For the RBAC primitive that group→role mappings target, see [`rbac.md`](../rbac.md). For the underlying protocol details (PKCE, state, nonce, JWKS rotation, fail-closed semantics), see the OIDC service docstring at [`internal/auth/oidc/service.go`](../../../internal/auth/oidc/service.go).
@@ -35,7 +35,7 @@ These show up in every runbook; understand them once and skim the rest.
**Client secret rotation.** Every IdP issues a `client_secret` for the confidential client (certctl is always a confidential client; public clients aren't supported because we have a server-side place to keep the secret). Rotating at the IdP requires the operator to PUT the new secret into certctl via the GUI's "Edit provider" dialog or the `certctl_auth_update_oidc_provider` MCP tool. Leaving `client_secret` empty in the update payload preserves the existing ciphertext; providing a value rotates it.
**JWKS cache TTL.** The certctl service caches the IdP's JWKS document for `jwks_cache_ttl_seconds` (default 3600). When the IdP rotates a signing key, in-flight logins that try to validate a new-key-signed token against the stale cache fail with `ErrJWKSUnreachable` until the next refresh. Operators have two options: wait out the TTL, or click "Refresh discovery cache" in the GUI's OIDC Provider Detail page (`POST /api/v1/auth/oidc/providers/{id}/refresh`) to force-evict the cache. The Phase 10 Keycloak integration test exercises this drill end to end.
**JWKS cache TTL.** The certctl service caches the IdP's JWKS document for `jwks_cache_ttl_seconds` (default 3600). When the IdP rotates a signing key, in-flight logins that try to validate a new-key-signed token against the stale cache fail with `ErrJWKSUnreachable` until the next refresh. Operators have two options: wait out the TTL, or click "Refresh discovery cache" in the GUI's OIDC Provider Detail page (`POST /api/v1/auth/oidc/providers/{id}/refresh`) to force-evict the cache. The Keycloak integration test exercises this drill end to end.
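The stale-cache window described above can be modelled with a small TTL cache. This is a sketch of the general technique only: the `TTLCache` class and its injected `fetch` and `clock` callables are hypothetical and do not reflect certctl's internal implementation.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache one fetched document for ttl_seconds; refresh() force-evicts."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        # Serve the cached copy until the TTL elapses, even if upstream changed.
        if self._value is None or self._clock() - self._fetched_at >= self._ttl:
            self.refresh()
        return self._value

    def refresh(self) -> None:
        # Comparable in spirit to the "Refresh discovery cache" action.
        self._value = self._fetch()
        self._fetched_at = self._clock()


if __name__ == "__main__":
    idp_keys = {"keys": ["kid-old"]}
    cache = TTLCache(lambda: dict(idp_keys))
    cache.get()
    idp_keys["keys"] = ["kid-new"]  # the IdP rotates its signing key
    print(cache.get()["keys"])      # cached copy, fetched before the rotation
    cache.refresh()
    print(cache.get()["keys"])      # copy fetched after the forced refresh
```

Between the rotation and the refresh, anything signed under the new key has no matching entry in the cached copy, which is the failure window the paragraph describes.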
**Group→role mappings are fail-closed.** The certctl service refuses to mint a session for a user whose IdP-supplied groups don't match ANY configured mapping (`ErrGroupsUnmapped` → HTTP 401 to the user with a "no roles assigned" page). This is intentional: an empty mapping ≠ "let everyone in"; it means "this provider is not yet configured for any role." Operators add at least one mapping (typically `<engineers-group>` → `r-operator`) BEFORE rolling out OIDC to users.
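A minimal sketch of the fail-closed rule, assuming mappings are exact, case-sensitive group-name lookups. The function `resolve_roles` and the exception `GroupsUnmapped` are illustrative names, not certctl identifiers.

```python
class GroupsUnmapped(Exception):
    """Raised when no IdP-supplied group matches a configured mapping."""


def resolve_roles(idp_groups, mappings):
    """Return the sorted role ids for idp_groups, or raise if none match.

    mappings is a dict of exact group name -> role id.
    """
    roles = {mappings[g] for g in idp_groups if g in mappings}
    if not roles:
        # Fail closed: an empty result never becomes an implicit default role.
        raise GroupsUnmapped("no roles assigned")
    return sorted(roles)


if __name__ == "__main__":
    mappings = {"engineers": "r-operator"}
    print(resolve_roles(["engineers", "everyone"], mappings))
    try:
        resolve_roles(["everyone"], mappings)
    except GroupsUnmapped as exc:
        print("login refused:", exc)
```

The point of the shape is that there is no fallback branch: a provider with zero mappings refuses every login instead of admitting every user.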
@@ -51,5 +51,5 @@ Each per-IdP runbook ends with a **validation checklist** the operator runs agai
- [RBAC operator reference](../rbac.md) — roles, permissions, scope-down + bootstrap flow.
- [Auth threat model](../auth-threat-model.md) — API-key + OIDC + session compromise scenarios; v3 WebAuthn pairing.
- [Security posture](../security.md) — overall auth surface incl. this Bundle 2 OIDC layer.
- [API keys → RBAC migration](../../migration/api-keys-to-rbac.md) — the Bundle 1 upgrade flow your operator likely already ran.
- [Security posture](../security.md) — overall auth surface including this OIDC layer.
- [API keys → RBAC migration](../../migration/api-keys-to-rbac.md) — the v2.0.x → v2.1.0 RBAC upgrade flow your operator likely already ran.
+7 -7
@@ -2,7 +2,7 @@
> Last reviewed: 2026-05-10
This is the canonical reference runbook for wiring certctl's OIDC SSO surface against [Keycloak](https://www.keycloak.org/). Keycloak is a free / open-source identity provider that runs on-prem or self-hosted; it is also the load-bearing test fixture for Phase 10 of Auth Bundle 2 (`internal/auth/oidc/testfixtures/keycloak.go`), so the certctl-side validation pipeline is exhaustively exercised against it.
This is the canonical reference runbook for wiring certctl's OIDC SSO surface against [Keycloak](https://www.keycloak.org/). Keycloak is a free / open-source identity provider that runs on-prem or self-hosted; it is also the load-bearing test fixture for certctl's OIDC integration tests (`internal/auth/oidc/testfixtures/keycloak.go`), so the certctl-side validation pipeline is exhaustively exercised against it.
If your IdP is something else (Okta, Auth0, Azure AD, Authentik, Google Workspace), see the per-IdP siblings in [this directory](index.md). The mental model + certctl-side wiring are identical; only the IdP-side console differs.
@@ -10,7 +10,7 @@ If your IdP is something else (Okta, Auth0, Azure AD, Authentik, Google Workspac
**On the Keycloak side:**
- Keycloak ≥ 25.0 (older versions work but the screen flows differ slightly — the Phase 10 fixture pins 25.0).
- Keycloak ≥ 25.0 (older versions work but the screen flows differ slightly — the integration test fixture pins 25.0).
- Admin access to a realm — either an existing tenant realm or a fresh one created for certctl. Don't share Keycloak's `master` realm; create a dedicated realm.
- Network reachability from certctl-server to the Keycloak `https://<keycloak-host>/realms/<realm-name>` discovery endpoint. The certctl service fetches `/.well-known/openid-configuration` at provider creation and at every `RefreshKeys` call.
- Keycloak's signing alg set to RS256 (default) or any of: RS512, ES256, ES384, EdDSA. HS256/HS384/HS512 + `none` are rejected by certctl's IdP-downgrade-attack defense at provider creation time.
@@ -19,11 +19,11 @@ If your IdP is something else (Okta, Auth0, Azure AD, Authentik, Google Workspac
- `CERTCTL_CONFIG_ENCRYPTION_KEY` set to a stable secret (production deployments only — the encryption-at-rest layer for the OIDC client_secret depends on it).
- An admin actor holding `auth.oidc.create` + `auth.oidc.edit` (held by `r-admin` by default; granted via `certctl_auth_assign_role_to_key` MCP tool or the GUI's Auth → Keys page).
- Bundle 2 server build ≥ v2.1.0 (or post-`5204f1b` master).
- Server build ≥ v2.1.0.
## IdP-side configuration
The same configuration you'll do by hand here is what the Phase 10 testcontainers fixture imports from `internal/auth/oidc/testfixtures/keycloak-realm.json` — read that file alongside this runbook to see the exact JSON shape Keycloak persists.
The same configuration you'll do by hand here is what the testcontainers fixture imports from `internal/auth/oidc/testfixtures/keycloak-realm.json` — read that file alongside this runbook to see the exact JSON shape Keycloak persists.
### 1. Create or pick a realm
@@ -194,7 +194,7 @@ Operator action when Keycloak rotates its realm signing key:
2. In certctl: GUI → **Auth → OIDC Providers → Keycloak → Refresh discovery cache** button. Or the CLI / MCP equivalent: `POST /api/v1/auth/oidc/providers/<id>/refresh`.
3. Run another login. The new ID token is signed under the new key; the certctl service validates it against the freshly-fetched JWKS doc.
The Phase 10 integration test `TestKeycloakIntegration_JWKSRotation_RefreshKeysPicksUpNewKey` exercises this exact flow end to end.
The Keycloak integration test `TestKeycloakIntegration_JWKSRotation_RefreshKeysPicksUpNewKey` exercises this exact flow end to end.
## Troubleshooting
@@ -214,7 +214,7 @@ The user authenticated successfully but their groups didn't match any configured
- The group-membership mapper is configured correctly (Clients → certctl → Client scopes → certctl-dedicated → mappers → groups → "Full group path: off" matters).
- The group name in your certctl mapping exactly matches what Keycloak emits — case-sensitive, no leading slash if "Full group path: off".
You can confirm what Keycloak is actually emitting by decoding the ID token at jwt.io against the Keycloak public key, or by enabling certctl's debug logging on the OIDC service for one login (logs are scrubbed of token contents per the Phase 3 token-leak hygiene contract; debug logs surface only the resolved group list and the mapping decision).
You can confirm what Keycloak is actually emitting by decoding the ID token at jwt.io against the Keycloak public key, or by enabling certctl's debug logging on the OIDC service for one login (logs are scrubbed of token contents per the OIDC service's token-leak hygiene contract; debug logs surface only the resolved group list and the mapping decision).
**"id_token verify failed: token used before issued"**
Clock skew between Keycloak and certctl-server. Either align both to NTP, or bump `iat_window_seconds` on the OIDC provider config (default 300 = 5 minutes). The certctl service caps `iat_window_seconds` at 600.
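To make the skew arithmetic concrete, here is a sketch of an issued-at check with the documented default (300) and cap (600). Two things are assumptions: that an over-cap setting is clamped instead of rejected, and that the window is applied only to timestamps in the future. The function name `iat_acceptable` is hypothetical.

```python
DEFAULT_IAT_WINDOW = 300  # seconds, the documented default
MAX_IAT_WINDOW = 600      # seconds, the documented cap


def iat_acceptable(iat: int, now: int, window: int = DEFAULT_IAT_WINDOW) -> bool:
    """True if iat is at most `window` seconds ahead of the verifier's clock.

    ASSUMPTION: a window above the cap is clamped to MAX_IAT_WINDOW.
    """
    return iat - now <= min(window, MAX_IAT_WINDOW)


if __name__ == "__main__":
    now = 1_700_000_000
    print(iat_acceptable(now + 200, now))              # within the default window
    print(iat_acceptable(now + 400, now))              # IdP clock too far ahead
    print(iat_acceptable(now + 400, now, window=600))  # accepted after a bump
```

Aligning both hosts to NTP remains the better fix; widening the window only tolerates the skew.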
@@ -226,7 +226,7 @@ The user clicked the OIDC login button, then the browser tab idled past the 10-m
Either the user double-submitted a callback URL (clicked it twice from email or browser history), or a CSRF attempt. The pre-login row is single-use; second consumption returns `ErrPreLoginNotFound`. Have them retry from the login page.
**Sessions revoked but the user can still hit the API.**
Check the Phase 4 session contract: the cookie is HMAC-validated on every request, but the actual database row is what `Revoke` deletes. If your reverse proxy is caching the response or the `certctl_session` cookie wasn't actually cleared on the client, the cookie will hit the server's session middleware which will return 401 on the missing-row lookup. The middleware never serves stale data; the issue is upstream of certctl in this case.
Check the session contract: the cookie is HMAC-validated on every request, but the actual database row is what `Revoke` deletes. If your reverse proxy is caching the response or the `__Host-certctl_session` cookie wasn't actually cleared on the client, the cookie will hit the server's session middleware which will return 401 on the missing-row lookup. The middleware never serves stale data; the issue is upstream of certctl in this case.
## Validation checklist
+1 -1
@@ -112,7 +112,7 @@ End-to-end login + audit + Sessions checks are identical to Keycloak.
**Okta-specific:** the audit row's `details.subject` will be Okta's user UID (a 20-char alphanumeric string starting with `00u`), stable across email changes. The certctl `users` table's `oidc_subject` column will hold this UID.
**Optional Okta smoke test in CI:** Phase 10 ships an opt-in smoke test at `internal/auth/oidc/integration_okta_smoke_test.go` (build tags `integration && okta_smoke`). Set `OKTA_ISSUER` + `OKTA_CLIENT_ID` + `OKTA_CLIENT_SECRET` env vars and run `make okta-smoke-test` to drive a discovery + RefreshKeys round-trip against your live tenant. Pre-reqs: enable the Resource Owner Password (ROPC) grant on the application (Sign-On tab → Grant types → Resource Owner Password) for the smoke test only; production certctl uses auth-code-with-PKCE.
**Optional Okta smoke test in CI:** certctl ships an opt-in smoke test at `internal/auth/oidc/integration_okta_smoke_test.go` (build tags `integration && okta_smoke`). Set `OKTA_ISSUER` + `OKTA_CLIENT_ID` + `OKTA_CLIENT_SECRET` env vars and run `make okta-smoke-test` to drive a discovery + RefreshKeys round-trip against your live tenant. Pre-reqs: enable the Resource Owner Password (ROPC) grant on the application (Sign-On tab → Grant types → Resource Owner Password) for the smoke test only; production certctl uses auth-code-with-PKCE.
**JWKS-rotation drill:** Okta auto-rotates signing keys every ~3 months and publishes the new key alongside the old in the JWKS doc for ~1 month overlap. Manual rotation: **Security → API → Authorization Servers → default → Keys → "Generate new key"**. After rotation, click "Refresh discovery cache" in certctl's GUI; new tokens validate immediately.
+18 -17
@@ -9,14 +9,14 @@
> [`security.md#demo-to-production-cutover-audit-2026-05-11-a-8`](security.md#demo-to-production-cutover-audit-2026-05-11-a-8).
This is the operator-facing reference for the role-based access
control primitive that ships with Bundle 1 (auth bundle 1) of certctl.
control primitive in certctl.
Read this if you're running certctl in production and need to grant /
revoke access to API keys, set up the auditor split, or onboard the
first admin.
For the threat model behind these controls, see
[`auth-threat-model.md`](auth-threat-model.md). For the migration
flow from a pre-Bundle-1 deployment, see
flow from a pre-RBAC (v2.0.x) deployment, see
[`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md).
## Mental model
@@ -69,7 +69,7 @@ giving them the keys to the kingdom. The
forward.
The five **admin-only fine-grained perms** seeded by migration
000030 (Phase 3.5 conversion) gate the high-blast-radius endpoints:
000030 gate the high-blast-radius endpoints:
- `cert.bulk_revoke` - `POST /api/v1/certificates/bulk-revoke` and the EST sibling
- `crl.admin` - `/api/v1/admin/crl/cache`
@@ -141,14 +141,14 @@ even if no scoped grant exists. The reverse is also true - a
scoped grant doesn't satisfy a request against a different scope.
The Authorizer's `CheckPermission` is the single point of truth.
> **Note (Bundle 1 deferral):** the `scope_id` column is not
> **Note (deferral):** the `scope_id` column is not
> currently FK-constrained against the resource tables. An
> operator can grant a permission at scope `profile`/`p-bogus`
> without `p-bogus` existing; the gate still works (no rows match
> at request time), but the API does not 404 the grant. Bundle 2
> tracks the strict-FK closure. See
> at request time), but the API does not 404 the grant. Strict-FK
> closure is tracked for a follow-on release. See
> `internal/repository/postgres/auth.go::AddPermission`'s
> `TODO(bundle-2)` comment.
> `TODO` comment.
## Granting + revoking access
@@ -194,7 +194,7 @@ certctl-cli auth keys scope-down --non-interactive ./scope-down.json
The mutating role-lifecycle commands (`certctl-cli auth roles
create / update / delete` + `roles add-permission / remove-permission`)
are tracked as Bundle 1 Phase 5.5 follow-up; today, manage custom
are tracked as a follow-on; today, manage custom
roles via the HTTP API or GUI.
### From the HTTP API
@@ -258,7 +258,7 @@ distinguish wide cleanups from targeted demotions in the access log.
### From the MCP server
Bundle 1 Phase 11 ships 12 RBAC tools:
The MCP server ships 12 RBAC tools:
`certctl_auth_me`, `certctl_auth_list_roles`, `certctl_auth_get_role`,
`certctl_auth_create_role`, `certctl_auth_update_role`,
`certctl_auth_delete_role`, `certctl_auth_list_permissions`,
@@ -296,7 +296,7 @@ To create an auditor key:
## Day-0 bootstrap (first-admin path)
Bundle 1 Phase 6 ships a one-shot bootstrap endpoint for fresh
certctl ships a one-shot bootstrap endpoint for fresh
deployments where no admin actor exists yet.
1. Set `CERTCTL_BOOTSTRAP_TOKEN=$(openssl rand -hex 32)` in the
@@ -321,9 +321,10 @@ deployments where no admin actor exists yet.
The token is constant-time-compared. The server logs a startup
warning if `CERTCTL_BOOTSTRAP_TOKEN` is set AND admin actors
already exist (config-drift signal). For OIDC-first-admin (the
"first user who signs in via SSO becomes admin" pattern), wait for
Bundle 2.
already exist (config-drift signal). For the OIDC-first-admin
path (the "first user who signs in via SSO becomes admin"
pattern), see
[`docs/migration/oidc-enable.md`](../migration/oidc-enable.md).
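For readers unfamiliar with constant-time comparison, the following sketch shows the idea using Python's standard library. It illustrates the technique only; the function names are hypothetical and certctl's own implementation is not shown here.

```python
import hmac
import secrets


def new_bootstrap_token() -> str:
    """64 hex characters of CSPRNG output, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def token_matches(presented: str, expected: str) -> bool:
    """Compare without leaking the position of the first mismatching byte."""
    return hmac.compare_digest(presented.encode(), expected.encode())


if __name__ == "__main__":
    expected = new_bootstrap_token()
    print(token_matches(expected, expected))
    print(token_matches("0" * 64, expected))
```

A plain `==` can return as soon as one byte differs, so response timing may hint at how much of a guess was correct; `hmac.compare_digest` is designed to avoid that.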
## Demo mode (`CERTCTL_AUTH_TYPE=none`)
@@ -344,11 +345,11 @@ example folders only.
- [Threat model](auth-threat-model.md) - what attacks this primitive
defends against and which it does not
- [Migration guide](../migration/api-keys-to-rbac.md) - moving
pre-Bundle-1 deployments onto RBAC
pre-RBAC (v2.0.x) deployments onto RBAC
- [Profiles](../reference/profiles.md) - the `RequiresApproval=true`
flow that Bundle 1 Phase 9 closure protects from flip-flop
- [Approval workflow](approval-workflow.md) - the Rank 7 Infisical
deep-research deliverable that the Phase 9 closure piggybacks on
flow with the flip-flop-bypass closure
- [Approval workflow](approval-workflow.md) - the two-person
integrity primitive backing `RequiresApproval`
- `internal/auth/` - the middleware + keystore + RequirePermission
- `internal/service/auth/` - the service-layer Authorizer
- `cowork/auth-bundle-1-prompt.md` - the design + phase plan
+5 -6
@@ -2,12 +2,11 @@
> Last reviewed: 2026-05-05
> **Status (this document):** Production hardening II Phase 10
> deliverable. Codifies the fail-safe behaviors that already exist in
> the codebase and the operator procedures for recovering from
> common failure modes. Nothing in this runbook requires new code —
> if a procedure here doesn't work as documented, that's a bug in
> docs (file an issue).
> **Status (this document):** Operator runbook codifying the
> fail-safe behaviors that already exist in the codebase and the
> procedures for recovering from common failure modes. Nothing in
> this runbook requires new code — if a procedure here doesn't work
> as documented, that's a bug in docs (file an issue).
This runbook is the on-call deliverable: it tells reviewers and
on-call operators what to do when a piece of certctl's state
+48 -53
@@ -9,16 +9,15 @@ any).
## OCSP responder availability
**Audit reference:** Bundle C / M-020. CWE-770 (uncontrolled resource
consumption); RFC 6960 (OCSP); RFC 7633 (Must-Staple).
**Audit reference:** CWE-770 (uncontrolled resource consumption); RFC
6960 (OCSP); RFC 7633 (Must-Staple).
certctl ships an OCSP responder at `/.well-known/pki/ocsp/{issuer_id}/{serial}`
that signs a fresh response per request. Pre-Bundle-C the unauth handler
chain had no rate limit, so an attacker could DoS the responder and force
fail-open relying parties to accept revoked certificates as valid. Bundle C
adds the same per-key rate limiter to the unauth chain that the authenticated
chain has used since Bundle B. Per-IP keying applies because OCSP traffic is
unauthenticated.
that signs a fresh response per request. The unauth handler chain
applies the same per-key rate limiter the authenticated chain uses;
per-IP keying applies because OCSP traffic is unauthenticated. Without
this defense an attacker could DoS the responder and force fail-open
relying parties to accept revoked certificates as valid.
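The per-key limiting idea can be sketched as a sliding-window counter keyed by client IP. The algorithm certctl actually uses is not stated in this document, so the class below is a generic illustration with hypothetical names; the 5-per-60-seconds figure in the demo is arbitrary.

```python
import time
from collections import defaultdict, deque


class PerKeyRateLimiter:
    """Allow at most `limit` requests per key within any `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self._clock()
        hits = self._hits[key]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerKeyRateLimiter(limit=5, window=60.0)
    # For unauthenticated traffic the key is the client IP.
    print([limiter.allow("203.0.113.7") for _ in range(6)])
    print(limiter.allow("198.51.100.9"))  # a different IP has its own budget
```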
The rate limiter alone does not solve the underlying revocation-bypass risk.
**The architectural fix is for issued certificates to carry the OCSP
@@ -59,11 +58,11 @@ For certificates issued to systems where revocation correctness matters:
## Postgres transport encryption
See [docs/database-tls.md](database-tls.md). Bundle B / M-018.
See [docs/database-tls.md](database-tls.md).
## Encryption at rest
Bundle B / M-001. PBKDF2-SHA256 at 600,000 rounds (OWASP 2024 Password
PBKDF2-SHA256 at 600,000 rounds (OWASP 2024 Password
Storage Cheat Sheet floor) for the operator-supplied passphrase that
derives the AES-256-GCM key for sensitive config columns. v3 blob format
with a per-ciphertext random salt; v1/v2 read fallback for legacy rows.
@@ -72,13 +71,13 @@ the accompanying tests for the format spec.
## Authentication surface
Bundle B / M-002. Two layers decide auth-exempt status:
Two layers decide auth-exempt status:
1. **Router layer:** `internal/api/router/router.go::AuthExemptRouterRoutes`
- the endpoints registered via direct `r.mux.Handle` without going
through the middleware chain (`/health`, `/ready`, `/api/v1/auth/info`,
`/api/v1/version`, plus `/api/v1/auth/bootstrap` GET + POST per
Bundle 1 Phase 6).
`/api/v1/version`, plus `/api/v1/auth/bootstrap` GET + POST for the
first-admin path).
2. **Dispatch layer:** `internal/api/router/router.go::AuthExemptDispatchPrefixes`
- URL-prefix routing in `cmd/server/main.go::buildFinalHandler` for
`/.well-known/pki/*`, `/.well-known/est/*`, `/.well-known/est-mtls`,
@@ -87,26 +86,25 @@ Bundle B / M-002. Two layers decide auth-exempt status:
Both lists have AST-walking regression tests (`auth_exempt_test.go`) that
fail CI if a new bypass lands without updating the documented constant.
### RBAC primitive (Bundle 1)
### Role-based authorization
Bundle 1 ships role-based authorization on top of API-key
authentication. Every gated handler routes through the
`auth.RequirePermission` middleware (or its router-level wrap
`rbacGate`); the middleware resolves the actor's effective
permissions via the service-layer `Authorizer.CheckPermission`
and returns HTTP 403 BEFORE the handler body runs on miss. The
seven default roles (`admin` / `operator` / `viewer` / `agent` /
`mcp` / `cli` / `auditor`), 33-permission canonical catalogue,
and the auditor split (`r-auditor` holds only `audit.read` +
`audit.export`) are seeded by migration 000029.
Role-based authorization runs on top of API-key authentication. Every
gated handler routes through the `auth.RequirePermission` middleware
(or its router-level wrap `rbacGate`); the middleware resolves the
actor's effective permissions via the service-layer
`Authorizer.CheckPermission` and returns HTTP 403 BEFORE the handler
body runs on miss. The seven default roles (`admin` / `operator` /
`viewer` / `agent` / `mcp` / `cli` / `auditor`), 33-permission
canonical catalogue, and the auditor split (`r-auditor` holds only
`audit.read` + `audit.export`) are seeded by migration 000029.
For the operator how-to, see [`rbac.md`](rbac.md). For the
threat model + compliance mapping, see
[`auth-threat-model.md`](auth-threat-model.md). For the upgrade
flow from a pre-Bundle-1 deployment, see
flow from an API-key-only deployment, see
[`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md).
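The gate-before-handler pattern can be sketched as a decorator. This is an illustrative model, not certctl's code: the `GRANTS` table, `check_permission`, and `require_permission` are hypothetical stand-ins, and the `r-admin` row lists only a subset of permissions for the demo.

```python
from functools import wraps

# Hypothetical grant table; the auditor row mirrors the documented split.
GRANTS = {
    "r-auditor": {"audit.read", "audit.export"},
    "r-admin": {"audit.read", "audit.export", "cert.bulk_revoke"},
}


def check_permission(role, permission):
    """Stand-in for the authorizer: is permission granted to role?"""
    return permission in GRANTS.get(role, set())


def require_permission(permission):
    """Gate a handler: return 403 before its body runs when the check misses."""
    def decorate(handler):
        @wraps(handler)
        def gated(role, *args, **kwargs):
            if not check_permission(role, permission):
                return 403, "forbidden"
            return handler(role, *args, **kwargs)
        return gated
    return decorate


@require_permission("cert.bulk_revoke")
def bulk_revoke(role, serials):
    return 200, f"revoked {len(serials)} certificates"


if __name__ == "__main__":
    print(bulk_revoke("r-auditor", ["01", "02"]))
    print(bulk_revoke("r-admin", ["01", "02"]))
```

Because the check wraps the handler, a denied request cannot trigger any side effect in the handler body.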
### Day-0 admin bootstrap (Bundle 1 Phase 6)
### Day-0 admin bootstrap
Fresh deployments where no admin actor exists yet can mint the
first admin via `POST /api/v1/auth/bootstrap` - set
@@ -119,24 +117,25 @@ into the HTTP response body. See
[`rbac.md`](rbac.md#day-0-bootstrap-first-admin-path) for the
full flow.
### Approval-bypass closure (Bundle 1 Phase 9)
### Approval-bypass closure
`CertificateProfile.RequiresApproval=true` profiles route both
issuance/renewal AND profile edits through the
`ApprovalService` two-person integrity gate (Phase 9 closes the
flip-flop loophole where an admin could disable approval, mutate,
re-enable). Same-actor self-approve is rejected at the service
layer with `ErrApproveBySameActor`. See
`ApprovalService` two-person integrity gate. The flip-flop loophole
(an admin disabling approval, mutating, re-enabling) is closed by
gating profile-edit through the same approval flow. Same-actor
self-approve is rejected at the service layer with
`ErrApproveBySameActor`. See
[`docs/reference/profiles.md`](../reference/profiles.md) for the
full gate semantics.
### OIDC federation (Bundle 2 Phases 1-7)
### OIDC federation
Bundle 2 adds OIDC SSO on top of the API-key + RBAC foundation.
Operators configure one or more identity providers (Keycloak,
Authentik, Okta, Auth0, Entra ID, or Google Workspace via Keycloak
broker); end users sign in at the IdP, certctl validates the
returned ID token, and a session cookie is minted.
OIDC SSO runs on top of the API-key + RBAC foundation. Operators
configure one or more identity providers (Keycloak, Authentik, Okta,
Auth0, Entra ID, or Google Workspace via Keycloak broker); end users
sign in at the IdP, certctl validates the returned ID token, and a
session cookie is minted.
The token-validation pipeline pins:
@@ -151,9 +150,9 @@ The token-validation pipeline pins:
- Exact `iss` match (`ErrIssuerMismatch`).
- `aud` membership + `azp` for multi-aud tokens (per OIDC core
§3.1.3.7 step 5).
- `at_hash` REQUIRED-when-access_token-present (Phase 3 tightening
of the spec MAY → MUST so a substituted access token cannot
ride alongside a clean ID token).
- `at_hash` REQUIRED-when-access_token-present (a tightening of the
spec MAY → MUST so a substituted access token cannot ride alongside
a clean ID token).
- Single-use state + nonce (32-byte random server-generated;
atomic `DELETE...RETURNING` on consume).
- PKCE-S256 mandatory; `plain` rejected.
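
Two of the checks listed above reduce to short hash computations, sketched here in Go under stated assumptions. The function names are hypothetical, and `atHashSHA256` assumes the ID token is signed with a SHA-256 based algorithm (for example RS256); other algorithms use a different hash. This is a sketch of the standard computations from RFC 7636 and OIDC Core, not certctl's own code.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// pkceChallengeS256 derives the S256 code_challenge from a code_verifier:
// base64url-no-pad(SHA-256(verifier)), per RFC 7636 section 4.2.
func pkceChallengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// atHashSHA256 computes at_hash for a SHA-256 based signing algorithm:
// the left half of SHA-256(access_token), base64url without padding.
func atHashSHA256(accessToken string) string {
	sum := sha256.Sum256([]byte(accessToken))
	return base64.RawURLEncoding.EncodeToString(sum[:len(sum)/2])
}

// checkAtHash applies the strict rule: when an access token is present,
// the at_hash claim must be present and must match.
func checkAtHash(accessToken, claimed string) bool {
	if accessToken == "" {
		return true // nothing to bind
	}
	want := atHashSHA256(accessToken)
	return subtle.ConstantTimeCompare([]byte(want), []byte(claimed)) == 1
}

func main() {
	// Verifier taken from the RFC 7636 Appendix B example.
	fmt.Println(pkceChallengeS256("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"))
	fmt.Println(checkAtHash("example-access-token", "")) // missing claim is rejected
}
```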
@@ -175,7 +174,7 @@ Per-IdP setup guides at
[`oidc-runbooks/index.md`](oidc-runbooks/index.md) cover Keycloak,
Authentik, Okta, Auth0, Entra ID, and Google Workspace.
### Sessions + back-channel logout (Bundle 2 Phases 4-6)
### Sessions + back-channel logout
Successful OIDC login mints a session cookie:
`v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`.
@@ -220,9 +219,9 @@ For threat-model coverage of these surfaces, see
operator-runnable performance baselines, see
[`auth-benchmarks.md`](auth-benchmarks.md).
### OIDC first-admin bootstrap (Bundle 2 Phase 7)
### OIDC first-admin bootstrap
Coexists with Bundle 1's env-var-token bootstrap. When the
Coexists with the env-var-token bootstrap path. When the
operator sets `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` + (optionally)
`CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID`, the first user with one of
those IdP groups becomes admin on first login per tenant.
@@ -232,7 +231,7 @@ once any actor holds `r-admin`, the OIDC bootstrap hook silently
falls through to normal mapping. Audit row on every grant
(`bootstrap.oidc_first_admin`, `event_category=auth`).
### Break-glass admin (Bundle 2 Phase 7.5)
### Break-glass admin
Default-OFF (`CERTCTL_BREAKGLASS_ENABLED=false`). When enabled,
the local-password admin path bypasses OIDC + group-claim layers;
@@ -319,8 +318,8 @@ Operator workflow at production cutover:
### Migrating an existing deployment to OIDC
A Bundle-1-merged deployment that wants to add OIDC follows the
step-by-step at
An existing API-key-only deployment that wants to add OIDC follows
the step-by-step at
[`docs/migration/oidc-enable.md`](../migration/oidc-enable.md):
configure `CERTCTL_CONFIG_ENCRYPTION_KEY`, pick + configure an IdP
per the relevant runbook, configure the certctl-side OIDCProvider
@@ -330,7 +329,7 @@ organization.
## Per-user rate limiting
Bundle B / M-025. Authenticated callers are bucketed by API-key name;
Authenticated callers are bucketed by API-key name;
unauthenticated callers (probes, OCSP relying parties, EST/SCEP enrollees)
are bucketed by source IP. `RPS` and `BurstSize` are per-key budgets.
`PerUserRPS` / `PerUserBurstSize` give authenticated clients a separate
@@ -345,11 +344,7 @@ certctl's API keys are configured via the `CERTCTL_API_KEYS_NAMED` env var
in-memory list. There is no DB-resident key store, no GUI, no `/api/v1/keys`
endpoint - the env var IS the key inventory.
Pre-Bundle-G the env var rejected duplicate names, so rotating a key
required: stop accepting OLDKEY → restart → roll NEWKEY out. Any client
polling against OLDKEY during the restart window hit a 401.
Bundle G adds a **double-key rotation window**: two entries can share a
The env var supports a **double-key rotation window**: two entries can share a
name during the rollover, and both keys validate. Operators run the
rotation as:
@@ -395,7 +390,7 @@ the end of step 4, extend the window before step 5.
startup** (privilege escalation guard).
- Two entries with the same `(name, key)` pair: **rejected at startup**
(typo guard - rotation requires DIFFERENT keys under the same name).
- Single-entry steady state: unchanged from pre-Bundle-G behavior.
- Single-entry steady state: unchanged; a name with one entry validates exactly that key.
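
The rotation window and the duplicate-entry rule above can be sketched in Go as follows. The `namedKey` struct and both functions are hypothetical: the sketch does not parse `CERTCTL_API_KEYS_NAMED` (its wire format is not shown here), and it models only the exact `(name, key)` duplicate check, not the privilege-escalation guard.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// namedKey is a hypothetical in-memory form of one configured entry.
type namedKey struct{ Name, Key string }

// validateKeys rejects exact (name, key) duplicates but allows one name to
// appear with different keys, which is what opens the rotation window.
func validateKeys(entries []namedKey) error {
	seen := make(map[namedKey]bool)
	for _, e := range entries {
		if seen[e] {
			return errors.New("duplicate (name, key) entry for " + e.Name)
		}
		seen[e] = true
	}
	return nil
}

// lookup returns the name bound to a presented key. Every entry is a
// candidate, so the old and the new key both validate during the window.
func lookup(entries []namedKey, presented string) (string, bool) {
	for _, e := range entries {
		if subtle.ConstantTimeCompare([]byte(e.Key), []byte(presented)) == 1 {
			return e.Name, true
		}
	}
	return "", false
}

func main() {
	window := []namedKey{{"ci", "OLDKEY"}, {"ci", "NEWKEY"}}
	fmt.Println(validateKeys(window))
	fmt.Println(lookup(window, "OLDKEY"))
	fmt.Println(lookup(window, "NEWKEY"))
}
```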
### What the contract does NOT do