diff --git a/CHANGELOG.md b/CHANGELOG.md index f13b0f2..6c2f96f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,6 @@ # Changelog -## v2.1.0 - Auth Bundle 1: RBAC primitive ⚠️ +## v2.1.0 - Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions ⚠️ > **SECURITY: AUDIT YOUR API KEYS.** > @@ -87,15 +87,168 @@ What else changed in v2.1.0: `phase12_protocol_allowlist_test.go` AST scan all guard against accidentally wrapping ACME / SCEP / EST / OCSP / CRL routes in `rbacGate`. -- **Bundle 2 (OIDC + sessions) starts after Bundle 1 lands on - master.** Roadmap entry remains in `cowork/auth-bundle-2-prompt.md`. +- **Bundle 2: OIDC + sessions + back-channel logout + break-glass.** + Auth Bundle 2 ships in the same v2.1.0 release. Operators get OIDC + SSO support for Keycloak / Authentik / Okta / Auth0 / Microsoft + Entra ID / Google Workspace (via Keycloak broker), HMAC-signed + session cookies with idle/absolute timeouts + CSRF defense, + back-channel logout per OpenID Connect Back-Channel Logout 1.0, + and a default-OFF break-glass admin path with Argon2id passwords + for SSO-broken incidents. API-key auth keeps working unchanged + alongside; existing automation needs no changes. Migration walkthrough + at [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md); + per-IdP setup guides at + [`docs/operator/oidc-runbooks/index.md`](docs/operator/oidc-runbooks/index.md). 
+- **OIDC token validation pinned at three layers.** Algorithm + allow-list (RS256/RS512/ES256/ES384/EdDSA only) with HS-family + `none` + rejected at the service-layer sentinel; IdP-downgrade-attack defense + at provider creation AND every JWKS RefreshKeys (intersects the IdP's + advertised `id_token_signing_alg_values_supported` against the allow- + list, rejects providers that advertise weak algs even before any + token is signed); OIDC Core §3.1.3.7 re-verification of `iss` / + `aud` / `azp` / `at_hash` (REQUIRED-when-access_token-present per + Phase 3 tightening of the spec MAY → MUST) / `exp` / `iat` window + / `nonce` constant-time-compare. PKCE-S256 mandatory; `plain` + rejected. Single-use state + nonce via atomic `DELETE...RETURNING` + on consume. +- **Session cookies use length-prefixed HMAC.** The cookie wire format + is `v1...` + with HMAC input `len:sid:len:kid` (NOT bare-concat) to defeat + concatenation collisions. `HttpOnly` + `Secure` + `SameSite=Lax` + default; `SameSite=Strict` configurable via `CERTCTL_SESSION_SAMESITE`. + Idle timeout 1h / absolute 8h defaults; scheduler GC sweeps expired + rows hourly. Signing keys rotate via the new `RotateSigningKey` + primitive; the old key stays valid for `CERTCTL_SESSION_SIGNING_KEY_RETENTION` + (default 24h) so existing cookies validate during rollover. +- **CSRF defense via double-submit-cookie + hashed-token-on-row.** + Plaintext CSRF token in the JS-readable `certctl_csrf` cookie + (intentionally `HttpOnly=false` for the GUI to echo into the + `X-CSRF-Token` header); SHA-256 hash on the session row; + `subtle.ConstantTimeCompare` in the new `CSRFMiddleware`. API-key + actors are CSRF-exempt (no session row in context). +- **OIDC `client_secret` encrypted at rest.** AES-256-GCM v3 blob + format (magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using + the existing `CERTCTL_CONFIG_ENCRYPTION_KEY`. 
Encryption invariant + pinned by an integration test asserting ciphertext != plaintext + + v3 blob shape + round-trip recovery + wrong-passphrase fails. +- **OIDC first-admin bootstrap.** New `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` + + `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` env vars: the first + OIDC-authenticated user with a matching group claim becomes admin + per tenant. Coexists with the Bundle 1 env-var-token bootstrap; + the admin-existence probe ensures only one wins. Audit row + (`bootstrap.oidc_first_admin`) on every grant. +- **Break-glass admin (default-OFF).** New `CERTCTL_BREAKGLASS_ENABLED` + env var (default `false`). When enabled, the local Argon2id-password + admin path bypasses OIDC + group-claim layers — intended ONLY for + SSO-broken incidents. Argon2id with OWASP 2024 params (m=64 MiB, + t=3, p=4); lockout after 5 failures (configurable); constant-time + across all failure paths via `verifyDummy`; surface invisibility + (HTTP 404 on every endpoint when disabled, NOT 403). WARN log at + server boot when enabled. WebAuthn/FIDO2 second factor pairing on + the v3 roadmap (Decision 12). +- **GUI: OIDC Providers + Group → Role Mappings + Sessions + login + buttons.** Four new pages under `/auth/*` consume the Bundle 2 API + surface. Login page renders one "Sign in with X" button per + configured OIDC provider (in addition to the API-key form, which + remains as a fallback for Bearer-mode + break-glass paths). Sessions + page exposes own-sessions + admin all-actors view. Every actionable + element is permission-gated server-side via `auth.oidc.*` and + `auth.session.*` perms; client-side hide is UX layer. Logout button + in the sidebar fires `POST /auth/logout` to clear the session + server-side before redirecting to login. 
+- **MCP server gains 11 OIDC + session tools.** `certctl_auth_list_oidc_providers`, + `_get_oidc_provider`, `_create_oidc_provider`, `_update_oidc_provider`, + `_delete_oidc_provider`, `_refresh_oidc_provider`, + `_list_group_mappings`, `_add_group_mapping`, `_remove_group_mapping`, + `_list_sessions`, `_revoke_session`. Operator-facing MCP tool count + goes 12 (Bundle 1 RBAC) → 23 across the auth surface. Total MCP + tool count: `grep -cE 'mcp\.AddTool\(' internal/mcp/tools*.go` ≈ 150. +- **Per-IdP runbooks: 6 production-tier setup guides** at + `docs/operator/oidc-runbooks/`. Each runbook follows a consistent + five-section layout (Prerequisites / IdP-side config / certctl-side + config / Verification / Troubleshooting + Validation checklist with + operator sign-off line). Keycloak is the canonical reference; + Authentik / Okta / Auth0 / Entra ID / Google Workspace document the + IdP-specific deltas (Auth0's namespaced custom claims; Entra ID's + group OBJECT IDs; Google Workspace's missing-groups-claim limitation + + the recommended Keycloak broker pattern). +- **Threat model extended.** [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md) + ships 5 new "Defenses Bundle 2 ships" subsections + 8 new threat- + catalogue subsections (OIDC token forgery / session hijacking / IdP + compromise / back-channel logout failure modes / group-claim + manipulation / bootstrap risks / break-glass risks / token-leak + hygiene). 6 new SQL-shaped operator-facing checks. New "Threats + Bundle 2 does NOT close" section enumerating the 8 v3-backlog items + (WebAuthn / JIT elevation / SAML / multi-tenant activation / + HSM-FIPS / OIDC RP-initiated logout / Playwright / per-IdP + external-tester sign-off). 
+- **Performance baselines documented.** [`docs/operator/auth-benchmarks.md`](docs/operator/auth-benchmarks.md) + ships four benchmarks with measured baselines on a 4 vCPU / + 8 GiB / Postgres 16 / Go 1.25 floor: `BenchmarkSession_SteadyState` + p99 5 µs (target < 1 ms; 200× under), `BenchmarkSession_ColdProcess` + p99 7.1 ms (target < 10 ms), `BenchmarkOIDC_SteadyState` p99 1.5 ms + (target < 5 ms), `BenchmarkOIDC_ColdCache` operator-runs against + live Keycloak via `make benchmark-auth-coldcache`. +- **Standards + RFC implementation table.** [`docs/reference/auth-standards-implemented.md`](docs/reference/auth-standards-implemented.md) + ships 13 RFC / standard rows + 14 CWE rows with concrete file paths + + negative-test anchors per row. NOT a compliance-mapping doc per + the operator's 2026-05-05 retired-compliance-docs decision; the + doc explicitly says "build the framework mapping yourself against + the rows here using the framework-mapping methodology your audit + firm prescribes; this project does not own that mapping." +- **Coverage gates held at floor 90 across all four Bundle 2 + packages.** `internal/auth/oidc/` 93.7%, `internal/auth/session/` + 94.9%, `internal/auth/breakglass/` 91.5%, `internal/auth/user/domain/` + 96.4%. NO held-low-with-rationale entry — the Phase 13 prompt's + anti-Bundle-1-mistake rule held. Bundle 1's existing 85% floors + for `internal/auth/` + `internal/service/auth/` stay 85 + (already-shipped-and-accepted) per the prompt's explicit + inheritance rule. +- **Multi-tenant query CI guard.** New `scripts/ci-guards/multi-tenant-query-coverage.sh` + (ratchet-style, baseline 32 at v2.1.0 close): greps every + SELECT/UPDATE/DELETE in `internal/repository/postgres/` against + 10 tenant-aware tables, fails on regression OR improvement (forces + the operator to lift / lower the baseline visibly). 
Forward-compat + protection so a future Bundle 3 / managed-service multi-tenant + activation can flip the switch without finding silent + tenant-data-leak bugs in shipped queries. +- **Phase 10 Keycloak testcontainers integration test.** New build-tag- + gated suite at `internal/auth/oidc/testfixtures/` + `integration_keycloak_test.go` + drives the full OIDC flow against a live Keycloak container booted + by testcontainers-go. 5-test matrix: discovery + JWKS load, full + PKCE auth-code happy path with HTTP form scraping, logout-revokes- + session, JWKS rotation, unmapped-groups-fails-closed. Reuses one + container across the matrix to amortize the 60-90s boot. Optional + Okta smoke test (build-tagged `integration && okta_smoke`) for live + tenant validation. New Makefile targets: `make keycloak-integration-test` + + `make okta-smoke-test` + `make benchmark-auth-coldcache`. +- **OpenAPI surface extended.** New `cookieAuth` security scheme + (apiKey/cookie/`certctl_session`) alongside the existing + `bearerAuth`. 13 new Bundle 2 endpoints across the OIDC + session + + group-mapping CRUD surface; 4 break-glass endpoints with + surface-invisibility framing. The N-bundle-2-security-empty-preserved + CI guard locks the `security: []` opt-out count at ≥ 14 so existing + public endpoints stay public. +- **Bundle-1-only compat regression CI guard.** New + `scripts/ci-guards/bundle-1-compat-regression.sh` asserts the + load-bearing invariants that protect the Bundle-1-only-deploy + case (session middleware defers-to-next, CSRF passthrough on + missing session row, ChainAuthSessionThenBearer wired, public + OIDC routes in AuthExempt allowlist, AuthInfo guards on + OIDCProvidersResolver != nil). 
Sibling + `bundle-1-to-2-upgrade-regression.sh` asserts the upgrade-path + invariants (migrations 000034..000038 are CREATE TABLE IF NOT EXISTS + + BEGIN/COMMIT-wrapped + no DROP TABLE / ALTER...DROP COLUMN + against 19 protected Bundle-1 tables + ON CONFLICT DO NOTHING on + permission seed). Migration ordering, idempotency, and downgrade are documented in -[`docs/migration/api-keys-to-rbac.md`](docs/migration/api-keys-to-rbac.md). -The threat model + compliance mapping live at +[`docs/migration/api-keys-to-rbac.md`](docs/migration/api-keys-to-rbac.md) +(API-key → RBAC, Bundle 1) and [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md) +(API-key → OIDC, Bundle 2). The threat model lives at [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md). -Day-2 RBAC operations live at -[`docs/operator/rbac.md`](docs/operator/rbac.md). +Day-2 RBAC operations live at [`docs/operator/rbac.md`](docs/operator/rbac.md). +RFC + CWE evidence at [`docs/reference/auth-standards-implemented.md`](docs/reference/auth-standards-implemented.md). 
## v2.0.68 - Image registry path changed ⚠️ diff --git a/docs/README.md b/docs/README.md index 5389232..6190da7 100644 --- a/docs/README.md +++ b/docs/README.md @@ -97,6 +97,7 @@ You're moving from another cert-management tool to certctl, or running both in p | cert-manager ACME (point cert-manager at certctl) | [migration/acme-from-cert-manager.md](migration/acme-from-cert-manager.md) | | Traefik ACME (point Traefik at certctl) | [migration/acme-from-traefik.md](migration/acme-from-traefik.md) | | **API keys → RBAC (v2.0.x → v2.1.0)** | [migration/api-keys-to-rbac.md](migration/api-keys-to-rbac.md) — **AUDIT YOUR API KEYS** post-upgrade | +| **Enable OIDC SSO on a Bundle-1-merged deployment** | [migration/oidc-enable.md](migration/oidc-enable.md) — step-by-step Bundle 2 OIDC onboarding | ## Contributor diff --git a/docs/migration/oidc-enable.md b/docs/migration/oidc-enable.md new file mode 100644 index 0000000..18ba846 --- /dev/null +++ b/docs/migration/oidc-enable.md @@ -0,0 +1,245 @@ +# Enable OIDC SSO on a Bundle-1-merged deployment + +> Last reviewed: 2026-05-10 + +This guide walks an operator already running certctl with Bundle 1 (RBAC primitive on top of API-key auth) through enabling OIDC SSO from Bundle 2. The path is additive: API-key auth keeps working unchanged; OIDC sits alongside as a second authentication surface for human users. + +If you are upgrading from a pre-Bundle-1 deployment, finish [`api-keys-to-rbac.md`](api-keys-to-rbac.md) first. If you have not deployed certctl at all, start with [`getting-started/quickstart.md`](../getting-started/quickstart.md). For the canonical mental model + per-flow threat coverage, see [`security.md`](../operator/security.md) and [`auth-threat-model.md`](../operator/auth-threat-model.md). + +## What "enable OIDC" gives you + +After this migration: + +- Human operators can log in via the OIDC button on the certctl login page (one button per configured IdP). 
+- The IdP authenticates the user; certctl validates the returned ID token, mints a session cookie, and redirects to the dashboard. +- IdP groups → certctl roles are operator-configured (e.g. `engineering@example.com` → `r-operator`). +- Every login emits an audit row (`auth.oidc_login_succeeded`) attributing the action to the federated user, NOT to a shared API key. +- The first user from a configured admin group (when `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` is set) becomes admin per tenant; one-shot per the admin-existence probe. + +What does NOT change: + +- API keys keep working. Existing automation continues to authenticate via `Authorization: Bearer` exactly as before. +- The break-glass admin path (Phase 7.5) stays default-OFF. +- The auditor split + approval workflow + RBAC primitive are unchanged. + +## Pre-requisites + +**On certctl side:** + +- Server build ≥ v2.1.0 (the post-Bundle-2 master). Confirm via `curl https://:8443/api/v1/version`. +- `CERTCTL_CONFIG_ENCRYPTION_KEY` set in the server environment. This is the passphrase that encrypts the OIDC `client_secret` at rest. Use a stable, secrets-manager-stored value at least 32 random bytes long. **The server refuses to start if the key is missing AND any source='database' rows already exist** (per Bundle B / M-001 / CWE-311 closure). Set this before doing anything else. +- An admin actor available to drive the configuration. The actor needs the `auth.oidc.create` + `auth.oidc.edit` permissions; `r-admin` carries both by default. Get one via the day-0 bootstrap path if you don't have one yet. +- HTTPS-only control plane (post-v2.2 milestone — this is the default). The OIDC redirect URI MUST be `https://`. + +**On IdP side:** + +- A Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace tenant where you can register an OIDC application. Free dev tiers work for evaluation. See the per-IdP runbook at [`oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md). 
+- Network reachability from certctl-server to the IdP's `/.well-known/openid-configuration` discovery endpoint. The certctl service fetches discovery + JWKS at provider creation and at every `RefreshKeys` call. + +## Step-by-step + +### 1. Pin `CERTCTL_CONFIG_ENCRYPTION_KEY` + +If your deployment already has it set (the Bundle B M-001 fail-closed gate enforces this for any source='database' issuer/target row), skip this step. If you don't: + +```bash +# Generate a 32-byte random key + base64-encode it. +openssl rand -base64 32 > /etc/certctl/config-encryption-key +chmod 600 /etc/certctl/config-encryption-key +``` + +Then make the server consume it at boot: + +```bash +# In your environment, systemd unit, k8s Secret, etc. +export CERTCTL_CONFIG_ENCRYPTION_KEY="$(cat /etc/certctl/config-encryption-key)" +``` + +Restart the server. Confirm the boot log does NOT show the `ErrEncryptionKeyRequired` warning. If it does, the server refuses to start because there's pre-existing source='database' material that needs to be re-sealed; see the pre-Bundle-B migration notes for re-encryption flow. + +### 2. Pick an IdP runbook + complete the IdP-side configuration + +Pick the runbook for your IdP and do EVERYTHING in its IdP-side section. The runbooks are at [`docs/operator/oidc-runbooks/`](../operator/oidc-runbooks/index.md). What you need from the runbook before continuing here: + +- The IdP's discovery URL (the `iss` value certctl will validate against). +- An OIDC client ID + client secret. Save the secret; you'll paste it into certctl in step 3. +- At least one IdP group with the users who should be allowed to log in. The runbook walks the group-claim mapper config. +- The IdP-side group claim shape — most IdPs emit `string-array` under a `groups` key, but Auth0 uses namespaced URL keys (`https://your-namespace/groups`) and Entra ID emits group OBJECT IDs (GUIDs) instead of names. The runbook calls out the per-IdP shape. + +### 3. 
Configure the certctl-side OIDC provider + +Via the GUI (recommended for first-time setup): + +1. Sign in as an admin actor. +2. Navigate to **Auth → OIDC Providers** in the sidebar. +3. Click **Configure provider**. +4. Fill in the form using the values from step 2's runbook. +5. Click **Save**. + +If the discovery doc fetch fails, the modal surfaces the error inline. The most common cause is a typo in the issuer URL. + +Or via the API: + +```bash +curl -X POST https://:8443/api/v1/auth/oidc/providers \ + -H "Authorization: Bearer ${CERTCTL_API_KEY}" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "Keycloak", + "issuer_url": "https://keycloak.example.com/realms/certctl", + "client_id": "certctl", + "client_secret": "", + "redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback", + "groups_claim_path": "groups", + "groups_claim_format": "string-array", + "scopes": ["openid", "profile", "email"], + "iat_window_seconds": 300, + "jwks_cache_ttl_seconds": 3600 + }' +``` + +The MCP equivalent (`certctl_auth_create_oidc_provider`) accepts the same JSON shape. + +### 4. Add the group → role mappings + +An empty mapping list means nobody can log in via this provider (the fail-closed contract; pinned by `ErrGroupsUnmapped`). Add at least one mapping BEFORE announcing the SSO endpoint to users. + +Via the GUI: **Auth → OIDC Providers → → Group → role mappings → Add**. + +Via the API: + +```bash +curl -X POST https://:8443/api/v1/auth/oidc/group-mappings \ + -H "Authorization: Bearer ${CERTCTL_API_KEY}" \ + -H "Content-Type: application/json" \ + -d '{ + "provider_id": "", + "group_name": "engineering@example.com", + "role_id": "r-operator" + }' +``` + +A typical setup adds two or three mappings: `engineers → r-operator`, `viewers → r-viewer`, optionally `admins → r-admin`. For Entra ID, use group object IDs (GUIDs), NOT names; for Auth0, use the bare group name from inside the namespaced claim array. + +### 5. 
(Optional) Configure first-admin bootstrap + +If your deployment has no admin actor yet AND you want the first OIDC-authenticated user from a specific group to become admin (instead of using the env-var-token bootstrap path), set: + +```bash +export CERTCTL_BOOTSTRAP_ADMIN_GROUPS=admins +export CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID= +``` + +Restart the server. The first user with the `admins` group claim from that provider becomes admin on login per tenant. Subsequent logins go through normal group-role mapping. Audit row on every grant (`bootstrap.oidc_first_admin`). + +If you already have an admin actor (likely — you needed one to run step 3), the bootstrap hook silently falls through to normal mapping; no harm done. The probe is one-shot per tenant and can't double-grant. + +### 6. Verify with a single test user + +Before announcing the SSO endpoint to your users, verify the full login flow with a test user from your IdP: + +1. Open `https://:8443/login` in a fresh incognito window. +2. The page should render `Sign in with ` button(s) above the API-key form. If not, check that `getAuthInfo` is returning the `oidc_providers` field — `curl https://:8443/api/v1/auth/info` should show the configured provider(s). +3. Click the provider button. The browser redirects to the IdP, you authenticate, and the IdP redirects back. You should land on the certctl dashboard. +4. Navigate to **Auth → Sessions**. You should see a row with your own actor ID and the current timestamp. +5. Confirm the audit row: + + ```bash + curl https://:8443/api/v1/audit?category=auth \ + -H "Authorization: Bearer ${CERTCTL_API_KEY}" \ + | jq '.events[] | select(.action == "auth.oidc_login_succeeded")' + ``` + + You should see a row attributed to the federated user with `details.provider_id` matching your configuration. + +If any step fails, see the **Troubleshooting** section below. + +### 7. Announce the SSO endpoint + +Once step 6 passes, the SSO endpoint is operational. 
Tell your users to log in via `https://:8443/login` and click the provider button. API-key auth continues to work for automation; the two paths coexist. + +Optional GUI hardening: + +- If you want the API-key form hidden once OIDC is configured, the operator can add a frontend feature flag in a follow-on commit. Default behavior keeps both paths visible (the API-key form stays for break-glass + Bearer-mode deploys). +- If you want to revoke a user's session immediately (e.g. an employee left), use **Auth → Sessions → All actors (admin) → → Revoke**. The next request from that user's browser fails 401. + +## Rollback + +If you need to disable OIDC: + +1. Delete every group-role mapping for the provider: + ```bash + # GUI: Auth → OIDC Providers → → Group → role mappings → Remove (each) + ``` +2. Delete the OIDC provider: + ```bash + # GUI: Auth → OIDC Providers → → Delete (type-confirm-name dialog) + ``` + The server returns HTTP 409 if any user has an authenticated session minted via this provider; revoke those sessions first. +3. The `Sign in with ` button disappears from the login page on the next `getAuthInfo` round-trip (typically the next page load). +4. Existing sessions continue to work until idle/absolute expiry. To force-revoke them, **Auth → Sessions → All actors (admin) → revoke each row**. + +API-key auth continues to work throughout this rollback; you do not need to re-bootstrap or change any other configuration. + +## Troubleshooting + +**"Discovery doc fetch failed" at provider creation.** +The most common cause is a typo in the issuer URL. Curl the URL manually: +```bash +curl -v https:////.well-known/openid-configuration +``` +If that returns 404, fix the issuer URL. + +**"IdP downgrade-attack defense" rejected provider creation.** +Your IdP advertises HS256/HS384/HS512 or `none` in `id_token_signing_alg_values_supported`. Configure the IdP to advertise only RS256 / RS512 / ES256 / ES384 / EdDSA before re-creating the provider in certctl. 
The relevant runbook section walks through this. + +**Login redirects to IdP, user authenticates, but the callback redirects back to `/login` with "no roles assigned".** +The user authenticated successfully but their groups didn't match any configured mapping (`ErrGroupsUnmapped`). Check: +- The user is a member of the IdP group you mapped. +- The group-claim mapper is configured correctly at the IdP (the runbook walks through this per IdP). +- The group name in your certctl mapping exactly matches what the IdP emits (case-sensitive; no leading slash for Keycloak full-path-OFF). + +Decode the ID token at jwt.io against the IdP's JWKS to see exactly what's in the `groups` claim. + +**`ErrIssuerMismatch` even though the discovery doc looks correct.** +The `iss` claim in the ID token must match `OIDCProvider.IssuerURL` byte-for-byte. Some IdPs include / omit a trailing slash; check the per-IdP runbook section on `iss` formatting. + +**`oidc: pre-login session not found or already consumed`.** +The user clicked the OIDC login button, then either the browser tab idled past the 10-minute pre-login TTL or the user opened the IdP login in a new tab and consumed the row from the first one. Have them retry from the login page. + +**`oidc: state parameter mismatch (replay or forgery)`.** +Either the user double-submitted a callback URL (clicked it twice from email or browser history), or this is a CSRF attempt. The pre-login row is single-use; second consumption returns `ErrPreLoginNotFound`. Have them retry from the login page. + +**Sessions revoked but the user can still hit the API.** +Check the Phase 4 session contract: the cookie is HMAC-validated on every request, and `Revoke` deletes the database row that the session middleware looks up afterwards. A `certctl_session` cookie that was never cleared on the client therefore still gets 401 on the missing-row lookup. If responses keep arriving after revocation, check whether your reverse proxy is serving them from its cache. 
The middleware never serves stale data; the issue is upstream of certctl in this case. + +**JWKS rotation: an IdP rotated its signing key and existing users start failing login.** +Click **Refresh discovery cache** on the OIDC provider detail page (or `POST /api/v1/auth/oidc/providers//refresh`). The certctl service re-fetches discovery + JWKS. New tokens validate immediately. The Phase 10 integration test exercises this drill end to end. + +**Database row count drift.** +After OIDC is live, expect to see new rows under: +- `oidc_providers` (one per configured provider) +- `group_role_mappings` (one per configured mapping) +- `users` (one per first OIDC-authenticated user; certctl auto-upserts on login) +- `sessions` (one per logged-in browser session; idle 1h / absolute 8h GC) +- `session_signing_keys` (one active + retained-history rows post rotation) +- `oidc_pre_login_sessions` (transient; 10-minute TTL, scheduler-GC'd) + +All six of these tables are tenant-scoped (`tenant_id` column); single-tenant deployments use the seeded `t-default` tenant. + +## What you can do next + +- Run [`docs/operator/oidc-runbooks/.md`](../operator/oidc-runbooks/index.md) end to end to fill in the validation checklist + sign-off line. +- Read [`docs/operator/auth-benchmarks.md`](../operator/auth-benchmarks.md) for the steady-state + cold-cache performance baselines. +- Review the [`auth-threat-model.md`](../operator/auth-threat-model.md) Bundle 2 sections to understand the failure modes the OIDC + sessions surface defends against. +- Schedule a rotation reminder for the OIDC `client_secret` (typically 6-12 months; the IdP doesn't auto-rotate it). Edit the provider via the GUI when the time comes; leaving `client_secret` blank in the edit form preserves the existing ciphertext, while providing a value rotates it. + +## Cross-references + +- [`docs/operator/oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md) — per-IdP setup guides. 
+- [`docs/operator/security.md`](../operator/security.md) — overall auth surface incl. this Bundle 2 OIDC layer. +- [`docs/operator/auth-threat-model.md`](../operator/auth-threat-model.md) — threat model. +- [`docs/operator/auth-benchmarks.md`](../operator/auth-benchmarks.md) — performance baselines. +- [`docs/reference/auth-standards-implemented.md`](../reference/auth-standards-implemented.md) — RFC + CWE evidence list. +- `internal/auth/oidc/` — OIDC service implementation. +- `internal/auth/session/` — session minting + middleware + signing-key rotation. diff --git a/docs/operator/security.md b/docs/operator/security.md index 0c4f0b7..376dee1 100644 --- a/docs/operator/security.md +++ b/docs/operator/security.md @@ -1,6 +1,6 @@ # certctl Security Posture & Operator Guidance -> Last reviewed: 2026-05-09 +> Last reviewed: 2026-05-10 This document collects the operator-facing security guidance that the source code's per-finding comment blocks reference. Each section names the audit @@ -130,6 +130,149 @@ layer with `ErrApproveBySameActor`. See [`docs/reference/profiles.md`](../reference/profiles.md) for the full gate semantics. +### OIDC federation (Bundle 2 Phases 1-7) + +Bundle 2 adds OIDC SSO on top of the API-key + RBAC foundation. +Operators configure one or more identity providers (Keycloak, +Authentik, Okta, Auth0, Entra ID, or Google Workspace via Keycloak +broker); end users sign in at the IdP, certctl validates the +returned ID token, and a session cookie is minted. + +The token-validation pipeline pins: + +- Algorithm allow-list: RS256 / RS512 / ES256 / ES384 / EdDSA only. + HS256 / HS384 / HS512 / `none` are rejected at the service-layer + sentinel level. +- IdP-downgrade-attack defense at provider creation AND every + RefreshKeys: the IdP's advertised + `id_token_signing_alg_values_supported` is intersected with the + allow-list; a provider that advertises HS-family is rejected + before any token is signed under the weak alg. 
+- Exact `iss` match (`ErrIssuerMismatch`). +- `aud` membership + `azp` for multi-aud tokens (per OIDC core + §3.1.3.7 step 5). +- `at_hash` REQUIRED-when-access_token-present (Phase 3 tightening + of the spec MAY → MUST so a substituted access token cannot + ride alongside a clean ID token). +- Single-use state + nonce (32-byte random server-generated; + atomic `DELETE...RETURNING` on consume). +- PKCE-S256 mandatory; `plain` rejected. +- Configurable `iat` window (default 300s, capped 600s). +- JWKS cache with operator-triggered RefreshKeys + auto-refresh on + TTL expiry (default 3600s); JWKS-fetch failure during a key + rotation returns 503 to the in-flight login (existing sessions + untouched). + +OIDC `client_secret` is encrypted at rest via AES-256-GCM (v3 blob +format: magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using +the `CERTCTL_CONFIG_ENCRYPTION_KEY` passphrase. The encryption +invariant is pinned by an integration test +(`internal/repository/postgres/oidc_encryption_invariant_test.go`) +that asserts ciphertext != plaintext + correct blob shape + +round-trip recovery + wrong-passphrase fails. + +Per-IdP setup guides at +[`oidc-runbooks/index.md`](oidc-runbooks/index.md) cover Keycloak, +Authentik, Okta, Auth0, Entra ID, and Google Workspace. + +### Sessions + back-channel logout (Bundle 2 Phases 4-6) + +Successful OIDC login mints a session cookie: +`v1...`. +The HMAC input is **length-prefixed** as `len:sid:len:kid` to defeat +concatenation-collision attacks on bare-concat designs. Cookie +attributes: + +- `HttpOnly=true` (no JS access; defends XSS cookie theft). +- `Secure=true` (HTTPS-only; defends network MITM). +- `SameSite=Lax` default (configurable to Strict via + `CERTCTL_SESSION_SAMESITE`). +- `Path=/`, host-only. + +Idle timeout default 1h; absolute timeout default 8h; both +configurable via `CERTCTL_SESSION_IDLE_TIMEOUT` and +`CERTCTL_SESSION_ABSOLUTE_TIMEOUT`. The scheduler's +`sessionGCLoop` (default 1h interval) sweeps expired rows. 
+ +CSRF defense: plaintext CSRF token in the JS-readable +`certctl_csrf` cookie (intentionally `HttpOnly=false` for the GUI +to echo into the `X-CSRF-Token` header); SHA-256 hash on the +session row; `subtle.ConstantTimeCompare` in `CSRFMiddleware`. +API-key actors are CSRF-exempt (no session row in context). + +Session signing keys rotate via `RotateSigningKey`; the old key +stays valid for `CERTCTL_SESSION_SIGNING_KEY_RETENTION` (default +24h) so existing cookies validate during rollover. Past retention, +the old key's row is dropped and any cookie still signed under it +returns `ErrSigningKeyNotFound`. `EnsureInitialSigningKey` is +fail-fatal at server boot. + +Back-channel logout per **OpenID Connect Back-Channel Logout 1.0** +(NOT RFC 8414): `POST /auth/oidc/back-channel-logout` accepts a +JWT-signed logout token from the IdP, validates the JWT against +the IdP's JWKS (same alg allow-list as login), pins required +claims (`iss` / `aud` / `iat` / `jti` / `events`; exactly one of +`sub` / `sid`; `nonce` MUST be absent), defeats replay via +`jti`-based deduplication, and revokes matching sessions. + +For threat-model coverage of these surfaces, see +[`auth-threat-model.md`](auth-threat-model.md). For the +operator-runnable performance baselines, see +[`auth-benchmarks.md`](auth-benchmarks.md). + +### OIDC first-admin bootstrap (Bundle 2 Phase 7) + +Coexists with Bundle 1's env-var-token bootstrap. When the +operator sets `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` + (optionally) +`CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID`, the first user with one of +those IdP groups becomes admin on first login per tenant. +Subsequent users go through normal mapping. The admin-existence +probe ensures only one wins between the two bootstrap paths; +once any actor holds `r-admin`, the OIDC bootstrap hook silently +falls through to normal mapping. Audit row on every grant +(`bootstrap.oidc_first_admin`, `event_category=auth`). 
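
The bootstrap decision described above can be sketched as a pure function. This is an illustration only: `bootstrapConfig` and `shouldBootstrapAdmin` are hypothetical names, and treating an empty provider ID as "any provider" is an assumption based on the variable being described as optional. A real implementation would also need the admin-existence probe and the grant to be atomic per tenant so that two concurrent first logins cannot both win.

```go
package main

import "fmt"

// bootstrapConfig mirrors the two env vars named above.
// Field names are hypothetical.
type bootstrapConfig struct {
	AdminGroups []string // from CERTCTL_BOOTSTRAP_ADMIN_GROUPS
	ProviderID  string   // from CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID; empty means any provider (assumption)
}

// shouldBootstrapAdmin reports whether this login qualifies for the
// one-shot admin grant. adminExists is the result of the per-tenant
// admin-existence probe.
func shouldBootstrapAdmin(cfg bootstrapConfig, adminExists bool, providerID string, groups []string) bool {
	// Once any actor holds r-admin, fall through to normal mapping.
	if adminExists || len(cfg.AdminGroups) == 0 {
		return false
	}
	// If a provider is pinned, only logins from that provider qualify.
	if cfg.ProviderID != "" && cfg.ProviderID != providerID {
		return false
	}
	allowed := make(map[string]struct{}, len(cfg.AdminGroups))
	for _, g := range cfg.AdminGroups {
		allowed[g] = struct{}{}
	}
	for _, g := range groups {
		if _, ok := allowed[g]; ok {
			return true
		}
	}
	return false
}

func main() {
	cfg := bootstrapConfig{AdminGroups: []string{"admins"}, ProviderID: "p-keycloak"}

	// First matching user on a tenant with no admin yet.
	fmt.Println(shouldBootstrapAdmin(cfg, false, "p-keycloak", []string{"engineering", "admins"})) // true
	// An admin already exists: normal group-role mapping applies.
	fmt.Println(shouldBootstrapAdmin(cfg, true, "p-keycloak", []string{"admins"})) // false
	// Matching group, but from a different provider.
	fmt.Println(shouldBootstrapAdmin(cfg, false, "p-other", []string{"admins"})) // false
}
```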
+ +### Break-glass admin (Bundle 2 Phase 7.5) + +Default-OFF (`CERTCTL_BREAKGLASS_ENABLED=false`). When enabled, +the local-password admin path bypasses OIDC + group-claim layers; +intended ONLY for SSO-broken incidents. + +- Argon2id with OWASP 2024 params (m=64 MiB, t=3, p=4, 16-byte + salt, 32-byte output, per-password random salt, PHC-format + hash). Hash column is `json:"-"` so handlers cannot wire-leak. +- Lockout state machine: 5 failures (default; configurable via + `CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD`) within 1h reset window + (`_LOCKOUT_RESET_INTERVAL`) trips a 30s lockout (`_LOCKOUT_DURATION`). + Atomic single-statement IncrementFailure defeats concurrent + racing attempts. +- Constant-time across all failure paths via `verifyDummy()` — + wrong-password / locked-account / no-actor all take statistically + indistinguishable time. +- Surface invisibility: when disabled, ALL four endpoints return + HTTP 404 (NOT 403). Scanners cannot distinguish "endpoint + disabled" from "endpoint doesn't exist". +- WARN log at server boot when `ENABLED=true`; audit row on every + break-glass login (`auth.breakglass_login_*`, + `event_category=auth`); WebAuthn/FIDO2 second factor pairing + on the v3 roadmap (Decision 12). + +Operator should DISABLE break-glass within 24h of SSO recovery +to avoid a permanent backdoor; the runbook at +[`auth-threat-model.md#break-glass-risks-phase-75`](auth-threat-model.md) +documents the full state machine. + +### Migrating an existing deployment to OIDC + +A Bundle-1-merged deployment that wants to add OIDC follows the +step-by-step at +[`docs/migration/oidc-enable.md`](../migration/oidc-enable.md): +configure CERTCTL_CONFIG_ENCRYPTION_KEY, pick + configure an IdP +per the relevant runbook, configure the certctl-side OIDCProvider ++ group→role mappings, verify the login flow against a single +test user, then announce the SSO endpoint to the rest of the +organization. + ## Per-user rate limiting Bundle B / M-025. 
Authenticated callers are bucketed by API-key name;