certctl/docs/migration/oidc-enable.md
shankar0123 c03d18bb1c auth-bundle-2 Phase 16: docs updates (security.md OIDC + sessions + break-glass + auditor split sections; new migration/oidc-enable.md; CHANGELOG.md v2.1.0 Bundle 2 release notes)
Closes Phase 16 of cowork/auth-bundle-2-prompt.md. Three operator-
facing docs updated, one new migration guide ships, README nav row
added.

Files
=====

docs/operator/security.md (MODIFIED, Last reviewed bumped to 2026-05-10):
* Added 5 new Bundle 2 subsections under '## Authentication
  surface' after the Bundle 1 approval-bypass-closure entry:
  - 'OIDC federation (Bundle 2 Phases 1-7)' — alg allow-list,
    IdP-downgrade defense, iss/aud/azp/at_hash, single-use
    state+nonce, PKCE-S256 mandatory, JWKS rotation handling,
    encrypted client_secret at rest with the v3 blob format
    pinned by an integration test, pointer to oidc-runbooks/
    for per-IdP setup.
  - 'Sessions + back-channel logout (Bundle 2 Phases 4-6)' —
    length-prefixed HMAC cookie wire format, HttpOnly + Secure
    + SameSite cookie hardening, idle/absolute timeouts, CSRF
    defense, signing-key rotation primitive, fail-fatal
    EnsureInitialSigningKey at server boot, OpenID Connect
    Back-Channel Logout 1.0 (NOT RFC 8414).
  - 'OIDC first-admin bootstrap (Bundle 2 Phase 7)' — coexists
    with Bundle 1's env-var-token bootstrap, group-scoped via
    CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID,
    one-shot per tenant.
  - 'Break-glass admin (Bundle 2 Phase 7.5)' — default-OFF,
    surface invisibility via 404-not-403, Argon2id with OWASP
    2024 params, lockout state machine, constant-time-via-
    verifyDummy, WARN log at boot, runbook pointer for
    operator drill.
  - 'Migrating an existing deployment to OIDC' — pointer to
    the new migration/oidc-enable.md walkthrough.

docs/migration/oidc-enable.md (NEW, Last reviewed 2026-05-10):
* Step-by-step migration guide for an operator on a Bundle-1-merged
  deployment to enable OIDC SSO. Pre-reqs (CERTCTL_CONFIG_ENCRYPTION_KEY,
  admin actor with auth.oidc.create + auth.oidc.edit, IdP tenant)
  + 7 numbered steps (pin encryption key, complete IdP-side per
  runbook, configure certctl-side OIDCProvider, add group→role
  mappings with fail-closed warning, optional first-admin bootstrap,
  verify with single test user, announce SSO endpoint).
* Rollback section covering the 4-step disable flow + the 409
  Conflict on provider-delete-while-sessions-exist + the
  existing-sessions-keep-working-until-expiry semantics.
* Troubleshooting section pinning 8 most-common failure modes
  (discovery doc fetch fails / IdP downgrade defense rejects /
  no roles assigned / iss mismatch / pre-login expired / state
  mismatch / sessions revoked but user can hit API / JWKS
  rotation breaks login).
* Database row count drift documented so operators know what to
  expect after OIDC is live (10 Bundle 2 tables enumerated).
* Cross-references to oidc-runbooks/ + security.md +
  auth-threat-model.md + auth-benchmarks.md + auth-standards-implemented.md.

CHANGELOG.md (MODIFIED):
* v2.1.0 section title bumped from 'Auth Bundle 1: RBAC primitive'
  to 'Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions'.
* Replaced the Bundle 1 closing-bullet ('Bundle 2 starts after
  Bundle 1 lands on master') with 18 new Bundle 2 entries:
  - OIDC + sessions + back-channel logout + break-glass overview.
  - OIDC token validation pinned at three layers (alg allow-list,
    IdP-downgrade defense, OIDC Core §3.1.3.7 re-verification).
  - Length-prefixed HMAC session cookies.
  - CSRF double-submit + hashed-token-on-row.
  - OIDC client_secret AES-256-GCM v3 blob at rest +
    integration-test invariant.
  - OIDC first-admin bootstrap.
  - Default-OFF break-glass admin (Argon2id + lockout +
    constant-time + surface invisibility).
  - GUI: 4 new pages + login-page IdP buttons + sidebar logout.
  - 11 new MCP tools for OIDC + session management.
  - 6 per-IdP runbooks (Keycloak / Authentik / Okta / Auth0 /
    Entra ID / Google Workspace).
  - Threat model extended with 5 new defense subsections + 8 new
    threat-catalogue subsections.
  - Performance baselines documented (4 benchmarks; 3 measured
    + 1 operator-runs).
  - Standards-and-RFC implementation table (13 RFCs + 14 CWEs;
    NOT a compliance-mapping doc).
  - Coverage gates held at floor 90 across all 4 Bundle 2
    packages (anti-Bundle-1-mistake invariant).
  - Multi-tenant query CI guard (ratchet baseline 32).
  - Phase 10 Keycloak testcontainers integration test + optional
    Okta smoke test.
  - OpenAPI cookieAuth security scheme + 13 new endpoints + 4
    break-glass endpoints.
  - Bundle-1-only compat regression CI guard +
    Bundle-1-to-2-upgrade regression CI guard.
* Final paragraph updated to point at oidc-enable.md alongside
  api-keys-to-rbac.md as the two migration walkthroughs.

docs/README.md (MODIFIED):
* Added the new oidc-enable.md migration row under '## Migration'
  alongside the existing api-keys-to-rbac.md entry, with a
  one-line description flagging it as the Bundle 2 OIDC
  onboarding walkthrough.

Verification
============

* Last-reviewed on security.md + oidc-enable.md: 2026-05-10.
* Internal-link sweep on oidc-enable.md: 0 broken (every relative
  link resolves via shell-loop verification).
* Internal-link sweep on docs/README.md: 0 broken (all .md
  references resolve).
* No Go-side impact, make verify gate unchanged.

Bundle 2 documentation deliverables now complete: security.md +
auth-threat-model.md + oidc-runbooks/ + auth-benchmarks.md +
auth-standards-implemented.md + api-keys-to-rbac.md + oidc-enable.md
+ CHANGELOG.md v2.1.0. The full Bundle 2 surface is operator-
discoverable from docs/README.md root nav.
2026-05-10 17:07:27 +00:00


Enable OIDC SSO on a Bundle-1-merged deployment

Last reviewed: 2026-05-10

This guide walks an operator already running certctl with Bundle 1 (RBAC primitive on top of API-key auth) through enabling OIDC SSO from Bundle 2. The path is additive: API-key auth keeps working unchanged; OIDC sits alongside as a second authentication surface for human users.

If you are upgrading from a pre-Bundle-1 deployment, finish api-keys-to-rbac.md first. If you have not deployed certctl at all, start with getting-started/quickstart.md. For the canonical mental model + per-flow threat coverage, see security.md and auth-threat-model.md.

What "enable OIDC" gives you

After this migration:

  • Human operators can log in via the OIDC button on the certctl login page (one button per configured IdP).
  • The IdP authenticates the user; certctl validates the returned ID token, mints a session cookie, and redirects to the dashboard.
  • IdP groups → certctl roles are operator-configured (e.g. engineering@example.com → r-operator).
  • Every login emits an audit row (auth.oidc_login_succeeded) attributing the action to the federated user, NOT to a shared API key.
  • The first user from a configured admin group (when CERTCTL_BOOTSTRAP_ADMIN_GROUPS is set) becomes admin per tenant; one-shot per the admin-existence probe.

What does NOT change:

  • API keys keep working. Existing automation continues to authenticate via the Authorization: Bearer header exactly as before.
  • The break-glass admin path (Phase 7.5) stays default-OFF.
  • The auditor split + approval workflow + RBAC primitive are unchanged.

Pre-requisites

On certctl side:

  • Server build ≥ v2.1.0 (the post-Bundle-2 master). Confirm via curl https://<your-host>:8443/api/v1/version.
  • CERTCTL_CONFIG_ENCRYPTION_KEY set in the server environment. This is the passphrase that encrypts the OIDC client_secret at rest. Use a stable, secrets-manager-stored value at least 32 random bytes long. The server refuses to start if the key is missing AND any source='database' rows already exist (per Bundle B / M-001 / CWE-311 closure). Set this before doing anything else.
  • An admin actor available to drive the configuration. The actor needs the auth.oidc.create + auth.oidc.edit permissions; r-admin carries both by default. Get one via the day-0 bootstrap path if you don't have one yet.
  • HTTPS-only control plane (post-v2.2 milestone — this is the default). The OIDC redirect URI MUST be https://.

On IdP side:

  • A Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace tenant where you can register an OIDC application. Free dev tiers work for evaluation. See the per-IdP runbook at oidc-runbooks/index.md.
  • Network reachability from certctl-server to the IdP's /.well-known/openid-configuration discovery endpoint. The certctl service fetches discovery + JWKS at provider creation and at every RefreshKeys call.

Step-by-step

1. Pin CERTCTL_CONFIG_ENCRYPTION_KEY

If your deployment already has it set (the Bundle B M-001 fail-closed gate enforces this for any source='database' issuer/target row), skip this step. If you don't:

# Generate a 32-byte random key + base64-encode it.
# The restrictive umask keeps the file from being world-readable, even briefly.
(umask 077 && openssl rand -base64 32 > /etc/certctl/config-encryption-key)
chmod 600 /etc/certctl/config-encryption-key

Then make the server consume it at boot:

# In your environment, systemd unit, k8s Secret, etc.
export CERTCTL_CONFIG_ENCRYPTION_KEY="$(cat /etc/certctl/config-encryption-key)"

Restart the server. Confirm the boot log does NOT show ErrEncryptionKeyRequired. If it does, the server has refused to start because pre-existing source='database' material needs to be re-sealed; see the pre-Bundle-B migration notes for the re-encryption flow.
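
Before restarting, a quick preflight can catch an unset or too-short key. The sketch below is a hypothetical helper, not something certctl ships; the name check_encryption_key is illustrative. It only checks that the value is present and at least 32 characters long, which is a rough proxy for the 32-random-byte guidance above and says nothing about entropy.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of certctl).
# Succeeds when the candidate key is non-empty and at least 32 characters long.
check_encryption_key() {
  local key="$1"
  if [ -z "$key" ]; then
    echo "CERTCTL_CONFIG_ENCRYPTION_KEY is empty or unset" >&2
    return 1
  fi
  if [ "${#key}" -lt 32 ]; then
    echo "key is ${#key} characters; expected at least 32" >&2
    return 1
  fi
  return 0
}
```

One possible invocation, run in the same environment the server will use, is check_encryption_key "${CERTCTL_CONFIG_ENCRYPTION_KEY:-}".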

2. Pick an IdP runbook + complete the IdP-side configuration

Pick the runbook for your IdP and do EVERYTHING in its IdP-side section. The runbooks are at docs/operator/oidc-runbooks/. What you need from the runbook before continuing here:

  • The IdP's discovery URL (the iss value certctl will validate against).
  • An OIDC client ID + client secret. Save the secret; you'll paste it into certctl in step 3.
  • At least one IdP group with the users who should be allowed to log in. The runbook walks the group-claim mapper config.
  • The IdP-side group claim shape — most IdPs emit string-array under a groups key, but Auth0 uses namespaced URL keys (https://your-namespace/groups) and Entra ID emits group OBJECT IDs (GUIDs) instead of names. The runbook calls out the per-IdP shape.

3. Configure the certctl-side OIDC provider

Via the GUI (recommended for first-time setup):

  1. Sign in as an admin actor.
  2. Navigate to Auth → OIDC Providers in the sidebar.
  3. Click Configure provider.
  4. Fill in the form using the values from step 2's runbook.
  5. Click Save.

If the discovery doc fetch fails, the modal surfaces the error inline. Most-common cause: a typo in the issuer URL.

Or via the CLI / MCP:

curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
  -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Keycloak",
    "issuer_url": "https://keycloak.example.com/realms/certctl",
    "client_id": "certctl",
    "client_secret": "<paste-the-secret>",
    "redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
    "groups_claim_path": "groups",
    "groups_claim_format": "string-array",
    "scopes": ["openid", "profile", "email"],
    "iat_window_seconds": 300,
    "jwks_cache_ttl_seconds": 3600
  }'

The MCP equivalent (certctl_auth_create_oidc_provider) accepts the same JSON shape.
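
Because the redirect URI must be https://, and OpenID Connect Discovery likewise requires an https issuer, it can help to check both values before submitting the request. This is a minimal sketch; require_https is a hypothetical helper name and is not part of the certctl CLI.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submit check (not part of certctl).
# Succeeds only when the argument is an https:// URL with something after the scheme.
require_https() {
  case "$1" in
    https://?*) return 0 ;;
    *) echo "not an https:// URL: $1" >&2; return 1 ;;
  esac
}

# Example (values taken from the request body above):
#   require_https "https://keycloak.example.com/realms/certctl" \
#     && require_https "https://certctl.example.com:8443/auth/oidc/callback"
```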

4. Add the group → role mappings

Empty mapping list = nobody can log in via this provider (the fail-closed contract; pinned by ErrGroupsUnmapped). Add at least one mapping BEFORE announcing the SSO endpoint to users.

Via the GUI: Auth → OIDC Providers → <provider> → Group → role mappings → Add.

Via the API:

curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/group-mappings \
  -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_id": "<provider-id-from-step-3>",
    "group_name": "engineering@example.com",
    "role_id": "r-operator"
  }'

A typical setup adds two or three mappings: engineers → r-operator, viewers → r-viewer, optionally admins → r-admin. For Entra ID, use group object IDs (GUIDs) NOT names; for Auth0, use the bare group name from inside the namespaced claim array.
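
When several mappings are needed, the API call above can be wrapped in a small loop. The following is a sketch under stated assumptions: mapping_payload and add_mapping are hypothetical helper names, CERTCTL_HOST and PROVIDER_ID are hypothetical environment variables, the group names engineers / viewers / admins are placeholders, and the payload builder does no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for one group -> role mapping.
# Assumes the arguments contain no double quotes or backslashes (no escaping is done).
mapping_payload() {
  printf '{"provider_id":"%s","group_name":"%s","role_id":"%s"}' "$1" "$2" "$3"
}

# Hypothetical helper: POST one mapping to the endpoint shown above.
# Expects CERTCTL_HOST and CERTCTL_API_KEY in the environment.
add_mapping() {
  curl -fsS -X POST "https://${CERTCTL_HOST}:8443/api/v1/auth/oidc/group-mappings" \
    -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(mapping_payload "$1" "$2" "$3")"
}

# Example loop (placeholder group names; role IDs as used in this guide):
#   for pair in engineers:r-operator viewers:r-viewer admins:r-admin; do
#     add_mapping "$PROVIDER_ID" "${pair%%:*}" "${pair##*:}"
#   done
```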

5. (Optional) Configure first-admin bootstrap

If your deployment has no admin actor yet AND you want the first OIDC-authenticated user from a specific group to become admin (instead of using the env-var-token bootstrap path), set:

export CERTCTL_BOOTSTRAP_ADMIN_GROUPS=admins
export CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID=<provider-id-from-step-3>

Restart the server. The first user from that provider whose token carries the admins group claim becomes admin on login, once per tenant. Subsequent logins go through the normal group-role mapping. Every grant writes an audit row (bootstrap.oidc_first_admin).

If you already have an admin actor (likely — you needed one to run step 3), the bootstrap hook silently falls through to normal mapping; no harm done. The probe is one-shot per tenant and can't double-grant.

6. Verify with a single test user

Before announcing the SSO endpoint to your users, verify the full login flow with a test user from your IdP:

  1. Open https://<your-certctl-host>:8443/login in a fresh incognito window.

  2. The page should render Sign in with <provider> button(s) above the API-key form. If not, check that getAuthInfo is returning the oidc_providers field — curl https://<your-host>:8443/api/v1/auth/info should show the configured provider(s).

  3. Click the provider button. The browser redirects to the IdP, you authenticate, and the IdP redirects back. You should land on the certctl dashboard.

  4. Navigate to Auth → Sessions. You should see a row with your own actor ID and the current timestamp.

  5. Confirm the audit row:

    # Quote the URL so the shell does not treat ? as a glob character.
    curl "https://<your-host>:8443/api/v1/audit?category=auth" \
      -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
      | jq '.events[] | select(.action == "auth.oidc_login_succeeded")'
    

    You should see a row attributed to the federated user with details.provider_id matching your configuration.

If any step fails, see the Troubleshooting section below.

7. Announce the SSO endpoint

Once step 6 passes, the SSO endpoint is operational. Tell your users to log in via https://<your-host>:8443/login and click the provider button. API-key auth continues to work for automation; the two paths coexist.

Optional GUI hardening:

  • If you want the API-key form hidden once OIDC is configured, the operator can add a frontend feature flag in a follow-on commit. Default behavior keeps both paths visible (the API-key form stays for break-glass + Bearer-mode deploys).
  • If you want to revoke a user's session immediately (e.g. an employee left), go to Auth → Sessions → All actors (admin) and click Revoke on that user's row. The next request from that user's browser fails with 401.

Rollback

If you need to disable OIDC:

  1. Delete every group-role mapping for the provider:
    # GUI: Auth → OIDC Providers → <provider> → Group → role mappings → Remove (each)
    
  2. Delete the OIDC provider:
    # GUI: Auth → OIDC Providers → <provider> → Delete (type-confirm-name dialog)
    
    The server returns HTTP 409 if any user has an authenticated session minted via this provider; revoke those sessions first.
  3. The Sign in with <provider> button disappears from the login page on the next getAuthInfo round-trip (typically the next page load).
  4. Existing sessions continue to work until idle/absolute expiry. To force-revoke them, Auth → Sessions → All actors (admin) → revoke each row.

API-key auth continues to work throughout this rollback; you do not need to re-bootstrap or change any other configuration.

Troubleshooting

"Discovery doc fetch failed" at provider creation. The most common cause is a typo in the issuer URL. Curl the URL manually:

curl -v https://<idp-host>/<path>/.well-known/openid-configuration

If that returns 404, fix the issuer URL.
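
Per OpenID Connect Discovery, the discovery URL is the issuer with any trailing slash removed, followed by /.well-known/openid-configuration. A small helper (discovery_url is a hypothetical name, not a certctl command) makes it easy to test exactly the URL that the configured issuer implies:

```shell
#!/usr/bin/env bash
# Derive the discovery URL from an issuer URL.
# A single trailing slash on the issuer is dropped before appending the well-known path.
discovery_url() {
  local issuer="${1%/}"
  printf '%s/.well-known/openid-configuration\n' "$issuer"
}

# Example:
#   curl -fsS "$(discovery_url "https://keycloak.example.com/realms/certctl")"
```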

"IdP downgrade-attack defense" rejected provider creation. Your IdP advertises HS256/HS384/HS512 or none in id_token_signing_alg_values_supported. Configure the IdP to advertise only RS256 / RS512 / ES256 / ES384 / EdDSA before re-creating the provider in certctl. The relevant runbook section walks this.

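To see which advertised algorithms trigger the rejection before retrying, the list from the discovery document can be screened locally. This sketch is a hypothetical diagnostic, not certctl's own check; find_weak_algs is an illustrative name, and it flags only the values named above (HS256, HS384, HS512, none).

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not certctl's validation code).
# Prints every argument that is one of HS256 / HS384 / HS512 / none.
# Exit status is 1 when at least one such algorithm is present, 0 otherwise.
find_weak_algs() {
  local alg found=0
  for alg in "$@"; do
    case "$alg" in
      HS256|HS384|HS512|none)
        echo "$alg"
        found=1
        ;;
    esac
  done
  return "$found"
}
```

With jq installed, the arguments can come from the discovery document's id_token_signing_alg_values_supported array, for example by piping jq -r '.id_token_signing_alg_values_supported[]' into xargs.
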
Login redirects to IdP, user authenticates, but the callback redirects back to /login with "no roles assigned". The user authenticated successfully but their groups didn't match any configured mapping (ErrGroupsUnmapped). Check:

  • The user is a member of the IdP group you mapped.
  • The group-claim mapper is configured correctly at the IdP (the runbook walks per-IdP).
  • The group name in your certctl mapping exactly matches what the IdP emits — case-sensitive, no leading slash for Keycloak full-path-OFF.

Decode the ID token at jwt.io against the IdP's JWKS to see exactly what's in the groups claim.
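
If pasting a live token into a third-party site is not acceptable in your environment, the claims can also be inspected locally. The sketch below decodes the payload segment only and does NOT verify the signature; jwt_payload is a hypothetical helper name, and it assumes a base64 binary that accepts -d (GNU coreutils does; some platforms use -D instead).

```shell
#!/usr/bin/env bash
# Print the JSON payload (claims) of a JWT. No signature verification is performed.
jwt_payload() {
  local token="$1" seg
  # A JWT is header.payload.signature; take the middle segment.
  seg="$(printf '%s' "$token" | cut -d. -f2)"
  # base64url -> base64: map the URL-safe characters back to the standard alphabet.
  seg="$(printf '%s' "$seg" | tr '_-' '/+')"
  # Restore the '=' padding that base64url omits.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Example: jwt_payload "$ID_TOKEN" | jq '.groups'
```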

ErrIssuerMismatch even though the discovery doc looks correct. The iss claim in the ID token must match OIDCProvider.IssuerURL byte-for-byte. Some IdPs include / omit a trailing slash; check the per-IdP runbook section on iss formatting.
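
A quick way to tell a trailing-slash difference from a genuine mismatch is to compare the two strings directly. This is a hypothetical diagnostic, not certctl's validation logic; compare_issuer is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not certctl's validation code): compare the configured
# issuer URL with the iss claim from a token and classify the difference.
compare_issuer() {
  local configured="$1" iss="$2"
  if [ "$configured" = "$iss" ]; then
    echo "match"
  elif [ "${configured%/}" = "${iss%/}" ]; then
    echo "trailing-slash-only"
  else
    echo "mismatch"
  fi
}
```

A result of trailing-slash-only means the configured issuer URL needs to be edited to match the token's iss exactly.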

oidc: pre-login session not found or already consumed. The user clicked the OIDC login button, then the browser tab idled past the 10-minute pre-login TTL OR the user opened the IdP login in a new tab and consumed the row from the first one. Have them retry from the login page.

oidc: state parameter mismatch (replay or forgery). Either the user submitted the same callback URL twice (for example from email or browser history), or this was a CSRF attempt. The pre-login row is single-use; a second consumption returns ErrPreLoginNotFound. Have them retry from the login page.

Sessions revoked but the user can still hit the API. Check the Phase 4 session contract: the cookie is HMAC-validated on every request, but Revoke deletes the backing database row. A certctl_session cookie that was not cleared on the client still reaches the server's session middleware, which returns 401 on the missing-row lookup. The middleware never serves stale data, so if the user still sees successful responses, the cause is upstream of certctl, for example a reverse proxy caching the response.

JWKS rotation: an IdP rotated its signing key and existing users start failing login. Click Refresh discovery cache on the OIDC provider detail page (or POST /api/v1/auth/oidc/providers/<id>/refresh). The certctl service re-fetches discovery + JWKS. New tokens validate immediately. The Phase 10 integration test exercises this drill end to end.

Database row count drift. After OIDC is live, expect to see new rows under:

  • oidc_providers (one per configured provider)
  • group_role_mappings (one per configured mapping)
  • users (one per OIDC-authenticated user, created at first login; certctl auto-upserts on login)
  • sessions (one per logged-in browser session; idle 1h / absolute 8h GC)
  • session_signing_keys (one active + retained-history rows post rotation)
  • oidc_pre_login_sessions (transient; 10-minute TTL, scheduler-GC'd)

All of these tables are tenant-scoped (tenant_id column); single-tenant deployments use the seeded t-default tenant.
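
The sessions row count depends on how the two timeouts interact: a session ends at whichever limit is hit first. The sketch below illustrates that rule with the 1h idle and 8h absolute values listed above. It is illustrative only: session_expired is a hypothetical function, and the exact comparison certctl uses (and whether the values are configurable) is an assumption.

```shell
#!/usr/bin/env bash
# Illustrative only: the two session timeouts named above, in seconds.
IDLE_TIMEOUT=3600       # 1h since the last request
ABSOLUTE_TIMEOUT=28800  # 8h since the session was created

# Succeeds (exit 0) when a session should be treated as expired.
# Arguments are Unix timestamps: created, last_seen, now.
session_expired() {
  local created="$1" last_seen="$2" now="$3"
  [ $(( now - last_seen )) -ge "$IDLE_TIMEOUT" ] || [ $(( now - created )) -ge "$ABSOLUTE_TIMEOUT" ]
}
```

Under this rule, a session that is used continuously still ends at the absolute limit.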

What you can do next

  • Run docs/operator/oidc-runbooks/<your-idp>.md end to end to fill in the validation checklist + sign-off line.
  • Read docs/operator/auth-benchmarks.md for the steady-state + cold-cache performance baselines.
  • Review the auth-threat-model.md Bundle 2 sections to understand the failure modes the OIDC + sessions surface defends against.
  • Schedule a rotation reminder for the OIDC client_secret (typically 6-12 months; the IdP doesn't auto-rotate it). Edit the provider via the GUI when the time comes. Leaving client_secret blank in the edit form preserves the existing ciphertext, while providing a value rotates it.

Cross-references