mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 20:41:30 +00:00

shankar0123 a581e2d222 auth-bundle-2 Phase 16: docs updates (security.md OIDC + sessions + break-glass + auditor split sections; new migration/oidc-enable.md; CHANGELOG.md v2.1.0 Bundle 2 release notes)

Closes Phase 16 of cowork/auth-bundle-2-prompt.md. Three operator-
facing docs updated, one new migration guide ships, README nav row
added.

Files
=====

docs/operator/security.md (MODIFIED, Last reviewed bumped to 2026-05-10):
* Added 5 new Bundle 2 subsections under '## Authentication
  surface' after the Bundle 1 approval-bypass-closure entry:
  - 'OIDC federation (Bundle 2 Phases 1-7)' — alg allow-list,
    IdP-downgrade defense, iss/aud/azp/at_hash, single-use
    state+nonce, PKCE-S256 mandatory, JWKS rotation handling,
    encrypted client_secret at rest with the v3 blob format
    pinned by an integration test, pointer to oidc-runbooks/
    for per-IdP setup.
  - 'Sessions + back-channel logout (Bundle 2 Phases 4-6)' —
    length-prefixed HMAC cookie wire format, HttpOnly + Secure
    + SameSite cookie hardening, idle/absolute timeouts, CSRF
    defense, signing-key rotation primitive, fail-fatal
    EnsureInitialSigningKey at server boot, OpenID Connect
    Back-Channel Logout 1.0 (NOT RFC 8414).
  - 'OIDC first-admin bootstrap (Bundle 2 Phase 7)' — coexists
    with Bundle 1's env-var-token bootstrap, group-scoped via
    CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID,
    one-shot per tenant.
  - 'Break-glass admin (Bundle 2 Phase 7.5)' — default-OFF,
    surface invisibility via 404-not-403, Argon2id with OWASP
    2024 params, lockout state machine, constant-time-via-
    verifyDummy, WARN log at boot, runbook pointer for
    operator drill.
  - 'Migrating an existing deployment to OIDC' — pointer to
    the new migration/oidc-enable.md walkthrough.

docs/migration/oidc-enable.md (NEW, Last reviewed 2026-05-10):
* Step-by-step migration guide for an operator on a Bundle-1-merged
  deployment to enable OIDC SSO. Pre-reqs (CERTCTL_CONFIG_ENCRYPTION_KEY,
  admin actor with auth.oidc.create + auth.oidc.edit, IdP tenant)
  + 7 numbered steps (pin encryption key, complete IdP-side per
  runbook, configure certctl-side OIDCProvider, add group→role
  mappings with fail-closed warning, optional first-admin bootstrap,
  verify with single test user, announce SSO endpoint).
* Rollback section covering the 4-step disable flow + the 409
  Conflict on provider-delete-while-sessions-exist + the
  existing-sessions-keep-working-until-expiry semantics.
* Troubleshooting section pinning 8 most-common failure modes
  (discovery doc fetch fails / IdP downgrade defense rejects /
  no roles assigned / iss mismatch / pre-login expired / state
  mismatch / sessions revoked but user can hit API / JWKS
  rotation breaks login).
* Database row count drift documented so operators know what to
  expect after OIDC is live (10 Bundle 2 tables enumerated).
* Cross-references to oidc-runbooks/ + security.md +
  auth-threat-model.md + auth-benchmarks.md + auth-standards-implemented.md.

CHANGELOG.md (MODIFIED):
* v2.1.0 section title bumped from 'Auth Bundle 1: RBAC primitive'
  to 'Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions'.
* Replaced the Bundle 1 closing-bullet ('Bundle 2 starts after
  Bundle 1 lands on master') with 18 new Bundle 2 entries:
  - OIDC + sessions + back-channel logout + break-glass overview.
  - OIDC token validation pinned at three layers (alg allow-list,
    IdP-downgrade defense, OIDC Core §3.1.3.7 re-verification).
  - Length-prefixed HMAC session cookies.
  - CSRF double-submit + hashed-token-on-row.
  - OIDC client_secret AES-256-GCM v3 blob at rest +
    integration-test invariant.
  - OIDC first-admin bootstrap.
  - Default-OFF break-glass admin (Argon2id + lockout +
    constant-time + surface invisibility).
  - GUI: 4 new pages + login-page IdP buttons + sidebar logout.
  - 11 new MCP tools for OIDC + session management.
  - 6 per-IdP runbooks (Keycloak / Authentik / Okta / Auth0 /
    Entra ID / Google Workspace).
  - Threat model extended with 5 new defense subsections + 8 new
    threat-catalogue subsections.
  - Performance baselines documented (4 benchmarks; 3 measured
    + 1 operator-runs).
  - Standards-and-RFC implementation table (13 RFCs + 14 CWEs;
    NOT a compliance-mapping doc).
  - Coverage gates held at floor 90 across all 4 Bundle 2
    packages (anti-Bundle-1-mistake invariant).
  - Multi-tenant query CI guard (ratchet baseline 32).
  - Phase 10 Keycloak testcontainers integration test + optional
    Okta smoke test.
  - OpenAPI cookieAuth security scheme + 13 new endpoints + 4
    break-glass endpoints.
  - Bundle-1-only compat regression CI guard +
    Bundle-1-to-2-upgrade regression CI guard.
* Final paragraph updated to point at oidc-enable.md alongside
  api-keys-to-rbac.md as the two migration walkthroughs.

docs/README.md (MODIFIED):
* Added the new oidc-enable.md migration row under '## Migration'
  alongside the existing api-keys-to-rbac.md entry, with a
  one-line description flagging it as the Bundle 2 OIDC
  onboarding walkthrough.

Verification
============

* Last-reviewed on security.md + oidc-enable.md: 2026-05-10.
* Internal-link sweep on oidc-enable.md: 0 broken (every relative
  link resolves via shell-loop verification).
* Internal-link sweep on docs/README.md: 0 broken (all .md
  references resolve).
* No Go-side impact, make verify gate unchanged.

Bundle 2 documentation deliverables now complete: security.md +
auth-threat-model.md + oidc-runbooks/ + auth-benchmarks.md +
auth-standards-implemented.md + api-keys-to-rbac.md + oidc-enable.md
+ CHANGELOG.md v2.1.0. The full Bundle 2 surface is operator-
discoverable from docs/README.md root nav.

2026-05-10 17:07:27 +00:00

certctl Security Posture & Operator Guidance

Last reviewed: 2026-05-10

This document collects the operator-facing security guidance that the source code's per-finding comment blocks reference. Each section names the audit finding it closes, the threat model, and the operator action required (if any).

OCSP responder availability

Audit reference: Bundle C / M-020. CWE-770 (uncontrolled resource consumption); RFC 6960 (OCSP); RFC 7633 (Must-Staple).

certctl ships an OCSP responder at /.well-known/pki/ocsp/{issuer_id}/{serial} that signs a fresh response per request. Pre-Bundle-C the unauth handler chain had no rate limit, so an attacker could DoS the responder and force fail-open relying parties to accept revoked certificates as valid. Bundle C adds the same per-key rate limiter to the unauth chain that the authenticated chain has used since Bundle B. Per-IP keying applies because OCSP traffic is unauthenticated.

The rate limiter alone does not solve the underlying revocation-bypass risk. The architectural fix is for issued certificates to carry the OCSP Must-Staple TLS Feature extension (RFC 7633, OID 1.3.6.1.5.5.7.1.24). When present, conforming TLS clients refuse to negotiate a session unless the server staples a fresh signed OCSP response in the TLS handshake. This shifts revocation enforcement from the client's discretion (most clients fail open by default) to a hard requirement: the connection cannot complete without proof of non-revocation.

Operator action

For certificates issued to systems where revocation correctness matters:

Configure the issuer profile to set must-staple: true. Out-of-the-box profiles in migrations/seed.sql do not set this; operators add it at profile-creation time via the API or by editing seed data.
Confirm the relying party honors the extension. OpenSSL ≥ 1.1.0, Firefox, and Chrome 84+ all enforce Must-Staple. Older clients silently ignore it.
Confirm the deployment target is configured for OCSP stapling so the server can actually deliver the stapled response in the handshake.

nginx: ssl_stapling on; ssl_stapling_verify on;
Apache: SSLUseStapling on
HAProxy: set ssl ocsp-response /path/to/response.der
Envoy: ocsp_staple_policy: must_staple

What this does NOT cover

CRL fallback. Must-Staple does not affect CRL behavior. Operators with CRL-based relying parties should use the rate-limit + caching defense alone; there is no client-side equivalent to Must-Staple for CRLs.
Self-issued certs in air-gapped networks. When the relying party cannot reach the OCSP responder at all (the threat model the audit cited), Must-Staple is the only mechanism that closes the bypass. CRL distribution similarly requires the relying party to fetch the CRL, which is also subject to the same network-availability concern.

Postgres transport encryption

See docs/database-tls.md. Bundle B / M-018.

Encryption at rest

Bundle B / M-001. PBKDF2-SHA256 at 600,000 rounds (OWASP 2024 Password Storage Cheat Sheet floor) for the operator-supplied passphrase that derives the AES-256-GCM key for sensitive config columns. v3 blob format with a per-ciphertext random salt; v1/v2 read fallback for legacy rows. See internal/crypto/encryption.go and the accompanying tests for the format spec.

Authentication surface

Bundle B / M-002. Two layers decide auth-exempt status:

Router layer: internal/api/router/router.go::AuthExemptRouterRoutes lists the endpoints registered via direct r.mux.Handle without going through the middleware chain (/health, /ready, /api/v1/auth/info, /api/v1/version, plus /api/v1/auth/bootstrap GET + POST per Bundle 1 Phase 6).

Dispatch layer: internal/api/router/router.go::AuthExemptDispatchPrefixes covers the URL-prefix routing in cmd/server/main.go::buildFinalHandler for /.well-known/pki/*, /.well-known/est/*, /.well-known/est-mtls, and /scep[/...]* (incl. /scep-mtls).

Both lists have AST-walking regression tests (auth_exempt_test.go) that fail CI if a new bypass lands without updating the documented constant.

RBAC primitive (Bundle 1)

Bundle 1 ships role-based authorization on top of API-key authentication. Every gated handler routes through the auth.RequirePermission middleware (or its router-level wrap rbacGate); the middleware resolves the actor's effective permissions via the service-layer Authorizer.CheckPermission and returns HTTP 403 BEFORE the handler body runs on miss. The seven default roles (admin / operator / viewer / agent / mcp / cli / auditor), 33-permission canonical catalogue, and the auditor split (r-auditor holds only audit.read + audit.export) are seeded by migration 000029.
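The gate-before-handler behavior described above can be sketched as a small net/http wrapper. This is a minimal illustration, not the project's auth.RequirePermission: the permissionChecker type, the X-Actor header used to carry a role name, and the /api/v1/audit path are all hypothetical stand-ins chosen so the example runs on its own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// permissionChecker is a hypothetical stand-in for the service-layer Authorizer.
type permissionChecker func(role, permission string) bool

// requirePermission wraps next and answers 403 before next runs when the
// caller (identified here by a hypothetical X-Actor header) lacks the permission.
func requirePermission(check permissionChecker, permission string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !check(r.Header.Get("X-Actor"), permission) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the gate and returns the status code.
func statusFor(role string) int {
	// r-auditor holds only audit.read + audit.export, as in the text above.
	grants := map[string]map[string]bool{
		"r-auditor": {"audit.read": true, "audit.export": true},
	}
	check := func(role, perm string) bool { return grants[role][perm] }
	handler := requirePermission(check, "audit.read", http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusOK) },
	))
	req := httptest.NewRequest(http.MethodGet, "/api/v1/audit", nil)
	req.Header.Set("X-Actor", role)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("r-auditor"), statusFor("r-unknown")) // 200 403
}
```

A real deployment would resolve the actor from the authenticated request context rather than a header; the point of the sketch is only that the deny path returns before the handler body executes.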

For the operator how-to, see rbac.md. For the threat model + compliance mapping, see auth-threat-model.md. For the upgrade flow from a pre-Bundle-1 deployment, see docs/migration/api-keys-to-rbac.md.

Day-0 admin bootstrap (Bundle 1 Phase 6)

Fresh deployments where no admin actor exists yet can mint the first admin via POST /api/v1/auth/bootstrap: set CERTCTL_BOOTSTRAP_TOKEN, send a single POST (for example with curl) carrying the token, and the server returns the plaintext key value once. The token is constant-time-compared; the strategy is one-shot via mutex; the admin-existence probe re-closes the path once an admin lands. The token is NEVER logged. The minted plaintext key flows only into the HTTP response body. See rbac.md for the full flow.

Approval-bypass closure (Bundle 1 Phase 9)

CertificateProfile.RequiresApproval=true profiles route both issuance/renewal AND profile edits through the ApprovalService two-person integrity gate (Phase 9 closes the flip-flop loophole where an admin could disable approval, mutate, re-enable). Same-actor self-approve is rejected at the service layer with ErrApproveBySameActor. See docs/reference/profiles.md for the full gate semantics.

OIDC federation (Bundle 2 Phases 1-7)

Bundle 2 adds OIDC SSO on top of the API-key + RBAC foundation. Operators configure one or more identity providers (Keycloak, Authentik, Okta, Auth0, Entra ID, or Google Workspace via Keycloak broker); end users sign in at the IdP, certctl validates the returned ID token, and a session cookie is minted.

The token-validation pipeline pins:

Algorithm allow-list: RS256 / RS512 / ES256 / ES384 / EdDSA only. HS256 / HS384 / HS512 / none are rejected at the service-layer sentinel level.
IdP-downgrade-attack defense at provider creation AND every RefreshKeys: the IdP's advertised id_token_signing_alg_values_supported is intersected with the allow-list; a provider that advertises HS-family is rejected before any token signed under the weak alg can be accepted.
Exact iss match (ErrIssuerMismatch).
aud membership + azp for multi-aud tokens (per OIDC core §3.1.3.7 step 5).
at_hash REQUIRED-when-access_token-present (Phase 3 tightening of the spec MAY → MUST so a substituted access token cannot ride alongside a clean ID token).
Single-use state + nonce (32-byte random server-generated; atomic DELETE...RETURNING on consume).
PKCE-S256 mandatory; plain rejected.
Configurable iat window (default 300s, capped 600s).
JWKS cache with operator-triggered RefreshKeys + auto-refresh on TTL expiry (default 3600s); JWKS-fetch failure during a key rotation returns 503 to the in-flight login (existing sessions untouched).

OIDC client_secret is encrypted at rest via AES-256-GCM (v3 blob format: magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using the CERTCTL_CONFIG_ENCRYPTION_KEY passphrase. The encryption invariant is pinned by an integration test (internal/repository/postgres/oidc_encryption_invariant_test.go) that asserts ciphertext != plaintext + correct blob shape + round-trip recovery + wrong-passphrase fails.

Per-IdP setup guides at oidc-runbooks/index.md cover Keycloak, Authentik, Okta, Auth0, Entra ID, and Google Workspace.

Sessions + back-channel logout (Bundle 2 Phases 4-6)

Successful OIDC login mints a session cookie: v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>. The HMAC input is length-prefixed as len:sid:len:kid to defeat concatenation-collision attacks on bare-concat designs. Cookie attributes:

HttpOnly=true (no JS access; defends XSS cookie theft).
Secure=true (HTTPS-only; defends network MITM).
SameSite=Lax default (configurable to Strict via CERTCTL_SESSION_SAMESITE).
Path=/, host-only.

Idle timeout default 1h; absolute timeout default 8h; both configurable via CERTCTL_SESSION_IDLE_TIMEOUT and CERTCTL_SESSION_ABSOLUTE_TIMEOUT. The scheduler's sessionGCLoop (default 1h interval) sweeps expired rows.

CSRF defense: plaintext CSRF token in the JS-readable certctl_csrf cookie (intentionally HttpOnly=false for the GUI to echo into the X-CSRF-Token header); SHA-256 hash on the session row; subtle.ConstantTimeCompare in CSRFMiddleware. API-key actors are CSRF-exempt (no session row in context).
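The hashed-token-on-row comparison can be sketched in a few lines. The helper names are hypothetical; the only pieces taken from the text are the SHA-256 hash stored on the session row and the use of subtle.ConstantTimeCompare.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// hashToken is what would be stored on the session row at login time.
func hashToken(token string) []byte {
	sum := sha256.Sum256([]byte(token))
	return sum[:]
}

// csrfOK hashes the value echoed in the X-CSRF-Token header and compares it
// against the stored hash without leaking the mismatch position via timing.
func csrfOK(headerToken string, storedHash []byte) bool {
	return subtle.ConstantTimeCompare(hashToken(headerToken), storedHash) == 1
}

func main() {
	stored := hashToken("token-from-cookie")
	fmt.Println(csrfOK("token-from-cookie", stored), csrfOK("forged", stored)) // true false
}
```

Storing only the hash means a read of the session table does not yield a usable CSRF token.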

Session signing keys rotate via RotateSigningKey; the old key stays valid for CERTCTL_SESSION_SIGNING_KEY_RETENTION (default 24h) so existing cookies validate during rollover. Past retention, the old key's row is dropped and any cookie still signed under it returns ErrSigningKeyNotFound. EnsureInitialSigningKey is fail-fatal at server boot.

Back-channel logout per OpenID Connect Back-Channel Logout 1.0 (NOT RFC 8414): POST /auth/oidc/back-channel-logout accepts a JWT-signed logout token from the IdP, validates the JWT against the IdP's JWKS (same alg allow-list as login), pins required claims (iss / aud / iat / jti / events; exactly one of sub / sid; nonce MUST be absent), defeats replay via jti-based deduplication, and revokes matching sessions.

For threat-model coverage of these surfaces, see auth-threat-model.md. For the operator-runnable performance baselines, see auth-benchmarks.md.

OIDC first-admin bootstrap (Bundle 2 Phase 7)

Coexists with Bundle 1's env-var-token bootstrap. When the operator sets CERTCTL_BOOTSTRAP_ADMIN_GROUPS + (optionally) CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID, the first user with one of those IdP groups becomes admin on first login per tenant. Subsequent users go through normal mapping. The admin-existence probe ensures only one wins between the two bootstrap paths; once any actor holds r-admin, the OIDC bootstrap hook silently falls through to normal mapping. Audit row on every grant (bootstrap.oidc_first_admin, event_category=auth).

Break-glass admin (Bundle 2 Phase 7.5)

Default-OFF (CERTCTL_BREAKGLASS_ENABLED=false). When enabled, the local-password admin path bypasses OIDC + group-claim layers; intended ONLY for SSO-broken incidents.

Argon2id with OWASP 2024 params (m=64 MiB, t=3, p=4, 16-byte salt, 32-byte output, per-password random salt, PHC-format hash). Hash column is json:"-" so handlers cannot wire-leak.
Lockout state machine: 5 failures (default; configurable via CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD) within 1h reset window (_LOCKOUT_RESET_INTERVAL) trips a 30s lockout (_LOCKOUT_DURATION). Atomic single-statement IncrementFailure defeats concurrent racing attempts.
Constant-time across all failure paths via verifyDummy() — wrong-password / locked-account / no-actor all take statistically indistinguishable time.
Surface invisibility: when disabled, ALL four endpoints return HTTP 404 (NOT 403). Scanners cannot distinguish "endpoint disabled" from "endpoint doesn't exist".
WARN log at server boot when ENABLED=true; audit row on every break-glass login (auth.breakglass_login_*, event_category=auth); WebAuthn/FIDO2 second factor pairing on the v3 roadmap (Decision 12).

Operator should DISABLE break-glass within 24h of SSO recovery to avoid a permanent backdoor; the runbook at auth-threat-model.md#break-glass-risks-phase-75 documents the full state machine.

Migrating an existing deployment to OIDC

A Bundle-1-merged deployment that wants to add OIDC follows the step-by-step at docs/migration/oidc-enable.md: configure CERTCTL_CONFIG_ENCRYPTION_KEY, pick + configure an IdP per the relevant runbook, configure the certctl-side OIDCProvider + group→role mappings, verify the login flow against a single test user, then announce the SSO endpoint to the rest of the organization.

Per-user rate limiting

Bundle B / M-025. Authenticated callers are bucketed by API-key name; unauthenticated callers (probes, OCSP relying parties, EST/SCEP enrollees) are bucketed by source IP. RPS and BurstSize are per-key budgets. PerUserRPS / PerUserBurstSize give authenticated clients a separate budget when set non-zero.

API key rotation

Audit reference: L-004. CWE-924 (improper enforcement of message integrity during transmission in a communication channel) - operator UX variant.

certctl's API keys are configured via the CERTCTL_API_KEYS_NAMED env var (format name1:key1,name2:key2:admin) and parsed at startup into an in-memory list. There is no DB-resident key store, no GUI, no /api/v1/keys endpoint - the env var IS the key inventory.

Pre-Bundle-G the env var rejected duplicate names, so rotating a key required: stop accepting OLDKEY → restart → roll NEWKEY out. Any client polling against OLDKEY during the restart window hit a 401.

Bundle G adds a double-key rotation window: two entries can share a name during the rollover, and both keys validate. Operators run the rotation as:

1. Generate the new key. openssl rand -hex 32 produces a 256-bit value with sufficient entropy.
2. Append the new entry to CERTCTL_API_KEYS_NAMED alongside the existing one:
```
CERTCTL_API_KEYS_NAMED="alice:OLDKEY:admin,alice:NEWKEY:admin"
```
Both entries MUST carry the same admin flag - startup fails loud if they don't (a non-admin shouldn't share an identity with an admin).

3. Restart certctl. A startup INFO log confirms the rotation window is active:

INFO api-key rotation window active name=alice entries=2 see=docs/security.md::api-key-rotation

4. Roll the new key out to all clients. Both keys validate during this phase. Audit-trail actor + per-user rate-limit bucket stay consistent across the rollover (both entries produce the same UserKey context value, the shared name).
5. Remove the old entry from CERTCTL_API_KEYS_NAMED:
```
CERTCTL_API_KEYS_NAMED="alice:NEWKEY:admin"
```
6. Restart certctl. OLDKEY now fails with 401. Rotation complete.

The rotation window has no operator-set timeout - it lasts for as long as both entries are in the env var. Best practice is a 24-72h window covering a full deploy cadence; if a client hasn't rolled to NEWKEY by the end of step 4, extend the window before step 5.

What the contract guarantees

Two entries with the same name: allowed if both have the same admin flag.
Two entries with the same name but mismatched admin: rejected at startup (privilege escalation guard).
Two entries with the same (name, key) pair: rejected at startup (typo guard - rotation requires DIFFERENT keys under the same name).
Single-entry steady state: unchanged from pre-Bundle-G behavior.
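The guarantees listed above can be sketched as a parser for the CERTCTL_API_KEYS_NAMED format. This is a hedged illustration, not the project's startup code: the type and function names are hypothetical, and treating "admin" as the only valid third field and allowing any number of entries per name are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

type apiKey struct {
	Name, Key string
	Admin     bool
}

// parseNamedKeys parses "name1:key1,name2:key2:admin" and enforces the
// rotation-window rules: same name needs the same admin flag and distinct keys.
func parseNamedKeys(raw string) ([]apiKey, error) {
	var keys []apiKey
	adminByName := map[string]bool{}
	seenPair := map[[2]string]bool{}
	for _, entry := range strings.Split(raw, ",") {
		parts := strings.Split(strings.TrimSpace(entry), ":")
		if len(parts) < 2 || len(parts) > 3 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		k := apiKey{Name: parts[0], Key: parts[1]}
		if len(parts) == 3 {
			if parts[2] != "admin" {
				return nil, fmt.Errorf("unknown flag %q for %s", parts[2], k.Name)
			}
			k.Admin = true
		}
		// Privilege escalation guard: a shared name must share the admin flag.
		if prev, ok := adminByName[k.Name]; ok && prev != k.Admin {
			return nil, fmt.Errorf("admin flag mismatch for name %s", k.Name)
		}
		// Typo guard: rotation requires different keys under the same name.
		pair := [2]string{k.Name, k.Key}
		if seenPair[pair] {
			return nil, fmt.Errorf("duplicate (name, key) pair for name %s", k.Name)
		}
		adminByName[k.Name] = k.Admin
		seenPair[pair] = true
		keys = append(keys, k)
	}
	return keys, nil
}

func main() {
	keys, err := parseNamedKeys("alice:OLDKEY:admin,alice:NEWKEY:admin")
	fmt.Println(len(keys), err) // 2 <nil>
	_, err = parseNamedKeys("alice:OLDKEY:admin,alice:NEWKEY")
	fmt.Println(err != nil) // true
}
```

The error messages name the entry but never echo the key value, in keeping with the rule that secrets stay out of logs.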

What the contract does NOT do

No automatic expiration of OLDKEY. The operator removes the entry in step 5; certctl doesn't track timestamps. A future enhancement could add a rotated_at annotation if operators ask for it.
No GUI / API for key management. Keys are env-var only by design; building a key-management surface is a separate feature project.
No revocation list. If a key leaks, the only path is to remove it from the env var and restart. That's appropriate for a small env-var inventory; it would not scale to a per-user-key-issued model.

Reporting a vulnerability

Email certctl@proton.me. Coordinated disclosure preferred; we will acknowledge within 72h.
