mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 12:41:30 +00:00

shankar0123 c03d18bb1c auth-bundle-2 Phase 16: docs updates (security.md OIDC + sessions + break-glass + auditor split sections; new migration/oidc-enable.md; CHANGELOG.md v2.1.0 Bundle 2 release notes)

Closes Phase 16 of cowork/auth-bundle-2-prompt.md. Three operator-
facing docs updated, one new migration guide ships, README nav row
added.

Files
=====

docs/operator/security.md (MODIFIED, Last reviewed bumped to 2026-05-10):
* Added 5 new Bundle 2 subsections under '## Authentication
  surface' after the Bundle 1 approval-bypass-closure entry:
  - 'OIDC federation (Bundle 2 Phases 1-7)' — alg allow-list,
    IdP-downgrade defense, iss/aud/azp/at_hash, single-use
    state+nonce, PKCE-S256 mandatory, JWKS rotation handling,
    encrypted client_secret at rest with the v3 blob format
    pinned by an integration test, pointer to oidc-runbooks/
    for per-IdP setup.
  - 'Sessions + back-channel logout (Bundle 2 Phases 4-6)' —
    length-prefixed HMAC cookie wire format, HttpOnly + Secure
    + SameSite cookie hardening, idle/absolute timeouts, CSRF
    defense, signing-key rotation primitive, fail-fatal
    EnsureInitialSigningKey at server boot, OpenID Connect
    Back-Channel Logout 1.0 (NOT RFC 8414).
  - 'OIDC first-admin bootstrap (Bundle 2 Phase 7)' — coexists
    with Bundle 1's env-var-token bootstrap, group-scoped via
    CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID,
    one-shot per tenant.
  - 'Break-glass admin (Bundle 2 Phase 7.5)' — default-OFF,
    surface invisibility via 404-not-403, Argon2id with OWASP
    2024 params, lockout state machine, constant-time-via-
    verifyDummy, WARN log at boot, runbook pointer for
    operator drill.
  - 'Migrating an existing deployment to OIDC' — pointer to
    the new migration/oidc-enable.md walkthrough.

docs/migration/oidc-enable.md (NEW, Last reviewed 2026-05-10):
* Step-by-step migration guide for an operator on a Bundle-1-merged
  deployment to enable OIDC SSO. Pre-reqs (CERTCTL_CONFIG_ENCRYPTION_KEY,
  admin actor with auth.oidc.create + auth.oidc.edit, IdP tenant)
  + 7 numbered steps (pin encryption key, complete IdP-side per
  runbook, configure certctl-side OIDCProvider, add group→role
  mappings with fail-closed warning, optional first-admin bootstrap,
  verify with single test user, announce SSO endpoint).
* Rollback section covering the 4-step disable flow + the 409
  Conflict on provider-delete-while-sessions-exist + the
  existing-sessions-keep-working-until-expiry semantics.
* Troubleshooting section pinning 8 most-common failure modes
  (discovery doc fetch fails / IdP downgrade defense rejects /
  no roles assigned / iss mismatch / pre-login expired / state
  mismatch / sessions revoked but user can hit API / JWKS
  rotation breaks login).
* Database row count drift documented so operators know what to
  expect after OIDC is live (10 Bundle 2 tables enumerated).
* Cross-references to oidc-runbooks/ + security.md +
  auth-threat-model.md + auth-benchmarks.md + auth-standards-implemented.md.

CHANGELOG.md (MODIFIED):
* v2.1.0 section title bumped from 'Auth Bundle 1: RBAC primitive'
  to 'Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions'.
* Replaced the Bundle 1 closing-bullet ('Bundle 2 starts after
  Bundle 1 lands on master') with 18 new Bundle 2 entries:
  - OIDC + sessions + back-channel logout + break-glass overview.
  - OIDC token validation pinned at three layers (alg allow-list,
    IdP-downgrade defense, OIDC Core §3.1.3.7 re-verification).
  - Length-prefixed HMAC session cookies.
  - CSRF double-submit + hashed-token-on-row.
  - OIDC client_secret AES-256-GCM v3 blob at rest +
    integration-test invariant.
  - OIDC first-admin bootstrap.
  - Default-OFF break-glass admin (Argon2id + lockout +
    constant-time + surface invisibility).
  - GUI: 4 new pages + login-page IdP buttons + sidebar logout.
  - 11 new MCP tools for OIDC + session management.
  - 6 per-IdP runbooks (Keycloak / Authentik / Okta / Auth0 /
    Entra ID / Google Workspace).
  - Threat model extended with 5 new defense subsections + 8 new
    threat-catalogue subsections.
  - Performance baselines documented (4 benchmarks; 3 measured
    + 1 operator-runs).
  - Standards-and-RFC implementation table (13 RFCs + 14 CWEs;
    NOT a compliance-mapping doc).
  - Coverage gates held at floor 90 across all 4 Bundle 2
    packages (anti-Bundle-1-mistake invariant).
  - Multi-tenant query CI guard (ratchet baseline 32).
  - Phase 10 Keycloak testcontainers integration test + optional
    Okta smoke test.
  - OpenAPI cookieAuth security scheme + 13 new endpoints + 4
    break-glass endpoints.
  - Bundle-1-only compat regression CI guard +
    Bundle-1-to-2-upgrade regression CI guard.
* Final paragraph updated to point at oidc-enable.md alongside
  api-keys-to-rbac.md as the two migration walkthroughs.

docs/README.md (MODIFIED):
* Added the new oidc-enable.md migration row under '## Migration'
  alongside the existing api-keys-to-rbac.md entry, with a
  one-line description flagging it as the Bundle 2 OIDC
  onboarding walkthrough.

Verification
============

* Last-reviewed on security.md + oidc-enable.md: 2026-05-10.
* Internal-link sweep on oidc-enable.md: 0 broken (every relative
  link resolves via shell-loop verification).
* Internal-link sweep on docs/README.md: 0 broken (all .md
  references resolve).
* No Go-side impact, make verify gate unchanged.

Bundle 2 documentation deliverables now complete: security.md +
auth-threat-model.md + oidc-runbooks/ + auth-benchmarks.md +
auth-standards-implemented.md + api-keys-to-rbac.md + oidc-enable.md
+ CHANGELOG.md v2.1.0. The full Bundle 2 surface is operator-
discoverable from docs/README.md root nav.

2026-05-10 17:07:27 +00:00

Changelog

v2.1.0 - Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions ⚠️

SECURITY: AUDIT YOUR API KEYS.

Bundle 1 ships role-based authorization. Every existing API key configured via CERTCTL_API_KEYS_NAMED (or the legacy CERTCTL_AUTH_SECRET) is mapped to the r-admin role on the first upgrade boot so existing automation keeps working unchanged. Most keys do NOT need full admin power; downgrade them before tagging the next release.

Recommended post-upgrade flow:
# 1. List every key with its current role:
certctl-cli auth keys list

# 2. Walk an interactive prompt that downgrades each key:
certctl-cli auth keys scope-down

# 3. Or get a heuristic suggestion based on 30 days of audit history:
certctl-cli auth keys scope-down --suggest
certctl-cli auth keys scope-down --suggest --apply   # applies the suggestion

# 4. Or drive scope-down from a JSON config (Helm post-upgrade hook):
certctl-cli auth keys scope-down --non-interactive ./scope-down.json

The synthetic actor-demo-anon actor (used when CERTCTL_AUTH_TYPE=none is configured) is system-managed and excluded from the prompt loop.

What else changed in v2.1.0:

RBAC primitive shipped. tenants, roles, permissions, role_permissions, actor_roles tables (migration 000029); 33-permission canonical catalogue; 7 default roles (admin, operator, viewer, agent, mcp, cli, auditor); per-handler permission gates via auth.RequirePermission middleware (replaces the legacy IsAdmin boolean check on the 5 admin-only handlers).
Day-0 admin bootstrap. Set CERTCTL_BOOTSTRAP_TOKEN on a fresh deploy and POST a single curl call against /api/v1/auth/bootstrap to mint the first admin API key; one-shot, never logged, and locks closed once any admin actor exists. Migration 000031 ships the api_keys table that stores the SHA-256 hash; the plaintext is shown in the response body once and never persisted.
Auditor role split. New auditor role holds only audit.read + audit.export. Compliance reviewers can read the audit trail without holding mutation power. Migration 000032 adds audit_events.event_category so auditors can filter to authentication-related events specifically.
/v1/auth/check enrichment. Response now includes the actor's standing roles and effective permissions, so the GUI gates affordances from a single fetch on app boot.
Approval-bypass closure. Edits to a profile that has (or would have) RequiresApproval=true now route through the ApprovalService two-person integrity gate (Phase 9). Migration 000033 adds approval_kind + payload to issuance_approval_requests so cert-issuance and profile-edit approvals share the same workflow. Same-actor self-approve is rejected with ErrApproveBySameActor for both kinds. Closes the flip-flop loophole where an admin could disable approval, mutate, re-enable. Documented at docs/reference/profiles.md.
GUI: Roles / API Keys / Auth Settings / Approvals queue. Four new pages under /auth/* consume /v1/auth/me for permission-aware rendering. The Approvals queue blocks self-approve at the client layer (Approve/Reject buttons hidden when requested_by == current actor_id) on top of the server-side enforcement. AuditPage gains a category filter (cert_lifecycle / auth / config) for the auditor view.
MCP server gains 12 RBAC tools. Operators driving certctl from Claude / VS Code / any MCP client get parity with the GUI + CLI. Each tool routes through the same HTTP handler; permission gates fire server-side.
OpenAPI catalogues every new route. Every Bundle 1 endpoint ships with an operationId; the parity test guards against drift.
Coverage gates. internal/auth/ and internal/service/auth/ now have ≥85% coverage floors in .github/coverage-thresholds.yml. The 12-path negative-test list from the Bundle 1 prompt is fully covered (path #12 deferred with in-tree TODO).
Protocol-endpoint allowlist pinned at three layers. The middleware bypass (auth.IsProtocolEndpoint), the router-level AuthExemptRouterRoutes constant, and a new phase12_protocol_allowlist_test.go AST scan all guard against accidentally wrapping ACME / SCEP / EST / OCSP / CRL routes in rbacGate.
Bundle 2: OIDC + sessions + back-channel logout + break-glass. Auth Bundle 2 ships in the same v2.1.0 release. Operators get OIDC SSO support for Keycloak / Authentik / Okta / Auth0 / Microsoft Entra ID / Google Workspace (via Keycloak broker), HMAC-signed session cookies with idle/absolute timeouts + CSRF defense, back-channel logout per OpenID Connect Back-Channel Logout 1.0, and a default-OFF break-glass admin path with Argon2id passwords for SSO-broken incidents. API-key auth keeps working unchanged alongside; existing automation needs no changes. Migration walkthrough at docs/migration/oidc-enable.md; per-IdP setup guides at docs/operator/oidc-runbooks/index.md.
OIDC token validation pinned at three layers. Algorithm allow-list (RS256/RS512/ES256/ES384/EdDSA only) with HS-family + none rejected at the service-layer sentinel; IdP-downgrade-attack defense at provider creation AND every JWKS RefreshKeys (intersects the IdP's advertised id_token_signing_alg_values_supported against the allow-list, rejects providers that advertise weak algs even before any token is signed); OIDC Core §3.1.3.7 re-verification of iss / aud / azp / at_hash (REQUIRED-when-access_token-present per Phase 3 tightening of the spec MAY → MUST) / exp / iat window / nonce constant-time-compare. PKCE-S256 mandatory; plain rejected. Single-use state + nonce via atomic DELETE...RETURNING on consume.
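Two of the checks in this entry can be illustrated with a short Go sketch. It is not the certctl implementation: checkAdvertisedAlgs and pkceS256Challenge are hypothetical names, and the rejection rule (fail on any HS-family or none entry, and require at least one allow-listed algorithm) is one reading of the entry above. The PKCE derivation follows RFC 7636.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// allowedAlgs mirrors the allow-list named in the changelog entry.
var allowedAlgs = map[string]bool{
	"RS256": true, "RS512": true, "ES256": true, "ES384": true, "EdDSA": true,
}

// checkAdvertisedAlgs (hypothetical) inspects the IdP's advertised
// id_token_signing_alg_values_supported list. Assumption: any HS-family
// or "none" entry rejects the provider outright, and at least one
// allow-listed algorithm must remain after the intersection.
func checkAdvertisedAlgs(advertised []string) ([]string, error) {
	var usable []string
	for _, alg := range advertised {
		if alg == "none" || strings.HasPrefix(alg, "HS") {
			return nil, fmt.Errorf("provider advertises weak alg %q", alg)
		}
		if allowedAlgs[alg] {
			usable = append(usable, alg)
		}
	}
	if len(usable) == 0 {
		return nil, errors.New("no allow-listed signing alg advertised")
	}
	return usable, nil
}

// pkceS256Challenge derives the S256 code_challenge from a code_verifier:
// base64url without padding over SHA-256 of the verifier (RFC 7636).
func pkceS256Challenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	usable, err := checkAdvertisedAlgs([]string{"RS256", "ES256", "PS256"})
	fmt.Println(usable, err)

	_, err = checkAdvertisedAlgs([]string{"RS256", "HS256"})
	fmt.Println(err)

	fmt.Println(pkceS256Challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"))
}
```

Running the same check at provider creation and on every key refresh, as the entry describes, means a provider that later starts advertising a weak algorithm is caught before a token signed with it is ever presented.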
Session cookies use length-prefixed HMAC. The cookie wire format is v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)> with HMAC input len:sid:len:kid (NOT bare-concat) to defeat concatenation collisions. HttpOnly + Secure + SameSite=Lax default; SameSite=Strict configurable via CERTCTL_SESSION_SAMESITE. Idle timeout 1h / absolute 8h defaults; scheduler GC sweeps expired rows hourly. Signing keys rotate via the new RotateSigningKey primitive; the old key stays valid for CERTCTL_SESSION_SIGNING_KEY_RETENTION (default 24h) so existing cookies validate during rollover.
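A minimal Go sketch of the length-prefixed HMAC idea follows. The function names are hypothetical, and the exact byte encoding of the len:sid:len:kid input (decimal lengths, colon separators) is an assumption; the changelog names only the field order. The sketch also assumes session and key IDs contain no dots, and it verifies against a single key rather than looking one up by signing_key_id.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// macInput builds the length-prefixed HMAC input. Decimal lengths and
// ':' separators are an assumption made for this sketch.
func macInput(sid, kid string) []byte {
	return []byte(fmt.Sprintf("%d:%s:%d:%s", len(sid), sid, len(kid), kid))
}

// signCookie emits v1.<sid>.<kid>.<base64url-no-pad(HMAC-SHA256)>.
func signCookie(key []byte, sid, kid string) string {
	m := hmac.New(sha256.New, key)
	m.Write(macInput(sid, kid))
	tag := base64.RawURLEncoding.EncodeToString(m.Sum(nil))
	return "v1." + sid + "." + kid + "." + tag
}

// verifyCookie parses the wire format and checks the tag in constant time.
func verifyCookie(key []byte, cookie string) (sid, kid string, ok bool) {
	parts := strings.Split(cookie, ".")
	if len(parts) != 4 || parts[0] != "v1" {
		return "", "", false
	}
	got, err := base64.RawURLEncoding.DecodeString(parts[3])
	if err != nil {
		return "", "", false
	}
	m := hmac.New(sha256.New, key)
	m.Write(macInput(parts[1], parts[2]))
	if !hmac.Equal(got, m.Sum(nil)) {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	key := []byte("example-signing-key-not-for-production")
	cookie := signCookie(key, "sess-123", "k1")
	fmt.Println(verifyCookie(key, cookie))

	// Bare concatenation would map both pairs to "abc"; the length
	// prefixes keep the two MAC inputs distinct.
	fmt.Println(string(macInput("ab", "c")), string(macInput("a", "bc")))
}
```

The length prefixes are what defeat the concatenation collision the entry mentions: moving a byte from one field into the other changes the prefixes, so the MAC input changes even though the concatenated bytes would not.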
CSRF defense via double-submit-cookie + hashed-token-on-row. Plaintext CSRF token in the JS-readable certctl_csrf cookie (intentionally HttpOnly=false for the GUI to echo into the X-CSRF-Token header); SHA-256 hash on the session row; subtle.ConstantTimeCompare in the new CSRFMiddleware. API-key actors are CSRF-exempt (no session row in context).
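The double-submit check with a hashed token on the session row can be sketched in Go as below. This is an illustration under assumptions, not the CSRFMiddleware itself: newCSRFToken and csrfOK are hypothetical names, and the 32-byte token length and base64url encoding are choices made for the example.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newCSRFToken returns the plaintext token (destined for the JS-readable
// cookie) and its SHA-256 hash (destined for the session row).
// The 32-byte length is an assumption for this sketch.
func newCSRFToken() (plaintext string, hash [32]byte, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", hash, err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, sha256.Sum256([]byte(plaintext)), nil
}

// csrfOK hashes the token echoed in the X-CSRF-Token header and compares
// it against the stored hash in constant time.
func csrfOK(headerToken string, storedHash [32]byte) bool {
	got := sha256.Sum256([]byte(headerToken))
	return subtle.ConstantTimeCompare(got[:], storedHash[:]) == 1
}

func main() {
	token, hash, err := newCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(csrfOK(token, hash))
	fmt.Println(csrfOK("forged-token", hash))
}
```

Storing only the hash means a read of the session table does not yield a token that can be replayed in the header.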
OIDC client_secret encrypted at rest. AES-256-GCM v3 blob format (magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using the existing CERTCTL_CONFIG_ENCRYPTION_KEY. Encryption invariant pinned by an integration test asserting ciphertext != plaintext + v3 blob shape + round-trip recovery + wrong-passphrase fails.
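The blob layout in this entry can be checked with a small parser. The Go sketch below covers only the shape (magic byte, salt, nonce, ciphertext plus tag); it does not derive a key or decrypt, since the changelog does not name the key-derivation function. parseV3 and v3Blob are hypothetical names, and the 16-byte tag is the standard AES-GCM tag size, assumed here.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	magicV3  = 0x03
	saltLen  = 16
	nonceLen = 12
	tagLen   = 16 // standard AES-GCM tag size, assumed for this sketch
)

// v3Blob holds the three variable parts of the layout
// magic(1) + salt(16) + nonce(12) + ciphertext+tag.
type v3Blob struct {
	Salt   []byte
	Nonce  []byte
	Sealed []byte // ciphertext followed by the GCM tag
}

// parseV3 validates the magic byte and minimum length, then slices the
// blob into its parts without copying.
func parseV3(blob []byte) (v3Blob, error) {
	if len(blob) < 1+saltLen+nonceLen+tagLen {
		return v3Blob{}, errors.New("blob too short for v3 layout")
	}
	if blob[0] != magicV3 {
		return v3Blob{}, fmt.Errorf("unexpected magic byte 0x%02x", blob[0])
	}
	return v3Blob{
		Salt:   blob[1 : 1+saltLen],
		Nonce:  blob[1+saltLen : 1+saltLen+nonceLen],
		Sealed: blob[1+saltLen+nonceLen:],
	}, nil
}

func main() {
	// A placeholder blob with the right shape: magic plus zeroed fields.
	blob := append([]byte{magicV3}, make([]byte, saltLen+nonceLen+tagLen)...)
	parsed, err := parseV3(blob)
	fmt.Println(len(parsed.Salt), len(parsed.Nonce), len(parsed.Sealed), err)
}
```

A shape check like this is the kind of assertion the integration test in the entry describes alongside the ciphertext != plaintext and round-trip checks.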
OIDC first-admin bootstrap. New CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID env vars: the first OIDC-authenticated user with a matching group claim becomes admin per tenant. Coexists with the Bundle 1 env-var-token bootstrap; the admin-existence probe ensures only one wins. Audit row (bootstrap.oidc_first_admin) on every grant.
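The decision described here reduces to three conditions, sketched in Go below. shouldGrantFirstAdmin is a hypothetical helper: it assumes the two env vars have already been parsed into a provider ID and a group list (the changelog does not specify the list format), and that adminExists is the result of the admin-existence probe for the tenant.

```go
package main

import "fmt"

// shouldGrantFirstAdmin (hypothetical) reports whether an OIDC login
// should receive the first-admin grant. Assumptions: an empty
// bootstrapProvider means the feature is not configured, and a single
// matching group is sufficient.
func shouldGrantFirstAdmin(adminExists bool, loginProvider, bootstrapProvider string, userGroups, bootstrapGroups []string) bool {
	if adminExists || bootstrapProvider == "" || loginProvider != bootstrapProvider {
		return false
	}
	allowed := make(map[string]struct{}, len(bootstrapGroups))
	for _, g := range bootstrapGroups {
		allowed[g] = struct{}{}
	}
	for _, g := range userGroups {
		if _, ok := allowed[g]; ok {
			return true
		}
	}
	return false
}

func main() {
	groups := []string{"engineering", "certctl-admins"}
	bootstrap := []string{"certctl-admins"}

	fmt.Println(shouldGrantFirstAdmin(false, "idp-1", "idp-1", groups, bootstrap))
	fmt.Println(shouldGrantFirstAdmin(true, "idp-1", "idp-1", groups, bootstrap))
}
```

In a real deployment the probe and the grant would need to happen atomically so that the token bootstrap and the OIDC bootstrap cannot both succeed; the sketch shows only the predicate.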
Break-glass admin (default-OFF). New CERTCTL_BREAKGLASS_ENABLED env var (default false). When enabled, the local Argon2id-password admin path bypasses OIDC + group-claim layers — intended ONLY for SSO-broken incidents. Argon2id with OWASP 2024 params (m=64 MiB, t=3, p=4); lockout after 5 failures (configurable); constant-time across all failure paths via verifyDummy; surface invisibility (HTTP 404 on every endpoint when disabled, NOT 403). WARN log at server boot when enabled. WebAuthn/FIDO2 second factor pairing on the v3 roadmap (Decision 12).
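The surface-invisibility behaviour (404 rather than 403 when the feature is off) can be sketched as a small net/http wrapper. breakglassGate and the /breakglass/login path are hypothetical names chosen for the example; the real route names are not given in this changelog.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// breakglassGate (hypothetical) hides the wrapped handler behind a 404
// when the feature flag is off, so a probe cannot distinguish a disabled
// break-glass surface from a route that does not exist.
func breakglassGate(enabled bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.NotFound(w, r)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor exercises the gate with an in-memory request and returns the
// resulting HTTP status code.
func statusFor(enabled bool) int {
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/breakglass/login", nil)
	breakglassGate(enabled, next).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(false))
	fmt.Println(statusFor(true))
}
```

A 403 would confirm that the endpoint exists and is merely forbidden; a 404 gives a scanner nothing to distinguish the disabled feature from any other unknown path.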
GUI: OIDC Providers + Group → Role Mappings + Sessions + login buttons. Four new pages under /auth/* consume the Bundle 2 API surface. Login page renders one "Sign in with X" button per configured OIDC provider (in addition to the API-key form, which remains as a fallback for Bearer-mode + break-glass paths). Sessions page exposes own-sessions + admin all-actors view. Every actionable element is permission-gated server-side via auth.oidc.* and auth.session.* perms; client-side hide is UX layer. Logout button in the sidebar fires POST /auth/logout to clear the session server-side before redirecting to login.
MCP server gains 11 OIDC + session tools. certctl_auth_list_oidc_providers, _get_oidc_provider, _create_oidc_provider, _update_oidc_provider, _delete_oidc_provider, _refresh_oidc_provider, _list_group_mappings, _add_group_mapping, _remove_group_mapping, _list_sessions, _revoke_session. Operator-facing MCP tool count goes 12 (Bundle 1 RBAC) → 23 across the auth surface. Total MCP tool count: grep -cE 'mcp\.AddTool\(' internal/mcp/tools*.go ≈ 150.
Per-IdP runbooks: 6 production-tier setup guides at docs/operator/oidc-runbooks/. Each runbook follows a consistent five-section layout (Prerequisites / IdP-side config / certctl-side config / Verification / Troubleshooting + Validation checklist with operator sign-off line). Keycloak is the canonical reference; Authentik / Okta / Auth0 / Entra ID / Google Workspace document the IdP-specific deltas (Auth0's namespaced custom claims; Entra ID's group OBJECT IDs; Google Workspace's missing-groups-claim limitation + the recommended Keycloak broker pattern).
Threat model extended. docs/operator/auth-threat-model.md ships 5 new "Defenses Bundle 2 ships" subsections + 8 new threat-catalogue subsections (OIDC token forgery / session hijacking / IdP compromise / back-channel logout failure modes / group-claim manipulation / bootstrap risks / break-glass risks / token-leak hygiene). 6 new SQL-shaped operator-facing checks. New "Threats Bundle 2 does NOT close" section enumerating the 8 v3-backlog items (WebAuthn / JIT elevation / SAML / multi-tenant activation / HSM-FIPS / OIDC RP-initiated logout / Playwright / per-IdP external-tester sign-off).
Performance baselines documented. docs/operator/auth-benchmarks.md ships four benchmarks with measured baselines on a 4 vCPU / 8 GiB / Postgres 16 / Go 1.25 floor: BenchmarkSession_SteadyState p99 5 µs (target < 1 ms; 200× under), BenchmarkSession_ColdProcess p99 7.1 ms (target < 10 ms), BenchmarkOIDC_SteadyState p99 1.5 ms (target < 5 ms), BenchmarkOIDC_ColdCache operator-runs against live Keycloak via make benchmark-auth-coldcache.
Standards + RFC implementation table. docs/reference/auth-standards-implemented.md ships 13 RFC / standard rows + 14 CWE rows with concrete file paths + negative-test anchors per row. NOT a compliance-mapping doc per the operator's 2026-05-05 retired-compliance-docs decision; the doc explicitly says "build the framework mapping yourself against the rows here using the framework-mapping methodology your audit firm prescribes; this project does not own that mapping."
Coverage gates held at floor 90 across all four Bundle 2 packages. internal/auth/oidc/ 93.7%, internal/auth/session/ 94.9%, internal/auth/breakglass/ 91.5%, internal/auth/user/domain/ 96.4%. NO held-low-with-rationale entry — the Phase 13 prompt's anti-Bundle-1-mistake rule held. Bundle 1's existing 85% floors for internal/auth/ + internal/service/auth/ stay 85 (already-shipped-and-accepted) per the prompt's explicit inheritance rule.
Multi-tenant query CI guard. New scripts/ci-guards/multi-tenant-query-coverage.sh (ratchet-style, baseline 32 at v2.1.0 close): greps every SELECT/UPDATE/DELETE in internal/repository/postgres/ against 10 tenant-aware tables, fails on regression OR improvement (forces the operator to lift / lower the baseline visibly). Forward-compat protection so a future Bundle 3 / managed-service multi-tenant activation can flip the switch without finding silent tenant-data-leak bugs in shipped queries.
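The ratchet rule (fail on regression and on improvement) is easy to get backwards, so here is a Go sketch of the comparison alone. The real guard is a shell script; checkRatchet is a hypothetical name, and what exactly the count measures is left to the script.

```go
package main

import "fmt"

// checkRatchet (hypothetical) compares the observed count against the
// pinned baseline. Any difference fails: a higher count is a regression,
// and a lower count forces a visible baseline update.
func checkRatchet(baseline, observed int) error {
	switch {
	case observed > baseline:
		return fmt.Errorf("regression: count %d exceeds baseline %d", observed, baseline)
	case observed < baseline:
		return fmt.Errorf("improvement: count %d is below baseline %d; lower the baseline", observed, baseline)
	}
	return nil
}

func main() {
	const baseline = 32 // baseline at v2.1.0 close, per the changelog
	for _, observed := range []int{32, 33, 30} {
		fmt.Println(observed, checkRatchet(baseline, observed))
	}
}
```

Failing on improvement keeps the pinned number honest: the baseline can only move through an explicit edit that shows up in review.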
Phase 10 Keycloak testcontainers integration test. New build-tag-gated suite at internal/auth/oidc/testfixtures/ + integration_keycloak_test.go drives the full OIDC flow against a live Keycloak container booted by testcontainers-go. 5-test matrix: discovery + JWKS load, full PKCE auth-code happy path with HTTP form scraping, logout-revokes-session, JWKS rotation, unmapped-groups-fails-closed. Reuses one container across the matrix to amortize the 60-90s boot. Optional Okta smoke test (build-tagged integration && okta_smoke) for live tenant validation. New Makefile targets: make keycloak-integration-test + make okta-smoke-test + make benchmark-auth-coldcache.
OpenAPI surface extended. New cookieAuth security scheme (apiKey/cookie/certctl_session) alongside the existing bearerAuth. 13 new Bundle 2 endpoints across the OIDC + session + group-mapping CRUD surface; 4 break-glass endpoints with surface-invisibility framing. The N-bundle-2-security-empty-preserved CI guard locks the security: [] opt-out count at ≥ 14 so existing public endpoints stay public.
Bundle-1-only compat regression CI guard. New scripts/ci-guards/bundle-1-compat-regression.sh asserts the load-bearing invariants that protect the Bundle-1-only-deploy case (session middleware defers-to-next, CSRF passthrough on missing session row, ChainAuthSessionThenBearer wired, public OIDC routes in AuthExempt allowlist, AuthInfo guards on OIDCProvidersResolver != nil). Sibling bundle-1-to-2-upgrade-regression.sh asserts the upgrade-path invariants (migrations 000034..000038 are CREATE TABLE IF NOT EXISTS + BEGIN/COMMIT-wrapped + no DROP TABLE / ALTER...DROP COLUMN against 19 protected Bundle-1 tables + ON CONFLICT DO NOTHING on permission seed).

Migration ordering, idempotency, and downgrade are documented in docs/migration/api-keys-to-rbac.md (API-key → RBAC, Bundle 1) and docs/migration/oidc-enable.md (API-key → OIDC, Bundle 2). The threat model lives at docs/operator/auth-threat-model.md. Day-2 RBAC operations live at docs/operator/rbac.md. RFC + CWE evidence at docs/reference/auth-standards-implemented.md.

v2.0.68 - Image registry path changed ⚠️

Image registry path changed. Starting this release, container images publish to ghcr.io/certctl-io/certctl-server and ghcr.io/certctl-io/certctl-agent. Existing pulls from ghcr.io/shankar0123/certctl-{server,agent}:<tag> continue to work for previously-published tags (the registry never deletes images), but the :latest tag at the old path stops moving forward at this release. Update your docker pull paths, docker-compose.yml image: keys, or Helm image.repository values to receive future updates. Old git clone / git push / install-script / API URLs continue to redirect forever - only the container-registry path changed.

This is the only operator-action-required change in v2.0.68. Other changes in this release are cosmetic URL refreshes after the GitHub-org transfer from shankar0123/certctl to certctl-io/certctl (HTTP redirects mean no other operator action is required) plus an internal contextcheck lint fix in the agent. Full commit list is on the GitHub release page.

certctl no longer maintains a hand-edited per-version changelog. Per-release notes are auto-generated from commit messages between consecutive tags.

Where to find what changed in a given release:

GitHub Releases - every tag has an auto-generated "What's Changed" section pulled from the commits between that tag and the previous one, plus per-release supply-chain verification instructions (Cosign / SLSA / SBOM).
git log <prev-tag>..<this-tag> --oneline - same content, locally.

Why no hand-edited CHANGELOG.md:

certctl is solo-developed and pushes directly to master. Maintaining a hand-edited CHANGELOG meant the file drifted (entries piled into [unreleased] and never got promoted to per-version sections when tags were cut). A stale CHANGELOG is worse than no CHANGELOG - it signals abandoned maintenance to security-conscious operators doing diligence.

The auto-generated release notes work here because commit messages follow a descriptive convention: <area>: <summary> with a longer body for non-trivial changes (see git log v2.0.50..HEAD for the established pattern). Anyone reading the GitHub Releases page can see exactly what landed in each version without depending on the author to manually update a separate file.

For the historical record: earlier versions (pre-v2.2.0 and the [2.2.0] tag itself) had a hand-edited CHANGELOG. That content is preserved in git history at the v2.2.0 tag.
