Files
certctl/docs/operator/security.md
T
shankar0123 e7a94b6080 auth-bundle-1 Phase 13: docs (rbac.md + threat model + migration guide + security.md update)
Closes the last Phase before the Bundle 1 Exit gate. Operators
now have authoritative reference + threat model + migration guide
covering every behavior change Bundles 0-12 introduced.

# New docs

* docs/operator/rbac.md (340 lines) — operator how-to:
  - Mental model (actors / roles / permissions / scopes)
  - 7 default roles seeded by migration 000029 + the 5
    admin-only fine-grained perms seeded by 000030
  - Permission catalogue table by namespace
  - Scope semantics (global beats specific) + the Bundle-2
    deferral on scope_id FK enforcement
  - Granting / revoking access from GUI + CLI + HTTP API + MCP
  - The auditor pattern (audit-only, no resource read)
  - Day-0 bootstrap flow (CERTCTL_BOOTSTRAP_TOKEN → curl →
    HTTP 410 thereafter)
  - Demo-mode (CERTCTL_AUTH_TYPE=none) caveat for production

* docs/operator/auth-threat-model.md (180 lines) — what the
  controls defend against:
  - 5 threat actors (external, wrong-role, compromised key,
    insider operator, compromised auditor)
  - Per-defense walk-through (API-key auth, RBAC, bootstrap,
    approval workflow + Phase 9 closure, audit trail,
    protocol-endpoint allowlist)
  - 9 explicit deferrals (OIDC, sessions, local accounts,
    JIT elevation, MFA, etc.) — Bundle 2 / future scope
  - Compliance mapping (SOC 2 CC6.1/CC6.3, HIPAA §164.312(b),
    NIST SSDF PO.5.2, FedRAMP AU-9, PCI-DSS §10)
  - 5 operator-runnable sanity checks (e.g.,
    'SELECT FROM audit_events WHERE actor=system-bypass' MUST
    return 0 in production)

* docs/migration/api-keys-to-rbac.md (200 lines) — v2.0.x →
  v2.1.0 upgrade flow:
  - The SECURITY: AUDIT YOUR API KEYS callout
  - Migration list (000029-000033) + what each does
  - 4-mode scope-down flow (interactive / non-interactive
    JSON / --suggest / --suggest --apply)
  - What changes for code that called auth.IsAdmin
  - Helm-specific upgrade flow with example post-upgrade Job
  - Docker Compose upgrade flow + the 5 examples folders
    that ride demo mode unchanged
  - Verification queries + rollback flow

# Updated docs

* docs/operator/security.md — Last-reviewed bumped to
  2026-05-09; existing Authentication-surface section
  extended to call out the Bundle 1 RBAC primitive,
  day-0 bootstrap path, and approval-bypass closure with
  cross-references to the new docs.

* docs/reference/profiles.md — Last-reviewed header
  formatting fixed (added the > blockquote prefix used
  consistently across the docs tree).

# docs/README.md navigation

* Operator section gains 2 new rows (RBAC + auth-threat-model)
  and Approval-workflow row updated to mention Phase 9
  closure.
* Reference section gains the Profiles row.
* Migration section gains the api-keys-to-rbac row with the
  AUDIT YOUR API KEYS callout in the link description.

# CHANGELOG.md v2.1.0 section refreshed

The Phase 7 commit landed the SECURITY: AUDIT YOUR API KEYS
callout. This commit appends the missing Phase 9-12 highlights:

  - Approval-bypass closure (profile-edit gate + flip-flop
    loophole + ErrApproveBySameActor invariant)
  - GUI: Roles / API Keys / Auth Settings / Approvals queue
  - 12 new MCP RBAC tools
  - Coverage gates on internal/auth + internal/service/auth
  - Protocol-endpoint allowlist pinned at 3 layers

Trailing cross-reference block now points at all 4 new docs.

# Verifications

* Every internal link in the 4 new/modified docs validated by
  shell sweep (find broken links → 0 hits).
* Every new doc carries 'Last reviewed: 2026-05-09' header
  with the > blockquote prefix matching the docs-tree
  convention.
* go vet ./... clean.
* staticcheck across every Bundle-1-touched Go package clean.
* gofmt -l clean repo-wide.
* go test -short -count=1 green across internal/auth (incl.
  bootstrap), internal/api/handler, internal/api/router,
  internal/cli, internal/service (incl. auth),
  internal/domain/auth, internal/mcp, cmd/cli (cmd/server
  has 1 environmental failure on the sandbox virtiofs-tmp:
  TestPreflightSCEPRACertKey_KeyWorldReadable_Refuses depends
  on tmpfs file-mode semantics that virtiofs propagates
  differently — pre-existing, unrelated to Bundle 1).
* Frontend: 19 Vitest tests across src/pages/auth/ +
  AuditPage all pass; tsc --noEmit clean.
2026-05-10 00:10:15 +00:00

9.6 KiB

certctl Security Posture & Operator Guidance

Last reviewed: 2026-05-09

This document collects the operator-facing security guidance that the source code's per-finding comment blocks reference. Each section names the audit finding it closes, the threat model, and the operator action required (if any).

OCSP responder availability

Audit reference: Bundle C / M-020. CWE-770 (uncontrolled resource consumption); RFC 6960 (OCSP); RFC 7633 (Must-Staple).

certctl ships an OCSP responder at /.well-known/pki/ocsp/{issuer_id}/{serial} that signs a fresh response per request. Pre-Bundle-C the unauth handler chain had no rate limit, so an attacker could DoS the responder and force fail-open relying parties to accept revoked certificates as valid. Bundle C adds the same per-key rate limiter to the unauth chain that the authenticated chain has used since Bundle B. Per-IP keying applies because OCSP traffic is unauthenticated.

The rate limiter alone does not solve the underlying revocation-bypass risk. The architectural fix is for issued certificates to carry the OCSP Must-Staple TLS Feature extension (RFC 7633, OID 1.3.6.1.5.5.7.1.24). When present, conforming TLS clients refuse to negotiate a session unless the server staples a fresh signed OCSP response in the TLS handshake. This shifts revocation enforcement from the client's discretion (which most fail-open by default) to a hard requirement that the connection cannot complete without proof of non-revocation.

Operator action

For certificates issued to systems where revocation correctness matters:

  1. Configure the issuer profile to set must-staple: true. Out-of-the-box profiles in migrations/seed.sql do not set this; operators add it at profile-creation time via the API or by editing seed data.
  2. Confirm the relying party honors the extension. OpenSSL ≥ 1.1.0, Firefox, and Chrome 84+ all enforce Must-Staple. Older clients silently ignore it.
  3. Confirm the deployment target is configured for OCSP stapling so the server can actually deliver the stapled response in the handshake.
    • nginx: ssl_stapling on; ssl_stapling_verify on;
    • Apache: SSLUseStapling on
    • HAProxy: set ssl ocsp-response /path/to/response.der
    • Envoy: ocsp_staple_policy: must_staple

What this does NOT cover

  • CRL fallback. Must-Staple does not affect CRL behavior. Operators with CRL-based relying parties should use the rate-limit + caching defense alone; there is no client-side equivalent to Must-Staple for CRLs.
  • Self-issued certs in air-gapped networks. When the relying party cannot reach the OCSP responder at all (the threat model the audit cited), Must-Staple is the only mechanism that closes the bypass. CRL distribution similarly requires the relying party to fetch the CRL, which is also subject to the same network-availability concern.

Postgres transport encryption

See docs/database-tls.md. Bundle B / M-018.

Encryption at rest

Bundle B / M-001. PBKDF2-SHA256 at 600,000 rounds (OWASP 2024 Password Storage Cheat Sheet floor) for the operator-supplied passphrase that derives the AES-256-GCM key for sensitive config columns. v3 blob format with a per-ciphertext random salt; v1/v2 read fallback for legacy rows. See internal/crypto/encryption.go and the accompanying tests for the format spec.

Authentication surface

Bundle B / M-002. Two layers decide auth-exempt status:

  1. Router layer: internal/api/router/router.go::AuthExemptRouterRoutes — the endpoints registered via direct r.mux.Handle without going through the middleware chain (/health, /ready, /api/v1/auth/info, /api/v1/version, plus /api/v1/auth/bootstrap GET + POST per Bundle 1 Phase 6).
  2. Dispatch layer: internal/api/router/router.go::AuthExemptDispatchPrefixes — URL-prefix routing in cmd/server/main.go::buildFinalHandler for /.well-known/pki/*, /.well-known/est/*, /.well-known/est-mtls, and /scep[/...]* (incl. /scep-mtls).

Both lists have AST-walking regression tests (auth_exempt_test.go) that fail CI if a new bypass lands without updating the documented constant.

RBAC primitive (Bundle 1)

Bundle 1 ships role-based authorization on top of API-key authentication. Every gated handler routes through the auth.RequirePermission middleware (or its router-level wrap rbacGate); the middleware resolves the actor's effective permissions via the service-layer Authorizer.CheckPermission and returns HTTP 403 BEFORE the handler body runs on miss. The seven default roles (admin / operator / viewer / agent / mcp / cli / auditor), 33-permission canonical catalogue, and the auditor split (r-auditor holds only audit.read + audit.export) are seeded by migration 000029.

For the operator how-to, see rbac.md. For the threat model + compliance mapping, see auth-threat-model.md. For the upgrade flow from a pre-Bundle-1 deployment, see docs/migration/api-keys-to-rbac.md.

Day-0 admin bootstrap (Bundle 1 Phase 6)

Fresh deployments where no admin actor exists yet can mint the first admin via POST /api/v1/auth/bootstrap — set CERTCTL_BOOTSTRAP_TOKEN, POST a single curl with the token, and the server returns the plaintext key value once. The token is constant-time-compared; the strategy is one-shot via mutex; the admin-existence probe re-closes the path once an admin lands. The token is NEVER logged. The minted plaintext key flows only into the HTTP response body. See rbac.md for the full flow.

Approval-bypass closure (Bundle 1 Phase 9)

CertificateProfile.RequiresApproval=true profiles route both issuance/renewal AND profile edits through the ApprovalService two-person integrity gate (Phase 9 closes the flip-flop loophole where an admin could disable approval, mutate, re-enable). Same-actor self-approve is rejected at the service layer with ErrApproveBySameActor. See docs/reference/profiles.md for the full gate semantics.

Per-user rate limiting

Bundle B / M-025. Authenticated callers are bucketed by API-key name; unauthenticated callers (probes, OCSP relying parties, EST/SCEP enrollees) are bucketed by source IP. RPS and BurstSize are per-key budgets. PerUserRPS / PerUserBurstSize give authenticated clients a separate budget when set non-zero.

API key rotation

Audit reference: L-004. CWE-924 (improper enforcement of message integrity during transmission in a communication channel) — operator UX variant.

certctl's API keys are configured via the CERTCTL_API_KEYS_NAMED env var (format name1:key1,name2:key2:admin) and parsed at startup into an in-memory list. There is no DB-resident key store, no GUI, no /api/v1/keys endpoint — the env var IS the key inventory.

Pre-Bundle-G the env var rejected duplicate names, so rotating a key required: stop accepting OLDKEY → restart → roll NEWKEY out. Any client polling against OLDKEY during the restart window hit a 401.

Bundle G adds a double-key rotation window: two entries can share a name during the rollover, and both keys validate. Operators run the rotation as:

  1. Generate the new key. openssl rand -hex 32 produces a 256-bit value with sufficient entropy.

  2. Append the new entry to CERTCTL_API_KEYS_NAMED alongside the existing one:

    CERTCTL_API_KEYS_NAMED="alice:OLDKEY:admin,alice:NEWKEY:admin"
    

    Both entries MUST carry the same admin flag — startup fails loud if they don't (a non-admin shouldn't share an identity with an admin).

  3. Restart certctl. A startup INFO log confirms the rotation window is active:

    INFO api-key rotation window active name=alice entries=2 see=docs/security.md::api-key-rotation
    
  4. Roll the new key out to all clients. Both keys validate during this phase. Audit-trail actor + per-user rate-limit bucket stay consistent across the rollover (both entries produce the same UserKey context value, the shared name).

  5. Remove the old entry from CERTCTL_API_KEYS_NAMED:

    CERTCTL_API_KEYS_NAMED="alice:NEWKEY:admin"
    
  6. Restart certctl. OLDKEY now fails with 401. Rotation complete.

The rotation window has no operator-set timeout — it lasts for as long as both entries are in the env var. Best practice is a 24-72h window covering a full deploy cadence; if a client hasn't rolled to NEWKEY by the end of step 4, extend the window before step 5.

What the contract guarantees

  • Two entries with the same name: allowed if both have the same admin flag.
  • Two entries with the same name but mismatched admin: rejected at startup (privilege escalation guard).
  • Two entries with the same (name, key) pair: rejected at startup (typo guard — rotation requires DIFFERENT keys under the same name).
  • Single-entry steady state: unchanged from pre-Bundle-G behavior.

What the contract does NOT do

  • No automatic expiration of OLDKEY. The operator removes the entry in step 5; certctl doesn't track timestamps. A future enhancement could add a rotated_at annotation if operators ask for it.
  • No GUI / API for key management. Keys are env-var only by design; building a key-management surface is a separate feature project.
  • No revocation list. If a key leaks, the only path is to remove it from the env var and restart. That's appropriate for a small env-var inventory; it would not scale to a per-user-key-issued model.

Reporting a vulnerability

Email certctl@proton.me. Coordinated disclosure preferred; we will acknowledge within 72h.