Eleven findings from the architecture diligence audit's Phase 2 bundle
closed in one PR. All touch the same backend config + Helm chart +
operator docs surface, so reviewing in one diff is the natural fit.
config.go: three new fail-closed Validate() branches behind sentinels
=====================================================================
Three new error sentinels exported from internal/config/config.go for
tests to pin via errors.Is + message-text:
- ErrAgentBootstrapTokenRequired (SEC-H1)
- ErrACMEInsecureWithoutAck (SEC-M4)
- ErrDemoModeAckExpired (SEC-H3)
SEC-H1 (staged): introduces CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY
as an opt-in feature flag. When true AND the bootstrap token is empty,
Validate() returns ErrAgentBootstrapTokenRequired and the server
refuses to start. Default in THIS release: false (warn-mode
pass-through preserved). WORKSPACE-ROADMAP.md schedules the default
flip to true for v2.2.0 — operators get one upgrade window.
SEC-M4: upgrades the existing boot-time WARN log for
CERTCTL_ACME_INSECURE=true into a hard refuse-to-start gate behind
CERTCTL_ACME_INSECURE_ACK=true. The ACK env var must be paired with
the existing INSECURE flag; either alone fails closed. The boot-time
WARN log at cmd/server/main.go:611 continues to fire for the ACK'd
case so every restart logs the reminder.
SEC-H3: tightens the sticky DemoModeAck bit so it expires after 24h.
When DemoModeAck=true, Validate() now requires CERTCTL_DEMO_MODE_ACK_TS
to be set as a unix-epoch timestamp within the last 24h (24h-tolerance
on the past side, 1-minute clock-skew on the future side). Catches the
"forgotten demo deployment promoted to production" failure mode —
next container restart past 24h refuses unless re-ack'd.
Tests in internal/config/config_test.go cover every new branch:
positive (passes when properly set), negative (each fail-closed path
fires with the matching sentinel + message-text). 11 new tests added.
Helm chart + HA runbook (DEPL-H1)
=================================
Created docs/operator/runbooks/ha.md documenting the three values
flips required for production HA: server.replicas, podDisruptionBudget,
service.sessionAffinity. Cross-link comments added to
deploy/helm/certctl/values.yaml next to the server.replicas (line 19)
and podDisruptionBudget (line 566) defaults. DEFAULTS DO NOT CHANGE
— that's the point per the prompt's 'do not flip networkPolicy default'
guidance: a default-enabled PDB blocks fresh helm install on
single-node clusters.
CI guard (DEPL-M2)
==================
scripts/ci-guards/no-change-me-in-prod-compose.sh grep-fails any
'change-me-' literal in compose files OTHER than docker-compose.demo.yml.
Catches the placeholder-credential-leak regression one layer earlier
than the runtime Validate() fail-closed guards from Bundle 2 (2026-05-12).
Excludes comment lines so docs explaining the pattern don't trip the
guard. Verified to fire on a synthetic leak; clean on the current tree.
Consolidated 'Security carve-outs' doc section
==============================================
docs/operator/security.md grows by one new section documenting the
eight existing carve-outs in one canonical place:
- SEC-M3: 3 InsecureSkipVerify=true sites (Agent dev, verify probe, tlsprobe)
- SEC-M5: F5 connector InsecureSkipVerify per-config field
- SEC-M4: ACME insecure + new ACK gate
- SEC-L1: CSP 'unsafe-inline' on style-src (Tailwind carve-out)
- SEC-L2: break-glass Argon2id rest-defense reminder
- SEC-L3: 1 MB body-size cap + CERTCTL_MAX_BODY_SIZE override
- DEPL-M2: change-me-* placeholder credentials in demo overlay
- DEPL-M3: K8s NetworkPolicy operator-opt-in default
Each entry cites the file:line, the rationale for the carve-out, and
the operator action.
CHANGELOG + ENVIRONMENTS coverage
==================================
CHANGELOG.md grows by one new '### Breaking changes (scheduled for
v2.2.0)' section under Unreleased, documenting SEC-H1 / SEC-M4 / SEC-H3
with explicit upgrade-window guidance for each.
deploy/ENVIRONMENTS.md adds five rows: AGENT_BOOTSTRAP_TOKEN +
AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY + DEMO_MODE_ACK + DEMO_MODE_ACK_TS +
ACME_INSECURE_ACK. G-3 env-docs-drift CI guard stays clean.
WORKSPACE-ROADMAP.md (cowork-side) schedules the SEC-H1 default-flip
for v2.2.0.
Sandbox limitation
==================
The certctl repo's working tree is 6.1 GB which fills the sandbox
volume; the go1.25.10 toolchain download (go.mod requires it,
sandbox has 1.25.9) keeps failing on disk-full. Local 'go build' /
'go test' were NOT run in this commit's verification path.
make verify MUST be run on the operator's workstation before push
per CLAUDE.md operating rules.
CI guards (no-change-me, G-3 env-docs-drift, doc-rot-detector, +
all existing) verified clean by running each individually.
Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H1,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-H3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M4,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H1,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M2,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M5,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L1,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L2,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L3
certctl Security Posture & Operator Guidance
Last reviewed: 2026-05-11
This document collects the operator-facing security guidance that the source code's per-finding comment blocks reference. Each section names the audit finding it closes, the threat model, and the operator action required (if any).
OCSP responder availability
Audit reference: CWE-770 (allocation of resources without limits or throttling); RFC 6960 (OCSP); RFC 7633 (Must-Staple).
certctl ships an OCSP responder at /.well-known/pki/ocsp/{issuer_id}/{serial}
that signs a fresh response per request. The unauth handler chain
applies the same per-key rate limiter the authenticated chain uses;
per-IP keying applies because OCSP traffic is unauthenticated. Without
this defense an attacker could DoS the responder and force fail-open
relying parties to accept revoked certificates as valid.
The rate limiter alone does not solve the underlying revocation-bypass risk. The architectural fix is for issued certificates to carry the OCSP Must-Staple TLS Feature extension (RFC 7633, OID 1.3.6.1.5.5.7.1.24). When present, conforming TLS clients refuse to negotiate a session unless the server staples a fresh signed OCSP response in the TLS handshake. This shifts revocation enforcement from the client's discretion (most clients fail open by default) to a hard requirement: the connection cannot complete without proof of non-revocation.
Operator action
For certificates issued to systems where revocation correctness matters:
- Configure the issuer profile to set must-staple: true. Out-of-the-box profiles in migrations/seed.sql do not set this; operators add it at profile-creation time via the API or by editing seed data.
- Confirm the relying party honors the extension. OpenSSL ≥ 1.1.0, Firefox, and Chrome 84+ all enforce Must-Staple. Older clients silently ignore it.
- Confirm the deployment target is configured for OCSP stapling so the server can actually deliver the stapled response in the handshake.
  - nginx: ssl_stapling on; ssl_stapling_verify on;
  - Apache: SSLUseStapling on
  - HAProxy: set ssl ocsp-response /path/to/response.der
  - Envoy: ocsp_staple_policy: must_staple
What this does NOT cover
- CRL fallback. Must-Staple does not affect CRL behavior. Operators with CRL-based relying parties should use the rate-limit + caching defense alone; there is no client-side equivalent to Must-Staple for CRLs.
- Self-issued certs in air-gapped networks. When the relying party cannot reach the OCSP responder at all (the threat model the audit cited), Must-Staple is the only mechanism that closes the bypass. CRL distribution similarly requires the relying party to fetch the CRL, which is also subject to the same network-availability concern.
Postgres transport encryption
See docs/database-tls.md.
Encryption at rest
PBKDF2-SHA256 at 600,000 rounds (OWASP 2024 Password Storage Cheat Sheet floor) for the operator-supplied passphrase that derives the AES-256-GCM key for sensitive config columns. v3 blob format with a per-ciphertext random salt; v1/v2 read fallback for legacy rows. See internal/crypto/encryption.go and the accompanying tests for the format spec.
Authentication surface
Two layers decide auth-exempt status:
- Router layer: internal/api/router/router.go::AuthExemptRouterRoutes, the endpoints registered via direct r.mux.Handle without going through the middleware chain (/health, /ready, /api/v1/auth/info, /api/v1/version, plus /api/v1/auth/bootstrap GET + POST for the first-admin path).
- Dispatch layer: internal/api/router/router.go::AuthExemptDispatchPrefixes, URL-prefix routing in cmd/server/main.go::buildFinalHandler for /.well-known/pki/*, /.well-known/est/*, /.well-known/est-mtls, and /scep[/...]* (incl. /scep-mtls).
Both lists have AST-walking regression tests (auth_exempt_test.go) that
fail CI if a new bypass lands without updating the documented constant.
Role-based authorization
Role-based authorization runs on top of API-key authentication. Every
gated handler routes through the auth.RequirePermission middleware
(or its router-level wrap rbacGate); the middleware resolves the
actor's effective permissions via the service-layer
Authorizer.CheckPermission and returns HTTP 403 BEFORE the handler
body runs on miss. The seven default roles (admin / operator /
viewer / agent / mcp / cli / auditor), 33-permission
canonical catalogue, and the auditor split (r-auditor holds only
audit.read + audit.export) are seeded by migration 000029.
For the operator how-to, see rbac.md. For the
threat model + compliance mapping, see
auth-threat-model.md. For the upgrade
flow from an API-key-only deployment, see
docs/migration/api-keys-to-rbac.md.
Day-0 admin bootstrap
Fresh deployments where no admin actor exists yet can mint the
first admin via POST /api/v1/auth/bootstrap: set
CERTCTL_BOOTSTRAP_TOKEN, send a single POST request carrying the token, and
the server returns the plaintext key value once. The token is
constant-time-compared; the strategy is one-shot via mutex; the
admin-existence probe re-closes the path once an admin lands.
The token is NEVER logged. The minted plaintext key flows only
into the HTTP response body. See
rbac.md for the
full flow.
Approval-bypass closure
CertificateProfile.RequiresApproval=true profiles route both
issuance/renewal AND profile edits through the
ApprovalService two-person integrity gate. The flip-flop loophole
(an admin disabling approval, mutating, re-enabling) is closed by
gating profile-edit through the same approval flow. Same-actor
self-approve is rejected at the service layer with
ErrApproveBySameActor. See
docs/reference/profiles.md for the
full gate semantics.
OIDC federation
OIDC SSO runs on top of the API-key + RBAC foundation. Operators configure one or more identity providers (Keycloak, Authentik, Okta, Auth0, Entra ID, or Google Workspace via Keycloak broker); end users sign in at the IdP, certctl validates the returned ID token, and a session cookie is minted.
The token-validation pipeline pins:
- Algorithm allow-list: RS256 / RS512 / ES256 / ES384 / EdDSA only. HS256 / HS384 / HS512 / none are rejected at the service-layer sentinel level.
- IdP-downgrade-attack defense at provider creation AND every RefreshKeys: the IdP's advertised id_token_signing_alg_values_supported is intersected with the allow-list; a provider that advertises HS-family is rejected before any token is signed under the weak alg.
- Exact iss match (ErrIssuerMismatch).
- aud membership + azp for multi-aud tokens (per OIDC core §3.1.3.7 step 5).
- at_hash REQUIRED when access_token is present (a tightening of the spec MAY → MUST so a substituted access token cannot ride alongside a clean ID token).
- Single-use state + nonce (32-byte random server-generated; atomic DELETE...RETURNING on consume).
- PKCE-S256 mandatory; plain rejected.
- Configurable iat window (default 300s, capped 600s).
- JWKS cache with operator-triggered RefreshKeys + auto-refresh on TTL expiry (default 3600s); JWKS-fetch failure during a key rotation returns 503 to the in-flight login (existing sessions untouched).
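The allow-list intersection and HS-family rejection from the list above could look roughly like this. It is a sketch under stated assumptions: usableAlgs and both maps are hypothetical names, and the real service may surface sentinels rather than formatted errors.

```go
package main

import "fmt"

// allowedAlgs mirrors the allow-list stated in the text.
var allowedAlgs = map[string]bool{
	"RS256": true, "RS512": true, "ES256": true, "ES384": true, "EdDSA": true,
}

// rejectedAlgs are the algorithms whose mere advertisement rejects a provider.
var rejectedAlgs = map[string]bool{
	"HS256": true, "HS384": true, "HS512": true, "none": true,
}

// usableAlgs (hypothetical helper) intersects the IdP's advertised
// id_token_signing_alg_values_supported with the allow-list. It fails if the
// provider advertises a rejected algorithm or if the intersection is empty.
func usableAlgs(advertised []string) ([]string, error) {
	var usable []string
	for _, alg := range advertised {
		if rejectedAlgs[alg] {
			return nil, fmt.Errorf("provider advertises rejected alg %q", alg)
		}
		if allowedAlgs[alg] {
			usable = append(usable, alg)
		}
	}
	if len(usable) == 0 {
		return nil, fmt.Errorf("no advertised alg is on the allow-list")
	}
	return usable, nil
}
```

Algorithms that are neither allowed nor explicitly rejected (PS256, for example) are simply dropped from the intersection in this sketch; whether certctl treats them the same way is not stated in this document.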
OIDC client_secret is encrypted at rest via AES-256-GCM (v3 blob
format: magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using
the CERTCTL_CONFIG_ENCRYPTION_KEY passphrase. The encryption
invariant is pinned by an integration test
(internal/repository/postgres/oidc_encryption_invariant_test.go)
that asserts ciphertext != plaintext + correct blob shape +
round-trip recovery + wrong-passphrase fails.
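For orientation, the v3 layout above can be split with plain slicing. This is an illustrative sketch, not the code in internal/crypto/encryption.go: parseV3Blob and the v3Blob type are hypothetical, and the 16-byte tag length is an assumption based on the standard AES-GCM tag size (the text only says "ciphertext+tag").

```go
package main

import "errors"

const (
	v3Magic   = 0x03
	saltLen   = 16
	nonceLen  = 12
	gcmTagLen = 16 // assumption: standard GCM tag size
)

// v3Blob holds the three variable parts of a v3 blob.
type v3Blob struct {
	Salt   []byte // per-ciphertext random salt for key derivation
	Nonce  []byte // AES-GCM nonce
	Sealed []byte // ciphertext followed by the GCM tag
}

// parseV3Blob (hypothetical helper) checks the magic byte and minimum length,
// then slices the blob into its fields without copying.
func parseV3Blob(b []byte) (v3Blob, error) {
	if len(b) < 1+saltLen+nonceLen+gcmTagLen {
		return v3Blob{}, errors.New("blob too short for v3 layout")
	}
	if b[0] != v3Magic {
		return v3Blob{}, errors.New("not a v3 blob")
	}
	saltEnd := 1 + saltLen
	nonceEnd := saltEnd + nonceLen
	return v3Blob{Salt: b[1:saltEnd], Nonce: b[saltEnd:nonceEnd], Sealed: b[nonceEnd:]}, nil
}
```

A "correct blob shape" assertion of the kind the invariant test describes reduces to the same magic-byte and length checks.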
Per-IdP setup guides at
oidc-runbooks/index.md cover Keycloak,
Authentik, Okta, Auth0, Entra ID, and Google Workspace.
Sessions + back-channel logout
Successful OIDC login mints a session cookie:
v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>.
The HMAC input is length-prefixed as len:sid:len:kid to defeat
concatenation-collision attacks on bare-concat designs. Cookie
attributes:
- HttpOnly=true (no JS access; defends XSS cookie theft).
- Secure=true (HTTPS-only; defends network MITM).
- SameSite=Lax default (configurable to Strict via CERTCTL_SESSION_SAMESITE).
- Path=/, host-only.
Idle timeout default 1h; absolute timeout default 8h; both
configurable via CERTCTL_SESSION_IDLE_TIMEOUT and
CERTCTL_SESSION_ABSOLUTE_TIMEOUT. The scheduler's
sessionGCLoop (default 1h interval) sweeps expired rows.
CSRF defense: plaintext CSRF token in the JS-readable
certctl_csrf cookie (intentionally HttpOnly=false for the GUI
to echo into the X-CSRF-Token header); SHA-256 hash on the
session row; subtle.ConstantTimeCompare in CSRFMiddleware.
API-key actors are CSRF-exempt (no session row in context).
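A minimal sketch of that comparison, assuming the session row stores the raw 32-byte SHA-256 digest. hashCSRFToken and csrfTokenMatches are hypothetical names; the real CSRFMiddleware also handles the API-key exemption and header extraction.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
)

// hashCSRFToken computes the digest that would be stored on the session row.
func hashCSRFToken(token string) [32]byte {
	return sha256.Sum256([]byte(token))
}

// csrfTokenMatches hashes the value echoed in the X-CSRF-Token header and
// compares it to the stored digest in constant time.
func csrfTokenMatches(headerToken string, storedHash [32]byte) bool {
	got := sha256.Sum256([]byte(headerToken))
	return subtle.ConstantTimeCompare(got[:], storedHash[:]) == 1
}
```

Storing only the hash means a read of the session table does not reveal a usable CSRF token.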
Session signing keys rotate via RotateSigningKey; the old key
stays valid for CERTCTL_SESSION_SIGNING_KEY_RETENTION (default
24h) so existing cookies validate during rollover. Past retention,
the old key's row is dropped and any cookie still signed under it
returns ErrSigningKeyNotFound. EnsureInitialSigningKey is
fail-fatal at server boot.
Back-channel logout per OpenID Connect Back-Channel Logout 1.0
(NOT RFC 8414): POST /auth/oidc/back-channel-logout accepts a
JWT-signed logout token from the IdP, validates the JWT against
the IdP's JWKS (same alg allow-list as login), pins required
claims (iss / aud / iat / jti / events; at least one of
sub / sid; nonce MUST be absent), defeats replay via
jti-based deduplication, and revokes matching sessions.
For threat-model coverage of these surfaces, see
auth-threat-model.md. For the
operator-runnable performance baselines, see
auth-benchmarks.md.
OIDC first-admin bootstrap
Coexists with the env-var-token bootstrap path. When the
operator sets CERTCTL_BOOTSTRAP_ADMIN_GROUPS + (optionally)
CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID, the first user with one of
those IdP groups becomes admin on first login per tenant.
Subsequent users go through normal mapping. The admin-existence
probe ensures only one wins between the two bootstrap paths;
once any actor holds r-admin, the OIDC bootstrap hook silently
falls through to normal mapping. Audit row on every grant
(bootstrap.oidc_first_admin, event_category=auth).
Break-glass admin
Default-OFF (CERTCTL_BREAKGLASS_ENABLED=false). When enabled,
the local-password admin path bypasses OIDC + group-claim layers;
intended ONLY for SSO-broken incidents.
- Argon2id with OWASP 2024 params (m=64 MiB, t=3, p=4, 16-byte salt, 32-byte output, per-password random salt, PHC-format hash). Hash column is json:"-" so handlers cannot wire-leak.
- Lockout state machine: 5 failures (default; configurable via CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD) within 1h reset window (_LOCKOUT_RESET_INTERVAL) trips a 30s lockout (_LOCKOUT_DURATION). Atomic single-statement IncrementFailure defeats concurrent racing attempts.
- Constant-time across all failure paths via verifyDummy(): wrong-password / locked-account / no-actor all take statistically indistinguishable time.
- Surface invisibility: when disabled, ALL four endpoints return HTTP 404 (NOT 403). Scanners cannot distinguish "endpoint disabled" from "endpoint doesn't exist".
- WARN log at server boot when ENABLED=true; audit row on every break-glass login (auth.breakglass_login_*, event_category=auth); WebAuthn/FIDO2 second factor pairing on the v3 roadmap (Decision 12).
Operator should DISABLE break-glass within 24h of SSO recovery
to avoid a permanent backdoor; the runbook at
auth-threat-model.md#break-glass-risks-phase-75
documents the full state machine.
Demo-to-production cutover (Audit 2026-05-11 A-8)
Migration 000029_rbac.up.sql unconditionally seeds an
actor-demo-anon → r-admin row into actor_roles. This row is the
runtime principal injected by the demo-mode middleware when
CERTCTL_AUTH_TYPE=none. Under any non-none auth type the row is
DORMANT — the middleware chain never resolves to it. But its existence
is a footgun: a future regression that resolves an unauthenticated
request to actor-demo-anon (a misrouted CORS preflight, a fallback in
a new auth-exempt route) would silently re-elevate to admin.
certctl-server detects this residue at startup and emits a WARN log +
an auth.demo_residual_grants_detected audit row listing every grant
present on actor-demo-anon. Every production deploy will see this
WARN on first boot — the migration baseline is part of the install,
not a side effect of running demo mode.
Operator workflow at production cutover:
- Drain the WARN by calling the cleanup endpoint with an admin API key:

      curl -X POST --cacert deploy/test/certs/ca.crt \
        -H "Authorization: Bearer $ADMIN_KEY" \
        https://certctl.example.com:8443/api/v1/auth/demo-residual/cleanup
      # → {"removed": 1}

  The endpoint is gated auth.role.assign (admin-class) and refuses to run when CERTCTL_AUTH_TYPE=none (HTTP 503; the residue IS the active runtime state at that auth type). The cleanup is idempotent; a second call returns {"removed": 0} and still leaves an audit row.

  Equivalent SQL for operators preferring direct DB access:

      DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon';

- To make subsequent boots refuse startup if the row reappears (the most paranoid stance), set:

      CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true

  With the flag set, any actor-demo-anon row under a non-none auth type causes certctl-server to log the WARN AND exit non-zero before binding the HTTPS listener. Default is false (WARN only).

- The CI guard scripts/ci-guards/no-new-synthetic-admin.sh pins the set of source files that may reference the actor-demo-anon literal. New runtime code paths that resolve to the synthetic actor are rejected at PR time so the credibility gap stays closed.
Migrating an existing deployment to OIDC
An existing API-key-only deployment that wants to add OIDC follows
the step-by-step at
docs/migration/oidc-enable.md:
configure CERTCTL_CONFIG_ENCRYPTION_KEY, pick + configure an IdP
per the relevant runbook, configure the certctl-side OIDCProvider
+ group→role mappings, verify the login flow against a single test user, then announce the SSO endpoint to the rest of the organization.
Per-user rate limiting
Authenticated callers are bucketed by API-key name;
unauthenticated callers (probes, OCSP relying parties, EST/SCEP enrollees)
are bucketed by source IP. RPS and BurstSize are per-key budgets.
PerUserRPS / PerUserBurstSize give authenticated clients a separate
budget when set non-zero.
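The bucketing rule can be sketched as a small key function. rateLimitKey and the "user:" / "ip:" prefixes are hypothetical; the text only states that authenticated callers key on the API-key name and everyone else on source IP.

```go
package main

import "net"

// rateLimitKey (hypothetical helper) picks the limiter bucket for a request.
// apiKeyName is empty for unauthenticated callers; remoteAddr is the
// "host:port" form found in http.Request.RemoteAddr.
func rateLimitKey(apiKeyName, remoteAddr string) string {
	if apiKeyName != "" {
		return "user:" + apiKeyName
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		// No port present: fall back to the raw address.
		host = remoteAddr
	}
	return "ip:" + host
}
```

Stripping the port matters: without it, every new connection from the same client would land in a fresh bucket.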
API key rotation
Audit reference: L-004. CWE-924 (improper enforcement of message integrity during transmission in a communication channel) - operator UX variant.
certctl's API keys are configured via the CERTCTL_API_KEYS_NAMED env var
(format name1:key1,name2:key2:admin) and parsed at startup into an
in-memory list. There is no DB-resident key store, no GUI, no /api/v1/keys
endpoint - the env var IS the key inventory.
The env var supports a double-key rotation window: two entries can share a name during the rollover, and both keys validate. Operators run the rotation as:
1. Generate the new key. openssl rand -hex 32 produces a 256-bit value with sufficient entropy.

2. Append the new entry to CERTCTL_API_KEYS_NAMED alongside the existing one:

       CERTCTL_API_KEYS_NAMED="alice:OLDKEY:admin,alice:NEWKEY:admin"

   Both entries MUST carry the same admin flag - startup fails loud if they don't (a non-admin shouldn't share an identity with an admin).

3. Restart certctl. A startup INFO log confirms the rotation window is active:

       INFO api-key rotation window active name=alice entries=2 see=docs/security.md::api-key-rotation

4. Roll the new key out to all clients. Both keys validate during this phase. Audit-trail actor + per-user rate-limit bucket stay consistent across the rollover (both entries produce the same UserKey context value, the shared name).

5. Remove the old entry from CERTCTL_API_KEYS_NAMED:

       CERTCTL_API_KEYS_NAMED="alice:NEWKEY:admin"

6. Restart certctl. OLDKEY now fails with 401. Rotation complete.
The rotation window has no operator-set timeout - it lasts for as long as both entries are in the env var. Best practice is a 24-72h window covering a full deploy cadence; if a client hasn't rolled to NEWKEY by the end of step 4, extend the window before step 5.
What the contract guarantees
- Two entries with the same name: allowed if both have the same admin flag.
- Two entries with the same name but mismatched admin: rejected at startup (privilege escalation guard).
- Two entries with the same (name, key) pair: rejected at startup (typo guard - rotation requires DIFFERENT keys under the same name).
- Single-entry steady state: the simple legacy behaviour.
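The guarantees above translate into a short startup-time parser. This is a sketch rather than certctl's actual parser: parseNamedKeys and namedKey are hypothetical names, and the handling of malformed entries is an assumption. Error messages deliberately avoid echoing key material.

```go
package main

import (
	"fmt"
	"strings"
)

// namedKey is one parsed entry of CERTCTL_API_KEYS_NAMED.
type namedKey struct {
	Name  string
	Key   string
	Admin bool
}

// parseNamedKeys (hypothetical helper) parses "name:key[:admin]" entries and
// enforces the rotation contract: same-name entries must agree on the admin
// flag, and an identical (name, key) pair may not appear twice.
func parseNamedKeys(raw string) ([]namedKey, error) {
	var keys []namedKey
	adminByName := map[string]bool{}
	seenPair := map[[2]string]bool{}
	for i, entry := range strings.Split(raw, ",") {
		parts := strings.Split(strings.TrimSpace(entry), ":")
		if len(parts) < 2 || len(parts) > 3 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("entry %d is malformed", i)
		}
		k := namedKey{Name: parts[0], Key: parts[1]}
		if len(parts) == 3 {
			if parts[2] != "admin" {
				return nil, fmt.Errorf("entry %d has an unknown flag", i)
			}
			k.Admin = true
		}
		if prev, ok := adminByName[k.Name]; ok && prev != k.Admin {
			return nil, fmt.Errorf("entries for %q disagree on the admin flag", k.Name)
		}
		pair := [2]string{k.Name, k.Key}
		if seenPair[pair] {
			return nil, fmt.Errorf("duplicate (name, key) entry for %q", k.Name)
		}
		adminByName[k.Name] = k.Admin
		seenPair[pair] = true
		keys = append(keys, k)
	}
	return keys, nil
}
```

Because both rotation entries share a Name, anything keyed on the name (audit actor, rate-limit bucket) stays stable across the rollover.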
What the contract does NOT do
- No automatic expiration of OLDKEY. The operator removes the entry in step 5; certctl doesn't track timestamps. A future enhancement could add a rotated_at annotation if operators ask for it.
- No GUI / API for key management. Keys are env-var only by design; building a key-management surface is a separate feature project.
- No revocation list. If a key leaks, the only path is to remove it from the env var and restart. That's appropriate for a small env-var inventory; it would not scale to a per-user-key-issued model.
Security carve-outs & operator-tunable defaults
Phase 2 of the architecture diligence remediation (2026-05-13) consolidated the following carve-outs into one canonical section so operators reviewing security posture have a single search target. Each entry cites the exact file:line of the carve-out, why it exists, and what the operator should do.
TLS verification — dev escape hatches
certctl has three InsecureSkipVerify=true sites that are dev/probe
escape hatches, never enabled by default in production:
- Agent dev escape: cmd/agent/main.go:179 (wired from cmd/agent/main.go:61 config field + cmd/agent/main.go:1371 CLI flag). Operators flip this only when debugging an agent against a self-signed control plane that hasn't been added to the agent's trust store. Document as --insecure-skip-verify in the agent's install runbook; the agent logs a startup WARN any time the flag is set. SEC-M3 pins that the carve-out is intentional.
- Agent verification probe: cmd/agent/verify.go:78. The probe intentionally opens a TLS connection with verification disabled so it can inspect any certificate the endpoint serves (including self-signed or expired ones; that is the whole point of a probe). The probe never returns trust state to a security-relevant code path; it only reads cert metadata. SEC-M3 pins this.
- tlsprobe (network scanner): internal/tlsprobe/probe.go:54. Same rationale as the agent verify probe: network discovery must introspect any certificate it finds, including the ones with the problems we're scanning for. SEC-M3 pins this.
F5 target connector — InsecureSkipVerify per-config
The F5 target connector exposes an Insecure: bool field on its
per-target config blob (default false). When set,
internal/connector/target/f5/f5.go:134 builds the HTTP client with
InsecureSkipVerify: config.Insecure. SEC-M5 closure: operator
opt-in for self-signed F5 BIG-IP device certs; mitigation is to run
the F5 + the proxy-agent on a network-segmented internal subnet.
Document in the F5 connector's per-target setup guide.
ACME issuer — CERTCTL_ACME_INSECURE (now gated on ACK)
internal/connector/issuer/acme/acme.go:201 builds the ACME HTTP
client with InsecureSkipVerify: true for the Pebble integration
test path. The per-issuer runtime setting comes from
CERTCTL_ACME_INSECURE (internal/config/config.go:2116); Phase 2
SEC-M4 closure (2026-05-13) added the fail-closed gate so the operator
must ALSO set CERTCTL_ACME_INSECURE_ACK=true for the server to boot.
Production deploys must never set either flag. The boot-time WARN log
at cmd/server/main.go:611 continues to fire for the ACK'd case so
every restart logs the reminder.
CSP 'unsafe-inline' on style-src
internal/api/middleware/securityheaders.go:58 ships the dashboard
CSP with style-src 'self' 'unsafe-inline'. This is required because
Tailwind compiles utility classes into a single stylesheet at build
time, but inline-style attributes appear in the dashboard via inline
<svg> elements + Recharts' <ResponsiveContainer> injecting inline
width/height. SEC-L1 closure: the carve-out is necessary today; the
planned tightening flow is the frontend audit's FE-H2 (icon library)
+ decorative-SVG sweep that then unlocks the CSP hardening (drops
'unsafe-inline').
Break-glass admin — Argon2id rest-defense reminder
The break-glass admin path (docs/operator/runbooks/disaster-recovery.md)
hashes the operator-supplied password with Argon2id and stores the
hash in the breakglass_credentials table. SEC-L2 reminder: the
strength of the rest-defense is operator-supplied — pick a password
with sufficient entropy (≥ 64 random bits via openssl rand -base64 12) and rotate after every use. Argon2id resists offline cracking
but an operator-supplied "Password123" hashes the same way.
Body-size limit (1 MB default) — operator-tunable
The http.MaxBytesReader wrap caps inbound request bodies at 1 MB
by default. The cap is a necessary defense against unbounded-body DoS
but also blocks legitimate operator workflows:
- Bulk truststore PEM bundle uploads (CA bundles for federated trust stores can be > 1 MB).
- Multi-MB CRL pushes via the CRL-cache endpoint.
- Bulk-import of certificates with embedded chains.
SEC-L3 closure: operators raise the cap via CERTCTL_MAX_BODY_SIZE
(units: bytes; e.g. CERTCTL_MAX_BODY_SIZE=10485760 for 10 MB).
Document in deploy/ENVIRONMENTS.md.
Demo Compose placeholder credentials
deploy/docker-compose.demo.yml ships CERTCTL_AUTH_SECRET=change-me-in-production,
CERTCTL_CONFIG_ENCRYPTION_KEY=change-me-32-char-encryption-key, and
CERTCTL_API_KEY=change-me-in-production as documented demo
defaults. The runtime Validate() fail-closed guards
(internal/config/config.go::Validate, Bundle 2 2026-05-12) refuse
to start if those literal strings reach a non-demo config. Phase 2
DEPL-M2 closure adds a CI guard
(scripts/ci-guards/no-change-me-in-prod-compose.sh) that fails the
build at PR time if a change-me-* literal leaks into a non-demo
compose file — catching the regression one layer before the runtime
guard fires.
Kubernetes NetworkPolicy — operator-opt-in
deploy/helm/certctl/templates/networkpolicy.yaml ships the template
but deploy/helm/certctl/values.yaml defaults networkPolicy.enabled: false. DEPL-M3 rationale: most Kubernetes clusters don't have a
NetworkPolicy controller installed (kind / minikube / fresh k3s); a
default-enabled NetworkPolicy renders fine but produces no
enforcement, and bare-metal kube-router-style controllers may
interpret a permissive default differently. Production deploys with a
real NetworkPolicy controller (Calico, Cilium, Antrea) flip the
values key to true and tune the policy in their values overlay.
Document the production-enable in
docs/operator/runbooks/ha.md (added Phase 2 DEPL-H1).
Reporting a vulnerability
Email certctl@proton.me. Coordinated disclosure preferred; we will
acknowledge within 72h.