Files
certctl/docs/operator/security.md
T
shankar0123 69a2b5c55a config: default hardening + operator docs (Phase 2 closure — SEC-H1, SEC-H3, SEC-M4, DEPL-H1, DEPL-M2 + doc-only carve-outs)
Eleven findings from the architecture diligence audit's Phase 2 bundle
closed in one PR. All touch the same backend config + Helm chart +
operator docs surface, so reviewing in one diff is the natural fit.

config.go: three new fail-closed Validate() branches behind sentinels
=====================================================================

Three new error sentinels exported from internal/config/config.go for
tests to pin via errors.Is + message-text:
  - ErrAgentBootstrapTokenRequired (SEC-H1)
  - ErrACMEInsecureWithoutAck      (SEC-M4)
  - ErrDemoModeAckExpired          (SEC-H3)

SEC-H1 (staged): introduces CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY
as an opt-in feature flag. When true AND the bootstrap token is empty,
Validate() returns ErrAgentBootstrapTokenRequired and the server
refuses to start. Default in THIS release: false (warn-mode
pass-through preserved). WORKSPACE-ROADMAP.md schedules the default
flip to true for v2.2.0 — operators get one upgrade window.

SEC-M4: upgrades the existing boot-time WARN log for
CERTCTL_ACME_INSECURE=true into a hard refuse-to-start gate behind
CERTCTL_ACME_INSECURE_ACK=true. The ACK env var must be paired with
the existing INSECURE flag; either alone fails closed. The boot-time
WARN log at cmd/server/main.go:611 continues to fire for the ACK'd
case so every restart logs the reminder.

SEC-H3: tightens the sticky DemoModeAck bit so it expires after 24h.
When DemoModeAck=true, Validate() now requires CERTCTL_DEMO_MODE_ACK_TS
to be set as a unix-epoch timestamp within the last 24h (24h-tolerance
on the past side, 1-minute clock-skew on the future side). Catches the
"forgotten demo deployment promoted to production" failure mode —
next container restart past 24h refuses unless re-ack'd.

Tests in internal/config/config_test.go cover every new branch:
positive (passes when properly set), negative (each fail-closed path
fires with the matching sentinel + message-text). 11 new tests added.

Helm chart + HA runbook (DEPL-H1)
=================================

Created docs/operator/runbooks/ha.md documenting the three values
flips required for production HA: server.replicas, podDisruptionBudget,
service.sessionAffinity. Cross-link comments added to
deploy/helm/certctl/values.yaml next to the server.replicas (line 19)
and podDisruptionBudget (line 566) defaults. DEFAULTS DO NOT CHANGE
— that's the point per the prompt's 'do not flip networkPolicy default'
guidance: a default-enabled PDB blocks fresh helm install on
single-node clusters.

CI guard (DEPL-M2)
==================

scripts/ci-guards/no-change-me-in-prod-compose.sh grep-fails any
'change-me-' literal in compose files OTHER than docker-compose.demo.yml.
Catches the placeholder-credential-leak regression one layer earlier
than the runtime Validate() fail-closed guards from Bundle 2 (2026-05-12).
Excludes comment lines so docs explaining the pattern don't trip the
guard. Verified to fire on a synthetic leak; clean on the current tree.

Consolidated 'Security carve-outs' doc section
==============================================

docs/operator/security.md grows by one new section documenting the
seven existing carve-outs in one canonical place:
  - SEC-M3: 3 InsecureSkipVerify=true sites (Agent dev, verify probe, tlsprobe)
  - SEC-M5: F5 connector InsecureSkipVerify per-config field
  - SEC-M4: ACME insecure + new ACK gate
  - SEC-L1: CSP 'unsafe-inline' on style-src (Tailwind carve-out)
  - SEC-L2: break-glass Argon2id rest-defense reminder
  - SEC-L3: 1 MB body-size cap + CERTCTL_MAX_BODY_SIZE override
  - DEPL-M2: change-me-* placeholder credentials in demo overlay
  - DEPL-M3: K8s NetworkPolicy operator-opt-in default

Each entry cites the file:line, the rationale for the carve-out, and
the operator action.

CHANGELOG + ENVIRONMENTS coverage
==================================

CHANGELOG.md grows by one new '### Breaking changes (scheduled for
v2.2.0)' section under Unreleased, documenting SEC-H1 / SEC-M4 / SEC-H3
with explicit upgrade-window guidance for each.

deploy/ENVIRONMENTS.md adds five rows: AGENT_BOOTSTRAP_TOKEN +
AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY + DEMO_MODE_ACK + DEMO_MODE_ACK_TS +
ACME_INSECURE_ACK. G-3 env-docs-drift CI guard stays clean.

WORKSPACE-ROADMAP.md (cowork-side) schedules the SEC-H1 default-flip
for v2.2.0.

Sandbox limitation
==================

The certctl repo's working tree is 6.1 GB which fills the sandbox
volume; the go1.25.10 toolchain download (go.mod requires it,
sandbox has 1.25.9) keeps failing on disk-full. Local 'go build' /
'go test' were NOT run in this commit's verification path.
make verify MUST be run on the operator's workstation before push
per CLAUDE.md operating rules.

CI guards (no-change-me, G-3 env-docs-drift, doc-rot-detector, +
all existing) verified clean by running each individually.

Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H1,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-H3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M4,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H1,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M2,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M5,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L1,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L2,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L3
2026-05-13 19:50:00 +00:00

528 lines
24 KiB
Markdown

# certctl Security Posture & Operator Guidance
> Last reviewed: 2026-05-11
This document collects the operator-facing security guidance that the source
code's per-finding comment blocks reference. Each section names the audit
finding it closes, the threat model, and the operator action required (if
any).
## OCSP responder availability
**Audit reference:** CWE-770 (uncontrolled resource consumption); RFC
6960 (OCSP); RFC 7633 (Must-Staple).
certctl ships an OCSP responder at `/.well-known/pki/ocsp/{issuer_id}/{serial}`
that signs a fresh response per request. The unauth handler chain
applies the same per-key rate limiter the authenticated chain uses;
per-IP keying applies because OCSP traffic is unauthenticated. Without
this defense an attacker could DoS the responder and force fail-open
relying parties to accept revoked certificates as valid.
The rate limiter alone does not solve the underlying revocation-bypass risk.
**The architectural fix is for issued certificates to carry the OCSP
Must-Staple TLS Feature extension** (RFC 7633, OID 1.3.6.1.5.5.7.1.24). When
present, conforming TLS clients refuse to negotiate a session unless the
server staples a fresh signed OCSP response in the TLS handshake. This shifts
revocation enforcement from the client's discretion (which most fail-open by
default) to a hard requirement that the connection cannot complete without
proof of non-revocation.
### Operator action
For certificates issued to systems where revocation correctness matters:
1. **Configure the issuer profile to set `must-staple: true`.** Out-of-the-box
profiles in `migrations/seed.sql` do not set this; operators add it at
profile-creation time via the API or by editing seed data.
2. **Confirm the relying party honors the extension.** OpenSSL ≥ 1.1.0,
Firefox, and Chrome 84+ all enforce Must-Staple. Older clients silently
ignore it.
3. **Confirm the deployment target is configured for OCSP stapling** so the
server can actually deliver the stapled response in the handshake.
- **nginx:** `ssl_stapling on; ssl_stapling_verify on;`
- **Apache:** `SSLUseStapling on`
- **HAProxy:** `set ssl ocsp-response /path/to/response.der`
- **Envoy:** `ocsp_staple_policy: must_staple`
### What this does NOT cover
- **CRL fallback.** Must-Staple does not affect CRL behavior. Operators with
CRL-based relying parties should use the rate-limit + caching defense
alone; there is no client-side equivalent to Must-Staple for CRLs.
- **Self-issued certs in air-gapped networks.** When the relying party
cannot reach the OCSP responder at all (the threat model the audit
cited), Must-Staple is the only mechanism that closes the bypass. CRL
distribution similarly requires the relying party to fetch the CRL,
which is also subject to the same network-availability concern.
## Postgres transport encryption
See [docs/database-tls.md](database-tls.md).
## Encryption at rest
PBKDF2-SHA256 at 600,000 rounds (OWASP 2024 Password
Storage Cheat Sheet floor) for the operator-supplied passphrase that
derives the AES-256-GCM key for sensitive config columns. v3 blob format
with a per-ciphertext random salt; v1/v2 read fallback for legacy rows.
See [internal/crypto/encryption.go](../../internal/crypto/encryption.go) and
the accompanying tests for the format spec.
## Authentication surface
Two layers decide auth-exempt status:
1. **Router layer:** `internal/api/router/router.go::AuthExemptRouterRoutes`
- the endpoints registered via direct `r.mux.Handle` without going
through the middleware chain (`/health`, `/ready`, `/api/v1/auth/info`,
`/api/v1/version`, plus `/api/v1/auth/bootstrap` GET + POST for the
first-admin path).
2. **Dispatch layer:** `internal/api/router/router.go::AuthExemptDispatchPrefixes`
- URL-prefix routing in `cmd/server/main.go::buildFinalHandler` for
`/.well-known/pki/*`, `/.well-known/est/*`, `/.well-known/est-mtls`,
and `/scep[/...]*` (incl. `/scep-mtls`).
Both lists have AST-walking regression tests (`auth_exempt_test.go`) that
fail CI if a new bypass lands without updating the documented constant.
### Role-based authorization
Role-based authorization runs on top of API-key authentication. Every
gated handler routes through the `auth.RequirePermission` middleware
(or its router-level wrap `rbacGate`); the middleware resolves the
actor's effective permissions via the service-layer
`Authorizer.CheckPermission` and returns HTTP 403 BEFORE the handler
body runs on miss. The seven default roles (`admin` / `operator` /
`viewer` / `agent` / `mcp` / `cli` / `auditor`), 33-permission
canonical catalogue, and the auditor split (`r-auditor` holds only
`audit.read` + `audit.export`) are seeded by migration 000029.
For the operator how-to, see [`rbac.md`](rbac.md). For the
threat model + compliance mapping, see
[`auth-threat-model.md`](auth-threat-model.md). For the upgrade
flow from an API-key-only deployment, see
[`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md).
### Day-0 admin bootstrap
Fresh deployments where no admin actor exists yet can mint the
first admin via `POST /api/v1/auth/bootstrap` - set
`CERTCTL_BOOTSTRAP_TOKEN`, POST a single curl with the token, and
the server returns the plaintext key value once. The token is
constant-time-compared; the strategy is one-shot via mutex; the
admin-existence probe re-closes the path once an admin lands.
The token is NEVER logged. The minted plaintext key flows only
into the HTTP response body. See
[`rbac.md`](rbac.md#day-0-bootstrap-first-admin-path) for the
full flow.
### Approval-bypass closure
`CertificateProfile.RequiresApproval=true` profiles route both
issuance/renewal AND profile edits through the
`ApprovalService` two-person integrity gate. The flip-flop loophole
(an admin disabling approval, mutating, re-enabling) is closed by
gating profile-edit through the same approval flow. Same-actor
self-approve is rejected at the service layer with
`ErrApproveBySameActor`. See
[`docs/reference/profiles.md`](../reference/profiles.md) for the
full gate semantics.
### OIDC federation
OIDC SSO runs on top of the API-key + RBAC foundation. Operators
configure one or more identity providers (Keycloak, Authentik, Okta,
Auth0, Entra ID, or Google Workspace via Keycloak broker); end users
sign in at the IdP, certctl validates the returned ID token, and a
session cookie is minted.
The token-validation pipeline pins:
- Algorithm allow-list: RS256 / RS512 / ES256 / ES384 / EdDSA only.
HS256 / HS384 / HS512 / `none` are rejected at the service-layer
sentinel level.
- IdP-downgrade-attack defense at provider creation AND every
RefreshKeys: the IdP's advertised
`id_token_signing_alg_values_supported` is intersected with the
allow-list; a provider that advertises HS-family is rejected
before any token is signed under the weak alg.
- Exact `iss` match (`ErrIssuerMismatch`).
- `aud` membership + `azp` for multi-aud tokens (per OIDC core
§3.1.3.7 step 5).
- `at_hash` REQUIRED-when-access_token-present (a tightening of the
spec MAY → MUST so a substituted access token cannot ride alongside
a clean ID token).
- Single-use state + nonce (32-byte random server-generated;
atomic `DELETE...RETURNING` on consume).
- PKCE-S256 mandatory; `plain` rejected.
- Configurable `iat` window (default 300s, capped 600s).
- JWKS cache with operator-triggered RefreshKeys + auto-refresh on
TTL expiry (default 3600s); JWKS-fetch failure during a key
rotation returns 503 to the in-flight login (existing sessions
untouched).
OIDC `client_secret` is encrypted at rest via AES-256-GCM (v3 blob
format: magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using
the `CERTCTL_CONFIG_ENCRYPTION_KEY` passphrase. The encryption
invariant is pinned by an integration test
(`internal/repository/postgres/oidc_encryption_invariant_test.go`)
that asserts ciphertext != plaintext + correct blob shape +
round-trip recovery + wrong-passphrase fails.
Per-IdP setup guides at
[`oidc-runbooks/index.md`](oidc-runbooks/index.md) cover Keycloak,
Authentik, Okta, Auth0, Entra ID, and Google Workspace.
### Sessions + back-channel logout
Successful OIDC login mints a session cookie:
`v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`.
The HMAC input is **length-prefixed** as `len:sid:len:kid` to defeat
concatenation-collision attacks on bare-concat designs. Cookie
attributes:
- `HttpOnly=true` (no JS access; defends XSS cookie theft).
- `Secure=true` (HTTPS-only; defends network MITM).
- `SameSite=Lax` default (configurable to Strict via
`CERTCTL_SESSION_SAMESITE`).
- `Path=/`, host-only.
Idle timeout default 1h; absolute timeout default 8h; both
configurable via `CERTCTL_SESSION_IDLE_TIMEOUT` and
`CERTCTL_SESSION_ABSOLUTE_TIMEOUT`. The scheduler's
`sessionGCLoop` (default 1h interval) sweeps expired rows.
CSRF defense: plaintext CSRF token in the JS-readable
`certctl_csrf` cookie (intentionally `HttpOnly=false` for the GUI
to echo into the `X-CSRF-Token` header); SHA-256 hash on the
session row; `subtle.ConstantTimeCompare` in `CSRFMiddleware`.
API-key actors are CSRF-exempt (no session row in context).
Session signing keys rotate via `RotateSigningKey`; the old key
stays valid for `CERTCTL_SESSION_SIGNING_KEY_RETENTION` (default
24h) so existing cookies validate during rollover. Past retention,
the old key's row is dropped and any cookie still signed under it
returns `ErrSigningKeyNotFound`. `EnsureInitialSigningKey` is
fail-fatal at server boot.
Back-channel logout per **OpenID Connect Back-Channel Logout 1.0**
(NOT RFC 8414): `POST /auth/oidc/back-channel-logout` accepts a
JWT-signed logout token from the IdP, validates the JWT against
the IdP's JWKS (same alg allow-list as login), pins required
claims (`iss` / `aud` / `iat` / `jti` / `events`; exactly one of
`sub` / `sid`; `nonce` MUST be absent), defeats replay via
`jti`-based deduplication, and revokes matching sessions.
For threat-model coverage of these surfaces, see
[`auth-threat-model.md`](auth-threat-model.md). For the
operator-runnable performance baselines, see
[`auth-benchmarks.md`](auth-benchmarks.md).
### OIDC first-admin bootstrap
Coexists with the env-var-token bootstrap path. When the
operator sets `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` + (optionally)
`CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID`, the first user with one of
those IdP groups becomes admin on first login per tenant.
Subsequent users go through normal mapping. The admin-existence
probe ensures only one wins between the two bootstrap paths;
once any actor holds `r-admin`, the OIDC bootstrap hook silently
falls through to normal mapping. Audit row on every grant
(`bootstrap.oidc_first_admin`, `event_category=auth`).
### Break-glass admin
Default-OFF (`CERTCTL_BREAKGLASS_ENABLED=false`). When enabled,
the local-password admin path bypasses OIDC + group-claim layers;
intended ONLY for SSO-broken incidents.
- Argon2id with OWASP 2024 params (m=64 MiB, t=3, p=4, 16-byte
salt, 32-byte output, per-password random salt, PHC-format
hash). Hash column is `json:"-"` so handlers cannot wire-leak.
- Lockout state machine: 5 failures (default; configurable via
`CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD`) within 1h reset window
(`_LOCKOUT_RESET_INTERVAL`) trips a 30s lockout (`_LOCKOUT_DURATION`).
Atomic single-statement IncrementFailure defeats concurrent
racing attempts.
- Constant-time across all failure paths via `verifyDummy()`
wrong-password / locked-account / no-actor all take statistically
indistinguishable time.
- Surface invisibility: when disabled, ALL four endpoints return
HTTP 404 (NOT 403). Scanners cannot distinguish "endpoint
disabled" from "endpoint doesn't exist".
- WARN log at server boot when `ENABLED=true`; audit row on every
break-glass login (`auth.breakglass_login_*`,
`event_category=auth`); WebAuthn/FIDO2 second factor pairing
on the v3 roadmap (Decision 12).
Operator should DISABLE break-glass within 24h of SSO recovery
to avoid a permanent backdoor; the runbook at
[`auth-threat-model.md#break-glass-risks-phase-75`](auth-threat-model.md)
documents the full state machine.
### Demo-to-production cutover (Audit 2026-05-11 A-8)
Migration `000029_rbac.up.sql` unconditionally seeds an
`actor-demo-anon → r-admin` row into `actor_roles`. This row is the
runtime principal injected by the demo-mode middleware when
`CERTCTL_AUTH_TYPE=none`. Under any non-`none` auth type the row is
DORMANT — the middleware chain never resolves to it. But its existence
is a footgun: a future regression that resolves an unauthenticated
request to `actor-demo-anon` (a misrouted CORS preflight, a fallback in
a new auth-exempt route) would silently re-elevate to admin.
certctl-server detects this residue at startup and emits a WARN log +
an `auth.demo_residual_grants_detected` audit row listing every grant
present on `actor-demo-anon`. **Every production deploy will see this
WARN on first boot** — the migration baseline is part of the install,
not a side effect of running demo mode.
Operator workflow at production cutover:
1. Drain the WARN by calling the cleanup endpoint with an admin API key:
```bash
curl -X POST --cacert deploy/test/certs/ca.crt \
-H "Authorization: Bearer $ADMIN_KEY" \
https://certctl.example.com:8443/api/v1/auth/demo-residual/cleanup
# → {"removed": 1}
```
The endpoint is gated `auth.role.assign` (admin-class) and refuses
to run when `CERTCTL_AUTH_TYPE=none` (HTTP 503 — the residue IS the
active runtime state at that auth type). The cleanup is idempotent;
a second call returns `{"removed": 0}` and still leaves an audit row.
Equivalent SQL for operators preferring direct DB access:
```sql
DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon';
```
2. To make subsequent boots refuse startup if the row reappears (the
most paranoid stance), set:
```
CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true
```
With the flag set, any `actor-demo-anon` row under a non-`none`
auth type causes certctl-server to log the WARN AND exit non-zero
before binding the HTTPS listener. Default is `false` (WARN only).
3. The CI guard `scripts/ci-guards/no-new-synthetic-admin.sh` pins the
set of source files that may reference the `actor-demo-anon` literal.
New runtime code paths that resolve to the synthetic actor are
rejected at PR time so the credibility gap stays closed.
### Migrating an existing deployment to OIDC
An existing API-key-only deployment that wants to add OIDC follows
the step-by-step at
[`docs/migration/oidc-enable.md`](../migration/oidc-enable.md):
configure CERTCTL_CONFIG_ENCRYPTION_KEY, pick + configure an IdP
per the relevant runbook, configure the certctl-side OIDCProvider
+ group→role mappings, verify the login flow against a single
test user, then announce the SSO endpoint to the rest of the
organization.
## Per-user rate limiting
Authenticated callers are bucketed by API-key name;
unauthenticated callers (probes, OCSP relying parties, EST/SCEP enrollees)
are bucketed by source IP. `RPS` and `BurstSize` are per-key budgets.
`PerUserRPS` / `PerUserBurstSize` give authenticated clients a separate
budget when set non-zero.
## API key rotation
**Audit reference:** L-004. CWE-924 (improper enforcement of message integrity during transmission in a communication channel) - operator UX variant.
certctl's API keys are configured via the `CERTCTL_API_KEYS_NAMED` env var
(format `name1:key1,name2:key2:admin`) and parsed at startup into an
in-memory list. There is no DB-resident key store, no GUI, no `/api/v1/keys`
endpoint - the env var IS the key inventory.
The env var supports a **double-key rotation window**: two entries can share a
name during the rollover, and both keys validate. Operators run the
rotation as:
1. **Generate the new key.** `openssl rand -hex 32` produces a 256-bit
value with sufficient entropy.
2. **Append the new entry to `CERTCTL_API_KEYS_NAMED`** alongside the
existing one:
```
CERTCTL_API_KEYS_NAMED="alice:OLDKEY:admin,alice:NEWKEY:admin"
```
Both entries MUST carry the same admin flag - startup fails loud if
they don't (a non-admin shouldn't share an identity with an admin).
3. **Restart certctl.** A startup INFO log confirms the rotation window
is active:
```
INFO api-key rotation window active name=alice entries=2 see=docs/security.md::api-key-rotation
```
4. **Roll the new key out to all clients.** Both keys validate during
this phase. Audit-trail actor + per-user rate-limit bucket stay
consistent across the rollover (both entries produce the same
`UserKey` context value, the shared name).
5. **Remove the old entry** from `CERTCTL_API_KEYS_NAMED`:
```
CERTCTL_API_KEYS_NAMED="alice:NEWKEY:admin"
```
6. **Restart certctl.** OLDKEY now fails with 401. Rotation complete.
The rotation window has no operator-set timeout - it lasts for as long
as both entries are in the env var. Best practice is a 24-72h window
covering a full deploy cadence; if a client hasn't rolled to NEWKEY by
the end of step 4, extend the window before step 5.
### What the contract guarantees
- Two entries with the same `name`: **allowed** if both have the same
`admin` flag.
- Two entries with the same `name` but mismatched admin: **rejected at
startup** (privilege escalation guard).
- Two entries with the same `(name, key)` pair: **rejected at startup**
(typo guard - rotation requires DIFFERENT keys under the same name).
- Single-entry steady state: the simple legacy behaviour.
### What the contract does NOT do
- **No automatic expiration of OLDKEY.** The operator removes the entry
in step 5; certctl doesn't track timestamps. A future enhancement
could add a `rotated_at` annotation if operators ask for it.
- **No GUI / API for key management.** Keys are env-var only by design;
building a key-management surface is a separate feature project.
- **No revocation list.** If a key leaks, the only path is to remove it
from the env var and restart. That's appropriate for a small env-var
inventory; it would not scale to a per-user-key-issued model.
## Security carve-outs &amp; operator-tunable defaults
Phase 2 of the architecture diligence remediation (2026-05-13)
consolidated the following carve-outs into one canonical section so
operators reviewing security posture have a single search target. Each
entry cites the exact file:line of the carve-out, why it exists, and
what the operator should do.
### TLS verification — dev escape hatches
certctl has three `InsecureSkipVerify=true` sites that are dev/probe
escape hatches, never enabled by default in production:
- **Agent dev escape** — `cmd/agent/main.go:179` (wired from
`cmd/agent/main.go:61` config field + `cmd/agent/main.go:1371` CLI
flag). Operators flip this only when debugging an agent against a
self-signed control plane that hasn't been added to the agent's
trust store. Document as `--insecure-skip-verify` in the agent's
install runbook; the agent logs a startup WARN any time the flag
is set. SEC-M3 pins that the carve-out is intentional.
- **Agent verification probe** — `cmd/agent/verify.go:78`. The probe
intentionally opens a TLS connection with verification disabled so
it can inspect any certificate the endpoint serves (including
self-signed or expired ones — that's the whole point of a probe).
The probe never returns trust state to a security-relevant code
path; it only reads cert metadata. SEC-M3 pins this.
- **tlsprobe (network scanner)** — `internal/tlsprobe/probe.go:54`.
Same rationale as the agent verify probe — network discovery must
introspect any certificate it finds, including the ones with the
problems we're scanning for. SEC-M3 pins this.
### F5 target connector — `InsecureSkipVerify` per-config
The F5 target connector exposes an `Insecure: bool` field on its
per-target config blob (default `false`). When set,
`internal/connector/target/f5/f5.go:134` builds the HTTP client with
`InsecureSkipVerify: config.Insecure`. SEC-M5 closure: operator
opt-in for self-signed F5 BIG-IP device certs; mitigation is to run
the F5 + the proxy-agent on a network-segmented internal subnet.
Document in the F5 connector's per-target setup guide.
### ACME issuer — `CERTCTL_ACME_INSECURE` (now gated on ACK)
`internal/connector/issuer/acme/acme.go:201` builds the ACME HTTP
client with `InsecureSkipVerify: true` for the Pebble integration
test path. The per-issuer runtime setting comes from
`CERTCTL_ACME_INSECURE` (`internal/config/config.go:2116`); Phase 2
SEC-M4 closure (2026-05-13) added the fail-closed gate so the operator
must ALSO set `CERTCTL_ACME_INSECURE_ACK=true` for the server to boot.
Production deploys must never set either flag. The boot-time WARN log
at `cmd/server/main.go:611` continues to fire for the ACK'd case so
every restart logs the reminder.
### CSP `'unsafe-inline'` on `style-src`
`internal/api/middleware/securityheaders.go:58` ships the dashboard
CSP with `style-src 'self' 'unsafe-inline'`. This is required because
Tailwind compiles utility classes into a single stylesheet at build
time, but inline-style attributes appear in the dashboard via inline
`<svg>` elements + Recharts' `<ResponsiveContainer>` injecting inline
width/height. SEC-L1 closure: the carve-out is necessary today; the
planned tightening flow is the frontend audit's FE-H2 (icon library)
+ decorative-SVG sweep that then unlocks the CSP hardening (drops
`'unsafe-inline'`).
### Break-glass admin — Argon2id rest-defense reminder
The break-glass admin path (`docs/operator/runbooks/disaster-recovery.md`)
hashes the operator-supplied password with Argon2id and stores the
hash in the `breakglass_credentials` table. SEC-L2 reminder: the
strength of the rest-defense is operator-supplied — pick a password
with sufficient entropy (≥ 64 random bits via `openssl rand -base64
12`) and rotate after every use. Argon2id resists offline cracking
but an operator-supplied "Password123" hashes the same way.
### Body-size limit (1 MB default) — operator-tunable
The `http.MaxBytesReader` wrap caps inbound request bodies at 1 MB
by default. The cap is necessary defense against unbounded-body DOS
but catches legitimate operator workflows:
- Bulk truststore PEM bundle uploads (CA bundles for federated trust
stores can be > 1 MB).
- Multi-MB CRL pushes via the CRL-cache endpoint.
- Bulk-import of certificates with embedded chains.
SEC-L3 closure: operators raise the cap via `CERTCTL_MAX_BODY_SIZE`
(units: bytes; e.g. `CERTCTL_MAX_BODY_SIZE=10485760` for 10 MB).
Document in `deploy/ENVIRONMENTS.md`.
### Demo Compose placeholder credentials
`deploy/docker-compose.demo.yml` ships `CERTCTL_AUTH_SECRET=change-me-in-production`,
`CERTCTL_CONFIG_ENCRYPTION_KEY=change-me-32-char-encryption-key`, and
`CERTCTL_API_KEY=change-me-in-production` as documented demo
defaults. The runtime `Validate()` fail-closed guards
(`internal/config/config.go::Validate`, Bundle 2 2026-05-12) refuse
to start if those literal strings reach a non-demo config. Phase 2
DEPL-M2 closure adds a CI guard
(`scripts/ci-guards/no-change-me-in-prod-compose.sh`) that fails the
build at PR time if a `change-me-*` literal leaks into a non-demo
compose file — catching the regression one layer before the runtime
guard fires.
### Kubernetes NetworkPolicy — operator-opt-in
`deploy/helm/certctl/templates/networkpolicy.yaml` ships the template
but `deploy/helm/certctl/values.yaml` defaults `networkPolicy.enabled:
false`. DEPL-M3 rationale: most Kubernetes clusters don't have a
NetworkPolicy controller installed (kind / minikube / fresh k3s); a
default-enabled NetworkPolicy renders fine but produces no
enforcement, and bare-metal `kube-router`-style controllers may
interpret a permissive default differently. Production deploys with a
real NetworkPolicy controller (Calico, Cilium, Antrea) flip the
values key to `true` and tune the policy in their values overlay.
Document the production-enable in
`docs/operator/runbooks/ha.md` (added Phase 2 DEPL-H1).
## Reporting a vulnerability
Email `certctl@proton.me`. Coordinated disclosure preferred; we will
acknowledge within 72h.