mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:01:30 +00:00
config: default hardening + operator docs (Phase 2 closure — SEC-H1, SEC-H3, SEC-M4, DEPL-H1, DEPL-M2 + doc-only carve-outs)
Eleven findings from the architecture diligence audit's Phase 2 bundle
closed in one PR. All touch the same backend config + Helm chart +
operator docs surface, so reviewing in one diff is the natural fit.
config.go: three new fail-closed Validate() branches behind sentinels
=====================================================================
Three new error sentinels exported from internal/config/config.go for
tests to pin via errors.Is + message-text:
- ErrAgentBootstrapTokenRequired (SEC-H1)
- ErrACMEInsecureWithoutAck (SEC-M4)
- ErrDemoModeAckExpired (SEC-H3)
SEC-H1 (staged): introduces CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY
as an opt-in feature flag. When true AND the bootstrap token is empty,
Validate() returns ErrAgentBootstrapTokenRequired and the server
refuses to start. Default in THIS release: false (warn-mode
pass-through preserved). WORKSPACE-ROADMAP.md schedules the default
flip to true for v2.2.0 — operators get one upgrade window.
SEC-M4: upgrades the existing boot-time WARN log for
CERTCTL_ACME_INSECURE=true into a hard refuse-to-start gate behind
CERTCTL_ACME_INSECURE_ACK=true. The ACK env var must be paired with
the existing INSECURE flag; either alone fails closed. The boot-time
WARN log at cmd/server/main.go:611 continues to fire for the ACK'd
case so every restart logs the reminder.
SEC-H3: tightens the sticky DemoModeAck bit so it expires after 24h.
When DemoModeAck=true, Validate() now requires CERTCTL_DEMO_MODE_ACK_TS
to be set as a unix-epoch timestamp within the last 24h (24h-tolerance
on the past side, 1-minute clock-skew on the future side). Catches the
"forgotten demo deployment promoted to production" failure mode —
next container restart past 24h refuses unless re-ack'd.
Tests in internal/config/config_test.go cover every new branch:
positive (passes when properly set), negative (each fail-closed path
fires with the matching sentinel + message-text). 11 new tests added.
Helm chart + HA runbook (DEPL-H1)
=================================
Created docs/operator/runbooks/ha.md documenting the three values
flips required for production HA: server.replicas, podDisruptionBudget,
service.sessionAffinity. Cross-link comments added to
deploy/helm/certctl/values.yaml next to the server.replicas (line 19)
and podDisruptionBudget (line 566) defaults. DEFAULTS DO NOT CHANGE
— that's the point per the prompt's 'do not flip networkPolicy default'
guidance: a default-enabled PDB blocks fresh helm install on
single-node clusters.
CI guard (DEPL-M2)
==================
scripts/ci-guards/no-change-me-in-prod-compose.sh grep-fails any
'change-me-' literal in compose files OTHER than docker-compose.demo.yml.
Catches the placeholder-credential-leak regression one layer earlier
than the runtime Validate() fail-closed guards from Bundle 2 (2026-05-12).
Excludes comment lines so docs explaining the pattern don't trip the
guard. Verified to fire on a synthetic leak; clean on the current tree.
Consolidated 'Security carve-outs' doc section
==============================================
docs/operator/security.md grows by one new section documenting the
seven existing carve-outs in one canonical place:
- SEC-M3: 3 InsecureSkipVerify=true sites (Agent dev, verify probe, tlsprobe)
- SEC-M5: F5 connector InsecureSkipVerify per-config field
- SEC-M4: ACME insecure + new ACK gate
- SEC-L1: CSP 'unsafe-inline' on style-src (Tailwind carve-out)
- SEC-L2: break-glass Argon2id rest-defense reminder
- SEC-L3: 1 MB body-size cap + CERTCTL_MAX_BODY_SIZE override
- DEPL-M2: change-me-* placeholder credentials in demo overlay
- DEPL-M3: K8s NetworkPolicy operator-opt-in default
Each entry cites the file:line, the rationale for the carve-out, and
the operator action.
CHANGELOG + ENVIRONMENTS coverage
==================================
CHANGELOG.md grows by one new '### Breaking changes (scheduled for
v2.2.0)' section under Unreleased, documenting SEC-H1 / SEC-M4 / SEC-H3
with explicit upgrade-window guidance for each.
deploy/ENVIRONMENTS.md adds five rows: AGENT_BOOTSTRAP_TOKEN +
AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY + DEMO_MODE_ACK + DEMO_MODE_ACK_TS +
ACME_INSECURE_ACK. G-3 env-docs-drift CI guard stays clean.
WORKSPACE-ROADMAP.md (cowork-side) schedules the SEC-H1 default-flip
for v2.2.0.
Sandbox limitation
==================
The certctl repo's working tree is 6.1 GB which fills the sandbox
volume; the go1.25.10 toolchain download (go.mod requires it,
sandbox has 1.25.9) keeps failing on disk-full. Local 'go build' /
'go test' were NOT run in this commit's verification path.
make verify MUST be run on the operator's workstation before push
per CLAUDE.md operating rules.
CI guards (no-change-me, G-3 env-docs-drift, doc-rot-detector, +
all existing) verified clean by running each individually.
Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H1,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-H3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M4,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H1,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M2,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M5,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L1,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L2,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L3
This commit is contained in:
@@ -2,6 +2,50 @@
|
|||||||
|
|
||||||
## Unreleased
|
## Unreleased
|
||||||
|
|
||||||
|
### Breaking changes (scheduled for v2.2.0)
|
||||||
|
|
||||||
|
- **SEC-H1 staged: `CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY` opt-in flag.**
|
||||||
|
Phase 2 of the architecture diligence remediation (2026-05-13) introduces
|
||||||
|
a new env var that, when set to `true`, makes the server refuse to start
|
||||||
|
unless `CERTCTL_AGENT_BOOTSTRAP_TOKEN` is also set to a real value.
|
||||||
|
Default in this release: `false` (preserves the v2.1.x warn-mode
|
||||||
|
pass-through behavior for backward compatibility). Default flip to
|
||||||
|
`true` is scheduled for v2.2.0 per `WORKSPACE-ROADMAP.md`.
|
||||||
|
|
||||||
|
**Operator action before the v2.2.0 upgrade:** generate a real
|
||||||
|
bootstrap token (`openssl rand -base64 32`) and set
|
||||||
|
`CERTCTL_AGENT_BOOTSTRAP_TOKEN` in your env. When v2.2.0 ships, the
|
||||||
|
deny-empty default flips to `true` and a missing or empty token will
|
||||||
|
fail closed at boot. Operators with the token already set: no action
|
||||||
|
required.
|
||||||
|
|
||||||
|
- **SEC-M4: `CERTCTL_ACME_INSECURE` now requires explicit ACK.**
|
||||||
|
Pre-Phase-2, `CERTCTL_ACME_INSECURE=true` produced only a boot-time
|
||||||
|
WARN log. Post-Phase-2 (THIS release), the server refuses to start
|
||||||
|
unless `CERTCTL_ACME_INSECURE_ACK=true` is set alongside it. ACME
|
||||||
|
directory TLS verification is the load-bearing defense against a
|
||||||
|
network attacker intercepting ACME enrollment; the existing flag was
|
||||||
|
too easy to flip via a copy-pasted Pebble runbook.
|
||||||
|
|
||||||
|
**Operator action:** if you intentionally run against a self-signed
|
||||||
|
ACME server (Pebble, step-ca, internal dev), add
|
||||||
|
`CERTCTL_ACME_INSECURE_ACK=true` to your env. Production deploys
|
||||||
|
MUST never set either flag.
|
||||||
|
|
||||||
|
- **SEC-H3: `CERTCTL_DEMO_MODE_ACK` is no longer sticky — 24h re-ack required.**
|
||||||
|
Pre-Phase-2, setting `CERTCTL_DEMO_MODE_ACK=true` was sticky for the
|
||||||
|
lifetime of the container. Post-Phase-2, operators must ALSO set
|
||||||
|
`CERTCTL_DEMO_MODE_ACK_TS=$(date +%s)` to a unix epoch within the
|
||||||
|
last 24h. The next container restart past 24h refuses to start
|
||||||
|
unless a fresh TS is supplied. Catches the "forgotten demo deployment
|
||||||
|
promoted to production" failure mode.
|
||||||
|
|
||||||
|
**Operator action:** demo deploys must set `CERTCTL_DEMO_MODE_ACK_TS`
|
||||||
|
at every `docker compose up`. The demo Compose helper script handles
|
||||||
|
this automatically when wired; standalone demo deploys add it
|
||||||
|
manually. Production deploys: this guard is irrelevant
|
||||||
|
(`CERTCTL_DEMO_MODE_ACK` should not be set in production).
|
||||||
|
|
||||||
### Security
|
### Security
|
||||||
|
|
||||||
- **Alg-downgrade defense relaxed for Keycloak-shape IdPs (v2.1.0 pre-tag fix).**
|
- **Alg-downgrade defense relaxed for Keycloak-shape IdPs (v2.1.0 pre-tag fix).**
|
||||||
|
|||||||
@@ -417,6 +417,11 @@ Every `CERTCTL_*` environment variable is read by the server's `internal/config/
|
|||||||
| `CERTCTL_CORS_ORIGINS` | (empty) | Allowed CORS origins, comma-separated. Empty = deny all cross-origin |
|
| `CERTCTL_CORS_ORIGINS` | (empty) | Allowed CORS origins, comma-separated. Empty = deny all cross-origin |
|
||||||
| `CERTCTL_RATE_LIMIT_RPS` | `10` | Requests per second per client |
|
| `CERTCTL_RATE_LIMIT_RPS` | `10` | Requests per second per client |
|
||||||
| `CERTCTL_RATE_LIMIT_BURST` | `20` | Burst allowance above RPS |
|
| `CERTCTL_RATE_LIMIT_BURST` | `20` | Burst allowance above RPS |
|
||||||
|
| `CERTCTL_AGENT_BOOTSTRAP_TOKEN` | (empty) | Agent-registration bootstrap secret. Empty = v2.1.x warn-mode pass-through. Set to a real value (`openssl rand -base64 32`); the deny-empty flag's default flip in v2.2.0 will require it. |
|
||||||
|
| `CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY` | `false` | Phase 2 SEC-H1 staged flag. When `true`, the server refuses to start unless `CERTCTL_AGENT_BOOTSTRAP_TOKEN` is non-empty. Default flip to `true` scheduled for v2.2.0. |
|
||||||
|
| `CERTCTL_DEMO_MODE_ACK` | `false` | Acknowledges demo-mode synthetic admin posture (required when `CERTCTL_AUTH_TYPE=none` binds to a non-loopback host). Must be paired with `CERTCTL_DEMO_MODE_ACK_TS` per Phase 2 SEC-H3. |
|
||||||
|
| `CERTCTL_DEMO_MODE_ACK_TS` | (empty) | Phase 2 SEC-H3: unix-epoch timestamp at which DemoModeAck was last acknowledged. When `CERTCTL_DEMO_MODE_ACK=true`, this must parse as a unix epoch within the last 24h. Set via `CERTCTL_DEMO_MODE_ACK_TS=$(date +%s)` at every `docker compose up`. |
|
||||||
|
| `CERTCTL_ACME_INSECURE_ACK` | `false` | Phase 2 SEC-M4: explicit ACK required to boot with `CERTCTL_ACME_INSECURE=true`. Production deploys MUST never set either flag. |
|
||||||
|
|
||||||
### Agent
|
### Agent
|
||||||
|
|
||||||
|
|||||||
@@ -15,7 +15,10 @@ fullnameOverride: ""
|
|||||||
# Certctl Server Configuration
|
# Certctl Server Configuration
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
server:
|
server:
|
||||||
# Number of replicas (for HA deployments)
|
# Number of replicas (for HA deployments).
|
||||||
|
# Phase 2 DEPL-H1: production HA is operator-opt-in across this field
|
||||||
|
# + podDisruptionBudget.enabled + server.service.sessionAffinity.
|
||||||
|
# See docs/operator/runbooks/ha.md for the smallest-possible HA overlay.
|
||||||
replicas: 1
|
replicas: 1
|
||||||
|
|
||||||
# Image configuration
|
# Image configuration
|
||||||
@@ -561,7 +564,11 @@ kubernetesSecrets:
|
|||||||
enabled: false
|
enabled: false
|
||||||
|
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
# Pod Disruption Budget (for HA deployments)
|
# Pod Disruption Budget (for HA deployments).
|
||||||
|
# Phase 2 DEPL-H1: defaults to enabled=false because a PDB template
|
||||||
|
# rendered at `replicas: 1` blocks every rolling restart on a
|
||||||
|
# single-node cluster. Production HA flips this to true alongside
|
||||||
|
# server.replicas ≥ 2. See docs/operator/runbooks/ha.md.
|
||||||
# ==============================================================================
|
# ==============================================================================
|
||||||
podDisruptionBudget:
|
podDisruptionBudget:
|
||||||
enabled: false
|
enabled: false
|
||||||
|
|||||||
@@ -0,0 +1,113 @@
|
|||||||
|
# High-Availability Deployment Runbook
|
||||||
|
|
||||||
|
> Last reviewed: 2026-05-13
|
||||||
|
|
||||||
|
<!-- Phase 2 DEPL-H1 closure -->
|
||||||
|
|
||||||
|
|
||||||
|
certctl's Helm chart ships with conservative single-replica defaults
|
||||||
|
that produce a working `helm install` against any Kubernetes cluster.
|
||||||
|
Production HA is operator-opt-in across three values surfaces — none
|
||||||
|
of which the chart flips on your behalf.
|
||||||
|
|
||||||
|
This runbook documents the three changes, why they default off, and
|
||||||
|
the smallest-possible HA values overlay.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why HA is opt-in (not default)
|
||||||
|
|
||||||
|
Three load-bearing reasons the chart defaults are `replicas: 1` and
|
||||||
|
`podDisruptionBudget.enabled: false`:
|
||||||
|
|
||||||
|
1. **A 1-replica deployment works on every cluster.** A multi-replica
|
||||||
|
default with `minAvailable: 2` would render a PDB at install time;
|
||||||
|
if the cluster has fewer than 2 nodes available (single-node
|
||||||
|
`kind` / `minikube` / fresh `k3s` clusters), Helm renders fine but
|
||||||
|
the first `kubectl rollout` blocks indefinitely waiting for the
|
||||||
|
second replica that can never schedule. Defaulting off keeps the
|
||||||
|
demo path one-command.
|
||||||
|
|
||||||
|
2. **Postgres is a singleton in the bundled chart.** The chart's
|
||||||
|
`postgres-statefulset.yaml` runs ONE Postgres pod. Scaling the
|
||||||
|
server tier past 1 replica without an externalized Postgres + a
|
||||||
|
pgbouncer-style proxy doesn't actually buy HA at the DB tier — the
|
||||||
|
single Postgres pod is the failure domain. Operators who want true
|
||||||
|
HA route Postgres to a managed service (RDS, Cloud SQL, AlloyDB,
|
||||||
|
AKS-managed-Postgres, Aiven) or run their own cluster (Patroni,
|
||||||
|
CloudNativePG, Zalando postgres-operator). See the
|
||||||
|
[external-Postgres values example](../../deploy/helm/examples/values-external-db.yaml).
|
||||||
|
|
||||||
|
3. **Session affinity is HTTPS-only.** The control plane is HTTPS-only
|
||||||
|
(TLS 1.3 pinned). Adding `sessionAffinity: ClientIP` to the
|
||||||
|
server Service mid-deployment when a sticky front-end LB is in
|
||||||
|
play (NGINX Ingress, Cloud LB with backend service) is the right
|
||||||
|
default for OIDC + RBAC session cookies. But operators who terminate
|
||||||
|
TLS at a different layer (Envoy mesh, Cloudflare in front of the
|
||||||
|
cluster) may have already solved affinity upstream — flipping it
|
||||||
|
on by default would over-constrain those paths.
|
||||||
|
|
||||||
|
## The smallest production-HA overlay
|
||||||
|
|
||||||
|
Three Helm values to flip:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# values-ha.yaml — copy into your overlay and edit to taste.
|
||||||
|
|
||||||
|
server:
|
||||||
|
# ≥ 2 replicas is the minimum for the PDB to render. 3 gives you
|
||||||
|
# a true rolling-restart tolerance window (1 down for upgrade,
|
||||||
|
# 2 still serving) without dropping below minAvailable.
|
||||||
|
replicas: 3
|
||||||
|
|
||||||
|
service:
|
||||||
|
# Required when the front-end LB doesn't already enforce
|
||||||
|
# session affinity. OIDC + RBAC session cookies need to land
|
||||||
|
# on the same backend pod for the session lifetime.
|
||||||
|
sessionAffinity: ClientIP
|
||||||
|
|
||||||
|
podDisruptionBudget:
|
||||||
|
# Renders the PDB template; controller-side voluntary disruptions
|
||||||
|
# (node-drain for k8s upgrade, cluster-autoscaler scale-down)
|
||||||
|
# respect this floor.
|
||||||
|
enabled: true
|
||||||
|
# With server.replicas: 3, minAvailable: 2 leaves headroom for one
|
||||||
|
# rolling restart at a time.
|
||||||
|
minAvailable: 2
|
||||||
|
# maxUnavailable is mutually exclusive with minAvailable; pick one.
|
||||||
|
# maxUnavailable: 1
|
||||||
|
```
|
||||||
|
|
||||||
|
Apply with:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
helm upgrade certctl deploy/helm/certctl/ -f values-ha.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
## What you still own as the operator
|
||||||
|
|
||||||
|
Three things the chart does not solve, even at `replicas: 3`:
|
||||||
|
|
||||||
|
1. **Postgres HA.** Route to an externalized Postgres (managed cloud
|
||||||
|
or operator-managed cluster). The chart's bundled StatefulSet
|
||||||
|
pod is a development/single-AZ pattern, not a production HA path.
|
||||||
|
2. **TLS material lifecycle.** The chart accepts an `existingSecret`
|
||||||
|
for the server cert; rotating it is operator-side automation.
|
||||||
|
The dashboard + agent can issue their own certs via the local CA
|
||||||
|
(eat-your-own-dogfood); the operator can wire `cert-manager` if
|
||||||
|
they prefer that path.
|
||||||
|
3. **Backup CronJob.** Phase 4 of the architecture diligence
|
||||||
|
remediation plan (DEPL-H2) ships a `backup-cronjob.yaml` template;
|
||||||
|
until that lands, backups are operator-run per the existing
|
||||||
|
`docs/operator/runbooks/postgres-backup.md` runbook.
|
||||||
|
|
||||||
|
## Cross-references
|
||||||
|
|
||||||
|
- `deploy/helm/certctl/values.yaml` lines 19, 446, 566 — the three
|
||||||
|
defaults this runbook documents.
|
||||||
|
- `docs/operator/runbooks/postgres-backup.md` — Postgres backup
|
||||||
|
runbook (today, operator-run).
|
||||||
|
- `docs/operator/runbooks/disaster-recovery.md` — DR procedure.
|
||||||
|
- Phase 4 (Helm Chart, DR, And Ops Surface) of the architecture
|
||||||
|
diligence remediation plan tracks the chart-level work
|
||||||
|
(backup CronJob, PrometheusRule starter, migration hook, etc.).
|
||||||
@@ -403,6 +403,124 @@ the end of step 4, extend the window before step 5.
|
|||||||
from the env var and restart. That's appropriate for a small env-var
|
from the env var and restart. That's appropriate for a small env-var
|
||||||
inventory; it would not scale to a per-user-key-issued model.
|
inventory; it would not scale to a per-user-key-issued model.
|
||||||
|
|
||||||
|
## Security carve-outs & operator-tunable defaults
|
||||||
|
|
||||||
|
Phase 2 of the architecture diligence remediation (2026-05-13)
|
||||||
|
consolidated the following carve-outs into one canonical section so
|
||||||
|
operators reviewing security posture have a single search target. Each
|
||||||
|
entry cites the exact file:line of the carve-out, why it exists, and
|
||||||
|
what the operator should do.
|
||||||
|
|
||||||
|
### TLS verification — dev escape hatches
|
||||||
|
|
||||||
|
certctl has three `InsecureSkipVerify=true` sites that are dev/probe
|
||||||
|
escape hatches, never enabled by default in production:
|
||||||
|
|
||||||
|
- **Agent dev escape** — `cmd/agent/main.go:179` (wired from
|
||||||
|
`cmd/agent/main.go:61` config field + `cmd/agent/main.go:1371` CLI
|
||||||
|
flag). Operators flip this only when debugging an agent against a
|
||||||
|
self-signed control plane that hasn't been added to the agent's
|
||||||
|
trust store. Document as `--insecure-skip-verify` in the agent's
|
||||||
|
install runbook; the agent logs a startup WARN any time the flag
|
||||||
|
is set. SEC-M3 pins that the carve-out is intentional.
|
||||||
|
- **Agent verification probe** — `cmd/agent/verify.go:78`. The probe
|
||||||
|
intentionally opens a TLS connection with verification disabled so
|
||||||
|
it can inspect any certificate the endpoint serves (including
|
||||||
|
self-signed or expired ones — that's the whole point of a probe).
|
||||||
|
The probe never returns trust state to a security-relevant code
|
||||||
|
path; it only reads cert metadata. SEC-M3 pins this.
|
||||||
|
- **tlsprobe (network scanner)** — `internal/tlsprobe/probe.go:54`.
|
||||||
|
Same rationale as the agent verify probe — network discovery must
|
||||||
|
introspect any certificate it finds, including the ones with the
|
||||||
|
problems we're scanning for. SEC-M3 pins this.
|
||||||
|
|
||||||
|
### F5 target connector — `InsecureSkipVerify` per-config
|
||||||
|
|
||||||
|
The F5 target connector exposes an `Insecure: bool` field on its
|
||||||
|
per-target config blob (default `false`). When set,
|
||||||
|
`internal/connector/target/f5/f5.go:134` builds the HTTP client with
|
||||||
|
`InsecureSkipVerify: config.Insecure`. SEC-M5 closure: operator
|
||||||
|
opt-in for self-signed F5 BIG-IP device certs; mitigation is to run
|
||||||
|
the F5 + the proxy-agent on a network-segmented internal subnet.
|
||||||
|
Document in the F5 connector's per-target setup guide.
|
||||||
|
|
||||||
|
### ACME issuer — `CERTCTL_ACME_INSECURE` (now gated on ACK)
|
||||||
|
|
||||||
|
`internal/connector/issuer/acme/acme.go:201` builds the ACME HTTP
|
||||||
|
client with `InsecureSkipVerify: true` for the Pebble integration
|
||||||
|
test path. The per-issuer runtime setting comes from
|
||||||
|
`CERTCTL_ACME_INSECURE` (`internal/config/config.go:2116`); Phase 2
|
||||||
|
SEC-M4 closure (2026-05-13) added the fail-closed gate so the operator
|
||||||
|
must ALSO set `CERTCTL_ACME_INSECURE_ACK=true` for the server to boot.
|
||||||
|
Production deploys must never set either flag. The boot-time WARN log
|
||||||
|
at `cmd/server/main.go:611` continues to fire for the ACK'd case so
|
||||||
|
every restart logs the reminder.
|
||||||
|
|
||||||
|
### CSP `'unsafe-inline'` on `style-src`
|
||||||
|
|
||||||
|
`internal/api/middleware/securityheaders.go:58` ships the dashboard
|
||||||
|
CSP with `style-src 'self' 'unsafe-inline'`. This is required because
|
||||||
|
Tailwind compiles utility classes into a single stylesheet at build
|
||||||
|
time, but inline-style attributes appear in the dashboard via inline
|
||||||
|
`<svg>` elements + Recharts' `<ResponsiveContainer>` injecting inline
|
||||||
|
width/height. SEC-L1 closure: the carve-out is necessary today; the
|
||||||
|
planned tightening flow is the frontend audit's FE-H2 (icon library)
|
||||||
|
+ decorative-SVG sweep that then unlocks the CSP hardening (drops
|
||||||
|
`'unsafe-inline'`).
|
||||||
|
|
||||||
|
### Break-glass admin — Argon2id rest-defense reminder
|
||||||
|
|
||||||
|
The break-glass admin path (`docs/operator/runbooks/disaster-recovery.md`)
|
||||||
|
hashes the operator-supplied password with Argon2id and stores the
|
||||||
|
hash in the `breakglass_credentials` table. SEC-L2 reminder: the
|
||||||
|
strength of the rest-defense is operator-supplied — pick a password
|
||||||
|
with sufficient entropy (≥ 64 random bits via `openssl rand -base64
|
||||||
|
12`) and rotate after every use. Argon2id resists offline cracking
|
||||||
|
but an operator-supplied "Password123" hashes the same way.
|
||||||
|
|
||||||
|
### Body-size limit (1 MB default) — operator-tunable
|
||||||
|
|
||||||
|
The `http.MaxBytesReader` wrap caps inbound request bodies at 1 MB
|
||||||
|
by default. The cap is necessary defense against unbounded-body DOS
|
||||||
|
but catches legitimate operator workflows:
|
||||||
|
|
||||||
|
- Bulk truststore PEM bundle uploads (CA bundles for federated trust
|
||||||
|
stores can be > 1 MB).
|
||||||
|
- Multi-MB CRL pushes via the CRL-cache endpoint.
|
||||||
|
- Bulk-import of certificates with embedded chains.
|
||||||
|
|
||||||
|
SEC-L3 closure: operators raise the cap via `CERTCTL_MAX_BODY_SIZE`
|
||||||
|
(units: bytes; e.g. `CERTCTL_MAX_BODY_SIZE=10485760` for 10 MB).
|
||||||
|
Document in `deploy/ENVIRONMENTS.md`.
|
||||||
|
|
||||||
|
### Demo Compose placeholder credentials
|
||||||
|
|
||||||
|
`deploy/docker-compose.demo.yml` ships `CERTCTL_AUTH_SECRET=change-me-in-production`,
|
||||||
|
`CERTCTL_CONFIG_ENCRYPTION_KEY=change-me-32-char-encryption-key`, and
|
||||||
|
`CERTCTL_API_KEY=change-me-in-production` as documented demo
|
||||||
|
defaults. The runtime `Validate()` fail-closed guards
|
||||||
|
(`internal/config/config.go::Validate`, Bundle 2 2026-05-12) refuse
|
||||||
|
to start if those literal strings reach a non-demo config. Phase 2
|
||||||
|
DEPL-M2 closure adds a CI guard
|
||||||
|
(`scripts/ci-guards/no-change-me-in-prod-compose.sh`) that fails the
|
||||||
|
build at PR time if a `change-me-*` literal leaks into a non-demo
|
||||||
|
compose file — catching the regression one layer before the runtime
|
||||||
|
guard fires.
|
||||||
|
|
||||||
|
### Kubernetes NetworkPolicy — operator-opt-in
|
||||||
|
|
||||||
|
`deploy/helm/certctl/templates/networkpolicy.yaml` ships the template
|
||||||
|
but `deploy/helm/certctl/values.yaml` defaults `networkPolicy.enabled:
|
||||||
|
false`. DEPL-M3 rationale: most Kubernetes clusters don't have a
|
||||||
|
NetworkPolicy controller installed (kind / minikube / fresh k3s); a
|
||||||
|
default-enabled NetworkPolicy renders fine but produces no
|
||||||
|
enforcement, and bare-metal `kube-router`-style controllers may
|
||||||
|
interpret a permissive default differently. Production deploys with a
|
||||||
|
real NetworkPolicy controller (Calico, Cilium, Antrea) flip the
|
||||||
|
values key to `true` and tune the policy in their values overlay.
|
||||||
|
Document the production-enable in
|
||||||
|
`docs/operator/runbooks/ha.md` (added Phase 2 DEPL-H1).
|
||||||
|
|
||||||
## Reporting a vulnerability
|
## Reporting a vulnerability
|
||||||
|
|
||||||
Email `certctl@proton.me`. Coordinated disclosure preferred; we will
|
Email `certctl@proton.me`. Coordinated disclosure preferred; we will
|
||||||
|
|||||||
+148
-2
@@ -2,6 +2,7 @@ package config
|
|||||||
|
|
||||||
import (
|
import (
|
||||||
"crypto/tls"
|
"crypto/tls"
|
||||||
|
"errors"
|
||||||
"fmt"
|
"fmt"
|
||||||
"log/slog"
|
"log/slog"
|
||||||
"net"
|
"net"
|
||||||
@@ -11,6 +12,54 @@ import (
|
|||||||
"time"
|
"time"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
// Phase 2 (Default Hardening + Operator Docs) introduced three new
|
||||||
|
// error sentinels surfaced by Validate(). Tests pin them by
|
||||||
|
// errors.Is(err, ErrX) AND by message-text match for double safety;
|
||||||
|
// downstream callers may inspect the wrapped chain to react to a
|
||||||
|
// specific failure class without parsing the user-facing message.
|
||||||
|
//
|
||||||
|
// All three are staged behavior. Their default in this release is
|
||||||
|
// "off / opt-in" — production deploys must explicitly acknowledge to
|
||||||
|
// activate. The default-flip schedule lives in WORKSPACE-ROADMAP.md.
|
||||||
|
var (
|
||||||
|
// ErrAgentBootstrapTokenRequired is returned by Validate() when
|
||||||
|
// CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true and the token is
|
||||||
|
// empty. Phase 2 SEC-H1 closure — staged feature flag (default
|
||||||
|
// false this release; default true scheduled for v2.2.0).
|
||||||
|
ErrAgentBootstrapTokenRequired = errors.New(
|
||||||
|
"CERTCTL_AGENT_BOOTSTRAP_TOKEN is empty and CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true — refuse to start. " +
|
||||||
|
"Generate a real secret (e.g. openssl rand -base64 32) and set CERTCTL_AGENT_BOOTSTRAP_TOKEN, " +
|
||||||
|
"or unset CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY to keep the warn-mode pass-through default",
|
||||||
|
)
|
||||||
|
|
||||||
|
// ErrACMEInsecureWithoutAck is returned by Validate() when
|
||||||
|
// CERTCTL_ACME_INSECURE=true and CERTCTL_ACME_INSECURE_ACK is missing
|
||||||
|
// or false. Phase 2 SEC-M4 closure: upgrade the existing boot-time
|
||||||
|
// WARN log to a hard refuse-to-start gate behind an explicit ACK.
|
||||||
|
ErrACMEInsecureWithoutAck = errors.New(
|
||||||
|
"CERTCTL_ACME_INSECURE=true but CERTCTL_ACME_INSECURE_ACK is not true — refuse to start. " +
|
||||||
|
"ACME directory TLS verification is DISABLED; every round-trip skips certificate chain validation. " +
|
||||||
|
"Production deploys MUST NOT enable this. To unlock for dev / Pebble / step-ca with self-signed roots, " +
|
||||||
|
"set CERTCTL_ACME_INSECURE_ACK=true alongside CERTCTL_ACME_INSECURE=true",
|
||||||
|
)
|
||||||
|
|
||||||
|
// ErrDemoModeAckExpired is returned by Validate() when
|
||||||
|
// DemoModeAck=true and CERTCTL_DEMO_MODE_ACK_TS is missing or older
|
||||||
|
// than 24h. Phase 2 SEC-H3 closure: the sticky DemoModeAck bit now
|
||||||
|
// expires, forcing operators of accidentally-promoted demo
|
||||||
|
// deployments to re-acknowledge the synthetic-admin posture daily.
|
||||||
|
ErrDemoModeAckExpired = errors.New(
|
||||||
|
"CERTCTL_DEMO_MODE_ACK=true requires CERTCTL_DEMO_MODE_ACK_TS=<unix-epoch> set within the last 24h — refuse to start. " +
|
||||||
|
"This guard catches forgotten demo deployments accidentally left in production. " +
|
||||||
|
"Set CERTCTL_DEMO_MODE_ACK_TS=$(date +%s) at every compose-up; the demo compose helper script does this automatically",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
// demoModeAckMaxAge is the maximum allowable age of
|
||||||
|
// CERTCTL_DEMO_MODE_ACK_TS before the demo-mode ACK is considered
|
||||||
|
// expired. Hard-coded at 24h per Phase 2 SEC-H3.
|
||||||
|
const demoModeAckMaxAge = 24 * time.Hour
|
||||||
|
|
||||||
// Config represents the complete application configuration.
|
// Config represents the complete application configuration.
|
||||||
// All configuration values are read from environment variables with CERTCTL_ prefix.
|
// All configuration values are read from environment variables with CERTCTL_ prefix.
|
||||||
type Config struct {
|
type Config struct {
|
||||||
@@ -655,6 +704,20 @@ type ACMEConfig struct {
|
|||||||
// Only use for testing with self-signed ACME servers like Pebble. Never in production.
|
// Only use for testing with self-signed ACME servers like Pebble. Never in production.
|
||||||
// Setting: CERTCTL_ACME_INSECURE environment variable.
|
// Setting: CERTCTL_ACME_INSECURE environment variable.
|
||||||
Insecure bool
|
Insecure bool
|
||||||
|
|
||||||
|
// InsecureAck is the Phase 2 SEC-M4 closure (2026-05-13): when
|
||||||
|
// Insecure=true, Validate() refuses to start unless InsecureAck is
|
||||||
|
// also true. Pre-Phase-2 the Insecure flag only emitted a boot-time
|
||||||
|
// WARN log; this guard converts that to a hard fail-closed gate so
|
||||||
|
// the dev-only escape hatch cannot be flipped accidentally in
|
||||||
|
// production via a copy-pasted Pebble runbook.
|
||||||
|
//
|
||||||
|
// Acknowledged (Insecure=true + InsecureAck=true): boot proceeds + WARN logs.
|
||||||
|
// Unack'd (Insecure=true + InsecureAck=false): ErrACMEInsecureWithoutAck.
|
||||||
|
// Off (Insecure=false): InsecureAck is ignored entirely.
|
||||||
|
//
|
||||||
|
// Setting: CERTCTL_ACME_INSECURE_ACK environment variable.
|
||||||
|
InsecureAck bool
|
||||||
}
|
}
|
||||||
|
|
||||||
// ACMEServerConfig is the SERVER-side ACME (RFC 8555 + RFC 9773 ARI)
|
// ACMEServerConfig is the SERVER-side ACME (RFC 8555 + RFC 9773 ARI)
|
||||||
@@ -1590,6 +1653,16 @@ type AuthConfig struct {
|
|||||||
// Setting: CERTCTL_AGENT_BOOTSTRAP_TOKEN environment variable.
|
// Setting: CERTCTL_AGENT_BOOTSTRAP_TOKEN environment variable.
|
||||||
AgentBootstrapToken string
|
AgentBootstrapToken string
|
||||||
|
|
||||||
|
// AgentBootstrapTokenDenyEmpty is the staged feature flag for SEC-H1
|
||||||
|
// (Phase 2, 2026-05-13). When true AND AgentBootstrapToken is empty,
|
||||||
|
// Validate() returns ErrAgentBootstrapTokenRequired and the server
|
||||||
|
// refuses to start. Default: false (warn-mode pass-through preserved
|
||||||
|
// for backward compatibility with operators on the v2.1.x line).
|
||||||
|
// WORKSPACE-ROADMAP.md schedules the default flip to true for the
|
||||||
|
// v2.2.0 cut — operators get one upgrade-window to set a real token.
|
||||||
|
// Setting: CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY environment variable.
|
||||||
|
AgentBootstrapTokenDenyEmpty bool
|
||||||
|
|
||||||
// Session holds the Auth Bundle 2 Phase 4 session-service tunables.
|
// Session holds the Auth Bundle 2 Phase 4 session-service tunables.
|
||||||
// Defaults are documented on the SessionConfig fields. The session
|
// Defaults are documented on the SessionConfig fields. The session
|
||||||
// service is wired into cmd/server/main.go alongside the OIDC
|
// service is wired into cmd/server/main.go alongside the OIDC
|
||||||
@@ -1624,6 +1697,22 @@ type AuthConfig struct {
|
|||||||
// Setting: CERTCTL_DEMO_MODE_ACK environment variable.
|
// Setting: CERTCTL_DEMO_MODE_ACK environment variable.
|
||||||
DemoModeAck bool
|
DemoModeAck bool
|
||||||
|
|
||||||
|
// DemoModeAckTS is the unix-epoch timestamp at which DemoModeAck was
|
||||||
|
// last acknowledged. Phase 2 SEC-H3 closure (2026-05-13): the sticky
|
||||||
|
// DemoModeAck bit now expires after 24h. When DemoModeAck=true,
|
||||||
|
// Validate() requires DemoModeAckTS to be set AND parse as a unix
|
||||||
|
// epoch within the last demoModeAckMaxAge (24h); otherwise
|
||||||
|
// ErrDemoModeAckExpired fires and the server refuses to start.
|
||||||
|
//
|
||||||
|
// This catches the canonical "demo deployment accidentally
|
||||||
|
// promoted to production and forgotten about" failure mode: the
|
||||||
|
// container restart that re-loads config now refuses unless the
|
||||||
|
// operator re-supplies a fresh timestamp.
|
||||||
|
//
|
||||||
|
// Setting: CERTCTL_DEMO_MODE_ACK_TS (unix epoch, e.g. `$(date +%s)`).
|
||||||
|
// The demo compose helper sets this automatically at compose-up.
|
||||||
|
DemoModeAckTS string
|
||||||
|
|
||||||
// DemoModeResidualStrict refuses startup when Auth.Type != none
|
// DemoModeResidualStrict refuses startup when Auth.Type != none
|
||||||
// and `actor-demo-anon` has residual role grants in actor_roles.
|
// and `actor-demo-anon` has residual role grants in actor_roles.
|
||||||
// Default false (emit WARN log + audit row instead). Audit
|
// Default false (emit WARN log + audit row instead). Audit
|
||||||
@@ -1915,7 +2004,8 @@ func Load() (*Config, error) {
|
|||||||
Secret: getEnv("CERTCTL_AUTH_SECRET", ""),
|
Secret: getEnv("CERTCTL_AUTH_SECRET", ""),
|
||||||
// Audit 2026-05-10 HIGH-12 closure: required-true to allow
|
// Audit 2026-05-10 HIGH-12 closure: required-true to allow
|
||||||
// CERTCTL_AUTH_TYPE=none with a non-loopback listen address.
|
// CERTCTL_AUTH_TYPE=none with a non-loopback listen address.
|
||||||
DemoModeAck: getEnvBool("CERTCTL_DEMO_MODE_ACK", false),
|
DemoModeAck: getEnvBool("CERTCTL_DEMO_MODE_ACK", false),
|
||||||
|
DemoModeAckTS: getEnv("CERTCTL_DEMO_MODE_ACK_TS", ""),
|
||||||
// Audit 2026-05-11 A-8 closure: when true, the preflight
|
// Audit 2026-05-11 A-8 closure: when true, the preflight
|
||||||
// residual-grants detector refuses startup if actor-demo-anon
|
// residual-grants detector refuses startup if actor-demo-anon
|
||||||
// has any actor_roles rows. Default false (WARN-only).
|
// has any actor_roles rows. Default false (WARN-only).
|
||||||
@@ -1927,7 +2017,8 @@ func Load() (*Config, error) {
|
|||||||
|
|
||||||
// Bundle-5 / Audit H-007: agent-registration bootstrap secret.
|
// Bundle-5 / Audit H-007: agent-registration bootstrap secret.
|
||||||
// Empty (default) = warn-mode pass-through; v2.2.0 will require it.
|
// Empty (default) = warn-mode pass-through; v2.2.0 will require it.
|
||||||
AgentBootstrapToken: getEnv("CERTCTL_AGENT_BOOTSTRAP_TOKEN", ""),
|
AgentBootstrapToken: getEnv("CERTCTL_AGENT_BOOTSTRAP_TOKEN", ""),
|
||||||
|
AgentBootstrapTokenDenyEmpty: getEnvBool("CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY", false),
|
||||||
// Bundle 1 Phase 6: one-shot bootstrap token for the
|
// Bundle 1 Phase 6: one-shot bootstrap token for the
|
||||||
// /v1/auth/bootstrap endpoint that mints the first admin
|
// /v1/auth/bootstrap endpoint that mints the first admin
|
||||||
// key. Empty = bootstrap endpoint disabled (default).
|
// key. Empty = bootstrap endpoint disabled (default).
|
||||||
@@ -2114,6 +2205,7 @@ func Load() (*Config, error) {
|
|||||||
Profile: getEnv("CERTCTL_ACME_PROFILE", ""),
|
Profile: getEnv("CERTCTL_ACME_PROFILE", ""),
|
||||||
ARIEnabled: getEnvBool("CERTCTL_ACME_ARI_ENABLED", false),
|
ARIEnabled: getEnvBool("CERTCTL_ACME_ARI_ENABLED", false),
|
||||||
Insecure: getEnvBool("CERTCTL_ACME_INSECURE", false),
|
Insecure: getEnvBool("CERTCTL_ACME_INSECURE", false),
|
||||||
|
InsecureAck: getEnvBool("CERTCTL_ACME_INSECURE_ACK", false),
|
||||||
},
|
},
|
||||||
// ACME server (RFC 8555 + RFC 9773 ARI) — distinct from the
|
// ACME server (RFC 8555 + RFC 9773 ARI) — distinct from the
|
||||||
// consumer-side ACME issuer connector above. Server uses
|
// consumer-side ACME issuer connector above. Server uses
|
||||||
@@ -2603,6 +2695,60 @@ func (c *Config) Validate() error {
|
|||||||
return fmt.Errorf("auth secret is required for auth type %s", c.Auth.Type)
|
return fmt.Errorf("auth secret is required for auth type %s", c.Auth.Type)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Phase 2 SEC-H1 closure (2026-05-13): the AgentBootstrapTokenDenyEmpty
|
||||||
|
// staged feature flag. When the operator opts in via
|
||||||
|
// CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true AND the bootstrap
|
||||||
|
// token is empty, Validate() returns a fail-closed error. Default
|
||||||
|
// flag value is false, preserving the existing v2.1.x warn-mode
|
||||||
|
// pass-through behavior for backward compatibility. The default-flip
|
||||||
|
// to true is scheduled for v2.2.0 in WORKSPACE-ROADMAP.md — operators
|
||||||
|
// get one upgrade window to set a real token.
|
||||||
|
if c.Auth.AgentBootstrapTokenDenyEmpty && c.Auth.AgentBootstrapToken == "" {
|
||||||
|
return fmt.Errorf("phase-2 SEC-H1 fail-closed guard: %w", ErrAgentBootstrapTokenRequired)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Phase 2 SEC-M4 closure (2026-05-13): convert the existing boot-time
|
||||||
|
// WARN log for CERTCTL_ACME_INSECURE=true into a hard refuse-to-start
|
||||||
|
// gate behind an explicit ACK env var. The dev-only escape hatch can
|
||||||
|
// no longer be flipped accidentally via a copy-pasted Pebble runbook
|
||||||
|
// — production deploys must explicitly set both Insecure=true AND
|
||||||
|
// InsecureAck=true to acknowledge they understand the consequences.
|
||||||
|
// The boot-time WARN log path in cmd/server/main.go continues to fire
|
||||||
|
// for the ACK'd case so the operator sees the reminder every restart.
|
||||||
|
if c.ACME.Insecure && !c.ACME.InsecureAck {
|
||||||
|
return fmt.Errorf("phase-2 SEC-M4 fail-closed guard: %w", ErrACMEInsecureWithoutAck)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Phase 2 SEC-H3 closure (2026-05-13): the sticky DemoModeAck bit
|
||||||
|
// now expires after demoModeAckMaxAge (24h). When the operator sets
|
||||||
|
// CERTCTL_DEMO_MODE_ACK=true, they MUST also set
|
||||||
|
// CERTCTL_DEMO_MODE_ACK_TS=$(date +%s) and re-supply it within the
|
||||||
|
// 24h window on every restart. The demo compose helper script does
|
||||||
|
// this automatically at compose-up. Catches the canonical
|
||||||
|
// "forgotten demo deployment promoted to production" failure mode:
|
||||||
|
// the next container restart refuses unless the operator re-acks.
|
||||||
|
if c.Auth.DemoModeAck {
|
||||||
|
if c.Auth.DemoModeAckTS == "" {
|
||||||
|
return fmt.Errorf("phase-2 SEC-H3 fail-closed guard (missing TS): %w", ErrDemoModeAckExpired)
|
||||||
|
}
|
||||||
|
ackEpoch, err := strconv.ParseInt(strings.TrimSpace(c.Auth.DemoModeAckTS), 10, 64)
|
||||||
|
if err != nil {
|
||||||
|
return fmt.Errorf("phase-2 SEC-H3 fail-closed guard: CERTCTL_DEMO_MODE_ACK_TS=%q must parse as a unix epoch integer (try $(date +%%s)); parse error %w: %w",
|
||||||
|
c.Auth.DemoModeAckTS, err, ErrDemoModeAckExpired)
|
||||||
|
}
|
||||||
|
ackTime := time.Unix(ackEpoch, 0)
|
||||||
|
if time.Since(ackTime) > demoModeAckMaxAge {
|
||||||
|
return fmt.Errorf("phase-2 SEC-H3 fail-closed guard (TS age %s exceeds %s): %w",
|
||||||
|
time.Since(ackTime).Round(time.Second), demoModeAckMaxAge, ErrDemoModeAckExpired)
|
||||||
|
}
|
||||||
|
// Future-dated timestamps are also rejected — likely operator clock skew
|
||||||
|
// or a typo. Allow a small future skew (1m) to absorb minor clock drift.
|
||||||
|
if time.Until(ackTime) > time.Minute {
|
||||||
|
return fmt.Errorf("phase-2 SEC-H3 fail-closed guard (TS is %s in the future, exceeds 1m clock-skew tolerance): %w",
|
||||||
|
time.Until(ackTime).Round(time.Second), ErrDemoModeAckExpired)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// Audit 2026-05-10 HIGH-12 closure: refuse to start when
|
// Audit 2026-05-10 HIGH-12 closure: refuse to start when
|
||||||
// CERTCTL_AUTH_TYPE=none is bound to a non-loopback address unless
|
// CERTCTL_AUTH_TYPE=none is bound to a non-loopback address unless
|
||||||
// the operator explicitly acknowledges the bypass via
|
// the operator explicitly acknowledges the bypass via
|
||||||
|
|||||||
@@ -7,10 +7,12 @@ import (
|
|||||||
"crypto/x509"
|
"crypto/x509"
|
||||||
"crypto/x509/pkix"
|
"crypto/x509/pkix"
|
||||||
"encoding/pem"
|
"encoding/pem"
|
||||||
|
"errors"
|
||||||
"log/slog"
|
"log/slog"
|
||||||
"math/big"
|
"math/big"
|
||||||
"os"
|
"os"
|
||||||
"path/filepath"
|
"path/filepath"
|
||||||
|
"strconv"
|
||||||
"strings"
|
"strings"
|
||||||
"testing"
|
"testing"
|
||||||
"time"
|
"time"
|
||||||
@@ -398,6 +400,233 @@ func TestLoad_CommaSeparatedList(t *testing.T) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Phase 2 SEC-H1 (2026-05-13) — AgentBootstrapTokenDenyEmpty staged flag.
|
||||||
|
// When false (default), an empty token is permitted (v2.1.x warn-mode
|
||||||
|
// pass-through preserved). When true, an empty token fails closed.
|
||||||
|
func TestValidate_AgentBootstrapTokenDenyEmpty_DefaultFalse_AllowsEmpty(t *testing.T) {
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", AgentBootstrapToken: "", AgentBootstrapTokenDenyEmpty: false},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
if err := cfg.Validate(); err != nil {
|
||||||
|
t.Fatalf("Validate() returned error with deny-empty=false + empty token: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_AgentBootstrapTokenDenyEmpty_True_EmptyTokenFailsClosed(t *testing.T) {
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", AgentBootstrapToken: "", AgentBootstrapTokenDenyEmpty: true},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
err := cfg.Validate()
|
||||||
|
if err == nil {
|
||||||
|
t.Fatal("Validate() returned nil; want ErrAgentBootstrapTokenRequired")
|
||||||
|
}
|
||||||
|
if !errors.Is(err, ErrAgentBootstrapTokenRequired) {
|
||||||
|
t.Errorf("Validate() err = %v; want errors.Is to match ErrAgentBootstrapTokenRequired", err)
|
||||||
|
}
|
||||||
|
if !strings.Contains(err.Error(), "CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true") {
|
||||||
|
t.Errorf("Validate() error = %q; want message to mention the deny-empty env var name", err.Error())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_AgentBootstrapTokenDenyEmpty_True_RealTokenPasses(t *testing.T) {
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", AgentBootstrapToken: "a-real-32-byte-token-value-here-x", AgentBootstrapTokenDenyEmpty: true},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
if err := cfg.Validate(); err != nil {
|
||||||
|
t.Fatalf("Validate() returned error with deny-empty=true + real token: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Phase 2 SEC-M4 (2026-05-13) — ACME insecure now requires explicit ACK.
|
||||||
|
func TestValidate_ACMEInsecure_WithoutAck_FailsClosed(t *testing.T) {
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret"},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
ACME: ACMEConfig{Insecure: true, InsecureAck: false},
|
||||||
|
}
|
||||||
|
err := cfg.Validate()
|
||||||
|
if err == nil {
|
||||||
|
t.Fatal("Validate() returned nil; want ErrACMEInsecureWithoutAck")
|
||||||
|
}
|
||||||
|
if !errors.Is(err, ErrACMEInsecureWithoutAck) {
|
||||||
|
t.Errorf("Validate() err = %v; want errors.Is to match ErrACMEInsecureWithoutAck", err)
|
||||||
|
}
|
||||||
|
if !strings.Contains(err.Error(), "CERTCTL_ACME_INSECURE_ACK") {
|
||||||
|
t.Errorf("Validate() error = %q; want message to mention CERTCTL_ACME_INSECURE_ACK", err.Error())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_ACMEInsecure_WithAck_Passes(t *testing.T) {
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret"},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
ACME: ACMEConfig{Insecure: true, InsecureAck: true},
|
||||||
|
}
|
||||||
|
if err := cfg.Validate(); err != nil {
|
||||||
|
t.Fatalf("Validate() returned error with Insecure=true + InsecureAck=true: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_ACMEInsecureFalse_IgnoresAck(t *testing.T) {
|
||||||
|
// InsecureAck is irrelevant when Insecure=false. No fail-closed branch.
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret"},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
ACME: ACMEConfig{Insecure: false, InsecureAck: false},
|
||||||
|
}
|
||||||
|
if err := cfg.Validate(); err != nil {
|
||||||
|
t.Fatalf("Validate() returned error with Insecure=false: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Phase 2 SEC-H3 (2026-05-13) — DemoModeAck now expires after 24h via DemoModeAckTS.
|
||||||
|
// Note: DemoModeAck=true on a loopback bind requires only the timestamp guard;
|
||||||
|
// no HIGH-12 cross-firing because the existing HIGH-12 guard fires only on
|
||||||
|
// non-loopback hosts. All tests here keep the server host as loopback so we
|
||||||
|
// observe ONLY the new SEC-H3 behavior.
|
||||||
|
func TestValidate_DemoModeAck_MissingTS_FailsClosed(t *testing.T) {
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: ""},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
err := cfg.Validate()
|
||||||
|
if err == nil {
|
||||||
|
t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with empty TS")
|
||||||
|
}
|
||||||
|
if !errors.Is(err, ErrDemoModeAckExpired) {
|
||||||
|
t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
|
||||||
|
}
|
||||||
|
if !strings.Contains(err.Error(), "CERTCTL_DEMO_MODE_ACK_TS") {
|
||||||
|
t.Errorf("Validate() error = %q; want message to mention CERTCTL_DEMO_MODE_ACK_TS", err.Error())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_DemoModeAck_StaleTS_FailsClosed(t *testing.T) {
|
||||||
|
// TS older than 24h → expired.
|
||||||
|
staleEpoch := time.Now().Add(-25 * time.Hour).Unix()
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: strconv.FormatInt(staleEpoch, 10)},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
err := cfg.Validate()
|
||||||
|
if err == nil {
|
||||||
|
t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with 25h-old TS")
|
||||||
|
}
|
||||||
|
if !errors.Is(err, ErrDemoModeAckExpired) {
|
||||||
|
t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_DemoModeAck_FreshTS_Passes(t *testing.T) {
|
||||||
|
// TS within 24h → passes.
|
||||||
|
freshEpoch := time.Now().Add(-1 * time.Hour).Unix()
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: strconv.FormatInt(freshEpoch, 10)},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
if err := cfg.Validate(); err != nil {
|
||||||
|
t.Fatalf("Validate() returned error with 1h-old TS: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_DemoModeAck_NonNumericTS_FailsClosed(t *testing.T) {
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: "yesterday"},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
err := cfg.Validate()
|
||||||
|
if err == nil {
|
||||||
|
t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with non-numeric TS")
|
||||||
|
}
|
||||||
|
if !errors.Is(err, ErrDemoModeAckExpired) {
|
||||||
|
t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
|
||||||
|
}
|
||||||
|
if !strings.Contains(err.Error(), "parse") {
|
||||||
|
t.Errorf("Validate() error = %q; want message to mention parse failure", err.Error())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_DemoModeAck_FutureDatedTS_FailsClosed(t *testing.T) {
|
||||||
|
// > 1m future-dated → clock-skew rejection.
|
||||||
|
futureEpoch := time.Now().Add(10 * time.Minute).Unix()
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: strconv.FormatInt(futureEpoch, 10)},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
err := cfg.Validate()
|
||||||
|
if err == nil {
|
||||||
|
t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with future-dated TS")
|
||||||
|
}
|
||||||
|
if !errors.Is(err, ErrDemoModeAckExpired) {
|
||||||
|
t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
|
||||||
|
}
|
||||||
|
if !strings.Contains(err.Error(), "future") {
|
||||||
|
t.Errorf("Validate() error = %q; want message to mention future-dated TS", err.Error())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestValidate_DemoModeAckFalse_IgnoresTS(t *testing.T) {
|
||||||
|
// DemoModeAck=false → TS is irrelevant; no fail-closed branch.
|
||||||
|
cfg := &Config{
|
||||||
|
Server: validServerConfig(t),
|
||||||
|
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
|
||||||
|
Log: LogConfig{Level: "info", Format: "json"},
|
||||||
|
Auth: AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: false, DemoModeAckTS: ""},
|
||||||
|
Keygen: KeygenConfig{Mode: "agent"},
|
||||||
|
Scheduler: validSchedulerConfig(),
|
||||||
|
}
|
||||||
|
if err := cfg.Validate(); err != nil {
|
||||||
|
t.Fatalf("Validate() returned error with DemoModeAck=false: %v", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
func TestValidate_ValidConfig(t *testing.T) {
|
func TestValidate_ValidConfig(t *testing.T) {
|
||||||
cfg := &Config{
|
cfg := &Config{
|
||||||
Server: validServerConfig(t),
|
Server: validServerConfig(t),
|
||||||
|
|||||||
+56
@@ -0,0 +1,56 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# scripts/ci-guards/no-change-me-in-prod-compose.sh
|
||||||
|
#
|
||||||
|
# Phase 2 DEPL-M2 closure (2026-05-13): the demo Compose overlay
|
||||||
|
# (`deploy/docker-compose.demo.yml`) intentionally ships placeholder
|
||||||
|
# `change-me-*` credentials (CERTCTL_AUTH_SECRET=change-me-in-production,
|
||||||
|
# CERTCTL_CONFIG_ENCRYPTION_KEY=change-me-32-char-encryption-key, etc.)
|
||||||
|
# behind the DemoModeAck=true exemption in Validate(). The runtime
|
||||||
|
# fail-closed guards in internal/config/config.go::Validate (Bundle 2
|
||||||
|
# 2026-05-12) refuse to start when those strings reach a non-demo
|
||||||
|
# config, so the runtime path is protected.
|
||||||
|
#
|
||||||
|
# This guard catches the SAME class of mistake one layer earlier — at
|
||||||
|
# CI / PR-review time, before the change reaches any operator's
|
||||||
|
# workstation. Specifically: any non-demo compose file (base
|
||||||
|
# `docker-compose.yml`, `docker-compose.dev.yml`, or any operator-
|
||||||
|
# authored overlay that doesn't carry the `.demo.yml` suffix) must
|
||||||
|
# never contain a `change-me-` literal.
|
||||||
|
#
|
||||||
|
# This is belt-and-suspenders alongside the runtime guard: a
|
||||||
|
# placeholder leaking into the base compose surfaces here BEFORE any
|
||||||
|
# operator's `docker compose up` triggers the fail-closed boot.
|
||||||
|
#
|
||||||
|
# Allowlist: deploy/docker-compose.demo.yml — the demo overlay
|
||||||
|
# legitimately ships the placeholder strings as documented defaults.
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
# Scan every tracked compose file under deploy/ except the demo overlay.
|
||||||
|
# Exclude comment lines (starting with optional whitespace then `#`) so
|
||||||
|
# documentation discussing the placeholder pattern doesn't trip the guard.
|
||||||
|
# The remaining matches are actual YAML key=value or env: lines.
|
||||||
|
VIOLATIONS=$(git ls-files 'deploy/*compose*.yml' 'deploy/*compose*.yaml' \
|
||||||
|
| grep -v 'docker-compose\.demo\.yml$' \
|
||||||
|
| xargs grep -nE 'change-me-' 2>/dev/null \
|
||||||
|
| grep -vE ':\s*#' \
|
||||||
|
|| true)
|
||||||
|
|
||||||
|
if [ -n "$VIOLATIONS" ]; then
|
||||||
|
echo "::error::DEPL-M2 regression: 'change-me-' placeholder credential leaked into a non-demo compose file."
|
||||||
|
echo ""
|
||||||
|
echo "Placeholder credentials are exempt only in deploy/docker-compose.demo.yml"
|
||||||
|
echo "(the demo overlay sets DemoModeAck=true at runtime, which unlocks them via"
|
||||||
|
echo "Validate()'s Bundle 2 fail-closed guards). Production / dev / staging compose"
|
||||||
|
echo "files must use real values."
|
||||||
|
echo ""
|
||||||
|
echo "Violations:"
|
||||||
|
echo "$VIOLATIONS"
|
||||||
|
echo ""
|
||||||
|
echo "Fix: either move the offending compose to the .demo.yml overlay, or replace"
|
||||||
|
echo "the placeholder with an env-var interpolation (\${CERTCTL_AUTH_SECRET:?required})"
|
||||||
|
echo "that fails compose-up cleanly when the operator forgot to set it."
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
echo "no-change-me-in-prod-compose guard OK: no 'change-me-' placeholders in non-demo compose files"
|
||||||
Reference in New Issue
Block a user