certctl/scripts/ci-guards/no-change-me-in-prod-compose.sh
shankar0123 69a2b5c55a config: default hardening + operator docs (Phase 2 closure — SEC-H1, SEC-H3, SEC-M4, DEPL-H1, DEPL-M2 + doc-only carve-outs)
Eleven findings from the architecture diligence audit's Phase 2 bundle
closed in one PR. All touch the same backend config + Helm chart +
operator docs surface, so reviewing in one diff is the natural fit.

config.go: three new fail-closed Validate() branches behind sentinels
=====================================================================

Three new error sentinels exported from internal/config/config.go for
tests to pin via errors.Is + message-text:
  - ErrAgentBootstrapTokenRequired (SEC-H1)
  - ErrACMEInsecureWithoutAck      (SEC-M4)
  - ErrDemoModeAckExpired          (SEC-H3)

SEC-H1 (staged): introduces CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY
as an opt-in feature flag. When true AND the bootstrap token is empty,
Validate() returns ErrAgentBootstrapTokenRequired and the server
refuses to start. Default in THIS release: false (warn-mode
pass-through preserved). WORKSPACE-ROADMAP.md schedules the default
flip to true for v2.2.0 — operators get one upgrade window.
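
As an illustration of the staged SEC-H1 rule, an operator-side preflight
could mirror it in shell. This is a minimal sketch, not the Go code in
Validate(): the helper name check_bootstrap_token is hypothetical, and it
assumes the token is read from CERTCTL_AGENT_BOOTSTRAP_TOKEN and that only
the literal string "true" enables the flag.

```shell
# Hypothetical preflight helper; not part of certctl.
# Mirrors the rule described above: refuse only when the deny-empty flag
# is "true" AND the bootstrap token is empty.
check_bootstrap_token() {
  deny_empty="${CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY:-false}"
  token="${CERTCTL_AGENT_BOOTSTRAP_TOKEN:-}"  # assumed variable name
  if [ "$deny_empty" = "true" ] && [ -z "$token" ]; then
    echo "refusing: bootstrap token is empty and DENY_EMPTY is true" >&2
    return 1
  fi
  return 0
}
```

With the flag unset the check passes, matching this release's warn-mode
default. Setting CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true before
v2.2.0 lets an operator rehearse the future default.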

SEC-M4: upgrades the existing boot-time WARN log for
CERTCTL_ACME_INSECURE=true into a hard refuse-to-start gate behind
CERTCTL_ACME_INSECURE_ACK=true. The ACK env var must be paired with
the existing INSECURE flag; either alone fails closed. The boot-time
WARN log at cmd/server/main.go:611 continues to fire for the ACK'd
case so every restart logs the reminder.
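
The SEC-M4 pairing rule can be sketched the same way. This is an
illustrative shell mirror of the behaviour described above, not the
actual Validate() branch: check_acme_insecure is a hypothetical name, and
treating only the literal "true" as enabled is an assumption.

```shell
# Hypothetical preflight helper; not part of certctl.
# Both variables set to "true": allowed, with a reminder on every run.
# Exactly one of them set to "true": fail closed.
# Neither set: nothing to check.
check_acme_insecure() {
  insecure="${CERTCTL_ACME_INSECURE:-false}"
  ack="${CERTCTL_ACME_INSECURE_ACK:-false}"
  if [ "$insecure" = "true" ] && [ "$ack" = "true" ]; then
    echo "WARN: CERTCTL_ACME_INSECURE=true is acknowledged" >&2
    return 0
  fi
  if [ "$insecure" = "true" ] || [ "$ack" = "true" ]; then
    echo "refusing: CERTCTL_ACME_INSECURE and CERTCTL_ACME_INSECURE_ACK must be set together" >&2
    return 1
  fi
  return 0
}
```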

SEC-H3: tightens the sticky DemoModeAck bit so it expires after 24h.
When DemoModeAck=true, Validate() now requires CERTCTL_DEMO_MODE_ACK_TS
to be set as a unix-epoch timestamp within the last 24h (24h-tolerance
on the past side, 1-minute clock-skew on the future side). Catches the
"forgotten demo deployment promoted to production" failure mode —
next container restart past 24h refuses unless re-ack'd.
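
The SEC-H3 acceptance window reduces to two integer comparisons. A
minimal shell sketch follows, assuming both bounds are inclusive (the
commit does not say) and using a hypothetical helper name,
demo_ack_ts_valid; the second argument exists only so the check can be
exercised with a fixed "now".

```shell
# Hypothetical helper; not part of certctl.
# Usage: demo_ack_ts_valid <ack_ts> [now]   (both unix-epoch seconds)
demo_ack_ts_valid() {
  ts="${1:-}"
  now="${2:-$(date +%s)}"
  # Reject empty or non-numeric input before doing arithmetic.
  case "$ts" in
    ''|*[!0-9]*) return 1 ;;
  esac
  max_age=86400   # 24h tolerance on the past side
  max_skew=60     # 1-minute clock skew on the future side
  [ "$ts" -ge $((now - max_age)) ] && [ "$ts" -le $((now + max_skew)) ]
}
```

Re-acknowledging is then a matter of refreshing the timestamp before the
restart, for example CERTCTL_DEMO_MODE_ACK_TS="$(date +%s)".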

Tests in internal/config/config_test.go cover every new branch:
positive (passes when properly set), negative (each fail-closed path
fires with the matching sentinel + message-text). 11 new tests added.

Helm chart + HA runbook (DEPL-H1)
=================================

Created docs/operator/runbooks/ha.md documenting the three values
flips required for production HA: server.replicas, podDisruptionBudget,
service.sessionAffinity. Cross-link comments added to
deploy/helm/certctl/values.yaml next to the server.replicas (line 19)
and podDisruptionBudget (line 566) defaults. DEFAULTS DO NOT CHANGE
— that's the point per the prompt's 'do not flip networkPolicy default'
guidance: a default-enabled PDB blocks fresh helm install on
single-node clusters.

CI guard (DEPL-M2)
==================

scripts/ci-guards/no-change-me-in-prod-compose.sh grep-fails any
'change-me-' literal in compose files OTHER than docker-compose.demo.yml.
Catches the placeholder-credential-leak regression one layer earlier
than the runtime Validate() fail-closed guards from Bundle 2 (2026-05-12).
Excludes comment lines so docs explaining the pattern don't trip the
guard. Verified to fire on a synthetic leak; clean on the current tree.
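
The "synthetic leak" check can be reproduced outside git with a small
sketch. scan_for_placeholders is a hypothetical stand-in for the guard's
grep pipeline: it takes the file list as arguments instead of from
git ls-files, and its comment filter is an illustrative pattern anchored
to grep's file:line: prefix, not a copy of the script's own.

```shell
# Hypothetical stand-in for the guard's scan step; not the script itself.
# Prints file:line:content for every non-comment line containing
# 'change-me-' in the given files, and prints nothing when they are clean.
scan_for_placeholders() {
  grep -HnE 'change-me-' "$@" 2>/dev/null \
    | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
    || true
}
```

For example, a compose file whose only hit is a YAML comment produces no
output, while one with CERTCTL_AUTH_SECRET: change-me-in-production
reports that line.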

Consolidated 'Security carve-outs' doc section
==============================================

docs/operator/security.md grows by one new section documenting the
eight existing carve-outs in one canonical place:
  - SEC-M3: 3 InsecureSkipVerify=true sites (Agent dev, verify probe, tlsprobe)
  - SEC-M5: F5 connector InsecureSkipVerify per-config field
  - SEC-M4: ACME insecure + new ACK gate
  - SEC-L1: CSP 'unsafe-inline' on style-src (Tailwind carve-out)
  - SEC-L2: break-glass Argon2id rest-defense reminder
  - SEC-L3: 1 MB body-size cap + CERTCTL_MAX_BODY_SIZE override
  - DEPL-M2: change-me-* placeholder credentials in demo overlay
  - DEPL-M3: K8s NetworkPolicy operator-opt-in default

Each entry cites the file:line, the rationale for the carve-out, and
the operator action.

CHANGELOG + ENVIRONMENTS coverage
==================================

CHANGELOG.md grows by one new '### Breaking changes (scheduled for
v2.2.0)' section under Unreleased, documenting SEC-H1 / SEC-M4 / SEC-H3
with explicit upgrade-window guidance for each.

deploy/ENVIRONMENTS.md adds five rows: AGENT_BOOTSTRAP_TOKEN +
AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY + DEMO_MODE_ACK + DEMO_MODE_ACK_TS +
ACME_INSECURE_ACK. G-3 env-docs-drift CI guard stays clean.

WORKSPACE-ROADMAP.md (cowork-side) schedules the SEC-H1 default-flip
for v2.2.0.

Sandbox limitation
==================

The certctl repo's working tree is 6.1 GB, which fills the sandbox
volume; the go1.25.10 toolchain download (go.mod requires it, the
sandbox has 1.25.9) keeps failing with a disk-full error. Local
'go build' / 'go test' were therefore NOT run in this commit's
verification path.
make verify MUST be run on the operator's workstation before push
per CLAUDE.md operating rules.

CI guards (no-change-me, G-3 env-docs-drift, doc-rot-detector, +
all existing) verified clean by running each individually.

Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H1,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-H3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M4,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H1,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M2,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M5,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L1,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L2,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L3
2026-05-13 19:50:00 +00:00

57 lines
2.6 KiB
Bash
Executable File

#!/usr/bin/env bash
# scripts/ci-guards/no-change-me-in-prod-compose.sh
#
# Phase 2 DEPL-M2 closure (2026-05-13): the demo Compose overlay
# (`deploy/docker-compose.demo.yml`) intentionally ships placeholder
# `change-me-*` credentials (CERTCTL_AUTH_SECRET=change-me-in-production,
# CERTCTL_CONFIG_ENCRYPTION_KEY=change-me-32-char-encryption-key, etc.)
# behind the DemoModeAck=true exemption in Validate(). The runtime
# fail-closed guards in internal/config/config.go::Validate (Bundle 2
# 2026-05-12) refuse to start when those strings reach a non-demo
# config, so the runtime path is protected.
#
# This guard catches the SAME class of mistake one layer earlier — at
# CI / PR-review time, before the change reaches any operator's
# workstation. Specifically: any non-demo compose file (base
# `docker-compose.yml`, `docker-compose.dev.yml`, or any operator-
# authored overlay that doesn't carry the `.demo.yml` suffix) must
# never contain a `change-me-` literal.
#
# This is belt-and-suspenders alongside the runtime guard: a
# placeholder leaking into the base compose surfaces here BEFORE any
# operator's `docker compose up` triggers the fail-closed boot.
#
# Allowlist: deploy/docker-compose.demo.yml — the demo overlay
# legitimately ships the placeholder strings as documented defaults.
set -e
# Scan every tracked compose file under deploy/ except the demo overlay.
# Exclude comment lines (optional whitespace, then `#`) so documentation
# discussing the placeholder pattern doesn't trip the guard. grep -H forces
# the `file:line:` prefix even when only one file is scanned, and the comment
# filter is anchored to that prefix so a `:` followed by `#` later in a real
# value line cannot hide a violation.
# The remaining matches are actual YAML key=value or env: lines.
VIOLATIONS=$(git ls-files 'deploy/*compose*.yml' 'deploy/*compose*.yaml' \
  | grep -v 'docker-compose\.demo\.yml$' \
  | xargs grep -HnE 'change-me-' 2>/dev/null \
  | grep -vE '^[^:]*:[0-9]+:[[:space:]]*#' \
  || true)
if [ -n "$VIOLATIONS" ]; then
  echo "::error::DEPL-M2 regression: 'change-me-' placeholder credential leaked into a non-demo compose file."
  echo ""
  echo "Placeholder credentials are exempt only in deploy/docker-compose.demo.yml"
  echo "(the demo overlay sets DemoModeAck=true at runtime, which unlocks them via"
  echo "Validate()'s Bundle 2 fail-closed guards). Production / dev / staging compose"
  echo "files must use real values."
  echo ""
  echo "Violations:"
  echo "$VIOLATIONS"
  echo ""
  echo "Fix: either move the offending compose to the .demo.yml overlay, or replace"
  echo "the placeholder with an env-var interpolation (\${CERTCTL_AUTH_SECRET:?required})"
  echo "that fails compose-up cleanly when the operator forgot to set it."
  exit 1
fi
echo "no-change-me-in-prod-compose guard OK: no 'change-me-' placeholders in non-demo compose files"
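
The fix hint above relies on the `${VAR:?message}` form, which Compose
interpolation shares with POSIX shell: expansion fails when the variable
is unset or empty. A minimal shell sketch of that behaviour, using a
hypothetical helper name (require_auth_secret) that is not part of the
guard:

```shell
# Hypothetical helper; demonstrates the ${VAR:?message} semantics only.
# Succeeds when CERTCTL_AUTH_SECRET is set and non-empty, fails otherwise.
require_auth_secret() {
  # A failed ${VAR:?} expansion exits the non-interactive shell evaluating
  # it, so contain it in a subshell and report through the exit status.
  ( : "${CERTCTL_AUTH_SECRET:?required}" ) 2>/dev/null
}
```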