Follow-up to 78dcc9e (U-1 docker-compose fix) — closes the remaining adjacent
code paths that share the postgres-first-boot-password-binding root cause but
were scoped out of the original commit.
The runtime diagnostic in internal/repository/postgres/db.go::wrapPingError
(landed in 67f352d) already covers every NewDB call site, so Helm operators
and example users hit the SQLSTATE 28P01 guidance for free at startup. Three
gaps remained: deployment-shape-specific remediation guidance (kubectl vs
docker-compose), the hardcoded password in the *root* .env.example, and
shared ops notes for the 5 examples/ compose files. This commit closes all
three.
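For readers unfamiliar with the runtime diagnostic referenced above, the following is a minimal sketch of how a ping error could be mapped to SQLSTATE 28P01 guidance. It is not the code in db.go: wrapPingErrorSketch, isInvalidPassword, sqlStater, and stubStateError are hypothetical names, and the sketch assumes the driver error exposes a SQLState() method (as *pgconn.PgError from jackc/pgx does, for example).

```go
package main

import (
	"errors"
	"fmt"
)

// sqlStater is satisfied by driver errors that expose a SQLSTATE code.
// Assumption: the real driver error type has this method.
type sqlStater interface {
	SQLState() string
}

// invalidPassword is the PostgreSQL SQLSTATE for failed password authentication.
const invalidPassword = "28P01"

// stubStateError is a hypothetical stand-in for a driver error, used only
// to exercise the sketch without a database.
type stubStateError struct{ code string }

func (e stubStateError) Error() string    { return "SQLSTATE " + e.code }
func (e stubStateError) SQLState() string { return e.code }

// isInvalidPassword reports whether err, or anything it wraps, carries 28P01.
func isInvalidPassword(err error) bool {
	var se sqlStater
	return errors.As(err, &se) && se.SQLState() == invalidPassword
}

// wrapPingErrorSketch adds remediation guidance to a 28P01 failure and
// wraps every other error unchanged. The original error stays reachable
// through errors.As and errors.Is because of the %w verb.
func wrapPingErrorSketch(err error) error {
	if err == nil {
		return nil
	}
	if isInvalidPassword(err) {
		return fmt.Errorf("postgres rejected the password (SQLSTATE %s); "+
			"the data volume may have been initialized with an earlier "+
			"POSTGRES_PASSWORD: %w", invalidPassword, err)
	}
	return fmt.Errorf("ping database: %w", err)
}
```

Because the check lives in one wrapper around the ping, every caller gets the same guidance regardless of deployment shape, which is the property the paragraph above relies on.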
Files changed:
- .env.example (root) — line 16 had `postgres://certctl:certctl@...` with
the password hardcoded literally instead of interpolating POSTGRES_PASSWORD.
If a user copied this file as their .env (binary-direct deployment,
not docker-compose) and rotated POSTGRES_PASSWORD on line 10, the URL on
line 16 still carried 'certctl' — silent two-line drift. Replaced 'certctl'
with the same default that line 10 carries ('change-me-in-production') and
added an explanatory comment block describing the docker-compose
override semantics, when this URL matters (binary-direct), and the
cross-reference to the U-1 wrapPingError diagnostic. Also fixed an
adjacent bug: line 31 CERTCTL_SERVER_URL was `http://localhost:8443`,
which agents reject at startup since v2.2 (HTTPS-everywhere milestone made
the control plane HTTPS-only with TLS 1.3 pinned). Updated to https://
with a comment pointing operators at the bootstrap CA bundle.
- deploy/helm/certctl/values.yaml — postgresql.auth.password field had a
one-line 'REQUIRED' comment. Expanded into a full WARNING block (~25
lines) explaining the PVC retention semantics, the failure symptom,
and both kubectl-flavored remediation paths: non-destructive
(`kubectl exec ... ALTER ROLE`) preferred for environments with data,
and destructive (`helm uninstall + kubectl delete pvc`) for dev/demo.
Cross-references the wrapPingError runtime diagnostic.
- deploy/helm/certctl/README.md (new, ~115 lines) — chart-level operational
guide. Covers quick install, both remediation paths with concrete
kubectl commands, why-we-don't-fix-this-in-the-chart explanation,
cross-references to the docker-compose docs, server API key rotation
(the easy case — comma-separated key list), TLS provisioning shapes,
embedded-vs-external postgres, and uninstall semantics with the PVC
retention gotcha called out.
- examples/README.md (new, ~55 lines) — shared operational notes for the
5 example deployments. Covers the postgres password rotation trap with
example-flavored remediation paths (`docker compose -f examples/<x>/...`),
the TLS warning, and teardown semantics. Replaces what would otherwise
be 5x duplication across per-example READMEs.
- examples/{acme-nginx,acme-wildcard-dns01,multi-issuer,private-ca-traefik,
step-ca-haproxy}/*.md — one-line cross-reference at the top of each
example's primary doc, pointing at examples/README.md for the shared
ops notes. Avoids 5x duplication of the same warning text while still
surfacing the link in every operator's first-touch surface.
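The two-line drift described for the root .env.example comes from keeping the password in two places. As an illustration only (this commit does not add such code), a binary-direct deployment could derive the URL from a single password value. buildDatabaseURL is a hypothetical helper, and the user, host, and database names in the usage note are assumed placeholders.

```go
package main

import "net/url"

// buildDatabaseURL assembles a postgres:// URL from its parts so the
// password is supplied exactly once. url.UserPassword percent-encodes
// characters such as '@' and ':' that would otherwise break the URL.
func buildDatabaseURL(user, password, host, dbname string) string {
	u := url.URL{
		Scheme: "postgres",
		User:   url.UserPassword(user, password),
		Host:   host,
		Path:   "/" + dbname,
	}
	return u.String()
}
```

Calling buildDatabaseURL("certctl", os.Getenv("POSTGRES_PASSWORD"), "localhost:5432", "certctl") would keep the URL in step with a rotated password; a static .env file cannot interpolate this way, which is why the commit aligns the two literals and documents the override semantics instead.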
Verification:
- go build ./... — clean
- go vet ./... — clean
- go test -short ./internal/repository/postgres/ — 4/4 wrapPingError tests
still passing (no production-code touch in this commit)
- helm lint deploy/helm/certctl/ — clean (1 INFO about chart icon, pre-existing)
- helm template smoke test — renders without error
- python3 yaml.safe_load on values.yaml — parses
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-u-quickstart_postgres_password_volume_trap
Closes the three deliberate scope-outs from 78dcc9e (Helm,
root .env.example, examples/) end-to-end.
Adjacent bugs caught while in scope:
- root .env.example:16 hardcoded password not matching line 10
- root .env.example:31 http:// URL incompatible with HTTPS-only v2.2
Breaking change release. Plaintext HTTP listener removed. The certctl
control plane now terminates TLS 1.3 on :8443 via
http.Server.ListenAndServeTLS. No CERTCTL_TLS_ENABLED=false escape
hatch. No dual-listener mode. One-step cutover per docs/upgrade-to-tls.md.
Server
- cmd/server/tls.go: certHolder with SIGHUP hot-reload + atomic cert
swap, buildServerTLSConfig (TLS 1.3 min, GetCertificate callback),
preflightServerTLS validation
- cmd/server/main.go: ListenAndServeTLS in place of ListenAndServe,
watchSIGHUP wiring, cert/key path config threading
- tls_test.go: 418-line regression coverage of reload, preflight,
callback behavior, SAN validation
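The server-side pieces listed above can be pictured with a small sketch. This is not the content of cmd/server/tls.go: the type and function names carry a Sketch suffix to mark them as hypothetical, and the only behavior assumed is what the bullets state (atomic certificate swap, reload on SIGHUP, TLS 1.3 minimum, GetCertificate callback).

```go
package main

import (
	"crypto/tls"
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// certHolderSketch keeps the active certificate behind an atomic pointer
// so in-flight handshakes never observe a half-updated value.
type certHolderSketch struct {
	certPath, keyPath string
	current           atomic.Pointer[tls.Certificate]
}

// newCertHolderSketch loads the key pair once and fails if it is unreadable.
func newCertHolderSketch(certPath, keyPath string) (*certHolderSketch, error) {
	h := &certHolderSketch{certPath: certPath, keyPath: keyPath}
	if err := h.reload(); err != nil {
		return nil, err
	}
	return h, nil
}

// reload re-reads the key pair from disk and swaps it in. On error the
// previously loaded certificate stays active.
func (h *certHolderSketch) reload() error {
	cert, err := tls.LoadX509KeyPair(h.certPath, h.keyPath)
	if err != nil {
		return err
	}
	h.current.Store(&cert)
	return nil
}

// getCertificate is the per-handshake callback used by tls.Config.
func (h *certHolderSketch) getCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	return h.current.Load(), nil
}

// buildTLSConfigSketch pins TLS 1.3 as the minimum and serves whatever
// certificate the holder currently has.
func buildTLSConfigSketch(h *certHolderSketch) *tls.Config {
	return &tls.Config{
		MinVersion:     tls.VersionTLS13,
		GetCertificate: h.getCertificate,
	}
}

// watchSIGHUPSketch reloads the certificate each time the process
// receives SIGHUP and logs, rather than exits, on failure.
func watchSIGHUPSketch(h *certHolderSketch) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGHUP)
	go func() {
		for range ch {
			if err := h.reload(); err != nil {
				log.Printf("TLS reload failed, keeping previous certificate: %v", err)
			}
		}
	}()
}
```

With the config set on http.Server.TLSConfig, ListenAndServeTLS("", "") can be used, since net/http accepts empty file arguments when GetCertificate is populated.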
Config
- CERTCTL_TLS_CERT_PATH / CERTCTL_TLS_KEY_PATH (required)
- Plaintext rejection: agents/CLI/MCP pre-flight-fail on http://
URLs with a pointer to docs/upgrade-to-tls.md
Agents, CLI, MCP
- All three pre-flight-reject http:// URLs with fail-loud diagnostic
- CERTCTL_SERVER_CA_BUNDLE_PATH for private-CA trust
- CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY for dev-only bypass
(loud warning on startup)
- install-agent.sh emits both vars as commented template lines
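A client-side pre-flight along the lines described above might look like the following sketch. The function names are hypothetical and the error wording is illustrative; only the behavior (reject non-https URLs with a pointer to docs/upgrade-to-tls.md, trust a private CA from a bundle file) is taken from this message.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// preflightServerURLSketch fails fast when the configured server URL is
// not an https:// URL with a host.
func preflightServerURLSketch(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse server URL %q: %w", raw, err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("server URL %q uses scheme %q; the control plane "+
			"is HTTPS-only, see docs/upgrade-to-tls.md", raw, u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("server URL %q has no host", raw)
	}
	return nil
}

// loadCABundleSketch reads a PEM bundle (for example the path held in
// CERTCTL_SERVER_CA_BUNDLE_PATH) into a certificate pool.
func loadCABundleSketch(path string) (*x509.CertPool, error) {
	pemBytes, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read CA bundle %s: %w", path, err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, fmt.Errorf("CA bundle %s contains no usable certificates", path)
	}
	return pool, nil
}

// newPinnedClientSketch returns an HTTP client that trusts only the
// given pool and refuses anything below TLS 1.3.
func newPinnedClientSketch(pool *x509.CertPool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:    pool,
				MinVersion: tls.VersionTLS13,
			},
		},
	}
}
```

Running the URL check before any network call is what makes the failure loud: a misconfigured agent stops at startup instead of retrying against a listener that no longer exists.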
docker-compose
- certctl-tls-init sidecar generates SAN-valid self-signed cert into
deploy/test/certs/ on first boot
- All demo-stack curls pin against ca.crt with --cacert
Helm chart
- Three TLS provisioning modes, exactly one required:
- server.tls.existingSecret (operator-supplied)
- server.tls.certManager.enabled (cert-manager integration)
- server.tls.selfSigned.enabled (eval only — not for production)
- server-certificate.yaml template for cert-manager mode
- helm install without a TLS source fails at template render with
a pointer to docs/tls.md
CI
- .github/workflows/ci.yml Helm Chart Validation step renders the
chart in both existingSecret and cert-manager modes, plus an
inverse guard-regression test that asserts helm template MUST
refuse to render when no TLS source is configured. Previously
the single `helm template` invocation hit the certctl.tls.required
fail-loud guard and exit-1'd CI. Four invocations now: lint
(existingSecret), template (existingSecret), template
(cert-manager), template (no args — must fail).
Integration tests
- deploy/test/integration_test.go stands up the Compose stack over
HTTPS, extracts the CA bundle, and exercises every certctl API
over https://localhost:8443
- All 34 integration subtests green (per Phase 8 local CI-parity)
Documentation
- New: docs/tls.md (provisioning patterns, rotation, SIGHUP reload)
- New: docs/upgrade-to-tls.md (one-step cutover, no-downgrade
warnings, fleet-roll sequencing)
- CHANGELOG.md: v2.2.0 "HTTPS Everywhere — The Irony" entry
(file heading unchanged; release tag is v2.0.47)
- All curls in docs/, examples/, deploy/helm/ guides use
https://localhost:8443 --cacert
Verification
- grep -rn "ListenAndServe[^T]" cmd/ internal/ → 0 hits
- grep -rn "\"http://" cmd/ internal/ → 2 benign hits (Caddy admin
API default, SSRF doc comment) — zero certctl endpoints
- Tasks #197–#206 (Phases 0–8) all closed in the tracker
Files: 65 changed, 3489 insertions, 372 deletions (pre-CI-fix).
Three related ACME ecosystem changes shipped as a single milestone:
1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with
`profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing
acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support.
ES256 JWS signing with kid mode, nonce management, directory discovery.
Empty profile delegates to standard library path (zero behavior change).
Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on
ACME issuer config.
2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source,
docs, README, and examples. Zero remaining occurrences of RFC 9702.
3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating
renewal thresholds against SC-081v3 validity reduction timeline (200→100→47
days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the
expected renewal path for 6-day shortlived certs.
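To make item 1 concrete, the sketch below builds only the JSON payload of a newOrder request that carries a `profile` field. It is an assumption-laden illustration, not the connector code: the type and function names are hypothetical, and JWS signing, nonce handling, and directory discovery are deliberately left out.

```go
package main

import "encoding/json"

// identifier is one entry in the ACME newOrder "identifiers" array.
type identifier struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

// newOrderPayload is the unsigned newOrder body. The profile field is
// omitted from the JSON when empty.
type newOrderPayload struct {
	Identifiers []identifier `json:"identifiers"`
	Profile     string       `json:"profile,omitempty"`
}

// buildNewOrderPayload returns the JSON body for the given DNS names and
// optional profile (for example "tlsserver" or "shortlived").
func buildNewOrderPayload(dnsNames []string, profile string) ([]byte, error) {
	ids := make([]identifier, 0, len(dnsNames))
	for _, name := range dnsNames {
		ids = append(ids, identifier{Type: "dns", Value: name})
	}
	return json.Marshal(newOrderPayload{Identifiers: ids, Profile: profile})
}
```

In the change described above an empty profile takes the standard library path rather than the custom POST; the omitempty tag here only shows that an unset profile adds nothing to the payload.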
New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Audit all docs and examples against current codebase state. Fix seed_demo.sql
domain constant casing (IssuerType, TargetType, AgentStatus) that would cause
agent dispatch failures. Fix example docker-compose health endpoints (/health
not /api/v1/health) and env var names (CERTCTL_DATABASE_URL). Update connector
counts, test numbers, and planned→implemented status across docs. Convert 3
ASCII flow diagrams to Mermaid.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>