Files
certctl/deploy/docker-compose.test.yml
T
Shankar Reddy df4b56798c fix(deploy,db,handler): close fresh-clone postgres init failure + 4 ride-along audit findings (U-3 master)
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the
canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build);
postgres reported unhealthy indefinitely, dependent containers never started.

Root cause: deploy/docker-compose.yml mounted a hand-curated subset of
migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/.
Postgres applied them at initdb time. Once seed.sql referenced columns added
by migrations *after* the mounted cutoff (e.g., policy_rules.severity from
migration 000013), initdb crashed mid-seed and the container loop wedged.
Two sources of truth (compose mount list vs in-tree migration ladder)
diverged the moment a seed-touching migration shipped, and the only thing
that fixed it was hand-editing the compose file every release.

Fix: remove the dual source. Postgres boots empty; the server applies
migrations + seed at startup via RunMigrations + RunSeed. Helm has used
this pattern since day one (postgres-init emptyDir); compose now matches.

Bundled with four ride-along audit findings whose fixes share the same
schema/db code surface, so operators take the schema-change pain only once:

  cat-u-seed_initdb_schema_drift           [P1, primary] — initdb-mount fix
  cat-o-retry_interval_unit_mismatch       [P1] — column rename minutes→seconds
  cat-o-notification_created_at_dead_field [P2] — add column + populate
  cat-o-health_check_column_orphans        [P1] — drop unwired columns
  cat-u-no_version_endpoint                [P2] — add /api/v1/version

Single migration (000017_db_coupling_cleanup) bundles the three schema
changes under a DO \$\$ guard so re-application is safe; reduces
operator-visible 'schema-change releases' from four to one.

Backend
- internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed
  (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in
  every shipped INSERT) so repeated boots are safe; missing-file is no-op
  so custom packaging that strips seeds still boots cleanly.
- cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set)
  immediately after RunMigrations.
- internal/repository/postgres/notification.go: NotificationRepository.Create
  now sets created_at (with time.Now() fallback when caller leaves it zero);
  scanNotification reads it back; List + ListRetryEligible SELECT extended.
- internal/repository/postgres/renewal_policy.go: column references updated
  to retry_interval_seconds across SELECT/INSERT/UPDATE sites.
- internal/api/handler/version.go: new VersionHandler exposes
  {version, commit, modified, build_time, go_version} from
  runtime/debug.ReadBuildInfo() with ldflags-supplied Version override.
- internal/api/router/router.go: register GET /api/v1/version through the
  no-auth chain (CORS + ContentType) alongside /health, /ready,
  /api/v1/auth/info.
- cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit
  ExcludePaths so rollout polling doesn't dominate the audit trail.
- internal/config/config.go: add DatabaseConfig.DemoSeed +
  CERTCTL_DEMO_SEED env var.

Migration
- migrations/000017_db_coupling_cleanup.up.sql + .down.sql:
    (1) renewal_policies.retry_interval_minutes → retry_interval_seconds
        (DO \$\$ guard, idempotent re-application)
    (2) notification_events ADD COLUMN created_at TIMESTAMPTZ
        NOT NULL DEFAULT NOW()
    (3) network_scan_targets DROP orphan health_check_enabled +
        health_check_interval_seconds
- migrations/seed.sql: column reference updated to retry_interval_seconds.
- migrations/seed_demo.sql: same column rename + applied at runtime now via
  RunDemoSeed (no longer initdb-mounted).

Compose
- deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files +
  seed.sql); add start_period: 30s to postgres + certctl-server healthchecks
  to absorb the runtime migration + seed application window on first boot.
- deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount
  removed; that file never existed); same healthcheck start_period.
- deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with
  CERTCTL_DEMO_SEED=true env var on certctl-server.

Tests
- internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo,
  TestVersion_RejectsNonGet, TestVersion_LdflagsOverride.
- internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently,
  TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently,
  TestMigration000017_RetryIntervalRename,
  TestMigration000017_NotificationCreatedAt,
  TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips).
- internal/repository/postgres/notification_test.go:
  TestNotificationRepository_CreatedAt_IsPersisted +
  TestNotificationRepository_CreatedAt_DefaultsToNow.

CI guardrail
- .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb
  (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql
  re-appears in /docker-entrypoint-initdb.d in any compose file. Catches
  future drift before a fresh-clone operator hits it.

Spec / Docs
- api/openapi.yaml: add /api/v1/version operation under Health tag.
- docs/architecture.md: replace the 'initdb may run the same SQL' paragraph
  with a post-U-3 single-source-of-truth explanation.
- CHANGELOG.md: full unreleased-section entry covering all 5 closures,
  breaking changes, and the new env var.

Audit doc
- coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14
  cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to
   RESOLVED with closure prose pointing at this commit.

Verification: build/vet/test -short -race all clean across all touched
packages locally; govulncheck reports 0 vulnerabilities affecting our
code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the
post-fix tree.
2026-04-25 13:29:23 +00:00

430 lines
18 KiB
YAML

# =============================================================================
# certctl Testing Environment — Docker Compose
# =============================================================================
#
# Spins up the full certctl platform with real CA backends for manual QA:
#
# 0. certctl-tls-init — one-shot init container; writes self-signed
# server.crt/.key/ca.crt into ./test/certs (bind
# mount, not a named volume — host-readable for
# the Go integration test binary)
# 1. PostgreSQL 16 — database (clean, no demo data)
# 2. certctl-server — control plane API + web dashboard on :8443 (HTTPS)
# 3. certctl-agent — polls for work, deploys certs to NGINX
# 4. step-ca — private CA (JWK provisioner, auto-bootstraps)
# 5. Pebble — ACME test server (simulates Let's Encrypt)
# 6. pebble-challtestsrv — DNS/HTTP challenge test server for Pebble
# 7. NGINX — TLS target server on :8080 (HTTP) / :8444 (HTTPS)
#
# Usage:
# cd deploy
# docker compose -f docker-compose.test.yml up --build
#
# Dashboard: https://localhost:8443 (self-signed — use --cacert test/certs/ca.crt)
# API key: test-key-2026
# NGINX: https://localhost:8444 (self-signed placeholder until cert deployed)
#
# Integration tests: `go test -tags integration ./deploy/test/...` picks up
# the CA bundle at ./test/certs/ca.crt automatically via CERTCTL_TEST_CA_BUNDLE.
#
# See docs/test-env.md for the full walkthrough.
# =============================================================================
services:
# ---------------------------------------------------------------------------
# HTTPS-Everywhere Phase 6 — self-signed TLS bootstrap for the test harness.
# ---------------------------------------------------------------------------
# Mirrors the production `certctl-tls-init` (see docker-compose.yml §10-43)
# but writes into a *host bind mount* (./test/certs) instead of a named
# volume. The named-volume approach works fine inside Docker but hides the
# CA bundle from the Go integration test binary that runs on the host; the
# bind mount exposes /etc/certctl/tls/ca.crt at deploy/test/certs/ca.crt
# so `newTestClient()` can load it into an x509.CertPool and validate the
# self-signed server cert. Test-only divergence, explicitly documented.
#
# The generated cert has SAN=DNS:certctl-server,DNS:localhost,IP:127.0.0.1
# so both in-cluster traffic (agent → certctl-server:8443) and host traffic
# (go test → localhost:8443) validate cleanly. Destroy via
# `docker compose -f docker-compose.test.yml down -v` + `rm -rf test/certs`
# to force regeneration. Keys written 0600, certs 0644, owned 1000:1000
# (the UID the server binary runs as inside its container per Dockerfile:64).
certctl-tls-init:
image: alpine/openssl:latest
container_name: certctl-test-tls-init
restart: "no"
entrypoint: /bin/sh
command:
- -c
- |
set -eu
CERT=/etc/certctl/tls/server.crt
KEY=/etc/certctl/tls/server.key
CA=/etc/certctl/tls/ca.crt
if [ -f "$$CERT" ] && [ -f "$$KEY" ] && [ -f "$$CA" ]; then
echo "TLS cert already present at $$CERT — skipping generation"
else
mkdir -p /etc/certctl/tls
openssl req -x509 -newkey ec \
-pkeyopt ec_paramgen_curve:P-256 \
-nodes \
-keyout "$$KEY" \
-out "$$CERT" \
-days 3650 \
-subj "/CN=certctl-server" \
-addext "subjectAltName=DNS:certctl-server,DNS:localhost,IP:127.0.0.1,IP:::1"
cp "$$CERT" "$$CA"
echo "Generated self-signed TLS cert for certctl-test-server (ECDSA-P256/SHA-256, 3650d, CN=certctl-server)"
fi
# The test server container runs as root (see `user: "0:0"` below)
# because setup-trust.sh needs to update the system trust store, so
# the perms here are really about host-side readability — 0644 on
# the CA/cert lets `go test` on the host read the bundle without a
# chown dance.
chown 1000:1000 "$$CERT" "$$KEY" "$$CA" || true
chmod 0644 "$$CERT" "$$CA"
chmod 0600 "$$KEY"
volumes:
- ./test/certs:/etc/certctl/tls
networks:
certctl-test:
ipv4_address: 10.30.50.9
# ---------------------------------------------------------------------------
# Database
# ---------------------------------------------------------------------------
#
# U-3 (P1, cat-u-seed_initdb_schema_drift, GitHub #10): the test stack used
# to mount a hand-curated subset of migrations + seed.sql + a never-checked-in
# seed_test.sql into postgres `/docker-entrypoint-initdb.d/`. Same hazard as
# the production compose — initdb crashed any time a new migration shipped
# that the seed depended on without the mount list being updated. Post-U-3
# the schema is built EXCLUSIVELY by the server at startup via
# internal/repository/postgres.RunMigrations + RunSeed. Postgres comes up
# empty and the server lands the full ladder + baseline seed in one shot.
# `start_period: 30s` matches the production compose and shields slow CI
# runners from healthcheck flap during initdb.
postgres:
image: postgres:16-alpine
container_name: certctl-test-postgres
environment:
POSTGRES_DB: certctl
POSTGRES_USER: certctl
POSTGRES_PASSWORD: testpass
volumes:
- test_postgres_data:/var/lib/postgresql/data
networks:
certctl-test:
ipv4_address: 10.30.50.2
ports:
- "5432:5432"
healthcheck:
test: ["CMD-SHELL", "pg_isready -U certctl -d certctl"]
interval: 5s
timeout: 5s
retries: 5
start_period: 30s
restart: unless-stopped
# ---------------------------------------------------------------------------
# Pebble — ACME test server (simulates Let's Encrypt)
# ---------------------------------------------------------------------------
# Pebble is the official ACME test server from Let's Encrypt (RFC 8555).
# It validates challenges via the companion challtestsrv.
# Root CA cert available at https://pebble:15000/roots/0 (management API).
pebble-challtestsrv:
image: ghcr.io/letsencrypt/pebble-challtestsrv:latest
container_name: certctl-test-challtestsrv
# ENTRYPOINT is /app (the binary). command: provides only the FLAGS.
# Matches the official Pebble docker-compose format.
# -doh "" disables DoH (default :8443 would conflict with certctl server).
# defaultIPv4 must point to the certctl-server (10.30.50.6) because that's where
# the ACME HTTP-01 challenge server runs (port 80 inside the container).
# Pebble resolves domains via challtestsrv, then connects to this IP to validate.
command: -defaultIPv4 10.30.50.6 -defaultIPv6 "" -doh ""
networks:
certctl-test:
ipv4_address: 10.30.50.3
restart: unless-stopped
pebble:
image: ghcr.io/letsencrypt/pebble:latest
container_name: certctl-test-pebble
depends_on:
- pebble-challtestsrv
environment:
PEBBLE_VA_NOSLEEP: 1
PEBBLE_VA_ALWAYS_VALID: 0
# ENTRYPOINT is /app (the binary). command: provides only the FLAGS.
command:
- -config
- /test/config/pebble-config.json
- -dnsserver
- "10.30.50.3:8053"
- -strict
volumes:
- ./test/pebble-config.json:/test/config/pebble-config.json:ro
networks:
certctl-test:
ipv4_address: 10.30.50.4
restart: unless-stopped
# ---------------------------------------------------------------------------
# step-ca — Private CA (Smallstep)
# ---------------------------------------------------------------------------
# Auto-bootstraps on first run: generates root CA + JWK provisioner "admin".
# Root cert: /home/step/certs/root_ca.crt (inside stepca_data volume)
# Provisioner key: /home/step/secrets/provisioner_key (encrypted JWK)
step-ca:
image: smallstep/step-ca:latest
container_name: certctl-test-stepca
environment:
DOCKER_STEPCA_INIT_NAME: "certctl-test-ca"
DOCKER_STEPCA_INIT_DNS_NAMES: "step-ca,localhost"
DOCKER_STEPCA_INIT_PROVISIONER_NAME: "admin"
DOCKER_STEPCA_INIT_PASSWORD: "password123"
DOCKER_STEPCA_INIT_ADDRESS: ":9000"
volumes:
- stepca_data:/home/step
networks:
certctl-test:
ipv4_address: 10.30.50.5
healthcheck:
test: ["CMD", "curl", "-fk", "https://localhost:9000/health"]
interval: 10s
timeout: 5s
start_period: 15s
retries: 10
restart: unless-stopped
# ---------------------------------------------------------------------------
# certctl Server (Control Plane)
# ---------------------------------------------------------------------------
# Connects to PostgreSQL, Pebble (ACME), step-ca, and Local CA.
#
# TLS trust problem: Pebble and step-ca use self-signed root CAs that
# aren't in Alpine's trust store. The ACME and step-ca connectors use
# Go's default http.Client (no InsecureSkipVerify), so they need the
# CA certs in the system trust store.
#
# Solution: setup-trust.sh runs as root, fetches Pebble CA from its
# management API, copies step-ca root cert from the shared volume,
# runs update-ca-certificates, then execs the server binary.
certctl-server:
build:
context: ..
dockerfile: Dockerfile
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Node frontend stage and Go module
# download can reach the public registries behind corporate proxies.
# Defaults to empty; omit the variables from the host environment for
# un-proxied builds and the behaviour is byte-identical to the pre-fix
# tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
container_name: certctl-test-server
depends_on:
postgres:
condition: service_healthy
pebble:
condition: service_started
step-ca:
condition: service_healthy
# HTTPS-Everywhere Phase 6: block server boot until the init container
# has written server.crt / server.key / ca.crt into ./test/certs. The
# init container runs once and exits 0; service_completed_successfully
# makes that a gating dependency rather than a liveness one.
certctl-tls-init:
condition: service_completed_successfully
# Run as root so update-ca-certificates can write to /etc/ssl/certs.
# Container isolation provides the security boundary.
user: "0:0"
entrypoint: ["/bin/sh", "/app/setup-trust.sh"]
environment:
# Database
CERTCTL_DATABASE_URL: postgres://certctl:testpass@postgres:5432/certctl?sslmode=disable
# Server
CERTCTL_SERVER_HOST: 0.0.0.0
CERTCTL_SERVER_PORT: 8443
# HTTPS-Everywhere Phase 6: point the server at the init-container-generated
# cert/key pair (bind-mounted from ./test/certs). Same paths as production
# compose so the server binary code path is identical; only the host-side
# storage differs (bind mount vs named volume — see §certctl-tls-init block).
CERTCTL_SERVER_TLS_CERT_PATH: /etc/certctl/tls/server.crt
CERTCTL_SERVER_TLS_KEY_PATH: /etc/certctl/tls/server.key
CERTCTL_LOG_LEVEL: debug
# Auth — API key required (production-like)
CERTCTL_AUTH_TYPE: api-key
CERTCTL_AUTH_SECRET: test-key-2026
# Key generation — agent-side (production-like)
CERTCTL_KEYGEN_MODE: agent
# Local CA issuer (iss-local) — self-signed mode (no CA cert/key paths)
# This is the simplest issuer, always available.
# ACME issuer (iss-acme-staging) — pointed at Pebble
CERTCTL_ACME_DIRECTORY_URL: https://pebble:14000/dir
CERTCTL_ACME_EMAIL: test@certctl.dev
CERTCTL_ACME_CHALLENGE_TYPE: http-01
CERTCTL_ACME_INSECURE: "true"
# step-ca issuer (iss-stepca)
CERTCTL_STEPCA_URL: https://step-ca:9000
CERTCTL_STEPCA_ROOT_CERT: /stepca-data/certs/root_ca.crt
CERTCTL_STEPCA_PROVISIONER: admin
CERTCTL_STEPCA_PASSWORD: password123
CERTCTL_STEPCA_KEY_PATH: /stepca-data/secrets/provisioner_key
# EST server (RFC 7030) — uses Local CA by default
CERTCTL_EST_ENABLED: "true"
CERTCTL_EST_ISSUER_ID: iss-local
# Dynamic issuer/target config encryption (M34/M35)
CERTCTL_CONFIG_ENCRYPTION_KEY: test-encryption-key-32chars!!
# Network scanning
CERTCTL_NETWORK_SCAN_ENABLED: "true"
# Post-deployment TLS verification
CERTCTL_VERIFY_DEPLOYMENT: "true"
CERTCTL_VERIFY_TIMEOUT: "10s"
CERTCTL_VERIFY_DELAY: "3s"
ports:
- "8443:8443"
volumes:
- ./test/setup-trust.sh:/app/setup-trust.sh:ro
# step-ca data volume (root cert at /certs/root_ca.crt, key at /secrets/provisioner_key)
- stepca_data:/stepca-data:ro
# HTTPS-Everywhere Phase 6: read-only bind mount of the init-generated
# TLS material. The init container writes here; server reads here; the
# agent mounts the same host path at the same container path (see below)
# so /etc/certctl/tls/ca.crt resolves to the *same* bytes on both sides.
- ./test/certs:/etc/certctl/tls:ro
networks:
certctl-test:
ipv4_address: 10.30.50.6
healthcheck:
# HTTPS-Everywhere Phase 6: healthcheck now speaks TLS with --cacert to
# verify the self-signed server cert against the init-generated bundle.
# /health requires auth when CERTCTL_AUTH_TYPE=api-key, so include the
# Bearer token. curl exits non-zero on both TLS handshake failure and
# non-2xx status — either failure keeps depends_on: {condition:
# service_healthy} from unblocking the agent, which is what we want.
test: ["CMD", "curl", "--cacert", "/etc/certctl/tls/ca.crt", "-f", "-H", "Authorization: Bearer test-key-2026", "https://localhost:8443/health"]
interval: 10s
timeout: 5s
start_period: 30s
retries: 10
restart: unless-stopped
# ---------------------------------------------------------------------------
# NGINX — TLS Target Server
# ---------------------------------------------------------------------------
# The agent deploys certificates here via the shared nginx_certs volume.
# nginx-entrypoint.sh generates a self-signed placeholder cert so NGINX
# can boot before the agent deploys a real cert.
#
# Ports: 8080 (HTTP) / 8444 (HTTPS) — offset to avoid conflict with server.
nginx:
image: nginx:alpine
container_name: certctl-test-nginx
entrypoint: ["/bin/sh", "/entrypoint.sh"]
volumes:
- ./test/nginx.conf:/etc/nginx/nginx.conf:ro
- ./test/nginx-entrypoint.sh:/entrypoint.sh:ro
- nginx_certs:/etc/nginx/certs
ports:
- "8080:80"
- "8444:443"
networks:
certctl-test:
ipv4_address: 10.30.50.7
healthcheck:
test: ["CMD-SHELL", "curl -fk https://localhost/health || exit 1"]
interval: 10s
timeout: 5s
start_period: 15s
retries: 5
restart: unless-stopped
# ---------------------------------------------------------------------------
# certctl Agent
# ---------------------------------------------------------------------------
# Polls the server for work, generates ECDSA P-256 keys locally,
# deploys certs to NGINX via the shared volume, and discovers existing
# certs in the NGINX cert directory.
certctl-agent:
build:
context: ..
dockerfile: Dockerfile.agent
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Go module download stage can reach
# the public Go module proxy behind corporate proxies. Defaults to
# empty; omit the variables from the host environment for un-proxied
# builds and the behaviour is byte-identical to the pre-fix tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
container_name: certctl-test-agent
depends_on:
certctl-server:
condition: service_healthy
environment:
# HTTPS-Everywhere Phase 6: agent dials the server over TLS and validates
# the self-signed cert against the CA bundle pinned by
# CERTCTL_SERVER_CA_BUNDLE_PATH. Same env vars + container paths as
# production compose so the agent binary code path (loadCABundle →
# x509.CertPool → *tls.Config{RootCAs, MinVersion: TLS13}) is identical.
CERTCTL_SERVER_URL: https://certctl-server:8443
CERTCTL_SERVER_CA_BUNDLE_PATH: /etc/certctl/tls/ca.crt
CERTCTL_API_KEY: test-key-2026
CERTCTL_AGENT_NAME: test-agent-01
CERTCTL_AGENT_ID: agent-test-01
CERTCTL_KEYGEN_MODE: agent
CERTCTL_LOG_LEVEL: debug
CERTCTL_DISCOVERY_DIRS: /nginx-certs
volumes:
- agent_keys:/var/lib/certctl/keys
- nginx_certs:/nginx-certs
# HTTPS-Everywhere Phase 6: same bind mount as the server, same path,
# so /etc/certctl/tls/ca.crt resolves to the identical bytes. This is
# the only way the CN=certctl-server cert validates on the agent side.
- ./test/certs:/etc/certctl/tls:ro
networks:
certctl-test:
ipv4_address: 10.30.50.8
restart: unless-stopped
# =============================================================================
# Network
# =============================================================================
# Static IPs are required because:
# - Pebble needs to know the challtestsrv DNS server address (10.30.50.3)
# - challtestsrv resolves all domains to certctl-server (10.30.50.6) for HTTP-01 challenges
# - Avoids DNS race conditions during startup
networks:
certctl-test:
driver: bridge
ipam:
config:
- subnet: 10.30.50.0/24
# =============================================================================
# Volumes
# =============================================================================
volumes:
test_postgres_data:
driver: local
stepca_data:
driver: local
agent_keys:
driver: local
nginx_certs:
driver: local