mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 18:21:32 +00:00
a3d8b9c607
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started. Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release. Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches. Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once: cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds cat-o-notification_created_at_dead_field [P2] — add column + populate cat-o-health_check_column_orphans [P1] — drop unwired columns cat-u-no_version_endpoint [P2] — add /api/v1/version Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO \$\$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one. Backend - internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly. - cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations. - internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended. - internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites. - internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override. - internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info. - cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail. - internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var. Migration - migrations/000017_db_coupling_cleanup.up.sql + .down.sql: (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO \$\$ guard, idempotent re-application) (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds - migrations/seed.sql: column reference updated to retry_interval_seconds. - migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted). Compose - deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot. - deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period. - deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server. Tests - internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride. - internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips). - internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow. CI guardrail - .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it. Spec / Docs - api/openapi.yaml: add /api/v1/version operation under Health tag. - docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation. - CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var. Audit doc - coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit. Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
218 lines
8.1 KiB
YAML
218 lines
8.1 KiB
YAML
services:
|
|
# HTTPS-Everywhere Phase 3 — self-signed TLS bootstrap (init container).
|
|
# Generates a CN=certctl-server ECDSA-P256 (SHA-256 signature) cert with
|
|
# the SAN list locked by milestone §3.6 on first boot; subsequent boots
|
|
# see the cert already present in the `certs` named volume and no-op out.
|
|
# Server + agent mount the volume read-only. Destroy via `docker compose
|
|
# down -v` to force regeneration. This bootstrap is for docker-compose
|
|
# demos and local dev only; Helm operators supply a Secret / cert-manager
|
|
# Certificate per docs/tls.md.
|
|
#
|
|
# Rationale for ECDSA-P256 (was ed25519 pre-v2.0.48): Apple's TLS stack
|
|
# — Safari Network Framework and the macOS-bundled LibreSSL 3.3.6
|
|
# /usr/bin/curl — does not advertise ed25519 in the ClientHello
|
|
# signature_algorithms extension for server certs, yielding "tls: peer
|
|
# doesn't support any of the certificate's signature algorithms" at
|
|
# handshake. ECDSA-P256 with SHA-256 is universally supported. See
|
|
# docs/tls.md Pattern 1.
|
|
certctl-tls-init:
|
|
image: alpine/openssl:latest
|
|
container_name: certctl-tls-init
|
|
restart: "no"
|
|
entrypoint: /bin/sh
|
|
command:
|
|
- -c
|
|
- |
|
|
set -eu
|
|
CERT=/etc/certctl/tls/server.crt
|
|
KEY=/etc/certctl/tls/server.key
|
|
CA=/etc/certctl/tls/ca.crt
|
|
if [ -f "$$CERT" ] && [ -f "$$KEY" ] && [ -f "$$CA" ]; then
|
|
echo "TLS cert already present at $$CERT — skipping generation"
|
|
else
|
|
mkdir -p /etc/certctl/tls
|
|
openssl req -x509 -newkey ec \
|
|
-pkeyopt ec_paramgen_curve:P-256 \
|
|
-nodes \
|
|
-keyout "$$KEY" \
|
|
-out "$$CERT" \
|
|
-days 3650 \
|
|
-subj "/CN=certctl-server" \
|
|
-addext "subjectAltName=DNS:certctl-server,DNS:localhost,IP:127.0.0.1,IP:::1"
|
|
cp "$$CERT" "$$CA"
|
|
echo "Generated self-signed TLS cert for certctl-server (ECDSA-P256/SHA-256, 3650d, CN=certctl-server)"
|
|
fi
|
|
# certctl binary runs as UID 1000 inside the server container per
|
|
# Dockerfile:64-65; the cert + key must be readable by that UID.
|
|
chown 1000:1000 "$$CERT" "$$KEY" "$$CA"
|
|
chmod 0644 "$$CERT" "$$CA"
|
|
chmod 0600 "$$KEY"
|
|
volumes:
|
|
- certs:/etc/certctl/tls
|
|
networks:
|
|
- certctl-network
|
|
|
|
# PostgreSQL database
|
|
#
|
|
# U-3 (P1, cat-u-seed_initdb_schema_drift, GitHub #10):
|
|
# Pre-U-3 this stack mounted a hand-curated subset of `migrations/*.up.sql`
|
|
# plus `seed.sql` into `/docker-entrypoint-initdb.d/`, and postgres
|
|
# initdb-applied them on first boot. The mount list rotted every time a
|
|
# new migration shipped that the seed depended on (000013 added
|
|
# policy_rules.severity, 000017 renames retry_interval_minutes, etc.) —
|
|
# initdb crashed, the container reported `unhealthy` indefinitely, and
|
|
# `docker compose -f deploy/docker-compose.yml up -d --build` from a
|
|
# fresh clone of v2.0.50 hit it on the first try.
|
|
#
|
|
# Post-U-3 the schema is built EXCLUSIVELY by the server at startup via
|
|
# internal/repository/postgres.RunMigrations + RunSeed. Single source of
|
|
# truth, no list to keep in sync. Postgres comes up empty; the server
|
|
# waits for it healthy, then applies the full migration ladder + seed in
|
|
# one shot. Helm + the dev examples were already runtime-only (Path B)
|
|
# and worked through the same window.
|
|
#
|
|
# `start_period: 30s` gives postgres room to bootstrap on slow runners
|
|
# (CI macOS, low-spec laptops) before the healthcheck failure counter
|
|
# starts ticking. Pre-U-3 a slow first-init combined with the
|
|
# `unhealthy` flap to cascade into certctl-server's `service_healthy`
|
|
# depends_on, blocking the whole stack.
|
|
postgres:
|
|
image: postgres:16-alpine
|
|
container_name: certctl-postgres
|
|
environment:
|
|
POSTGRES_DB: certctl
|
|
POSTGRES_USER: certctl
|
|
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-certctl}
|
|
ports:
|
|
- "5432:5432"
|
|
volumes:
|
|
- postgres_data:/var/lib/postgresql/data
|
|
networks:
|
|
- certctl-network
|
|
healthcheck:
|
|
test: ["CMD-SHELL", "pg_isready -U certctl -d certctl"]
|
|
interval: 5s
|
|
timeout: 5s
|
|
retries: 5
|
|
start_period: 30s
|
|
restart: unless-stopped
|
|
|
|
# Certctl Server (API + scheduler)
|
|
certctl-server:
|
|
build:
|
|
context: ..
|
|
dockerfile: Dockerfile
|
|
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
|
|
# vars into the Docker build so the Node frontend stage and Go module
|
|
# download can reach the public registries behind corporate proxies.
|
|
# Defaults to empty; omit the variables from the host environment for
|
|
# un-proxied builds and the behaviour is byte-identical to the pre-fix
|
|
# tree.
|
|
args:
|
|
HTTP_PROXY: ${HTTP_PROXY:-}
|
|
HTTPS_PROXY: ${HTTPS_PROXY:-}
|
|
NO_PROXY: ${NO_PROXY:-}
|
|
container_name: certctl-server
|
|
depends_on:
|
|
postgres:
|
|
condition: service_healthy
|
|
certctl-tls-init:
|
|
condition: service_completed_successfully
|
|
environment:
|
|
CERTCTL_DATABASE_URL: postgres://certctl:${POSTGRES_PASSWORD:-certctl}@postgres:5432/certctl?sslmode=disable
|
|
CERTCTL_SERVER_HOST: 0.0.0.0
|
|
CERTCTL_SERVER_PORT: 8443
|
|
CERTCTL_SERVER_TLS_CERT_PATH: /etc/certctl/tls/server.crt
|
|
CERTCTL_SERVER_TLS_KEY_PATH: /etc/certctl/tls/server.key
|
|
CERTCTL_LOG_LEVEL: info
|
|
CERTCTL_AUTH_TYPE: none
|
|
CERTCTL_KEYGEN_MODE: server # Demo uses server-side keygen; production should use "agent"
|
|
CERTCTL_NETWORK_SCAN_ENABLED: "true" # Enable network scan GUI with seeded demo targets
|
|
CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY:-change-me-32-char-encryption-key} # AES-256-GCM for dynamic issuer/target config
|
|
ports:
|
|
- "8443:8443"
|
|
volumes:
|
|
- certs:/etc/certctl/tls:ro
|
|
networks:
|
|
- certctl-network
|
|
healthcheck:
|
|
test: ["CMD", "curl", "--cacert", "/etc/certctl/tls/ca.crt", "-f", "https://localhost:8443/health"]
|
|
interval: 10s
|
|
timeout: 5s
|
|
retries: 5
|
|
# U-3: server boot now does RunMigrations + RunSeed before listening on
|
|
# 8443. On a fresh clone the full migration ladder + seed application
|
|
# can take ~10s on a small VM; start_period prevents the first few
|
|
# healthcheck attempts from counting as failures while that work runs.
|
|
start_period: 30s
|
|
restart: unless-stopped
|
|
logging:
|
|
driver: "json-file"
|
|
options:
|
|
max-size: "10m"
|
|
max-file: "3"
|
|
deploy:
|
|
resources:
|
|
limits:
|
|
cpus: '1.0'
|
|
memory: 512M
|
|
|
|
# Certctl Agent
|
|
certctl-agent:
|
|
build:
|
|
context: ..
|
|
dockerfile: Dockerfile.agent
|
|
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
|
|
# vars into the Docker build so the Go module download stage can reach
|
|
# the public Go module proxy behind corporate proxies. Defaults to
|
|
# empty; omit the variables from the host environment for un-proxied
|
|
# builds and the behaviour is byte-identical to the pre-fix tree.
|
|
args:
|
|
HTTP_PROXY: ${HTTP_PROXY:-}
|
|
HTTPS_PROXY: ${HTTPS_PROXY:-}
|
|
NO_PROXY: ${NO_PROXY:-}
|
|
container_name: certctl-agent
|
|
depends_on:
|
|
certctl-server:
|
|
condition: service_healthy
|
|
environment:
|
|
CERTCTL_SERVER_URL: https://certctl-server:8443
|
|
CERTCTL_SERVER_CA_BUNDLE_PATH: /etc/certctl/tls/ca.crt
|
|
CERTCTL_API_KEY: ${CERTCTL_API_KEY:-change-me-in-production}
|
|
CERTCTL_AGENT_NAME: docker-agent
|
|
CERTCTL_LOG_LEVEL: info
|
|
CERTCTL_DISCOVERY_DIRS: /var/lib/certctl/keys # Agent scans this directory for existing certificates
|
|
volumes:
|
|
- agent_keys:/var/lib/certctl/keys
|
|
- certs:/etc/certctl/tls:ro
|
|
networks:
|
|
- certctl-network
|
|
healthcheck:
|
|
test: ["CMD-SHELL", "pgrep -f certctl-agent || exit 1"]
|
|
interval: 30s
|
|
timeout: 5s
|
|
retries: 3
|
|
restart: unless-stopped
|
|
logging:
|
|
driver: "json-file"
|
|
options:
|
|
max-size: "10m"
|
|
max-file: "3"
|
|
deploy:
|
|
resources:
|
|
limits:
|
|
cpus: '0.5'
|
|
memory: 256M
|
|
|
|
networks:
|
|
certctl-network:
|
|
driver: bridge
|
|
|
|
volumes:
|
|
postgres_data:
|
|
driver: local
|
|
agent_keys:
|
|
driver: local
|
|
certs:
|
|
driver: local
|