mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 13:41:30 +00:00
fix(deploy,db,handler): close fresh-clone postgres init failure + 4 ride-along audit findings (U-3 master)
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started. Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release. Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches. Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once: cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds cat-o-notification_created_at_dead_field [P2] — add column + populate cat-o-health_check_column_orphans [P1] — drop unwired columns cat-u-no_version_endpoint [P2] — add /api/v1/version Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO \$\$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one. Backend - internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly. - cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations. - internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended. - internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites. - internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override. - internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info. - cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail. - internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var. Migration - migrations/000017_db_coupling_cleanup.up.sql + .down.sql: (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO \$\$ guard, idempotent re-application) (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds - migrations/seed.sql: column reference updated to retry_interval_seconds. - migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted). Compose - deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot. - deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period. - deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server. Tests - internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride. - internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips). - internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow. CI guardrail - .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it. Spec / Docs - api/openapi.yaml: add /api/v1/version operation under Health tag. - docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation. - CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var. Audit doc - coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit. Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
This commit is contained in:
@@ -7,8 +7,20 @@
|
||||
# To start fresh (wipe previous data):
|
||||
# docker compose -f docker-compose.yml -f docker-compose.demo.yml down -v
|
||||
# docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build
|
||||
|
||||
#
|
||||
# U-3 (P1, cat-u-seed_initdb_schema_drift): pre-U-3 this overlay mounted
|
||||
# `seed_demo.sql` into postgres `/docker-entrypoint-initdb.d/`. That worked
|
||||
# only because the production stack also mounted the migrations there, so
|
||||
# the schema existed at initdb time. Once U-3 dropped the production
|
||||
# initdb mounts (single source of truth: server runs RunMigrations + RunSeed
|
||||
# at boot), the demo seed could no longer be applied at initdb time — the
|
||||
# tables it references wouldn't exist yet.
|
||||
#
|
||||
# Post-U-3 the demo overlay just sets CERTCTL_DEMO_SEED=true; the server
|
||||
# applies seed_demo.sql at boot via postgres.RunDemoSeed AFTER baseline
|
||||
# migrations + seed.sql are in place. Same single source of truth, no
|
||||
# initdb mounts, no schema-vs-seed drift.
|
||||
services:
|
||||
postgres:
|
||||
volumes:
|
||||
- ../migrations/seed_demo.sql:/docker-entrypoint-initdb.d/030_seed_demo.sql
|
||||
certctl-server:
|
||||
environment:
|
||||
CERTCTL_DEMO_SEED: "true"
|
||||
|
||||
@@ -93,6 +93,17 @@ services:
|
||||
# ---------------------------------------------------------------------------
|
||||
# Database
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# U-3 (P1, cat-u-seed_initdb_schema_drift, GitHub #10): the test stack used
|
||||
# to mount a hand-curated subset of migrations + seed.sql + a never-checked-in
|
||||
# seed_test.sql into postgres `/docker-entrypoint-initdb.d/`. Same hazard as
|
||||
# the production compose — initdb crashed any time a new migration shipped
|
||||
# that the seed depended on without the mount list being updated. Post-U-3
|
||||
# the schema is built EXCLUSIVELY by the server at startup via
|
||||
# internal/repository/postgres.RunMigrations + RunSeed. Postgres comes up
|
||||
# empty and the server lands the full ladder + baseline seed in one shot.
|
||||
# `start_period: 30s` matches the production compose and shields slow CI
|
||||
# runners from healthcheck flap during initdb.
|
||||
postgres:
|
||||
image: postgres:16-alpine
|
||||
container_name: certctl-test-postgres
|
||||
@@ -102,19 +113,6 @@ services:
|
||||
POSTGRES_PASSWORD: testpass
|
||||
volumes:
|
||||
- test_postgres_data:/var/lib/postgresql/data
|
||||
- ../migrations/000001_initial_schema.up.sql:/docker-entrypoint-initdb.d/001_schema.sql
|
||||
- ../migrations/000002_agent_metadata.up.sql:/docker-entrypoint-initdb.d/002_agent_metadata.sql
|
||||
- ../migrations/000003_certificate_profiles.up.sql:/docker-entrypoint-initdb.d/003_certificate_profiles.sql
|
||||
- ../migrations/000004_agent_groups.up.sql:/docker-entrypoint-initdb.d/004_agent_groups.sql
|
||||
- ../migrations/000005_revocation.up.sql:/docker-entrypoint-initdb.d/005_revocation.sql
|
||||
- ../migrations/000006_discovery.up.sql:/docker-entrypoint-initdb.d/006_discovery.sql
|
||||
- ../migrations/000007_network_discovery.up.sql:/docker-entrypoint-initdb.d/007_network_discovery.sql
|
||||
- ../migrations/000008_verification.up.sql:/docker-entrypoint-initdb.d/008_verification.sql
|
||||
- ../migrations/000009_issuer_config.up.sql:/docker-entrypoint-initdb.d/009_issuer_config.sql
|
||||
- ../migrations/000010_target_config.up.sql:/docker-entrypoint-initdb.d/010_target_config.sql
|
||||
- ../migrations/seed.sql:/docker-entrypoint-initdb.d/020_seed.sql
|
||||
- ../migrations/seed_test.sql:/docker-entrypoint-initdb.d/025_seed_test.sql
|
||||
# No seed_demo.sql — start with a clean database for real testing
|
||||
networks:
|
||||
certctl-test:
|
||||
ipv4_address: 10.30.50.2
|
||||
@@ -125,6 +123,7 @@ services:
|
||||
interval: 5s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
start_period: 30s
|
||||
restart: unless-stopped
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
+29
-11
@@ -53,6 +53,29 @@ services:
|
||||
- certctl-network
|
||||
|
||||
# PostgreSQL database
|
||||
#
|
||||
# U-3 (P1, cat-u-seed_initdb_schema_drift, GitHub #10):
|
||||
# Pre-U-3 this stack mounted a hand-curated subset of `migrations/*.up.sql`
|
||||
# plus `seed.sql` into `/docker-entrypoint-initdb.d/`, and postgres
|
||||
# initdb-applied them on first boot. The mount list rotted every time a
|
||||
# new migration shipped that the seed depended on (000013 added
|
||||
# policy_rules.severity, 000017 renames retry_interval_minutes, etc.) —
|
||||
# initdb crashed, the container reported `unhealthy` indefinitely, and
|
||||
# `docker compose -f deploy/docker-compose.yml up -d --build` from a
|
||||
# fresh clone of v2.0.50 hit it on the first try.
|
||||
#
|
||||
# Post-U-3 the schema is built EXCLUSIVELY by the server at startup via
|
||||
# internal/repository/postgres.RunMigrations + RunSeed. Single source of
|
||||
# truth, no list to keep in sync. Postgres comes up empty; the server
|
||||
# waits for it healthy, then applies the full migration ladder + seed in
|
||||
# one shot. Helm + the dev examples were already runtime-only (Path B)
|
||||
# and worked through the same window.
|
||||
#
|
||||
# `start_period: 30s` gives postgres room to bootstrap on slow runners
|
||||
# (CI macOS, low-spec laptops) before the healthcheck failure counter
|
||||
# starts ticking. Pre-U-3 a slow first-init combined with the
|
||||
# `unhealthy` flap to cascade into certctl-server's `service_healthy`
|
||||
# depends_on, blocking the whole stack.
|
||||
postgres:
|
||||
image: postgres:16-alpine
|
||||
container_name: certctl-postgres
|
||||
@@ -64,17 +87,6 @@ services:
|
||||
- "5432:5432"
|
||||
volumes:
|
||||
- postgres_data:/var/lib/postgresql/data
|
||||
- ../migrations/000001_initial_schema.up.sql:/docker-entrypoint-initdb.d/001_schema.sql
|
||||
- ../migrations/000002_agent_metadata.up.sql:/docker-entrypoint-initdb.d/002_agent_metadata.sql
|
||||
- ../migrations/000003_certificate_profiles.up.sql:/docker-entrypoint-initdb.d/003_certificate_profiles.sql
|
||||
- ../migrations/000004_agent_groups.up.sql:/docker-entrypoint-initdb.d/004_agent_groups.sql
|
||||
- ../migrations/000005_revocation.up.sql:/docker-entrypoint-initdb.d/005_revocation.sql
|
||||
- ../migrations/000006_discovery.up.sql:/docker-entrypoint-initdb.d/006_discovery.sql
|
||||
- ../migrations/000007_network_discovery.up.sql:/docker-entrypoint-initdb.d/007_network_discovery.sql
|
||||
- ../migrations/000008_verification.up.sql:/docker-entrypoint-initdb.d/008_verification.sql
|
||||
- ../migrations/000009_issuer_config.up.sql:/docker-entrypoint-initdb.d/009_issuer_config.sql
|
||||
- ../migrations/000010_target_config.up.sql:/docker-entrypoint-initdb.d/010_target_config.sql
|
||||
- ../migrations/seed.sql:/docker-entrypoint-initdb.d/020_seed.sql
|
||||
networks:
|
||||
- certctl-network
|
||||
healthcheck:
|
||||
@@ -82,6 +94,7 @@ services:
|
||||
interval: 5s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
start_period: 30s
|
||||
restart: unless-stopped
|
||||
|
||||
# Certctl Server (API + scheduler)
|
||||
@@ -127,6 +140,11 @@ services:
|
||||
interval: 10s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
# U-3: server boot now does RunMigrations + RunSeed before listening on
|
||||
# 8443. On a fresh clone the full migration ladder + seed application
|
||||
# can take ~10s on a small VM; start_period prevents the first few
|
||||
# healthcheck attempts from counting as failures while that work runs.
|
||||
start_period: 30s
|
||||
restart: unless-stopped
|
||||
logging:
|
||||
driver: "json-file"
|
||||
|
||||
Reference in New Issue
Block a user