mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 12:31:29 +00:00
a3d8b9c607
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started.

Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release.

Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches.

Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once:

  cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix
  cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds
  cat-o-notification_created_at_dead_field [P2] — add column + populate
  cat-o-health_check_column_orphans [P1] — drop unwired columns
  cat-u-no_version_endpoint [P2] — add /api/v1/version

Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO $$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one.

Backend
- internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly.
- cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations.
- internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended.
- internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites.
- internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override.
- internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info.
- cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail.
- internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var.

Migration
- migrations/000017_db_coupling_cleanup.up.sql + .down.sql:
  (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO $$ guard, idempotent re-application)
  (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
  (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds
- migrations/seed.sql: column reference updated to retry_interval_seconds.
- migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted).

Compose
- deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot.
- deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period.
- deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server.

Tests
- internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride.
- internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips).
- internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow.

CI guardrail
- .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it.

Spec / Docs
- api/openapi.yaml: add /api/v1/version operation under Health tag.
- docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation.
- CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var.

Audit doc
- coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit.

Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
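The CI guardrail described in the commit message is a grep step; the same check can be approximated in Go. The sketch below is not the project's CI step. The regular expression, the function name `findForbiddenMounts`, and the sample compose content (including the file name `000001_init.up.sql`) are illustrative assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// forbiddenMount matches a compose volume entry that maps a .sql file under
// migrations/ (seed*.sql included) into the postgres initdb directory.
// Assumption: this pattern is illustrative, not the one used in ci.yml.
var forbiddenMount = regexp.MustCompile(`migrations/[^:\s]*\.sql:/docker-entrypoint-initdb\.d/`)

// findForbiddenMounts returns the 1-based line numbers of every offending
// volume entry in the given compose file content.
func findForbiddenMounts(compose string) []int {
	var hits []int
	sc := bufio.NewScanner(strings.NewReader(compose))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") {
			continue // commented-out mounts cannot reach initdb
		}
		if forbiddenMount.MatchString(line) {
			hits = append(hits, n)
		}
	}
	return hits
}

func main() {
	// Hypothetical pre-fix compose fragment.
	bad := `services:
  postgres:
    volumes:
      - ../migrations/000001_init.up.sql:/docker-entrypoint-initdb.d/001.sql
      - ../migrations/seed.sql:/docker-entrypoint-initdb.d/900_seed.sql
      - pgdata:/var/lib/postgresql/data
`
	fmt.Println(findForbiddenMounts(bad)) // prints [4 5]
}
```

A build step using a check like this would fail whenever the returned slice is non-empty, which is the same outcome as the grep-based step.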
247 lines
10 KiB
Go
// Integration tests for the U-3 schema-vs-seed coupling fix.
//
// Pre-U-3 the deploy compose stack mounted both a hand-curated subset of
// `migrations/*.up.sql` and `seed.sql` into postgres
// `/docker-entrypoint-initdb.d/`. Postgres applied them at initdb time.
// When `seed.sql` was updated to reference columns added by migrations
// *after* the mounted cutoff (e.g., `policy_rules.severity` from
// `000013_policy_rule_severity.up.sql`), initdb crashed during the seed
// step and the container was reported `unhealthy` indefinitely.
//
// Post-U-3 the schema is built EXCLUSIVELY by the server at startup via
// internal/repository/postgres.RunMigrations + RunSeed. These tests pin
// that contract: RunSeed must complete without error against a freshly
// migrated database, and re-application must be idempotent so server
// restarts don't double-insert.
//
// Skipped under -short to keep CI fast lanes green; the integration lane
// runs them via the testcontainers harness.
package postgres_test

import (
	"context"
	"database/sql"
	"testing"

	"github.com/shankar0123/certctl/internal/repository/postgres"
)

// TestRunSeed_AppliesIdempotently verifies the U-3 contract that RunSeed
// can be called repeatedly against a populated database without error and
// without producing duplicate rows. The server invokes RunSeed on EVERY
// boot (it has no migration-state table to skip from), so any non-
// idempotent INSERT in seed.sql would crash the container loop on the
// second start.
//
// The assertion uses renewal_policies.id='rp-default' as a witness — that
// row is the most-referenced FK target in the seed (it's the default
// renewal policy attached to every certificate that doesn't override).
// If the seed double-inserted, we'd see SQLSTATE 23505 from the second
// RunSeed call. If the seed silently ON CONFLICT-DO-NOTHING'd as
// designed, the row count stays at exactly 1.
func TestRunSeed_AppliesIdempotently(t *testing.T) {
	tdb := getTestDB(t)
	db := tdb.freshSchema(t)
	ctx := context.Background()

	migrationsPath := findMigrationsDir()

	// Apply the seed twice — second call simulates a server restart on a
	// populated database. Both must succeed; pre-U-3 the second call
	// would fail with 23505 if any INSERT lacked ON CONFLICT.
	if err := postgres.RunSeed(db, migrationsPath); err != nil {
		t.Fatalf("RunSeed (first call) returned error: %v", err)
	}
	if err := postgres.RunSeed(db, migrationsPath); err != nil {
		t.Fatalf("RunSeed (second call — idempotency check) returned error: %v\n"+
			"This means the seed produced a duplicate row; every INSERT in seed.sql "+
			"must use ON CONFLICT (id) DO NOTHING because the server applies the "+
			"seed on EVERY start.", err)
	}

	// Witness check: rp-default is the renewal policy every cert defaults
	// to. Exactly one row must exist after two seed applications.
	var count int
	err := db.QueryRowContext(ctx,
		`SELECT COUNT(*) FROM renewal_policies WHERE id = 'rp-default'`,
	).Scan(&count)
	if err != nil {
		t.Fatalf("witness query failed: %v", err)
	}
	if count != 1 {
		t.Errorf("renewal_policies WHERE id='rp-default' returned %d rows after two RunSeed calls; want exactly 1 (ON CONFLICT idempotency contract)", count)
	}
}

// TestRunSeed_MissingFileIsNoOp verifies the fail-soft contract documented
// on RunSeed: an operator who deletes seed.sql for custom packaging (CI
// pipelines that bake their own seeds, cert-manager managed deployments)
// must still get a healthy server boot. RunSeed returning nil for a
// missing file is the only way to hold this contract — returning an error
// would force every minimal-image deployment to ship the seed file just
// to satisfy a no-op load.
//
// We point at a directory that exists (empty temp dir) but contains no
// seed.sql. RunSeed must return nil silently.
func TestRunSeed_MissingFileIsNoOp(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test in short mode")
	}

	// Use a brand-new empty directory so seed.sql is unambiguously absent.
	emptyDir := t.TempDir()

	// Pass a nil *sql.DB on purpose — RunSeed must short-circuit on the
	// missing file BEFORE touching the DB. If the implementation ever
	// regresses and tries to db.Exec(string(content)) with nil content,
	// this will surface as a nil-deref instead of a silent corruption.
	var db *sql.DB
	if err := postgres.RunSeed(db, emptyDir); err != nil {
		t.Fatalf("RunSeed against an empty directory should return nil; got: %v", err)
	}
}

// TestRunDemoSeed_AppliesIdempotently mirrors the RunSeed idempotency
// contract for the demo overlay. The compose demo stack
// (deploy/docker-compose.demo.yml) sets CERTCTL_DEMO_SEED=true; the
// server applies seed_demo.sql at every boot. Same constraint as the
// baseline seed: if any INSERT lacks ON CONFLICT, the server will
// crash-loop on restart.
//
// Witness: seed_demo.sql inserts t-platform into the teams table at line
// 11. That row is referenced by every demo-team-owned certificate, so
// duplicate-insertion would block the entire demo on restart.
func TestRunDemoSeed_AppliesIdempotently(t *testing.T) {
	tdb := getTestDB(t)
	db := tdb.freshSchema(t)
	ctx := context.Background()

	migrationsPath := findMigrationsDir()

	// Order matters — RunSeed must run first so the FK targets the demo
	// seed depends on (rp-* renewal policies, etc.) exist before the
	// demo INSERTs run. This mirrors the order in cmd/server/main.go.
	if err := postgres.RunSeed(db, migrationsPath); err != nil {
		t.Fatalf("RunSeed prerequisite failed: %v", err)
	}

	if err := postgres.RunDemoSeed(db, migrationsPath); err != nil {
		t.Fatalf("RunDemoSeed (first call) returned error: %v", err)
	}
	if err := postgres.RunDemoSeed(db, migrationsPath); err != nil {
		t.Fatalf("RunDemoSeed (second call — idempotency check) returned error: %v", err)
	}

	var count int
	err := db.QueryRowContext(ctx,
		`SELECT COUNT(*) FROM teams WHERE id = 't-platform'`,
	).Scan(&count)
	if err != nil {
		t.Fatalf("witness query failed: %v", err)
	}
	if count != 1 {
		t.Errorf("teams WHERE id='t-platform' returned %d rows after two RunDemoSeed calls; want exactly 1", count)
	}
}

// TestMigration000017_RetryIntervalRename verifies the U-3 ride-along
// column rename: renewal_policies.retry_interval_minutes →
// retry_interval_seconds (cat-o-retry_interval_unit_mismatch). The unit
// was always seconds in practice — the column name lied. Migration 000017
// renames the column with a DO $$ guard so re-application is safe.
//
// After all migrations have been applied (which the test harness does in
// freshSchema), the new column must exist and the old column must NOT.
// information_schema.columns is the source of truth for both checks.
func TestMigration000017_RetryIntervalRename(t *testing.T) {
	tdb := getTestDB(t)
	db := tdb.freshSchema(t)
	ctx := context.Background()

	// Helper — true iff the named column exists on renewal_policies.
	hasColumn := func(name string) bool {
		t.Helper()
		var n int
		err := db.QueryRowContext(ctx, `
			SELECT COUNT(*) FROM information_schema.columns
			WHERE table_name = 'renewal_policies' AND column_name = $1
		`, name).Scan(&n)
		if err != nil {
			t.Fatalf("information_schema query for column %q failed: %v", name, err)
		}
		return n > 0
	}

	if !hasColumn("retry_interval_seconds") {
		t.Error("renewal_policies.retry_interval_seconds is missing — migration 000017 did not apply, or it was applied before the rename block")
	}
	if hasColumn("retry_interval_minutes") {
		t.Error("renewal_policies.retry_interval_minutes still exists — the rename in migration 000017 must drop the old name (cat-o-retry_interval_unit_mismatch)")
	}
}

// TestMigration000017_NotificationCreatedAt verifies the U-3 ride-along
// column add: notification_events.created_at NOT NULL DEFAULT NOW()
// (cat-o-notification_created_at_dead_field). Pre-U-3 the Go domain had
// the field but the DB lacked the column, so the JSON API serialised
// 0001-01-01.
func TestMigration000017_NotificationCreatedAt(t *testing.T) {
	tdb := getTestDB(t)
	db := tdb.freshSchema(t)
	ctx := context.Background()

	var dataType, isNullable, columnDefault sql.NullString
	err := db.QueryRowContext(ctx, `
		SELECT data_type, is_nullable, column_default
		FROM information_schema.columns
		WHERE table_name = 'notification_events' AND column_name = 'created_at'
	`).Scan(&dataType, &isNullable, &columnDefault)
	if err != nil {
		t.Fatalf("information_schema query for created_at failed: %v\n"+
			"Migration 000017 should have added notification_events.created_at TIMESTAMPTZ NOT NULL DEFAULT NOW().", err)
	}

	if dataType.String != "timestamp with time zone" {
		t.Errorf("notification_events.created_at data_type = %q, want %q",
			dataType.String, "timestamp with time zone")
	}
	if isNullable.String != "NO" {
		t.Errorf("notification_events.created_at is_nullable = %q, want NO (the column must be NOT NULL so legacy rows get the DEFAULT)",
			isNullable.String)
	}
	if columnDefault.String == "" {
		t.Error("notification_events.created_at has no DEFAULT — legacy rows added before migration 000017 would fail the NOT NULL gate without one")
	}
}

// TestMigration000017_HealthCheckOrphansDropped verifies the U-3
// ride-along column drop: network_scan_targets lost the orphan
// health_check_enabled / health_check_interval_seconds columns
// (cat-o-health_check_column_orphans). These were declared by an early
// migration but never wired into Go code — schema noise that confused
// operators reading raw SQL. Migration 000017 drops them.
func TestMigration000017_HealthCheckOrphansDropped(t *testing.T) {
	tdb := getTestDB(t)
	db := tdb.freshSchema(t)
	ctx := context.Background()

	hasColumn := func(name string) bool {
		t.Helper()
		var n int
		err := db.QueryRowContext(ctx, `
			SELECT COUNT(*) FROM information_schema.columns
			WHERE table_name = 'network_scan_targets' AND column_name = $1
		`, name).Scan(&n)
		if err != nil {
			t.Fatalf("information_schema query for column %q failed: %v", name, err)
		}
		return n > 0
	}

	for _, col := range []string{"health_check_enabled", "health_check_interval_seconds"} {
		if hasColumn(col) {
			t.Errorf("network_scan_targets.%s still exists — migration 000017 must drop it (cat-o-health_check_column_orphans)", col)
		}
	}
}
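The fail-soft contract pinned by TestRunSeed_MissingFileIsNoOp (a missing seed file is a no-op, detected before the database is touched) can be sketched as follows. This is not the certctl implementation of RunSeed, which is not shown on this page. The helper name `readOptionalSQL` and its return shape are hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// readOptionalSQL is a hypothetical helper: it reads dir/name and reports
// found=false with a nil error when the file does not exist, so a caller
// can skip the seed step without ever using its database handle. Any other
// read failure (permissions, I/O) is still returned as an error.
func readOptionalSQL(dir, name string) (sql string, found bool, err error) {
	content, err := os.ReadFile(filepath.Join(dir, name))
	if errors.Is(err, fs.ErrNotExist) {
		return "", false, nil // missing file: nothing to apply
	}
	if err != nil {
		return "", false, fmt.Errorf("read %s: %w", name, err)
	}
	return string(content), true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "seed-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Empty directory: the seed step would be skipped.
	_, found, err := readOptionalSQL(dir, "seed.sql")
	fmt.Println(found, err) // prints: false <nil>

	// Once the file exists, its content is returned for execution.
	stmt := "INSERT INTO teams (id) VALUES ('t-platform') ON CONFLICT (id) DO NOTHING;"
	if err := os.WriteFile(filepath.Join(dir, "seed.sql"), []byte(stmt), 0o600); err != nil {
		panic(err)
	}
	sql, found, err := readOptionalSQL(dir, "seed.sql")
	fmt.Println(found, err, len(sql) == len(stmt)) // prints: true <nil> true
}
```

Checking for the file before any database call is what lets the test above pass a nil *sql.DB: in a design like this, the handle is only dereferenced when there is SQL to execute.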