Mirror of https://github.com/shankar0123/certctl.git (synced 2026-06-10 11:28:54 +00:00)
fix(deploy,db,handler): close fresh-clone postgres init failure + 4 ride-along audit findings (U-3 master)
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started.

Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release.

Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches.

Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once:

  cat-u-seed_initdb_schema_drift [P1, primary]: initdb-mount fix
  cat-o-retry_interval_unit_mismatch [P1]: column rename minutes→seconds
  cat-o-notification_created_at_dead_field [P2]: add column + populate
  cat-o-health_check_column_orphans [P1]: drop unwired columns
  cat-u-no_version_endpoint [P2]: add /api/v1/version

Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO $$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one.

Backend
- internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly.
- cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations.
- internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended.
- internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites.
- internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override.
- internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info.
- cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail.
- internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var.

Migration
- migrations/000017_db_coupling_cleanup.up.sql + .down.sql:
  (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO $$ guard, idempotent re-application)
  (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
  (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds
- migrations/seed.sql: column reference updated to retry_interval_seconds.
- migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted).

Compose
- deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot.
- deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period.
- deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server.

Tests
- internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride.
- internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips).
- internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow.

CI guardrail
- .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it.

Spec / Docs
- api/openapi.yaml: add /api/v1/version operation under Health tag.
- docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation.
- CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var.

Audit doc
- coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit.

Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
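The runtime seeding described under Backend can be sketched roughly as follows. This is an illustration only, not the code in internal/repository/postgres/db.go: applySeedFile, the execer interface, and recordingDB are hypothetical names, and the sketch assumes the database driver accepts a whole seed script in a single Exec call. It shows the two properties the commit message calls out: a missing seed file is a no-op, and re-running is safe because idempotency is delegated to ON CONFLICT DO NOTHING in the SQL itself.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// execer is the small slice of *sql.DB this sketch needs; *sql.DB satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// applySeedFile is a hypothetical stand-in for RunSeed. It reports whether a
// script was actually sent to the database.
func applySeedFile(ctx context.Context, db execer, path string) (bool, error) {
	script, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		// Packaging that strips seed files must still boot: treat as no-op.
		return false, nil
	}
	if err != nil {
		return false, fmt.Errorf("read seed %s: %w", path, err)
	}
	// Assumption: the driver accepts a multi-statement script in one call.
	// Idempotency lives in the SQL (ON CONFLICT DO NOTHING), not in Go.
	if _, err := db.ExecContext(ctx, string(script)); err != nil {
		return false, fmt.Errorf("apply seed %s: %w", path, err)
	}
	return true, nil
}

// recordingDB is a fake that records scripts instead of talking to Postgres.
type recordingDB struct{ scripts []string }

func (r *recordingDB) ExecContext(_ context.Context, query string, _ ...any) (sql.Result, error) {
	r.scripts = append(r.scripts, query)
	return nil, nil
}

// runDemo applies an existing seed twice and a missing seed once.
func runDemo() (missingApplied bool, execCount int, err error) {
	dir, err := os.MkdirTemp("", "seed-sketch")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(dir)

	seed := filepath.Join(dir, "seed.sql")
	const stmt = "INSERT INTO demo (id) VALUES (1) ON CONFLICT DO NOTHING;"
	if err := os.WriteFile(seed, []byte(stmt), 0o600); err != nil {
		return false, 0, err
	}

	db := &recordingDB{}
	ctx := context.Background()
	for i := 0; i < 2; i++ { // two "boots" against the same database
		if _, err := applySeedFile(ctx, db, seed); err != nil {
			return false, 0, err
		}
	}
	missingApplied, err = applySeedFile(ctx, db, filepath.Join(dir, "absent.sql"))
	return missingApplied, len(db.scripts), err
}

func main() {
	missingApplied, execCount, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("missing file applied:", missingApplied)
	fmt.Println("scripts executed:", execCount)
}
```

Running the sketch sends the seed script twice (once per simulated boot) and skips the absent file without an error. Whether a second application is truly harmless depends on every INSERT in the real seed carrying an ON CONFLICT clause, which is the property the commit message asserts for the shipped seeds.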
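For the /api/v1/version endpoint, a handler along the following lines would produce the five fields named in the commit message. It is a sketch under assumptions rather than the contents of internal/api/handler/version.go: versionInfo, collectVersion, versionHandler, and probe are hypothetical names, the "dev" placeholder is invented for the example, and mapping build_time to the vcs.time build setting is a guess (that setting is the commit timestamp, not the time the binary was compiled).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
	"runtime/debug"
)

// Version can be overridden at link time, for example:
//
//	go build -ldflags "-X main.Version=v1.2.3"
var Version = ""

// versionInfo mirrors the field set listed in the commit message.
type versionInfo struct {
	Version   string `json:"version"`
	Commit    string `json:"commit"`
	Modified  bool   `json:"modified"`
	BuildTime string `json:"build_time"`
	GoVersion string `json:"go_version"`
}

// collectVersion reads what the Go toolchain embedded in the binary and then
// lets the ldflags-supplied Version win if it is set.
func collectVersion() versionInfo {
	info := versionInfo{Version: "dev", GoVersion: runtime.Version()}
	if bi, ok := debug.ReadBuildInfo(); ok {
		if bi.Main.Version != "" && bi.Main.Version != "(devel)" {
			info.Version = bi.Main.Version
		}
		for _, s := range bi.Settings {
			switch s.Key {
			case "vcs.revision":
				info.Commit = s.Value
			case "vcs.time": // assumption: used here as build_time
				info.BuildTime = s.Value
			case "vcs.modified":
				info.Modified = s.Value == "true"
			}
		}
	}
	if Version != "" {
		info.Version = Version
	}
	return info
}

// versionHandler serves the JSON document and rejects non-GET methods.
func versionHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.Header().Set("Allow", http.MethodGet)
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(collectVersion())
}

// probe calls the handler in-process and returns the status code plus the
// decoded body (left at its zero value when the body is not JSON).
func probe(method string) (int, versionInfo) {
	req := httptest.NewRequest(method, "/api/v1/version", nil)
	rec := httptest.NewRecorder()
	versionHandler(rec, req)
	var out versionInfo
	_ = json.NewDecoder(rec.Body).Decode(&out)
	return rec.Code, out
}

func main() {
	code, info := probe(http.MethodGet)
	fmt.Printf("status=%d body=%+v\n", code, info)
}
```

The VCS settings are only present when the binary is built from a checkout with VCS stamping enabled, so commit and build_time may legitimately be empty in the sketch. Keeping an endpoint like this outside the auth chain and the audit log, as the commit does, lets a rollout script poll it cheaply.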
@@ -339,6 +339,95 @@ func TestNotificationRepository_Requeue(t *testing.T) {
	}
}

// TestNotificationRepository_CreatedAt_IsPersisted is the U-3 ride-along
// regression for cat-o-notification_created_at_dead_field. Pre-U-3 the
// Go domain.NotificationEvent had a CreatedAt field but the DB had no
// column — JSON serialisation produced 0001-01-01T00:00:00Z, breaking
// timestamp ordering on operator dashboards. Post-U-3 migration 000017
// adds the column NOT NULL DEFAULT NOW(), Create populates it, and
// scanNotification reads it back.
//
// The contract under test is round-trip equivalence: the timestamp the
// caller sets goes into the DB and comes back out unchanged (modulo
// PostgreSQL's microsecond precision). Truncate to microseconds before
// comparing because TIMESTAMPTZ rounds nanoseconds away.
func TestNotificationRepository_CreatedAt_IsPersisted(t *testing.T) {
	tdb := getTestDB(t)
	db := tdb.freshSchema(t)
	repo := postgres.NewNotificationRepository(db)
	ctx := context.Background()

	// A specific, recognisable timestamp. Truncated to microseconds so
	// the post-roundtrip equality assertion isn't tripped up by Postgres
	// dropping the nanosecond tail.
	want := time.Now().UTC().Add(-2 * time.Hour).Truncate(time.Microsecond)

	notif := &domain.NotificationEvent{
		Type:      domain.NotificationTypeExpirationWarning,
		Channel:   domain.NotificationChannelWebhook,
		Recipient: "https://hooks.example.com/u3",
		Message:   "U-3 round-trip witness",
		Status:    string(domain.NotificationStatusPending),
		CreatedAt: want,
	}
	if err := repo.Create(ctx, notif); err != nil {
		t.Fatalf("Create failed: %v", err)
	}

	// Re-read via List (which goes through scanNotification) so we're
	// testing both the INSERT and SELECT halves of the U-3 plumbing.
	got, err := repo.List(ctx, nil)
	if err != nil {
		t.Fatalf("List failed: %v", err)
	}
	if len(got) != 1 {
		t.Fatalf("List returned %d rows, want 1", len(got))
	}
	if !got[0].CreatedAt.Equal(want) {
		t.Errorf("CreatedAt round-trip mismatch:\n set: %v\n got: %v\n"+
			"Pre-U-3 this would have come back as 0001-01-01 because the column didn't exist.",
			want, got[0].CreatedAt)
	}
}

// TestNotificationRepository_CreatedAt_DefaultsToNow verifies the helper
// behavior in Create: when the caller hands over an event with the
// zero-value CreatedAt, Create substitutes time.Now() rather than
// trusting the DB DEFAULT. This keeps wire-level JSON consistent with
// what the row will hold once it's read back, and avoids a clock-skew
// gap between "Go computed the timestamp" and "DB applied DEFAULT NOW()".
func TestNotificationRepository_CreatedAt_DefaultsToNow(t *testing.T) {
	tdb := getTestDB(t)
	db := tdb.freshSchema(t)
	repo := postgres.NewNotificationRepository(db)
	ctx := context.Background()

	before := time.Now().UTC().Add(-time.Second)

	notif := &domain.NotificationEvent{
		Type:      domain.NotificationTypeExpirationWarning,
		Channel:   domain.NotificationChannelWebhook,
		Recipient: "https://hooks.example.com/zerotime",
		Message:   "U-3 zero-time fallback",
		Status:    string(domain.NotificationStatusPending),
		// CreatedAt left zero on purpose — the contract is that Create
		// fills it in from time.Now() when it's unset.
	}
	if err := repo.Create(ctx, notif); err != nil {
		t.Fatalf("Create failed: %v", err)
	}

	after := time.Now().UTC().Add(time.Second)

	if notif.CreatedAt.IsZero() {
		t.Fatalf("CreatedAt is still zero after Create — the fallback in NotificationRepository.Create did not fire")
	}
	if notif.CreatedAt.Before(before) || notif.CreatedAt.After(after) {
		t.Errorf("CreatedAt = %v is outside the [%v, %v] window — the substituted time.Now() should fall inside the test's wall-clock bracket",
			notif.CreatedAt, before, after)
	}
}

// ─── Helpers ──────────────────────────────────────────────────────────────

// past returns a stable "5 minutes ago" time for fixture seeding. Truncated
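The two CreatedAt tests rely on a pair of small ideas that can be shown without a database: fill the timestamp in Go when the caller left it zero, and compare at microsecond precision because that is all a TIMESTAMPTZ column keeps. The sketch below is illustrative only. event, fillCreatedAt, and storedPrecision are hypothetical names, not the repository's code, and storedPrecision models the stored value by truncation, which is a simplification: whether the sub-microsecond tail is rounded or truncated depends on the driver, and pre-truncating the input (as the test does) sidesteps the difference.

```go
package main

import (
	"fmt"
	"time"
)

// event is a pared-down stand-in for domain.NotificationEvent.
type event struct {
	Message   string
	CreatedAt time.Time
}

// fillCreatedAt substitutes now() only when the caller left CreatedAt at its
// zero value; a caller-supplied timestamp is left untouched.
func fillCreatedAt(e *event, now func() time.Time) {
	if e.CreatedAt.IsZero() {
		e.CreatedAt = now().UTC()
	}
}

// storedPrecision approximates what comes back from a microsecond-precision
// column (simplified here to truncation).
func storedPrecision(t time.Time) time.Time {
	return t.UTC().Truncate(time.Microsecond)
}

// demoRoundTrip reports whether a pre-truncated and a raw nanosecond
// timestamp each survive the simulated round trip unchanged.
func demoRoundTrip() (preTruncatedOK, rawOK bool) {
	raw := time.Date(2026, 1, 2, 3, 4, 5, 123456789, time.UTC)
	want := raw.Truncate(time.Microsecond)
	preTruncatedOK = storedPrecision(want).Equal(want)
	rawOK = storedPrecision(raw).Equal(raw)
	return preTruncatedOK, rawOK
}

// demoFallback exercises the zero-value path and the caller-supplied path.
func demoFallback() (filled, inWindow, kept bool) {
	before := time.Now().Add(-time.Second)
	e := &event{Message: "zero-time"}
	fillCreatedAt(e, time.Now)
	after := time.Now().Add(time.Second)

	filled = !e.CreatedAt.IsZero()
	inWindow = !e.CreatedAt.Before(before) && !e.CreatedAt.After(after)

	fixed := time.Date(2026, 1, 2, 3, 4, 5, 0, time.UTC)
	e2 := &event{Message: "explicit", CreatedAt: fixed}
	fillCreatedAt(e2, time.Now)
	kept = e2.CreatedAt.Equal(fixed)
	return filled, inWindow, kept
}

func main() {
	pre, raw := demoRoundTrip()
	fmt.Println("pre-truncated survives:", pre)
	fmt.Println("raw nanoseconds survive:", raw)
	filled, inWindow, kept := demoFallback()
	fmt.Println("zero value filled:", filled, "in window:", inWindow, "explicit kept:", kept)
}
```

The raw timestamp fails the round trip while the pre-truncated one passes, which is why the test truncates `want` before calling Create. The one-second padding on either side of the wall-clock bracket in the second test serves the same purpose as `before` and `after` here: it keeps the assertion tolerant of scheduling jitter without accepting a zero or stale value.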