mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 13:41:30 +00:00
fix(deploy,db,handler): close fresh-clone postgres init failure + 4 ride-along audit findings (U-3 master)
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started. Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release. Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches. Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once: cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds cat-o-notification_created_at_dead_field [P2] — add column + populate cat-o-health_check_column_orphans [P1] — drop unwired columns cat-u-no_version_endpoint [P2] — add /api/v1/version Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO \$\$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one. Backend - internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly. - cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations. - internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended. - internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites. - internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override. - internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info. - cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail. - internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var. Migration - migrations/000017_db_coupling_cleanup.up.sql + .down.sql: (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO \$\$ guard, idempotent re-application) (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds - migrations/seed.sql: column reference updated to retry_interval_seconds. - migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted). Compose - deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot. - deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period. - deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server. Tests - internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride. - internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips). - internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow. CI guardrail - .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it. Spec / Docs - api/openapi.yaml: add /api/v1/version operation under Health tag. - docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation. - CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var. Audit doc - coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit. Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
This commit is contained in:
+55
-4
@@ -86,6 +86,41 @@ func main() {
|
||||
}
|
||||
logger.Info("migrations completed")
|
||||
|
||||
// Apply baseline seed data.
|
||||
//
|
||||
// U-3 (P1, cat-u-seed_initdb_schema_drift): pre-U-3 seed.sql was mounted
|
||||
// into postgres `/docker-entrypoint-initdb.d/` alongside a hand-curated
|
||||
// subset of migrations. Adding a migration that introduced a new column
|
||||
// referenced by seed.sql (cat-o-retry_interval_unit_mismatch /
|
||||
// policy_rules.severity / etc.) without also updating the compose volume
|
||||
// mounts caused initdb to crash on first up. Post-U-3 the compose stack
|
||||
// drops all initdb mounts; postgres comes up with empty schema, the
|
||||
// server runs RunMigrations above, then this RunSeed call lands the
|
||||
// baseline data — all from a single source of truth (this binary).
|
||||
// See internal/repository/postgres/db.go::RunSeed for the contract.
|
||||
logger.Info("applying baseline seed", "path", cfg.Database.MigrationsPath)
|
||||
if err := postgres.RunSeed(db, cfg.Database.MigrationsPath); err != nil {
|
||||
logger.Error("failed to apply seed data", "error", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
logger.Info("seed completed")
|
||||
|
||||
// Apply demo overlay seed when CERTCTL_DEMO_SEED=true. Pre-U-3 the demo
|
||||
// overlay (deploy/docker-compose.demo.yml) mounted seed_demo.sql into
|
||||
// postgres `/docker-entrypoint-initdb.d/`; that broke once U-3 dropped
|
||||
// the initdb migration mounts (the demo seed references tables that
|
||||
// wouldn't exist at initdb time). The runtime path here is the
|
||||
// post-U-3 replacement. Default-off so a vanilla deploy never lands
|
||||
// fake-history rows. See postgres.RunDemoSeed for the contract.
|
||||
if cfg.Database.DemoSeed {
|
||||
logger.Info("applying demo seed (CERTCTL_DEMO_SEED=true)", "path", cfg.Database.MigrationsPath)
|
||||
if err := postgres.RunDemoSeed(db, cfg.Database.MigrationsPath); err != nil {
|
||||
logger.Error("failed to apply demo seed data", "error", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
logger.Info("demo seed completed")
|
||||
}
|
||||
|
||||
// Initialize repositories with real PostgreSQL connection
|
||||
auditRepo := postgres.NewAuditRepository(db)
|
||||
certificateRepo := postgres.NewCertificateRepository(db)
|
||||
@@ -406,6 +441,13 @@ func main() {
|
||||
statsHandler := handler.NewStatsHandler(statsService)
|
||||
metricsHandler := handler.NewMetricsHandler(statsService, time.Now())
|
||||
healthHandler := handler.NewHealthHandler(cfg.Auth.Type)
|
||||
// U-3 ride-along (cat-u-no_version_endpoint, P2): the version handler
|
||||
// answers GET /api/v1/version with build identity (ldflags Version,
|
||||
// VCS commit/dirty/timestamp, Go runtime version). Wired through the
|
||||
// no-auth dispatch + audit ExcludePaths below so probes and rollout
|
||||
// systems can read it without Bearer credentials and without flooding
|
||||
// the audit trail.
|
||||
versionHandler := handler.NewVersionHandler()
|
||||
discoveryHandler := handler.NewDiscoveryHandler(discoveryService)
|
||||
networkScanHandler := handler.NewNetworkScanHandler(networkScanService)
|
||||
verificationService := service.NewVerificationService(jobRepo, auditService, logger)
|
||||
@@ -554,6 +596,7 @@ func main() {
|
||||
Digest: *digestHandler,
|
||||
HealthChecks: healthCheckHandler,
|
||||
BulkRevocation: bulkRevocationHandler,
|
||||
Version: versionHandler,
|
||||
})
|
||||
// Register EST (RFC 7030) handlers if enabled
|
||||
if cfg.EST.Enabled {
|
||||
@@ -690,10 +733,15 @@ func main() {
|
||||
},
|
||||
)
|
||||
auditMiddleware := middleware.NewAuditLog(auditAdapter, middleware.AuditConfig{
|
||||
ExcludePaths: []string{"/health", "/ready"},
|
||||
// /api/v1/version is excluded for the same reason /health and /ready
|
||||
// are: rollout systems and blackbox probes hammer it on a tight
|
||||
// interval, and the audit trail's value comes from rare,
|
||||
// operator-authored mutations — not from sub-second readonly polls.
|
||||
// U-3 ride-along (cat-u-no_version_endpoint, P2).
|
||||
ExcludePaths: []string{"/health", "/ready", "/api/v1/version"},
|
||||
Logger: logger,
|
||||
})
|
||||
logger.Info("API audit logging enabled (excluding /health, /ready)")
|
||||
logger.Info("API audit logging enabled (excluding /health, /ready, /api/v1/version)")
|
||||
|
||||
middlewareStack := []func(http.Handler) http.Handler{
|
||||
middleware.RequestID,
|
||||
@@ -889,6 +937,7 @@ func preflightSCEPChallengePassword(enabled bool, challengePassword string) erro
|
||||
// Dispatch rules (M-001, audit 2026-04-19, option D):
|
||||
//
|
||||
// - /health, /ready, /api/v1/auth/info → no-auth (probes + login detection)
|
||||
// - /api/v1/version → no-auth (U-3 ride-along: build identity for rollout/probes)
|
||||
// - /.well-known/pki/* → no-auth (RFC 5280 CRL, RFC 6960 OCSP)
|
||||
// - /.well-known/est/* → no-auth (RFC 7030 §3.2.3)
|
||||
// - /scep, /scep/* → no-auth (RFC 8894 §3.2, CSR challengePassword)
|
||||
@@ -914,10 +963,12 @@ func buildFinalHandler(apiHandler, noAuthHandler http.Handler, webDir string, da
|
||||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
path := r.URL.Path
|
||||
|
||||
// Health/ready and auth/info bypass auth middleware.
|
||||
// Health/ready, auth/info, and version bypass auth middleware.
|
||||
// Health/ready: Docker/K8s health probes don't carry Bearer tokens.
|
||||
// auth/info: React app calls this before login to detect auth mode.
|
||||
if path == "/health" || path == "/ready" || path == "/api/v1/auth/info" {
|
||||
// version: U-3 ride-along (cat-u-no_version_endpoint) — rollout
|
||||
// systems and blackbox probes need build identity without a key.
|
||||
if path == "/health" || path == "/ready" || path == "/api/v1/auth/info" || path == "/api/v1/version" {
|
||||
noAuthHandler.ServeHTTP(w, r)
|
||||
return
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user