mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 20:01:31 +00:00
df4b56798c
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started.

Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release.

Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches.

Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once:

  cat-u-seed_initdb_schema_drift [P1, primary]: initdb-mount fix
  cat-o-retry_interval_unit_mismatch [P1]: column rename minutes→seconds
  cat-o-notification_created_at_dead_field [P2]: add column + populate
  cat-o-health_check_column_orphans [P1]: drop unwired columns
  cat-u-no_version_endpoint [P2]: add /api/v1/version

Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO $$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one.

Backend
- internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly.
- cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations.
- internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended.
- internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites.
- internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override.
- internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info.
- cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail.
- internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var.

Migration
- migrations/000017_db_coupling_cleanup.up.sql + .down.sql:
  (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO $$ guard, idempotent re-application)
  (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
  (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds
- migrations/seed.sql: column reference updated to retry_interval_seconds.
- migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted).

Compose
- deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot.
- deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period.
- deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server.

Tests
- internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride.
- internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips).
- internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow.

CI guardrail
- .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it.

Spec / Docs
- api/openapi.yaml: add /api/v1/version operation under Health tag.
- docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation.
- CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var.

Audit doc
- coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit.

Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
242 lines
10 KiB
Go
package postgres

import (
	"database/sql"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"github.com/lib/pq"
)

// pgErrInvalidPassword is the SQLSTATE for class 28 / code 28P01
// (invalid_password), emitted by PostgreSQL when the client presents
// credentials that don't match pg_authid. Defined locally because the
// lib/pq package does not export named constants for SQLSTATE codes (it
// only exposes the typed string alias pq.ErrorCode and a name-lookup map
// at runtime). Pinned as a string constant rather than a pq.ErrorCode
// literal so the contract is grep-able from operator-facing log lines.
//
// Reference: https://www.postgresql.org/docs/16/errcodes-appendix.html
const pgErrInvalidPassword = "28P01"

// NewDB opens a PostgreSQL database connection and sets up connection pooling.
func NewDB(connStr string) (*sql.DB, error) {
	db, err := sql.Open("postgres", connStr)
	if err != nil {
		return nil, fmt.Errorf("failed to open database: %w", err)
	}

	// Configure connection pool
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(5)

	// Ping to verify connection
	if err := db.Ping(); err != nil {
		return nil, wrapPingError(err)
	}

	return db, nil
}

// wrapPingError converts a db.Ping() failure into an operator-friendly
// diagnostic. The default wrap is the original opaque
// `"failed to ping database: <inner>"` shape. The exception is SQLSTATE 28P01
// (invalid_password): when postgres rejects the server's credentials we emit
// extended guidance that names the most common operator misstep (editing
// POSTGRES_PASSWORD in `.env` after the postgres named volume has already
// been initialized) and lists both the destructive (`docker compose down -v`)
// and non-destructive (`ALTER ROLE`) remediations.
//
// U-1 (P1, GitHub #10): closes the audit-flagged
// cat-u-quickstart_postgres_password_volume_trap finding. The postgres
// docker-entrypoint runs initdb only when /var/lib/postgresql/data is empty;
// on subsequent boots the password baked into pg_authid on first boot wins
// over whatever the env var carries, so the env-vs-pg_authid divergence is
// intrinsic to how the postgres image bootstraps and cannot be fixed by us
// upstream of pg_authid. The ergonomic answer is to surface a clear
// diagnostic at the failure site so operators don't waste an hour on
// "is my password right" before discovering the volume needs to be torn
// down (or the role's password rotated in-place).
//
// The wrap chain is preserved via fmt.Errorf("%w", err), so callers that
// pass the returned value to errors.As with a *pq.Error target continue to
// work; this matches the audit's "no substring matching on err.Error()"
// requirement from the M-1 sentinel-error migration.
//
// Returns nil when err is nil so callers can defensively pipe through this
// helper without an extra branch.
func wrapPingError(err error) error {
	if err == nil {
		return nil
	}

	var pqErr *pq.Error
	if errors.As(err, &pqErr) && string(pqErr.Code) == pgErrInvalidPassword {
		return fmt.Errorf(
			"failed to ping database: postgres rejected the configured credentials "+
				"(SQLSTATE %s — invalid_password). If you recently rotated POSTGRES_PASSWORD "+
				"on a docker-compose deploy, the postgres container's data volume still "+
				"holds the previous password: initdb seeds POSTGRES_PASSWORD into pg_authid "+
				"only on first boot of a fresh data dir, so editing the env var after that "+
				"point updates only the certctl-server container. Reset destructively with "+
				"`docker compose -f deploy/docker-compose.yml down -v && "+
				"docker compose -f deploy/docker-compose.yml up -d --build` (this DESTROYS "+
				"all data in the postgres volume), or non-destructively with "+
				"`docker compose -f deploy/docker-compose.yml exec postgres "+
				"psql -U certctl -c \"ALTER ROLE certctl PASSWORD '<new-password>';\"` "+
				"and then redeploy with the matching POSTGRES_PASSWORD. Underlying error: %w",
			pgErrInvalidPassword, err)
	}

	return fmt.Errorf("failed to ping database: %w", err)
}

// RunMigrations reads and executes SQL migration files from a directory.
func RunMigrations(db *sql.DB, migrationsPath string) error {
	// Check if migrations directory exists
	if _, err := os.Stat(migrationsPath); os.IsNotExist(err) {
		return fmt.Errorf("migrations directory not found: %s", migrationsPath)
	}

	// Read all entries from the migrations directory
	files, err := os.ReadDir(migrationsPath)
	if err != nil {
		return fmt.Errorf("failed to read migrations directory: %w", err)
	}

	// Keep only .up.sql migration files (skip .down.sql rollbacks and seed
	// files). No explicit sort is needed: os.ReadDir returns entries sorted
	// by filename, so the zero-padded numeric prefixes give the apply order.
	var sqlFiles []string
	for _, file := range files {
		if !file.IsDir() && strings.HasSuffix(file.Name(), ".up.sql") {
			sqlFiles = append(sqlFiles, file.Name())
		}
	}

	// Execute each migration file in order
	for _, filename := range sqlFiles {
		filePath := filepath.Join(migrationsPath, filename)
		content, err := os.ReadFile(filePath)
		if err != nil {
			return fmt.Errorf("failed to read migration file %s: %w", filename, err)
		}

		// Execute the SQL content
		if _, err := db.Exec(string(content)); err != nil {
			return fmt.Errorf("failed to execute migration %s: %w", filename, err)
		}
	}

	return nil
}

// RunSeed reads and executes the baseline seed SQL file from the migrations
// directory. Designed to run AFTER RunMigrations so every column referenced by
// the seed is already in place.
//
// U-3 (P1, cat-u-seed_initdb_schema_drift): pre-U-3 the deploy compose stack
// mounted both a hand-curated subset of `migrations/*.up.sql` and `seed.sql`
// into postgres `/docker-entrypoint-initdb.d/`. Postgres applied them at
// initdb time. When `seed.sql` was updated to reference columns added by
// migrations *after* the mounted cutoff (e.g., `policy_rules.severity` from
// `000013_policy_rule_severity.up.sql`), initdb crashed during the seed step
// and the container was reported `unhealthy` indefinitely. A bare
// `docker compose -f deploy/docker-compose.yml up -d --build` from a fresh
// clone of v2.0.50 hit this on the first try (GitHub #10 reopened by
// mikeakasully). Helm and the example compose files were already
// runtime-only (Path B) and worked through the same window.
//
// Post-U-3 the compose stack drops all initdb mounts; postgres comes up with
// an empty schema; the server applies all migrations via RunMigrations and
// then this function applies the seed. That leaves a single source of truth
// and removes the drift hazard architecturally.
//
// The seed file is expected at `<migrationsPath>/seed.sql`. Missing-file is
// treated as a no-op (returns nil) so deployments that explicitly remove the
// seed (custom packaging, cert-manager managed schemas) don't break.
//
// Idempotency: every INSERT in the shipped seed.sql uses
// `ON CONFLICT (id) DO NOTHING`, so re-running on a populated DB is safe.
// This function is invoked on every server start, so the contract MUST hold.
//
// Demo seed: `seed_demo.sql` is applied separately by RunDemoSeed below
// when CERTCTL_DEMO_SEED=true (see internal/config/config.go::DemoSeed).
// Splitting demo from baseline keeps a default deploy from accidentally
// landing 90-days-of-fake-history into a real customer database, while
// still giving the demo overlay a single source of truth (no more initdb
// mounts). The demo seed itself uses ON CONFLICT (id) DO NOTHING so it's
// idempotent; missing-file is also tolerated (custom packaging may strip
// seed_demo.sql to shrink the image).
func RunSeed(db *sql.DB, migrationsPath string) error {
	if _, err := os.Stat(migrationsPath); os.IsNotExist(err) {
		return fmt.Errorf("migrations directory not found: %s", migrationsPath)
	}

	seedPath := filepath.Join(migrationsPath, "seed.sql")
	content, err := os.ReadFile(seedPath)
	if err != nil {
		if os.IsNotExist(err) {
			// Missing seed.sql is acceptable: operators may have removed it
			// for custom-packaging reasons. Return nil rather than fail-loud.
			return nil
		}
		return fmt.Errorf("failed to read seed file %s: %w", seedPath, err)
	}

	if _, err := db.Exec(string(content)); err != nil {
		return fmt.Errorf("failed to execute seed file %s: %w", seedPath, err)
	}

	return nil
}

// RunDemoSeed applies the demo overlay seed file
// (`<migrationsPath>/seed_demo.sql`) on top of the baseline seed.
//
// U-3 follow-on: pre-U-3 the demo overlay mounted `seed_demo.sql` into
// postgres `/docker-entrypoint-initdb.d/` and relied on initdb to apply it
// alongside the schema. Once U-3 dropped the initdb migration mounts, that
// path stopped working: postgres comes up empty, and the demo seed
// references tables (issuers, certificates, etc.) that wouldn't exist yet
// at initdb time. RunDemoSeed restores the demo capability through the
// same runtime path RunSeed uses, gated by CERTCTL_DEMO_SEED so production
// deploys never accidentally land the fake-history rows.
//
// Order contract: must run AFTER RunSeed so foreign-key references from
// demo rows to baseline rows (e.g., demo certificates referencing
// `rp-default` from baseline) resolve cleanly. The caller in
// cmd/server/main.go enforces this order.
//
// Missing-file is acceptable (returns nil): operators packaging a
// production-only image often strip seed_demo.sql to shrink the artifact,
// and that should not break boot when CERTCTL_DEMO_SEED happens to be set.
//
// Idempotency: every INSERT in seed_demo.sql uses
// `ON CONFLICT (id) DO NOTHING`, so re-running on a populated DB is safe.
// Server restarts in demo mode therefore re-apply the file harmlessly.
func RunDemoSeed(db *sql.DB, migrationsPath string) error {
	if _, err := os.Stat(migrationsPath); os.IsNotExist(err) {
		return fmt.Errorf("migrations directory not found: %s", migrationsPath)
	}

	seedPath := filepath.Join(migrationsPath, "seed_demo.sql")
	content, err := os.ReadFile(seedPath)
	if err != nil {
		if os.IsNotExist(err) {
			// Custom production packaging frequently strips this file.
			// Fail-soft to preserve the U-3 contract: a missing seed file
			// must not gate server boot.
			return nil
		}
		return fmt.Errorf("failed to read demo seed file %s: %w", seedPath, err)
	}

	if _, err := db.Exec(string(content)); err != nil {
		return fmt.Errorf("failed to execute demo seed file %s: %w", seedPath, err)
	}

	return nil
}