mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 20:11:31 +00:00
a3d8b9c607
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started. Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release. Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches. Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once: cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds cat-o-notification_created_at_dead_field [P2] — add column + populate cat-o-health_check_column_orphans [P1] — drop unwired columns cat-u-no_version_endpoint [P2] — add /api/v1/version Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO \$\$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one. Backend - internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly. - cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations. - internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended. - internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites. - internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override. - internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info. - cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail. - internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var. Migration - migrations/000017_db_coupling_cleanup.up.sql + .down.sql: (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO \$\$ guard, idempotent re-application) (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds - migrations/seed.sql: column reference updated to retry_interval_seconds. - migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted). Compose - deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot. - deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period. - deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server. Tests - internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride. - internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips). - internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow. CI guardrail - .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it. Spec / Docs - api/openapi.yaml: add /api/v1/version operation under Health tag. - docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation. - CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var. Audit doc - coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit. Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
82 lines
4.0 KiB
SQL
82 lines
4.0 KiB
SQL
-- Migration 000017: DB coupling cleanup (U-3 bundle).
|
|
--
|
|
-- Closes three audit findings that share the migrations/ surface and the
|
|
-- "schema vs Go vs label drifts in different directions" pattern:
|
|
--
|
|
-- * cat-o-retry_interval_unit_mismatch (P1):
|
|
-- renewal_policies.retry_interval_minutes column stored seconds, named
|
|
-- minutes. Operators running raw SQL got 60x confusion.
|
|
--
|
|
-- * cat-o-notification_created_at_dead_field (P2):
|
|
-- internal/domain/notification.go::NotificationEvent.CreatedAt was
|
|
-- tagged json:"created_at" with no DB column behind it. Every API
|
|
-- response serialized 0001-01-01T00:00:00Z. Visible zero-value
|
|
-- timestamp on every notification row in the dashboard.
|
|
--
|
|
-- * cat-o-health_check_column_orphans (P1):
|
|
-- migration 000011 added network_scan_targets.health_check_enabled +
|
|
-- .health_check_interval_seconds. No Go field decoded either column;
|
|
-- no handler exposed them; OpenAPI schema didn't carry them. The
|
|
-- auto-health-check feature was never wired through. Removing dead
|
|
-- schema is cheaper than completing dead code; if the feature gets
|
|
-- revived, a future migration can re-add the columns alongside the
|
|
-- Go-side wiring.
|
|
--
|
|
-- Idempotency: RunMigrations at internal/repository/postgres/db.go has
|
|
-- no applied-tracking table — every server restart re-applies every
|
|
-- migration in sequence. Each block in this file MUST be safe to re-run
|
|
-- on a database that has already had it applied. The RENAME COLUMN in
|
|
-- block (1) is wrapped in a DO $$ guard that checks information_schema
|
|
-- before renaming; the ADD COLUMN in (2) and the DROP COLUMNs in (3)
|
|
-- use the standard IF NOT EXISTS / IF EXISTS clauses.
|
|
--
|
|
-- See the U-3 closure entry in
|
|
-- coverage-gap-audit-2026-04-24-v5/unified-audit.md and CHANGELOG.md
|
|
-- for the full rationale, the bundled-fix list, and the architectural
|
|
-- shift to runtime-only migration application.
|
|
|
|
-- (1) cat-o-retry_interval_unit_mismatch — rename column to match unit.
|
|
--
|
|
-- The values stored in this column have always been seconds (validator
|
|
-- at internal/service/renewal_policy.go enforces a [60, 86400] range
|
|
-- inclusive — 60 seconds to 24 hours, unambiguously seconds). The
|
|
-- column name was the bug; data conversion is a no-op. The Go field
|
|
-- has always been tagged json:"retry_interval_seconds", so the API
|
|
-- shape is unchanged.
|
|
DO $$
|
|
BEGIN
|
|
IF EXISTS (
|
|
SELECT 1 FROM information_schema.columns
|
|
WHERE table_name = 'renewal_policies'
|
|
AND column_name = 'retry_interval_minutes'
|
|
) THEN
|
|
ALTER TABLE renewal_policies
|
|
RENAME COLUMN retry_interval_minutes TO retry_interval_seconds;
|
|
END IF;
|
|
END $$;
|
|
|
|
-- (2) cat-o-notification_created_at_dead_field — add the missing column.
|
|
--
|
|
-- DEFAULT NOW() back-fills existing rows with the migration apply
|
|
-- timestamp. Acceptable trade-off: those rows had no real CreatedAt
|
|
-- info anyway (the field was a Go-only zero-value), and approximating
|
|
-- them with the migration time gives the dashboard a usable rendering
|
|
-- instead of '0001-01-01'. NOT NULL is enforced because the repo
|
|
-- INSERT path will set CreatedAt on every new row post-fix.
|
|
ALTER TABLE notification_events
|
|
ADD COLUMN IF NOT EXISTS created_at TIMESTAMPTZ NOT NULL DEFAULT NOW();
|
|
|
|
-- (3) cat-o-health_check_column_orphans — drop unwired columns.
|
|
--
|
|
-- migrations/000011_health_checks.up.sql added these two columns with
|
|
-- the intent of wiring auto-health-checks for network-scan-discovered
|
|
-- endpoints. The Go side was never written; no handler reads or writes
|
|
-- them; the OpenAPI NetworkScanTarget schema doesn't expose them. The
|
|
-- columns have been carrying their default values (false / 300) on
|
|
-- every row since shipping. Dropping them removes dead schema; the
|
|
-- network_scan_targets row size shrinks marginally and operators stop
|
|
-- seeing flag/interval columns that don't actually do anything.
|
|
ALTER TABLE network_scan_targets
|
|
DROP COLUMN IF EXISTS health_check_enabled,
|
|
DROP COLUMN IF EXISTS health_check_interval_seconds;
|