mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 17:41:29 +00:00
fix(deploy,db,handler): close fresh-clone postgres init failure + 4 ride-along audit findings (U-3 master)
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started.

Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release.

Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches.

Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once:

  cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix
  cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds
  cat-o-notification_created_at_dead_field [P2] — add column + populate
  cat-o-health_check_column_orphans [P1] — drop unwired columns
  cat-u-no_version_endpoint [P2] — add /api/v1/version

Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO $$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one.

Backend
- internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly.
- cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations.
- internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended.
- internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites.
- internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override.
- internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info.
- cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail.
- internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var.

Migration
- migrations/000017_db_coupling_cleanup.up.sql + .down.sql:
  (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO $$ guard, idempotent re-application)
  (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
  (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds
- migrations/seed.sql: column reference updated to retry_interval_seconds.
- migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted).

Compose
- deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot.
- deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period.
- deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server.

Tests
- internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride.
- internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips).
- internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow.

CI guardrail
- .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it.

Spec / Docs
- api/openapi.yaml: add /api/v1/version operation under Health tag.
- docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation.
- CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var.

Audit doc
- coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit.

Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree.
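The RunSeed behaviour described under Backend (missing file is a no-op, repeated boots are safe) can be sketched roughly as follows. This is not the code in internal/repository/postgres/db.go, which is not shown on this page; applySeedFile, execFunc, and the teams table in the sample SQL are hypothetical names used only for illustration, and the database call is replaced by a plain function so the sketch runs without postgres.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// execFunc stands in for a database Exec call (hypothetical; a real seed
// runner would execute the SQL against postgres).
type execFunc func(sql string) error

// applySeedFile reads the SQL at path and hands it to exec. A missing file
// is treated as a no-op so packaging that strips seed files still boots.
// Idempotency is assumed to come from the SQL itself (ON CONFLICT DO NOTHING).
func applySeedFile(exec execFunc, path string) error {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil // no seed shipped: nothing to apply
	}
	if err != nil {
		return fmt.Errorf("read seed %s: %w", path, err)
	}
	if err := exec(string(data)); err != nil {
		return fmt.Errorf("apply seed %s: %w", path, err)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "seed-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Hypothetical seed content; the table and columns are made up.
	seed := filepath.Join(dir, "seed.sql")
	sql := "INSERT INTO teams (id, name) VALUES ('t1', 'Platform') ON CONFLICT DO NOTHING;"
	if err := os.WriteFile(seed, []byte(sql), 0o600); err != nil {
		panic(err)
	}

	calls := 0
	exec := func(string) error { calls++; return nil }

	// Simulate two server boots against the same seed file.
	for boot := 1; boot <= 2; boot++ {
		if err := applySeedFile(exec, seed); err != nil {
			panic(err)
		}
	}
	// A stripped package with no seed file must not fail the boot.
	if err := applySeedFile(exec, filepath.Join(dir, "missing.sql")); err != nil {
		panic(err)
	}
	fmt.Println("exec calls:", calls)
}
```

The sketch only distinguishes "file absent" from other read errors; a permissions error, for example, still fails the boot, which seems the safer default for a seed step.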
@@ -0,0 +1,158 @@
package handler

import (
	"net/http"
	"runtime"
	"runtime/debug"
)

// VersionHandler exposes the running server's build identity at
// /api/v1/version. U-3 ride-along (cat-u-no_version_endpoint, P2): pre-U-3
// there was no in-band way for an operator (or an automated rollout system)
// to ask "what version of certctl is this binary?" — they had to either read
// the container image tag externally or trust whatever the README said. The
// gap matters for the same operability story U-3 closes: when fresh-clone
// quickstarts fail, the very first question is "what code did I actually
// build", and the only honest answer needs to come from the binary itself.
//
// VersionInfo is populated from three sources, in priority order:
//
//  1. The Version field — typically supplied at build time via
//     `-ldflags='-X github.com/shankar0123/certctl/internal/api/handler.Version=v2.0.50'`.
//     Production releases set this from the git tag (see release.yml).
//
//  2. runtime/debug.ReadBuildInfo() — populated by Go 1.18+ for any binary
//     built from a module. Provides the VCS commit SHA, dirty flag, and
//     commit timestamp. We read these fields directly so a `go build` from a
//     working tree (no -ldflags incantation) still produces a useful
//     /api/v1/version payload — the failure mode pre-U-3 was that everything
//     looked like "dev" everywhere, which made "is the bug fixed in this
//     binary" unanswerable.
//
//  3. Static fallbacks ("dev" / "unknown") — only reached when neither
//     ldflags nor build-info are populated, which in practice means
//     `go run` from a non-VCS-tracked workspace.
//
// The handler runs through the no-auth bypass dispatch in cmd/server/main.go
// so probes and rollout systems can query it without presenting Bearer
// credentials, mirroring how /health and /ready are reachable. Audit logging
// excludes /api/v1/version for the same reason — the path is hot under
// rollout polling and would otherwise dominate the audit trail.
type VersionHandler struct{}

// Version is overridden at build time via:
//
//	-ldflags='-X github.com/shankar0123/certctl/internal/api/handler.Version=<tag>'
//
// release.yml does this for the server container and CLI/agent binaries.
// The empty default (rather than "dev") lets the Handler fall back to the
// runtime/debug VCS revision when ldflags wasn't supplied — preferable to
// returning a literal "dev" that masks the actual git SHA the binary was
// built from.
var Version = ""

// NewVersionHandler returns a value (not a pointer) to match the
// HealthHandler convention — the handler holds no mutable state and is
// safe to copy.
func NewVersionHandler() VersionHandler {
	return VersionHandler{}
}

// VersionInfo is the JSON shape returned by GET /api/v1/version.
//
// Field ordering and tag names are part of the contract — operator tooling
// (k8s rollout checks, CI smoke tests, /api/v1/version Prometheus blackbox
// probes) parses this payload and must continue to work across releases.
// Don't rename a field without an OpenAPI bump and a deprecation cycle.
type VersionInfo struct {
	// Version is the human-readable release identifier (e.g. "v2.0.50").
	// Falls back to the VCS revision when ldflags wasn't set, and to "dev"
	// when the build wasn't VCS-tracked at all.
	Version string `json:"version"`

	// Commit is the git SHA of HEAD at build time, sourced from
	// runtime/debug.BuildInfo.Settings["vcs.revision"]. Empty string when
	// the binary was built outside a VCS-tracked workspace (rare —
	// `go build` from a tarball does this).
	Commit string `json:"commit"`

	// Modified reports whether the build had uncommitted changes
	// (debug.BuildInfo.Settings["vcs.modified"]). True for developer
	// builds, false for release builds out of CI.
	Modified bool `json:"modified"`

	// BuildTime is the RFC 3339 timestamp of the commit the binary was
	// built from (debug.BuildInfo.Settings["vcs.time"]). Note that this is
	// the commit time recorded by the VCS, not the wall-clock time of the
	// build itself. Empty when not VCS-tracked.
	BuildTime string `json:"build_time"`

	// GoVersion is the Go toolchain version that compiled the binary
	// (runtime.Version, e.g. "go1.25.9"). Useful when triaging stdlib
	// behavior differences ("the deploy that broke was on 1.24, this one
	// is on 1.25").
	GoVersion string `json:"go_version"`
}

// readBuildInfo extracts the VCS settings from debug.BuildInfo and pairs
// them with the ldflags-supplied Version. Split out from ServeHTTP so the
// fallback ladder can be unit-tested directly (see
// version_handler_test.go) without an HTTP round trip. It calls
// debug.ReadBuildInfo itself, so the VCS fields always reflect the
// running (or test) binary's own build info.
//
// debug.ReadBuildInfo returns ok=false when the binary was built without
// module info — extremely rare for a Go 1.18+ build, but we guard it so
// the handler degrades to "dev / unknown / runtime.Version()" instead of
// nil-deref panicking.
func readBuildInfo() VersionInfo {
	info := VersionInfo{
		Version:   Version,
		GoVersion: runtime.Version(),
	}

	bi, ok := debug.ReadBuildInfo()
	if !ok {
		// Pre-Go 1.18 binary or a stripped build with no buildinfo segment.
		// Both are pathological in 2026 but worth the two-line guard.
		if info.Version == "" {
			info.Version = "dev"
		}
		return info
	}

	for _, s := range bi.Settings {
		switch s.Key {
		case "vcs.revision":
			info.Commit = s.Value
		case "vcs.modified":
			// debug.BuildInfo encodes this as the literal string "true" or
			// "false"; comparing to "true" is the canonical pattern (mirrors
			// how the standard library's own version sub-command parses it).
			info.Modified = s.Value == "true"
		case "vcs.time":
			info.BuildTime = s.Value
		}
	}

	// Fallback ladder for Version: ldflags > VCS commit > "dev". The git
	// SHA is more useful than "dev" because it's at least groundable — an
	// operator can `git show <sha>` to see what code is actually running.
	if info.Version == "" {
		if info.Commit != "" {
			info.Version = info.Commit
		} else {
			info.Version = "dev"
		}
	}

	return info
}

// ServeHTTP implements http.Handler. Returns the VersionInfo payload as
// JSON with a 200 status. GET-only — any other method returns 405, matching
// the HealthHandler convention.
func (h VersionHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}
	JSON(w, http.StatusOK, readBuildInfo())
}
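Because readBuildInfo calls debug.ReadBuildInfo itself, its VCS branch cannot be driven with synthetic settings from a test. One possible way to make the fallback ladder (ldflags > VCS commit > "dev") testable is to pass the settings in. The sketch below is only an illustration under that assumption and is not part of this commit; versionInfo, resolveVersion, and vcsSettings are hypothetical names, and the struct is trimmed to the fields the ladder touches.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"strconv"
)

// versionInfo is a trimmed, hypothetical stand-in for VersionInfo.
type versionInfo struct {
	Version  string
	Commit   string
	Modified bool
}

// resolveVersion applies the ladder ldflags > VCS commit > "dev" to
// caller-supplied build settings instead of reading them from the binary.
func resolveVersion(ldflagsVersion string, settings []debug.BuildSetting) versionInfo {
	info := versionInfo{Version: ldflagsVersion}
	for _, s := range settings {
		switch s.Key {
		case "vcs.revision":
			info.Commit = s.Value
		case "vcs.modified":
			info.Modified = s.Value == "true"
		}
	}
	if info.Version == "" {
		info.Version = info.Commit // may still be empty when not VCS-tracked
	}
	if info.Version == "" {
		info.Version = "dev"
	}
	return info
}

// vcsSettings builds synthetic settings shaped like debug.BuildInfo.Settings.
func vcsSettings(revision string, modified bool) []debug.BuildSetting {
	return []debug.BuildSetting{
		{Key: "vcs.revision", Value: revision},
		{Key: "vcs.modified", Value: strconv.FormatBool(modified)},
	}
}

func main() {
	vcs := vcsSettings("0123abc", true)
	fmt.Println(resolveVersion("v2.0.50", vcs).Version) // ldflags wins: v2.0.50
	fmt.Println(resolveVersion("", vcs).Version)        // falls back to commit: 0123abc
	fmt.Println(resolveVersion("", nil).Version)        // nothing available: dev
}
```

With a helper of this shape, the real readBuildInfo could reduce to reading bi.Settings and delegating, and each rung of the ladder would get its own table-test row.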
@@ -0,0 +1,108 @@
package handler

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"runtime"
	"strings"
	"testing"
)

// TestVersion_ReturnsBuildInfo is the regression for the U-3 ride-along
// cat-u-no_version_endpoint (P2). Three behaviors must hold for the
// endpoint to be useful in operator tooling:
//
//  1. GET /api/v1/version returns 200 with a JSON body that decodes into
//     the documented VersionInfo shape — the wire contract that rollout
//     systems and Prometheus blackbox probes parse.
//  2. The Go runtime version always populates (runtime.Version() can never
//     return empty), so consumers can always answer "which Go did this
//     binary compile with" even when ldflags / VCS info are missing.
//  3. The Version field is never empty — the fallback ladder
//     (ldflags > VCS commit > "dev") guarantees a non-empty string so
//     consumers don't have to special-case absent values.
//
// We don't pin the exact Version value because it depends on whether the
// test binary was built with -ldflags or under `go test`, both of which
// the handler must tolerate. The "no empty string" check is the
// behavioral contract.
func TestVersion_ReturnsBuildInfo(t *testing.T) {
	h := NewVersionHandler()

	req := httptest.NewRequest(http.MethodGet, "/api/v1/version", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)

	if rec.Code != http.StatusOK {
		t.Fatalf("status = %d, want 200", rec.Code)
	}

	contentType := rec.Header().Get("Content-Type")
	if !strings.HasPrefix(contentType, "application/json") {
		t.Errorf("Content-Type = %q, want application/json prefix (operator tooling parses JSON)", contentType)
	}

	var got VersionInfo
	if err := json.NewDecoder(rec.Body).Decode(&got); err != nil {
		t.Fatalf("response body did not decode into VersionInfo: %v\nbody: %s", err, rec.Body.String())
	}

	// Version must never be empty — the fallback ladder in readBuildInfo
	// guarantees this. An empty Version would force every downstream
	// consumer (k8s rollouts, Prometheus blackbox, the support tooling)
	// to special-case the missing value, which defeats the point of
	// /api/v1/version existing.
	if got.Version == "" {
		t.Error("Version is empty — the fallback ladder (ldflags > VCS commit > 'dev') must guarantee a non-empty value")
	}

	// GoVersion must equal runtime.Version() — the handler reads it
	// directly and cannot be subverted by ldflags or BuildInfo. This is
	// the one field that should always be ground-truth.
	if got.GoVersion != runtime.Version() {
		t.Errorf("GoVersion = %q, want %q (must come straight from runtime.Version())",
			got.GoVersion, runtime.Version())
	}
}

// TestVersion_RejectsNonGet pins the GET-only contract. /api/v1/version
// is read-only build identity; POST/PUT/DELETE etc. are nonsensical and
// should return 405 like the HealthHandler does. Operator tooling that
// fat-fingers the verb gets a clear error rather than a confusing 200
// from the wrong code path.
func TestVersion_RejectsNonGet(t *testing.T) {
	h := NewVersionHandler()

	for _, method := range []string{
		http.MethodPost, http.MethodPut, http.MethodDelete, http.MethodPatch,
	} {
		req := httptest.NewRequest(method, "/api/v1/version", nil)
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		if rec.Code != http.StatusMethodNotAllowed {
			t.Errorf("%s /api/v1/version → status %d, want 405", method, rec.Code)
		}
	}
}

// TestVersion_LdflagsOverride locks in the priority order: when the
// build-time Version variable is non-empty (e.g. "v2.0.50" injected by
// release.yml), readBuildInfo MUST surface that value verbatim and not
// silently substitute the VCS commit. The release-pipeline contract
// depends on this — a release tagged v2.0.50 should report "v2.0.50",
// not the underlying SHA.
//
// We achieve test isolation by save/restore on the package-level Version
// variable; t.Cleanup restores the original for subsequent tests. Because
// Version is a package-level global, this test must not be marked
// t.Parallel().
func TestVersion_LdflagsOverride(t *testing.T) {
	original := Version
	t.Cleanup(func() { Version = original })

	Version = "v2.0.50-test"
	got := readBuildInfo()
	if got.Version != "v2.0.50-test" {
		t.Errorf("Version = %q, want %q (ldflags-supplied Version must take priority over VCS fallback)",
			got.Version, "v2.0.50-test")
	}
}
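For the consumer side of the contract these tests pin, a rollout check could look roughly like the sketch below. It is an illustration only: versionPayload, fetchVersion, and newFakeServer are hypothetical names, the payload struct simply mirrors the JSON tags documented on VersionInfo, and an httptest server stands in for a running certctl-server so the example runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// versionPayload mirrors the JSON tags documented on VersionInfo.
type versionPayload struct {
	Version   string `json:"version"`
	Commit    string `json:"commit"`
	Modified  bool   `json:"modified"`
	BuildTime string `json:"build_time"`
	GoVersion string `json:"go_version"`
}

// fetchVersion GETs /api/v1/version without credentials (the endpoint is
// described as sitting on the no-auth chain) and decodes the payload.
func fetchVersion(client *http.Client, baseURL string) (versionPayload, error) {
	var v versionPayload
	resp, err := client.Get(baseURL + "/api/v1/version")
	if err != nil {
		return v, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return v, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		return v, fmt.Errorf("decode version payload: %w", err)
	}
	return v, nil
}

// newFakeServer is a stand-in for a running server; it is not certctl code.
func newFakeServer(version string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/version", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(versionPayload{Version: version, GoVersion: "go1.25.9"})
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeServer("v2.0.50")
	defer srv.Close()

	got, err := fetchVersion(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	// A rollout gate would compare against the tag it just deployed.
	fmt.Println("rolled out:", got.Version == "v2.0.50")
}
```

Since the Version field is guaranteed non-empty by the fallback ladder, a gate like this can compare strings directly without special-casing a missing value.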
@@ -68,6 +68,11 @@ type HandlerRegistry struct {
	HealthChecks *handler.HealthCheckHandler
	BulkRevocation handler.BulkRevocationHandler
	RenewalPolicies handler.RenewalPolicyHandler
	// Version handles GET /api/v1/version (U-3 ride-along,
	// cat-u-no_version_endpoint). Wired through the no-auth dispatch in
	// cmd/server/main.go so probes and rollout systems can read build
	// identity without Bearer credentials. See handler/version.go.
	Version handler.VersionHandler
}

// RegisterHandlers sets up all API routes with their handlers.
@@ -89,6 +94,17 @@ func (r *Router) RegisterHandlers(reg HandlerRegistry) {
		middleware.CORS,
		middleware.ContentType,
	))
	// Version endpoint (no auth middleware — used by rollout probes that
	// don't carry Bearer tokens; the dispatch layer in cmd/server/main.go
	// also routes /api/v1/version through the no-auth chain). U-3 ride-along
	// (cat-u-no_version_endpoint, P2). The handler reads
	// runtime/debug.BuildInfo for VCS attribution; ldflags-supplied Version
	// is preferred when present.
	r.mux.Handle("GET /api/v1/version", middleware.Chain(
		reg.Version,
		middleware.CORS,
		middleware.ContentType,
	))
	// Auth check endpoint (uses full middleware chain via r.Register)
	r.Register("GET /api/v1/auth/check", http.HandlerFunc(reg.Health.AuthCheck))