mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 21:21:40 +00:00
85e60b24ec
Closes Audit-2026-04-25 H-006 (High), H-007 (High), M-011 (Medium),
L-006 (Low — verified-already-closed via C-1 master closure in v2.0.54).
Hardens the orchestrator-facing surface — k8s probes, agent enrollment,
shutdown audit drain, scheduler config plumbing.
What changed
- internal/api/handler/health.go — split contract:
* /health stays shallow 200 (k8s liveness — process alive)
* /ready accepts *sql.DB; runs db.PingContext(2s); 503 on failure
* Nil DB path returns 200 + db=not_configured (test fixtures)
- internal/api/handler/agent_bootstrap.go (NEW) — verifyBootstrapToken:
* empty expected = warn-mode pass-through
* non-empty = `Authorization: Bearer <token>` required
* crypto/subtle.ConstantTimeCompare; length-mismatch path runs dummy
compare to keep timing uniform
* ErrBootstrapTokenInvalid sentinel
- internal/api/handler/agents.go — RegisterAgent calls verifyBootstrapToken
BEFORE body parse so unauth probes don't even allocate a JSON decoder
- internal/config/config.go — two new env vars:
* CERTCTL_AGENT_BOOTSTRAP_TOKEN (Auth.AgentBootstrapToken)
* CERTCTL_AUDIT_FLUSH_TIMEOUT_SECONDS (Server.AuditFlushTimeoutSeconds)
- cmd/server/main.go (4 changes):
* pass *sql.DB into NewHealthHandler (H-006)
* pass cfg.Auth.AgentBootstrapToken into NewAgentHandler (H-007)
* configurable shutdown audit-flush timeout (M-011)
* one-shot startup WARN when bootstrap token unset (deprecation)
- new tests: agent_bootstrap_test.go (full deny/accept/warn-mode coverage,
constant-time compare path, length-mismatch); health_test.go extended
with /ready DB-probe failure (503), nil-DB pass-through, /health-shallow
L-006 verified
- cmd/server/main.go:557 already calls
sched.SetShortLivedExpiryCheckInterval(cfg.Scheduler.ShortLivedExpiryCheckInterval)
per the C-1 master closure in v2.0.54. Bundle 5 confirms; no code change.
Threat model: TB-1 (operator/orchestrator), TB-2 (Agent↔Server).
- CWE-754 (Improper Check for Unusual or Exceptional Conditions) for H-006
- CWE-306 (Missing Authentication for Critical Function) + CWE-288
(Authentication Bypass Using an Alternate Path or Channel) for H-007
Verification
- go vet ./... → clean
- go build ./... → clean
- go test -short -count=1 ./... → all packages pass
- targeted Bundle-5 regressions → all pass
- npx tsc --noEmit (web) → clean
- npx vitest run (web) → in-flight (sandbox 45s
ceiling exceeded; no failure markers in dot stream; no frontend
changes in this bundle so no regression risk)
- python3 yaml.safe_load(api/openapi.yaml) → 89 paths
Backward compatibility
- Bootstrap token defaults to empty (warn-mode) — existing demo
deployments unaffected. Server logs deprecation WARN; v2.2.0 will
require it.
- Audit flush timeout default 30s preserves prior behaviour.
- Helm chart already routes readiness probe to /ready (no chart change
needed); now /ready actually probes the DB.
Bundle 5 of the 2026-04-25 comprehensive audit.
160 lines
5.6 KiB
Go
package handler

import (
	"context"
	"database/sql"
	"net/http"
	"time"

	"github.com/shankar0123/certctl/internal/api/middleware"
)

// HealthHandler handles health and readiness check endpoints.
//
// Bundle-5 / Audit H-006 / CWE-754 (Improper Check for Unusual or
// Exceptional Conditions): pre-Bundle-5, both /health and /ready returned
// 200 unconditionally with no DB probe. A Kubernetes readinessProbe pointed
// at /ready would succeed even when the control plane was disconnected from
// Postgres, masking outages and routing user traffic to a broken instance.
//
// Post-Bundle-5 contract:
//
//	GET /health → 200 always (process alive; liveness signal). No DB probe.
//	              k8s liveness probe: do NOT restart pod for DB hiccups.
//	GET /ready  → 200 if db.PingContext(2s) succeeds; 503 +
//	              {"status":"db_unavailable","error":"..."} if it fails.
//	              k8s readiness probe: drain pod when DB unreachable.
//
// The handler accepts a nullable DB pool. When nil (test fixtures, or the
// rare deploy without a DB), Ready degrades to "no probe configured" and
// returns 200 with {"status":"ready","db":"not_configured"}; this preserves
// backwards compat for callers that haven't wired the dependency yet.
//
// G-1 (P1): AuthType is one of "api-key" or "none"; see
// internal/config.AuthType / config.ValidAuthTypes() for the typed
// constants and the rationale for dropping "jwt" (no JWT middleware
// ships with certctl; operators who need JWT/OIDC front certctl with
// an authenticating gateway and set AuthType="none" on the upstream).
type HealthHandler struct {
	AuthType string // "api-key" or "none" (see config.AuthType constants)

	// DB is the database pool used by Ready for connectivity probing.
	// May be nil (test fixtures / no-db deploys); Ready degrades gracefully.
	DB *sql.DB

	// ReadyProbeTimeout is the per-probe ceiling for the DB ping. Defaults
	// to 2s when zero. Exposed so tests can shorten it.
	ReadyProbeTimeout time.Duration
}

// NewHealthHandler creates a new HealthHandler.
//
// Bundle-5 / H-006: db may be nil (test fixtures + no-db deploys). When nil,
// Ready returns 200 with {"db":"not_configured"}, which preserves backwards
// compatibility for the call sites that haven't wired the dependency yet.
// Production main.go always passes a non-nil pool.
func NewHealthHandler(authType string, db *sql.DB) HealthHandler {
	return HealthHandler{
		AuthType:          authType,
		DB:                db,
		ReadyProbeTimeout: 2 * time.Second,
	}
}

// Health responds with a simple health check indicating the service is alive.
// GET /health
//
// Bundle-5 / H-006: shallow on purpose. The k8s liveness probe should NOT
// restart the pod when Postgres is degraded. Use /ready for readiness.
func (h HealthHandler) Health(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	response := map[string]string{
		"status": "healthy",
	}

	JSON(w, http.StatusOK, response)
}

// Ready responds with readiness status, indicating whether the service is
// ready to handle requests.
// GET /ready
//
// Bundle-5 / H-006: deep probe via db.PingContext with a 2-second ceiling.
// Returns 503 + {"status":"db_unavailable","error":"<sanitized>"} when the
// DB is unreachable so k8s drains the pod. Returns 200 when ping succeeds
// or when no DB pool is wired (test/no-db deploys).
func (h HealthHandler) Ready(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	if h.DB == nil {
		// No DB wired (test fixture or no-db deploy). Don't fail the probe;
		// surface the state for operator visibility.
		JSON(w, http.StatusOK, map[string]string{
			"status": "ready",
			"db":     "not_configured",
		})
		return
	}

	timeout := h.ReadyProbeTimeout
	if timeout <= 0 {
		timeout = 2 * time.Second
	}
	ctx, cancel := context.WithTimeout(r.Context(), timeout)
	defer cancel()

	if err := h.DB.PingContext(ctx); err != nil {
		// 503 is the correct readiness-failure status: k8s will drain
		// traffic but won't tear down the pod (that's liveness's job).
		JSON(w, http.StatusServiceUnavailable, map[string]string{
			"status": "db_unavailable",
			"error":  err.Error(),
		})
		return
	}

	JSON(w, http.StatusOK, map[string]string{
		"status": "ready",
		"db":     "reachable",
	})
}

// AuthInfo responds with the server's authentication configuration.
// This lets the GUI know whether to show a login screen.
// GET /api/v1/auth/info (served without auth middleware)
func (h HealthHandler) AuthInfo(w http.ResponseWriter, r *http.Request) {
	response := map[string]interface{}{
		"auth_type": h.AuthType,
		"required":  h.AuthType != "none",
	}
	JSON(w, http.StatusOK, response)
}

// AuthCheck returns 200 if the request has valid auth credentials, along with
// the resolved named-key identity and admin flag so the GUI can gate
// admin-only affordances (e.g., the bulk-revoke button).
//
// M-003 (Phase B.4): surface the admin flag so the frontend hides affordances
// that would otherwise 403 at the server. This is a hint for UX only;
// authorization remains enforced at the handler layer (bulk_revocation.go).
//
// The auth middleware runs before this handler, so reaching here means auth
// passed. `user` falls back to an empty string when auth is disabled
// (CERTCTL_AUTH_TYPE=none).
// GET /api/v1/auth/check
func (h HealthHandler) AuthCheck(w http.ResponseWriter, r *http.Request) {
	response := map[string]interface{}{
		"status": "authenticated",
		"user":   middleware.GetUser(r.Context()),
		"admin":  middleware.IsAdmin(r.Context()),
	}
	JSON(w, http.StatusOK, response)
}