mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-13 08:58:55 +00:00
config: default hardening + operator docs (Phase 2 closure — SEC-H1, SEC-H3, SEC-M4, DEPL-H1, DEPL-M2 + doc-only carve-outs)
Eleven findings from the architecture diligence audit's Phase 2 bundle
closed in one PR. All touch the same backend config + Helm chart +
operator docs surface, so reviewing in one diff is the natural fit.
config.go: three new fail-closed Validate() branches behind sentinels
=====================================================================
Three new error sentinels exported from internal/config/config.go for
tests to pin via errors.Is + message-text:
- ErrAgentBootstrapTokenRequired (SEC-H1)
- ErrACMEInsecureWithoutAck (SEC-M4)
- ErrDemoModeAckExpired (SEC-H3)
SEC-H1 (staged): introduces CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY
as an opt-in feature flag. When true AND the bootstrap token is empty,
Validate() returns ErrAgentBootstrapTokenRequired and the server
refuses to start. Default in THIS release: false (warn-mode
pass-through preserved). WORKSPACE-ROADMAP.md schedules the default
flip to true for v2.2.0 — operators get one upgrade window.
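
A minimal sketch of the staged SEC-H1 gate, for illustration only. The sentinel name and the two config field names come from this commit; the `authConfig` type, the `validateBootstrapToken` helper, and the exact error text are assumptions, not the code in internal/config/config.go.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAgentBootstrapTokenRequired mirrors the sentinel named in this commit.
// The message text is illustrative.
var ErrAgentBootstrapTokenRequired = errors.New("agent bootstrap token required")

// authConfig is a hypothetical, trimmed-down stand-in for the real AuthConfig.
type authConfig struct {
	AgentBootstrapToken          string
	AgentBootstrapTokenDenyEmpty bool
}

// validateBootstrapToken fails closed only when the operator has opted in
// (deny-empty=true) and the token is empty. With the flag at its default
// (false), an empty token passes through unchanged.
func validateBootstrapToken(a authConfig) error {
	if a.AgentBootstrapTokenDenyEmpty && a.AgentBootstrapToken == "" {
		// %w keeps the sentinel matchable via errors.Is.
		return fmt.Errorf("%w: CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true but the bootstrap token is empty",
			ErrAgentBootstrapTokenRequired)
	}
	return nil
}

func main() {
	// Default flag value: an empty token is tolerated.
	fmt.Println(validateBootstrapToken(authConfig{}) == nil)

	// Opt-in flag with an empty token: refuse to start.
	err := validateBootstrapToken(authConfig{AgentBootstrapTokenDenyEmpty: true})
	fmt.Println(errors.Is(err, ErrAgentBootstrapTokenRequired))
}
```

Wrapping the sentinel with `%w` is what lets a test pin both the error identity (errors.Is) and the message text.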
SEC-M4: upgrades the existing boot-time WARN log for
CERTCTL_ACME_INSECURE=true into a hard refuse-to-start gate behind
CERTCTL_ACME_INSECURE_ACK=true. INSECURE=true without the ACK fails
closed; per the test comments, the ACK is ignored when INSECURE is
false. The boot-time
WARN log at cmd/server/main.go:611 continues to fire for the ACK'd
case so every restart logs the reminder.
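
A rough sketch of the SEC-M4 gate, following the behavior the tests in this commit pin (Insecure=true without the ACK fails; Insecure=false never trips the gate). The `acmeConfig` type, the `validateACMEInsecure` helper, and the error text are hypothetical; only the sentinel name, field names, and env var names come from the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrACMEInsecureWithoutAck mirrors the sentinel named in this commit.
// The message text is illustrative.
var ErrACMEInsecureWithoutAck = errors.New("ACME insecure mode requires explicit acknowledgement")

// acmeConfig is a hypothetical, trimmed-down stand-in for the real ACMEConfig.
type acmeConfig struct {
	Insecure    bool
	InsecureAck bool
}

// validateACMEInsecure refuses to start when insecure mode is requested
// without the paired acknowledgement flag.
func validateACMEInsecure(a acmeConfig) error {
	if a.Insecure && !a.InsecureAck {
		return fmt.Errorf("%w: set CERTCTL_ACME_INSECURE_ACK=true to confirm CERTCTL_ACME_INSECURE=true",
			ErrACMEInsecureWithoutAck)
	}
	return nil
}

func main() {
	fmt.Println(validateACMEInsecure(acmeConfig{Insecure: true}))                           // error
	fmt.Println(validateACMEInsecure(acmeConfig{Insecure: true, InsecureAck: true}) == nil) // acknowledged
	fmt.Println(validateACMEInsecure(acmeConfig{}) == nil)                                  // secure default
}
```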
SEC-H3: tightens the sticky DemoModeAck bit so it expires after 24h.
When DemoModeAck=true, Validate() now requires CERTCTL_DEMO_MODE_ACK_TS
to be set as a unix-epoch timestamp within the last 24h (24h-tolerance
on the past side, 1-minute clock-skew on the future side). Catches the
"forgotten demo deployment promoted to production" failure mode —
next container restart past 24h refuses unless re-ack'd.
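
The SEC-H3 freshness window could be sketched roughly as below. This is an assumption-laden illustration, not the real Validate() branch: `validateDemoAckTS`, `epochOffset`, `demoNow`, and the message wording are hypothetical, and the clock is passed in as a parameter so the check stays deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// ErrDemoModeAckExpired mirrors the sentinel named in this commit.
var ErrDemoModeAckExpired = errors.New("demo mode acknowledgement expired")

const (
	ackMaxAge     = 24 * time.Hour // tolerance on the past side
	ackFutureSkew = time.Minute    // clock-skew tolerance on the future side
)

// demoNow is a fixed reference clock so the example output is reproducible.
var demoNow = time.Unix(1_800_000_000, 0)

// epochOffset renders now+offsetSeconds as the decimal string the env var carries.
func epochOffset(now time.Time, offsetSeconds int64) string {
	return strconv.FormatInt(now.Unix()+offsetSeconds, 10)
}

// validateDemoAckTS checks a CERTCTL_DEMO_MODE_ACK_TS value against now.
// Every failure wraps the same sentinel so callers can match with errors.Is.
func validateDemoAckTS(raw string, now time.Time) error {
	if raw == "" {
		return fmt.Errorf("%w: CERTCTL_DEMO_MODE_ACK_TS is not set", ErrDemoModeAckExpired)
	}
	secs, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return fmt.Errorf("%w: cannot parse CERTCTL_DEMO_MODE_ACK_TS %q as unix epoch seconds",
			ErrDemoModeAckExpired, raw)
	}
	ts := time.Unix(secs, 0)
	if ts.After(now.Add(ackFutureSkew)) {
		return fmt.Errorf("%w: CERTCTL_DEMO_MODE_ACK_TS is future-dated beyond the clock-skew tolerance",
			ErrDemoModeAckExpired)
	}
	if now.Sub(ts) > ackMaxAge {
		return fmt.Errorf("%w: CERTCTL_DEMO_MODE_ACK_TS is older than 24h", ErrDemoModeAckExpired)
	}
	return nil
}

func main() {
	fmt.Println(validateDemoAckTS(epochOffset(demoNow, -3600), demoNow) == nil) // 1h old: fresh
	fmt.Println(validateDemoAckTS(epochOffset(demoNow, -25*3600), demoNow))     // 25h old: stale
	fmt.Println(validateDemoAckTS(epochOffset(demoNow, 600), demoNow))          // 10m ahead: rejected
	fmt.Println(validateDemoAckTS("yesterday", demoNow))                        // not a number
}
```

An operator re-acknowledging a demo deployment would set the variable to the current epoch seconds, which in Go is `strconv.FormatInt(time.Now().Unix(), 10)`.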
Tests in internal/config/config_test.go cover every new branch:
positive (passes when properly set), negative (each fail-closed path
fires with the matching sentinel + message-text). 12 new tests added.
Helm chart + HA runbook (DEPL-H1)
=================================
Created docs/operator/runbooks/ha.md documenting the three values
flips required for production HA: server.replicas, podDisruptionBudget,
service.sessionAffinity. Cross-link comments added to
deploy/helm/certctl/values.yaml next to the server.replicas (line 19)
and podDisruptionBudget (line 566) defaults. DEFAULTS DO NOT CHANGE
— that's the point per the prompt's 'do not flip networkPolicy default'
guidance: a default-enabled PDB blocks fresh helm install on
single-node clusters.
CI guard (DEPL-M2)
==================
scripts/ci-guards/no-change-me-in-prod-compose.sh grep-fails any
'change-me-' literal in compose files OTHER than docker-compose.demo.yml.
Catches the placeholder-credential-leak regression one layer earlier
than the runtime Validate() fail-closed guards from Bundle 2 (2026-05-12).
Excludes comment lines so docs explaining the pattern don't trip the
guard. Verified to fire on a synthetic leak; clean on the current tree.
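
The guard itself is a shell script; the Go rendition below only illustrates the matching rule described above and is not the script's actual logic. It assumes a "comment line" is one whose first non-blank character is '#'; `exemptFile` and `placeholderLines` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

// placeholder is the literal the guard looks for.
const placeholder = "change-me-"

// exemptFile reports whether path is the demo overlay, the one compose
// file allowed to carry placeholder credentials.
func exemptFile(path string) bool {
	return filepath.Base(path) == "docker-compose.demo.yml"
}

// placeholderLines returns the 1-based numbers of non-comment lines in
// content that contain the placeholder literal.
func placeholderLines(content string) []int {
	var hits []int
	sc := bufio.NewScanner(strings.NewReader(content))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") {
			continue // comment lines may explain the pattern without tripping the guard
		}
		if strings.Contains(line, placeholder) {
			hits = append(hits, n)
		}
	}
	return hits
}

func main() {
	// Synthetic leak: line 3 is a comment, line 5 is a real placeholder value.
	compose := "services:\n" +
		"  db:\n" +
		"    # replace change-me-db before deploying\n" +
		"    environment:\n" +
		"      POSTGRES_PASSWORD: change-me-db\n"
	fmt.Println(placeholderLines(compose)) // prints [5]
	fmt.Println(exemptFile("deploy/docker-compose.demo.yml"))
}
```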
Consolidated 'Security carve-outs' doc section
==============================================
docs/operator/security.md grows by one new section documenting the
eight carve-outs listed below in one canonical place:
- SEC-M3: 3 InsecureSkipVerify=true sites (Agent dev, verify probe, tlsprobe)
- SEC-M5: F5 connector InsecureSkipVerify per-config field
- SEC-M4: ACME insecure + new ACK gate
- SEC-L1: CSP 'unsafe-inline' on style-src (Tailwind carve-out)
- SEC-L2: break-glass Argon2id rest-defense reminder
- SEC-L3: 1 MB body-size cap + CERTCTL_MAX_BODY_SIZE override
- DEPL-M2: change-me-* placeholder credentials in demo overlay
- DEPL-M3: K8s NetworkPolicy operator-opt-in default
Each entry cites the file:line, the rationale for the carve-out, and
the operator action.
CHANGELOG + ENVIRONMENTS coverage
==================================
CHANGELOG.md grows by one new '### Breaking changes (scheduled for
v2.2.0)' section under Unreleased, documenting SEC-H1 / SEC-M4 / SEC-H3
with explicit upgrade-window guidance for each.
deploy/ENVIRONMENTS.md adds five rows: AGENT_BOOTSTRAP_TOKEN +
AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY + DEMO_MODE_ACK + DEMO_MODE_ACK_TS +
ACME_INSECURE_ACK. G-3 env-docs-drift CI guard stays clean.
WORKSPACE-ROADMAP.md (cowork-side) schedules the SEC-H1 default-flip
for v2.2.0.
Sandbox limitation
==================
The certctl repo's working tree is 6.1 GB which fills the sandbox
volume; the go1.25.10 toolchain download (go.mod requires it,
sandbox has 1.25.9) keeps failing on disk-full. Local 'go build' /
'go test' were NOT run in this commit's verification path.
make verify MUST be run on the operator's workstation before push
per CLAUDE.md operating rules.
CI guards (no-change-me, G-3 env-docs-drift, doc-rot-detector, +
all existing) verified clean by running each individually.
Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H1,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-H3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M4,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H1,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M2,
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M3,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-M5,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L1,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L2,
cowork/certctl-architecture-diligence-audit.html#fix-SEC-L3
@@ -7,10 +7,12 @@ import (
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"log/slog"
	"math/big"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"testing"
	"time"
@@ -398,6 +400,233 @@ func TestLoad_CommaSeparatedList(t *testing.T) {
	}
}

// Phase 2 SEC-H1 (2026-05-13) — AgentBootstrapTokenDenyEmpty staged flag.
// When false (default), an empty token is permitted (v2.1.x warn-mode
// pass-through preserved). When true, an empty token fails closed.
func TestValidate_AgentBootstrapTokenDenyEmpty_DefaultFalse_AllowsEmpty(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", AgentBootstrapToken: "", AgentBootstrapTokenDenyEmpty: false},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	if err := cfg.Validate(); err != nil {
		t.Fatalf("Validate() returned error with deny-empty=false + empty token: %v", err)
	}
}

func TestValidate_AgentBootstrapTokenDenyEmpty_True_EmptyTokenFailsClosed(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", AgentBootstrapToken: "", AgentBootstrapTokenDenyEmpty: true},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	err := cfg.Validate()
	if err == nil {
		t.Fatal("Validate() returned nil; want ErrAgentBootstrapTokenRequired")
	}
	if !errors.Is(err, ErrAgentBootstrapTokenRequired) {
		t.Errorf("Validate() err = %v; want errors.Is to match ErrAgentBootstrapTokenRequired", err)
	}
	if !strings.Contains(err.Error(), "CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true") {
		t.Errorf("Validate() error = %q; want message to mention the deny-empty env var name", err.Error())
	}
}

func TestValidate_AgentBootstrapTokenDenyEmpty_True_RealTokenPasses(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", AgentBootstrapToken: "a-real-32-byte-token-value-here-x", AgentBootstrapTokenDenyEmpty: true},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	if err := cfg.Validate(); err != nil {
		t.Fatalf("Validate() returned error with deny-empty=true + real token: %v", err)
	}
}

// Phase 2 SEC-M4 (2026-05-13) — ACME insecure now requires explicit ACK.
func TestValidate_ACMEInsecure_WithoutAck_FailsClosed(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret"},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
		ACME:      ACMEConfig{Insecure: true, InsecureAck: false},
	}
	err := cfg.Validate()
	if err == nil {
		t.Fatal("Validate() returned nil; want ErrACMEInsecureWithoutAck")
	}
	if !errors.Is(err, ErrACMEInsecureWithoutAck) {
		t.Errorf("Validate() err = %v; want errors.Is to match ErrACMEInsecureWithoutAck", err)
	}
	if !strings.Contains(err.Error(), "CERTCTL_ACME_INSECURE_ACK") {
		t.Errorf("Validate() error = %q; want message to mention CERTCTL_ACME_INSECURE_ACK", err.Error())
	}
}

func TestValidate_ACMEInsecure_WithAck_Passes(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret"},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
		ACME:      ACMEConfig{Insecure: true, InsecureAck: true},
	}
	if err := cfg.Validate(); err != nil {
		t.Fatalf("Validate() returned error with Insecure=true + InsecureAck=true: %v", err)
	}
}

func TestValidate_ACMEInsecureFalse_IgnoresAck(t *testing.T) {
	// InsecureAck is irrelevant when Insecure=false. No fail-closed branch.
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret"},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
		ACME:      ACMEConfig{Insecure: false, InsecureAck: false},
	}
	if err := cfg.Validate(); err != nil {
		t.Fatalf("Validate() returned error with Insecure=false: %v", err)
	}
}

// Phase 2 SEC-H3 (2026-05-13) — DemoModeAck now expires after 24h via DemoModeAckTS.
// Note: DemoModeAck=true on a loopback bind requires only the timestamp guard;
// no HIGH-12 cross-firing because the existing HIGH-12 guard fires only on
// non-loopback hosts. All tests here keep the server host as loopback so we
// observe ONLY the new SEC-H3 behavior.
func TestValidate_DemoModeAck_MissingTS_FailsClosed(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: ""},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	err := cfg.Validate()
	if err == nil {
		t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with empty TS")
	}
	if !errors.Is(err, ErrDemoModeAckExpired) {
		t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
	}
	if !strings.Contains(err.Error(), "CERTCTL_DEMO_MODE_ACK_TS") {
		t.Errorf("Validate() error = %q; want message to mention CERTCTL_DEMO_MODE_ACK_TS", err.Error())
	}
}

func TestValidate_DemoModeAck_StaleTS_FailsClosed(t *testing.T) {
	// TS older than 24h → expired.
	staleEpoch := time.Now().Add(-25 * time.Hour).Unix()
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: strconv.FormatInt(staleEpoch, 10)},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	err := cfg.Validate()
	if err == nil {
		t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with 25h-old TS")
	}
	if !errors.Is(err, ErrDemoModeAckExpired) {
		t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
	}
}

func TestValidate_DemoModeAck_FreshTS_Passes(t *testing.T) {
	// TS within 24h → passes.
	freshEpoch := time.Now().Add(-1 * time.Hour).Unix()
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: strconv.FormatInt(freshEpoch, 10)},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	if err := cfg.Validate(); err != nil {
		t.Fatalf("Validate() returned error with 1h-old TS: %v", err)
	}
}

func TestValidate_DemoModeAck_NonNumericTS_FailsClosed(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: "yesterday"},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	err := cfg.Validate()
	if err == nil {
		t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with non-numeric TS")
	}
	if !errors.Is(err, ErrDemoModeAckExpired) {
		t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
	}
	if !strings.Contains(err.Error(), "parse") {
		t.Errorf("Validate() error = %q; want message to mention parse failure", err.Error())
	}
}

func TestValidate_DemoModeAck_FutureDatedTS_FailsClosed(t *testing.T) {
	// > 1m future-dated → clock-skew rejection.
	futureEpoch := time.Now().Add(10 * time.Minute).Unix()
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: true, DemoModeAckTS: strconv.FormatInt(futureEpoch, 10)},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	err := cfg.Validate()
	if err == nil {
		t.Fatal("Validate() returned nil; want ErrDemoModeAckExpired with future-dated TS")
	}
	if !errors.Is(err, ErrDemoModeAckExpired) {
		t.Errorf("Validate() err = %v; want errors.Is to match ErrDemoModeAckExpired", err)
	}
	if !strings.Contains(err.Error(), "future") {
		t.Errorf("Validate() error = %q; want message to mention future-dated TS", err.Error())
	}
}

func TestValidate_DemoModeAckFalse_IgnoresTS(t *testing.T) {
	// DemoModeAck=false → TS is irrelevant; no fail-closed branch.
	cfg := &Config{
		Server:    validServerConfig(t),
		Database:  DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
		Log:       LogConfig{Level: "info", Format: "json"},
		Auth:      AuthConfig{Type: "api-key", Secret: "test-secret", DemoModeAck: false, DemoModeAckTS: ""},
		Keygen:    KeygenConfig{Mode: "agent"},
		Scheduler: validSchedulerConfig(),
	}
	if err := cfg.Validate(); err != nil {
		t.Fatalf("Validate() returned error with DemoModeAck=false: %v", err)
	}
}

func TestValidate_ValidConfig(t *testing.T) {
	cfg := &Config{
		Server:    validServerConfig(t),