feat(security): Sprint 5 ACQ — RED-003 deny-empty flip + SEC-009/RED-005 RFC1918 opt-in

Acquisition-audit Sprint 5 ACQ closure (2026-05-16). Two
independent findings ship together because they share Load() /
main.go wiring; the closure comments tie each line to its finding.

PART A — RED-003 (agent-bootstrap deny-empty cutover)
=====================================================

Phase 2 SEC-H1 closure (2026-05-13) introduced the
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY staged feature flag with
default `false` so v2.1.x operators wouldn't get a surprise
fail-closed on upgrade. This commit flips the default to `true`
(per the staged plan in the existing CHANGELOG "Breaking changes
(scheduled for v2.2.0)" block). Operators who haven't generated a
real bootstrap token yet keep the v2.1.x warn-mode pass-through
for one upgrade window by setting
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=false explicitly.

Demo-mode escape hatch: CERTCTL_DEMO_MODE_ACK=true skips the
fail-closed gate so the screenshot/demo path stays one-command-up.
The accompanying boot-banner WARN at cmd/server/main.go:124-126
keeps demo mode visible in every log scraper, so this override
cannot silently re-enable warn-mode in production.

internal/config/config.go
  - Load() default for AgentBootstrapTokenDenyEmpty flipped to true
  - Validate() gate now also checks !c.Auth.DemoModeAck so the demo
    override lines up with the boot-banner WARN
  - Closure comment block updated to cross-reference Sprint 5 ACQ
    and the CHANGELOG v2.2.0 entry

cmd/server/main.go
  - Updated boot-time WARN message to reflect the new default
    (deny-empty=true) — the warn now fires only in the two
    explicit override scenarios (warn-mode opt-back or demo mode),
    and explains the operator action either way
  - Info-line on configured-token path unchanged

PART B — SEC-009 + RED-005 (opt-in RFC1918 outbound block)
==========================================================

internal/validation/ssrf.go::IsReservedIP has always intentionally
left RFC 1918 ranges (10/8, 172.16/12, 192.168/16) NOT-reserved
because certctl is designed to manage certificates inside private
networks. For operators on hosted IaaS where RFC1918 IS internal
trust (kubeadm-default 10.96.0.0/12 service CIDR exposes the
Kubernetes API on 10.96.0.1; cloud-provider internal monitoring;
hosted-bastion subnets), this default is a real exposure path.

Add a package-level atomic.Bool toggle in internal/validation/ssrf.go
that, when on, extends IsReservedIP to ALSO return true for the
three RFC1918 ranges. Every IsReservedIP-derived path
(SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
webhook + OIDC + ACME callers) picks up the new policy
transitively without per-call-site changes.

internal/validation/ssrf.go
  - blockRFC1918Outbound atomic.Bool + SetBlockRFC1918Outbound /
    BlockRFC1918OutboundEnabled accessor pair
  - rfc1918Nets pre-parsed at package init (panic on parse failure
    surfaces a misconfigured ssrf package immediately, not via a
    silently disabled toggle)
  - IsReservedIP checks the toggle after the existing reserved-IP
    checks
  - Header comment rewritten to document the toggle + the
    transitive coverage

internal/config/config.go
  - New NetworkConfig sub-config; Config gains a Network field
  - Load() reads CERTCTL_BLOCK_RFC1918_OUTBOUND env var (default
    false; preserves the existing self-hosted threat model)
  - NetworkConfig docstring lists the operator-trap (enabling this
    also blocks RFC1918 from the network scanner) so an operator
    cert-discovering their own RFC1918 space doesn't get a
    silently-empty scan result

cmd/server/main.go
  - Wires validation.SetBlockRFC1918Outbound after config.Load and
    near the demo-mode banner / agent-bootstrap-token block; emits
    a one-shot INFO line when the toggle is enabled so the policy
    is visible in journals

Tests
=====

internal/config/config_test.go
  - TestLoad_AgentBootstrapTokenDenyEmpty_DefaultIsTrue — pins the
    default flip at the boot path (Load returns the flipped value)
  - TestValidate_DenyEmptyDefault_RefusesWithoutToken — pins the
    fail-closed behavior under the new default
  - TestValidate_DenyEmptyExplicitFalse_AllowsEmpty — pins the
    v2.1.x back-compat escape hatch
  - TestValidate_DenyEmpty_DemoModeAckOverride_AllowsEmpty — pins
    the demo-mode override

internal/validation/ssrf_test.go
  - TestIsReservedIP_RFC1918_OptIn — pins toggle-off / toggle-on
    behavior across all three RFC1918 ranges, edge cases
    immediately outside the ranges, and the toggle-back-off path
  - TestSafeHTTPDialContext_RFC1918_OptIn — pins that the toggle
    reaches the dial-time SSRF check transitively (not just
    IsReservedIP in isolation)
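
For illustration, the "edge cases immediately outside the ranges" can
be written as a boundary table. The addresses below are an assumption
about what such a table would contain, not a copy of ssrf_test.go, and
the sketch uses net/netip rather than whatever the real test uses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// rfc1918 holds the three private ranges from RFC 1918.
var rfc1918 = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
}

// inRFC1918 reports whether addr falls inside any of the three ranges.
func inRFC1918(addr string) bool {
	a := netip.MustParseAddr(addr)
	for _, p := range rfc1918 {
		if p.Contains(a) {
			return true
		}
	}
	return false
}

// boundaryCase pairs an address with the expected membership.
type boundaryCase struct {
	addr string
	want bool
}

// boundaryCases lists the first and last address of each range plus
// the addresses immediately before and after it (hypothetical table).
var boundaryCases = []boundaryCase{
	{"9.255.255.255", false},
	{"10.0.0.0", true},
	{"10.255.255.255", true},
	{"11.0.0.0", false},
	{"172.15.255.255", false},
	{"172.16.0.0", true},
	{"172.31.255.255", true},
	{"172.32.0.0", false},
	{"192.167.255.255", false},
	{"192.168.0.0", true},
	{"192.168.255.255", true},
	{"192.169.0.0", false},
}

func main() {
	for _, c := range boundaryCases {
		if got := inRFC1918(c.addr); got != c.want {
			fmt.Printf("MISMATCH %s: got %v, want %v\n", c.addr, got, c.want)
		}
	}
	fmt.Println("checked", len(boundaryCases), "boundary addresses")
}
```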

Test-helper updates (Sprint-5-induced churn):
  - internal/config/config_test.go::setMinimalValidEnv now sets
    CERTCTL_AGENT_BOOTSTRAP_TOKEN to a placeholder so Load()-based
    tests that don't specifically exercise the empty-token gate
    keep passing under the new fail-closed default. Tests that DO
    exercise the empty-token path explicitly override back to "".
  - internal/config/config_est_profiles_test.go +
    internal/config/config_scep_profiles_test.go: same placeholder
    fix for the four Load()-based EST/SCEP profile tests.
  - cmd/server/main_test.go::TestMain_ServerConfigFromEnvironment +
    TestMain_AuthTypeConfiguration: same fix at the main.go test
    layer with prior-value restore.

Verified locally: gofmt -l clean; go vet clean; staticcheck clean
across internal/config, internal/validation, cmd/server; short
tests green on all three packages; targeted -v run of all six new
test names confirms PASS.
shankar0123
2026-05-16 19:13:52 +00:00
parent 374ec574c5
commit 5ea45a19b9
8 changed files with 403 additions and 32 deletions
+61 -12
@@ -113,6 +113,32 @@ type Config struct {
// + window. The scheduler's userRetentionLoop reads Interval; the
// UserRetentionService reads RetentionWindow + BatchCap.
UserRetention UserRetentionConfig
+// Network holds outbound-egress policy tunables. Acquisition-audit
+// SEC-009 + RED-005 closure (Sprint 5 ACQ, 2026-05-16). Today the
+// only field is BlockRFC1918Outbound; future egress-policy knobs
+// (per-host allowlists, max-dial-time overrides) go here.
+Network NetworkConfig
+}
+// NetworkConfig is the outbound-egress policy surface for certctl.
+// Acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ,
+// 2026-05-16).
+type NetworkConfig struct {
+// BlockRFC1918Outbound, when true, extends the SSRF reserved-IP
+// gate (internal/validation/ssrf.go::IsReservedIP) to include the
+// three RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12,
+// 192.168.0.0/16). Default false (preserves the certctl threat-
+// model default that RFC1918 is legitimate destination space).
+// Operators on hosted IaaS where RFC1918 is internal trust
+// (Kubernetes service CIDRs that expose the API server inside
+// RFC1918, internal-only monitoring stacks, etc.) opt in via
+// CERTCTL_BLOCK_RFC1918_OUTBOUND=true. Wired at boot from
+// cmd/server/main.go via validation.SetBlockRFC1918Outbound.
+//
+// IMPORTANT: enabling this also blocks RFC1918 from the certctl
+// network scanner. Operators who scan their own RFC1918 space
+// for cert-discovery MUST leave this disabled.
+BlockRFC1918Outbound bool
}
// AuditChainConfig configures the audit_events tamper-evidence
@@ -464,10 +490,18 @@ func Load() (*Config, error) {
// NamedKeys is populated from CERTCTL_API_KEYS_NAMED below so Load()
// can surface parse errors alongside other config errors.
-// Bundle-5 / Audit H-007: agent-registration bootstrap secret.
-// Empty (default) = warn-mode pass-through; v2.2.0 will require it.
+// Bundle-5 / Audit H-007 + acquisition-audit RED-003 closure
+// (Sprint 5 ACQ, 2026-05-16): agent-registration bootstrap
+// secret. The deny-empty default flipped from false → true
+// on 2026-05-16. Operators upgrading from v2.1.x can re-
+// open the warn-mode escape hatch by explicitly setting
+// CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=false (one
+// upgrade window); see CHANGELOG v2.2.0 for the migration
+// note. Demo mode (CERTCTL_DEMO_MODE_ACK=true) keeps the
+// pre-flip warn-mode for the screenshot path — see
+// Validate() for the override site.
 AgentBootstrapToken: getEnv("CERTCTL_AGENT_BOOTSTRAP_TOKEN", ""),
-AgentBootstrapTokenDenyEmpty: getEnvBool("CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY", false),
+AgentBootstrapTokenDenyEmpty: getEnvBool("CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY", true),
// Bundle 1 Phase 6: one-shot bootstrap token for the
// /v1/auth/bootstrap endpoint that mints the first admin
// key. Empty = bootstrap endpoint disabled (default).
@@ -754,6 +788,15 @@ func Load() (*Config, error) {
RetentionWindow: getEnvDuration("CERTCTL_USER_RETENTION_WINDOW", 30*24*time.Hour),
BatchCap: getEnvInt("CERTCTL_USER_RETENTION_BATCH_CAP", 200),
},
+// Acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ,
+// 2026-05-16). Default false preserves the existing threat-model
+// default (RFC1918 is legitimate destination space); operators
+// on hosted IaaS opt in via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
+// Wired into validation.SetBlockRFC1918Outbound at boot from
+// cmd/server/main.go.
+Network: NetworkConfig{
+BlockRFC1918Outbound: getEnvBool("CERTCTL_BLOCK_RFC1918_OUTBOUND", false),
+},
}
// Parse CERTCTL_API_KEYS_NAMED for named key authentication (M-002).
@@ -942,15 +985,21 @@ func (c *Config) Validate() error {
return fmt.Errorf("auth secret is required for auth type %s", c.Auth.Type)
}
-// Phase 2 SEC-H1 closure (2026-05-13): the AgentBootstrapTokenDenyEmpty
-// staged feature flag. When the operator opts in via
-// CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=true AND the bootstrap
-// token is empty, Validate() returns a fail-closed error. Default
-// flag value is false, preserving the existing v2.1.x warn-mode
-// pass-through behavior for backward compatibility. The default-flip
-// to true is scheduled for v2.2.0 in WORKSPACE-ROADMAP.md — operators
-// get one upgrade window to set a real token.
-if c.Auth.AgentBootstrapTokenDenyEmpty && c.Auth.AgentBootstrapToken == "" {
+// Phase 2 SEC-H1 closure (2026-05-13) + acquisition-audit RED-003
+// closure (Sprint 5 ACQ, 2026-05-16): the AgentBootstrapTokenDenyEmpty
+// fail-closed gate. The flag flipped default from false → true on
+// 2026-05-16; operators upgrading from v2.1.x can reopen the
+// warn-mode escape hatch with CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=false
+// for one upgrade window. CHANGELOG v2.2.0 documents the cutover.
+//
+// Demo-mode override: a screenshot/demo deploy with
+// CERTCTL_DEMO_MODE_ACK=true skips this guard so the demo path
+// stays one-command-up. The accompanying boot banner WARN in
+// cmd/server/main.go keeps the posture visible — demo deploys
+// already log a prominent "DEMO MODE ACTIVE" line at every boot.
+// Production deploys never set DemoModeAck, so this override
+// cannot inadvertently re-enable warn-mode in production.
+if c.Auth.AgentBootstrapTokenDenyEmpty && c.Auth.AgentBootstrapToken == "" && !c.Auth.DemoModeAck {
return fmt.Errorf("phase-2 SEC-H1 fail-closed guard: %w", ErrAgentBootstrapTokenRequired)
}