feat(security): Sprint 5 ACQ — RED-003 deny-empty flip + SEC-009/RED-005 RFC1918 opt-in

Acquisition-audit Sprint 5 ACQ closure (2026-05-16). Two
independent findings ship together because they share Load() /
main.go wiring; the closure comments tie each line to its finding.

PART A — RED-003 (agent-bootstrap deny-empty cutover)
=====================================================

Phase 2 SEC-H1 closure (2026-05-13) introduced the
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY staged feature flag with
default `false` so v2.1.x operators wouldn't get a surprise
fail-closed on upgrade. This commit flips the default to `true`
(per the staged plan in the existing CHANGELOG "Breaking changes
(scheduled for v2.2.0)" block). Operators who haven't generated a
real bootstrap token yet keep the v2.1.x warn-mode pass-through
for one upgrade window by setting
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=false explicitly.

Demo-mode escape hatch: CERTCTL_DEMO_MODE_ACK=true skips the
fail-closed gate so the screenshot/demo path stays one-command-up.
The accompanying boot-banner WARN at cmd/server/main.go:124-126
keeps demo mode visible in every log scraper, so this override
cannot silently re-enable warn-mode in production.

internal/config/config.go
  - Load() default for AgentBootstrapTokenDenyEmpty flipped to true
  - Validate() gate now also checks !c.Auth.DemoModeAck so the demo
    override line up with the boot-banner WARN
  - Closure comment block updated to cross-reference Sprint 5 ACQ
    and the CHANGELOG v2.2.0 entry

cmd/server/main.go
  - Updated boot-time WARN message to reflect the new default
    (deny-empty=true) — the warn now fires only in the two
    explicit override scenarios (warn-mode opt-back or demo mode),
    and explains the operator action either way
  - Info-line on configured-token path unchanged

PART B — SEC-009 + RED-005 (opt-in RFC1918 outbound block)
==========================================================

internal/validation/ssrf.go::IsReservedIP has always intentionally
left RFC 1918 ranges (10/8, 172.16/12, 192.168/16) NOT-reserved
because certctl is designed to manage certificates inside private
networks. For operators on hosted IaaS where RFC1918 IS internal
trust (kubeadm-default 10.96.0.0/12 service CIDR exposes the
Kubernetes API on 10.96.0.1; cloud-provider internal monitoring;
hosted-bastion subnets), this default is a real exposure path.

Add a package-level atomic.Bool toggle in internal/validation/ssrf.go
that, when on, extends IsReservedIP to ALSO return true for the
three RFC1918 ranges. Every IsReservedIP-derived path
(SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
webhook + OIDC + ACME callers) picks up the new policy
transitively without per-call-site changes.

internal/validation/ssrf.go
  - blockRFC1918Outbound atomic.Bool + SetBlockRFC1918Outbound /
    BlockRFC1918OutboundEnabled accessor pair
  - rfc1918Nets pre-parsed at package init (panic on parse failure
    surfaces a misconfigured ssrf package immediately, not via a
    silently disabled toggle)
  - IsReservedIP checks the toggle after the existing reserved-IP
    checks
  - Header comment rewritten to document the toggle + the
    transitive coverage

internal/config/config.go
  - New NetworkConfig sub-config; Config gains a Network field
  - Load() reads CERTCTL_BLOCK_RFC1918_OUTBOUND env var (default
    false; preserves the existing self-hosted threat model)
  - NetworkConfig docstring lists the operator-trap (enabling this
    also blocks RFC1918 from the network scanner) so an operator
    cert-discovering their own RFC1918 space doesn't get a
    silently-empty scan result

cmd/server/main.go
  - Wires validation.SetBlockRFC1918Outbound after config.Load and
    near the demo-mode banner / agent-bootstrap-token block; emits
    a one-shot INFO line when the toggle is enabled so the policy
    is visible in journals

Tests
=====

internal/config/config_test.go
  - TestLoad_AgentBootstrapTokenDenyEmpty_DefaultIsTrue — pins the
    default flip at the boot path (Load returns the flipped value)
  - TestValidate_DenyEmptyDefault_RefusesWithoutToken — pins the
    fail-closed behavior under the new default
  - TestValidate_DenyEmptyExplicitFalse_AllowsEmpty — pins the
    v2.1.x back-compat escape hatch
  - TestValidate_DenyEmpty_DemoModeAckOverride_AllowsEmpty — pins
    the demo-mode override

internal/validation/ssrf_test.go
  - TestIsReservedIP_RFC1918_OptIn — pins toggle-off / toggle-on
    behavior across all three RFC1918 ranges, edge cases
    immediately outside the ranges, and the toggle-back-off path
  - TestSafeHTTPDialContext_RFC1918_OptIn — pins that the toggle
    reaches the dial-time SSRF check transitively (not just
    IsReservedIP in isolation)

Test-helper updates (Sprint-5-induced churn):
  - internal/config/config_test.go::setMinimalValidEnv now sets
    CERTCTL_AGENT_BOOTSTRAP_TOKEN to a placeholder so Load()-based
    tests that don't specifically exercise the empty-token gate
    keep passing under the new fail-closed default. Tests that DO
    exercise the empty-token path explicitly override back to "".
  - internal/config/config_est_profiles_test.go +
    internal/config/config_scep_profiles_test.go: same placeholder
    fix for the four Load()-based EST/SCEP profile tests.
  - cmd/server/main_test.go::TestMain_ServerConfigFromEnvironment +
    TestMain_AuthTypeConfiguration: same fix at the main.go test
    layer with prior-value restore.

Verified locally: gofmt -l clean; go vet clean; staticcheck clean
across internal/config, internal/validation, cmd/server; short
tests green on all three packages; targeted -v run of all six new
test names confirms PASS.
This commit is contained in:
shankar0123
2026-05-16 19:13:52 +00:00
parent 374ec574c5
commit 5ea45a19b9
8 changed files with 403 additions and 32 deletions
+90 -9
View File
@@ -9,26 +9,94 @@ import (
"net"
"net/url"
"strings"
"sync/atomic"
"time"
)
// blockRFC1918Outbound is the package-level toggle for the
// acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ,
// 2026-05-16). When true, IsReservedIP additionally returns true for
// the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16),
// which by default are NOT reserved (see the IsReservedIP header
// comment for the threat-model rationale). Operators on hosted IaaS
// where RFC1918 IS internal trust (e.g. the kubeadm-default
// 10.96.0.0/12 service CIDR exposes the Kubernetes API server on
// 10.96.0.1) opt in via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
//
// Stored as atomic.Bool so the hot-path SSRF check in
// SafeHTTPDialContext doesn't need a mutex; SetBlockRFC1918Outbound
// is the single writer (called once at boot from
// cmd/server/main.go via the config.Network.BlockRFC1918Outbound
// value) and IsReservedIP is the reader. Because the toggle is
// boot-time wiring rather than per-request runtime, the relaxed
// memory ordering of atomic.Bool is sufficient and adds no
// measurable per-call overhead.
var blockRFC1918Outbound atomic.Bool
// SetBlockRFC1918Outbound flips the package-level RFC1918-block
// toggle. Called once at boot from cmd/server/main.go after
// config.Load. Idempotent — operators can re-flip in tests by
// passing the value they want.
//
// Acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ).
func SetBlockRFC1918Outbound(block bool) {
blockRFC1918Outbound.Store(block)
}
// BlockRFC1918OutboundEnabled reports the current toggle state.
// Exposed so callers (e.g. operator-facing /healthz diagnostics)
// can render the effective SSRF policy without re-reading the env.
func BlockRFC1918OutboundEnabled() bool {
return blockRFC1918Outbound.Load()
}
// rfc1918Nets is the pre-parsed set of RFC 1918 CIDRs, computed once
// at package init so the IsReservedIP hot path doesn't re-parse the
// strings on every call. A `nil` entry would surface a panic at
// startup rather than silently no-op the toggle.
var rfc1918Nets = func() []*net.IPNet {
out := make([]*net.IPNet, 0, 3)
for _, cidr := range []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"} {
_, n, err := net.ParseCIDR(cidr)
if err != nil || n == nil {
panic("ssrf: failed to pre-parse RFC1918 CIDR " + cidr + ": " + err.Error())
}
out = append(out, n)
}
return out
}()
// IsReservedIP reports whether the given IP falls inside a range that
// outbound HTTP egress (and the network-scanner CIDR expander) MUST treat
// as unreachable: loopback, link-local (including cloud-provider metadata
// endpoints at 169.254.169.254), multicast, and broadcast.
//
// RFC 1918 ranges (10/8, 172.16/12, 192.168/16) are intentionally NOT
// treated as reserved. certctl is designed to manage certificates inside
// private networks and filtering private address space would break the
// primary use case. The threat model here is outbound HTTP to
// cloud-metadata or localhost services, not general network reachability.
// treated as reserved by default. certctl is designed to manage
// certificates inside private networks and filtering private address
// space would break the primary use case. The default threat model is
// outbound HTTP to cloud-metadata or localhost services, not general
// network reachability.
//
// Operators on hosted IaaS where RFC1918 IS internal trust (Kubernetes
// service CIDRs that expose the API server inside RFC1918, internal-
// only monitoring stacks, etc.) can opt in via
// CERTCTL_BLOCK_RFC1918_OUTBOUND=true, which the boot path passes to
// SetBlockRFC1918Outbound. When the toggle is on, the three RFC 1918
// ranges are appended to the reserved set and every code path that
// builds on IsReservedIP (isReservedIPForDial, IsReservedIPForDial,
// SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
// webhook notifier) picks up the policy transitively without per-
// call-site changes. This is acquisition-audit SEC-009 + RED-005
// closure (Sprint 5 ACQ, 2026-05-16).
//
// This function is byte-identical in behaviour to the previous unexported
// copy in internal/service/network_scan.go. It is exported here so both
// the network scanner and the webhook notifier share a single
// authoritative implementation. Broader IPv6 coverage and unspecified-
// address handling live in SafeHTTPDialContext, where stricter policy is
// appropriate for outbound HTTP egress.
// copy in internal/service/network_scan.go (for the default-off case).
// It is exported here so both the network scanner and the webhook
// notifier share a single authoritative implementation. Broader IPv6
// coverage and unspecified- address handling live in
// SafeHTTPDialContext, where stricter policy is appropriate for
// outbound HTTP egress.
func IsReservedIP(ip net.IP) bool {
// Loopback: 127.0.0.0/8 (and ::1 via IsLoopback).
if ip.IsLoopback() {
@@ -58,6 +126,19 @@ func IsReservedIP(ip net.IP) bool {
return true
}
// Acquisition-audit SEC-009 + RED-005 (Sprint 5 ACQ, 2026-05-16).
// Opt-in RFC 1918 block. The toggle is OFF by default — the
// default certctl threat model treats RFC1918 as legitimate
// destination space. Operators on hosted IaaS where RFC1918 is
// internal trust flip this via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
if blockRFC1918Outbound.Load() {
for _, n := range rfc1918Nets {
if n.Contains(ip) {
return true
}
}
}
return false
}