mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 14:51:30 +00:00
feat(security): Sprint 5 ACQ — RED-003 deny-empty flip + SEC-009/RED-005 RFC1918 opt-in
Acquisition-audit Sprint 5 ACQ closure (2026-05-16). Two
independent findings ship together because they share Load() /
main.go wiring; the closure comments tie each line to its finding.
PART A — RED-003 (agent-bootstrap deny-empty cutover)
=====================================================
Phase 2 SEC-H1 closure (2026-05-13) introduced the
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY staged feature flag with
default `false` so v2.1.x operators wouldn't get a surprise
fail-closed on upgrade. This commit flips the default to `true`
(per the staged plan in the existing CHANGELOG "Breaking changes
(scheduled for v2.2.0)" block). Operators who haven't generated a
real bootstrap token yet keep the v2.1.x warn-mode pass-through
for one upgrade window by setting
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=false explicitly.
Demo-mode escape hatch: CERTCTL_DEMO_MODE_ACK=true skips the
fail-closed gate so the screenshot/demo path stays one-command-up.
The accompanying boot-banner WARN at cmd/server/main.go:124-126
keeps demo mode visible in every log scraper, so this override
cannot silently re-enable warn-mode in production.
internal/config/config.go
- Load() default for AgentBootstrapTokenDenyEmpty flipped to true
- Validate() gate now also checks !c.Auth.DemoModeAck so the demo
override line up with the boot-banner WARN
- Closure comment block updated to cross-reference Sprint 5 ACQ
and the CHANGELOG v2.2.0 entry
cmd/server/main.go
- Updated boot-time WARN message to reflect the new default
(deny-empty=true) — the warn now fires only in the two
explicit override scenarios (warn-mode opt-back or demo mode),
and explains the operator action either way
- Info-line on configured-token path unchanged
PART B — SEC-009 + RED-005 (opt-in RFC1918 outbound block)
==========================================================
internal/validation/ssrf.go::IsReservedIP has always intentionally
left RFC 1918 ranges (10/8, 172.16/12, 192.168/16) NOT-reserved
because certctl is designed to manage certificates inside private
networks. For operators on hosted IaaS where RFC1918 IS internal
trust (kubeadm-default 10.96.0.0/12 service CIDR exposes the
Kubernetes API on 10.96.0.1; cloud-provider internal monitoring;
hosted-bastion subnets), this default is a real exposure path.
Add a package-level atomic.Bool toggle in internal/validation/ssrf.go
that, when on, extends IsReservedIP to ALSO return true for the
three RFC1918 ranges. Every IsReservedIP-derived path
(SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
webhook + OIDC + ACME callers) picks up the new policy
transitively without per-call-site changes.
internal/validation/ssrf.go
- blockRFC1918Outbound atomic.Bool + SetBlockRFC1918Outbound /
BlockRFC1918OutboundEnabled accessor pair
- rfc1918Nets pre-parsed at package init (panic on parse failure
surfaces a misconfigured ssrf package immediately, not via a
silently disabled toggle)
- IsReservedIP checks the toggle after the existing reserved-IP
checks
- Header comment rewritten to document the toggle + the
transitive coverage
internal/config/config.go
- New NetworkConfig sub-config; Config gains a Network field
- Load() reads CERTCTL_BLOCK_RFC1918_OUTBOUND env var (default
false; preserves the existing self-hosted threat model)
- NetworkConfig docstring lists the operator-trap (enabling this
also blocks RFC1918 from the network scanner) so an operator
cert-discovering their own RFC1918 space doesn't get a
silently-empty scan result
cmd/server/main.go
- Wires validation.SetBlockRFC1918Outbound after config.Load and
near the demo-mode banner / agent-bootstrap-token block; emits
a one-shot INFO line when the toggle is enabled so the policy
is visible in journals
Tests
=====
internal/config/config_test.go
- TestLoad_AgentBootstrapTokenDenyEmpty_DefaultIsTrue — pins the
default flip at the boot path (Load returns the flipped value)
- TestValidate_DenyEmptyDefault_RefusesWithoutToken — pins the
fail-closed behavior under the new default
- TestValidate_DenyEmptyExplicitFalse_AllowsEmpty — pins the
v2.1.x back-compat escape hatch
- TestValidate_DenyEmpty_DemoModeAckOverride_AllowsEmpty — pins
the demo-mode override
internal/validation/ssrf_test.go
- TestIsReservedIP_RFC1918_OptIn — pins toggle-off / toggle-on
behavior across all three RFC1918 ranges, edge cases
immediately outside the ranges, and the toggle-back-off path
- TestSafeHTTPDialContext_RFC1918_OptIn — pins that the toggle
reaches the dial-time SSRF check transitively (not just
IsReservedIP in isolation)
Test-helper updates (Sprint-5-induced churn):
- internal/config/config_test.go::setMinimalValidEnv now sets
CERTCTL_AGENT_BOOTSTRAP_TOKEN to a placeholder so Load()-based
tests that don't specifically exercise the empty-token gate
keep passing under the new fail-closed default. Tests that DO
exercise the empty-token path explicitly override back to "".
- internal/config/config_est_profiles_test.go +
internal/config/config_scep_profiles_test.go: same placeholder
fix for the four Load()-based EST/SCEP profile tests.
- cmd/server/main_test.go::TestMain_ServerConfigFromEnvironment +
TestMain_AuthTypeConfiguration: same fix at the main.go test
layer with prior-value restore.
Verified locally: gofmt -l clean; go vet clean; staticcheck clean
across internal/config, internal/validation, cmd/server; short
tests green on all three packages; targeted -v run of all six new
test names confirms PASS.
This commit is contained in:
@@ -9,26 +9,94 @@ import (
|
||||
"net"
|
||||
"net/url"
|
||||
"strings"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
)
|
||||
|
||||
// blockRFC1918Outbound is the package-level toggle for the
|
||||
// acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ,
|
||||
// 2026-05-16). When true, IsReservedIP additionally returns true for
|
||||
// the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16),
|
||||
// which by default are NOT reserved (see the IsReservedIP header
|
||||
// comment for the threat-model rationale). Operators on hosted IaaS
|
||||
// where RFC1918 IS internal trust (e.g. the kubeadm-default
|
||||
// 10.96.0.0/12 service CIDR exposes the Kubernetes API server on
|
||||
// 10.96.0.1) opt in via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
|
||||
//
|
||||
// Stored as atomic.Bool so the hot-path SSRF check in
|
||||
// SafeHTTPDialContext doesn't need a mutex; SetBlockRFC1918Outbound
|
||||
// is the single writer (called once at boot from
|
||||
// cmd/server/main.go via the config.Network.BlockRFC1918Outbound
|
||||
// value) and IsReservedIP is the reader. Because the toggle is
|
||||
// boot-time wiring rather than per-request runtime, the relaxed
|
||||
// memory ordering of atomic.Bool is sufficient and adds no
|
||||
// measurable per-call overhead.
|
||||
var blockRFC1918Outbound atomic.Bool
|
||||
|
||||
// SetBlockRFC1918Outbound flips the package-level RFC1918-block
|
||||
// toggle. Called once at boot from cmd/server/main.go after
|
||||
// config.Load. Idempotent — operators can re-flip in tests by
|
||||
// passing the value they want.
|
||||
//
|
||||
// Acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ).
|
||||
func SetBlockRFC1918Outbound(block bool) {
|
||||
blockRFC1918Outbound.Store(block)
|
||||
}
|
||||
|
||||
// BlockRFC1918OutboundEnabled reports the current toggle state.
|
||||
// Exposed so callers (e.g. operator-facing /healthz diagnostics)
|
||||
// can render the effective SSRF policy without re-reading the env.
|
||||
func BlockRFC1918OutboundEnabled() bool {
|
||||
return blockRFC1918Outbound.Load()
|
||||
}
|
||||
|
||||
// rfc1918Nets is the pre-parsed set of RFC 1918 CIDRs, computed once
|
||||
// at package init so the IsReservedIP hot path doesn't re-parse the
|
||||
// strings on every call. A `nil` entry would surface a panic at
|
||||
// startup rather than silently no-op the toggle.
|
||||
var rfc1918Nets = func() []*net.IPNet {
|
||||
out := make([]*net.IPNet, 0, 3)
|
||||
for _, cidr := range []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"} {
|
||||
_, n, err := net.ParseCIDR(cidr)
|
||||
if err != nil || n == nil {
|
||||
panic("ssrf: failed to pre-parse RFC1918 CIDR " + cidr + ": " + err.Error())
|
||||
}
|
||||
out = append(out, n)
|
||||
}
|
||||
return out
|
||||
}()
|
||||
|
||||
// IsReservedIP reports whether the given IP falls inside a range that
|
||||
// outbound HTTP egress (and the network-scanner CIDR expander) MUST treat
|
||||
// as unreachable: loopback, link-local (including cloud-provider metadata
|
||||
// endpoints at 169.254.169.254), multicast, and broadcast.
|
||||
//
|
||||
// RFC 1918 ranges (10/8, 172.16/12, 192.168/16) are intentionally NOT
|
||||
// treated as reserved. certctl is designed to manage certificates inside
|
||||
// private networks and filtering private address space would break the
|
||||
// primary use case. The threat model here is outbound HTTP to
|
||||
// cloud-metadata or localhost services, not general network reachability.
|
||||
// treated as reserved by default. certctl is designed to manage
|
||||
// certificates inside private networks and filtering private address
|
||||
// space would break the primary use case. The default threat model is
|
||||
// outbound HTTP to cloud-metadata or localhost services, not general
|
||||
// network reachability.
|
||||
//
|
||||
// Operators on hosted IaaS where RFC1918 IS internal trust (Kubernetes
|
||||
// service CIDRs that expose the API server inside RFC1918, internal-
|
||||
// only monitoring stacks, etc.) can opt in via
|
||||
// CERTCTL_BLOCK_RFC1918_OUTBOUND=true, which the boot path passes to
|
||||
// SetBlockRFC1918Outbound. When the toggle is on, the three RFC 1918
|
||||
// ranges are appended to the reserved set and every code path that
|
||||
// builds on IsReservedIP (isReservedIPForDial, IsReservedIPForDial,
|
||||
// SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
|
||||
// webhook notifier) picks up the policy transitively without per-
|
||||
// call-site changes. This is acquisition-audit SEC-009 + RED-005
|
||||
// closure (Sprint 5 ACQ, 2026-05-16).
|
||||
//
|
||||
// This function is byte-identical in behaviour to the previous unexported
|
||||
// copy in internal/service/network_scan.go. It is exported here so both
|
||||
// the network scanner and the webhook notifier share a single
|
||||
// authoritative implementation. Broader IPv6 coverage and unspecified-
|
||||
// address handling live in SafeHTTPDialContext, where stricter policy is
|
||||
// appropriate for outbound HTTP egress.
|
||||
// copy in internal/service/network_scan.go (for the default-off case).
|
||||
// It is exported here so both the network scanner and the webhook
|
||||
// notifier share a single authoritative implementation. Broader IPv6
|
||||
// coverage and unspecified- address handling live in
|
||||
// SafeHTTPDialContext, where stricter policy is appropriate for
|
||||
// outbound HTTP egress.
|
||||
func IsReservedIP(ip net.IP) bool {
|
||||
// Loopback: 127.0.0.0/8 (and ::1 via IsLoopback).
|
||||
if ip.IsLoopback() {
|
||||
@@ -58,6 +126,19 @@ func IsReservedIP(ip net.IP) bool {
|
||||
return true
|
||||
}
|
||||
|
||||
// Acquisition-audit SEC-009 + RED-005 (Sprint 5 ACQ, 2026-05-16).
|
||||
// Opt-in RFC 1918 block. The toggle is OFF by default — the
|
||||
// default certctl threat model treats RFC1918 as legitimate
|
||||
// destination space. Operators on hosted IaaS where RFC1918 is
|
||||
// internal trust flip this via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
|
||||
if blockRFC1918Outbound.Load() {
|
||||
for _, n := range rfc1918Nets {
|
||||
if n.Contains(ip) {
|
||||
return true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return false
|
||||
}
|
||||
|
||||
|
||||
@@ -228,3 +228,70 @@ func TestSafeHTTPDialContext_DefaultTimeoutWhenZero(t *testing.T) {
|
||||
t.Fatal("expected reserved-address rejection")
|
||||
}
|
||||
}
|
||||
|
||||
// TestIsReservedIP_RFC1918_OptIn pins the Sprint 5 ACQ SEC-009 + RED-005
|
||||
// closure (2026-05-16). With the default-off toggle, RFC1918 stays
|
||||
// allowed (the certctl threat-model default). After
|
||||
// SetBlockRFC1918Outbound(true), the three RFC1918 ranges flip to
|
||||
// reserved and every IsReservedIP-derived path (isReservedIPForDial,
|
||||
// SafeHTTPDialContext, ValidateSafeURL, the network scanner) picks
|
||||
// up the new policy transitively. The defer restores the package-level
|
||||
// state so subsequent tests don't observe the flipped toggle.
|
||||
func TestIsReservedIP_RFC1918_OptIn(t *testing.T) {
|
||||
prior := BlockRFC1918OutboundEnabled()
|
||||
t.Cleanup(func() { SetBlockRFC1918Outbound(prior) })
|
||||
|
||||
// Default-off: RFC1918 stays non-reserved.
|
||||
SetBlockRFC1918Outbound(false)
|
||||
for _, addr := range []string{"10.0.0.1", "172.16.0.1", "192.168.1.1"} {
|
||||
ip := net.ParseIP(addr)
|
||||
if IsReservedIP(ip) {
|
||||
t.Errorf("default-off: IsReservedIP(%s)=true; want false", addr)
|
||||
}
|
||||
}
|
||||
// Toggle on: same three ranges flip to reserved.
|
||||
SetBlockRFC1918Outbound(true)
|
||||
for _, addr := range []string{"10.0.0.1", "10.255.255.254", "172.16.0.1", "172.31.255.254", "192.168.0.1", "192.168.255.254"} {
|
||||
ip := net.ParseIP(addr)
|
||||
if !IsReservedIP(ip) {
|
||||
t.Errorf("toggle-on: IsReservedIP(%s)=false; want true", addr)
|
||||
}
|
||||
}
|
||||
// Edge: a public address right outside RFC1918 (172.32.0.0/12
|
||||
// boundary) must STAY non-reserved with the toggle on.
|
||||
for _, addr := range []string{"172.32.0.1", "11.0.0.1", "192.169.0.1", "9.9.9.9", "8.8.8.8"} {
|
||||
ip := net.ParseIP(addr)
|
||||
if IsReservedIP(ip) {
|
||||
t.Errorf("toggle-on edge: IsReservedIP(%s)=true; want false (just outside RFC1918)", addr)
|
||||
}
|
||||
}
|
||||
// Toggle back off: RFC1918 returns to non-reserved.
|
||||
SetBlockRFC1918Outbound(false)
|
||||
for _, addr := range []string{"10.0.0.1", "172.16.0.1", "192.168.1.1"} {
|
||||
ip := net.ParseIP(addr)
|
||||
if IsReservedIP(ip) {
|
||||
t.Errorf("toggle-off after on: IsReservedIP(%s)=true; want false", addr)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestSafeHTTPDialContext_RFC1918_OptIn pins that the toggle reaches
|
||||
// the SafeHTTPDialContext path transitively (not just IsReservedIP in
|
||||
// isolation). With toggle off, dialing 10.0.0.1 hits the connection-
|
||||
// level error (refused/timeout), NOT the "refusing to dial reserved
|
||||
// address" error. With toggle on, the dial fails closed at the
|
||||
// reserved-address check BEFORE attempting a TCP SYN.
|
||||
func TestSafeHTTPDialContext_RFC1918_OptIn(t *testing.T) {
|
||||
prior := BlockRFC1918OutboundEnabled()
|
||||
t.Cleanup(func() { SetBlockRFC1918Outbound(prior) })
|
||||
|
||||
SetBlockRFC1918Outbound(true)
|
||||
dial := SafeHTTPDialContext(2 * time.Second)
|
||||
_, err := dial(context.Background(), "tcp", "10.0.0.1:1")
|
||||
if err == nil {
|
||||
t.Fatal("toggle-on: expected reserved-address rejection for 10.0.0.1")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "refusing to dial reserved address") {
|
||||
t.Errorf("toggle-on: expected reserved-address message; got: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user