certctl/internal/validation/ssrf.go
shankar0123 5ea45a19b9 feat(security): Sprint 5 ACQ — RED-003 deny-empty flip + SEC-009/RED-005 RFC1918 opt-in
Acquisition-audit Sprint 5 ACQ closure (2026-05-16). Two
independent findings ship together because they share Load() /
main.go wiring; the closure comments tie each line to its finding.

PART A — RED-003 (agent-bootstrap deny-empty cutover)
=====================================================

Phase 2 SEC-H1 closure (2026-05-13) introduced the
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY staged feature flag with
default `false` so v2.1.x operators wouldn't get a surprise
fail-closed on upgrade. This commit flips the default to `true`
(per the staged plan in the existing CHANGELOG "Breaking changes
(scheduled for v2.2.0)" block). Operators who haven't generated a
real bootstrap token yet keep the v2.1.x warn-mode pass-through
for one upgrade window by setting
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=false explicitly.

Demo-mode escape hatch: CERTCTL_DEMO_MODE_ACK=true skips the
fail-closed gate so the screenshot/demo path stays one-command-up.
The accompanying boot-banner WARN at cmd/server/main.go:124-126
keeps demo mode visible in every log scraper, so this override
cannot silently re-enable warn-mode in production.

internal/config/config.go
  - Load() default for AgentBootstrapTokenDenyEmpty flipped to true
  - Validate() gate now also checks !c.Auth.DemoModeAck so the demo
    override lines up with the boot-banner WARN
  - Closure comment block updated to cross-reference Sprint 5 ACQ
    and the CHANGELOG v2.2.0 entry

cmd/server/main.go
  - Updated boot-time WARN message to reflect the new default
    (deny-empty=true) — the warn now fires only in the two
    explicit override scenarios (warn-mode opt-back or demo mode),
    and explains the operator action either way
  - Info-line on configured-token path unchanged
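
The Part A gate reduces to a three-way condition. The block below is a minimal sketch of that logic, not the actual Validate() body: the AuthConfig struct, the AgentBootstrapToken field name, and the error text are assumptions made for illustration. Only AgentBootstrapTokenDenyEmpty and DemoModeAck are named in this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// AuthConfig is a hypothetical stand-in for the real auth sub-config.
type AuthConfig struct {
	AgentBootstrapToken          string // assumed field name
	AgentBootstrapTokenDenyEmpty bool   // default is true after this commit
	DemoModeAck                  bool   // CERTCTL_DEMO_MODE_ACK
}

// errEmptyBootstrapToken is illustrative; the real message may differ.
var errEmptyBootstrapToken = errors.New("agent bootstrap token is empty while deny-empty is on")

// validateBootstrapToken fails closed only when all three conditions hold:
// deny-empty is on, demo mode is not acknowledged, and the token is empty.
func validateBootstrapToken(a AuthConfig) error {
	if a.AgentBootstrapTokenDenyEmpty && !a.DemoModeAck && a.AgentBootstrapToken == "" {
		return errEmptyBootstrapToken
	}
	return nil
}

func main() {
	cases := []struct {
		name string
		cfg  AuthConfig
	}{
		{"new default, no token", AuthConfig{AgentBootstrapTokenDenyEmpty: true}},
		{"explicit warn-mode opt-back", AuthConfig{AgentBootstrapTokenDenyEmpty: false}},
		{"demo mode acknowledged", AuthConfig{AgentBootstrapTokenDenyEmpty: true, DemoModeAck: true}},
		{"token configured", AuthConfig{AgentBootstrapTokenDenyEmpty: true, AgentBootstrapToken: "placeholder"}},
	}
	for _, c := range cases {
		fmt.Printf("%-28s refused=%v\n", c.name, validateBootstrapToken(c.cfg) != nil)
	}
}
```

Under these assumptions only the first case is refused; the other three correspond to the two override scenarios and the configured-token path listed above.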

PART B — SEC-009 + RED-005 (opt-in RFC1918 outbound block)
==========================================================

internal/validation/ssrf.go::IsReservedIP has always intentionally
left RFC 1918 ranges (10/8, 172.16/12, 192.168/16) NOT-reserved
because certctl is designed to manage certificates inside private
networks. For operators on hosted IaaS where RFC1918 IS internal
trust (kubeadm-default 10.96.0.0/12 service CIDR exposes the
Kubernetes API on 10.96.0.1; cloud-provider internal monitoring;
hosted-bastion subnets), this default is a real exposure path.

Add a package-level atomic.Bool toggle in internal/validation/ssrf.go
that, when on, extends IsReservedIP to ALSO return true for the
three RFC1918 ranges. Every IsReservedIP-derived path
(SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
webhook + OIDC + ACME callers) picks up the new policy
transitively without per-call-site changes.

internal/validation/ssrf.go
  - blockRFC1918Outbound atomic.Bool + SetBlockRFC1918Outbound /
    BlockRFC1918OutboundEnabled accessor pair
  - rfc1918Nets pre-parsed at package init (panic on parse failure
    surfaces a misconfigured ssrf package immediately, not via a
    silently disabled toggle)
  - IsReservedIP checks the toggle after the existing reserved-IP
    checks
  - Header comment rewritten to document the toggle + the
    transitive coverage

internal/config/config.go
  - New NetworkConfig sub-config; Config gains a Network field
  - Load() reads CERTCTL_BLOCK_RFC1918_OUTBOUND env var (default
    false; preserves the existing self-hosted threat model)
  - NetworkConfig docstring lists the operator-trap (enabling this
    also blocks RFC1918 from the network scanner) so an operator
    cert-discovering their own RFC1918 space doesn't get a
    silently-empty scan result

cmd/server/main.go
  - Wires validation.SetBlockRFC1918Outbound after config.Load and
    near the demo-mode banner / agent-bootstrap-token block; emits
    a one-shot INFO line when the toggle is enabled so the policy
    is visible in journals
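
The boot-time wiring described above might look roughly like the following. This is a hedged sketch rather than the code in cmd/server/main.go: parseBlockRFC1918, wireNetworkPolicy, the local blockRFC1918 variable, and the log wording are hypothetical, and treating an unset or unparsable value as false is an assumption about how Load() behaves.

```go
package main

import (
	"log/slog"
	"os"
	"strconv"
	"sync/atomic"
)

// blockRFC1918 is a local stand-in for the package-level toggle that
// validation.SetBlockRFC1918Outbound writes to.
var blockRFC1918 atomic.Bool

// parseBlockRFC1918 maps the raw env value to the toggle state. Unset or
// unparsable input falls back to false (assumed), which preserves the
// self-hosted default.
func parseBlockRFC1918(raw string) bool {
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false
	}
	return v
}

// wireNetworkPolicy stores the toggle once and emits a single INFO line
// when the block is enabled, so the policy shows up in journals.
func wireNetworkPolicy(raw string) bool {
	enabled := parseBlockRFC1918(raw)
	blockRFC1918.Store(enabled)
	if enabled {
		slog.Info("RFC1918 outbound block enabled", "env", "CERTCTL_BLOCK_RFC1918_OUTBOUND")
	}
	return enabled
}

func main() {
	wireNetworkPolicy(os.Getenv("CERTCTL_BLOCK_RFC1918_OUTBOUND"))
}
```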

Tests
=====

internal/config/config_test.go
  - TestLoad_AgentBootstrapTokenDenyEmpty_DefaultIsTrue — pins the
    default flip at the boot path (Load returns the flipped value)
  - TestValidate_DenyEmptyDefault_RefusesWithoutToken — pins the
    fail-closed behavior under the new default
  - TestValidate_DenyEmptyExplicitFalse_AllowsEmpty — pins the
    v2.1.x back-compat escape hatch
  - TestValidate_DenyEmpty_DemoModeAckOverride_AllowsEmpty — pins
    the demo-mode override

internal/validation/ssrf_test.go
  - TestIsReservedIP_RFC1918_OptIn — pins toggle-off / toggle-on
    behavior across all three RFC1918 ranges, edge cases
    immediately outside the ranges, and the toggle-back-off path
  - TestSafeHTTPDialContext_RFC1918_OptIn — pins that the toggle
    reaches the dial-time SSRF check transitively (not just
    IsReservedIP in isolation)
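
To make the toggle-off / toggle-on / edge-case coverage concrete, here is a small self-contained sketch of the boundary addresses involved. It is not the test in ssrf_test.go: blockedByToggle, privateNets, and mustParse are hypothetical names, and the function models only the opt-in RFC 1918 branch of IsReservedIP, not the loopback, link-local, multicast, or broadcast checks.

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

// block is a local stand-in for the package-level RFC 1918 toggle.
var block atomic.Bool

// privateNets holds the three RFC 1918 ranges, parsed once at init.
var privateNets = mustParse("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")

// mustParse panics on a malformed CIDR so a bad table fails at startup.
func mustParse(cidrs ...string) []*net.IPNet {
	out := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(fmt.Sprintf("bad CIDR %s: %v", c, err))
		}
		out = append(out, n)
	}
	return out
}

// blockedByToggle reports whether addr is rejected by the opt-in branch.
// With the toggle off it always returns false.
func blockedByToggle(addr string) bool {
	if !block.Load() {
		return false
	}
	ip := net.ParseIP(addr)
	for _, n := range privateNets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	addrs := []string{
		"10.0.0.1", "172.16.0.1", "172.31.255.255", "192.168.1.1", // inside
		"9.255.255.255", "11.0.0.0", "172.15.255.255", "172.32.0.0", // just outside
		"192.167.255.255", "192.169.0.0",
	}
	for _, on := range []bool{false, true, false} { // off, on, back off
		block.Store(on)
		for _, a := range addrs {
			fmt.Printf("toggle=%-5v %-16s blocked=%v\n", on, a, blockedByToggle(a))
		}
	}
}
```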

Test-helper updates (Sprint-5-induced churn):
  - internal/config/config_test.go::setMinimalValidEnv now sets
    CERTCTL_AGENT_BOOTSTRAP_TOKEN to a placeholder so Load()-based
    tests that don't specifically exercise the empty-token gate
    keep passing under the new fail-closed default. Tests that DO
    exercise the empty-token path explicitly override back to "".
  - internal/config/config_est_profiles_test.go +
    internal/config/config_scep_profiles_test.go: same placeholder
    fix for the four Load()-based EST/SCEP profile tests.
  - cmd/server/main_test.go::TestMain_ServerConfigFromEnvironment +
    TestMain_AuthTypeConfiguration: same fix at the main.go test
    layer with prior-value restore.

Verified locally: gofmt -l clean; go vet clean; staticcheck clean
across internal/config, internal/validation, cmd/server; short
tests green on all three packages; targeted -v run of all six new
test names confirms PASS.
2026-05-16 19:13:52 +00:00


// Copyright 2026 certctl LLC. All rights reserved.
// SPDX-License-Identifier: BUSL-1.1

package validation

import (
	"context"
	"fmt"
	"net"
	"net/url"
	"strings"
	"sync/atomic"
	"time"
)

// blockRFC1918Outbound is the package-level toggle for the
// acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ,
// 2026-05-16). When true, IsReservedIP additionally returns true for
// the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16),
// which by default are NOT reserved (see the IsReservedIP header
// comment for the threat-model rationale). Operators on hosted IaaS
// where RFC1918 IS internal trust (e.g. the kubeadm-default
// 10.96.0.0/12 service CIDR exposes the Kubernetes API server on
// 10.96.0.1) opt in via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
//
// Stored as atomic.Bool so the hot-path SSRF check in
// SafeHTTPDialContext doesn't need a mutex; SetBlockRFC1918Outbound
// is the single writer (called once at boot from
// cmd/server/main.go via the config.Network.BlockRFC1918Outbound
// value) and IsReservedIP is the reader. Because the toggle is
// boot-time wiring rather than per-request runtime, the single
// atomic load in IsReservedIP adds no measurable per-call overhead.
// (Go's sync/atomic operations are sequentially consistent, which is
// stronger than this use needs.)
var blockRFC1918Outbound atomic.Bool
// SetBlockRFC1918Outbound flips the package-level RFC1918-block
// toggle. Called once at boot from cmd/server/main.go after
// config.Load. Idempotent — operators can re-flip in tests by
// passing the value they want.
//
// Acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ).
func SetBlockRFC1918Outbound(block bool) {
	blockRFC1918Outbound.Store(block)
}

// BlockRFC1918OutboundEnabled reports the current toggle state.
// Exposed so callers (e.g. operator-facing /healthz diagnostics)
// can render the effective SSRF policy without re-reading the env.
func BlockRFC1918OutboundEnabled() bool {
	return blockRFC1918Outbound.Load()
}

// rfc1918Nets is the pre-parsed set of RFC 1918 CIDRs, computed once
// at package init so the IsReservedIP hot path doesn't re-parse the
// strings on every call. A parse failure panics at startup rather
// than silently turning the toggle into a no-op.
var rfc1918Nets = func() []*net.IPNet {
	out := make([]*net.IPNet, 0, 3)
	for _, cidr := range []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"} {
		_, n, err := net.ParseCIDR(cidr)
		if err != nil || n == nil {
			// %v tolerates a nil err, so the n == nil branch cannot
			// itself dereference a nil error.
			panic(fmt.Sprintf("ssrf: failed to pre-parse RFC1918 CIDR %s: %v", cidr, err))
		}
		out = append(out, n)
	}
	return out
}()

// IsReservedIP reports whether the given IP falls inside a range that
// outbound HTTP egress (and the network-scanner CIDR expander) MUST treat
// as unreachable: loopback, link-local (including cloud-provider metadata
// endpoints at 169.254.169.254), multicast, and broadcast.
//
// RFC 1918 ranges (10/8, 172.16/12, 192.168/16) are intentionally NOT
// treated as reserved by default. certctl is designed to manage
// certificates inside private networks and filtering private address
// space would break the primary use case. The default threat model is
// outbound HTTP to cloud-metadata or localhost services, not general
// network reachability.
//
// Operators on hosted IaaS where RFC1918 IS internal trust (Kubernetes
// service CIDRs that expose the API server inside RFC1918, internal-
// only monitoring stacks, etc.) can opt in via
// CERTCTL_BLOCK_RFC1918_OUTBOUND=true, which the boot path passes to
// SetBlockRFC1918Outbound. When the toggle is on, the three RFC 1918
// ranges are appended to the reserved set and every code path that
// builds on IsReservedIP (isReservedIPForDial, IsReservedIPForDial,
// SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
// webhook notifier) picks up the policy transitively without per-
// call-site changes. This is acquisition-audit SEC-009 + RED-005
// closure (Sprint 5 ACQ, 2026-05-16).
//
// This function is byte-identical in behaviour to the previous unexported
// copy in internal/service/network_scan.go (for the default-off case).
// It is exported here so both the network scanner and the webhook
// notifier share a single authoritative implementation. Broader IPv6
// coverage and unspecified-address handling live in
// isReservedIPForDial, which SafeHTTPDialContext and ValidateSafeURL
// apply, because stricter policy is appropriate for outbound HTTP egress.
func IsReservedIP(ip net.IP) bool {
	// Loopback: 127.0.0.0/8 (and ::1 via IsLoopback).
	if ip.IsLoopback() {
		return true
	}
	// Link-local: 169.254.0.0/16 (includes cloud metadata 169.254.169.254).
	if _, linkLocalNet, err := net.ParseCIDR("169.254.0.0/16"); err == nil && linkLocalNet.Contains(ip) {
		return true
	}
	// Multicast: 224.0.0.0/4.
	if _, multicastNet, err := net.ParseCIDR("224.0.0.0/4"); err == nil && multicastNet.Contains(ip) {
		return true
	}
	// Broadcast: 255.255.255.255.
	if ip.Equal(net.IPv4bcast) {
		return true
	}
	// Acquisition-audit SEC-009 + RED-005 (Sprint 5 ACQ, 2026-05-16).
	// Opt-in RFC 1918 block. The toggle is OFF by default: the
	// default certctl threat model treats RFC1918 as legitimate
	// destination space. Operators on hosted IaaS where RFC1918 is
	// internal trust flip this via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
	if blockRFC1918Outbound.Load() {
		for _, n := range rfc1918Nets {
			if n.Contains(ip) {
				return true
			}
		}
	}
	return false
}

// IsReservedIPForDial applies IsReservedIP plus additional ranges that are
// meaningful for outbound HTTP egress but were not part of the original
// network-scanner filter: the unspecified address (0.0.0.0 / ::) and IPv6
// link-local / multicast ranges. The Phase 3 ACME HTTP-01 validator
// (internal/api/acme/validators.go) reuses this same gate so HTTP-01
// fetches can't be turned into an SSRF primitive against private-IP
// space.
func IsReservedIPForDial(ip net.IP) bool {
	return isReservedIPForDial(ip)
}

// isReservedIPForDial is kept as the package-private implementation so
// every existing call site (the network scanner + ValidateSafeURL +
// the SafeHTTPDial-test helpers) stays byte-identical. The exported
// wrapper IsReservedIPForDial above is the one new callers (Phase 3
// ACME HTTP-01 validator) take.
func isReservedIPForDial(ip net.IP) bool {
	if ip == nil {
		return true
	}
	if IsReservedIP(ip) {
		return true
	}
	if ip.IsUnspecified() {
		return true
	}
	// IPv6 link-local fe80::/10.
	if _, n, err := net.ParseCIDR("fe80::/10"); err == nil && n.Contains(ip) {
		return true
	}
	// IPv6 multicast ff00::/8.
	if _, n, err := net.ParseCIDR("ff00::/8"); err == nil && n.Contains(ip) {
		return true
	}
	return false
}

// ValidateSafeURL parses rawURL and rejects anything that would let an
// attacker aim an outbound HTTP client at a SSRF-sensitive destination
// (CWE-918). Guards enforced:
//
//  1. The scheme must be http or https. Schemes like file://, gopher://,
//     ftp://, data:, javascript:, ldap://, and dict:// are rejected outright;
//     webhook delivery only speaks HTTP(S).
//  2. A hostname must be present. Empty-host URLs like "http:///foo" are
//     rejected to prevent ambiguous defaulting.
//  3. If the host is a literal IP address, the IP must not be reserved
//     (see isReservedIPForDial). This stops the obvious 127.0.0.1 / ::1 /
//     169.254.169.254 / 0.0.0.0 attacks at config time.
//  4. If the host is a DNS name and resolution succeeds, every resolved
//     A/AAAA record must be non-reserved. A single reserved result is
//     enough to reject. Resolution failure is tolerated (offline CI
//     environments, short-lived test servers) because the authoritative
//     enforcement runs at dial time anyway.
//
// The DNS resolution check here is a best-effort early diagnostic. The
// authoritative, TOCTOU-safe enforcement is SafeHTTPDialContext, which
// re-checks after resolution at dial time and defeats DNS rebinding.
// Callers that need SSRF-safe HTTP egress should use BOTH
// ValidateSafeURL (at config ingestion) AND SafeHTTPDialContext
// (installed on http.Transport).
func ValidateSafeURL(rawURL string) error {
	if rawURL == "" {
		return fmt.Errorf("url is required")
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("invalid url: %w", err)
	}
	scheme := strings.ToLower(u.Scheme)
	if scheme != "http" && scheme != "https" {
		return fmt.Errorf("url scheme %q is not allowed; only http and https are permitted", u.Scheme)
	}
	host := u.Hostname()
	if host == "" {
		return fmt.Errorf("url must include a host")
	}
	// Literal IP? Reject if reserved (strict policy for outbound egress).
	if ip := net.ParseIP(host); ip != nil {
		if isReservedIPForDial(ip) {
			return fmt.Errorf("url host resolves to a reserved address and cannot be used")
		}
		return nil
	}
	// DNS name. Resolve and reject if any answer is reserved.
	ips, err := net.LookupIP(host)
	if err != nil {
		// Resolution failure is not itself a SSRF signal; let the dial-time
		// DialContext handle the final decision. This keeps the validator
		// tolerant of offline validation environments (CI, tests) while
		// still blocking clearly-bad literal-IP URLs above.
		return nil
	}
	for _, ip := range ips {
		if isReservedIPForDial(ip) {
			return fmt.Errorf("url host resolves to a reserved address and cannot be used")
		}
	}
	return nil
}

// SafeHTTPDialContext returns a DialContext function suitable for
// installing on an http.Transport. Every dial attempt resolves the host
// again and rejects any connection whose resolved IP lies inside a
// reserved range. This is the authoritative SSRF / DNS-rebinding guard:
// even if ValidateSafeURL was bypassed, or if DNS changed between
// validation and dial, the outbound connection will fail closed.
//
// The timeout argument bounds both the resolution and the underlying TCP
// dial. Pass 0 to use a sensible default (10s).
func SafeHTTPDialContext(timeout time.Duration) func(ctx context.Context, network, addr string) (net.Conn, error) {
	if timeout <= 0 {
		timeout = 10 * time.Second
	}
	dialer := &net.Dialer{
		Timeout:   timeout,
		KeepAlive: 30 * time.Second,
	}
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("invalid dial address %q: %w", addr, err)
		}
		// If the host is already a literal IP, check it directly.
		if ip := net.ParseIP(host); ip != nil {
			if isReservedIPForDial(ip) {
				return nil, fmt.Errorf("refusing to dial reserved address %s", ip.String())
			}
			return dialer.DialContext(ctx, network, addr)
		}
		// Resolve and reject any answer that lands in a reserved range.
		// We then dial an explicit resolved IP so a racing DNS change
		// cannot substitute a different (and possibly reserved) answer
		// between our check and the actual TCP dial.
		resCtx, cancel := context.WithTimeout(ctx, timeout)
		defer cancel()
		ips, err := (&net.Resolver{}).LookupIP(resCtx, "ip", host)
		if err != nil {
			return nil, fmt.Errorf("resolve %s: %w", host, err)
		}
		if len(ips) == 0 {
			return nil, fmt.Errorf("no addresses found for %s", host)
		}
		for _, ip := range ips {
			if isReservedIPForDial(ip) {
				return nil, fmt.Errorf("refusing to dial %s: resolves to reserved address %s", host, ip.String())
			}
		}
		// Dial the first non-reserved resolved IP directly, pinning the
		// target so later DNS changes cannot redirect us.
		pinned := net.JoinHostPort(ips[0].String(), port)
		return dialer.DialContext(ctx, network, pinned)
	}
}
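
The ValidateSafeURL comment recommends installing the dial guard on an http.Transport. The sketch below shows the shape of that wiring with a self-contained stand-in: guardedDial, newGuardedClient, isRefusal, errReservedDial, and defaultTimeout are hypothetical names, and guardedDial rejects only literal loopback IPs, which is far less than SafeHTTPDialContext does. In the real code the DialContext field would presumably be set to validation.SafeHTTPDialContext(timeout) instead.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

const defaultTimeout = 5 * time.Second

// errReservedDial is an illustrative sentinel; the real guard returns
// plain formatted errors.
var errReservedDial = errors.New("refusing to dial reserved address")

// guardedDial is a reduced stand-in for SafeHTTPDialContext: it refuses
// literal loopback IPs and passes everything else to a plain net.Dialer.
func guardedDial(timeout time.Duration) func(ctx context.Context, network, addr string) (net.Conn, error) {
	dialer := &net.Dialer{Timeout: timeout}
	return func(ctx context.Context, network, addr string) (net.Conn, error) {
		host, _, err := net.SplitHostPort(addr)
		if err != nil {
			return nil, fmt.Errorf("invalid dial address %q: %w", addr, err)
		}
		if ip := net.ParseIP(host); ip != nil && ip.IsLoopback() {
			return nil, fmt.Errorf("%w %s", errReservedDial, ip)
		}
		return dialer.DialContext(ctx, network, addr)
	}
}

// newGuardedClient builds an http.Client whose transport dials only
// through the guard, so every outbound connection is checked.
func newGuardedClient() *http.Client {
	return &http.Client{
		Timeout:   defaultTimeout,
		Transport: &http.Transport{DialContext: guardedDial(defaultTimeout)},
	}
}

// isRefusal reports whether err came from the guard rather than the network.
func isRefusal(err error) bool {
	return errors.Is(err, errReservedDial)
}

func main() {
	_, err := newGuardedClient().Get("http://127.0.0.1:8080/healthz")
	fmt.Println("refused by guard:", isRefusal(err))
}
```

Because the check runs inside DialContext, it applies to redirects and re-resolved hosts as well as the initial request, which is why the dial-time guard is the authoritative one and the URL validator is only an early diagnostic.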