mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:11:29 +00:00
5ea45a19b9
Acquisition-audit Sprint 5 ACQ closure (2026-05-16). Two
independent findings ship together because they share Load() /
main.go wiring; the closure comments tie each line to its finding.
PART A — RED-003 (agent-bootstrap deny-empty cutover)
=====================================================
Phase 2 SEC-H1 closure (2026-05-13) introduced the
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY staged feature flag with
default `false` so v2.1.x operators wouldn't get a surprise
fail-closed on upgrade. This commit flips the default to `true`
(per the staged plan in the existing CHANGELOG "Breaking changes
(scheduled for v2.2.0)" block). Operators who haven't generated a
real bootstrap token yet keep the v2.1.x warn-mode pass-through
for one upgrade window by setting
CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY=false explicitly.
Demo-mode escape hatch: CERTCTL_DEMO_MODE_ACK=true skips the
fail-closed gate so the screenshot/demo path stays one-command-up.
The accompanying boot-banner WARN at cmd/server/main.go:124-126
keeps demo mode visible in every log scraper, so this override
cannot silently re-enable warn-mode in production.
internal/config/config.go
- Load() default for AgentBootstrapTokenDenyEmpty flipped to true
- Validate() gate now also checks !c.Auth.DemoModeAck so the demo
override line up with the boot-banner WARN
- Closure comment block updated to cross-reference Sprint 5 ACQ
and the CHANGELOG v2.2.0 entry
cmd/server/main.go
- Updated boot-time WARN message to reflect the new default
(deny-empty=true) — the warn now fires only in the two
explicit override scenarios (warn-mode opt-back or demo mode),
and explains the operator action either way
- Info-line on configured-token path unchanged
PART B — SEC-009 + RED-005 (opt-in RFC1918 outbound block)
==========================================================
internal/validation/ssrf.go::IsReservedIP has always intentionally
left RFC 1918 ranges (10/8, 172.16/12, 192.168/16) NOT-reserved
because certctl is designed to manage certificates inside private
networks. For operators on hosted IaaS where RFC1918 IS internal
trust (kubeadm-default 10.96.0.0/12 service CIDR exposes the
Kubernetes API on 10.96.0.1; cloud-provider internal monitoring;
hosted-bastion subnets), this default is a real exposure path.
Add a package-level atomic.Bool toggle in internal/validation/ssrf.go
that, when on, extends IsReservedIP to ALSO return true for the
three RFC1918 ranges. Every IsReservedIP-derived path
(SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
webhook + OIDC + ACME callers) picks up the new policy
transitively without per-call-site changes.
internal/validation/ssrf.go
- blockRFC1918Outbound atomic.Bool + SetBlockRFC1918Outbound /
BlockRFC1918OutboundEnabled accessor pair
- rfc1918Nets pre-parsed at package init (panic on parse failure
surfaces a misconfigured ssrf package immediately, not via a
silently disabled toggle)
- IsReservedIP checks the toggle after the existing reserved-IP
checks
- Header comment rewritten to document the toggle + the
transitive coverage
internal/config/config.go
- New NetworkConfig sub-config; Config gains a Network field
- Load() reads CERTCTL_BLOCK_RFC1918_OUTBOUND env var (default
false; preserves the existing self-hosted threat model)
- NetworkConfig docstring lists the operator-trap (enabling this
also blocks RFC1918 from the network scanner) so an operator
cert-discovering their own RFC1918 space doesn't get a
silently-empty scan result
cmd/server/main.go
- Wires validation.SetBlockRFC1918Outbound after config.Load and
near the demo-mode banner / agent-bootstrap-token block; emits
a one-shot INFO line when the toggle is enabled so the policy
is visible in journals
Tests
=====
internal/config/config_test.go
- TestLoad_AgentBootstrapTokenDenyEmpty_DefaultIsTrue — pins the
default flip at the boot path (Load returns the flipped value)
- TestValidate_DenyEmptyDefault_RefusesWithoutToken — pins the
fail-closed behavior under the new default
- TestValidate_DenyEmptyExplicitFalse_AllowsEmpty — pins the
v2.1.x back-compat escape hatch
- TestValidate_DenyEmpty_DemoModeAckOverride_AllowsEmpty — pins
the demo-mode override
internal/validation/ssrf_test.go
- TestIsReservedIP_RFC1918_OptIn — pins toggle-off / toggle-on
behavior across all three RFC1918 ranges, edge cases
immediately outside the ranges, and the toggle-back-off path
- TestSafeHTTPDialContext_RFC1918_OptIn — pins that the toggle
reaches the dial-time SSRF check transitively (not just
IsReservedIP in isolation)
Test-helper updates (Sprint-5-induced churn):
- internal/config/config_test.go::setMinimalValidEnv now sets
CERTCTL_AGENT_BOOTSTRAP_TOKEN to a placeholder so Load()-based
tests that don't specifically exercise the empty-token gate
keep passing under the new fail-closed default. Tests that DO
exercise the empty-token path explicitly override back to "".
- internal/config/config_est_profiles_test.go +
internal/config/config_scep_profiles_test.go: same placeholder
fix for the four Load()-based EST/SCEP profile tests.
- cmd/server/main_test.go::TestMain_ServerConfigFromEnvironment +
TestMain_AuthTypeConfiguration: same fix at the main.go test
layer with prior-value restore.
Verified locally: gofmt -l clean; go vet clean; staticcheck clean
across internal/config, internal/validation, cmd/server; short
tests green on all three packages; targeted -v run of all six new
test names confirms PASS.
308 lines
11 KiB
Go
308 lines
11 KiB
Go
// Copyright 2026 certctl LLC. All rights reserved.
|
|
// SPDX-License-Identifier: BUSL-1.1
|
|
|
|
package validation
|
|
|
|
import (
|
|
"context"
|
|
"fmt"
|
|
"net"
|
|
"net/url"
|
|
"strings"
|
|
"sync/atomic"
|
|
"time"
|
|
)
|
|
|
|
// blockRFC1918Outbound is the package-level toggle for the
|
|
// acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ,
|
|
// 2026-05-16). When true, IsReservedIP additionally returns true for
|
|
// the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16),
|
|
// which by default are NOT reserved (see the IsReservedIP header
|
|
// comment for the threat-model rationale). Operators on hosted IaaS
|
|
// where RFC1918 IS internal trust (e.g. the kubeadm-default
|
|
// 10.96.0.0/12 service CIDR exposes the Kubernetes API server on
|
|
// 10.96.0.1) opt in via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
|
|
//
|
|
// Stored as atomic.Bool so the hot-path SSRF check in
|
|
// SafeHTTPDialContext doesn't need a mutex; SetBlockRFC1918Outbound
|
|
// is the single writer (called once at boot from
|
|
// cmd/server/main.go via the config.Network.BlockRFC1918Outbound
|
|
// value) and IsReservedIP is the reader. Because the toggle is
|
|
// boot-time wiring rather than per-request runtime, the relaxed
|
|
// memory ordering of atomic.Bool is sufficient and adds no
|
|
// measurable per-call overhead.
|
|
var blockRFC1918Outbound atomic.Bool
|
|
|
|
// SetBlockRFC1918Outbound flips the package-level RFC1918-block
|
|
// toggle. Called once at boot from cmd/server/main.go after
|
|
// config.Load. Idempotent — operators can re-flip in tests by
|
|
// passing the value they want.
|
|
//
|
|
// Acquisition-audit SEC-009 + RED-005 closure (Sprint 5 ACQ).
|
|
func SetBlockRFC1918Outbound(block bool) {
|
|
blockRFC1918Outbound.Store(block)
|
|
}
|
|
|
|
// BlockRFC1918OutboundEnabled reports the current toggle state.
|
|
// Exposed so callers (e.g. operator-facing /healthz diagnostics)
|
|
// can render the effective SSRF policy without re-reading the env.
|
|
func BlockRFC1918OutboundEnabled() bool {
|
|
return blockRFC1918Outbound.Load()
|
|
}
|
|
|
|
// rfc1918Nets is the pre-parsed set of RFC 1918 CIDRs, computed once
|
|
// at package init so the IsReservedIP hot path doesn't re-parse the
|
|
// strings on every call. A `nil` entry would surface a panic at
|
|
// startup rather than silently no-op the toggle.
|
|
var rfc1918Nets = func() []*net.IPNet {
|
|
out := make([]*net.IPNet, 0, 3)
|
|
for _, cidr := range []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"} {
|
|
_, n, err := net.ParseCIDR(cidr)
|
|
if err != nil || n == nil {
|
|
panic("ssrf: failed to pre-parse RFC1918 CIDR " + cidr + ": " + err.Error())
|
|
}
|
|
out = append(out, n)
|
|
}
|
|
return out
|
|
}()
|
|
|
|
// IsReservedIP reports whether the given IP falls inside a range that
|
|
// outbound HTTP egress (and the network-scanner CIDR expander) MUST treat
|
|
// as unreachable: loopback, link-local (including cloud-provider metadata
|
|
// endpoints at 169.254.169.254), multicast, and broadcast.
|
|
//
|
|
// RFC 1918 ranges (10/8, 172.16/12, 192.168/16) are intentionally NOT
|
|
// treated as reserved by default. certctl is designed to manage
|
|
// certificates inside private networks and filtering private address
|
|
// space would break the primary use case. The default threat model is
|
|
// outbound HTTP to cloud-metadata or localhost services, not general
|
|
// network reachability.
|
|
//
|
|
// Operators on hosted IaaS where RFC1918 IS internal trust (Kubernetes
|
|
// service CIDRs that expose the API server inside RFC1918, internal-
|
|
// only monitoring stacks, etc.) can opt in via
|
|
// CERTCTL_BLOCK_RFC1918_OUTBOUND=true, which the boot path passes to
|
|
// SetBlockRFC1918Outbound. When the toggle is on, the three RFC 1918
|
|
// ranges are appended to the reserved set and every code path that
|
|
// builds on IsReservedIP (isReservedIPForDial, IsReservedIPForDial,
|
|
// SafeHTTPDialContext, ValidateSafeURL, the network scanner, the
|
|
// webhook notifier) picks up the policy transitively without per-
|
|
// call-site changes. This is acquisition-audit SEC-009 + RED-005
|
|
// closure (Sprint 5 ACQ, 2026-05-16).
|
|
//
|
|
// This function is byte-identical in behaviour to the previous unexported
|
|
// copy in internal/service/network_scan.go (for the default-off case).
|
|
// It is exported here so both the network scanner and the webhook
|
|
// notifier share a single authoritative implementation. Broader IPv6
|
|
// coverage and unspecified- address handling live in
|
|
// SafeHTTPDialContext, where stricter policy is appropriate for
|
|
// outbound HTTP egress.
|
|
func IsReservedIP(ip net.IP) bool {
|
|
// Loopback: 127.0.0.0/8 (and ::1 via IsLoopback).
|
|
if ip.IsLoopback() {
|
|
return true
|
|
}
|
|
|
|
// Link-local: 169.254.0.0/16 (includes cloud metadata 169.254.169.254).
|
|
if linkLocal := net.ParseIP("169.254.0.0"); linkLocal != nil {
|
|
if _, linkLocalNet, _ := net.ParseCIDR("169.254.0.0/16"); linkLocalNet != nil {
|
|
if linkLocalNet.Contains(ip) {
|
|
return true
|
|
}
|
|
}
|
|
}
|
|
|
|
// Multicast: 224.0.0.0/4.
|
|
if multicast := net.ParseIP("224.0.0.0"); multicast != nil {
|
|
if _, multicastNet, _ := net.ParseCIDR("224.0.0.0/4"); multicastNet != nil {
|
|
if multicastNet.Contains(ip) {
|
|
return true
|
|
}
|
|
}
|
|
}
|
|
|
|
// Broadcast: 255.255.255.255.
|
|
if ip.String() == "255.255.255.255" {
|
|
return true
|
|
}
|
|
|
|
// Acquisition-audit SEC-009 + RED-005 (Sprint 5 ACQ, 2026-05-16).
|
|
// Opt-in RFC 1918 block. The toggle is OFF by default — the
|
|
// default certctl threat model treats RFC1918 as legitimate
|
|
// destination space. Operators on hosted IaaS where RFC1918 is
|
|
// internal trust flip this via CERTCTL_BLOCK_RFC1918_OUTBOUND=true.
|
|
if blockRFC1918Outbound.Load() {
|
|
for _, n := range rfc1918Nets {
|
|
if n.Contains(ip) {
|
|
return true
|
|
}
|
|
}
|
|
}
|
|
|
|
return false
|
|
}
|
|
|
|
// IsReservedIPForDial applies IsReservedIP plus additional ranges that are
|
|
// meaningful for outbound HTTP egress but were not part of the original
|
|
// network-scanner filter: the unspecified address (0.0.0.0 / ::) and IPv6
|
|
// link-local / multicast ranges. The Phase 3 ACME HTTP-01 validator
|
|
// (internal/api/acme/validators.go) reuses this same gate so HTTP-01
|
|
// fetches can't be turned into an SSRF primitive against private-IP
|
|
// space.
|
|
func IsReservedIPForDial(ip net.IP) bool {
|
|
return isReservedIPForDial(ip)
|
|
}
|
|
|
|
// isReservedIPForDial is kept as the package-private implementation so
|
|
// every existing call site (the network scanner + ValidateSafeURL +
|
|
// the SafeHTTPDial-test helpers) stays byte-identical. The exported
|
|
// wrapper IsReservedIPForDial above is the one new callers (Phase 3
|
|
// ACME HTTP-01 validator) take.
|
|
func isReservedIPForDial(ip net.IP) bool {
|
|
if ip == nil {
|
|
return true
|
|
}
|
|
if IsReservedIP(ip) {
|
|
return true
|
|
}
|
|
if ip.IsUnspecified() {
|
|
return true
|
|
}
|
|
// IPv6 link-local fe80::/10.
|
|
if _, n, err := net.ParseCIDR("fe80::/10"); err == nil && n.Contains(ip) {
|
|
return true
|
|
}
|
|
// IPv6 multicast ff00::/8.
|
|
if _, n, err := net.ParseCIDR("ff00::/8"); err == nil && n.Contains(ip) {
|
|
return true
|
|
}
|
|
return false
|
|
}
|
|
|
|
// ValidateSafeURL parses rawURL and rejects anything that would let an
|
|
// attacker aim an outbound HTTP client at a SSRF-sensitive destination
|
|
// (CWE-918). Guards enforced:
|
|
//
|
|
// 1. The scheme must be http or https. Schemes like file://, gopher://,
|
|
// ftp://, data:, javascript:, ldap://, and dict:// are rejected outright;
|
|
// webhook delivery only speaks HTTP(S).
|
|
// 2. A hostname must be present. Empty-host URLs like "http:///foo" are
|
|
// rejected to prevent ambiguous defaulting.
|
|
// 3. If the host is a literal IP address, the IP must not be reserved
|
|
// (see isReservedIPForDial). This stops the obvious 127.0.0.1 / ::1 /
|
|
// 169.254.169.254 / 0.0.0.0 attacks at config time.
|
|
// 4. If the host is a DNS name and resolution succeeds, every resolved
|
|
// A/AAAA record must be non-reserved. A single reserved result is
|
|
// enough to reject. Resolution failure is tolerated (offline CI
|
|
// environments, short-lived test servers) — the authoritative
|
|
// enforcement runs at dial time anyway.
|
|
//
|
|
// The DNS resolution check here is a best-effort early diagnostic. The
|
|
// authoritative, TOCTOU-safe enforcement is SafeHTTPDialContext, which
|
|
// re-checks after resolution at dial time and defeats DNS rebinding.
|
|
// Callers that need SSRF-safe HTTP egress should use BOTH
|
|
// ValidateSafeURL (at config ingestion) AND SafeHTTPDialContext
|
|
// (installed on http.Transport).
|
|
func ValidateSafeURL(rawURL string) error {
|
|
if rawURL == "" {
|
|
return fmt.Errorf("url is required")
|
|
}
|
|
|
|
u, err := url.Parse(rawURL)
|
|
if err != nil {
|
|
return fmt.Errorf("invalid url: %w", err)
|
|
}
|
|
|
|
scheme := strings.ToLower(u.Scheme)
|
|
if scheme != "http" && scheme != "https" {
|
|
return fmt.Errorf("url scheme %q is not allowed; only http and https are permitted", u.Scheme)
|
|
}
|
|
|
|
host := u.Hostname()
|
|
if host == "" {
|
|
return fmt.Errorf("url must include a host")
|
|
}
|
|
|
|
// Literal IP? Reject if reserved (strict policy for outbound egress).
|
|
if ip := net.ParseIP(host); ip != nil {
|
|
if isReservedIPForDial(ip) {
|
|
return fmt.Errorf("url host resolves to a reserved address and cannot be used")
|
|
}
|
|
return nil
|
|
}
|
|
|
|
// DNS name. Resolve and reject if any answer is reserved.
|
|
ips, err := net.LookupIP(host)
|
|
if err != nil {
|
|
// Resolution failure is not itself a SSRF signal; let the dial-time
|
|
// DialContext handle the final decision. This keeps the validator
|
|
// tolerant of offline validation environments (CI, tests) while
|
|
// still blocking clearly-bad literal-IP URLs above.
|
|
return nil
|
|
}
|
|
for _, ip := range ips {
|
|
if isReservedIPForDial(ip) {
|
|
return fmt.Errorf("url host resolves to a reserved address and cannot be used")
|
|
}
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
// SafeHTTPDialContext returns a DialContext function suitable for
|
|
// installing on an http.Transport. Every dial attempt resolves the host
|
|
// again and rejects any connection whose resolved IP lies inside a
|
|
// reserved range. This is the authoritative SSRF / DNS-rebinding guard:
|
|
// even if ValidateSafeURL was bypassed, or if DNS changed between
|
|
// validation and dial, the outbound connection will fail closed.
|
|
//
|
|
// The timeout argument bounds both the resolution and the underlying TCP
|
|
// dial. Pass 0 to use a sensible default (10s).
|
|
func SafeHTTPDialContext(timeout time.Duration) func(ctx context.Context, network, addr string) (net.Conn, error) {
|
|
if timeout <= 0 {
|
|
timeout = 10 * time.Second
|
|
}
|
|
dialer := &net.Dialer{
|
|
Timeout: timeout,
|
|
KeepAlive: 30 * time.Second,
|
|
}
|
|
return func(ctx context.Context, network, addr string) (net.Conn, error) {
|
|
host, port, err := net.SplitHostPort(addr)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("invalid dial address %q: %w", addr, err)
|
|
}
|
|
|
|
// If the host is already a literal IP, check it directly.
|
|
if ip := net.ParseIP(host); ip != nil {
|
|
if isReservedIPForDial(ip) {
|
|
return nil, fmt.Errorf("refusing to dial reserved address %s", ip.String())
|
|
}
|
|
return dialer.DialContext(ctx, network, addr)
|
|
}
|
|
|
|
// Resolve and reject any answer that lands in a reserved range.
|
|
// We then dial an explicit resolved IP so a racing DNS change
|
|
// cannot substitute a different (and possibly reserved) answer
|
|
// between our check and the actual TCP dial.
|
|
resCtx, cancel := context.WithTimeout(ctx, timeout)
|
|
defer cancel()
|
|
ips, err := (&net.Resolver{}).LookupIP(resCtx, "ip", host)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("resolve %s: %w", host, err)
|
|
}
|
|
if len(ips) == 0 {
|
|
return nil, fmt.Errorf("no addresses found for %s", host)
|
|
}
|
|
for _, ip := range ips {
|
|
if isReservedIPForDial(ip) {
|
|
return nil, fmt.Errorf("refusing to dial %s: resolves to reserved address %s", host, ip.String())
|
|
}
|
|
}
|
|
|
|
// Dial the first non-reserved resolved IP directly, pinning the
|
|
// target so later DNS changes cannot redirect us.
|
|
pinned := net.JoinHostPort(ips[0].String(), port)
|
|
return dialer.DialContext(ctx, network, pinned)
|
|
}
|
|
}
|