EST RFC 7030 hardening master bundle Phases 2-4: end-to-end mTLS sibling

route + RFC 9266 channel binding + HTTP Basic enrollment-password +
per-source-IP failed-auth limit + per-(CN, sourceIP) sliding-window cap.

Three new shared packages so EST + Intune share infrastructure:
- internal/cms/ — RFC 9266 tls-exporter extractor (ExtractTLSExporter
  with stdlib-panic recovery for synthetic ConnectionStates) +
  CSR-side channel-binding parser via raw TBSCertificationRequestInfo
  walk (the stdlib's csr.Attributes can't represent the OCTET STRING
  binding value), VerifyChannelBinding composite,
  EmbedChannelBindingAttribute fixture helper, typed sentinel errors
  for missing / mismatch / not-TLS-1.3 mapped to HTTP 400 / 409 / 426
  in handler.
- internal/trustanchor/ — extracted from scep/intune/trust_anchor*.go
  so the EST mTLS sibling route + Intune dispatcher share the same
  SIGHUP-reloadable PEM bundle primitive. intune.TrustAnchorHolder
  is now `= trustanchor.Holder` (type alias) + NewTrustAnchorHolder =
  trustanchor.New (package-level func variable, since Go has no
  function aliases); every existing call site compiles unchanged.
  Intune's LoadTrustAnchor is a thin wrapper over
  trustanchor.LoadBundle. White-box tests moved to the new package.
- internal/ratelimit/ — extracted from scep/intune/rate_limit.go (this
  was Phase 4.1, in the same bundle). intune.PerDeviceRateLimiter
  is now a thin wrapper preserving the (subject, issuer)→key
  composition; EST handler reaches for SlidingWindowLimiter directly.
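
To illustrate the internal/cms bullet above, the sentinel-to-status
mapping might be sketched as follows. This is a minimal sketch, not the
handler's code: the sentinel names and statusForBindingError are
hypothetical, and it assumes the three statuses pair positionally with
missing / mismatch / not-TLS-1.3.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel names. The commit only states that typed
// sentinels exist for "missing", "mismatch" and "not-TLS-1.3".
var (
	ErrBindingMissing  = errors.New("cms: channel binding attribute missing from CSR")
	ErrBindingMismatch = errors.New("cms: channel binding does not match TLS exporter")
	ErrNotTLS13        = errors.New("cms: tls-exporter binding requires TLS 1.3")
)

// statusForBindingError maps a (possibly wrapped) sentinel to an HTTP
// status. Unrecognized errors fall through to 500 so that a new
// sentinel is never silently reported as a client error.
func statusForBindingError(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrBindingMissing):
		return http.StatusBadRequest // 400
	case errors.Is(err, ErrBindingMismatch):
		return http.StatusConflict // 409
	case errors.Is(err, ErrNotTLS13):
		return http.StatusUpgradeRequired // 426
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	// errors.Is sees through %w wrapping, so callers can add context.
	wrapped := fmt.Errorf("simpleenroll: %w", ErrBindingMismatch)
	fmt.Println(statusForBindingError(wrapped))
}
```

Matching with errors.Is rather than == keeps the mapping stable when
intermediate layers wrap the sentinel with extra context.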

ESTHandler grew six optional fields wired by per-profile setters
(SetMTLSTrust / SetChannelBindingRequired / SetEnrollmentPassword /
SetSourceIPRateLimiter / SetPerPrincipalRateLimiter / SetLabelForLog)
plus four new mTLS-route methods (CACertsMTLS / SimpleEnrollMTLS /
SimpleReEnrollMTLS / CSRAttrsMTLS); shared internal pipeline
handleEnrollOrReEnroll(reEnroll, viaMTLS) keeps the auth/binding/
rate-limit gates DRY. New router method RegisterESTMTLSHandlers
registers /.well-known/est-mtls/<PathID>/{cacerts,simpleenroll,
simplereenroll,csrattrs}; AuthExemptDispatchPrefixes extends the
no-auth chain to /.well-known/est-mtls.

cmd/server/main.go's EST loop wires per-profile mTLS holder +
channel-binding policy + per-principal limiter + (when EnrollmentPassword
non-empty) Basic + source-IP limiter; new
preflightESTMTLSClientCATrustBundle returns *trustanchor.Holder so SIGHUP
rotates the EST mTLS bundle live without restart. SCEP + EST mTLS
profiles now share a single union mtlsUnionPoolForTLS passed to
buildServerTLSConfigWithMTLS (replaces the protocol-specific
scepMTLSUnionPoolForTLS); per-handler re-verify enforces "cert must
chain to THIS profile's bundle" so cross-protocol bleed is blocked at
the application layer even though the TLS layer accepts a cert that
chains to any bundle in the union.
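
The per-handler re-verify described above can be sketched with the
standard library alone. This is a hedged illustration under assumed
names (verifyAgainstProfile, newCA, newClientCert and poolOf are all
hypothetical, and the throwaway CAs stand in for real profile
bundles); it shows only the idea that a cert accepted by the union
pool is checked again against one profile's pool.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// verifyAgainstProfile re-checks a client cert against ONE profile's
// pool, independent of whatever the TLS layer already accepted.
func verifyAgainstProfile(leaf *x509.Certificate, profilePool *x509.CertPool) error {
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:     profilePool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

// mustCert signs tmpl with the parent's key and parses the result.
func mustCert(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, signer *ecdsa.PrivateKey) *x509.Certificate {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, signer)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert
}

// newCA mints a throwaway self-signed CA for the demo.
func newCA(cn string) (*x509.Certificate, *ecdsa.PrivateKey) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	return mustCert(tmpl, tmpl, &key.PublicKey, key), key
}

// newClientCert mints a client-auth leaf signed by the given CA.
func newClientCert(cn string, ca *x509.Certificate, caKey *ecdsa.PrivateKey) *x509.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	return mustCert(tmpl, ca, &key.PublicKey, caKey)
}

// poolOf builds a CertPool from the given CA certs.
func poolOf(cas ...*x509.Certificate) *x509.CertPool {
	p := x509.NewCertPool()
	for _, c := range cas {
		p.AddCert(c)
	}
	return p
}

func main() {
	estCA, estKey := newCA("est-profile-ca")
	scepCA, _ := newCA("scep-profile-ca")
	leaf := newClientCert("device-1", estCA, estKey)

	fmt.Println("union pool accepts:", verifyAgainstProfile(leaf, poolOf(estCA, scepCA)) == nil)
	fmt.Println("EST pool accepts:", verifyAgainstProfile(leaf, poolOf(estCA)) == nil)
	fmt.Println("SCEP pool accepts:", verifyAgainstProfile(leaf, poolOf(scepCA)) == nil)
}
```

The union pool is what the handshake needs, since TLS cannot know
which profile a request targets; the second check is what keeps a
cert issued for one profile from being honored by another.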

Phase 3.3 source-IP failed-Basic limiter defaults: 10 attempts / 1h
/ 50k tracked IPs (no env var; tunable in a follow-up). Phase 4.2
per-principal limiter cap comes from
CERTCTL_EST_PROFILE_<NAME>_RATE_LIMIT_PER_PRINCIPAL_24H (existing
field, shipped in Phase 1).
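
For a feel of how the 10 attempts / 1h default behaves, here is a
minimal sliding-window-log sketch keyed by source IP. It is an
assumption-laden illustration, not ratelimit.SlidingWindowLimiter
itself: windowLimiter, newWindowLimiter, demoWindow and at are
hypothetical names, the tracked-key cap (50k) and eviction are
omitted, and it assumes the caller invokes Allow once per counted
attempt.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errRateLimited = errors.New("rate limited")

// demoWindow mirrors the 1h default quoted in the commit message.
const demoWindow = time.Hour

// t0 is an arbitrary fixed instant so the demo is deterministic.
var t0 = time.Date(2026, 4, 29, 0, 0, 0, 0, time.UTC)

// at returns t0 plus the given number of minutes.
func at(minutes int) time.Time {
	return t0.Add(time.Duration(minutes) * time.Minute)
}

// windowLimiter keeps, per key, the timestamps that fall inside the
// window. No map cap or eviction: this is a sketch only.
type windowLimiter struct {
	mu     sync.Mutex
	maxN   int
	window time.Duration
	hits   map[string][]time.Time
}

func newWindowLimiter(maxN int, window time.Duration) *windowLimiter {
	return &windowLimiter{maxN: maxN, window: window, hits: make(map[string][]time.Time)}
}

// Allow prunes timestamps older than now-window, then either records
// now and returns nil, or returns errRateLimited when the key already
// has maxN timestamps inside the window.
func (l *windowLimiter) Allow(key string, now time.Time) error {
	l.mu.Lock()
	defer l.mu.Unlock()

	cutoff := now.Add(-l.window)
	old := l.hits[key]
	kept := old[:0] // in-place filter; safe because writes trail reads
	for _, ts := range old {
		if !ts.Before(cutoff) {
			kept = append(kept, ts)
		}
	}
	if len(kept) >= l.maxN {
		l.hits[key] = kept
		return errRateLimited
	}
	l.hits[key] = append(kept, now)
	return nil
}

func main() {
	l := newWindowLimiter(10, demoWindow)
	for i := 0; i <= 10; i++ {
		// Attempts 1-10 pass; attempt 11 inside the hour is rejected.
		fmt.Println(i+1, l.Allow("203.0.113.7", at(i)))
	}
}
```

Passing now as a parameter, as the Intune wrapper's Allow signature
in this commit also does, lets tests pin the clock instead of
sleeping.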

New tests:
- internal/cms/channelbinding_test.go: extractor + CSR-side parser +
  composite + TLS-1.3 round-trip end-to-end +
  EmbedChannelBindingAttribute round-trip
- internal/trustanchor/holder_test.go: parseBundlePEM white-box +
  LoadBundle + Holder Get/Pool/SetLabelForLog/Reload-happy/
  Reload-keeps-old-on-failure/Reload-keeps-old-on-expired/
  WatchSIGHUP-reloads-pool/WatchSIGHUP-stop-clean
- internal/api/handler/est_hardening_test.go: 17 named cases covering
  mTLS no-trust-pool 500 + no-cert 401 + cross-profile cert 401 +
  happy-path 200 + CACertsMTLS auth gate + CSRAttrsMTLS auth gate +
  channel-binding required-absent-rejected + not-required-absent-
  allowed + writeChannelBindingError mapping + Basic no-header 401
  + Basic wrong-password 401 + Basic correct-200 + Basic-no-password
  no-gate + per-IP failed-attempt lockout 429 + per-principal
  blocks-after-cap + different-principals-independent + no-limiter-
  unbounded.

Pre-commit verification (sandbox): gofmt clean, go vet clean
(excluding repository/postgres which the sandbox can't build —
disk-space testcontainers download), staticcheck clean for
cms/trustanchor/api/handler/api/router/scep/intune/ratelimit/
cmd/server, go test -short -count=1 green for cms/trustanchor/
api/handler/api/router/scep/intune/ratelimit/service. G-3
docs-drift guard reproduced locally clean (Phase 1 already
documented every new env var; Phases 2-4 added zero new env vars).
shankar0123
2026-04-29 23:15:35 +00:00
parent 8cc1153bd9
commit aa139ee0d9
17 changed files with 3273 additions and 728 deletions
+46 -152
@@ -1,193 +1,87 @@
package intune
import (
"errors"
"sync"
"time"
"github.com/shankar0123/certctl/internal/ratelimit"
)
// SCEP RFC 8894 + Intune master bundle Phase 8.6.
//
// PerDeviceRateLimiter is the second line of defense behind the replay cache
// from Phase 7. The replay cache catches the same challenge being submitted
// twice (within the challenge TTL); this rate limiter catches a compromised
// Connector signing key (or a stolen key+cert pair) issuing many DIFFERENT
// valid challenges for the same device subject in a short window.
// PerDeviceRateLimiter is the second line of defense behind the replay
// cache from Phase 7. The replay cache catches the same challenge being
// submitted twice (within the challenge TTL); this rate limiter catches a
// compromised Connector signing key (or a stolen key+cert pair) issuing
// many DIFFERENT valid challenges for the same device subject in a short
// window.
//
// Threat model:
//
// - Replay cache (Phase 7): nonce-keyed; catches duplicate submission.
// - This limiter: (Subject, Issuer)-keyed; catches enrollment-flooding.
//
// Default: 3 enrollments per (device GUID, Connector identity) per 24h.
// EST RFC 7030 hardening master bundle Phase 4.1: the implementation that
// used to live in this file was extracted to internal/ratelimit (where it
// can be shared with EST per-principal + EST HTTP-Basic source-IP rate
// limiters). PerDeviceRateLimiter is now a thin wrapper around
// ratelimit.SlidingWindowLimiter that preserves the original
// (subject, issuer) → key composition in the Allow signature so existing
// SCEP/Intune callers don't have to change.
//
// Sizing: 100,000 distinct device entries (matches the replay cache cap).
// At-cap: oldest entry evicted (small janitor pass) to avoid unbounded
// memory growth on a fleet that grows past the cap.
//
// Why a hand-rolled token bucket instead of pulling in golang.org/x/time/rate:
// the rate package is in go.sum as an indirect transitive but NOT a direct
// dep. Adding it would create a new direct dep relationship for ~30 LoC of
// state machine. The hand-rolled version below uses only stdlib (sync.Mutex
// + time.Time arithmetic) and is small enough to fit on one screen.
//
// Algorithm: each (Subject, Issuer) key maps to a bucket holding a window's
// worth of recent enrollment timestamps. On Allow, the bucket prunes
// timestamps older than (now - window) and either appends the current
// timestamp + returns true, or rejects + returns false when the post-prune
// count is already at the cap. This is the "sliding window log" rate
// limiter — exact (no token-leak rounding); O(N_per_key) per-call but N is
// bounded by the cap (3 by default), so effectively O(1).
// New callers SHOULD use ratelimit.SlidingWindowLimiter directly. The
// EST RFC 7030 Phase 4.2 EST per-principal cap uses the shared package.
// ErrRateLimited is the typed error returned when the per-device rate limit
// fires. The handler maps this to a CertRep FAILURE with badRequest failInfo
// + the `rate_limited` metric label.
var ErrRateLimited = errors.New("intune: per-device rate limit exceeded for this (subject, issuer) within the configured window")
// ErrRateLimited is the typed error returned when the per-device rate
// limit fires. Aliased to ratelimit.ErrRateLimited so errors.Is matches
// against either name (the SCEP audit closure already pinned the
// "rate_limited" metric label against this sentinel; the alias preserves
// sentinel identity across the package boundary).
var ErrRateLimited = ratelimit.ErrRateLimited
// PerDeviceRateLimiter is a sliding-window-log rate limiter keyed by
// (Subject, Issuer) tuples derived from a parsed challenge claim.
//
// Concurrency: the limiter is safe for concurrent Allow calls. The internal
// map is guarded by a mutex; the per-key slices are mutated only while the
// mutex is held.
// PerDeviceRateLimiter wraps ratelimit.SlidingWindowLimiter with the
// (subject, issuer)-composed-key Allow signature the Intune dispatcher
// uses. Concurrency-safe (the underlying limiter holds the mutex).
type PerDeviceRateLimiter struct {
mu sync.Mutex
buckets map[string][]time.Time // key → sliding window of timestamps
maxN int // max enrollments per window
window time.Duration // window length (default 24h)
cap int // max keys before LRU eviction kicks in
disabled bool // maxN == 0 → all Allow calls return nil
inner *ratelimit.SlidingWindowLimiter
}
// NewPerDeviceRateLimiter returns a limiter with the given per-key cap +
// window. maxN ≤ 0 disables the limiter (all Allow calls return nil); this
// is operator opt-out for the rare case where the per-device cap is
// window. maxN ≤ 0 disables the limiter (all Allow calls return nil);
// this is operator opt-out for the rare case where the per-device cap is
// undesirable (e.g. test harnesses, sketchpad deploys).
//
// Window defaults to 24h when zero. Map cap defaults to 100,000 when zero
// (matches the replay cache cap; see internal/scep/intune/replay.go).
func NewPerDeviceRateLimiter(maxN int, window time.Duration, mapCap int) *PerDeviceRateLimiter {
if window <= 0 {
window = 24 * time.Hour
}
if mapCap <= 0 {
mapCap = 100_000
}
return &PerDeviceRateLimiter{
buckets: make(map[string][]time.Time),
maxN: maxN,
window: window,
cap: mapCap,
disabled: maxN <= 0,
}
return &PerDeviceRateLimiter{inner: ratelimit.NewSlidingWindowLimiter(maxN, window, mapCap)}
}
// Allow checks whether an enrollment for the given (subject, issuer) tuple
// is permitted right now. Returns nil when allowed (and records the timestamp
// in the bucket) or ErrRateLimited when the bucket is at maxN.
// Allow checks whether an enrollment for the given (subject, issuer)
// tuple is permitted right now. Returns nil when allowed (and records
// the timestamp in the bucket) or ErrRateLimited when the bucket is at
// maxN.
//
// Empty subject is treated as "skip the limiter" — the caller's claim
// validation should have rejected an empty-subject claim already; this is
// belt-and-suspenders to prevent a single empty-subject bucket from
// becoming a fleet-wide chokepoint. The Connector emits non-empty subject
// (device GUID) on every legitimate challenge.
// validation should have rejected an empty-subject claim already; this
// is belt-and-suspenders to prevent a single empty-subject bucket from
// becoming a fleet-wide chokepoint.
func (l *PerDeviceRateLimiter) Allow(subject, issuer string, now time.Time) error {
if l.disabled {
return nil
}
if subject == "" {
// Caller's claim validation should reject empty-subject upstream;
// this short-circuit is defense-in-depth so a misconfigured
// Connector can't DoS us via the rate-limit path.
// Empty-subject early return preserved from the pre-Phase-4.1
// behavior: ratelimit.SlidingWindowLimiter also short-circuits
// on empty key, but the explicit check here documents the
// (subject, issuer) → empty-key contract and saves one call
// frame in the hot path.
return nil
}
key := subject + "|" + issuer
l.mu.Lock()
defer l.mu.Unlock()
// At-cap eviction: when the map is full, drop the oldest entry by
// finding the bucket whose newest timestamp is the smallest. O(N) but
// rarely fires; the prune-on-Allow path keeps most buckets short-lived.
if len(l.buckets) >= l.cap {
l.evictOldestLocked(now)
}
bucket := l.buckets[key]
bucket = pruneOlderThan(bucket, now.Add(-l.window))
if len(bucket) >= l.maxN {
// Don't append; over the limit. Persist the pruned bucket so the
// next call sees the most-recently-pruned state.
l.buckets[key] = bucket
return ErrRateLimited
}
bucket = append(bucket, now)
l.buckets[key] = bucket
return nil
}
// pruneOlderThan returns the slice with all entries strictly before
// `cutoff` removed. Preserves order (timestamps are appended in increasing
// time, so a single linear scan from the front suffices).
func pruneOlderThan(b []time.Time, cutoff time.Time) []time.Time {
i := 0
for i < len(b) && b[i].Before(cutoff) {
i++
}
if i == 0 {
return b
}
// Copy-shrink to release the underlying-array memory eventually
// (otherwise the slice would hold a reference to the older entries
// indefinitely until a re-allocation).
out := make([]time.Time, len(b)-i)
copy(out, b[i:])
return out
}
// evictOldestLocked drops the map entry whose newest timestamp is the
// oldest. Called under l.mu. O(N_keys) per eviction; at-cap is rare in
// practice (caps are sized for fleet steady-state).
func (l *PerDeviceRateLimiter) evictOldestLocked(now time.Time) {
var (
oldestKey string
oldestTs time.Time
first = true
)
for k, b := range l.buckets {
if len(b) == 0 {
// Empty bucket — drop it immediately, no candidate scan needed.
delete(l.buckets, k)
return
}
newest := b[len(b)-1]
if first || newest.Before(oldestTs) {
oldestKey = k
oldestTs = newest
first = false
}
}
if oldestKey != "" {
delete(l.buckets, oldestKey)
}
// Suppress unused-parameter warning for `now` in case the eviction
// strategy changes (e.g. swap to LRU keyed by time of last Allow).
_ = now
return l.inner.Allow(key, now)
}
// Len returns the approximate number of distinct (subject, issuer) keys
// currently tracked. For observability + tests; not load-stable under
// concurrent Allow calls.
func (l *PerDeviceRateLimiter) Len() int {
l.mu.Lock()
defer l.mu.Unlock()
return len(l.buckets)
}
// currently tracked. For observability + tests.
func (l *PerDeviceRateLimiter) Len() int { return l.inner.Len() }
// Disabled reports whether the limiter is in opt-out mode (maxN ≤ 0).
// Useful for handler-side gating + admin-endpoint observability.
func (l *PerDeviceRateLimiter) Disabled() bool {
return l.disabled
}
func (l *PerDeviceRateLimiter) Disabled() bool { return l.inner.Disabled() }
+10 -36
@@ -103,15 +103,11 @@ func TestPerDeviceRateLimiter_EmptySubjectShortCircuits(t *testing.T) {
}
}
func TestPerDeviceRateLimiter_DefaultCapsHonored(t *testing.T) {
l := NewPerDeviceRateLimiter(5, 0, 0) // window=0 → 24h default; cap=0 → 100k default
if l.window != 24*time.Hour {
t.Errorf("default window = %v, want 24h", l.window)
}
if l.cap != 100_000 {
t.Errorf("default cap = %d, want 100000", l.cap)
}
}
// TestPerDeviceRateLimiter_DefaultCapsHonored — moved to
// internal/ratelimit/sliding_window_test.go::TestSlidingWindowLimiter_DefaultCapsHonored
// in EST RFC 7030 hardening Phase 4.1 (the white-box test reads private
// fields that no longer exist on the wrapper). The shared package owns
// the field-default contract.
func TestPerDeviceRateLimiter_MapCapEvictsOldest(t *testing.T) {
// Cap of 3 keys to exercise the eviction branch deterministically.
@@ -161,30 +157,8 @@ func TestPerDeviceRateLimiter_ConcurrentRaceFree(t *testing.T) {
}
}
func TestPruneOlderThan(t *testing.T) {
t0 := time.Now()
in := []time.Time{
t0.Add(-3 * time.Hour), // pruned (older than cutoff)
t0.Add(-2 * time.Hour), // pruned (older than cutoff)
t0.Add(-1 * time.Hour), // survives (-60m is NEWER than the -90m cutoff)
t0.Add(-30 * time.Minute), // survives
t0, // survives
}
out := pruneOlderThan(in, t0.Add(-90*time.Minute))
if len(out) != 3 {
t.Fatalf("len(out) = %d, want 3 (-1h, -30m, t0 all newer than -90m cutoff)", len(out))
}
if !out[0].Equal(t0.Add(-1 * time.Hour)) {
t.Errorf("out[0] = %v, want -1h (oldest surviving entry)", out[0])
}
}
func TestPruneOlderThan_NoOpWhenNothingToPrune(t *testing.T) {
t0 := time.Now()
in := []time.Time{t0.Add(-1 * time.Minute), t0}
out := pruneOlderThan(in, t0.Add(-1*time.Hour))
// Same slice header (no copy needed).
if len(out) != len(in) {
t.Fatalf("len(out) = %d, want %d", len(out), len(in))
}
}
// TestPruneOlderThan + TestPruneOlderThan_NoOpWhenNothingToPrune — moved
// to internal/ratelimit/sliding_window_test.go in EST RFC 7030 hardening
// Phase 4.1. pruneOlderThan is now an unexported helper of the shared
// ratelimit package (the implementation moved there); the white-box
// tests follow.
+35 -63
@@ -1,73 +1,45 @@
package intune
// SCEP RFC 8894 + Intune master bundle Phase 7.2 (originally) +
// EST RFC 7030 hardening master bundle Phase 2.1 (extraction).
//
// LoadTrustAnchor + parseTrustAnchorPEM were extracted to
// internal/trustanchor.LoadBundle + parseBundlePEM so the EST mTLS
// sibling route (Phase 2 of the EST hardening bundle), the Intune
// dispatcher, and any future per-profile-trust-bundle caller can share
// the same PEM-bundle loader + SIGHUP-reload semantics. The shim below
// preserves the original public surface so existing intune callers
// (cmd/server/main.go, scep_intune_e2e_test.go, scep_profile_counter_
// isolation_test.go, scep_intune.go service) compile unchanged.
//
// New callers SHOULD import internal/trustanchor directly — the
// trustanchor.Holder + trustanchor.LoadBundle are the modern API.
//
// Note: the legacy intune error messages ("intune: trust anchor cert
// in %q expired ...") are NOT preserved verbatim across the extraction;
// the shared trustanchor package emits "trustanchor: ..." messages
// instead. The operator-facing log line at cmd/server/main.go's
// preflightSCEPIntuneTrustAnchor wraps the error in its own outer
// ("SCEP profile (PathID=...) INTUNE trust anchor load failed: ...")
// so the prefix change is invisible to log-grep runbooks that filter
// on the outer message.
import (
"crypto/x509"
"encoding/pem"
"fmt"
"os"
"time"
"github.com/shankar0123/certctl/internal/trustanchor"
)
// LoadTrustAnchor reads a PEM bundle of one or more Intune Connector
// signing certificates from the configured path. Returns the slice of
// parsed certs that the validator will accept as challenge issuers.
// signing certificates from the configured path. Delegates to the
// shared trustanchor.LoadBundle (extracted in EST RFC 7030 hardening
// Phase 2.1) so the EST mTLS sibling route + the Intune dispatcher
// + any future per-profile trust-bundle caller share the same
// loader semantics (path-empty refusal, expired-cert refusal,
// non-CERTIFICATE-block tolerance).
//
// SCEP RFC 8894 + Intune master bundle Phase 7.2.
//
// Behavior:
//
// - File must exist + be readable.
// - PEM-decodes the file; non-CERTIFICATE blocks are skipped (so an
// operator can paste a chain that includes a private key by mistake
// without breaking the load — the priv key is just ignored).
// - Returns an error if zero CERTIFICATE blocks parse.
// - Returns an error if any cert is past NotAfter (a stale trust
// anchor would silently reject every Intune challenge at runtime;
// fail loud at startup instead).
//
// Operators rotate Connector signing certs periodically; the trust
// anchor file is reloaded on SIGHUP (handled by the existing config
// watch loop in cmd/server/main.go — see cmd/server/tls.go::watchSIGHUP
// for the precedent).
// Preserved here as a wrapper so existing intune callers compile
// unchanged. New callers SHOULD use trustanchor.LoadBundle directly.
func LoadTrustAnchor(path string) ([]*x509.Certificate, error) {
if path == "" {
return nil, fmt.Errorf("intune: trust anchor path is empty")
}
body, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("intune: read trust anchor %q: %w", path, err)
}
return parseTrustAnchorPEM(body, path, time.Now())
}
// parseTrustAnchorPEM is the file-IO-free core of LoadTrustAnchor. Split
// out so unit tests can hand it byte slices without writing temp files.
// `now` is taken as a parameter so expiry tests can pin a deterministic
// clock.
func parseTrustAnchorPEM(body []byte, sourceLabel string, now time.Time) ([]*x509.Certificate, error) {
var out []*x509.Certificate
rest := body
for {
var block *pem.Block
block, rest = pem.Decode(rest)
if block == nil {
break
}
if block.Type != "CERTIFICATE" {
continue
}
cert, err := x509.ParseCertificate(block.Bytes)
if err != nil {
return nil, fmt.Errorf("intune: parse trust anchor cert in %q: %w", sourceLabel, err)
}
if now.After(cert.NotAfter) {
return nil, fmt.Errorf("intune: trust anchor cert in %q expired at %s (subject=%q) — operator must rotate the Connector signing cert before restart",
sourceLabel, cert.NotAfter.Format(time.RFC3339), cert.Subject.CommonName)
}
out = append(out, cert)
}
if len(out) == 0 {
return nil, fmt.Errorf("intune: trust anchor %q contains no CERTIFICATE PEM blocks", sourceLabel)
}
return out, nil
return trustanchor.LoadBundle(path)
}
+45 -130
@@ -1,143 +1,58 @@
package intune
// SCEP RFC 8894 + Intune master bundle Phase 8.5 (originally) +
// EST RFC 7030 hardening master bundle Phase 2.1 (extraction).
//
// TrustAnchorHolder + NewTrustAnchorHolder were extracted to
// internal/trustanchor.Holder + trustanchor.New so the EST mTLS sibling
// route (Phase 2 of the EST hardening bundle) and the Intune dispatcher
// can share the same SIGHUP-reloadable PEM bundle primitive. A single
// SIGHUP now rotates: server TLS cert (cmd/server/tls.go), every Intune
// trust anchor (this package's existing wiring), AND every EST mTLS
// per-profile client-CA bundle (the new sibling route) — exactly the
// design contract documented in the trustanchor package doc.
//
// The aliases below preserve every existing intune call site unchanged:
// - cmd/server/main.go declares `intuneTrustHolders []*intune.TrustAnchorHolder`
// + invokes `intune.NewTrustAnchorHolder(path, logger)`
// - internal/service/scep.go's SCEPService struct field
// `intuneTrust *intune.TrustAnchorHolder` (the type alias keeps this
// pointer-compatible with the original)
// - internal/scep/intune/trust_anchor_holder_test.go + the e2e tests
// that construct a holder via NewTrustAnchorHolder
//
// New callers SHOULD import internal/trustanchor directly — the
// trustanchor.Holder + trustanchor.New are the modern API. The intune
// aliases are preserved indefinitely for back-compat (no deprecation
// timeline; the cost of the two-line shim is trivial).
import (
"crypto/x509"
"errors"
"log/slog"
"os"
"os/signal"
"sync"
"syscall"
"github.com/shankar0123/certctl/internal/trustanchor"
)
// TrustAnchorHolder is the SIGHUP-reloadable wrapper around a per-profile
// Intune Connector trust anchor pool.
//
// SCEP RFC 8894 + Intune master bundle Phase 8.5.
//
// Mirrors the shape established by `cmd/server/tls.go::certHolder` for the
// server TLS cert: an RWMutex-guarded pool, a Get accessor that's safe for
// concurrent callers from the request path, a Reload that re-reads the file
// and atomically swaps the slice on success (failure leaves the OLD pool in
// place so a bad reload doesn't take Intune enrollment down), and a
// watchSIGHUP goroutine that responds to the same SIGHUP the operator uses
// to rotate the server TLS cert.
//
// Why SIGHUP specifically (vs fsnotify or a polling loop): SIGHUP is the
// repo-established convention (see cmd/server/tls.go). fsnotify would add a
// new direct dep + complicate the cleanup story. The operator's Connector-
// rotation script writes the new PEM bundle then sends SIGHUP — the same
// signal that already rotates the server TLS cert — and both swap atomically.
//
// Concurrency contract:
// - Get returns the pool slice header by value; the slice itself is
// immutable per-snapshot (Reload swaps a fresh slice rather than
// mutating the existing one). Callers may iterate the returned slice
// without holding any lock.
// - Reload acquires a write lock briefly for the swap. Concurrent Get
// calls block only for that swap window (microseconds).
// - watchSIGHUP runs at most one Reload at a time per holder.
type TrustAnchorHolder struct {
mu sync.RWMutex
certs []*x509.Certificate
path string
logger *slog.Logger
}
// Aliased to trustanchor.Holder (extracted in EST RFC 7030 hardening
// Phase 2.1) so the EST mTLS sibling route + the Intune dispatcher share
// the same primitive. Existing callers compile unchanged because a Go
// type alias names the identical type: *TrustAnchorHolder and
// *trustanchor.Holder are interchangeable.
type TrustAnchorHolder = trustanchor.Holder
// NewTrustAnchorHolder loads the trust bundle and returns a holder. Returns
// the same fail-loud error LoadTrustAnchor does on initial load — the
// startup gate at cmd/server/main.go is supposed to refuse boot when this
// fails. Subsequent Reload errors are non-fatal (logged + old pool retained).
// NewTrustAnchorHolder loads the trust bundle and returns a holder.
// Aliased to trustanchor.New (extracted in EST RFC 7030 hardening
// Phase 2.1). Returns the same fail-loud error LoadTrustAnchor does on
// initial load — the startup gate at cmd/server/main.go is supposed to
// refuse boot when this fails. Subsequent Reload errors are non-fatal
// (logged + old pool retained).
//
// The logger is required (never nil); the caller passes a per-profile
// scoped logger so SIGHUP-reload events show the PathID for triage.
func NewTrustAnchorHolder(path string, logger *slog.Logger) (*TrustAnchorHolder, error) {
if logger == nil {
return nil, errors.New("intune: TrustAnchorHolder requires a non-nil logger")
}
certs, err := LoadTrustAnchor(path)
if err != nil {
return nil, err
}
return &TrustAnchorHolder{
certs: certs,
path: path,
logger: logger,
}, nil
}
// Get returns the current trust anchor pool. Safe for concurrent callers;
// the slice header is returned by value and the underlying slice is
// immutable per-snapshot (Reload swaps a fresh slice, doesn't mutate in
// place — see Reload).
func (h *TrustAnchorHolder) Get() []*x509.Certificate {
h.mu.RLock()
defer h.mu.RUnlock()
return h.certs
}
// Path returns the on-disk path the holder reloads from. Useful for
// observability (admin endpoints, log lines) without exposing the cert
// pool itself.
func (h *TrustAnchorHolder) Path() string {
return h.path
}
// Reload re-reads the trust anchor file at h.path and atomically swaps the
// pool. Returns the parse error if the new file is invalid; the OLD pool
// stays in place so a bad reload doesn't take Intune enrollment down.
//
// Same fail-safe pattern as cmd/server/tls.go::(*certHolder).Reload — a
// rotation that writes a half-file (operator overwrites the bundle while
// only some of the new certs are in it) would otherwise crash the
// service mid-rotation. Logging + retaining the old pool gives the
// operator a bounded window to fix and re-SIGHUP.
func (h *TrustAnchorHolder) Reload() error {
certs, err := LoadTrustAnchor(h.path)
if err != nil {
return err
}
h.mu.Lock()
h.certs = certs
h.mu.Unlock()
return nil
}
// WatchSIGHUP installs a signal handler that calls Reload on each SIGHUP.
// The returned stop function closes the internal done channel and stops
// signal delivery so the goroutine can exit cleanly during shutdown.
//
// Errors from Reload are logged but do not terminate the watcher — the
// operator can fix the files and send another SIGHUP. Mirrors the
// (*certHolder).watchSIGHUP contract exactly.
//
// Multiple holders can coexist: each registers its own goroutine on the
// same SIGHUP signal. signal.Notify multicasts to every registered
// channel, so a single SIGHUP reloads every per-profile Intune trust
// anchor PLUS the server TLS cert in one operator action — exactly the
// design requirement (one SIGHUP rotates everything).
func (h *TrustAnchorHolder) WatchSIGHUP() (stop func()) {
ch := make(chan os.Signal, 1)
signal.Notify(ch, syscall.SIGHUP)
done := make(chan struct{})
go func() {
for {
select {
case <-ch:
if err := h.Reload(); err != nil {
h.logger.Error("Intune trust anchor reload failed; continuing with previous pool",
"error", err,
"path", h.path)
continue
}
h.logger.Info("Intune trust anchor reloaded via SIGHUP",
"path", h.path,
"certs_loaded", len(h.Get()))
case <-done:
signal.Stop(ch)
return
}
}
}()
return func() { close(done) }
}
// Note: the original intune.NewTrustAnchorHolder set the holder's
// internal log label to "Intune trust anchor"; the extracted
// trustanchor.New defaults to "trust anchor". Existing intune callers
// that need the original label should call .SetLabelForLog("intune
// trust anchor (PathID=…)") on the returned holder. cmd/server/main.go
// does this in the per-profile Intune startup loop.
var NewTrustAnchorHolder = trustanchor.New
+13 -92
@@ -16,6 +16,13 @@ import (
"time"
)
// EST RFC 7030 hardening master bundle Phase 2.1: the white-box parser
// tests (TestParseTrustAnchorPEM_*) moved to internal/trustanchor/holder_test.go
// where parseBundlePEM now lives. The intune package retains a thin
// public-surface test of LoadTrustAnchor — the back-compat shim that
// existing intune callers use — so a future refactor that breaks the
// shim's wire-up to trustanchor.LoadBundle is caught here.
// pemEncodeCert is a small DRY helper for the PEM bundle fixtures.
func pemEncodeCert(t *testing.T, der []byte) []byte {
t.Helper()
@@ -24,7 +31,9 @@ func pemEncodeCert(t *testing.T, der []byte) []byte {
// freshConnectorCertDER returns a freshly-minted EC P-256 cert as raw DER
// + the matching key. Lifetime is parameterised so the same factory drives
// both the happy-path and expired-cert cases.
// both happy-path and expired-cert cases. Kept in this file (not deleted with
// the white-box tests) because trust_anchor_holder_test.go's freshHolderCert
// returns *x509.Certificate while LoadTrustAnchor tests need raw DER + key.
func freshConnectorCertDER(t *testing.T, notAfter time.Time) ([]byte, *ecdsa.PrivateKey) {
t.Helper()
key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
@@ -44,96 +53,6 @@ func freshConnectorCertDER(t *testing.T, notAfter time.Time) ([]byte, *ecdsa.Pri
return der, key
}
func TestParseTrustAnchorPEM_HappyPath_SingleCert(t *testing.T) {
der, _ := freshConnectorCertDER(t, time.Now().Add(365*24*time.Hour))
body := pemEncodeCert(t, der)
certs, err := parseTrustAnchorPEM(body, "test", time.Now())
if err != nil {
t.Fatalf("parseTrustAnchorPEM: %v", err)
}
if len(certs) != 1 {
t.Fatalf("len(certs) = %d, want 1", len(certs))
}
if certs[0].Subject.CommonName != "intune-connector-test" {
t.Errorf("Subject.CommonName = %q", certs[0].Subject.CommonName)
}
}
func TestParseTrustAnchorPEM_HappyPath_MultiCert(t *testing.T) {
d1, _ := freshConnectorCertDER(t, time.Now().Add(30*24*time.Hour))
d2, _ := freshConnectorCertDER(t, time.Now().Add(60*24*time.Hour))
body := append(pemEncodeCert(t, d1), pemEncodeCert(t, d2)...)
certs, err := parseTrustAnchorPEM(body, "test", time.Now())
if err != nil {
t.Fatalf("parseTrustAnchorPEM: %v", err)
}
if len(certs) != 2 {
t.Fatalf("len(certs) = %d, want 2", len(certs))
}
}
func TestParseTrustAnchorPEM_SkipsNonCertBlocks(t *testing.T) {
der, key := freshConnectorCertDER(t, time.Now().Add(30*24*time.Hour))
keyDER, err := x509.MarshalECPrivateKey(key)
if err != nil {
t.Fatalf("MarshalECPrivateKey: %v", err)
}
keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
body := append(keyPEM, pemEncodeCert(t, der)...) // priv key first, cert second
certs, err := parseTrustAnchorPEM(body, "test", time.Now())
if err != nil {
t.Fatalf("parseTrustAnchorPEM should ignore non-CERTIFICATE blocks: %v", err)
}
if len(certs) != 1 {
t.Fatalf("len(certs) = %d, want 1 (priv key block must be skipped)", len(certs))
}
}
func TestParseTrustAnchorPEM_EmptyBundleRejected(t *testing.T) {
_, err := parseTrustAnchorPEM([]byte("nothing here"), "test", time.Now())
if err == nil || !strings.Contains(err.Error(), "no CERTIFICATE PEM blocks") {
t.Fatalf("expected 'no CERTIFICATE PEM blocks' error, got %v", err)
}
}
func TestParseTrustAnchorPEM_OnlyKeyBlocksRejected(t *testing.T) {
key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
keyDER, _ := x509.MarshalECPrivateKey(key)
body := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
_, err := parseTrustAnchorPEM(body, "test", time.Now())
if err == nil {
t.Fatalf("expected error for bundle with no certs, got nil")
}
}
func TestParseTrustAnchorPEM_ExpiredCertRejected(t *testing.T) {
der, _ := freshConnectorCertDER(t, time.Now().Add(-1*time.Hour)) // already expired
body := pemEncodeCert(t, der)
_, err := parseTrustAnchorPEM(body, "expired-bundle", time.Now())
if err == nil || !strings.Contains(err.Error(), "expired") {
t.Fatalf("expected expiry error, got %v", err)
}
// Operator-actionable message must include the subject so the audit
// log says exactly which cert to rotate.
if !strings.Contains(err.Error(), "intune-connector-test") {
t.Errorf("error must include subject CN for operator action: %v", err)
}
}
func TestParseTrustAnchorPEM_MalformedCertRejected(t *testing.T) {
bad := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: []byte("not-a-real-asn1-cert")})
_, err := parseTrustAnchorPEM(bad, "test", time.Now())
if err == nil {
t.Fatalf("expected x509 parse error, got nil")
}
}
func TestLoadTrustAnchor_FromDisk(t *testing.T) {
der, _ := freshConnectorCertDER(t, time.Now().Add(30*24*time.Hour))
body := pemEncodeCert(t, der)
@@ -150,6 +69,9 @@ func TestLoadTrustAnchor_FromDisk(t *testing.T) {
if len(certs) != 1 {
t.Fatalf("len(certs) = %d, want 1", len(certs))
}
if certs[0].Subject.CommonName != "intune-connector-test" {
t.Errorf("Subject.CommonName = %q", certs[0].Subject.CommonName)
}
}
func TestLoadTrustAnchor_EmptyPath(t *testing.T) {
@@ -164,7 +86,6 @@ func TestLoadTrustAnchor_MissingFile(t *testing.T) {
if err == nil {
t.Fatalf("expected file-not-found error, got nil")
}
// Don't string-assert on the OS error — just make sure it's surfaced.
if errors.Is(err, nil) {
t.Fatalf("error must be non-nil")
}