certctl/internal/scep/intune/trust_anchor_holder_test.go
shankar0123 7612da783a feat(scep-intune): per-profile dispatcher + SIGHUP reload + per-device rate limit + compliance hook seam
Phase 8 of the SCEP RFC 8894 + Intune master bundle. Wires the
internal/scep/intune validator from Phase 7 into the SCEPService
dispatch path, with a SIGHUP-reloadable trust anchor holder, a
per-(Subject, Issuer) sliding-window rate limiter, and a nil-default
ComplianceCheck seam for V3-Pro.

Operator-visible surface (per-profile, all default to off):

  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CONNECTOR_CERT_PATH=/etc/certctl/intune.pem
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_AUDIENCE=https://certctl.example.com/scep/corp
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CHALLENGE_VALIDITY=60m
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_PER_DEVICE_RATE_LIMIT_24H=3

Per-profile dispatch (Phase 8.8): an operator running corp-laptops
through Intune AND IoT devices through static challenge configures
INTUNE_ENABLED=true on the corp profile only — the IoT profile's
PKCSReq path skips the dispatcher entirely. Mirrors the per-profile
shape established by Phase 1.5.

Wire-in surfaces:

  * config.go (Phase 8.1): SCEPProfileConfig.Intune sub-config of
    type SCEPIntuneProfileConfig (Enabled/ConnectorCertPath/Audience/
    ChallengeValidity/PerDeviceRateLimit24h). Loaded from the indexed
    CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_* env-var family. Per-profile
    Validate gate refuses INTUNE_ENABLED=true with empty ConnectorCertPath
    OR negative PerDeviceRateLimit24h.
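A minimal sketch of the per-profile Validate gate described above. The field names come from this commit message; the field types, the error wording, and the choice to skip all checks when the profile is disabled are assumptions made for illustration, not the code in config.go.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// SCEPIntuneProfileConfig uses the field names listed in the commit
// message. The field types are assumptions.
type SCEPIntuneProfileConfig struct {
	Enabled               bool
	ConnectorCertPath     string
	Audience              string
	ChallengeValidity     time.Duration
	PerDeviceRateLimit24h int
}

// Validate refuses an enabled profile that has no connector cert path or a
// negative per-device cap. Disabled profiles are not checked (assumption).
func (c SCEPIntuneProfileConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if c.ConnectorCertPath == "" {
		return errors.New("INTUNE_ENABLED=true requires INTUNE_CONNECTOR_CERT_PATH")
	}
	if c.PerDeviceRateLimit24h < 0 {
		return fmt.Errorf("INTUNE_PER_DEVICE_RATE_LIMIT_24H must be >= 0, got %d",
			c.PerDeviceRateLimit24h)
	}
	return nil
}

func main() {
	ok := SCEPIntuneProfileConfig{
		Enabled:               true,
		ConnectorCertPath:     "/etc/certctl/intune.pem",
		ChallengeValidity:     60 * time.Minute,
		PerDeviceRateLimit24h: 3,
	}
	fmt.Println(ok.Validate() == nil) // true

	missingPath := SCEPIntuneProfileConfig{Enabled: true}
	fmt.Println(missingPath.Validate() != nil) // true

	negativeCap := ok
	negativeCap.PerDeviceRateLimit24h = -1
	fmt.Println(negativeCap.Validate() != nil) // true
}
```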

  * cmd/server/main.go (Phase 8.2 + wire-in): preflightSCEPIntuneTrustAnchor
    helper mirrors preflightSCEPRACertKey/preflightSCEPMTLSTrustBundle
    shape — fail-loud at boot when the trust anchor file is missing /
    unreadable / empty / contains an expired cert. The per-profile loop
    builds the holder + replay cache + rate limiter, calls
    SetIntuneIntegration on the SCEPService, and starts the SIGHUP
    watcher. A deferred sweep stops every watcher at shutdown.

  * internal/scep/intune/trust_anchor_holder.go (Phase 8.5):
    TrustAnchorHolder mirrors cmd/server/tls.go::certHolder. RWMutex-
    guarded pool + Reload that swaps a fresh slice on success +
    WatchSIGHUP goroutine that responds to the same SIGHUP the existing
    TLS-cert watcher uses. A bad reload (parse error, expired cert)
    keeps the OLD pool in place so a half-rotation doesn't take Intune
    enrollment down — same fail-safe pattern. Operators rotate via the
    on-disk file then 'kill -HUP <certctl-pid>'.

  * internal/scep/intune/rate_limit.go (Phase 8.6): hand-rolled
    sliding-window-log limiter keyed by (Subject, Issuer). 100k-entry
    map cap (matches replay cache); at-cap drops the bucket whose
    newest timestamp is the oldest. Default 3 enrollments per 24h
    covers legitimate first-cert + recovery + post-wipe re-enrollment
    but blocks bulk enumeration from a compromised Connector signing
    key. maxN <= 0 disables the limiter for tests + the rare operator
    who wants no per-device cap. Empty subject short-circuits to allow
    (defense-in-depth: caller's claim validation rejects empty-subject
    upstream; no shared bucket on '').

    Why hand-rolled instead of golang.org/x/time/rate: the rate
    package is in go.sum as an indirect transitive but not a direct
    dep. ~30 LoC of stdlib avoids creating a new direct dep.

  * internal/service/scep.go (Phase 8.3 + 8.4 + 8.7):
    - SCEPService gains intuneEnabled / intuneTrust / intuneAudience /
      intuneValidity / intuneReplayCache / intuneRateLimiter /
      complianceCheck fields.
    - SetIntuneIntegration() constructor-time injection wires the
      per-profile state. Profiles with INTUNE_ENABLED=false never
      call this method, so they pay zero overhead.
    - SetComplianceCheck() installs the V3-Pro plug-in (see Phase 8.7).
    - looksIntuneShaped(): JWT-shape pre-check (length > 200 + exactly
      two dots). Allowed to false-positive (validator catches malformed
      → ErrChallengeMalformed); MUST NOT false-negative on real Intune
      challenges.
    - dispatchIntuneChallenge(): the load-bearing core. Runs
      ValidateChallenge → CSR-binding via DeviceMatchesCSR → replay
      cache CheckAndInsert → per-device Allow → optional ComplianceCheck.
      Each failure leg increments a typed metric label and emits an
      audit-friendly Warn log line.
    - PKCSReq + PKCSReqWithEnvelope + RenewalReqWithEnvelope all call
      dispatchIntuneChallenge first; on outcome.decided=true they
      either short-circuit (with a typed-error → SCEPFailInfo mapping)
      or call processEnrollment with action='scep_pkcsreq_intune'
      (so audit greps can count Intune-vs-static enrollments).
    - mapIntuneErrorToFailInfo(): typed-error → SCEPFailInfo per
      RFC 8894 §3.2.1.4.5 (signature/replay/expired → BadMessageCheck;
      claim-mismatch → BadRequest; default → BadRequest).
    - intuneFailReason(): typed-error → metric label
      ('signature_invalid' / 'expired' / 'rate_limited' / etc.). Default
      'malformed' so a previously-unseen error category still surfaces
      in the metric for follow-up.
    - ComplianceCheck (Phase 8.7): nil-default no-op gate. V3-Pro plugs
      in via SetComplianceCheck to call Microsoft Graph's compliance
      API. Returns (compliant, reason, err). nil-err + compliant=false
      → CertRep FAILURE + 'compliance' reason in audit. err != nil →
      fail-safe deny (V3-Pro module is responsible for any 'permit on
      API failure' policy).

  * internal/service/scep.go also gains parseCSRForIntune() — small
    private wrapper around encoding/pem + x509 used by the dispatcher
    for the claim ↔ CSR binding check (separated from the broader
    processEnrollment because we want to bind BEFORE consuming the
    replay-cache slot).

Tests (gates: ≥85% coverage on intune package, ≥70% on service):

  * scep_intune_test.go (in internal/service): 14 dispatcher tests
    covering happy-path Intune enrollment + static-challenge fallback
    + tampered-challenge reject + claim-mismatch reject + replay
    detected + rate-limited + compliance-hook nil-default + compliance-
    hook denies non-compliant + compliance-hook error fails closed +
    IntuneEnabled accessor + 'no IntuneEnabled = static path
    unchanged' regression pin + intuneFailReason mapping for every
    typed error + looksIntuneShaped boundary cases.

  * trust_anchor_holder_test.go (in internal/scep/intune): NewLoadsBundle,
    NewRequiresLogger, NewSurfacesLoadError, ReloadHappyPath,
    ReloadKeepsOldOnFailure, ReloadKeepsOldOnExpired (the fail-safe
    semantics that make the SIGHUP path operator-friendly),
    WatchSIGHUPReloadsPool (real SIGHUP to self with poll-for-swap
    pattern mirroring cmd/server/tls_test.go), WatchSIGHUPStopIsClean
    (does NOT fire SIGHUP after stop — same caveat as the TLS test:
    the Go runtime would otherwise terminate the test runner on the
    next SIGHUP since signal.Stop has removed the handler).

  * rate_limit_test.go (in internal/scep/intune): AllowsUpToCap,
    DistinctKeysIndependent, WindowExpiry, DisabledBypass (maxN=0),
    NegativeCapDisabled, EmptySubjectShortCircuits (defense-in-depth
    against an empty-subject DoS chokepoint), DefaultCapsHonored,
    MapCapEvictsOldest (at-cap eviction branch), ConcurrentRaceFree
    (50 goroutines × 200 inserts), pruneOlderThan + the no-op case.

Verification:

  * gofmt -l on all touched files: clean
  * go vet ./... : clean
  * staticcheck on intune/service/config/cmd-server: clean
  * go test -count=1 -cover ./internal/scep/intune/...: 94.8%
    (target ≥85%)
  * go test -short across intune+service+config+handler+cmd-server:
    all green
  * G-3 docs-drift CI guard reproduced locally: docs-only filtered=
    empty, config-only=empty. The new env vars match the existing
    CERTCTL_SCEP_ allowlist prefix.

Refs: cowork/scep-rfc8894-intune-master-prompt.md::Phase 8
      cowork/scep-rfc8894-intune/progress.md
      Constitutional rule: 'Always take the complete path, not the
      easy path' (cowork/CLAUDE.md::Operating Rules) — operator can
      flip CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true and observe
      the dispatcher pick up Intune-shaped challenges end-to-end with
      no further code changes. Foundation + plumbing ship together.
2026-04-29 15:34:19 +00:00


package intune

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"io"
	"log/slog"
	"math/big"
	"os"
	"path/filepath"
	"strings"
	"syscall"
	"testing"
	"time"
)

// silentTestLogger returns a logger that drops everything; the SIGHUP watcher
// path emits Info logs we don't want fouling test output.
func silentTestLogger() *slog.Logger {
	return slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{Level: slog.LevelError + 10}))
}

// writeTestBundle writes a PEM bundle of the given certs at path with mode 0600.
func writeTestBundle(t *testing.T, path string, certs []*x509.Certificate) {
	t.Helper()
	body := []byte{}
	for _, c := range certs {
		body = append(body, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: c.Raw})...)
	}
	if err := os.WriteFile(path, body, 0o600); err != nil {
		t.Fatalf("WriteFile: %v", err)
	}
}

// freshHolderCert is a small factory for a self-signed EC cert with a
// caller-controlled CN + lifetime. Used by Reload tests that swap the
// on-disk pool between calls.
func freshHolderCert(t *testing.T, cn string, notAfter time.Time) *x509.Certificate {
	t.Helper()
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		t.Fatalf("ecdsa.GenerateKey: %v", err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-1 * time.Hour),
		NotAfter:     notAfter,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		t.Fatalf("x509.CreateCertificate: %v", err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		t.Fatalf("x509.ParseCertificate: %v", err)
	}
	return cert
}

func TestTrustAnchorHolder_NewLoadsBundle(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "intune-trust.pem")
	cert := freshHolderCert(t, "initial-conn", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{cert})

	holder, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatalf("NewTrustAnchorHolder: %v", err)
	}
	got := holder.Get()
	if len(got) != 1 || got[0].Subject.CommonName != "initial-conn" {
		t.Fatalf("Get returned %#v, want one cert with CN=initial-conn", got)
	}
	if holder.Path() != path {
		t.Errorf("Path = %q, want %q", holder.Path(), path)
	}
}

func TestTrustAnchorHolder_NewRequiresLogger(t *testing.T) {
	if _, err := NewTrustAnchorHolder("/nonexistent", nil); err == nil {
		t.Fatal("nil logger must error")
	}
}

func TestTrustAnchorHolder_NewSurfacesLoadError(t *testing.T) {
	if _, err := NewTrustAnchorHolder("/path/that/does/not/exist.pem", silentTestLogger()); err == nil {
		t.Fatal("missing file must error")
	}
}

func TestTrustAnchorHolder_ReloadHappyPath(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	c1 := freshHolderCert(t, "rev-1", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c1})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}

	// Rotate on disk and call Reload.
	c2 := freshHolderCert(t, "rev-2", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c2})
	if err := h.Reload(); err != nil {
		t.Fatalf("Reload: %v", err)
	}
	got := h.Get()
	if len(got) != 1 || got[0].Subject.CommonName != "rev-2" {
		t.Errorf("after Reload Get = %#v, want one cert CN=rev-2", got)
	}
}

func TestTrustAnchorHolder_ReloadKeepsOldOnFailure(t *testing.T) {
	// Mid-rotation half-file: operator overwrites the bundle with garbage
	// → Reload errors → holder must still serve the OLD pool. Without this
	// fail-safe a single typo would take Intune enrollment down for the
	// whole window until a re-rotate.
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	good := freshHolderCert(t, "stable", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{good})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}

	// Overwrite with content that LoadTrustAnchor will reject (no PEM blocks).
	if err := os.WriteFile(path, []byte("garbage"), 0o600); err != nil {
		t.Fatal(err)
	}
	if err := h.Reload(); err == nil {
		t.Fatal("Reload from garbage file must error")
	}

	// Old pool still served.
	got := h.Get()
	if len(got) != 1 || got[0].Subject.CommonName != "stable" {
		t.Errorf("after failed Reload Get should still be the pre-Reload pool; got %#v", got)
	}
}

func TestTrustAnchorHolder_ReloadKeepsOldOnExpired(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	good := freshHolderCert(t, "still-valid", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{good})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}

	// Operator rotates to a cert that's already expired (their script
	// pulled an old bundle by mistake). Reload should error AND the holder
	// should retain the previous good pool: exactly the fail-safe semantics
	// LoadTrustAnchor enforces at startup.
	expired := freshHolderCert(t, "expired-conn", time.Now().Add(-1*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{expired})
	if err := h.Reload(); err == nil {
		t.Fatal("Reload with expired cert must error")
	}

	// Check the length first so an empty pool fails the test instead of
	// panicking on the index.
	got := h.Get()
	if len(got) != 1 || !strings.Contains(got[0].Subject.CommonName, "still-valid") {
		t.Errorf("after expired-cert Reload, holder should retain old pool; got %#v", got)
	}
}

func TestTrustAnchorHolder_WatchSIGHUPReloadsPool(t *testing.T) {
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	c1 := freshHolderCert(t, "rev-pre-sighup", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c1})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}
	stop := h.WatchSIGHUP()
	defer stop()

	// Rotate on disk, then send SIGHUP to our own process and poll for the swap.
	c2 := freshHolderCert(t, "rev-post-sighup", time.Now().Add(30*24*time.Hour))
	writeTestBundle(t, path, []*x509.Certificate{c2})
	if err := syscall.Kill(syscall.Getpid(), syscall.SIGHUP); err != nil {
		t.Fatalf("send SIGHUP: %v", err)
	}

	// Poll for up to 2 seconds.
	deadline := time.Now().Add(2 * time.Second)
	for {
		got := h.Get()
		if len(got) == 1 && got[0].Subject.CommonName == "rev-post-sighup" {
			return
		}
		if time.Now().After(deadline) {
			// Guard the index so an empty pool reports a failure rather
			// than panicking inside the failure message.
			cn := "<empty pool>"
			if len(got) > 0 {
				cn = got[0].Subject.CommonName
			}
			t.Fatalf("post-SIGHUP pool not swapped in 2s; current CN=%q", cn)
		}
		time.Sleep(20 * time.Millisecond)
	}
}

func TestTrustAnchorHolder_WatchSIGHUPStopIsClean(t *testing.T) {
	// Mirrors cmd/server/tls_test.go::TestCertHolder_WatchSIGHUP_StopExits:
	// we do NOT fire a SIGHUP after stop(), because once signal.Stop has
	// removed our handler the default action on SIGHUP is to terminate the
	// process, which would kill the test runner. The contract we need to
	// pin is "stop() is synchronous and safe", which we demonstrate by
	// closing the watcher and verifying the holder still serves the
	// original cert without panic.
	dir := t.TempDir()
	path := filepath.Join(dir, "trust.pem")
	writeTestBundle(t, path, []*x509.Certificate{
		freshHolderCert(t, "stop-test", time.Now().Add(30*24*time.Hour)),
	})

	h, err := NewTrustAnchorHolder(path, silentTestLogger())
	if err != nil {
		t.Fatal(err)
	}
	stop := h.WatchSIGHUP()
	stop()
	time.Sleep(50 * time.Millisecond) // let the goroutine fully exit

	if cn := h.Get()[0].Subject.CommonName; cn != "stop-test" {
		t.Errorf("after stop CN = %q, want unchanged stop-test", cn)
	}
}