certctl/internal/scep/intune/rate_limit_test.go
Shankar 2263e2886b feat(scep-intune): per-profile dispatcher + SIGHUP reload + per-device rate limit + compliance hook seam
Phase 8 of the SCEP RFC 8894 + Intune master bundle. Wires the
internal/scep/intune validator from Phase 7 into the SCEPService
dispatch path, with a SIGHUP-reloadable trust anchor holder, a
per-(Subject, Issuer) sliding-window rate limiter, and a nil-default
ComplianceCheck seam for V3-Pro.

Operator-visible surface (per-profile, all default to off):

  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CONNECTOR_CERT_PATH=/etc/certctl/intune.pem
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_AUDIENCE=https://certctl.example.com/scep/corp
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CHALLENGE_VALIDITY=60m
  CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_PER_DEVICE_RATE_LIMIT_24H=3
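
As a rough illustration of how this env-var family might be loaded and
gated, here is a minimal sketch. The type intuneConfig, the functions
loadIntuneConfig and demo, the injected getenv, and the upper-casing of
the profile name are assumptions made for the example. Only the variable
names, the default of 3 per 24h, and the two rejection rules come from
this commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// intuneConfig is a hypothetical stand-in for the per-profile Intune
// sub-config; the field names follow this commit message, not the real code.
type intuneConfig struct {
	Enabled               bool
	ConnectorCertPath     string
	Audience              string
	ChallengeValidity     time.Duration
	PerDeviceRateLimit24h int
}

// loadIntuneConfig reads CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_* through an
// injected getenv so the sketch runs without touching the real environment.
func loadIntuneConfig(name string, getenv func(string) string) (intuneConfig, error) {
	prefix := "CERTCTL_SCEP_PROFILE_" + strings.ToUpper(name) + "_INTUNE_"
	cfg := intuneConfig{
		ConnectorCertPath:     getenv(prefix + "CONNECTOR_CERT_PATH"),
		Audience:              getenv(prefix + "AUDIENCE"),
		PerDeviceRateLimit24h: 3, // default stated in the commit message
	}
	var err error
	if v := getenv(prefix + "ENABLED"); v != "" {
		if cfg.Enabled, err = strconv.ParseBool(v); err != nil {
			return cfg, fmt.Errorf("%sENABLED: %w", prefix, err)
		}
	}
	if v := getenv(prefix + "CHALLENGE_VALIDITY"); v != "" {
		if cfg.ChallengeValidity, err = time.ParseDuration(v); err != nil {
			return cfg, fmt.Errorf("%sCHALLENGE_VALIDITY: %w", prefix, err)
		}
	}
	if v := getenv(prefix + "PER_DEVICE_RATE_LIMIT_24H"); v != "" {
		if cfg.PerDeviceRateLimit24h, err = strconv.Atoi(v); err != nil {
			return cfg, fmt.Errorf("%sPER_DEVICE_RATE_LIMIT_24H: %w", prefix, err)
		}
	}
	return cfg, cfg.validate()
}

// validate mirrors the gate described in this commit message. Applying both
// checks only when Enabled is true is an assumption of this sketch.
func (c intuneConfig) validate() error {
	if !c.Enabled {
		return nil
	}
	if c.ConnectorCertPath == "" {
		return errors.New("INTUNE_ENABLED=true requires INTUNE_CONNECTOR_CERT_PATH")
	}
	if c.PerDeviceRateLimit24h < 0 {
		return errors.New("INTUNE_PER_DEVICE_RATE_LIMIT_24H must not be negative")
	}
	return nil
}

// demo loads one valid and one invalid profile from in-memory maps.
func demo() (cfg intuneConfig, errGood, errBad error) {
	good := map[string]string{
		"CERTCTL_SCEP_PROFILE_CORP_INTUNE_ENABLED":             "true",
		"CERTCTL_SCEP_PROFILE_CORP_INTUNE_CONNECTOR_CERT_PATH": "/etc/certctl/intune.pem",
		"CERTCTL_SCEP_PROFILE_CORP_INTUNE_CHALLENGE_VALIDITY":  "60m",
	}
	bad := map[string]string{"CERTCTL_SCEP_PROFILE_IOT_INTUNE_ENABLED": "true"}
	cfg, errGood = loadIntuneConfig("corp", func(k string) string { return good[k] })
	_, errBad = loadIntuneConfig("iot", func(k string) string { return bad[k] })
	return cfg, errGood, errBad
}

func main() {
	cfg, errGood, errBad := demo()
	fmt.Println(cfg.Enabled, cfg.ChallengeValidity, cfg.PerDeviceRateLimit24h, errGood)
	fmt.Println(errBad)
}
```

Injecting getenv keeps the loader testable; in a server it would
presumably be os.Getenv.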

Per-profile dispatch (Phase 8.8): an operator running corp-laptops
through Intune AND IoT devices through static challenge configures
INTUNE_ENABLED=true on the corp profile only — the IoT profile's
PKCSReq path skips the dispatcher entirely. Mirrors the per-profile
shape established by Phase 1.5.
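
The per-profile gate can be pictured with the sketch below. It is not
the SCEPService code: profile, route, looksJWTShaped and fakeChallenge
are hypothetical names. The shape rule (length > 200 and exactly two
dots) is the pre-check this message describes under
internal/service/scep.go.

```go
package main

import (
	"fmt"
	"strings"
)

// looksJWTShaped is a hypothetical version of the shape pre-check: long
// enough and exactly two dots. It may false-positive; a full validator is
// expected to reject anything malformed afterwards.
func looksJWTShaped(challenge string) bool {
	return len(challenge) > 200 && strings.Count(challenge, ".") == 2
}

// profile is a hypothetical per-profile view holding only the Intune switch.
type profile struct {
	name          string
	intuneEnabled bool
}

// route reports which challenge path a request would take on this profile.
func (p profile) route(challenge string) string {
	if p.intuneEnabled && looksJWTShaped(challenge) {
		return "intune"
	}
	return "static"
}

// fakeChallenge builds a three-segment, JWT-shaped string for the demo.
func fakeChallenge(segLen int) string {
	seg := strings.Repeat("a", segLen)
	return seg + "." + seg + "." + seg
}

func main() {
	corp := profile{name: "corp", intuneEnabled: true}
	iot := profile{name: "iot", intuneEnabled: false}
	long := fakeChallenge(100) // 302 bytes, two dots
	fmt.Println(corp.name, corp.route(long))     // corp intune
	fmt.Println(corp.name, corp.route("s3cret")) // corp static
	fmt.Println(iot.name, iot.route(long))       // iot static
}
```

Checking the profile switch first means a profile without Intune never
runs the shape test at all, which matches the "skips the dispatcher
entirely" behaviour described above.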

Wire-in surfaces:

  * config.go (Phase 8.1): SCEPProfileConfig.Intune sub-config of
    type SCEPIntuneProfileConfig (Enabled/ConnectorCertPath/Audience/
    ChallengeValidity/PerDeviceRateLimit24h). Loaded from the indexed
    CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_* env-var family. Per-profile
    Validate gate refuses INTUNE_ENABLED=true with empty ConnectorCertPath
    OR negative PerDeviceRateLimit24h.

  * cmd/server/main.go (Phase 8.2 + wire-in): preflightSCEPIntuneTrustAnchor
    helper mirrors preflightSCEPRACertKey/preflightSCEPMTLSTrustBundle
    shape — fail-loud at boot when the trust anchor file is missing /
    unreadable / empty / contains an expired cert. The per-profile loop
    builds the holder + replay cache + rate limiter, calls
    SetIntuneIntegration on the SCEPService, and starts the SIGHUP
    watcher. A deferred sweep stops every watcher at shutdown.

  * internal/scep/intune/trust_anchor_holder.go (Phase 8.5):
    TrustAnchorHolder mirrors cmd/server/tls.go::certHolder. RWMutex-
    guarded pool + Reload that swaps a fresh slice on success +
    WatchSIGHUP goroutine that responds to the same SIGHUP the existing
    TLS-cert watcher uses. A bad reload (parse error, expired cert)
    keeps the OLD pool in place so a half-rotation doesn't take Intune
    enrollment down — same fail-safe pattern. Operators rotate via the
    on-disk file then 'kill -HUP <certctl-pid>'.

  * internal/scep/intune/rate_limit.go (Phase 8.6): hand-rolled
    sliding-window-log limiter keyed by (Subject, Issuer). 100k-entry
    map cap (matches replay cache); at-cap drops the bucket whose
    newest timestamp is the oldest. Default 3 enrollments per 24h
    covers legitimate first-cert + recovery + post-wipe re-enrollment
    but blocks bulk enumeration from a compromised Connector signing
    key. maxN <= 0 disables the limiter for tests + the rare operator
    who wants no per-device cap. Empty subject short-circuits to allow
    (defense-in-depth: caller's claim validation rejects empty-subject
    upstream; no shared bucket on '').

    Why hand-rolled instead of golang.org/x/time/rate: the rate
    package appears in go.sum only as an indirect (transitive)
    dependency, not a direct one. ~30 LoC of stdlib code avoids
    adding a new direct dep.

  * internal/service/scep.go (Phase 8.3 + 8.4 + 8.7):
    - SCEPService gains intuneEnabled / intuneTrust / intuneAudience /
      intuneValidity / intuneReplayCache / intuneRateLimiter /
      complianceCheck fields.
    - SetIntuneIntegration() constructor-time injection wires the
      per-profile state. Profiles with INTUNE_ENABLED=false never
      call this method, so they pay zero overhead.
    - SetComplianceCheck() installs the V3-Pro plug-in (see Phase 8.7).
    - looksIntuneShaped(): JWT-shape pre-check (length > 200 + exactly
      two dots). Allowed to false-positive (validator catches malformed
      → ErrChallengeMalformed); MUST NOT false-negative on real Intune
      challenges.
    - dispatchIntuneChallenge(): the load-bearing core. Runs
      ValidateChallenge → CSR-binding via DeviceMatchesCSR → replay
      cache CheckAndInsert → per-device Allow → optional ComplianceCheck.
      Each failure leg increments a typed metric label and emits an
      audit-friendly Warn log line.
    - PKCSReq + PKCSReqWithEnvelope + RenewalReqWithEnvelope all call
      dispatchIntuneChallenge first; on outcome.decided=true they
      either short-circuit (with a typed-error → SCEPFailInfo mapping)
      or call processEnrollment with action='scep_pkcsreq_intune'
      (so audit greps can count Intune-vs-static enrollments).
    - mapIntuneErrorToFailInfo(): typed-error → SCEPFailInfo per
      RFC 8894 §3.2.1.4.5 (signature/replay/expired → BadMessageCheck;
      claim-mismatch → BadRequest; default → BadRequest).
    - intuneFailReason(): typed-error → metric label
      ('signature_invalid' / 'expired' / 'rate_limited' / etc.). Default
      'malformed' so a previously-unseen error category still surfaces
      in the metric for follow-up.
    - ComplianceCheck (Phase 8.7): nil-default no-op gate. V3-Pro plugs
      in via SetComplianceCheck to call Microsoft Graph's compliance
      API. Returns (compliant, reason, err). nil-err + compliant=false
      → CertRep FAILURE + 'compliance' reason in audit. err != nil →
      fail-safe deny (V3-Pro module is responsible for any 'permit on
      API failure' policy).

  * internal/service/scep.go also gains parseCSRForIntune() — small
    private wrapper around encoding/pem + x509 used by the dispatcher
    for the claim ↔ CSR binding check (separated from the broader
    processEnrollment because we want to bind BEFORE consuming the
    replay-cache slot).
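
To make the sliding-window-log idea from the rate_limit.go entry above
concrete, here is a minimal sketch. It is not the code in this commit:
slidingLimiter, deviceKey, errLimited and demo are hypothetical names,
and treating an entry exactly at the cutoff as expired is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errLimited is a stand-in for the package's rate-limit sentinel error.
var errLimited = errors.New("per-device rate limit exceeded")

type deviceKey struct{ subject, issuer string }

// slidingLimiter is a hypothetical sliding-window-log limiter: each key keeps
// the timestamps of its accepted calls, oldest first.
type slidingLimiter struct {
	mu      sync.Mutex
	maxN    int
	window  time.Duration
	maxKeys int
	buckets map[deviceKey][]time.Time
}

func newSlidingLimiter(maxN int, window time.Duration, maxKeys int) *slidingLimiter {
	return &slidingLimiter{maxN: maxN, window: window, maxKeys: maxKeys,
		buckets: make(map[deviceKey][]time.Time)}
}

// allow records the call and returns nil, or returns errLimited when the key
// already has maxN accepted calls inside the window ending at now.
func (l *slidingLimiter) allow(subject, issuer string, now time.Time) error {
	if l.maxN <= 0 || subject == "" {
		return nil // disabled limiter, or no shared bucket for the empty subject
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	k := deviceKey{subject, issuer}
	old, exists := l.buckets[k]
	cutoff := now.Add(-l.window)
	kept := old[:0] // prune in place: drop entries at or before the cutoff
	for _, ts := range old {
		if ts.After(cutoff) {
			kept = append(kept, ts)
		}
	}
	if len(kept) >= l.maxN {
		l.buckets[k] = kept
		return errLimited
	}
	if !exists && len(l.buckets) >= l.maxKeys {
		l.evictStalest()
	}
	l.buckets[k] = append(kept, now)
	return nil
}

// evictStalest drops the bucket whose newest timestamp is the oldest.
func (l *slidingLimiter) evictStalest() {
	var victim deviceKey
	var oldest time.Time
	found := false
	for k, ts := range l.buckets {
		newest := ts[len(ts)-1] // buckets are never stored empty
		if !found || newest.Before(oldest) {
			victim, oldest, found = k, newest, true
		}
	}
	if found {
		delete(l.buckets, victim)
	}
}

// demo replays five calls for one device against a cap of 3 per 24h.
func demo() []bool {
	l := newSlidingLimiter(3, 24*time.Hour, 100)
	t0 := time.Date(2026, 4, 29, 0, 0, 0, 0, time.UTC)
	offsets := []time.Duration{0, time.Minute, 2 * time.Minute, 3 * time.Minute, 25 * time.Hour}
	var allowed []bool
	for _, off := range offsets {
		allowed = append(allowed, l.allow("device-1", "issuer-A", t0.Add(off)) == nil)
	}
	return allowed
}

func main() {
	fmt.Println(demo()) // [true true true false true]
}
```

Passing now explicitly, as the tests in this commit do, keeps the window
arithmetic deterministic without a fake clock.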

Tests (gates: ≥85% coverage on intune package, ≥70% on service):

  * scep_intune_test.go (in internal/service): 14 dispatcher tests
    covering happy-path Intune enrollment + static-challenge fallback
    + tampered-challenge reject + claim-mismatch reject + replay
    detected + rate-limited + compliance-hook nil-default + compliance-
    hook denies non-compliant + compliance-hook error fails closed +
    IntuneEnabled accessor + 'no IntuneEnabled = static path
    unchanged' regression pin + intuneFailReason mapping for every
    typed error + looksIntuneShaped boundary cases.

  * trust_anchor_holder_test.go (in internal/scep/intune): NewLoadsBundle,
    NewRequiresLogger, NewSurfacesLoadError, ReloadHappyPath,
    ReloadKeepsOldOnFailure, ReloadKeepsOldOnExpired (the fail-safe
    semantics that make the SIGHUP path operator-friendly),
    WatchSIGHUPReloadsPool (real SIGHUP to self with poll-for-swap
    pattern mirroring cmd/server/tls_test.go), WatchSIGHUPStopIsClean
    (does NOT fire SIGHUP after stop — same caveat as the TLS test:
    the Go runtime would otherwise terminate the test runner on the
    next SIGHUP since signal.Stop has removed the handler).

  * rate_limit_test.go (in internal/scep/intune): AllowsUpToCap,
    DistinctKeysIndependent, WindowExpiry, DisabledBypass (maxN=0),
    NegativeCapDisabled, EmptySubjectShortCircuits (defense-in-depth
    against an empty-subject DoS chokepoint), DefaultCapsHonored,
    MapCapEvictsOldest (at-cap eviction branch), ConcurrentRaceFree
    (20 goroutines × 30 Allow calls, skipped under -short),
    pruneOlderThan + the no-op case.
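
The fail-safe reload semantics pinned by ReloadKeepsOldOnFailure and
ReloadKeepsOldOnExpired could look roughly like the sketch below. It is
an illustration under assumptions, not the TrustAnchorHolder source:
anchorHolder, the injected load function, fakeBundle and demo are
hypothetical, certificates are in-memory stubs rather than parsed PEM,
and the SIGHUP wiring is left out.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"sync"
	"time"
)

// anchorHolder is a hypothetical holder: readers take the current pool under
// a read lock, and reload swaps in a new slice only after it passes checks.
type anchorHolder struct {
	mu   sync.RWMutex
	pool []*x509.Certificate
	load func() ([]*x509.Certificate, error) // injected loader (assumption)
}

func newAnchorHolder(load func() ([]*x509.Certificate, error)) (*anchorHolder, error) {
	h := &anchorHolder{load: load}
	if err := h.reload(time.Now()); err != nil {
		return nil, err // fail loud at construction time
	}
	return h, nil
}

// reload validates the freshly loaded bundle before swapping. On any error
// the previous pool stays active.
func (h *anchorHolder) reload(now time.Time) error {
	certs, err := h.load()
	if err != nil {
		return fmt.Errorf("load trust anchors: %w", err)
	}
	if len(certs) == 0 {
		return errors.New("trust anchor bundle is empty")
	}
	for _, c := range certs {
		if now.After(c.NotAfter) {
			return fmt.Errorf("trust anchor %q expired", c.Subject.CommonName)
		}
	}
	h.mu.Lock()
	h.pool = certs
	h.mu.Unlock()
	return nil
}

func (h *anchorHolder) current() []*x509.Certificate {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.pool
}

// fakeBundle builds a single in-memory certificate stub for the demo.
func fakeBundle(cn string, notAfter time.Time) []*x509.Certificate {
	c := &x509.Certificate{NotAfter: notAfter}
	c.Subject.CommonName = cn
	return []*x509.Certificate{c}
}

// demo returns the active anchor name after boot, after a bad reload
// (expired cert on disk), and after a good reload.
func demo() []string {
	now := time.Now()
	next := fakeBundle("old-connector", now.Add(time.Hour))
	h, err := newAnchorHolder(func() ([]*x509.Certificate, error) { return next, nil })
	if err != nil {
		panic(err)
	}
	names := []string{h.current()[0].Subject.CommonName}

	next = fakeBundle("expired-connector", now.Add(-time.Hour))
	_ = h.reload(now) // returns an error; the old pool must survive
	names = append(names, h.current()[0].Subject.CommonName)

	next = fakeBundle("new-connector", now.Add(time.Hour))
	_ = h.reload(now)
	return append(names, h.current()[0].Subject.CommonName)
}

func main() {
	fmt.Println(demo()) // [old-connector old-connector new-connector]
}
```

Validating before taking the write lock keeps the critical section to a
single slice assignment, so readers are never blocked by file parsing.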

Verification:

  * gofmt -l on all touched files: clean
  * go vet ./... : clean
  * staticcheck on intune/service/config/cmd-server: clean
  * go test -count=1 -cover ./internal/scep/intune/...: 94.8%
    (target ≥85%)
  * go test -short across intune+service+config+handler+cmd-server:
    all green
  * G-3 docs-drift CI guard reproduced locally: docs-only filtered=
    empty, config-only=empty. The new env vars match the existing
    CERTCTL_SCEP_ allowlist prefix.

Refs: cowork/scep-rfc8894-intune-master-prompt.md::Phase 8
      cowork/scep-rfc8894-intune/progress.md
      Constitutional rule: 'Always take the complete path, not the
      easy path' (cowork/CLAUDE.md::Operating Rules) — operator can
      flip CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true and observe
      the dispatcher pick up Intune-shaped challenges end-to-end with
      no further code changes. Foundation + plumbing ship together.
2026-04-29 15:34:19 +00:00

package intune

import (
	"errors"
	"fmt"
	"sync"
	"testing"
	"time"
)

func TestPerDeviceRateLimiter_AllowsUpToCap(t *testing.T) {
	l := NewPerDeviceRateLimiter(3, 24*time.Hour, 10)
	now := time.Now()
	for i := 0; i < 3; i++ {
		if err := l.Allow("device-1", "issuer-A", now.Add(time.Duration(i)*time.Minute)); err != nil {
			t.Fatalf("call %d should be allowed: %v", i+1, err)
		}
	}
	if err := l.Allow("device-1", "issuer-A", now.Add(4*time.Minute)); !errors.Is(err, ErrRateLimited) {
		t.Fatalf("4th call should be rate-limited; got %v", err)
	}
}

func TestPerDeviceRateLimiter_DistinctKeysIndependent(t *testing.T) {
	l := NewPerDeviceRateLimiter(1, 24*time.Hour, 10)
	now := time.Now()
	if err := l.Allow("device-1", "issuer-A", now); err != nil {
		t.Fatalf("first allow: %v", err)
	}
	// Different subject — independent bucket.
	if err := l.Allow("device-2", "issuer-A", now); err != nil {
		t.Fatalf("different subject must have its own bucket: %v", err)
	}
	// Different issuer — also independent.
	if err := l.Allow("device-1", "issuer-B", now); err != nil {
		t.Fatalf("different issuer must have its own bucket: %v", err)
	}
	// Same key as call 1 — must be limited.
	if err := l.Allow("device-1", "issuer-A", now.Add(1*time.Second)); !errors.Is(err, ErrRateLimited) {
		t.Fatalf("repeat key should be limited; got %v", err)
	}
}

func TestPerDeviceRateLimiter_WindowExpiry(t *testing.T) {
	l := NewPerDeviceRateLimiter(2, 1*time.Hour, 10)
	now := time.Now()
	if err := l.Allow("dev", "iss", now); err != nil {
		t.Fatal(err)
	}
	if err := l.Allow("dev", "iss", now.Add(30*time.Minute)); err != nil {
		t.Fatal(err)
	}
	// Inside window — limited.
	if err := l.Allow("dev", "iss", now.Add(45*time.Minute)); !errors.Is(err, ErrRateLimited) {
		t.Fatalf("inside-window 3rd call should be limited: %v", err)
	}
	// Past window — slots reopen.
	if err := l.Allow("dev", "iss", now.Add(2*time.Hour)); err != nil {
		t.Fatalf("past-window call should be allowed (window reset): %v", err)
	}
}

func TestPerDeviceRateLimiter_DisabledBypass(t *testing.T) {
	l := NewPerDeviceRateLimiter(0, 24*time.Hour, 10) // maxN=0 → disabled
	if !l.Disabled() {
		t.Fatal("limiter with maxN=0 must report Disabled()=true")
	}
	now := time.Now()
	for i := 0; i < 100; i++ {
		if err := l.Allow("dev", "iss", now); err != nil {
			t.Fatalf("disabled limiter must allow everything: %v", err)
		}
	}
	// Disabled limiter doesn't track buckets.
	if got := l.Len(); got != 0 {
		t.Errorf("disabled limiter Len() = %d, want 0", got)
	}
}

func TestPerDeviceRateLimiter_NegativeCapDisabled(t *testing.T) {
	l := NewPerDeviceRateLimiter(-1, 24*time.Hour, 10)
	if !l.Disabled() {
		t.Fatal("negative maxN must produce a disabled limiter")
	}
}

func TestPerDeviceRateLimiter_EmptySubjectShortCircuits(t *testing.T) {
	// Empty subject is the caller's defense-in-depth case (claim validation
	// upstream should reject empty-subject claims first). Limiter must not
	// build a single shared bucket keyed by empty-subject — that would
	// be a fleet-wide chokepoint.
	l := NewPerDeviceRateLimiter(1, 24*time.Hour, 10)
	now := time.Now()
	for i := 0; i < 50; i++ {
		if err := l.Allow("", "iss", now); err != nil {
			t.Fatalf("empty subject must short-circuit (call %d): %v", i, err)
		}
	}
	if got := l.Len(); got != 0 {
		t.Errorf("Len after 50 empty-subject calls = %d, want 0 (no bucket created)", got)
	}
}

func TestPerDeviceRateLimiter_DefaultCapsHonored(t *testing.T) {
	l := NewPerDeviceRateLimiter(5, 0, 0) // window=0 → 24h default; cap=0 → 100k default
	if l.window != 24*time.Hour {
		t.Errorf("default window = %v, want 24h", l.window)
	}
	if l.cap != 100_000 {
		t.Errorf("default cap = %d, want 100000", l.cap)
	}
}

func TestPerDeviceRateLimiter_MapCapEvictsOldest(t *testing.T) {
	// Cap of 3 keys to exercise the eviction branch deterministically.
	l := NewPerDeviceRateLimiter(2, 1*time.Hour, 3)
	now := time.Now()
	// Insert 3 distinct keys with increasing timestamps.
	for i := 0; i < 3; i++ {
		key := fmt.Sprintf("dev-%d", i)
		if err := l.Allow(key, "iss", now.Add(time.Duration(i)*time.Minute)); err != nil {
			t.Fatalf("insert %d: %v", i, err)
		}
	}
	if l.Len() != 3 {
		t.Fatalf("Len = %d, want 3", l.Len())
	}
	// 4th key forces eviction of dev-0 (its newest timestamp is oldest).
	// Only the key count is asserted below, not which key was evicted.
	if err := l.Allow("dev-3", "iss", now.Add(10*time.Minute)); err != nil {
		t.Fatalf("4th-key insert: %v", err)
	}
	if l.Len() != 3 {
		t.Errorf("Len after at-cap insert = %d, want 3 (cap honored)", l.Len())
	}
}

func TestPerDeviceRateLimiter_ConcurrentRaceFree(t *testing.T) {
	if testing.Short() {
		t.Skip("race-style test under -short")
	}
	l := NewPerDeviceRateLimiter(50, 24*time.Hour, 10000)
	var wg sync.WaitGroup
	for g := 0; g < 20; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			now := time.Now()
			key := fmt.Sprintf("dev-%d", id)
			for i := 0; i < 30; i++ {
				_ = l.Allow(key, "iss", now)
			}
		}(g)
	}
	wg.Wait()
	if got := l.Len(); got != 20 {
		t.Errorf("expected 20 distinct keys; got %d", got)
	}
}

func TestPruneOlderThan(t *testing.T) {
	t0 := time.Now()
	in := []time.Time{
		t0.Add(-3 * time.Hour),    // pruned (older than cutoff)
		t0.Add(-2 * time.Hour),    // pruned (older than cutoff)
		t0.Add(-1 * time.Hour),    // survives (-60m is NEWER than the -90m cutoff)
		t0.Add(-30 * time.Minute), // survives
		t0,                        // survives
	}
	out := pruneOlderThan(in, t0.Add(-90*time.Minute))
	if len(out) != 3 {
		t.Fatalf("len(out) = %d, want 3 (-1h, -30m, t0 all newer than -90m cutoff)", len(out))
	}
	if !out[0].Equal(t0.Add(-1 * time.Hour)) {
		t.Errorf("out[0] = %v, want -1h (oldest surviving entry)", out[0])
	}
}

func TestPruneOlderThan_NoOpWhenNothingToPrune(t *testing.T) {
	t0 := time.Now()
	in := []time.Time{t0.Add(-1 * time.Minute), t0}
	out := pruneOlderThan(in, t0.Add(-1*time.Hour))
	// Nothing is older than the cutoff, so both entries must survive.
	// Only the length is asserted; slice-header identity is not checked.
	if len(out) != len(in) {
		t.Fatalf("len(out) = %d, want %d", len(out), len(in))
	}
}