Mirror of https://github.com/shankar0123/certctl.git, synced 2026-06-09 20:08:53 +00:00
0861aa9482
Phase 7 of the SCEP RFC 8894 + Intune master bundle. Adds the
internal/scep/intune package that validates Microsoft Intune Certificate
Connector signed challenges embedded in SCEP CSR challengePassword
attributes. This is the parsing/validation foundation; Phase 8 wires it
into the SCEP service dispatcher.
What's included:
* doc.go — package architecture (Intune cloud → Connector → certctl
SCEP server) + 'what this package is NOT' guard rails. We do NOT
implement full JOSE: no JKU / kid / x5c trust, no JWKS fetch.
Trust anchor is operator-supplied at startup and pinned. The
package does NOT call Microsoft's API directly — the Connector
already did that; we validate its signed attestation.
* trust_anchor.go — LoadTrustAnchor(path) reads a PEM bundle of
Intune Connector signing certs. Skips non-CERTIFICATE PEM blocks
(operators sometimes paste chains with the priv key by mistake).
Rejects empty bundles + expired certs at startup with an
operator-actionable message including the cert subject. SIGHUP
reload lands in Phase 8.5; today it's load-once-at-boot.
* claim.go — ChallengeClaim struct + DeviceMatchesCSR helper.
Set-equality semantics for SAN-DNS/SAN-RFC822/SAN-UPN: the CSR
must carry EXACTLY the claim's elements, no extras and no missing.
Empty claim slice = no constraint on that dimension.
Per-dimension typed errors (ErrClaimCNMismatch /
ErrClaimSANDNSMismatch / ErrClaimSANRFC822Mismatch /
ErrClaimSANUPNMismatch) so audit logs surface the failure
dimension without string-matching. extractUPNSans is stubbed to
return nil with documented fail-closed behavior: non-empty UPN
claims fail the equalSets check (the correct outcome; the rare
deployment that pins UPN SANs must hot-fix the ASN.1 walker per
the inline comment).
* replay.go — ReplayCache: bounded in-memory cache of seen nonces
with TTL. Sized for 100,000 entries (a 60-min Connector validity
window × a 25 RPS steady-state Intune fleet ≈ 90,000 challenges per
window, plus headroom). sync.Map for concurrent read/write; janitor goroutine
wakes every TTL/4 to evict expired entries; at-cap O(N)
oldest-eviction (rarely fires; janitor keeps the cache below
cap). Redis-backed variant deferred to V3-Pro.
* challenge.go — the load-bearing piece:
- ParseChallenge(raw) splits the JWT-like compact serialization
into header/payload/signature and base64url-decodes each.
Tolerates both padded + unpadded encodings (some Connector
builds emit padded; RFC 7515 §2 says unpadded; we accept both).
Validates that the header parses as JSON before returning, so
malformed input is flagged earlier in the pipeline.
- ValidateChallenge(raw, trust, expectedAudience, now):
1. ParseChallenge
2. JWS signature verify over (segment0 || '.' || segment1)
— re-derived from the raw on-wire bytes, NOT
re-base64-encoded, per RFC 7515 §3.1 (re-encoding could
produce a byte-different input than what was signed)
3. Signature alg dispatch:
RS256: rsa.VerifyPKCS1v15(SHA-256)
ES256: tries fixed-width r||s (JOSE-canonical) first,
falls back to ASN.1 DER (older Connectors)
alg=none: explicit reject with an audit-log-friendly
message (unsecured-JWS attack vector; RFC 7518 §3.6)
HS*/PS*: rejected as 'unsupported alg' (HS* would need a
shared secret, which our threat model does not have)
4. Version-detection prelude (versionedChallenge struct +
versionUnmarshalers map). Today's format is v1 (no
explicit version field; absence IS the v1 signal). Adding
v2 = adding a parser + a registration line; v1 path stays
untouched. Costs ~30 LoC + 2 tests, versus a P0 incident when
Microsoft inevitably changes the format.
5. Time bounds (iat / exp); audience pin (skipped when
expectedAudience == "").
Replay protection is the CALLER's job (handler glues parser +
cache; validator stays stateless + testable).
* Typed errors: ErrChallengeMalformed / ErrChallengeSignature /
ErrChallengeExpired / ErrChallengeNotYetValid /
ErrChallengeWrongAudience / ErrChallengeReplay /
ErrChallengeUnknownVersion. errors.Is-friendly so the handler
can audit failure dimension.
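The claim.go matching rule (an empty claim slice imposes no constraint; otherwise the CSR must carry exactly the claim's set, with duplicates collapsed and DNS names compared case-insensitively) is plain set equality. The sketch below assumes a per-dimension fold flag so DNS folds case while other dimensions compare exactly; dimensionMatches and normalise are hypothetical names, not the package's normaliseSet / equalSets.

```go
package main

import (
	"fmt"
	"strings"
)

// normalise returns values as a set, lowercased when fold is true
// (DNS names compare case-insensitively per RFC 4343).
func normalise(values []string, fold bool) map[string]struct{} {
	set := make(map[string]struct{}, len(values))
	for _, v := range values {
		if fold {
			v = strings.ToLower(v)
		}
		set[v] = struct{}{}
	}
	return set
}

// dimensionMatches: an empty claim means "no constraint"; otherwise the
// CSR values must equal the claim values as sets (no extras, none missing).
func dimensionMatches(claim, csr []string, fold bool) bool {
	if len(claim) == 0 {
		return true
	}
	want, got := normalise(claim, fold), normalise(csr, fold)
	if len(want) != len(got) {
		return false
	}
	for v := range want {
		if _, ok := got[v]; !ok {
			return false
		}
	}
	return true
}

func main() {
	claim := []string{"Device1.example.com", "device1.example.com"}
	fmt.Println(dimensionMatches(claim, []string{"DEVICE1.example.com"}, true))                       // true: dedupe + case fold
	fmt.Println(dimensionMatches(claim, []string{"device1.example.com", "extra.example.com"}, true)) // false: extra SAN
	fmt.Println(dimensionMatches(nil, []string{"anything.example.com"}, true))                        // true: no constraint
}
```

Comparing lengths after deduplication and then checking one direction of containment is enough for equality, since two finite sets of the same size where one contains the other are equal.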
Tests (94.8% coverage):
* challenge_test.go (18 tests): happy-path RS256 + ES256
fixed-width + ES256 DER; TamperedSignature; TamperedPayload;
Expired; NotYetValid; WrongAudience; EmptyExpectedAudience
disables check; RotatedTrustAnchor; EmptyTrustBundle;
AlgNoneRejected; UnsupportedAlg (HS256); MissingAlg;
VersionV1ExplicitOK; VersionUnknownRejected;
MixedTrustBundle iter (skip key-type mismatches without
surfacing as Signature err); NonJSONPayloadButValidSignature;
Malformed cases (empty, missing dots, bad base64, non-JSON
header — 9 sub-cases); PaddedBase64Tolerated.
* claim_test.go (13 tests): per-dimension matching across CN +
SAN-DNS + SAN-RFC822 + SAN-UPN; nil guards; case-insensitive DNS
(RFC 4343); dedupe set-equality; empty claim = no constraint;
UPN stub canary; normaliseSet edge cases; equalSets length
mismatch.
* replay_test.go (11 tests): first-fresh; duplicate-rejected;
past-TTL-fresh; Sweep-evicts-expired; empty-nonce
short-circuits; at-cap LRU eviction; default-cap=100k;
Close-idempotent; TTL=0 disables janitor; concurrent-race-free
(50 goroutines × 200 inserts); empty-nonce twice is fresh both
times (we don't cache empties).
* trust_anchor_test.go: HappyPath single + multi cert; SkipsNonCertBlocks
(priv key + cert mix); EmptyBundleRejected; OnlyKeyBlocksRejected;
ExpiredCertRejected (with subject CN in error); MalformedCertRejected;
LoadTrustAnchor disk + EmptyPath + MissingFile.
* fuzz_test.go: FuzzParseChallenge with seed corpus covering both
the well-formed and the obvious-malformed shapes. Survived 187k
execs in 21s without panic on the local burst; CI runs 5 min.
Verification:
* gofmt -l ./internal/scep/intune: clean
* go vet ./internal/scep/intune/...: clean
* staticcheck ./internal/scep/intune/...: clean
* go test -count=1 -cover ./internal/scep/intune/...: 94.8%
(target was ≥85%)
* go vet ./internal/... ./cmd/...: clean (no rest-of-repo regressions)
* No new CERTCTL_* env vars (those land in Phase 8 with the
config gate); G-3 docs-drift CI guard not triggered.
* No new HTTP routes; openapi-parity guard not triggered.
Phase 8 will:
- Add SCEPProfileConfig.Intune* env vars + preflight gate
- Wire the validator into the SCEP service dispatcher
(Intune-shaped challenges → validator; static → existing path)
- Trust-anchor SIGHUP reload mirroring cmd/server/tls.go::watchSIGHUP
- Per-claim rate limit + audit metrics
Refs: cowork/scep-rfc8894-intune-master-prompt.md::Phase 7
cowork/scep-rfc8894-intune/progress.md
192 lines
5.8 KiB
Go
package intune

import (
	"sync"
	"time"
)

// ReplayCache is a bounded in-memory cache of seen Intune challenge
// nonces with TTL. It guards against the same Connector-signed challenge
// being replayed against the SCEP server within its validity window.
//
// SCEP RFC 8894 + Intune master bundle Phase 7.4b.
//
// Sizing rationale (cap = 100,000 entries):
//
//   - Microsoft's published Connector defaults give each challenge
//     a 60-minute validity window. A high-volume Intune fleet
//     enrolling at ~25 RPS hits ~90,000 challenges/hour
//     (25/s × 3,600 s).
//   - Capping at 100,000 covers the steady-state load with headroom.
//     The janitor goroutine sweeps expired entries every TTL/4. If an
//     insert still finds the cache at cap, CheckAndInsert evicts the
//     entry with the earliest expiry. Every entry gets the same TTL,
//     so that is the oldest insertion (FIFO rather than true LRU), and
//     any expired entry the janitor has not swept yet goes first.
//     Evicting an in-window entry accepts a small replay-window risk
//     over an OOM crash.
//   - Operators who push beyond this rate should switch to a Redis-
//     backed implementation (deferred to V3-Pro per the master
//     prompt's deferral list); the in-memory variant is the V2 default.
//
// Concurrency: sync.Map handles concurrent read/write without an
// explicit lock, and LoadOrStore / CompareAndDelete (Go 1.20+) make the
// per-nonce check-and-record step atomic, so concurrent submissions of
// the same challenge cannot both pass. The janitor goroutine
// periodically walks for expired entries. Cap enforcement on insert is
// done under a small mutex so the cap check and size update are atomic.
type ReplayCache struct {
	entries  sync.Map   // nonce → expiry (time.Time)
	mu       sync.Mutex // guards size
	size     int        // approximate count (sync.Map has no Len)
	cap      int        // max entries before oldest-first eviction kicks in
	ttl      time.Duration
	stop     chan struct{}
	stopOnce sync.Once
}

// NewReplayCache returns a ReplayCache with the given TTL and cap. It
// starts a janitor goroutine that wakes every TTL/4 to evict expired
// entries; the caller MUST call Close when done to stop the goroutine.
//
// TTL <= 0 disables the janitor (useful for tests that drive expiry
// manually via Sweep).
// capHint <= 0 defaults to 100,000 (the production default documented
// above).
func NewReplayCache(ttl time.Duration, capHint int) *ReplayCache {
	if capHint <= 0 {
		capHint = 100_000
	}
	c := &ReplayCache{
		cap:  capHint,
		ttl:  ttl,
		stop: make(chan struct{}),
	}
	if ttl > 0 {
		go c.janitor()
	}
	return c
}

// CheckAndInsert returns true when the nonce has NOT been seen before
// (i.e. the challenge is not a replay) and records the nonce as seen
// with expiry = now + c.ttl. It returns false when the nonce was already
// seen and is still within its TTL window; the caller should treat that
// as a replay attack and reject the challenge. If several goroutines
// present the same unseen nonce concurrently, exactly one gets true.
//
// At-cap behavior: when the cache is full, CheckAndInsert evicts the
// oldest entry (a single Range pass to find the minimum expiry) before
// inserting. This is O(N) at the boundary; in practice the janitor
// keeps the cache below cap, so the eviction path rarely fires.
func (c *ReplayCache) CheckAndInsert(nonce string, now time.Time) bool {
	if nonce == "" {
		// An empty nonce can't be tracked meaningfully; treat it as
		// fresh. The caller's claim validation should reject empty-nonce
		// challenges separately (they indicate a Connector format bug).
		return true
	}

	if existing, ok := c.entries.Load(nonce); ok {
		if existingExpiry, _ := existing.(time.Time); now.Before(existingExpiry) {
			return false // replay
		}
		// Past TTL: drop exactly the stale value we observed. If another
		// goroutine has already replaced it with a fresh entry, this is a
		// no-op and LoadOrStore below reports the replay.
		c.compareAndDelete(nonce, existing)
	}

	// At-cap oldest-first eviction. Reserve the slot under the lock so
	// the cap check and size update stay atomic.
	c.mu.Lock()
	if c.size >= c.cap {
		c.evictOldestLocked()
	}
	c.size++
	c.mu.Unlock()

	// LoadOrStore rather than Store: if a concurrent caller recorded the
	// same nonce after our Load above, only one insert wins and the other
	// caller is told it is a replay.
	if _, loaded := c.entries.LoadOrStore(nonce, now.Add(c.ttl)); loaded {
		c.decSize() // release the slot reserved above
		return false
	}
	return true
}

// Close stops the janitor goroutine. Safe to call multiple times.
func (c *ReplayCache) Close() {
	c.stopOnce.Do(func() {
		close(c.stop)
	})
}

// Sweep walks the entries and evicts any past TTL. It is public so tests
// can drive expiry without waiting for the janitor's tick. It returns
// the number of entries evicted.
func (c *ReplayCache) Sweep(now time.Time) int {
	evicted := 0
	c.entries.Range(func(k, v any) bool {
		expiry, _ := v.(time.Time)
		if !now.Before(expiry) && c.compareAndDelete(k.(string), v) {
			evicted++
		}
		return true
	})
	return evicted
}

// compareAndDelete removes nonce only if it still maps to old, the value
// the caller observed, and keeps the size counter in step. Unlike
// LoadAndDelete, it can never remove a fresh entry that a concurrent
// CheckAndInsert stored after the caller's read. The size counter is
// approximate (sync.Map.Range races with inserts), but the approximation
// only affects cap enforcement timing and never causes a false replay
// rejection.
func (c *ReplayCache) compareAndDelete(nonce string, old any) bool {
	if !c.entries.CompareAndDelete(nonce, old) {
		return false
	}
	c.decSize()
	return true
}

// decSize decrements the size counter, never going below zero.
func (c *ReplayCache) decSize() {
	c.mu.Lock()
	if c.size > 0 {
		c.size--
	}
	c.mu.Unlock()
}

// evictOldestLocked must be called with c.mu held. It walks the entries
// to find the one with the minimum expiry (the oldest insertion, closest
// to its TTL deadline) and removes it. O(N), but rarely hit because the
// janitor keeps the cache below cap.
func (c *ReplayCache) evictOldestLocked() {
	var (
		oldestKey    string
		oldestVal    any
		oldestExpiry time.Time
		found        bool
	)
	c.entries.Range(func(k, v any) bool {
		expiry, _ := v.(time.Time)
		if !found || expiry.Before(oldestExpiry) {
			oldestKey, oldestVal, oldestExpiry, found = k.(string), v, expiry, true
		}
		return true
	})
	// CompareAndDelete (not LoadAndDelete) so a concurrently refreshed
	// entry for the same nonce is left alone. c.size is adjusted directly
	// because c.mu is already held.
	if found && c.entries.CompareAndDelete(oldestKey, oldestVal) && c.size > 0 {
		c.size--
	}
}

// janitor wakes every ttl/4 and sweeps expired entries. It runs in the
// background only; tests can drive expiry deterministically via Sweep.
func (c *ReplayCache) janitor() {
	interval := c.ttl / 4
	if interval <= 0 {
		interval = time.Minute
	}
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-c.stop:
			return
		case <-t.C:
			c.Sweep(time.Now())
		}
	}
}

// Len returns the approximate cache size for observability. It is not
// load-stable; use it only for metrics and debug logs.
func (c *ReplayCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.size
}