mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-13 18:08:57 +00:00
EST RFC 7030 hardening master bundle Phases 2-4: end-to-end mTLS sibling
route + RFC 9266 channel binding + HTTP Basic enrollment-password +
per-source-IP failed-auth limit + per-(CN, sourceIP) sliding-window cap.
Two new shared packages so EST + Intune share infrastructure:
- internal/cms/ — RFC 9266 tls-exporter extractor (ExtractTLSExporter
with stdlib-panic recovery for synthetic ConnectionStates) +
CSR-side channel-binding parser via raw TBSCertificationRequestInfo
walk (the stdlib's csr.Attributes can't represent the OCTET STRING
binding value), VerifyChannelBinding composite,
EmbedChannelBindingAttribute fixture helper, and typed sentinel errors
for missing / mismatched / non-TLS-1.3 bindings, which the handler maps
to HTTP 400 / 409 / 426.
- internal/trustanchor/ — extracted from scep/intune/trust_anchor*.go
so the EST mTLS sibling route + Intune dispatcher share the same
SIGHUP-reloadable PEM bundle primitive. intune.TrustAnchorHolder
is now a type alias (`= trustanchor.Holder`) and NewTrustAnchorHolder =
trustanchor.New (a function-value alias), so every existing call site
compiles unchanged. Intune's LoadTrustAnchor is a thin wrapper over
trustanchor.LoadBundle. White-box tests moved to the new package.
- internal/ratelimit/ — extracted from scep/intune/rate_limit.go (this
was Phase 4.1, in the same bundle). intune.PerDeviceRateLimiter
is now a thin wrapper preserving the (subject, issuer)→key
composition; the EST handler uses SlidingWindowLimiter directly.
ESTHandler grew six optional fields wired by per-profile setters
(SetMTLSTrust / SetChannelBindingRequired / SetEnrollmentPassword /
SetSourceIPRateLimiter / SetPerPrincipalRateLimiter / SetLabelForLog)
plus four new mTLS-route methods (CACertsMTLS / SimpleEnrollMTLS /
SimpleReEnrollMTLS / CSRAttrsMTLS). A shared internal pipeline,
handleEnrollOrReEnroll(reEnroll, viaMTLS), keeps the auth,
channel-binding, and rate-limit gates DRY. The new router method
RegisterESTMTLSHandlers registers
/.well-known/est-mtls/<PathID>/{cacerts,simpleenroll,simplereenroll,csrattrs};
AuthExemptDispatchPrefixes now also covers /.well-known/est-mtls, so
those routes skip the API auth chain and rely on the handler's own
gates.
cmd/server/main.go's EST loop wires the per-profile mTLS holder,
channel-binding policy, and per-principal limiter, plus (when
EnrollmentPassword is non-empty) the Basic gate and source-IP limiter.
The new preflightESTMTLSClientCATrustBundle returns *trustanchor.Holder,
so SIGHUP rotates the EST mTLS bundle live without a restart. SCEP and
EST mTLS profiles now share a single union pool, mtlsUnionPoolForTLS,
passed to buildServerTLSConfigWithMTLS (replacing the protocol-specific
scepMTLSUnionPoolForTLS). Per-handler re-verification enforces "cert
must chain to THIS profile's bundle", so cross-protocol bleed is blocked
at the application layer even though the TLS layer accepts certs that
chain to any bundle in the union.
Phase 3.3 source-IP failed-Basic limiter defaults: 10 attempts / 1h /
50k tracked IPs (no env var; tunable in a follow-up). The Phase 4.2
per-principal limiter reads its cap from
CERTCTL_EST_PROFILE_<NAME>_RATE_LIMIT_PER_PRINCIPAL_24H (an existing
field that shipped in Phase 1).
New tests:
- internal/cms/channelbinding_test.go: extractor + CSR-side parser +
composite + TLS-1.3 round-trip end-to-end + EmbedChannelBindingAttribute
round-trip
- internal/trustanchor/holder_test.go: parseBundlePEM white-box +
LoadBundle + Holder Get/Pool/SetLabelForLog/Reload-happy/
Reload-keeps-old-on-failure/Reload-keeps-old-on-expired/
WatchSIGHUP-reloads-pool/WatchSIGHUP-stop-clean
- internal/api/handler/est_hardening_test.go: 16 named cases covering
mTLS no-trust-pool 500 + no-cert 401 + cross-profile cert 401 +
happy-path 200 + CACertsMTLS auth gate + CSRAttrsMTLS auth gate +
channel-binding required-absent-rejected + not-required-absent-
allowed + writeChannelBindingError mapping + Basic no-header 401
+ Basic wrong-password 401 + Basic correct-200 + Basic-no-password
no-gate + per-IP failed-attempt lockout 429 + per-principal
blocks-after-cap + different-principals-independent + no-limiter-
unbounded.
Pre-commit verification (sandbox): gofmt clean; go vet clean
(excluding repository/postgres, which the sandbox can't build because
the testcontainers download runs out of disk space); staticcheck clean
for cms/trustanchor/api/handler/api/router/scep/intune/ratelimit/
cmd/server; go test -short -count=1 green for cms/trustanchor/
api/handler/api/router/scep/intune/ratelimit/service. G-3
docs-drift guard reproduced locally clean (Phase 1 already
documented every new env var; Phases 2-4 added zero new env vars).
@@ -1,193 +1,87 @@
 package intune
 
 import (
-	"errors"
-	"sync"
 	"time"
+
+	"github.com/shankar0123/certctl/internal/ratelimit"
 )
 
 // SCEP RFC 8894 + Intune master bundle Phase 8.6.
 //
-// PerDeviceRateLimiter is the second line of defense behind the replay cache
-// from Phase 7. The replay cache catches the same challenge being submitted
-// twice (within the challenge TTL); this rate limiter catches a compromised
-// Connector signing key (or a stolen key+cert pair) issuing many DIFFERENT
-// valid challenges for the same device subject in a short window.
+// PerDeviceRateLimiter is the second line of defense behind the replay
+// cache from Phase 7. The replay cache catches the same challenge being
+// submitted twice (within the challenge TTL); this rate limiter catches a
+// compromised Connector signing key (or a stolen key+cert pair) issuing
+// many DIFFERENT valid challenges for the same device subject in a short
+// window.
 //
 // Threat model:
 //
 // - Replay cache (Phase 7): nonce-keyed; catches duplicate submission.
 // - This limiter: (Subject, Issuer)-keyed; catches enrollment-flooding.
 //
-// Default: 3 enrollments per (device GUID, Connector identity) per 24h.
+// EST RFC 7030 hardening master bundle Phase 4.1: the implementation that
+// used to live in this file was extracted to internal/ratelimit (where it
+// can be shared with EST per-principal + EST HTTP-Basic source-IP rate
+// limiters). PerDeviceRateLimiter is now a thin wrapper around
+// ratelimit.SlidingWindowLimiter that preserves the original
+// (subject, issuer) → key composition in the Allow signature so existing
+// SCEP/Intune callers don't have to change.
 //
-// Sizing: 100,000 distinct device entries (matches the replay cache cap).
-// At-cap: oldest entry evicted (small janitor pass) to avoid unbounded
-// memory growth on a fleet that grows past the cap.
-//
-// Why a hand-rolled token bucket instead of pulling in golang.org/x/time/rate:
-// the rate package is in go.sum as an indirect transitive but NOT a direct
-// dep. Adding it would create a new direct dep relationship for ~30 LoC of
-// state machine. The hand-rolled version below uses only stdlib (sync.Mutex
-// + time.Time arithmetic) and is small enough to fit on one screen.
-//
-// Algorithm: each (Subject, Issuer) key maps to a bucket holding a window's
-// worth of recent enrollment timestamps. On Allow, the bucket prunes
-// timestamps older than (now - window) and either appends the current
-// timestamp + returns true, or rejects + returns false when the post-prune
-// count is already at the cap. This is the "sliding window log" rate
-// limiter — exact (no token-leak rounding); O(N_per_key) per-call but N is
-// bounded by the cap (3 by default), so effectively O(1).
+// New callers SHOULD use ratelimit.SlidingWindowLimiter directly. The
+// EST RFC 7030 Phase 4.2 EST per-principal cap uses the shared package.
 
-// ErrRateLimited is the typed error returned when the per-device rate limit
-// fires. The handler maps this to a CertRep FAILURE with badRequest failInfo
-// + the `rate_limited` metric label.
-var ErrRateLimited = errors.New("intune: per-device rate limit exceeded for this (subject, issuer) within the configured window")
+// ErrRateLimited is the typed error returned when the per-device rate
+// limit fires. Aliased to ratelimit.ErrRateLimited so errors.Is matches
+// against either name (the SCEP audit closure already pinned the
+// "rate_limited" metric label against this sentinel; the alias preserves
+// sentinel identity across the package boundary).
+var ErrRateLimited = ratelimit.ErrRateLimited
 
-// PerDeviceRateLimiter is a sliding-window-log rate limiter keyed by
-// (Subject, Issuer) tuples derived from a parsed challenge claim.
-//
-// Concurrency: the limiter is safe for concurrent Allow calls. The internal
-// map is guarded by a mutex; the per-key slices are mutated only while the
-// mutex is held.
+// PerDeviceRateLimiter wraps ratelimit.SlidingWindowLimiter with the
+// (subject, issuer)-composed-key Allow signature the Intune dispatcher
+// uses. Concurrency-safe (the underlying limiter holds the mutex).
 type PerDeviceRateLimiter struct {
-	mu       sync.Mutex
-	buckets  map[string][]time.Time // key → sliding window of timestamps
-	maxN     int                    // max enrollments per window
-	window   time.Duration          // window length (default 24h)
-	cap      int                    // max keys before LRU eviction kicks in
-	disabled bool                   // maxN == 0 → all Allow calls return nil
+	inner *ratelimit.SlidingWindowLimiter
 }
 
 // NewPerDeviceRateLimiter returns a limiter with the given per-key cap +
-// window. maxN ≤ 0 disables the limiter (all Allow calls return nil); this
-// is operator opt-out for the rare case where the per-device cap is
+// window. maxN ≤ 0 disables the limiter (all Allow calls return nil);
+// this is operator opt-out for the rare case where the per-device cap is
 // undesirable (e.g. test harnesses, sketchpad deploys).
 //
 // Window defaults to 24h when zero. Map cap defaults to 100,000 when zero
 // (matches the replay cache cap; see internal/scep/intune/replay.go).
 func NewPerDeviceRateLimiter(maxN int, window time.Duration, mapCap int) *PerDeviceRateLimiter {
-	if window <= 0 {
-		window = 24 * time.Hour
-	}
-	if mapCap <= 0 {
-		mapCap = 100_000
-	}
-	return &PerDeviceRateLimiter{
-		buckets:  make(map[string][]time.Time),
-		maxN:     maxN,
-		window:   window,
-		cap:      mapCap,
-		disabled: maxN <= 0,
-	}
+	return &PerDeviceRateLimiter{inner: ratelimit.NewSlidingWindowLimiter(maxN, window, mapCap)}
 }
 
-// Allow checks whether an enrollment for the given (subject, issuer) tuple
-// is permitted right now. Returns nil when allowed (and records the timestamp
-// in the bucket) or ErrRateLimited when the bucket is at maxN.
+// Allow checks whether an enrollment for the given (subject, issuer)
+// tuple is permitted right now. Returns nil when allowed (and records
+// the timestamp in the bucket) or ErrRateLimited when the bucket is at
+// maxN.
 //
 // Empty subject is treated as "skip the limiter" — the caller's claim
-// validation should have rejected an empty-subject claim already; this is
-// belt-and-suspenders to prevent a single empty-subject bucket from
-// becoming a fleet-wide chokepoint. The Connector emits non-empty subject
-// (device GUID) on every legitimate challenge.
+// validation should have rejected an empty-subject claim already; this
+// is belt-and-suspenders to prevent a single empty-subject bucket from
+// becoming a fleet-wide chokepoint.
 func (l *PerDeviceRateLimiter) Allow(subject, issuer string, now time.Time) error {
-	if l.disabled {
-		return nil
-	}
 	if subject == "" {
-		// Caller's claim validation should reject empty-subject upstream;
-		// this short-circuit is defense-in-depth so a misconfigured
-		// Connector can't DoS us via the rate-limit path.
+		// Empty-subject early return preserved from the pre-Phase-4.1
+		// behavior: ratelimit.SlidingWindowLimiter also short-circuits
+		// on empty key, but the explicit check here documents the
+		// (subject, issuer) → empty-key contract and saves one call
+		// frame in the hot path.
 		return nil
 	}
 	key := subject + "|" + issuer
-
-	l.mu.Lock()
-	defer l.mu.Unlock()
-
-	// At-cap eviction: when the map is full, drop the oldest entry by
-	// finding the bucket whose newest timestamp is the smallest. O(N) but
-	// rarely fires; the prune-on-Allow path keeps most buckets short-lived.
-	if len(l.buckets) >= l.cap {
-		l.evictOldestLocked(now)
-	}
-
-	bucket := l.buckets[key]
-	bucket = pruneOlderThan(bucket, now.Add(-l.window))
-
-	if len(bucket) >= l.maxN {
-		// Don't append; over the limit. Persist the pruned bucket so the
-		// next call sees the most-recently-pruned state.
-		l.buckets[key] = bucket
-		return ErrRateLimited
-	}
-
-	bucket = append(bucket, now)
-	l.buckets[key] = bucket
-	return nil
-}
-
-// pruneOlderThan returns the slice with all entries strictly before
-// `cutoff` removed. Preserves order (timestamps are appended in increasing
-// time, so a single linear scan from the front suffices).
-func pruneOlderThan(b []time.Time, cutoff time.Time) []time.Time {
-	i := 0
-	for i < len(b) && b[i].Before(cutoff) {
-		i++
-	}
-	if i == 0 {
-		return b
-	}
-	// Copy-shrink to release the underlying-array memory eventually
-	// (otherwise the slice would hold a reference to the older entries
-	// indefinitely until a re-allocation).
-	out := make([]time.Time, len(b)-i)
-	copy(out, b[i:])
-	return out
-}
-
-// evictOldestLocked drops the map entry whose newest timestamp is the
-// oldest. Called under l.mu. O(N_keys) per eviction; at-cap is rare in
-// practice (caps are sized for fleet steady-state).
-func (l *PerDeviceRateLimiter) evictOldestLocked(now time.Time) {
-	var (
-		oldestKey string
-		oldestTs  time.Time
-		first     = true
-	)
-	for k, b := range l.buckets {
-		if len(b) == 0 {
-			// Empty bucket — drop it immediately, no candidate scan needed.
-			delete(l.buckets, k)
-			return
-		}
-		newest := b[len(b)-1]
-		if first || newest.Before(oldestTs) {
-			oldestKey = k
-			oldestTs = newest
-			first = false
-		}
-	}
-	if oldestKey != "" {
-		delete(l.buckets, oldestKey)
-	}
-	// Suppress unused-parameter warning for `now` in case the eviction
-	// strategy changes (e.g. swap to LRU keyed by time of last Allow).
-	_ = now
+	return l.inner.Allow(key, now)
 }
 
 // Len returns the approximate number of distinct (subject, issuer) keys
-// currently tracked. For observability + tests; not load-stable under
-// concurrent Allow calls.
-func (l *PerDeviceRateLimiter) Len() int {
-	l.mu.Lock()
-	defer l.mu.Unlock()
-	return len(l.buckets)
-}
+// currently tracked. For observability + tests.
+func (l *PerDeviceRateLimiter) Len() int { return l.inner.Len() }
 
 // Disabled reports whether the limiter is in opt-out mode (maxN ≤ 0).
 // Useful for handler-side gating + admin-endpoint observability.
-func (l *PerDeviceRateLimiter) Disabled() bool {
-	return l.disabled
-}
+func (l *PerDeviceRateLimiter) Disabled() bool { return l.inner.Disabled() }