mirror of https://github.com/shankar0123/certctl.git, synced 2026-06-10 00:28:58 +00:00
EST RFC 7030 hardening master bundle Phases 2-4: end-to-end mTLS sibling
route + RFC 9266 channel binding + HTTP Basic enrollment-password +
per-source-IP failed-auth limit + per-(CN, sourceIP) sliding-window cap.
Three new shared packages so EST + Intune share infrastructure:
- internal/cms/ — RFC 9266 tls-exporter extractor (ExtractTLSExporter
with stdlib-panic recovery for synthetic ConnectionStates) +
CSR-side channel-binding parser via raw TBSCertificationRequestInfo
walk (the stdlib's csr.Attributes can't represent the OCTET STRING
binding value), VerifyChannelBinding composite,
EmbedChannelBindingAttribute fixture helper, typed sentinel errors for
missing / mismatch / not-TLS-1.3 mapped to HTTP 400 / 409 / 426 in the
handler.
- internal/trustanchor/ — extracted from scep/intune/trust_anchor*.go
so the EST mTLS sibling route + Intune dispatcher share the same
SIGHUP-reloadable PEM bundle primitive. intune.TrustAnchorHolder
is now `= trustanchor.Holder` (type alias) + NewTrustAnchorHolder =
trustanchor.New (function alias) — every existing call site compiles
unchanged. Intune's LoadTrustAnchor is a thin wrapper over
trustanchor.LoadBundle. White-box tests moved to the new package.
- internal/ratelimit/ — extracted from scep/intune/rate_limit.go (this
was Phase 4.1, in the same bundle). intune.PerDeviceRateLimiter
is now a thin wrapper preserving the (subject, issuer)→key
composition; EST handler reaches for SlidingWindowLimiter directly.
ESTHandler grew six optional fields wired by per-profile setters
(SetMTLSTrust / SetChannelBindingRequired / SetEnrollmentPassword /
SetSourceIPRateLimiter / SetPerPrincipalRateLimiter / SetLabelForLog)
plus four new mTLS-route methods (CACertsMTLS / SimpleEnrollMTLS /
SimpleReEnrollMTLS / CSRAttrsMTLS); shared internal pipeline
handleEnrollOrReEnroll(reEnroll, viaMTLS) keeps the auth/binding/
rate-limit gates DRY. New router method RegisterESTMTLSHandlers
registers /.well-known/est-mtls/<PathID>/{cacerts,simpleenroll,
simplereenroll,csrattrs}; AuthExemptDispatchPrefixes extends the
no-auth chain to /.well-known/est-mtls.
cmd/server/main.go's EST loop wires per-profile mTLS holder +
channel-binding policy + per-principal limiter + (when EnrollmentPassword
non-empty) Basic + source-IP limiter; new preflightESTMTLSClientCATrustBundle
returns *trustanchor.Holder so SIGHUP rotates the EST mTLS
bundle live without restart. SCEP + EST mTLS profiles now share a
single union mtlsUnionPoolForTLS passed to buildServerTLSConfigWithMTLS
(replaces the protocol-specific scepMTLSUnionPoolForTLS); per-handler
re-verify enforces "cert must chain to THIS profile's bundle" so
cross-protocol bleed is blocked at the application layer even though
the TLS layer trusts certs from either pool's union.
Phase 3.3 source-IP failed-Basic limiter defaults: 10 attempts / 1h
/ 50k tracked IPs (no env var; tunable in a follow-up). Phase 4.2
per-principal limiter cap comes from
CERTCTL_EST_PROFILE_<NAME>_RATE_LIMIT_PER_PRINCIPAL_24H (existing field,
shipped in Phase 1).
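For a sense of scale, here is a back-of-the-envelope lower bound on the source-IP limiter's timestamp storage at these defaults. It assumes a 64-bit platform, where a `time.Time` occupies 24 bytes. Rejected Allow calls are not recorded, so a bucket never holds more than maxN = 10 timestamps:

```latex
50\,000~\text{IPs} \times 10~\frac{\text{timestamps}}{\text{IP}} \times 24~\frac{\text{B}}{\text{timestamp}} = 12\,000\,000~\text{B} \approx 12~\text{MB}
```

Map entries, key strings, and spare slice capacity left by append growth come on top of this figure. Treat it as an order of magnitude, not a measured footprint.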
New tests:
- internal/cms/channelbinding_test.go: extractor + CSR-side parser +
composite + TLS-1.3 round-trip end-to-end +
EmbedChannelBindingAttribute round-trip
- internal/trustanchor/holder_test.go: parseBundlePEM white-box +
LoadBundle + Holder Get/Pool/SetLabelForLog/Reload-happy/
Reload-keeps-old-on-failure/Reload-keeps-old-on-expired/
WatchSIGHUP-reloads-pool/WatchSIGHUP-stop-clean
- internal/api/handler/est_hardening_test.go: 16 named cases covering
mTLS no-trust-pool 500 + no-cert 401 + cross-profile cert 401 +
happy-path 200 + CACertsMTLS auth gate + CSRAttrsMTLS auth gate +
channel-binding required-absent-rejected + not-required-absent-
allowed + writeChannelBindingError mapping + Basic no-header 401
+ Basic wrong-password 401 + Basic correct-200 + Basic-no-password
no-gate + per-IP failed-attempt lockout 429 + per-principal
blocks-after-cap + different-principals-independent + no-limiter-
unbounded.
Pre-commit verification (sandbox): gofmt clean, go vet clean
(excluding repository/postgres, which the sandbox can't build: the
testcontainers download runs out of disk space), staticcheck clean for
cms/trustanchor/api/handler/api/router/scep/intune/ratelimit/
cmd/server, go test -short -count=1 green for cms/trustanchor/
api/handler/api/router/scep/intune/ratelimit/service. G-3
docs-drift guard reproduced locally clean (Phase 1 already
documented every new env var; Phases 2-4 added zero new env vars).
@@ -0,0 +1,188 @@
// Package ratelimit provides shared rate-limit primitives used by
// authenticated-but-shared-credential code paths (SCEP/Intune
// per-device challenge enrollment, EST per-principal CSR enrollment,
// EST HTTP-Basic source-IP failed-auth limiter) where the threat
// model is "a single legitimate identity could mint enrollments
// faster than any human/fleet workflow would."
//
// Origin: this package was extracted from
// internal/scep/intune/rate_limit.go in the EST RFC 7030 hardening
// master bundle Phase 4.1. Its callers are the Intune dispatcher
// (per-device-GUID cap on enrollment), the EST per-principal cap
// (Phase 4.2), and the EST HTTP-Basic source-IP failed-auth limiter
// (Phase 3.3). The original Intune-package type, constructor, and
// ErrRateLimited sentinel remain available at
// internal/scep/intune/rate_limit.go (PerDeviceRateLimiter is now a
// thin wrapper over this package) so existing call sites compile
// unchanged. New callers SHOULD use this package directly.
//
// Algorithm: sliding-window log. Each key maps to a bucket holding
// the timestamps of allowed events within the configured window. On
// Allow, the bucket prunes timestamps older than (now - window), then
// either appends now and returns nil, or returns ErrRateLimited
// without appending when the post-prune count is already at maxN.
// Counting is exact (no token-bucket or fixed-window approximation).
// Each call is O(maxN), since a bucket never holds more than maxN
// timestamps.
//
// Concurrency: safe for concurrent Allow calls. The internal map is
// guarded by a sync.Mutex; per-key slices are mutated only while the
// mutex is held.
//
// Memory: bounded by the per-instance map cap (default 100,000 keys;
// configurable). Prune-on-Allow trims the timestamps of the bucket
// being touched but never deletes a key, so keys leave the map only
// through at-cap eviction, which drops the key whose newest timestamp
// is the oldest. Eviction is an O(N_keys) scan under the mutex and,
// once the map is full, runs for every previously unseen key.
package ratelimit

import (
	"errors"
	"sync"
	"time"
)

// ErrRateLimited is returned by SlidingWindowLimiter.Allow when the
// bucket for the given key is already at the cap. Callers can match
// it with errors.Is; the message is stable across the package's
// lifetime so test assertions can match on it.
var ErrRateLimited = errors.New("ratelimit: per-key cap exceeded for the configured window")

// SlidingWindowLimiter is the sliding-window-log rate limiter.
//
// Construct via NewSlidingWindowLimiter. The zero value is NOT
// usable: the buckets map needs initialisation.
type SlidingWindowLimiter struct {
	mu       sync.Mutex
	buckets  map[string][]time.Time // key → sliding window of timestamps
	maxN     int                    // max allowed events per key per window
	window   time.Duration          // window length (default 24h)
	cap      int                    // max tracked keys before at-cap eviction
	disabled bool                   // maxN <= 0 → all Allow calls return nil
}

// NewSlidingWindowLimiter returns a limiter with the given per-key
// cap and window. maxN <= 0 disables the limiter (all Allow calls
// return nil); this is the operator opt-out for the rare case where
// a per-key cap is undesirable (test harnesses, sketchpad deploys).
//
// window defaults to 24h when zero or negative. mapCap defaults to
// 100,000 when zero or negative (matching the SCEP/Intune replay
// cache cap).
func NewSlidingWindowLimiter(maxN int, window time.Duration, mapCap int) *SlidingWindowLimiter {
	if window <= 0 {
		window = 24 * time.Hour
	}
	if mapCap <= 0 {
		mapCap = 100_000
	}
	return &SlidingWindowLimiter{
		buckets:  make(map[string][]time.Time),
		maxN:     maxN,
		window:   window,
		cap:      mapCap,
		disabled: maxN <= 0,
	}
}

// Allow reports whether an event keyed by key is permitted right
// now. It returns nil when allowed (and records now in the bucket)
// or ErrRateLimited when the bucket already holds maxN timestamps
// inside the window. Rejected calls are not recorded.
//
// An empty key skips the limiter. The caller's validation should
// already have rejected an empty-key event; this is
// belt-and-suspenders so a single shared empty-key bucket can't
// become a chokepoint for every such event. SCEP/Intune callers
// compose the key as subject + "|" + issuer; EST callers compose
// cn + "|" + sourceIP, or sourceIP alone for the failed-auth
// limiter.
func (l *SlidingWindowLimiter) Allow(key string, now time.Time) error {
	if l.disabled {
		return nil
	}
	if key == "" {
		return nil
	}

	l.mu.Lock()
	defer l.mu.Unlock()

	// At-cap eviction: when the map is full and key is not already
	// tracked, drop the key whose newest timestamp is the oldest.
	// Skipping eviction for a tracked key means it can never evict
	// itself (which would reset its own window) or push out another
	// key needlessly. O(N_keys).
	if _, tracked := l.buckets[key]; !tracked && len(l.buckets) >= l.cap {
		l.evictOldestLocked()
	}

	bucket := l.buckets[key]
	bucket = pruneOlderThan(bucket, now.Add(-l.window))

	if len(bucket) >= l.maxN {
		// Over the limit: don't append. Persist the pruned bucket so
		// the next call sees the most recently pruned state.
		l.buckets[key] = bucket
		return ErrRateLimited
	}

	bucket = append(bucket, now)
	l.buckets[key] = bucket
	return nil
}

// pruneOlderThan returns b with every entry strictly before cutoff
// removed, preserving order. Timestamps are appended in call order,
// which is increasing time as long as callers pass non-decreasing now
// values, so a single linear scan from the front suffices; an
// out-of-order timestamp merely lingers until the entries ahead of it
// expire.
func pruneOlderThan(b []time.Time, cutoff time.Time) []time.Time {
	i := 0
	for i < len(b) && b[i].Before(cutoff) {
		i++
	}
	if i == 0 {
		return b
	}
	// Copy into a fresh slice so the expired entries' backing array can
	// be garbage-collected (re-slicing b[i:] would keep it reachable
	// until the next reallocation).
	out := make([]time.Time, len(b)-i)
	copy(out, b[i:])
	return out
}

// evictOldestLocked deletes the map entry whose newest timestamp is
// the oldest, i.e. the key with the least recent allowed event.
// Callers must hold l.mu. O(N_keys) per eviction.
func (l *SlidingWindowLimiter) evictOldestLocked() {
	var (
		oldestKey string
		oldestTs  time.Time
		first     = true
	)
	for k, b := range l.buckets {
		if len(b) == 0 {
			// Empty bucket: drop it immediately, no further scan needed.
			delete(l.buckets, k)
			return
		}
		newest := b[len(b)-1]
		if first || newest.Before(oldestTs) {
			oldestKey = k
			oldestTs = newest
			first = false
		}
	}
	if oldestKey != "" {
		delete(l.buckets, oldestKey)
	}
}

// Len returns the number of distinct keys currently tracked. For
// observability and tests; under concurrent Allow calls the value can
// be stale as soon as it is returned.
func (l *SlidingWindowLimiter) Len() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.buckets)
}

// Disabled reports whether the limiter is in opt-out mode (maxN <= 0).
// Useful for handler-side gating and admin-endpoint observability.
func (l *SlidingWindowLimiter) Disabled() bool {
	return l.disabled
}
@@ -0,0 +1,197 @@
package ratelimit

import (
	"errors"
	"fmt"
	"sync"
	"testing"
	"time"
)

// EST RFC 7030 hardening master bundle Phase 4.1: this test file holds the
// white-box tests for the SlidingWindowLimiter primitives that used to live
// in internal/scep/intune/rate_limit_test.go
// (TestPerDeviceRateLimiter_DefaultCapsHonored, TestPruneOlderThan,
// TestPruneOlderThan_NoOpWhenNothingToPrune). The behavioral coverage in
// intune/rate_limit_test.go stays: it exercises the wrapper's
// (subject, issuer) key-composition contract, the empty-subject
// short-circuit, and concurrent race-freedom.

func TestSlidingWindowLimiter_AllowsUpToCap(t *testing.T) {
	l := NewSlidingWindowLimiter(3, 24*time.Hour, 10)
	now := time.Now()
	for i := 0; i < 3; i++ {
		if err := l.Allow("k", now.Add(time.Duration(i)*time.Minute)); err != nil {
			t.Fatalf("call %d should be allowed: %v", i+1, err)
		}
	}
	if err := l.Allow("k", now.Add(4*time.Minute)); !errors.Is(err, ErrRateLimited) {
		t.Fatalf("4th call should be rate-limited; got %v", err)
	}
}

func TestSlidingWindowLimiter_DistinctKeysIndependent(t *testing.T) {
	l := NewSlidingWindowLimiter(1, 24*time.Hour, 10)
	now := time.Now()

	if err := l.Allow("k-1", now); err != nil {
		t.Fatalf("first allow: %v", err)
	}
	if err := l.Allow("k-2", now); err != nil {
		t.Fatalf("different key must have its own bucket: %v", err)
	}
	if err := l.Allow("k-1", now.Add(1*time.Second)); !errors.Is(err, ErrRateLimited) {
		t.Fatalf("repeat key should be limited; got %v", err)
	}
}

func TestSlidingWindowLimiter_WindowExpiry(t *testing.T) {
	l := NewSlidingWindowLimiter(2, 1*time.Hour, 10)
	now := time.Now()

	if err := l.Allow("k", now); err != nil {
		t.Fatal(err)
	}
	if err := l.Allow("k", now.Add(30*time.Minute)); err != nil {
		t.Fatal(err)
	}
	// Inside the window: limited.
	if err := l.Allow("k", now.Add(45*time.Minute)); !errors.Is(err, ErrRateLimited) {
		t.Fatalf("inside-window 3rd call should be limited: %v", err)
	}
	// Past the window: slots reopen.
	if err := l.Allow("k", now.Add(2*time.Hour)); err != nil {
		t.Fatalf("past-window call should be allowed (window reset): %v", err)
	}
}

func TestSlidingWindowLimiter_DisabledBypass(t *testing.T) {
	l := NewSlidingWindowLimiter(0, 24*time.Hour, 10) // maxN=0 → disabled
	if !l.Disabled() {
		t.Fatal("limiter with maxN=0 must report Disabled()=true")
	}
	now := time.Now()
	for i := 0; i < 100; i++ {
		if err := l.Allow("k", now); err != nil {
			t.Fatalf("disabled limiter must allow everything: %v", err)
		}
	}
	if got := l.Len(); got != 0 {
		t.Errorf("disabled limiter Len() = %d, want 0", got)
	}
}

func TestSlidingWindowLimiter_NegativeCapDisabled(t *testing.T) {
	l := NewSlidingWindowLimiter(-1, 24*time.Hour, 10)
	if !l.Disabled() {
		t.Fatal("negative maxN must produce a disabled limiter")
	}
}

func TestSlidingWindowLimiter_EmptyKeyShortCircuits(t *testing.T) {
	// Empty key is the defense-in-depth case: the caller's validation
	// upstream should reject empty-key events first. The limiter must not
	// build a single shared bucket for the empty key; that would be a
	// chokepoint for every empty-key event.
	l := NewSlidingWindowLimiter(1, 24*time.Hour, 10)
	now := time.Now()
	for i := 0; i < 50; i++ {
		if err := l.Allow("", now); err != nil {
			t.Fatalf("empty key must short-circuit (call %d): %v", i, err)
		}
	}
	if got := l.Len(); got != 0 {
		t.Errorf("Len after 50 empty-key calls = %d, want 0 (no bucket created)", got)
	}
}

func TestSlidingWindowLimiter_DefaultCapsHonored(t *testing.T) {
	// White-box test: exercises the constructor's default-fill branches.
	// Lives here (not in the intune wrapper test) because the fields
	// (window + cap) are package-private to ratelimit.
	l := NewSlidingWindowLimiter(5, 0, 0) // window=0 → 24h default; cap=0 → 100k default
	if l.window != 24*time.Hour {
		t.Errorf("default window = %v, want 24h", l.window)
	}
	if l.cap != 100_000 {
		t.Errorf("default cap = %d, want 100000", l.cap)
	}
}

func TestSlidingWindowLimiter_MapCapEvictsOldest(t *testing.T) {
	// Cap of 3 keys to exercise the eviction branch deterministically.
	l := NewSlidingWindowLimiter(2, 1*time.Hour, 3)
	now := time.Now()

	for i := 0; i < 3; i++ {
		key := fmt.Sprintf("k-%d", i)
		if err := l.Allow(key, now.Add(time.Duration(i)*time.Minute)); err != nil {
			t.Fatalf("insert %d: %v", i, err)
		}
	}
	if l.Len() != 3 {
		t.Fatalf("Len = %d, want 3", l.Len())
	}

	// 4th key forces eviction of k-0 (its newest timestamp is the oldest).
	if err := l.Allow("k-3", now.Add(10*time.Minute)); err != nil {
		t.Fatalf("4th-key insert: %v", err)
	}
	if l.Len() != 3 {
		t.Errorf("Len after at-cap insert = %d, want 3 (cap honored)", l.Len())
	}
	if _, ok := l.buckets["k-0"]; ok {
		t.Error("k-0 should have been evicted (oldest newest-timestamp)")
	}
}

func TestSlidingWindowLimiter_ConcurrentRaceFree(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping race-style test under -short")
	}
	l := NewSlidingWindowLimiter(50, 24*time.Hour, 10000)
	var wg sync.WaitGroup
	for g := 0; g < 20; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			now := time.Now()
			key := fmt.Sprintf("k-%d", id)
			for i := 0; i < 30; i++ {
				_ = l.Allow(key, now)
			}
		}(g)
	}
	wg.Wait()
	if got := l.Len(); got != 20 {
		t.Errorf("expected 20 distinct keys; got %d", got)
	}
}

// White-box tests for the unexported pruneOlderThan helper. They live in
// this package because the helper is package-private to ratelimit; the
// test surface lived in intune/rate_limit_test.go before the Phase 4.1
// extraction.
func TestPruneOlderThan(t *testing.T) {
	t0 := time.Now()
	in := []time.Time{
		t0.Add(-3 * time.Hour),    // pruned (older than cutoff)
		t0.Add(-2 * time.Hour),    // pruned (older than cutoff)
		t0.Add(-1 * time.Hour),    // survives (-60m is NEWER than the -90m cutoff)
		t0.Add(-30 * time.Minute), // survives
		t0,                        // survives
	}
	out := pruneOlderThan(in, t0.Add(-90*time.Minute))
	if len(out) != 3 {
		t.Fatalf("len(out) = %d, want 3 (-1h, -30m, t0 all newer than -90m cutoff)", len(out))
	}
	if !out[0].Equal(t0.Add(-1 * time.Hour)) {
		t.Errorf("out[0] = %v, want -1h (oldest surviving entry)", out[0])
	}
}

func TestPruneOlderThan_NoOpWhenNothingToPrune(t *testing.T) {
	t0 := time.Now()
	in := []time.Time{t0.Add(-1 * time.Minute), t0}
	out := pruneOlderThan(in, t0.Add(-1*time.Hour))
	if len(out) != len(in) {
		t.Fatalf("len(out) = %d, want %d", len(out), len(in))
	}
	// Same backing array (no copy needed when nothing is pruned).
	if &out[0] != &in[0] {
		t.Error("expected the input slice to be returned unchanged, not copied")
	}
}