fix(middleware): SEC-006 — TTL-evict idle token-bucket rate-limiter entries

Sprint 2 unified-master-audit closure. Before this fix, the keyed
rate limiter's bucket map had no eviction. The package-level comment
explicitly noted the leak: high-cardinality unauthenticated traffic
(CGNAT churn, Tor exit lists, botnets, infinite-cardinality scanners)
grew process memory without bound. Production deploys seeing millions
of unique IPs would eventually OOM.

Fix:
  - RateLimitConfig.BucketTTL (env CERTCTL_RATE_LIMIT_BUCKET_TTL,
    default 1h; values below 1m are clamped up to 1m in the limiter
    constructor). 1h was chosen to sit well above realistic operator
    IP churn windows (returning clients keep their bucket) while
    still bounding the idle-bucket lifetime, which the pre-fix code
    left unbounded.
  - tokenBucket gains a lastAccess field, updated on every allow()
    call via touch() and read via lastAccessTime(), both under the
    bucket's own mutex.
  - keyedRateLimiter.sweepLoop runs in a single goroutine per
    limiter (production wires 2: default + no-auth fallback), waking
    every BucketTTL/4. Each sweep() removes every bucket whose
    lastAccess is older than the TTL cutoff and atomically adds the
    number removed to evictedTotal.
  - Both NewRateLimiter call sites in cmd/server/main.go (default
    stack and no-auth fallback) now thread cfg.RateLimit.BucketTTL.

Regression coverage:
  - TestKeyedRateLimiter_SweepEvictsIdleBuckets: populates the map
    with 1000 synthetic IP keys, advances past the TTL, and calls
    sweep() directly. Asserts that the map drains to 0, that
    evictedTotal=1000, and that a fresh key still creates a new
    bucket (map not poisoned).
  - TestKeyedRateLimiter_SweepKeepsActiveBuckets: the inverse case. A
    bucket touched within the TTL window survives the sweep. Catches
    a future regression that inverts the cutoff comparison.

Closes SEC-006.
shankar0123
2026-05-16 04:01:18 +00:00
parent 037876fa0f
commit 8f2e5771db
5 changed files with 246 additions and 11 deletions
+10 -5
@@ -446,11 +446,16 @@ func Load() (*Config, error) {
         },
     },
     RateLimit: RateLimitConfig{
-        Enabled: getEnvBool("CERTCTL_RATE_LIMIT_ENABLED", true),
-        RPS: getEnvFloat("CERTCTL_RATE_LIMIT_RPS", 50),
-        BurstSize: getEnvInt("CERTCTL_RATE_LIMIT_BURST", 100),
-        PerUserRPS: getEnvFloat("CERTCTL_RATE_LIMIT_PER_USER_RPS", 0),
-        PerUserBurstSize: getEnvInt("CERTCTL_RATE_LIMIT_PER_USER_BURST", 0),
+        Enabled: getEnvBool("CERTCTL_RATE_LIMIT_ENABLED", true),
+        RPS: getEnvFloat("CERTCTL_RATE_LIMIT_RPS", 50),
+        BurstSize: getEnvInt("CERTCTL_RATE_LIMIT_BURST", 100),
+        PerUserRPS: getEnvFloat("CERTCTL_RATE_LIMIT_PER_USER_RPS", 0),
+        PerUserBurstSize: getEnvInt("CERTCTL_RATE_LIMIT_PER_USER_BURST", 0),
+        // SEC-006 closure (Sprint 2, 2026-05-16): bounded unused-bucket
+        // lifetime. 1h chosen to be well above realistic operator IP
+        // churn (returning clients keep their bucket) and well below
+        // the unbounded-leak window the pre-fix code allowed.
+        BucketTTL: getEnvDuration("CERTCTL_RATE_LIMIT_BUCKET_TTL", 1*time.Hour),
         SlidingWindowBackend: getEnv("CERTCTL_RATE_LIMIT_BACKEND", "memory"),
         SlidingWindowJanitorInterval: getEnvDuration("CERTCTL_RATE_LIMIT_JANITOR_INTERVAL", 5*time.Minute),
     },
+11
@@ -342,6 +342,17 @@ type RateLimitConfig struct {
     // Setting: CERTCTL_RATE_LIMIT_PER_USER_BURST environment variable.
     PerUserBurstSize int
 
+    // BucketTTL bounds the unused-bucket lifetime in the token-bucket
+    // map. Idle buckets older than BucketTTL are reclaimed by a
+    // background sweeper running every (BucketTTL/4). Default 1 hour;
+    // values < 1 minute are clamped up to 1 minute in the limiter
+    // constructor. Set this lower if the server faces high-cardinality
+    // unauthenticated traffic (CGNAT churn, Tor exit lists, scanners)
+    // and the map RSS becomes a concern.
+    // SEC-006 closure (Sprint 2, 2026-05-16).
+    // Setting: CERTCTL_RATE_LIMIT_BUCKET_TTL environment variable.
+    BucketTTL time.Duration
+
     // SlidingWindowBackend selects which backend implements the
     // per-key sliding-window-log limiters wired in cmd/server/main.go
     // (break-glass login, OCSP per-IP, cert-export per-actor, EST