certctl/internal/ratelimit/postgres_sliding_window.go
shankar0123 c8347d742d feat(ratelimit): Phase 13 Sprint 13.2 — postgres-backed sliding window + multi-replica test
Phase 13 Sprint 13.2 closure (architecture diligence audit ARCH-M1):
ships the infrastructure half of the ARCH-M1 substantive close. Adds a
postgres-backed sliding-window rate limiter that satisfies the same
interface as the in-memory primitive — cross-replica-consistent rather
than per-process. Sprint 13.3 wires the 5 call sites through a
backend selector (`CERTCTL_RATELIMIT_BACKEND={memory,postgres}`); this
commit deliberately changes ZERO call sites. The infrastructure +
migration ship as their own review window, mirroring the Phase 9
Sprint 8a/8b pattern.

Substantive close, not document-and-defer
=========================================
The audit recommended "document the per-process limit + defer the
distributed backend to v3." The operator chose Option M1-A (postgres-
backed; zero new infra) over the document-and-defer path. Postgres
is already a hard dependency for certctl; no new operator burden. The
multi-replica integration test in this commit is the falsifiable
closure proof: the cap is enforced exactly across N replicas hitting
the same key concurrently.

Signature ground-truth
======================
The Sprint 13.2 prompt template specified `Allow(key string) error` as
the signature to match. The actual repo signature has been
`Allow(key string, now time.Time) error` since the EST RFC 7030
hardening master bundle Phase 4.1 — the `now` parameter is what makes
the memory limiter testable against synthetic time without an
indirection through clock-injection. The new `Limiter` interface +
`PostgresSlidingWindowLimiter` match the actual repo signature
(`Allow(key string, now time.Time) error`) byte-for-byte. Per CLAUDE.md
"the repo is truth" — the prompt is framing, the code is ground-truth.

Files added
===========

migrations/000046_rate_limit_buckets.up.sql + .down.sql:
  - rate_limit_buckets(bucket_key TEXT PRIMARY KEY, timestamps
    TIMESTAMPTZ[] NOT NULL DEFAULT '{}', updated_at TIMESTAMPTZ NOT
    NULL DEFAULT NOW()).
  - btree index on updated_at supports the Sprint 13.3 janitor sweep.
  - All statements IF NOT EXISTS / DROP IF EXISTS per CLAUDE.md
    "Idempotent migrations" rule.

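A minimal sketch of what idempotent DDL for the table described above
could look like, held in Go string constants so the guard check is
runnable. Column definitions follow the list above; the index name
`idx_rate_limit_buckets_updated_at` and the `isIdempotent` helper are
hypothetical and are not taken from the shipped migration files.

```go
package main

import (
	"fmt"
	"strings"
)

// upSQL sketches the "up" migration. The index name is a hypothetical
// placeholder; the columns follow the commit message.
const upSQL = `
CREATE TABLE IF NOT EXISTS rate_limit_buckets (
    bucket_key TEXT PRIMARY KEY,
    timestamps TIMESTAMPTZ[] NOT NULL DEFAULT '{}',
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_rate_limit_buckets_updated_at
    ON rate_limit_buckets (updated_at);
`

// downSQL sketches the matching "down" migration.
const downSQL = `
DROP INDEX IF EXISTS idx_rate_limit_buckets_updated_at;
DROP TABLE IF EXISTS rate_limit_buckets;
`

// isIdempotent reports whether every non-empty statement carries an
// IF NOT EXISTS or IF EXISTS guard.
func isIdempotent(sql string) bool {
	for _, stmt := range strings.Split(sql, ";") {
		s := strings.ToUpper(strings.TrimSpace(stmt))
		if s == "" {
			continue
		}
		if !strings.Contains(s, "IF NOT EXISTS") && !strings.Contains(s, "IF EXISTS") {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isIdempotent(upSQL), isIdempotent(downSQL))
}
```

Re-running either script against a database that is already in the
target state is then a no-op, which is what the "Idempotent
migrations" rule asks for.
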
internal/ratelimit/limiter.go (NEW, 53 LOC):
  - Defines the `Limiter` interface with `Allow(key string,
    now time.Time) error`.
  - Compile-time satisfaction checks for both backends.
  - Doc-comment documents the prompt-vs-repo signature reconciliation
    + the Sprint 13.3 backend-selector plan + why the interface stays
    minimal (Disabled/Len are non-portable cross-backend; keeping them
    off the interface avoids leaking implementation detail).

internal/ratelimit/postgres_sliding_window.go (NEW, 178 LOC):
  - PostgresSlidingWindowLimiter struct + NewPostgresSlidingWindowLimiter
    constructor + Allow + Disabled methods.
  - Algorithm: BEGIN tx → INSERT ON CONFLICT DO NOTHING (ensures the
    row exists) → SELECT ... FOR UPDATE (per-key row lock acquired
    across the cluster) → prune in Go via the shared pruneOlderThan
    helper (single source of truth for prune semantics) → decide
    rate-limited or append → UPDATE → COMMIT.
  - SELECT FOR UPDATE is what arbitrates across replicas. Replicas A
    and B firing simultaneous Allow("k") never race because Postgres
    serializes the row-lock; the memory backend's sync.Mutex only
    arbitrates within a process.
  - Same `maxN <= 0 → disabled` opt-out semantics as the memory
    backend.
  - Empty-key short-circuit (chokepoint avoidance) matches the memory
    backend.
  - Uses pq.Array for TIMESTAMPTZ[] marshalling (lib/pq is the
    existing project driver).

internal/ratelimit/equivalence_test.go (NEW, 304 LOC):
  - Backend-equivalence suite that runs the same scenario set against
    both backends via the `Limiter` interface. 7 scenarios per
    backend: AllowsUpToCap, DistinctKeysIndependent, WindowExpiry,
    DisabledBypass, NegativeCapDisabled, EmptyKeyShortCircuits,
    ConcurrentRaceFree.
  - Memory half: TestSlidingWindowLimiter_Equivalence_Memory — runs
    on every `go test ./...`.
  - Postgres half: TestSlidingWindowLimiter_Equivalence_Postgres,
    gated by `testing.Short()`; runs only when -short is omitted, so
    `go test -race -short ./...` stays fast.
  - Schema-per-test isolation via testcontainers-go (mirrors the
    pattern in internal/repository/postgres/testutil_test.go: setup
    one container, fresh schema per subtest, search_path-pinned DSN).
  - Memory equivalence half re-verifies the same behaviors pinned in
    the pre-existing sliding_window_test.go but through the interface
    — catches drift if SlidingWindowLimiter.Allow ever changes shape.

internal/integration/ratelimit_multi_replica_test.go (NEW, 159 LOC):
  - The falsifiable ARCH-M1 closure proof, gated by //go:build
    integration matching the rest of internal/integration/.
  - Scenario: 1 postgres container shared across N=3 independent
    *PostgresSlidingWindowLimiter instances (each replica's process
    has its own *sql.DB pool to the same database, just like a real
    HA deployment). 100 concurrent Allow("test-key") calls round-
    robin across the 3 limiters via sync.WaitGroup. Cap = 10,
    window = 1m, shared now-timestamp so the scenario is
    deterministic.
  - Assert: exactly 10 succeed + 90 return ErrRateLimited. If the
    cross-replica row lock weren't arbitrating, concurrent
    read-decide-write cycles could overwrite each other's appends
    (lost updates) and more than 10 calls would succeed. The
    hard-pass on exactly-10 is what makes ARCH-M1 substantive.
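
The shape of the multi-replica scenario can be approximated without
docker. In this sketch a mutex on a shared in-memory store stands in
for the Postgres row lock; `sharedStore`, `replica`, and `runReplicas`
are hypothetical names, and nothing here touches a database.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

var ErrRateLimited = errors.New("rate limited")

// sharedStore is a hypothetical in-memory stand-in for the postgres
// table. Its mutex plays the role of the SELECT ... FOR UPDATE row lock.
type sharedStore struct {
	mu      sync.Mutex
	buckets map[string][]time.Time
}

// replica models one process: its own limiter value, the same backing store.
type replica struct {
	store  *sharedStore
	maxN   int
	window time.Duration
}

func (r *replica) Allow(key string, now time.Time) error {
	r.store.mu.Lock() // "row lock": read-decide-write is serialized
	defer r.store.mu.Unlock()
	cutoff := now.Add(-r.window)
	var kept []time.Time
	for _, t := range r.store.buckets[key] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= r.maxN {
		r.store.buckets[key] = kept
		return ErrRateLimited
	}
	r.store.buckets[key] = append(kept, now)
	return nil
}

// runReplicas fires `calls` concurrent Allow calls round-robin across n
// replicas and returns how many succeeded.
func runReplicas(n, calls, maxN int) int64 {
	store := &sharedStore{buckets: map[string][]time.Time{}}
	replicas := make([]*replica, n)
	for i := range replicas {
		replicas[i] = &replica{store: store, maxN: maxN, window: time.Minute}
	}
	now := time.Date(2026, 5, 14, 0, 0, 0, 0, time.UTC) // shared timestamp
	var ok int64
	var wg sync.WaitGroup
	for i := 0; i < calls; i++ {
		wg.Add(1)
		go func(r *replica) {
			defer wg.Done()
			if r.Allow("test-key", now) == nil {
				atomic.AddInt64(&ok, 1)
			}
		}(replicas[i%n])
	}
	wg.Wait()
	return ok
}

func main() {
	fmt.Println("allowed:", runReplicas(3, 100, 10))
}
```

Because every call shares one timestamp and the read-decide-write
section is serialized, the count of successes equals the cap regardless
of goroutine scheduling.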

What did NOT change
===================
- internal/ratelimit/sliding_window.go (the memory backend) is
  byte-identical to its pre-Sprint-13.2 state. Same Mutex, same
  Allow signature, same Len/Disabled/pruneOlderThan/evictOldestLocked.
  Compile-time check in limiter.go pins that the memory backend
  still satisfies the new interface.
- No call site in cmd/server, internal/api/handler, internal/service
  changed. Sprint 13.3 owns the 5-site migration + the
  CERTCTL_RATELIMIT_BACKEND env-var selector.
- No new operator dependency. Postgres is already required for
  certctl-server to boot. Redis (Option M1-B) was declined by the
  operator and is not introduced here.

Verification
============

  $ ls migrations/000046_rate_limit_buckets.up.sql migrations/000046_rate_limit_buckets.down.sql
  $ ls internal/ratelimit/limiter.go internal/ratelimit/postgres_sliding_window.go

  $ grep -nE 'sync\.Mutex|sync\.RWMutex' internal/ratelimit/sliding_window.go
    30:// by sync.Mutex; per-key slices mutated only while the mutex is
    56:	mu       sync.Mutex
    (memory backend untouched)

  $ gofmt -l internal/ratelimit/ internal/integration/  → clean
  $ go vet ./internal/ratelimit/...                      → clean
  $ go vet -tags=integration ./internal/integration/...  → clean
  $ staticcheck ./internal/ratelimit/...                 → clean
  $ go build ./...                                       → clean
  $ go build -tags=integration ./internal/integration/...→ clean

  $ go test -race -short -count=1 ./internal/ratelimit/...
    ok  github.com/certctl-io/certctl/internal/ratelimit  1.028s
    (memory equivalence + sliding_window_test.go both pass; postgres
    equivalence skipped under -short as designed)

  $ go doc ./internal/ratelimit/
    type Limiter interface{ ... }
    type PostgresSlidingWindowLimiter struct{ ... }
        func NewPostgresSlidingWindowLimiter(db *sql.DB, maxN int,
            window time.Duration) *PostgresSlidingWindowLimiter
    type SlidingWindowLimiter struct{ ... }
        func NewSlidingWindowLimiter(maxN int, window time.Duration,
            mapCap int) *SlidingWindowLimiter
    var ErrRateLimited = ...
    (public surface matches the Sprint 13.2 prompt's required diff)

Sandbox note: the multi-replica integration test + the postgres
equivalence half run under testcontainers-go which requires docker-
in-docker. The CI integration job exercises both; local CI-equivalent
verification was build + vet + staticcheck + memory equivalence (the
sandbox /sessions partition is full so spinning a postgres container
locally isn't viable in this session). The Sprint 13.3 commit will
re-verify against the live integration job.

Next: Sprint 13.3 wires every call site through
ratelimit.NewLimiter(cfg.Server.RateLimitBackend, db, ...) +
introduces the scheduler janitor loop + rewrites the
docs/operator/observability.md "per-process" paragraph to describe
the configurable backend.

Refs: ARCH-M1 (HA / scale — rate limits per-process), Phase 13
Sprint 13.2.
2026-05-14 11:30:44 +00:00


// Copyright 2026 certctl LLC. All rights reserved.
// SPDX-License-Identifier: BUSL-1.1

package ratelimit

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"time"

	"github.com/lib/pq"
)

// Phase 13 Sprint 13.2 closure (2026-05-14, architecture diligence audit
// ARCH-M1): the cross-replica-consistent rate-limit backend. Same
// algorithm as SlidingWindowLimiter (prune-on-Allow sliding-window log)
// but the state lives in postgres so N replicas see the same per-key
// bucket. Replaces the per-process in-memory limit when the operator
// sets CERTCTL_RATELIMIT_BACKEND=postgres (wired in Sprint 13.3).
//
// Algorithm
// =========
// Each Allow call runs a single BEGIN/COMMIT transaction:
//
//  1. INSERT ... ON CONFLICT (bucket_key) DO NOTHING: ensure the
//     row exists so the SELECT FOR UPDATE below has something to lock.
//  2. SELECT timestamps FROM rate_limit_buckets WHERE bucket_key=$1
//     FOR UPDATE: acquire the per-key row lock for the rest of the
//     transaction.
//  3. Prune timestamps older than (now - window) in Go, reusing the
//     unexported pruneOlderThan helper shared with SlidingWindowLimiter
//     (single source of truth for the prune semantics).
//  4. If cardinality(pruned) >= maxN: persist the pruned state without
//     appending, COMMIT, return ErrRateLimited.
//  5. Else: append `now`, persist, COMMIT, return nil.
//
// SELECT FOR UPDATE serializes Allow calls for the same key across
// replicas: replicas A and B firing simultaneous Allow("k") never
// race because Postgres' row-lock arbitrates. This is the entire
// reason for the close: the memory backend's sync.Mutex only
// arbitrates within a process; pg's row lock arbitrates the cluster.
//
// Why a transaction (not a single CTE)
// ====================================
// A "compute everything in one SQL statement" approach using
// INSERT ... ON CONFLICT DO UPDATE SET timestamps = CASE WHEN ... is
// possible but the conditional logic to gate the append on the
// pruned-cardinality requires nested CTEs whose check-then-act
// semantics are hard to read + harder to convince yourself are
// race-free across all isolation levels. The explicit transaction
// version above is correct under READ COMMITTED (Postgres' default),
// matches the memory backend's read-decide-write shape line-for-line,
// and shares the same prune helper. The extra round-trips per Allow
// (BEGIN, INSERT, SELECT, UPDATE, COMMIT vs one statement) are
// acceptable for the rate-limit hot path; the operation is gated
// anyway.
//
// Sprint 13.3 will wire the scheduler janitor loop that GCs rows
// whose updated_at is older than the longest configured window; the
// migration ships the supporting btree index on updated_at.

// PostgresSlidingWindowLimiter implements Limiter against the
// rate_limit_buckets table introduced in migration 000046.
//
// Constructed via NewPostgresSlidingWindowLimiter. The zero value is
// NOT usable: the db handle is required.
//
// Concurrency: safe for concurrent Allow calls across goroutines AND
// across N replicas (the underlying SELECT FOR UPDATE serializes
// per-key access across the cluster).
type PostgresSlidingWindowLimiter struct {
	db       *sql.DB
	maxN     int
	window   time.Duration
	disabled bool // maxN <= 0 → all Allow calls return nil
}

// NewPostgresSlidingWindowLimiter returns a limiter with the given
// per-key cap + window. maxN <= 0 disables the limiter (all Allow
// calls return nil); matches the memory backend's opt-out semantics
// for test harnesses + sketchpad deploys.
//
// Window defaults to 24h when zero or negative, mirroring
// SlidingWindowLimiter.
//
// The db argument is required + must outlive the limiter. Construction
// itself does NOT touch the database: DDL is owned by migration
// 000046_rate_limit_buckets.up.sql which runs at boot via
// cmd/server's RunMigrations path.
func NewPostgresSlidingWindowLimiter(db *sql.DB, maxN int, window time.Duration) *PostgresSlidingWindowLimiter {
	if window <= 0 {
		window = 24 * time.Hour
	}
	disabled := maxN <= 0
	return &PostgresSlidingWindowLimiter{
		db:       db,
		maxN:     maxN,
		window:   window,
		disabled: disabled,
	}
}

// Allow records a request at the given (key, now) and returns
// ErrRateLimited if the configured cap is exceeded inside the
// configured window. Matches SlidingWindowLimiter.Allow byte-for-byte
// in caller-visible semantics so Sprint 13.3's backend-selector swap
// is signature-clean.
//
// The `now` argument is the timestamp the call is "happening at".
// Used as the prune cutoff (entries older than now-window are dropped)
// and as the new appended entry. Tests pass synthetic `now` values
// to exercise window-expiry deterministically; production call sites
// pass time.Now() (matching how SlidingWindowLimiter is invoked
// today; see internal/api/handler/{est,export,certificates,
// auth_breakglass}.go).
//
// Empty `key` short-circuits to nil (matches the memory backend's
// chokepoint-avoidance contract).
func (l *PostgresSlidingWindowLimiter) Allow(key string, now time.Time) error {
	if l.disabled {
		return nil
	}
	if key == "" {
		return nil
	}

	ctx := context.Background()
	tx, err := l.db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelReadCommitted})
	if err != nil {
		return fmt.Errorf("ratelimit: begin tx: %w", err)
	}
	defer func() {
		// Rollback is a no-op once the tx is committed; safe to defer
		// unconditionally for the error paths.
		_ = tx.Rollback()
	}()

	// Step 1: ensure the row exists so SELECT FOR UPDATE has something
	// to lock. ON CONFLICT DO NOTHING is a no-op when the row already
	// exists.
	if _, err := tx.ExecContext(ctx, `
		INSERT INTO rate_limit_buckets (bucket_key, timestamps, updated_at)
		VALUES ($1, '{}', $2)
		ON CONFLICT (bucket_key) DO NOTHING
	`, key, now); err != nil {
		return fmt.Errorf("ratelimit: ensure row: %w", err)
	}
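
	// Step 2: lock the row + read current state. The array is read as
	// text elements and parsed with pq.ParseTimestamp because
	// pq.GenericArray can only scan into element types that implement
	// sql.Scanner, which time.Time does not.
	var raw pq.StringArray
	if err := tx.QueryRowContext(ctx, `
		SELECT COALESCE(timestamps, '{}'::timestamptz[])
		FROM rate_limit_buckets
		WHERE bucket_key = $1
		FOR UPDATE
	`, key).Scan(&raw); err != nil {
		// Shouldn't happen: step 1 ensured the row exists. Treat
		// the sql.ErrNoRows path as a no-op (be conservative; never
		// over-limit on transient DB weirdness).
		if errors.Is(err, sql.ErrNoRows) {
			return nil
		}
		return fmt.Errorf("ratelimit: select-for-update: %w", err)
	}
	ts := make([]time.Time, 0, len(raw))
	for _, s := range raw {
		t, err := pq.ParseTimestamp(time.UTC, s)
		if err != nil {
			return fmt.Errorf("ratelimit: parse timestamp %q: %w", s, err)
		}
		ts = append(ts, t)
	}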

	// Step 3: prune in Go via the shared helper. Same prune semantics
	// as SlidingWindowLimiter (single source of truth).
	cutoff := now.Add(-l.window)
	pruned := pruneOlderThan(ts, cutoff)

	// Step 4: decide.
	rateLimited := len(pruned) >= l.maxN
	if !rateLimited {
		pruned = append(pruned, now)
	}

	// Step 5: persist.
	if _, err := tx.ExecContext(ctx, `
		UPDATE rate_limit_buckets
		SET timestamps = $2, updated_at = $3
		WHERE bucket_key = $1
	`, key, pq.Array(pruned), now); err != nil {
		return fmt.Errorf("ratelimit: update: %w", err)
	}
	if err := tx.Commit(); err != nil {
		return fmt.Errorf("ratelimit: commit: %w", err)
	}
	if rateLimited {
		return ErrRateLimited
	}
	return nil
}

// Disabled reports whether the limiter is in opt-out mode (maxN <= 0).
// Mirrors SlidingWindowLimiter.Disabled() so handler-side gating +
// admin-endpoint observability can ask the same question of either
// backend.
func (l *PostgresSlidingWindowLimiter) Disabled() bool {
	return l.disabled
}