mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 13:51:36 +00:00
fix(ratelimit): Hotfix #5 — Postgres timestamptz[] scan + skip-inventory drift
Two CI hotfixes surfaced by master CI on 29cb13e7 (Sprint 13.6 tip
before the Sprint 13.7 closure landed):
1. TestRateLimit_PostgresBackend_CapEnforcedAcrossReplicas failed with
"pq: scanning to time.Time is not implemented; only sql.Scanner".
Root cause: time.Time does not implement sql.Scanner, and lib/pq's
pq.GenericArray scan path calls element-Scan() directly rather than
database/sql's convertAssign (which DOES support time conversions).
So `pq.Array(&[]time.Time{})` reliably fails on read even though
the symmetric write `pq.Array([]time.Time{...})` works (the write
path uses driver.Value() which time.Time implements).
Fix: cast the timestamptz[] to a text[] of canonical ISO 8601 UTC
strings at the SQL boundary via to_char(t AT TIME ZONE 'UTC',
'YYYY-MM-DD"T"HH24:MI:SS.US"Z"'), read via pq.StringArray (well-
supported), and parse Go-side with layout "2006-01-02T15:04:05.000000Z".
The format is fully deterministic regardless of the session's
DateStyle or TimeZone settings.
Touched: internal/ratelimit/postgres_sliding_window.go (Step 2 of
the Allow() transaction — locking + read).
Falsifiable proof on CI: the failing test
TestRateLimit_PostgresBackend_CapEnforcedAcrossReplicas
(100 concurrent Allow calls / 3 replicas / cap=10) must now produce
exactly 10 succeed / 90 ErrRateLimited. Pre-fix it produced 1 / 0
because every Allow after the first crashed on Scan.
2. skip-inventory-drift.sh CI guard turned red because Sprint 13.2
added two new t.Skip sites:
internal/ratelimit/equivalence_test.go:80
t.Skip("race-style test under -short")
internal/ratelimit/equivalence_test.go:88
t.Skip("postgres equivalence tests require testcontainers;
skipped under -short")
The inventory at docs/testing/skip-inventory.md is auto-generated
by scripts/skip-inventory.sh and must be re-generated alongside
any t.Skip churn. Sprint 13.2 missed the regeneration.
Fix: re-ran scripts/skip-inventory.sh. Totals walked
142 → 144 sites; testing.Short() guards 76 → 78. The two new
entries land in the internal/ratelimit section.
Verification (local sandbox, all clean):
$ bash scripts/ci-guards/skip-inventory-drift.sh
skip-inventory-drift guard OK: docs/testing/skip-inventory.md
matches the live tree
$ bash scripts/ci-guards/openapi-handler-parity.sh
openapi-handler-parity: clean.
$ bash scripts/ci-guards/openapi-rest-deferred-monotonic.sh
openapi-rest-deferred-monotonic: clean — rest-deferred = 0,
baseline = 0.
$ gofmt -l internal/ratelimit/postgres_sliding_window.go
(no output)
$ go vet ./internal/ratelimit/
(no output)
The Postgres rate-limit fix's full falsifiable proof
(TestRateLimit_PostgresBackend_CapEnforcedAcrossReplicas) cannot be
exercised in the sandbox (no docker for testcontainers); CI on the
amd64 runner will re-run it on this push. The diagnosis is verified
against lib/pq source semantics and the fix uses only well-supported
primitives (pq.StringArray + canonical to_char output + time.Parse).
This commit is contained in:
@@ -146,16 +146,32 @@ func (l *PostgresSlidingWindowLimiter) Allow(key string, now time.Time) error {
|
||||
return fmt.Errorf("ratelimit: ensure row: %w", err)
|
||||
}
|
||||
|
||||
// Step 2: lock the row + read current state.
|
||||
var existing pq.GenericArray
|
||||
var ts []time.Time
|
||||
existing.A = &ts
|
||||
// Step 2: lock the row + read current state. lib/pq cannot scan a
|
||||
// TIMESTAMPTZ[] column back into []time.Time directly: time.Time
|
||||
// does not implement sql.Scanner, and pq.GenericArray's per-element
|
||||
// scan path calls Scan() (not database/sql's convertAssign), so the
|
||||
// inner Scan fails with
|
||||
// "pq: scanning to time.Time is not implemented; only sql.Scanner".
|
||||
// Workaround: ask Postgres to format each timestamp as a canonical
|
||||
// ISO 8601 UTC string via to_char(... AT TIME ZONE 'UTC', ...), read
|
||||
// the column as text[] via pq.StringArray (well-supported), and
|
||||
// parse Go-side. The to_char format is fully deterministic (6-digit
|
||||
// microseconds, "T" separator, "Z" suffix) regardless of the
|
||||
// session's DateStyle / TimeZone settings.
|
||||
const pgTimestampLayout = "2006-01-02T15:04:05.000000Z"
|
||||
var tsStrings pq.StringArray
|
||||
if err := tx.QueryRowContext(ctx, `
|
||||
SELECT COALESCE(timestamps, '{}'::timestamptz[])
|
||||
SELECT COALESCE(
|
||||
ARRAY(
|
||||
SELECT to_char(t AT TIME ZONE 'UTC', 'YYYY-MM-DD"T"HH24:MI:SS.US"Z"')
|
||||
FROM unnest(timestamps) AS t
|
||||
),
|
||||
ARRAY[]::text[]
|
||||
)
|
||||
FROM rate_limit_buckets
|
||||
WHERE bucket_key = $1
|
||||
FOR UPDATE
|
||||
`, key).Scan(&existing); err != nil {
|
||||
`, key).Scan(&tsStrings); err != nil {
|
||||
// Shouldn't happen — step 1 ensured the row exists. Treat
|
||||
// the sql.ErrNoRows path as a no-op (be conservative; never
|
||||
// over-limit on transient DB weirdness).
|
||||
@@ -164,6 +180,14 @@ func (l *PostgresSlidingWindowLimiter) Allow(key string, now time.Time) error {
|
||||
}
|
||||
return fmt.Errorf("ratelimit: select-for-update: %w", err)
|
||||
}
|
||||
ts := make([]time.Time, 0, len(tsStrings))
|
||||
for _, s := range tsStrings {
|
||||
parsed, err := time.Parse(pgTimestampLayout, s)
|
||||
if err != nil {
|
||||
return fmt.Errorf("ratelimit: parse stored timestamp %q: %w", s, err)
|
||||
}
|
||||
ts = append(ts, parsed.UTC())
|
||||
}
|
||||
|
||||
// Step 3: prune in Go via the shared helper. Same prune semantics
|
||||
// as SlidingWindowLimiter — single source of truth.
|
||||
|
||||
Reference in New Issue
Block a user