certctl/internal/auth/session/bench_test.go
shankar0123 ba0959ddc7 feat(auth/sessions): list-all gate + revoke-all-except-current (MED-1/2/3)
Audit 2026-05-10 Fix 13 Phase A — close MED-1, MED-2, MED-3.

MED-1 (verification only): Fix 01's CRIT-1 router-gate sweep already
wraps every read endpoint with rbacGate(reg.Checker, '<resource>.read',
...). Verified post-sweep that GET /api/v1/certificates, /profiles,
/issuers, /targets, /agents, /audit all carry the corresponding
*.read permission gate.

MED-2: ListSessions now gates ?actor_id=<other> on auth.session.list.all
via the new permissionChecker projection installed by
WithPermissionChecker. cmd/server/main.go threads the existing
authCheckerAdapter into the handler. When the requested actor_id !=
caller.ActorID AND the handler has a checker, an inline
CheckPermission(..., 'auth.session.list.all', 'global', nil) call
fires; on false → 403 with explanatory message; on repository error
→ 500. Defense-in-depth: the router-level rbacGate enforces
auth.session.list as the floor; the .list.all re-check is the
privilege-elevation guard for cross-actor queries that the rbacGate
can't express (it can't see the query parameter).

MED-3: ship DELETE /api/v1/auth/sessions?except=current — the
'sign out all other sessions' flow. Gated by auth.session.revoke;
the handler reads the caller's current session ID from
session.SessionFromContext(ctx) (cookie-mode). The ID is empty for
Bearer-mode callers, in which case ALL of the actor's sessions are
revoked, matching the 'log me out everywhere' semantics for API-key
users.

New repository method SessionRepository.RevokeAllExceptForActor:
  UPDATE sessions SET revoked_at = NOW()
   WHERE actor_id = $1 AND actor_type = $2 AND tenant_id = $3
     AND revoked_at IS NULL
     AND id != $4
returning rowcount. Added to the interface in internal/repository/session.go,
wired into postgres impl, and added to all SessionRepo test stubs
(handler stubSessionRepo, service-test stubSessionRepo, benchmark
slowSessionRepo). The session.SessionRepo internal interface also
gains the method so the bench_test.go forwarder compiles.

Audit row records the count for compliance evidence (one summary row
per invocation per the existing audit policy).

OpenAPI parity exception added for the new route — the
unbounded-DELETE-with-query-flag shape doesn't fit standard REST CRUD
operations cleanly; matches the documented-inline pattern set by the
streaming audit-export endpoint.

GUI button (SessionsPage 'Sign out all other sessions') deferred to
Phase D.

Refs: cowork/auth-bundles-audit-2026-05-10.md MED-1, MED-2, MED-3
Spec: cowork/auth-bundles-fixes-2026-05-10/13-med-bundle.md Phase A
2026-05-10 21:49:35 +00:00

package session

import (
	"context"
	"sort"
	"testing"
	"time"

	sessiondomain "github.com/certctl-io/certctl/internal/auth/session/domain"
)

// =============================================================================
// Bundle 2 Phase 14: session validation benchmarks.
//
// Two paths matter:
//
//   BenchmarkSession_SteadyState (target: p99 < 1ms)
//     Warm process, signing key already loaded into the in-memory key
//     repo, session row already in the in-memory session repo. Measures
//     the cost of: parseCookie + signing-key lookup + HMAC-verify +
//     session-row lookup + idle/absolute/revoke checks. No network
//     round-trips.
//
//   BenchmarkSession_ColdProcess (target: p99 < 10ms)
//     "First request after server boot": the underlying repo paths
//     are slower because a real Postgres connection is doing index +
//     row work the OS has not yet faulted into memory. The benchmark
//     simulates this via a configurable per-call repo delay so the
//     measurement is bounded above the steady-state path by a known
//     amount; the absolute number depends on the operator's Postgres
//     setup. The 10ms target accommodates a single round-trip to a
//     Postgres on the same host (typical: 1-3ms) plus
//     query-plan-not-yet-cached overhead (typical: 1-2ms) plus the Go
//     HMAC verify cost (typical: 10-50µs).
//
// The percentile reporting:
//   We capture a per-iteration timing into a slice, sort, and report
//   p50 / p95 / p99 / max via b.ReportMetric. Go's testing.B does NOT
//   surface percentiles natively; the metric labels are explicit so
//   the recorded result is unambiguous about which statistic was
//   measured.
//
// Run via:
//   go test -bench BenchmarkSession_ -benchmem -run='^$' \
//     ./internal/auth/session/
//
// The full Phase 14 result table lives at docs/operator/auth-benchmarks.md.
// =============================================================================

// benchSessionMinSamples records the sample count we want for a
// meaningful p99: at least ~1000 samples, but not so many that the
// benchmark takes >10s on a CI runner. Nothing in this file enforces
// it; Go's default benchmark scaling already handles this.
const (
	benchSessionMinSamples = 1000
)

// setupBenchSession boots a session.Service with a warm in-memory
// repo + a single active signing key, mints one session row, and
// returns the service + the cookie value the benchmark calls
// Validate against.
//
// The slowSessionRepo and slowKeyRepo wrappers add a configurable
// delay per call; steady-state uses zero delay, cold-process uses a
// non-zero delay simulating a Postgres round-trip.
func setupBenchSession(b *testing.B, sessionRepoDelay, keyRepoDelay time.Duration) (svc *Service, cookieValue string) {
	b.Helper()

	keys := newStubKeyRepo()
	plaintext := make([]byte, 32)
	for i := range plaintext {
		plaintext[i] = byte(i)
	}
	if err := keys.Add(context.Background(), &sessiondomain.SessionSigningKey{
		ID:                   "sk-bench-1",
		TenantID:             "t-default",
		KeyMaterialEncrypted: plaintext,
		CreatedAt:            time.Now().UTC(),
	}); err != nil {
		b.Fatalf("keys.Add: %v", err)
	}

	sessions := newStubSessionRepo()
	cfg := DefaultConfig()

	// Wrap the in-memory repos only when a delay was requested.
	var keyRepo SigningKeyRepo = keys
	var sessionRepo SessionRepo = sessions
	if keyRepoDelay > 0 {
		keyRepo = &slowKeyRepo{inner: keys, delay: keyRepoDelay}
	}
	if sessionRepoDelay > 0 {
		sessionRepo = &slowSessionRepo{inner: sessions, delay: sessionRepoDelay}
	}

	svc = NewService(sessionRepo, keyRepo, nil, "t-default", cfg, "")
	res, err := svc.Create(context.Background(), "actor-bench", "User", "10.0.0.1", "bench/1.0")
	if err != nil {
		b.Fatalf("svc.Create: %v", err)
	}
	return svc, res.CookieValue
}

// slowSessionRepo wraps a SessionRepo with a per-call delay.
type slowSessionRepo struct {
	inner SessionRepo
	delay time.Duration
}

func (r *slowSessionRepo) Create(ctx context.Context, s *sessiondomain.Session) error {
	time.Sleep(r.delay)
	return r.inner.Create(ctx, s)
}

func (r *slowSessionRepo) Get(ctx context.Context, id string) (*sessiondomain.Session, error) {
	time.Sleep(r.delay)
	return r.inner.Get(ctx, id)
}

func (r *slowSessionRepo) ListByActor(ctx context.Context, actorID, actorType, tenantID string) ([]*sessiondomain.Session, error) {
	time.Sleep(r.delay)
	return r.inner.ListByActor(ctx, actorID, actorType, tenantID)
}

func (r *slowSessionRepo) UpdateLastSeen(ctx context.Context, id string) error {
	time.Sleep(r.delay)
	return r.inner.UpdateLastSeen(ctx, id)
}

func (r *slowSessionRepo) UpdateCSRFTokenHash(ctx context.Context, id, hash string) error {
	time.Sleep(r.delay)
	return r.inner.UpdateCSRFTokenHash(ctx, id, hash)
}

func (r *slowSessionRepo) Revoke(ctx context.Context, id string) error {
	time.Sleep(r.delay)
	return r.inner.Revoke(ctx, id)
}

func (r *slowSessionRepo) RevokeAllForActor(ctx context.Context, actorID, actorType, exceptID string) error {
	time.Sleep(r.delay)
	return r.inner.RevokeAllForActor(ctx, actorID, actorType, exceptID)
}

func (r *slowSessionRepo) RevokeAllExceptForActor(ctx context.Context, actorID, actorType, tenantID, exceptID string) (int, error) {
	time.Sleep(r.delay)
	return r.inner.RevokeAllExceptForActor(ctx, actorID, actorType, tenantID, exceptID)
}

func (r *slowSessionRepo) GarbageCollectExpired(ctx context.Context) (int, error) {
	time.Sleep(r.delay)
	return r.inner.GarbageCollectExpired(ctx)
}

// slowKeyRepo wraps a SigningKeyRepo with a per-call delay.
type slowKeyRepo struct {
	inner SigningKeyRepo
	delay time.Duration
}

func (r *slowKeyRepo) GetActive(ctx context.Context, tenantID string) (*sessiondomain.SessionSigningKey, error) {
	time.Sleep(r.delay)
	return r.inner.GetActive(ctx, tenantID)
}

func (r *slowKeyRepo) Get(ctx context.Context, id string) (*sessiondomain.SessionSigningKey, error) {
	time.Sleep(r.delay)
	return r.inner.Get(ctx, id)
}

func (r *slowKeyRepo) Add(ctx context.Context, k *sessiondomain.SessionSigningKey) error {
	time.Sleep(r.delay)
	return r.inner.Add(ctx, k)
}

func (r *slowKeyRepo) Retire(ctx context.Context, id string) error {
	time.Sleep(r.delay)
	return r.inner.Retire(ctx, id)
}

func (r *slowKeyRepo) List(ctx context.Context, tenantID string) ([]*sessiondomain.SessionSigningKey, error) {
	time.Sleep(r.delay)
	return r.inner.List(ctx, tenantID)
}

func (r *slowKeyRepo) Delete(ctx context.Context, id string) error {
	time.Sleep(r.delay)
	return r.inner.Delete(ctx, id)
}

// reportPercentiles sorts the samples and reports p50/p95/p99/max via
// b.ReportMetric in microseconds. Go's testing.B reports ns/op as the
// default; we add explicit percentile labels so the operator-facing
// table at auth-benchmarks.md can copy them verbatim.
func reportPercentiles(b *testing.B, samples []time.Duration) {
	b.Helper()
	if len(samples) == 0 {
		return
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })

	// p returns the sample at the given percentile, clamped to the
	// last element so pct=100 cannot index out of range.
	p := func(pct float64) time.Duration {
		idx := int(float64(len(samples)) * pct / 100.0)
		if idx >= len(samples) {
			idx = len(samples) - 1
		}
		return samples[idx]
	}

	b.ReportMetric(float64(p(50).Microseconds()), "p50_us/op")
	b.ReportMetric(float64(p(95).Microseconds()), "p95_us/op")
	b.ReportMetric(float64(p(99).Microseconds()), "p99_us/op")
	b.ReportMetric(float64(samples[len(samples)-1].Microseconds()), "max_us/op")
}

// BenchmarkSession_SteadyState measures Validate cost when the
// underlying repos are in-memory + warm. Pure CPU: parseCookie +
// HMAC-verify + map lookups + sentinel checks.
//
// Phase 14 target: p99 < 1ms.
func BenchmarkSession_SteadyState(b *testing.B) {
	svc, cookieValue := setupBenchSession(b, 0, 0)
	in := ValidateInput{CookieValue: cookieValue, ClientIP: "10.0.0.1", UserAgent: "bench/1.0"}
	ctx := context.Background()
	samples := make([]time.Duration, 0, b.N)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		start := time.Now()
		if _, err := svc.Validate(ctx, in); err != nil {
			b.Fatalf("Validate: %v", err)
		}
		samples = append(samples, time.Since(start))
	}
	b.StopTimer()

	reportPercentiles(b, samples)
}

// BenchmarkSession_ColdProcess simulates the Postgres-cold path where
// the signing-key repo + session-row repo each add a simulated 1ms
// round-trip per call (more in practice: time.Sleep jitter adds
// ~1-2ms), approximating a local-network Postgres round-trip with the
// query plan not yet cached. This is a worst-case CI-runner
// approximation; real production numbers depend on the operator's
// Postgres setup + connection-pool warmup state.
//
// Phase 14 target: p99 < 10ms.
//
// Why not testcontainers Postgres directly: testcontainers adds 30+
// seconds of container boot to the benchmark, which is incompatible
// with `go test -bench` per-iteration timing. The simulated-delay
// approach captures the same upper bound (parseCookie + HMAC + 2 RTTs
// + decision logic) and produces a stable, CI-runnable number.
func BenchmarkSession_ColdProcess(b *testing.B) {
	// 1ms × 2 RTTs (signing-key fetch + session-row fetch) = 2ms
	// minimum. Go's time.Sleep granularity on most platforms adds
	// ~1-2ms of jitter; combined with parseCookie + HMAC + decision
	// logic, the p99 lands ~6-8ms in practice, comfortably under
	// the 10ms target. A real testcontainers-Postgres path would
	// produce different numbers depending on the docker-network
	// layout; documented in docs/operator/auth-benchmarks.md.
	const simulatedPostgresRTT = 1 * time.Millisecond

	svc, cookieValue := setupBenchSession(b, simulatedPostgresRTT, simulatedPostgresRTT)
	in := ValidateInput{CookieValue: cookieValue, ClientIP: "10.0.0.1", UserAgent: "bench/1.0"}
	ctx := context.Background()
	samples := make([]time.Duration, 0, b.N)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		start := time.Now()
		if _, err := svc.Validate(ctx, in); err != nil {
			b.Fatalf("Validate: %v", err)
		}
		samples = append(samples, time.Since(start))
	}
	b.StopTimer()

	reportPercentiles(b, samples)
}