Commit Graph

3 Commits

Author SHA1 Message Date
shankar0123 ba0959ddc7 feat(auth/sessions): list-all gate + revoke-all-except-current (MED-1/2/3)
Audit 2026-05-10 Fix 13 Phase A — close MED-1, MED-2, MED-3.

MED-1 (verification only): Fix 01's CRIT-1 router-gate sweep already
wraps every read endpoint with rbacGate(reg.Checker, '<resource>.read',
...). Verified post-sweep that GET /api/v1/certificates, /profiles,
/issuers, /targets, /agents, /audit all carry the corresponding
*.read permission gate.

MED-2: ListSessions now gates ?actor_id=<other> on auth.session.list.all
via the new permissionChecker projection installed by
WithPermissionChecker. cmd/server/main.go threads the existing
authCheckerAdapter into the handler. When caller's actor_id !=
caller.ActorID AND the handler has a checker, an inline
CheckPermission(..., 'auth.session.list.all', 'global', nil) call
fires; on false → 403 with explanatory message; on repository error
→ 500. Defense-in-depth: the router-level rbacGate enforces
auth.session.list as the floor; the .list.all re-check is the
privilege-elevation guard for cross-actor queries that the rbacGate
can't express (it can't see the query parameter).

MED-3: ship DELETE /api/v1/auth/sessions?except=current — the
'sign out all other sessions' flow. Gated by auth.session.revoke;
the handler reads the caller's current session ID from
session.SessionFromContext(ctx) (cookie-mode); empty for Bearer-mode
callers (in which case ALL the actor's sessions revoke, matching
'log me out everywhere' semantic for API-key users).

New repository method SessionRepository.RevokeAllExceptForActor:
  UPDATE sessions SET revoked_at = NOW()
   WHERE actor_id =  AND actor_type =  AND tenant_id =
     AND revoked_at IS NULL
     AND id !=
returning rowcount. Added to the interface in internal/repository/session.go,
wired into postgres impl, and added to all SessionRepo test stubs
(handler stubSessionRepo, service-test stubSessionRepo, benchmark
slowSessionRepo). The session.SessionRepo internal interface also
gains the method so the bench_test.go forwarder compiles.

Audit row records the count for compliance evidence (one summary row
per invocation per the existing audit policy).

OpenAPI parity exception added for the new route — the
unbounded-DELETE-with-query-flag shape doesn't fit standard REST CRUD
operations cleanly; matches the documented-inline pattern set by the
streaming audit-export endpoint.

GUI button (SessionsPage 'Sign out all other sessions') deferred to
Phase D.

Refs: cowork/auth-bundles-audit-2026-05-10.md MED-1, MED-2, MED-3
Spec: cowork/auth-bundles-fixes-2026-05-10/13-med-bundle.md Phase A
2026-05-10 21:49:35 +00:00
shankar0123 1697845493 fix(auth): wire RevokeAllForActor + RotateCSRFToken to mutation paths
Closes HIGH-1 + HIGH-2 of the 2026-05-10 audit.

HIGH-1: breakglass.Service.SetPassword and RemoveCredential now call
sessions.RevokeAllForActor(targetActorID, "User") best-effort after the
mutation completes. A phished-then-rotated password no longer leaves
the attacker's session alive (CWE-613). Failure to revoke is audited
with outcome=session_revoke_failed and logged at WARN level but does
NOT roll back the credential change (the operator rotated for a
reason; forcing rollback opens a worse window).

- breakglass.SessionMinter interface extended with RevokeAllForActor.
- cmd/server/main.go::breakglassSessionMinterAdapter gains the bridge
  to session.Service.RevokeAllForActor.
- stubSessions in service_test.go tracks revokeAllIDs / revokeAllTypes
  / revokeAllErr.
- Three regression tests:
  - TestService_SetPassword_RevokesExistingSessions
  - TestService_RemoveCredential_RevokesExistingSessions
  - TestService_SetPassword_RevokeFailureDoesNotRollback

HIGH-2: New session.Service.RotateCSRFTokenForActor(ctx, actorID,
actorType) int method walks ListByActor and rotates the CSRF token on
every active (non-revoked, non-expired) row. Returns count rotated;
per-row failures log WARN + skip, never errors to caller. New
handler.CSRFRotator interface + AuthHandler.WithCSRFRotator(r) setter;
AssignRoleToKey and RevokeRoleFromKey invoke it post-success as
defense-in-depth (a CSRF token leaked while the actor held a lower-
priv role no longer rides through to the elevated role).

- SessionRepo interface gains ListByActor (already implemented on the
  postgres SessionRepository; stubs in service_test.go + bench_test.go
  updated to match).
- cmd/server/main.go calls .WithCSRFRotator(sessionService) on the
  AuthHandler.
- Two regression tests:
  - TestRotateCSRFTokenForActor_RotatesAllActiveRows (asserts revoked /
    expired / other-actor rows are skipped)
  - TestRotateCSRFTokenForActor_NoSessionsReturnsZero

Verification gate green: gofmt clean, go vet clean, go test -short
-count=1 ./internal/auth/breakglass/ ./internal/auth/session/
./internal/api/handler/ ./internal/api/router/ ./cmd/server/
./internal/domain/auth/ — all pass.

CRIT-1..CRIT-5 + HIGH-1 + HIGH-2 of the 2026-05-10 audit now closed
on this branch. Spec at
cowork/auth-bundles-fixes-2026-05-10/06-high-1-2-revoke-and-rotate.md.

Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-1 HIGH-2
2026-05-10 20:43:45 +00:00
shankar0123 9b6294e83d auth-bundle-2 Phase 14: session + OIDC validation benchmarks (steady-state + cold paths) + auth-benchmarks.md operator doc + Makefile targets
Closes Phase 14 of cowork/auth-bundle-2-prompt.md. Ships four
benchmarks producing four numbers + the operator-doc table; three
default-tag benchmarks runnable on every CI runner, the fourth
(cold-cache OIDC) runnable on operator-side Docker hosts via the
new make target.

Files
=====

internal/auth/session/bench_test.go (NEW):
* BenchmarkSession_SteadyState (target p99 < 1ms; measured 5µs).
  Warm in-memory repo + warm session row. Pure CPU: parseCookie +
  HMAC verify + map lookup + sentinel checks.
* BenchmarkSession_ColdProcess (target p99 < 10ms; measured 7.1ms).
  Same pipeline but with a configurable per-call delay simulating
  a 1ms Postgres RTT on each repo call. Two repo calls per
  Validate (signing-key fetch + session-row fetch) = 2ms minimum;
  Go time.Sleep granularity adds ~1-2ms jitter. Documented why
  testcontainers Postgres isn't viable inside b.N: 30+ second
  container boot incompatible with per-iteration timing.
* slowSessionRepo + slowKeyRepo wrappers add the per-call delay
  via time.Sleep; they delegate to the existing in-memory stubs.
* reportPercentiles helper sorts + reports p50/p95/p99/max via
  b.ReportMetric (Go testing.B doesn't surface percentiles
  natively).

internal/auth/oidc/bench_test.go (NEW):
* BenchmarkOIDC_SteadyState (target p99 < 5ms; measured 1.5ms).
  Drives full HandleCallback against an in-process mockIdP
  (httptest.Server localhost loopback). Pre-warmed JWKS cache via
  RefreshKeys at setup. Pipeline: pre-login consume + state
  compare + token exchange (localhost ~50-200µs) + go-oidc
  Verify (RSA-2048 sig verify + alg pin) + service-layer iss/
  aud/azp/at_hash/exp/iat/nonce re-checks + group-claim
  resolution + group→role mapping + user upsert + session mint.
* The localhost-loopback /token call adds ~100-500µs of TCP
  overhead vs pure crypto; the prompt's "no network calls"
  steady-state framing accommodates this since the localhost
  loopback is the closest practical proxy for a same-region
  IdP /token call (which adds 5-15ms in production).

internal/auth/oidc/bench_keycloak_test.go (NEW, //go:build integration):
* BenchmarkOIDC_ColdCache (target p99 < 200ms; operator-runs).
  Drives RefreshKeys against a live Keycloak container from the
  Phase 10 testfixtures harness. Each iteration evicts the
  in-process cache + re-fetches discovery + re-fetches JWKS over
  real HTTP + re-runs the IdP-downgrade-attack defense.
* Network-bounded: the cold path is dominated by HTTPS RTT to
  the IdP discovery endpoint, NOT crypto. The 200ms cap
  accommodates a geographically-distant IdP (~150ms RTT) plus
  the in-process JWKS fetch + downgrade-defense logic (~5ms
  locally).
* Reuses the sharedKeycloak fixture from
  integration_keycloak_test.go (Phase 10) so the benchmark
  doesn't pay the 60-90s container boot cost separately. Skips
  with a clear message if invoked without the integration test
  setup.
* Reports p50/p95/p99/max in MILLISECONDS (vs the
  microsecond-granularity steady-state benchmarks) since the
  cold path is two orders of magnitude slower.

internal/auth/oidc/service_test.go (MODIFIED):
* Refactored newMockIdP(t *testing.T) to delegate to a new
  newMockIdPWithTB(t testing.TB) sibling. Standard Go pattern
  for sharing test fixtures between *testing.T and *testing.B.
  No behavior change for existing service_test.go tests; the
  benchmark file in bench_test.go calls newMockIdPWithTB(b)
  to get the same fixture.

docs/operator/auth-benchmarks.md (NEW):
* Result table with all four benchmarks + targets + measured
  numbers + status markers. Four-row matrix for the default-tag
  benchmarks; the fourth row (cold-cache) is operator-recorded
  with an empty cell waiting for the first Docker-equipped run.
* Hardware floor section pinning the 4 vCPU / 8 GiB RAM /
  Postgres 16 / Go 1.25 baseline. GitHub-hosted Ubuntu runners
  satisfy this; operators on weaker hardware re-record.
* "What each benchmark covers (and what it doesn't)" section
  per benchmark, distinguishing the warm steady-state pipeline
  from the cold path's network-bounded budget.
* "Cold-cache OIDC: how to run" subsection documenting the
  make target + the test+benchmark coupling needed to populate
  sharedKeycloak. Operator-recorded baseline table seeded
  empty for first runs.
* "Why the cold path is bounded by network latency, not crypto"
  section explaining the budget breakdown:
    - TCP handshake (1 RTT)
    - TLS 1.3 handshake (1-2 RTTs)
    - 2 HTTPS GETs (discovery + JWKS, 1 RTT each)
    - In-process crypto on the certctl side (~5-10ms total)
  So the 200ms cap is operator-checkable: real measurement >
  200ms means the IdP is slow OR network congestion OR DNS
  issues — the diagnosis is upstream of certctl. Real
  measurement < 200ms means the IdP is on a fast same-region
  link.
* Methodology section pinning the per-iteration timing capture
  + sort + percentile-extract approach.
* Pre-merge audit section for the Phase 14 exit gate: four
  benchmarks ran, four numbers recorded, steady-state targets
  met, cold path is operator-runnable + measurably-bounded.

Makefile (MODIFIED):
* Added `make benchmark-auth` (default-tag, runs three of four
  benchmarks at 2000 samples each).
* Added `make benchmark-auth-coldcache` (integration-tagged,
  runs OIDC cold-cache against live Keycloak; requires Docker).
* Both targets carry explanatory comment blocks.

docs/README.md (MODIFIED):
* Added the auth-benchmarks.md doc to the Operator nav table
  alongside performance-baselines.md.

Measured baselines at Phase 14 close (linux/arm64, 4 vCPU)
==========================================================

  BenchmarkSession_SteadyState     p99 = 5µs    (target < 1ms)   ✓ 200× under
  BenchmarkSession_ColdProcess     p99 = 7.1ms  (target < 10ms)  ✓
  BenchmarkOIDC_SteadyState        p99 = 1.5ms  (target < 5ms)   ✓ 3× under
  BenchmarkOIDC_ColdCache          operator-runs (Docker required)

Verification
============

* gofmt -l on three new bench files: clean.
* go vet ./internal/auth/session/... ./internal/auth/oidc/...: clean
  (default tag).
* go vet -tags integration ./internal/auth/oidc/...: clean (integration
  tag covers the bench_keycloak_test.go file).
* go test -short -count=1 across all 5 OIDC + session packages:
  green; the bench_*_test.go files compile but don't run under
  -short (testing.Short() guards + benchmarks are not selected
  by -run pattern).
* All three runnable benchmarks executed and produce the numbers
  above; recorded in auth-benchmarks.md.
2026-05-10 16:51:28 +00:00