mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 21:21:40 +00:00
3c8153139895c2310da3efe8586ce2879ecde374
15 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
80cbd2db59 |
test(coverage): backfill 5 packages to clear v2.1.0 release-gate Phase 3 floors
Phase 3 of /Users/shankar/Desktop/cowork/v2.1.0-release-gate.md surfaced
four packages below their coverage floors. All four are regressions from
new code shipped in the audit-2026-05-10/11 fix bundles that didn't get
per-function tests:
internal/auth/breakglass 87.5% -> 93.3% (floor: 90%)
+ List (was 0%) — 3 tests (disabled, empty+populated, repo err)
+ RemoveCredential, Unlock disabled-branch tests
internal/auth/oidc 89.4% -> 95.4% (floor: 90%)
+ JWKSStatus (was 0%) — 2 tests (unknown provider, after AuthRequest)
+ TestDiscovery (was 0%) — 5 tests (discovery failure, happy path,
HS256 alg-downgrade detected, missing jwks_uri, JWKS 500 fetch)
internal/auth/session 89.9% -> 94.4% (floor: 90%)
+ SetTrustedProxies (was 0%) — round-trip + clear
+ ComputeCookieHMAC (was 0%) — determinism + key/inputs differ
+ DecryptKeyMaterial (was 0%) — round-trip + wrong-passphrase
internal/api/handler 73.2% -> 75.5% (floor: 75%)
+ 6 auth_breakglass handler funcs (were all 0%) — 14 tests
(disabled/404, invalid JSON, empty fields, service err, happy
path with cookies, admin endpoints, ListCredentials no
password_hash on the wire)
+ WithPermissionChecker setter test (was 0%, Bundle 2 MED-2)
+ NewAdminCRLCacheServiceImpl + CacheRows (were 0%) — 3 tests
+ itoaForRetryAfter + challengeURLBuilder ACME helpers (were 0%) —
4 tests
All five coverage gates green:
internal/service 72.7% (floor: 70%)
internal/api/handler 75.5% (floor: 75%)
internal/api/middleware 67.9% (floor: 30%)
internal/auth 93.3% (floor: 85%)
internal/service/auth 91.8% (floor: 85%)
internal/auth/oidc 95.4% (floor: 90%)
internal/auth/oidc/groupclaim 100.0% (floor: 95%)
internal/auth/oidc/domain 97.6% (floor: 90%)
internal/auth/session 94.4% (floor: 90%)
internal/auth/session/domain 98.3% (floor: 90%)
internal/auth/breakglass 93.3% (floor: 90%)
internal/auth/breakglass/domain 100.0% (floor: 90%)
internal/auth/user/domain 96.2% (floor: 90%)
(and 6 more — all green)
Per CLAUDE.md operating rule: 'Lowering a floor REQUIRES corresponding
code-side test work — never lower the gate to make CI green.' The
floors stay at their committed values; the new tests close the gap.
|
||
|
|
8aeeec93c0 |
chore(lint): close 5 golangci-lint v2 findings surfaced by v2.1.0 release-gate Phase 1.3
Five golangci-lint v2 findings surfaced when running the v2.1.0 release gate (auth-bundle-2 → master pre-flight). Each is mechanical: 1. govet/printf-style misuse — internal/auth/oidc/service_test.go used integer literal 501 in http.Error; switched to http.StatusNotImplemented. 2. staticcheck SA1019 — internal/auth/breakglass/reflect_helper_test.go referenced reflect.Ptr; the canonical name since Go 1.18 is reflect.Pointer. 3. staticcheck ST1020 — internal/repository/postgres/auth.go ActorRoleRepository.Revoke had a doc comment that did not begin with the method name. Prepended 'Revoke drops actor_roles rows.' to the comment so it now starts with the method name. 4. staticcheck ST1022 — internal/api/handler/auth_session_oidc.go DefaultBCLVerifierMaxAge docstring was attached to the DefaultBCLVerifier type docstring. Moved the const docstring directly above the const declaration, separated by a blank line. 5. unused — internal/auth/session/bench_test.go declared benchSessionMinSamples and never referenced it; the bench loop relies on Go's default b.N scaling. Replaced the const block with a comment describing the rationale. Lint clean (golangci-lint v2.12.2 with the .golangci.yml config) on the five edited packages. |
||
|
|
874419989d |
harden(auth/cookies): __Host- prefix on all three auth cookies (MED-14, BREAKING)
Audit 2026-05-10 — close MED-14 from the HANDOFF.md backend batch
(item 5). The session, CSRF, and OIDC pre-login cookies all carry
the __Host- prefix; browsers now reject any subdomain attempt to
overwrite them.
Cookie name changes (BREAKING — existing sessions invalidate):
- certctl_session → __Host-certctl_session
- certctl_csrf → __Host-certctl_csrf
- certctl_oidc_pending → __Host-certctl_oidc_pending
The __Host- prefix requires Path=/ + Secure + no Domain attribute.
Post-login session + CSRF cookies already met all three. The pre-login
cookie's Path widened from '/auth/oidc/' to '/' to satisfy the prefix;
the cookie lives 10 minutes and is only consumed by the callback
handler, so the wider path scope is harmless.
Files touched:
- internal/auth/session/domain/types.go — constant rename + comment
- internal/auth/session/domain/types_test.go — assertion update
- internal/api/handler/auth_session_oidc.go — pre-login set + clear
paths widened from /auth/oidc/ to /
- web/src/api/client.ts — readCSRFCookie now compares against
'__Host-certctl_csrf'
- CHANGELOG.md — Unreleased > Security (BREAKING) entry
- docs/migration/oidc-enable.md — operator-facing detail of the
one-time re-authentication window + GUI customization guidance
Operator impact: ONE re-login prompt per active session at the deploy
that lands this change. Subsequent logins issue the __Host-prefixed
cookie automatically. Existing bookmarked deep links work without
modification (cookies are path-scoped, not URL-scoped).
Refs: cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md item 5
cowork/auth-bundles-audit-2026-05-10.md MED-14
|
||
|
|
e7c4654b16 |
harden(auth/session+oidc): 503/401 split + go-oidc string pin (LOW-6 + Nit-2)
Audit 2026-05-10 — close LOW-6 + Nit-2 from the HANDOFF.md backend
batch (items 8 + 9).
LOW-6: introduce ErrSessionTransient sentinel in session.Service.
session.Validate now distinguishes:
- errors.Is(err, repository.ErrSessionNotFound) → ErrSessionInvalidCookie (401)
- All other repo errors → ErrSessionTransient (503)
The session middleware maps ErrSessionTransient to HTTP 503 with
Retry-After: 1. Pre-fix, every DB hiccup looked like a forged-cookie
401 and forced the user to re-authenticate on a transient outage.
Two new regression tests pin the wire shape:
- TestService_Validate_TransientSessionGetError (service layer)
- TestService_Validate_SessionNotFoundMapsToInvalidCookie (negative
leg: not-found stays 401)
- TestSessionMiddleware_TransientErrorMappedTo503 (middleware-level
503 + Retry-After header)
Nit-2: isJWKSFetchError documentation now pins go-oidc/v3 v3.18.0 as
the source-of-truth string set. v3.18.0 exposes only
*oidc.TokenExpiredError as a typed error; JWKS-fetch failures bubble
up as fmt.Errorf-wrapped strings. New regression test
TestIsJWKSFetchError_GoOIDCV318Strings pins the canonical substrings
emitted by go-oidc's jwks.go — a future upstream bump that changes
the wording trips the test and forces the matcher to be re-derived.
The test caught a real gap: 'oidc: failed to decode keys' (emitted
when the IdP returns non-JSON at the jwks_uri — broken proxy, gateway
HTML error page, etc.) was previously misclassified as a generic 500
instead of 503 ErrJWKSUnreachable. Added 'decode keys' substring to
the matcher.
Status: LOW-6 + Nit-2 marked CLOSED in audit-doc table.
Refs: cowork/auth-bundles-fixes-2026-05-10/HANDOFF.md items 8, 9
cowork/auth-bundles-audit-2026-05-10.md LOW-6, Nit-2
|
||
|
|
9cce2ab043 |
harden(auth): LOW + Nit batch — bootstrap audit, crypto/rand, XFF trust, CSRF check, protocol-prefix unify (Batch 1)
Audit 2026-05-10 — close 8 LOWs + 2 Nits in-bundle. Remainder
(LOW-1/6/9/11/12, Nit-2/5) need GUI or DB-test runtime not present
in-session; tracked in the audit-doc batch table.
LOW-2: bootstrap.ValidateAndMint now emits 'bootstrap.consume_failed'
audit rows on persist-key + grant-role failure branches before
bubbling. Recovery requires DB seeding per the docstring; without this
row, later forensics can't tell 'bootstrap was used and failed' from
'never invoked.'
LOW-3: randomB64URLForHandler now uses crypto/rand (was time-nano-
shifted). Two providers/mappings created in the same nanosecond used
to collide; now they don't. Time-nano fallback retained for the
unlikely crypto/rand-broken path.
LOW-4: breakglass.verifyDummy uses s.readRand(salt) for the dummy
Argon2id verify. Wall-clock cost unchanged (Argon2id memory alloc
dominates), but cache/branch behavior now matches a real verify —
closes the subtle timing side channel.
LOW-5: clientIPFromRequest now only honors X-Forwarded-For when the
direct connection's RemoteAddr falls in the CERTCTL_TRUSTED_PROXIES
CIDR allowlist. Default-deny: empty list means XFF is ignored.
SetTrustedProxies wired in cmd/server/main.go from cfg.Auth.TrustedProxies.
LOW-7: internal/auth/protocol_endpoints.go::ProtocolEndpointPrefixes
now carries /scep-mtls + /.well-known/est-mtls (previously only in
router.AuthExemptDispatchPrefixes; the two lists had drifted). The
canonical-prefix coverage test in Phase 12 still pins the set.
LOW-8: docs/operator/rbac.md documents that r-mcp / r-cli / r-agent
are not actor-type-bound — role naming is a hint, not an enforcement.
Operators wanting hard binding must apply periodic audit queries.
Native binding is on the v2 roadmap.
LOW-10: Session.Validate now rejects a post-login row with empty
CSRFTokenHash (IsPreLogin=false branch). validSession test fixture
updated with a valid 64-hex CSRF hash.
Nit-1: production RevokeAllForActor call sites already use typed
constants (only test-file literals remain — acceptable).
Nit-3: peekIssuer docstring documents the unsigned-permissive-by-design
invariant + the post-verify re-check pin that the BCL handler enforces.
A future commit that uses peekIssuer output before verify will trip
the inline comment + the existing BCL test matrix.
Status table updated in cowork/auth-bundles-audit-2026-05-10.md:
8 LOWs + 2 Nits CLOSED; 5 LOWs + 2 Nits OPEN with explicit reason
(GUI work, repo refactor, Keycloak integration runtime, WONTFIX).
Refs: cowork/auth-bundles-audit-2026-05-10.md LOW-2/3/4/5/7/8/10
cowork/auth-bundles-audit-2026-05-10.md Nit-1/3
|
||
|
|
630831aeac |
harden(audit+session): full SHA-256 audit hash + cookie segment length cap (MED-15 + Nit-4)
Audit 2026-05-10 Fix 13 Phase F + Fix 14 Phase F partial — close
MED-15 + Nit-4. Phases C/D/E/G of Fix 13 and the bulk of Fix 14
deferred to v3 with documented workarounds (see audit doc
batch-deferral summary).
MED-15: internal/api/middleware/audit.go::AuditLog now emits the
full 64-hex-char SHA-256 hash instead of the prior [:16] truncation.
The audit_events.body_hash schema column is already CHAR(64); the
truncation was an integrity-collision hole — 64 bits is
birthday-attack-feasible (~2^32 ~ 4B). Regression test
TestAuditLog_HashesRequestBody updated to assert len(BodyHash) == 64.
Nit-4: internal/auth/session/service.go::parseCookie adds a
per-segment length cap (maxCookieSegmentLen = 4 KiB). Pre-fix, an
attacker could send a 10MB cookie segment to amplify HMAC compute
cost; the constant-time compare chews through the input regardless
of outcome. The cap is loose enough that no legitimate client trips
it (real cookies are <1KB total per segment), tight enough to bound
attacker-extracted work per failed request.
Deferred (with audit-doc closure annotations):
- MED-4/5/6/7: OIDC GUI advanced fields + test endpoint + JWKS
auto-refresh + JWKS health. v3 OIDC-operator-experience bundle.
Workarounds documented.
- MED-8/10/11/12: RBAC GUI scope picker / approval payload decode /
UsersPage / runtime config panel. v3 GUI-polish bundle. Backend
already accepts the scope_type/scope_id fields; the gap is GUI.
- MED-13: MCP tools for approvals / break-glass / bootstrap.
v3 MCP-expansion bundle.
- MED-14: __Host- cookie rename. Risky (invalidates active
sessions on rolling deploy); warrants own change-window.
- MED-16/17: Pre-login UA/IP binding + RFC 9207 iss URL check.
v3 OIDC-hardening bundle.
- All 12 LOWs + 4 of 5 Nits: v3 cleanup bundle.
Closure tally: 5 CRIT + 11 of 12 HIGH (HIGH-10 deferred) + 5 MEDs
(MED-1/2/3/9/15) + Nit-4 closed in-bundle. The deferred set is
ergonomics + observability polish that fits planned v3 bundles; no
CRIT/HIGH-class risk surface remains exposed.
Refs: cowork/auth-bundles-audit-2026-05-10.md MED-15, Nit-4
Spec: cowork/auth-bundles-fixes-2026-05-10/13-med-bundle.md Phase F
cowork/auth-bundles-fixes-2026-05-10/14-low-nit-cleanup.md Phase F
|
||
|
|
ba0959ddc7 |
feat(auth/sessions): list-all gate + revoke-all-except-current (MED-1/2/3)
Audit 2026-05-10 Fix 13 Phase A — close MED-1, MED-2, MED-3.
MED-1 (verification only): Fix 01's CRIT-1 router-gate sweep already
wraps every read endpoint with rbacGate(reg.Checker, '<resource>.read',
...). Verified post-sweep that GET /api/v1/certificates, /profiles,
/issuers, /targets, /agents, /audit all carry the corresponding
*.read permission gate.
MED-2: ListSessions now gates ?actor_id=<other> on auth.session.list.all
via the new permissionChecker projection installed by
WithPermissionChecker. cmd/server/main.go threads the existing
authCheckerAdapter into the handler. When caller's actor_id !=
caller.ActorID AND the handler has a checker, an inline
CheckPermission(..., 'auth.session.list.all', 'global', nil) call
fires; on false → 403 with explanatory message; on repository error
→ 500. Defense-in-depth: the router-level rbacGate enforces
auth.session.list as the floor; the .list.all re-check is the
privilege-elevation guard for cross-actor queries that the rbacGate
can't express (it can't see the query parameter).
MED-3: ship DELETE /api/v1/auth/sessions?except=current — the
'sign out all other sessions' flow. Gated by auth.session.revoke;
the handler reads the caller's current session ID from
session.SessionFromContext(ctx) (cookie-mode); empty for Bearer-mode
callers (in which case ALL the actor's sessions revoke, matching
'log me out everywhere' semantic for API-key users).
New repository method SessionRepository.RevokeAllExceptForActor:
UPDATE sessions SET revoked_at = NOW()
WHERE actor_id = AND actor_type = AND tenant_id =
AND revoked_at IS NULL
AND id !=
returning rowcount. Added to the interface in internal/repository/session.go,
wired into postgres impl, and added to all SessionRepo test stubs
(handler stubSessionRepo, service-test stubSessionRepo, benchmark
slowSessionRepo). The session.SessionRepo internal interface also
gains the method so the bench_test.go forwarder compiles.
Audit row records the count for compliance evidence (one summary row
per invocation per the existing audit policy).
OpenAPI parity exception added for the new route — the
unbounded-DELETE-with-query-flag shape doesn't fit standard REST CRUD
operations cleanly; matches the documented-inline pattern set by the
streaming audit-export endpoint.
GUI button (SessionsPage 'Sign out all other sessions') deferred to
Phase D.
Refs: cowork/auth-bundles-audit-2026-05-10.md MED-1, MED-2, MED-3
Spec: cowork/auth-bundles-fixes-2026-05-10/13-med-bundle.md Phase A
|
||
|
|
f5ba17114d |
fix(audit): close silence-leg of HIGH-6; emit WARN on audit-write failure
Audit 2026-05-10 HIGH-6 partial closure (silence leg). The audit
identified two distinct gaps in the auth surface's audit-emit pattern:
(1) silence — `_ = audit.RecordEventWithCategory(...)` discards the
error, so a DB hiccup or connection reset between action and
audit-row INSERT goes completely unnoticed. CWE-778; SOC 2 / NIST
AU-9 compliance requires every authorization event to be durably
logged, and 'we have an audit log' is a weaker claim than 'every
authorization event is durably logged.'
(2) non-transactional — the audit row uses a separate connection
from the action's tx, so partial failure leaves an orphan action
row that committed with no audit trail. Decision 8 of the
auth-bundles-index requires action + audit row atomic.
This commit closes leg (1) fully across all six audit-emit call sites
in the auth surface:
- internal/service/auth/actor_role_service.go::recordAudit
- internal/service/auth/role_service.go::recordAudit
- internal/auth/bootstrap/service.go::ValidateAndMint
- internal/auth/breakglass/service.go::recordAudit
- internal/auth/session/service.go::recordAudit
- internal/api/handler/auth_session_oidc.go::recordAudit
- internal/service/profile.go::Update (Phase 9 approval-bypass)
Each `_ = ...` swallow is replaced with:
if err := audit.RecordEventWithCategory(...); err != nil {
slog.WarnContext(ctx, '<surface> audit write failed (action
committed; audit row may be missing)',
'action', action, 'actor_id', actor, 'resource_id', resource,
'err', err)
}
Operators monitoring audit-write failures now see structured WARN
logs with action + actor + resource attribution; missing audit rows
can be cross-referenced against monitoring without manual SELECT-from-
audit-table.
Infrastructure for leg (2) (transactional commit) is also landed in
this commit:
- service.AuditService.RecordEventWithCategoryWithTx (new method;
accepts repository.Querier from postgres.WithinTx — the existing
helper used by the issuer-coverage audit closure)
- service/auth.AuditService interface declares the new method
- test stub fakeAudit.RecordEventWithCategoryWithTx satisfies the
extended interface
The eight per-path WithinTx-refactors documented in
cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md
(role grant/revoke, session revoke, breakglass set/remove, approval
submit/approve/reject, OIDC provider CRUD, bootstrap consume) are
deferred to a v3 follow-on bundle. Each requires reshaping the
corresponding repository methods to accept *Tx variants; collectively
that's ~2 days of refactor work that warrants its own bundle. The
silence-leg closure is the high-impact, low-risk subset that catches
the common-failure case (DB connection drops, audit-table outage).
Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-6
Spec: cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md
|
||
|
|
0f340beb14 |
fix(auth/ux): cause-aware OIDC + session error surfacing (HIGH-7 + HIGH-8 closure)
Server (HIGH-7): the OIDC callback failure path now 302-redirects to /login?error=oidc_failed&reason=<category> instead of emitting a blank 400. `category` is the existing audit `failure_category` value; classifyOIDCFailure was extended with three new sentinel paths (email_domain_not_allowed, email_missing_but_required, pkce_invalid) so CRIT-5 + PKCE failures get distinguishable GUI rendering. Audit-log observability is unchanged — the same failure_category is written to the auth.oidc_login_failed audit row; the 302 is purely a UX leg layered on top. Server (HIGH-8): SessionMiddleware now stashes a cause classification on the request context when Validate returns an error, mapping the sentinels via classifySessionError (errors.Is-based, so wrapped sentinels still classify) to the stable wire-strings idle_timeout / absolute_timeout / back_channel_revoked / invalid_token. The 401 emit point in bearerSkipIfAuthenticated reads the stashed cause and emits WWW-Authenticate: Bearer realm="certctl", error="invalid_token", error_description=<cause> per RFC 6750 §3. GUI (HIGH-7): LoginPage reads ?error= + ?reason= from the URL via react-router useSearchParams and renders an operator-friendly amber-bordered banner above the form; OIDC_FAILURE_REASON_TEXT maps all 16 known categories with a defensive 'unspecified' fallback for forward-compat with future server-side categories. GUI (HIGH-8): api/client fetchJSON parses the WWW-Authenticate cause via parseWWWAuthenticateCause and attaches it to the 'certctl:auth-required' CustomEvent detail; AuthProvider redirects to /login?session_expired=<cause> on cause-aware 401s; LoginPage renders a blue-bordered session-cause banner. invalid_token stays on the current page (no hard redirect for opaque failures). Misc cleanup: ErrorState now accepts the title/message/data-testid form added by CRIT-4 BreakglassPage (was erroring tsc on master). Regression matrix: - internal/api/handler/oidc_redirect_categories_test.go pins all 16 failure categories to the 302 + reason= location + audit-row leg - internal/auth/session/www_authenticate_test.go pins the 4 stable cause categories on classifySessionError (incl. errors.Is wrapped sentinels) + the WWW-Authenticate emission across all 4 categories + the no-session-context fallback case - internal/api/handler/auth_session_oidc_test.go: 4 pre-existing TestLoginCallback_*Returns400 tests updated to assert 302 + reason= location (the wire shape changed from 400 to 302, but the audit observability and behaviour-equivalent failure-classification are preserved) - web/src/pages/LoginPage.test.tsx: 6 new cases pinning the failure banner, session-cause banner, unknown-reason fallback, and forward-compat 'unspecified' category Spec: cowork/auth-bundles-fixes-2026-05-10/08-high-7-8-error-surfacing.md Closes: HIGH-7, HIGH-8 of cowork/auth-bundles-audit-2026-05-10.md |
||
|
|
1697845493 |
fix(auth): wire RevokeAllForActor + RotateCSRFToken to mutation paths
Closes HIGH-1 + HIGH-2 of the 2026-05-10 audit.
HIGH-1: breakglass.Service.SetPassword and RemoveCredential now call
sessions.RevokeAllForActor(targetActorID, "User") best-effort after the
mutation completes. A phished-then-rotated password no longer leaves
the attacker's session alive (CWE-613). Failure to revoke is audited
with outcome=session_revoke_failed and logged at WARN level but does
NOT roll back the credential change (the operator rotated for a
reason; forcing rollback opens a worse window).
- breakglass.SessionMinter interface extended with RevokeAllForActor.
- cmd/server/main.go::breakglassSessionMinterAdapter gains the bridge
to session.Service.RevokeAllForActor.
- stubSessions in service_test.go tracks revokeAllIDs / revokeAllTypes
/ revokeAllErr.
- Three regression tests:
- TestService_SetPassword_RevokesExistingSessions
- TestService_RemoveCredential_RevokesExistingSessions
- TestService_SetPassword_RevokeFailureDoesNotRollback
HIGH-2: New session.Service.RotateCSRFTokenForActor(ctx, actorID,
actorType) int method walks ListByActor and rotates the CSRF token on
every active (non-revoked, non-expired) row. Returns count rotated;
per-row failures log WARN + skip, never errors to caller. New
handler.CSRFRotator interface + AuthHandler.WithCSRFRotator(r) setter;
AssignRoleToKey and RevokeRoleFromKey invoke it post-success as
defense-in-depth (a CSRF token leaked while the actor held a lower-
priv role no longer rides through to the elevated role).
- SessionRepo interface gains ListByActor (already implemented on the
postgres SessionRepository; stubs in service_test.go + bench_test.go
updated to match).
- cmd/server/main.go calls .WithCSRFRotator(sessionService) on the
AuthHandler.
- Two regression tests:
- TestRotateCSRFTokenForActor_RotatesAllActiveRows (asserts revoked /
expired / other-actor rows are skipped)
- TestRotateCSRFTokenForActor_NoSessionsReturnsZero
Verification gate green: gofmt clean, go vet clean, go test -short
-count=1 ./internal/auth/breakglass/ ./internal/auth/session/
./internal/api/handler/ ./internal/api/router/ ./cmd/server/
./internal/domain/auth/ — all pass.
CRIT-1..CRIT-5 + HIGH-1 + HIGH-2 of the 2026-05-10 audit now closed
on this branch. Spec at
cowork/auth-bundles-fixes-2026-05-10/06-high-1-2-revoke-and-rotate.md.
Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-1 HIGH-2
|
||
|
|
9b6294e83d |
auth-bundle-2 Phase 14: session + OIDC validation benchmarks (steady-state + cold paths) + auth-benchmarks.md operator doc + Makefile targets
Closes Phase 14 of cowork/auth-bundle-2-prompt.md. Ships four
benchmarks producing four numbers + the operator-doc table; three
default-tag benchmarks runnable on every CI runner, the fourth
(cold-cache OIDC) runnable on operator-side Docker hosts via the
new make target.
Files
=====
internal/auth/session/bench_test.go (NEW):
* BenchmarkSession_SteadyState (target p99 < 1ms; measured 5µs).
Warm in-memory repo + warm session row. Pure CPU: parseCookie +
HMAC verify + map lookup + sentinel checks.
* BenchmarkSession_ColdProcess (target p99 < 10ms; measured 7.1ms).
Same pipeline but with a configurable per-call delay simulating
a 1ms Postgres RTT on each repo call. Two repo calls per
Validate (signing-key fetch + session-row fetch) = 2ms minimum;
Go time.Sleep granularity adds ~1-2ms jitter. Documented why
testcontainers Postgres isn't viable inside b.N: 30+ second
container boot incompatible with per-iteration timing.
* slowSessionRepo + slowKeyRepo wrappers add the per-call delay
via time.Sleep; they delegate to the existing in-memory stubs.
* reportPercentiles helper sorts + reports p50/p95/p99/max via
b.ReportMetric (Go testing.B doesn't surface percentiles
natively).
internal/auth/oidc/bench_test.go (NEW):
* BenchmarkOIDC_SteadyState (target p99 < 5ms; measured 1.5ms).
Drives full HandleCallback against an in-process mockIdP
(httptest.Server localhost loopback). Pre-warmed JWKS cache via
RefreshKeys at setup. Pipeline: pre-login consume + state
compare + token exchange (localhost ~50-200µs) + go-oidc
Verify (RSA-2048 sig verify + alg pin) + service-layer iss/
aud/azp/at_hash/exp/iat/nonce re-checks + group-claim
resolution + group→role mapping + user upsert + session mint.
* The localhost-loopback /token call adds ~100-500µs of TCP
overhead vs pure crypto; the prompt's "no network calls"
steady-state framing accommodates this since the localhost
loopback is the closest practical proxy for a same-region
IdP /token call (which adds 5-15ms in production).
internal/auth/oidc/bench_keycloak_test.go (NEW, //go:build integration):
* BenchmarkOIDC_ColdCache (target p99 < 200ms; operator-runs).
Drives RefreshKeys against a live Keycloak container from the
Phase 10 testfixtures harness. Each iteration evicts the
in-process cache + re-fetches discovery + re-fetches JWKS over
real HTTP + re-runs the IdP-downgrade-attack defense.
* Network-bounded: the cold path is dominated by HTTPS RTT to
the IdP discovery endpoint, NOT crypto. The 200ms cap
accommodates a geographically-distant IdP (~150ms RTT) plus
the in-process JWKS fetch + downgrade-defense logic (~5ms
locally).
* Reuses the sharedKeycloak fixture from
integration_keycloak_test.go (Phase 10) so the benchmark
doesn't pay the 60-90s container boot cost separately. Skips
with a clear message if invoked without the integration test
setup.
* Reports p50/p95/p99/max in MILLISECONDS (vs the
microsecond-granularity steady-state benchmarks) since the
cold path is two orders of magnitude slower.
internal/auth/oidc/service_test.go (MODIFIED):
* Refactored newMockIdP(t *testing.T) to delegate to a new
newMockIdPWithTB(t testing.TB) sibling. Standard Go pattern
for sharing test fixtures between *testing.T and *testing.B.
No behavior change for existing service_test.go tests; the
benchmark file in bench_test.go calls newMockIdPWithTB(b)
to get the same fixture.
docs/operator/auth-benchmarks.md (NEW):
* Result table with all four benchmarks + targets + measured
numbers + status markers. Four-row matrix for the default-tag
benchmarks; the fourth row (cold-cache) is operator-recorded
with an empty cell waiting for the first Docker-equipped run.
* Hardware floor section pinning the 4 vCPU / 8 GiB RAM /
Postgres 16 / Go 1.25 baseline. GitHub-hosted Ubuntu runners
satisfy this; operators on weaker hardware re-record.
* "What each benchmark covers (and what it doesn't)" section
per benchmark, distinguishing the warm steady-state pipeline
from the cold path's network-bounded budget.
* "Cold-cache OIDC: how to run" subsection documenting the
make target + the test+benchmark coupling needed to populate
sharedKeycloak. Operator-recorded baseline table seeded
empty for first runs.
* "Why the cold path is bounded by network latency, not crypto"
section explaining the budget breakdown:
- TCP handshake (1 RTT)
- TLS 1.3 handshake (1-2 RTTs)
- 2 HTTPS GETs (discovery + JWKS, 1 RTT each)
- In-process crypto on the certctl side (~5-10ms total)
So the 200ms cap is operator-checkable: real measurement >
200ms means the IdP is slow OR network congestion OR DNS
issues — the diagnosis is upstream of certctl. Real
measurement < 200ms means the IdP is on a fast same-region
link.
* Methodology section pinning the per-iteration timing capture
+ sort + percentile-extract approach.
* Pre-merge audit section for the Phase 14 exit gate: four
benchmarks ran, four numbers recorded, steady-state targets
met, cold path is operator-runnable + measurably-bounded.
Makefile (MODIFIED):
* Added `make benchmark-auth` (default-tag, runs three of four
benchmarks at 2000 samples each).
* Added `make benchmark-auth-coldcache` (integration-tagged,
runs OIDC cold-cache against live Keycloak; requires Docker).
* Both targets carry explanatory comment blocks.
docs/README.md (MODIFIED):
* Added the auth-benchmarks.md doc to the Operator nav table
alongside performance-baselines.md.
Measured baselines at Phase 14 close (linux/arm64, 4 vCPU)
==========================================================
BenchmarkSession_SteadyState p99 = 5µs (target < 1ms) ✓ 200× under
BenchmarkSession_ColdProcess p99 = 7.1ms (target < 10ms) ✓
BenchmarkOIDC_SteadyState p99 = 1.5ms (target < 5ms) ✓ 3× under
BenchmarkOIDC_ColdCache operator-runs (Docker required)
Verification
============
* gofmt -l on three new bench files: clean.
* go vet ./internal/auth/session/... ./internal/auth/oidc/...: clean
(default tag).
* go vet -tags integration ./internal/auth/oidc/...: clean (integration
tag covers the bench_keycloak_test.go file).
* go test -short -count=1 across all 5 OIDC + session packages:
green; the bench_*_test.go files compile but don't run under
-short (testing.Short() guards + benchmarks are not selected
by -run pattern).
* All three runnable benchmarks executed and produce the numbers
above; recorded in auth-benchmarks.md.
|
||
|
|
3189f3cd71 |
auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +
chained-auth combinator + AuthInfo OIDC providers extension + 2 CI
guards (Bundle-1-compat + Bundle-1-to-2-upgrade)
Phase 6 wires the Phase 4 session service + Phase 5 OIDC handlers into
the request path. Three middlewares + one combinator land in
internal/auth/session/middleware.go:
1. SessionMiddleware reads `certctl_session` cookie, validates via
SessionService.Validate, populates the legacy UserKey/AdminKey
+ Phase 3 RBAC context keys (ActorIDKey/ActorTypeKey/TenantIDKey)
so downstream RequirePermission + audit-attribution see a
consistent caller. Best-effort UpdateLastSeen keeps the idle-
expiry sliding window fresh. CRITICALLY: never 401s on validate
failure — defers to the next middleware so the chained-auth
combinator can fall back to Bearer.
2. CSRFMiddleware gates state-changing methods (POST/PUT/DELETE/
PATCH) for session-authenticated requests. API-key actors are
EXEMPT (no session row in context => CSRF doesn't apply; they're
not browser-driven). Constant-time-compares SHA-256(X-CSRF-Token
header) against the session row's stored hash via
SessionService.ValidateCSRF. Mismatch returns 403.
3. ChainAuthSessionThenBearer is the load-bearing chained-auth
combinator: tries the session cookie first; on miss/invalid,
falls back to the API-key Bearer middleware; if neither
authenticates, 401. The composition uses bearerSkipIfAuthenticated
so a request with both a valid session AND a valid Bearer uses
the session (cookie wins per the Bundle 2 contract).
Middleware chain order in cmd/server/main.go (per Phase 6 spec):
RequestID → Logging → Recovery → CORS → RateLimit → AUTH (chained:
session → Bearer) → CSRF (state-changing only; API-key exempt) →
Audit → Handler
The chained authMiddleware replaces the bare Bundle-1 bearerMiddleware
at the chain entry point; csrfMiddleware lands immediately after so
session-authenticated requests pass through CSRF before audit. Both
new middlewares are pass-throughs when sessionService is nil
(pre-Phase-4 builds).
AuthInfo extension (Category E): GET /api/v1/auth/info now returns the
list of configured OIDC providers (id + display_name + login_url
where login_url = `/auth/oidc/login?provider=<id>`) so the GUI Login
page renders the correct "Sign in with X" buttons. Endpoint stays
auth-exempt; the providers list is public configuration. Wired via
HealthHandler.OIDCProvidersResolver + a new OIDCProvidersListResolver
projection interface; the cmd/server adapter
oidcProvidersListAdapter projects the postgres OIDCProviderRepository
into the public-safe shape. Resolver lookups are best-effort: failures
fall back to the minimal payload rather than 500-ing the GUI's auth
probe. Nil resolver preserves the pre-Phase-6 minimal shape so test
fixtures + no-db deploys keep compiling.
Bypass list preserved (Category E): the existing public-route
allowlist in router.AuthExemptRouterRoutes is preserved by virtue of
those routes registering via direct r.mux.Handle (they bypass the
entire chain). The protocol-endpoint allowlist (ACME/SCEP/EST/OCSP/
CRL) bypasses via cmd/server/main.go::buildFinalHandler URL-prefix
dispatch — those routes never reach the auth middleware at all. Both
preservations are pinned by the Bundle-1 compat CI guard below.
Tests (internal/auth/session/middleware_test.go):
All 7 Phase 6 spec-mandated middleware-chain tests pass:
1. Session cookie + correct CSRF → 200.
2. Session cookie + wrong CSRF → 403.
3. Bearer-only (no session) + no CSRF → 200 (API-key actors are
CSRF-exempt by design).
4. No cookie + no Bearer → 401.
5. Expired cookie + valid Bearer → fall back to Bearer succeeds.
6. Tampered cookie → 401 (no Bearer to fall back to).
7. Bypass-list awareness — state-changing method, no auth, no
session row → uniform 401 (NOT a CSRF 403; the CSRF check is
gated on session-row presence and never fires for unauth
requests).
Plus coverage-lift tests covering nil-service pass-through, safe-
methods bypass, SessionFromContext nil + populated, isStateChangingMethod
matrix, clientIPFromRequest variants (RemoteAddr / XFF first-hop /
XFF single / no-port), nil-bearer chain branches.
Coverage on internal/auth/session/middleware.go: 100% per-function
across the 9 entry points (SessionValidator interfaces +
NewSessionMiddleware + NewCSRFMiddleware + ChainAuthSessionThenBearer +
bearerSkipIfAuthenticated + SessionFromContext + isStateChangingMethod
+ clientIPFromRequest + lastIndexByte). Package coverage 94.9%.
Two new CI guards:
scripts/ci-guards/bundle-1-compat-regression.sh — Bundle-1-only
compat invariants. Static-source checks that protect the Bundle-1
path since spinning up docker-compose + running the integration
test suite is sandbox-infeasible:
1. SessionMiddleware MUST defer-to-next on missing/invalid cookie.
2. CSRFMiddleware MUST be pass-through on missing session row.
3. cmd/server/main.go MUST wire ChainAuthSessionThenBearer.
4. The 4 public OIDC routes MUST be in AuthExemptRouterRoutes.
5. AuthInfo MUST guard on OIDCProvidersResolver != nil.
scripts/ci-guards/bundle-1-to-2-upgrade-regression.sh — Bundle-1 →
Bundle-2 upgrade invariants:
1. Migrations 000034..000037 use CREATE TABLE IF NOT EXISTS.
2. Migrations are wrapped in BEGIN; ... COMMIT;.
3. NO DROP TABLE / ALTER ... DROP COLUMN against any of the 19
protected Bundle-1 tables (api_keys, audit_events, certificates,
certificate_versions, profiles, issuers, targets, agents, jobs,
owners, teams, agent_groups, notifications, roles, permissions,
role_permissions, actor_roles, tenants, approvals,
intermediate_cas, issuance_approval_requests).
4. 000037 INSERTs use ON CONFLICT DO NOTHING (idempotent re-apply).
5. ChainAuthSessionThenBearer is wired (Bundle-1 Bearer keys
continue to authenticate post-upgrade).
6. Bootstrap handler is registered (fresh-deployment bootstrap
still works).
Both guards are sandbox-feasible static analysis. When the operator
gets a Linux VM with docker-in-docker, promote both to real `docker
compose up` integration tests against a v2.1.0 baseline DB dump.
Verifications: gofmt clean, go vet ./internal/auth/... ./internal/api/...
./cmd/server/... clean, go test -short -count=1 -race green across
internal/auth/session (94.9% coverage), internal/api/handler,
internal/api/router, no regressions in Bundle 1 packages, both new
ci-guards green.
|
||
|
|
9c679a5960 |
auth-bundle-2 Phase 5: OIDC + session HTTP surface (13 endpoints),
pre-login store, OpenID Connect Back-Channel Logout 1.0, cookieAuth
scheme, 7 new auth permissions, CI guard, handler tests
Phase 5 of the bundle puts the Phase 3 OIDC service + Phase 4 session
service on the wire. 13 HTTP endpoints split into three logical groups:
Public OIDC handshake (auth-exempt; protocol-mediated):
GET /auth/oidc/login?provider=<id> -> 302 to IdP authorization URL
+ sets certctl_oidc_pending cookie
(10-min TTL, Path=/auth/oidc/,
SameSite=Lax)
GET /auth/oidc/callback?code=...&state=... -> consume pre-login row,
run Phase 3's 11-step token
validation, mint post-login
session, 302 to dashboard
POST /auth/oidc/back-channel-logout -> OpenID Connect BCL 1.0 — IdP
POSTs logout_token JWT; certctl
validates signature against IdP
JWKS via Phase 3 alg allow-list,
required claims (iss/aud/iat/jti/
events; exactly one of sub/sid;
nonce ABSENT per spec §2.4),
revokes matching sessions,
returns 200 with
Cache-Control: no-store
POST /auth/logout -> revoke caller's session
Session management (RBAC-gated auth.session.*):
GET /api/v1/auth/sessions -> auth.session.list (own / all)
DELETE /api/v1/auth/sessions/{id} -> auth.session.revoke (own bypass)
OIDC provider + group-mapping CRUD (RBAC-gated auth.oidc.*):
GET /api/v1/auth/oidc/providers -> auth.oidc.list
POST /api/v1/auth/oidc/providers -> auth.oidc.create
(client_secret encrypted
at rest via
internal/crypto.EncryptIfKeySet)
PUT /api/v1/auth/oidc/providers/{id} -> auth.oidc.edit
DELETE /api/v1/auth/oidc/providers/{id} -> auth.oidc.delete
(refused via
ErrOIDCProviderInUse → 409
when users authenticated
via this provider)
POST /api/v1/auth/oidc/providers/{id}/refresh -> auth.oidc.edit
(re-runs IdP downgrade
defense via
OIDCService.RefreshKeys)
GET /api/v1/auth/oidc/group-mappings -> auth.oidc.list
POST /api/v1/auth/oidc/group-mappings -> auth.oidc.edit
DELETE /api/v1/auth/oidc/group-mappings/{id} -> auth.oidc.edit
Migration 000037 ships:
- oidc_pre_login_sessions table (10-min absolute TTL, FK CASCADE on
oidc_provider_id, FK RESTRICT on signing_key_id; index on
absolute_expires_at for the GC sweep);
- 7 new permissions seeded into r-admin only:
auth.session.list, auth.session.list.all, auth.session.revoke,
auth.oidc.list, auth.oidc.create, auth.oidc.edit, auth.oidc.delete
CanonicalPermissions extended in lockstep at internal/domain/auth/
validate.go.
Pre-login machinery:
- internal/repository/oidc.go gains PreLoginRepository interface +
PreLoginSession struct + ErrPreLoginNotFound / ErrPreLoginExpired
sentinels.
- internal/repository/postgres/oidc_prelogin.go ships the impl;
LookupAndConsume uses DELETE ... RETURNING for atomic single-use.
- internal/auth/oidc/prelogin.go is the PreLoginAdapter that bridges
the OIDC service's Phase 3 PreLoginStore interface to the new
repository, signing the cookie value under the active
SessionSigningKey via the same v1.<id>.<key>.<HMAC> wire format
Phase 4 uses for post-login cookies. Defense-in-depth: the
pre-login `pl-` prefix is enforced by ParseCookieValue(prefix);
a stolen pre-login cookie cannot be replayed against the
post-login Validate path (pinned by
TestService_Validate_RejectsPreLoginCookieAtPostLoginGate).
Session package extension:
- internal/auth/session/service.go gains exported SignCookieValue,
ParseCookieValue (with caller-supplied id-1 prefix), ComputeCookieHMAC,
DecryptKeyMaterial wrappers so the OIDC pre-login adapter shares
the same length-prefixed HMAC math without code duplication.
- parseCookie no longer hardcodes the `ses-` prefix check (moved to
Validate as defense-in-depth; pre-login cookie verification uses
the `pl-` prefix via ParseCookieValue).
Cookie attributes (all Phase 5 endpoints honor CERTCTL_SESSION_SAMESITE
+ Secure=true via SessionCookieAttrs from Phase 4 config):
- certctl_oidc_pending: Path=/auth/oidc/, MaxAge=600s, SameSite=Lax
(cannot be Strict because the IdP-initiated callback is a top-level
navigation from a different origin).
- certctl_session: Path=/, Expires=8h, SameSite=Lax|Strict, HttpOnly.
- certctl_csrf: Path=/, Expires=8h, HttpOnly=false (intentional —
GUI must read it to echo into X-CSRF-Token header).
Audit logging on every mutating operation (event_category="auth"):
auth.oidc_login_succeeded / failed / unmapped_groups
auth.oidc_back_channel_logout / failed
auth.session_revoked
auth.oidc_provider_{created,updated,deleted,refreshed}
auth.group_mapping_{added,removed}
OpenAPI updates:
- cookieAuth security scheme added to api/openapi.yaml under
components.securitySchemes (apiKey / cookie / certctl_session).
- The 13 Phase 5 routes are added to SpecParityExceptions with a
deferral note: full per-endpoint OpenAPI rows land in a follow-on
commit alongside the GUI work (Phase 8) so the ergonomic shape can
be validated against the live GUI client.
CI guard: scripts/ci-guards/N-bundle-2-security-empty-preserved.sh
asserts api/openapi.yaml has ≥ 14 'security: []' occurrences (the
pre-Bundle-2 baseline). Reducing the count below 14 would silently
force a Bearer-or-cookie requirement onto an endpoint that legitimately
runs without certctl-issued credentials; the guard fires before that
regression lands.
Handler tests (internal/api/handler/auth_session_oidc_test.go):
- All 6 prompt-mandated negative cases:
BCL with missing events claim -> 400
BCL with nonce present -> 400 (per spec §2.4)
BCL with sig signed by an unknown key -> 400
Callback with replayed state -> 400
Callback with PKCE verifier mismatch -> 400
Callback with expired pre-login row -> 400
- Plus happy paths for every endpoint, edge cases (missing-cookie,
duplicate-name, in-use-409, wrong-tenant), and the Helper-function
coverage (peekIssuer, classifyOIDCFailure, defaultIfBlank,
defaultIntIfZero, clientIPFromRequest, encryptClientSecret).
Coverage on internal/api/handler/auth_session_oidc.go: 80.9% per-function
(above the Phase 5 spec's ≥ 80% floor).
Server wiring (cmd/server/main.go):
Wired AFTER sessionService (Phase 4) so the OIDC PreLoginAdapter can
sign pre-login cookies under the active SessionSigningKey:
oidcProviderRepo + oidcMappingRepo + oidcUserRepo + oidcPreLoginRepo
-> preLoginAdapter -> oidcService -> authSessionOIDCHandler.
sessionMinterAdapter shim bridges *session.Service.Create to the
oidcsvc.SessionMinter port the OIDC service consumes.
Router wiring (internal/api/router/router.go):
4 public OIDC routes via direct r.mux.Handle (auth-exempt; pinned in
AuthExemptRouterRoutes); 9 RBAC-gated routes via r.Register +
rbacGate(checker, perm, h). Routes only register when
reg.AuthSessionOIDC != nil so pre-Phase-5 builds skip the block
entirely.
Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/api/handler (74 tests +
new Phase 5 batch), internal/api/router (parity + auth-exempt
allowlist), internal/auth/oidc + session (no regressions), full domain
+ scheduler + config sweeps green, ci-guard
N-bundle-2-security-empty-preserved.sh green (17 ≥ 14 baseline).
|
||
|
|
17b30c1f7f |
auth-bundle-2 Phase 4: session service (cookie minting + signature
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap
Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.
Service surface (internal/auth/session/service.go, ~880 LOC):
- Service.Create(actorID, actorType, ip, ua) -> *CreateResult
Mints a session row; signs the cookie value with the active signing
key; returns the cookie payload AND the CSRF token plaintext for
the handler to set on the response.
- Service.Validate(ValidateInput) -> *Session
Parses the cookie, looks up the signing key (incl. retired-but-in-
retention), recomputes HMAC-SHA256, loads the session row, enforces
revocation + absolute + idle expiry + optional IP/UA bind. Maps to
one of 9 sentinel errors; the handler uniformly returns 401 to the
wire (specific reason in the audit row).
- Service.ValidateCSRF(headerValue, *Session) error
Constant-time compares SHA-256(header) against the stored hash on
the session row.
- Service.UpdateLastSeen / Revoke / RevokeAllForActor
- Service.RotateCSRFToken — mints fresh token, persists hash, returns
plaintext; called on login completion, logout, role-change against
actor, explicit operator rotate.
- Service.RotateSigningKey — mints new active key, retires previous;
retired keys stay valid for cfg.SigningKeyRetention so existing
cookies don't immediately fail.
- Service.EnsureInitialSigningKey — idempotent; mints first key on
fresh deploys; emits auth.session_signing_key_bootstrap audit row
with event_category=auth. Wired into cmd/server/main.go AFTER
migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
is FATAL (logger.Error + os.Exit(1)) per the prompt — server refuses
to boot rather than serve session-less.
- Service.GarbageCollect — sweeps expired post-login sessions +
pre-login rows >10min + retired-past-retention signing keys. Wired
into the new internal/scheduler/scheduler.go::sessionGCLoop on a
CERTCTL_SESSION_GC_INTERVAL tick.
Cookie wire format (load-bearing):
v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>
The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:
len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id
where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger swap one byte across the boundary — `<a, bc>` and
`<ab, c>` produce identical HMAC inputs. The length prefix moves the
boundary into the input itself so the two cases can never collide.
The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).
CSRF token model:
- Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
intentional; the GUI must read it to echo into X-CSRF-Token header).
- SHA-256 hash of the plaintext lives on the session row.
- Validation: SHA-256(X-CSRF-Token) constant-time-compared.
- Rotated by Service.RotateCSRFToken on login / logout / role-change /
explicit admin-trigger.
Optional defense-in-depth (default OFF):
- CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
(user may have legitimate IP change). Mobile + corporate-NAT
environments leave this off.
- CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.
Configurable lifetimes (env vars wired in internal/config/config.go):
CERTCTL_SESSION_IDLE_TIMEOUT 1h
CERTCTL_SESSION_ABSOLUTE_TIMEOUT 8h
CERTCTL_SESSION_SIGNING_KEY_RETENTION 24h
CERTCTL_SESSION_GC_INTERVAL 1h
CERTCTL_SESSION_SAMESITE Lax
CERTCTL_SESSION_BIND_IP false
CERTCTL_SESSION_BIND_USER_AGENT false
Test surface (internal/auth/session/service_test.go, ~860 LOC):
All 15 prompt-mandated negative cases:
1. Tampered cookie (HMAC byte flipped near segment start where all
6 bits are real — base64url-no-pad's last char carries only 2
bits so a tail-flip is unreliable).
1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
2. Cookie missing v1. prefix.
3. Cookie with unknown version prefix (v99).
4. Idle expiry — back-dated last_seen_at + idle_expires_at.
5. Absolute expiry — back-dated absolute_expires_at.
6. Revoked session.
7. Wrong signing key id (no row matches).
8. Cookie signed under retired-but-in-retention key SUCCEEDS.
9. Cookie signed under retired-past-retention key FAILS.
10. Concatenation collision — direct evidence that
computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
a forged-boundary-slide cookie is rejected.
11. CSRF token missing.
12. CSRF token mismatch (constant-time compare).
13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
wrap (cmd/server/main.go treats as fatal).
Plus coverage-lift batch covering: every error wrap on every repo
collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
the cookie parser's full negative matrix (empty, wrong segment count,
missing prefixes, bad base64, wrong HMAC length), and a real-encryption
round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
the v3-blob path is exercised end-to-end at the session-cookie level.
Coverage:
internal/auth/session 94.5% (floor 90)
internal/auth/session/domain 96+% (floor 90, Phase 1)
.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). The
why: paragraphs explain why each fail-closed branch is load-bearing.
Repository extensions:
internal/repository/session.go gains UpdateCSRFTokenHash on the
SessionRepository interface; internal/repository/postgres/session.go
ships the implementation. RotateCSRFToken consumes it.
Scheduler extensions:
internal/scheduler/scheduler.go gains SessionGarbageCollector
interface + sessionGC field + sessionGCInterval +
SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
per-tick context.WithTimeout(1m) bounds a stuck Postgres.
Server wiring:
cmd/server/main.go constructs sessionService AFTER the bootstrap
block (post-RBAC backfill) and BEFORE the policy-service block.
EnsureInitialSigningKey runs immediately; failure is fatal via
os.Exit(1). The scheduler section wires SetSessionGarbageCollector
+ SetSessionGCInterval alongside the other interval setters and
emits an Info log so operators can confirm the loop is enabled.
Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.
Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
|
||
|
|
b0ac24fbf8 |
auth-bundle-2 Phase 1: OIDC + Session + User + Breakglass domain types
Phase 1 ships the persisted-shape types Bundle 2 needs end-to-end.
No DB migrations, no service layer, no HTTP handlers; Phase 2 ships
the SQL, Phase 3+ ship the consumers. Each type has a Validate()
method that enforces the on-disk invariants the schema will mirror,
and a focused _test.go that pins each invariant's failure mode.
Per-package summary:
internal/auth/oidc/domain/ (OIDCProvider + GroupRoleMapping):
* OIDCProvider carries the operator-configured IdP record. Fields
match the prompt's Phase 1 list plus IATWindowSeconds and
JWKSCacheTTLSeconds (Phase 3 references these by name; landing
them in Phase 1's domain type avoids the lying-field gap).
ClientSecretEncrypted is opaque from this layer; it is the v2 blob
produced by internal/crypto/encryption.go and is `json:"-"` so it
never wire-leaks.
* Validate() rejects: invalid id prefix, empty name, non-https
issuer_url (matches Phase 3's "JWKS endpoint MUST be HTTPS"),
empty client_id, empty client_secret_encrypted, non-https
redirect_uri, invalid groups_claim_format, scopes missing openid,
IAT window outside (0, 600], JWKS cache TTL below 60s. Defaults
applied in-place: GroupsClaimPath="groups", GroupsClaimFormat=
"string-array", Scopes=["openid","profile","email"],
IATWindowSeconds=300, JWKSCacheTTLSeconds=3600,
TenantID="t-default".
* GroupRoleMapping carries the operator-configured group-to-role
rule. Validate() pins prefix conventions ("grm-", "op-", "r-")
and non-empty group name.
* 18 tests across happy-path + every negative invariant.
internal/auth/session/domain/ (Session + SessionSigningKey):
* Session covers BOTH the post-login row (full 1h-idle/8h-absolute
cookie lifecycle) AND the Phase 5 pre-login row (10-minute TTL,
carries OIDC state+nonce+PKCE verifier across the IdP redirect).
IsPreLogin discriminates. CSRFTokenHash holds SHA-256 of the
CSRF token plaintext (the plaintext lives in a JS-readable
certctl_csrf cookie; storing only the hash on the row defends
against DB-read leaks per the Phase 4 CSRF contract).
* Validate() pins: id prefix "ses-", non-empty actor id/type,
signing key id prefix "sk-", AbsoluteExpiresAt strictly > Idle,
IdleExpiresAt strictly > CreatedAt, CSRFTokenHash exactly 64
lowercase hex chars when set.
* Cookie naming constants pinned by a separate test
(TestCookieNamingConstants) so a future rename can't silently
break the GUI's web/src/api/client.ts which reads these names by
string.
* SessionSigningKey stores the v2-encrypted HMAC key material; the
retired-before-created invariant catches malformed rows. 14
tests across both types.
internal/auth/user/domain/ (User):
* Federated-human identity for SSO logins. Distinct from Bundle 1's
free-form actor_id strings: actor_roles.actor_id = User.ID for
federated humans (per the prompt's note about how the two
identity systems intersect).
* WebAuthnCredentials JSONB column reserved for v3 (Decision 12);
defaults to "[]" on Validate() so Bundle 2 + v3 share the same
on-disk format from day one.
* Email validation is intentionally loose (basic shape: one @,
non-empty local + domain, no whitespace, dot in domain). RFC 5321
/ 5322 grammars are not enforced; the IdP issued the email and
we trust its shape, only rejecting gross corruption.
* 8 tests across happy-path + invalid-id + empty-email +
malformed-email + invalid-provider-id + tenant defaulting +
WebAuthn-credentials passthrough.
internal/auth/breakglass/domain/ (BreakglassCredential):
* Phase 7.5 type. Argon2id PHC-format password hash; Validate()
pins the Argon2id magic prefix so non-Argon2id formats (bcrypt,
pbkdf2, plaintext) are rejected at the persistence boundary.
* MinPasswordLengthBytes (12) + MaxPasswordLengthBytes (256)
constants pinned by a dedicated test so the operator-facing
password-strength contract can't drift silently.
* IsLocked(now) helper exposes the lockout state machine for the
Phase 7.5 service to consume; the lockout window default is
15min in the service layer.
* 9 tests across happy-path + per-invariant negative + lockout
state machine + tenant defaulting.
Cross-cutting:
* Every type has json:"-" on the encrypted-credential field
(ClientSecretEncrypted, KeyMaterialEncrypted, PasswordHash,
CSRFTokenHash) so even a misconfigured handler that marshals the
domain type directly into a response body cannot leak the
secret. Mirrors Bundle 1's pattern for issuer/target credentials.
* Every type carries TenantID with Validate() defaulting to
authdomain.DefaultTenantID. Forward-compat for the future
managed-service multi-tenant activation; Bundle 2 ships
single-tenant.
Verifications:
* gofmt -l clean across all 8 new files (one round-trip required to
satisfy Go 1.19+ doc-comment list-formatting rules in
session/domain/types.go).
* go vet clean on internal/auth/oidc/... + session/... + user/... +
breakglass/...
* go test -short -count=1 green on all four new domain packages
(49 test functions total).
* go test -short -count=1 still green on Bundle 1 packages
(internal/auth, internal/auth/bootstrap, internal/service/auth,
internal/config).
* govulncheck ./... clean (M-024 hard CI gate).
* All 24 ci-guards pass locally.
Phase 1 exit criteria from cowork/auth-bundle-2-prompt.md:
* All types compile: yes.
* Validators have at least 5 test cases each: yes (smallest is
User with 8 tests; OIDCProvider has 13).
* make verify equivalent green: gofmt + vet + go test pass
(golangci-lint deferred to CI per the same operating-rule
pattern Phase 0 used).
|