3 Commits

Author SHA1 Message Date
shankar0123 1697845493 fix(auth): wire RevokeAllForActor + RotateCSRFToken to mutation paths
Closes HIGH-1 + HIGH-2 of the 2026-05-10 audit.

HIGH-1: breakglass.Service.SetPassword and RemoveCredential now call
sessions.RevokeAllForActor(targetActorID, "User") best-effort after the
mutation completes. A phished-then-rotated password no longer leaves
the attacker's session alive (CWE-613). Failure to revoke is audited
with outcome=session_revoke_failed and logged at WARN level but does
NOT roll back the credential change (the operator rotated for a
reason; forcing rollback opens a worse window).

- breakglass.SessionMinter interface extended with RevokeAllForActor.
- cmd/server/main.go::breakglassSessionMinterAdapter gains the bridge
  to session.Service.RevokeAllForActor.
- stubSessions in service_test.go tracks revokeAllIDs / revokeAllTypes
  / revokeAllErr.
- Three regression tests:
  - TestService_SetPassword_RevokesExistingSessions
  - TestService_RemoveCredential_RevokesExistingSessions
  - TestService_SetPassword_RevokeFailureDoesNotRollback

HIGH-2: New session.Service.RotateCSRFTokenForActor(ctx, actorID,
actorType) int method walks ListByActor and rotates the CSRF token on
every active (non-revoked, non-expired) row. Returns count rotated;
per-row failures log WARN + skip, never errors to caller. New
handler.CSRFRotator interface + AuthHandler.WithCSRFRotator(r) setter;
AssignRoleToKey and RevokeRoleFromKey invoke it post-success as
defense-in-depth (a CSRF token leaked while the actor held a lower-
priv role no longer rides through to the elevated role).

- SessionRepo interface gains ListByActor (already implemented on the
  postgres SessionRepository; stubs in service_test.go + bench_test.go
  updated to match).
- cmd/server/main.go calls .WithCSRFRotator(sessionService) on the
  AuthHandler.
- Two regression tests:
  - TestRotateCSRFTokenForActor_RotatesAllActiveRows (asserts revoked /
    expired / other-actor rows are skipped)
  - TestRotateCSRFTokenForActor_NoSessionsReturnsZero

Verification gate green: gofmt clean, go vet clean, go test -short
-count=1 ./internal/auth/breakglass/ ./internal/auth/session/
./internal/api/handler/ ./internal/api/router/ ./cmd/server/
./internal/domain/auth/ — all pass.

CRIT-1..CRIT-5 + HIGH-1 + HIGH-2 of the 2026-05-10 audit now closed
on this branch. Spec at
cowork/auth-bundles-fixes-2026-05-10/06-high-1-2-revoke-and-rotate.md.

Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-1 HIGH-2
2026-05-10 20:43:45 +00:00
shankar0123 f1d97710e1 feat(gui+auth): break-glass admin GUI surface (CRIT-4 closure)
Closes CRIT-4 of the 2026-05-10 audit. Bundle 2 Phase 7.5 shipped the
break-glass backend (Argon2id + lockout + 4 endpoints) but no GUI
surface. Operators recovering during an SSO outage had to hand-craft
curl commands — operationally hostile and the opposite of what
docs/operator/security.md advertised. This commit closes the gap.

Three GUI surfaces:

1. LoginPage.tsx — inline "Use break-glass account (SSO outage
   recovery)" toggle below the API-key form. Clicking reveals an
   amber-bordered inline form (actor-id + password, autocomplete=off).
   Calls breakglassLogin(actor_id, password); on success navigates
   to "/" where AuthProvider re-validates via the session-cookie path.
   Intentionally low-visibility (text-amber-600 small text) — this is
   the deliberate-bypass path, not the everyday-login path.

2. web/src/pages/auth/BreakglassPage.tsx — admin page at /auth/breakglass
   (permission-gated by auth.breakglass.admin). Three sections:
     - Sticky security banner ("every action audited; use only during
       incidents").
     - Set/rotate-password form (≥12-char + confirm-match).
     - Credentialed-actor table with rotate / unlock (disabled when
       not locked) / remove per row. Remove requires type-the-actor-id
       confirmation.

3. Layout.tsx nav — "Break-glass" entry under the auth section. Visible
   to all callers; the page itself permission-gates (server-side 403 is
   the load-bearing defense). Cosmetic hide-when-no-perm is deferred
   to fix 14's LOW bundle.

Backend support (new endpoint required to enumerate credentialed actors):

- internal/repository/breakglass.go — BreakglassCredentialRepository
  gains List(ctx, tenantID) method.
- internal/repository/postgres/breakglass.go — postgres impl; reuses
  the existing breakglassColumns / scanBreakglass helpers.
- internal/auth/breakglass/service.go — Service.List(ctx) method;
  returns ErrDisabled when CERTCTL_BREAKGLASS_ENABLED=false (handler
  maps to 404 for surface invisibility).
- internal/api/handler/auth_breakglass.go — ListCredentials handler;
  password_hash field NEVER serialized to the wire (response shape
  is intentionally limited to actor_id + timestamps + failure_count +
  locked_until).
- internal/api/router/router.go — registers GET
  /api/v1/auth/breakglass/credentials gated by auth.breakglass.admin.
- internal/api/router/openapi_parity_test.go — SpecParityExceptions
  entry for the new endpoint (full OpenAPI row rides along with the
  next OpenAPI sweep).

GUI api/client.ts gains breakglassListCredentials() + the
BreakglassCredentialRow type matching the wire shape.

Six Vitest cases in BreakglassPage.test.tsx pin the contract:
permission gate (forbidden state when caller lacks the perm; admin
surface when they have it), set-password mismatch rejection, set-
password below-threshold-length rejection, unlock-disabled-when-not-
locked, remove-modal type-confirm.

Verification gate green:
- gofmt -l clean on all touched files
- go vet clean
- go test -short -count=1 on internal/api/router (TestRouter_OpenAPIParity
  + TestRouterRBACGateCoverage + TestRouter_AuthExemptAllowlist),
  internal/api/handler (all BCL tests + ListCredentials),
  internal/auth/breakglass (Service.List + stubRepo.List),
  internal/repository/postgres, internal/domain/auth (auditor pin)
  — all pass.

CRIT-1 + CRIT-2 + CRIT-3 from the same audit are already closed on
this branch (commits 68ca42f, ca1e135, 00eace8). CRIT-5 (AllowedEmail-
Domains lying field) remains the last Critical blocker for v2.1.0.
Spec: cowork/auth-bundles-fixes-2026-05-10/04-crit-4-breakglass-gui.md.

Refs: cowork/auth-bundles-audit-2026-05-10.md CRIT-4
2026-05-10 20:24:52 +00:00
shankar0123 1d01c87663 auth-bundle-2 Phase 7 + Phase 7.5: OIDC first-admin bootstrap +
break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility)

Phase 7 — OIDC first-admin bootstrap (Decision 3):

  - Optional AdminBootstrapHook closure on *oidc.Service. When wired,
    HandleCallback consults the hook AFTER group resolution + user
    upsert and BEFORE the empty-mapping fail-closed check. Hook
    receives (providerID, groups, userID); returns grantAdmin=true
    when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no
    admin exists yet in the tenant.
  - cmd/server/main.go wires the hook as a closure that:
      * Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured).
      * Probes AdminExists via authActorRoleRepo (admin-already-exists
        silently returns false; bootstrap mode is one-shot per tenant).
      * Walks group intersection.
      * On match: grants r-admin via authActorRoleRepo.Grant + emits
        the bootstrap.oidc_first_admin audit row with
        event_category=auth + INFO log.
  - Coexists with the Bundle 1 env-var-token bootstrap. Both paths
    can be configured; first match wins (admin-existence probe
    short-circuits the second).
  - HandleCallback's empty-mapping fail-closed check moved AFTER the
    hook so a fresh deployment with zero group_role_mappings can
    still mint the first admin.
  - 5 tests in service_test.go: hook grants admin on match, hook
    returns false preserves empty-mapping fail-closed, admin-already-
    exists silently falls through to normal mapping, hook-error wraps
    + bubbles, idempotent when admin is already in the mapped role set.

Phase 7.5 — Break-glass admin (Decision 4, default-OFF):

Migration 000038 ships:

  - breakglass_credentials table — at-most-one-credential-per-actor
    (UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout
    state machine (failure_count, locked_until, last_failure_at).
    FK CASCADE on users(id) so deleting a user atomically removes
    their credential.
  - Two new permissions seeded into r-admin only:
      auth.breakglass.admin — set/rotate/unlock/remove credentials.
      auth.breakglass.login — actor uses break-glass to log in.
    CanonicalPermissions extended in lockstep.

internal/auth/breakglass/service.go (~580 LOC):

  - Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED.
  - SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4,
    salt=16 random bytes, output=32 bytes); per-password random salt;
    PHC-format hash output. Min 12 / max 256 byte input.
  - Authenticate: constant-time-compare via subtle.ConstantTimeCompare
    on every code path. Identical 401 + identical timing across the
    wrong-password / locked-account / non-existent-actor paths so an
    attacker cannot probe whether a given actor has break-glass
    configured. Non-existent-actor + locked-account paths run a
    verifyDummy() Argon2id pass for timing parity. Lockout state
    machine: failure_count++ on every wrong attempt; threshold (default
    5) trips locked_until = NOW() + duration (default 15m). Successful
    Authenticate resets the counter. Reset-window: failures aged out
    after CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h)
    auto-reset on next attempt.
  - Unlock + RemoveCredential: admin-only (auth.breakglass.admin
    gated at the router via rbacGate). Audit rows on every operation.
  - All public methods refuse to act when Enabled()==false (returns
    ErrDisabled; the handler maps to HTTP 404 — surface invisibility).

internal/repository/postgres/breakglass.go ships the 5-method
postgres impl with atomic single-statement IncrementFailure (so
concurrent racing wrong-password attempts can't observe an
intermediate state and slip past the threshold) and idempotent
ResetFailureCount.

internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP
surface:

  - POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per
    source IP via the existing rate limiter; returns 404 when
    disabled). On success sets the post-login session cookie + CSRF
    cookie via SessionService.Create + 204. On any failure:
    uniform 401 + identical timing (the service has already audited
    the specific failure category).
  - POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin)
  - POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
    (auth.breakglass.admin)
  - DELETE /api/v1/auth/breakglass/credentials/{actor_id}
    (auth.breakglass.admin)

Admin endpoints share the surface-invisibility property: when
CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns
404 (not 403) so probing via the admin surface gets the same signal
as probing the login endpoint.

Tests (internal/auth/breakglass/service_test.go):

All 8 Phase 7.5 spec-mandated negative cases:

  1. Service.Enabled()==false → all ops return ErrDisabled.
  2. Wrong password → ErrInvalidCredentials, failure_count++,
     audit row with event_category=auth.
  3. Failure_count exceeds threshold → locked, subsequent attempts
     (including with the CORRECT password) return identical-shape
     401 while the lockout window holds.
  4. Lockout window expires → next attempt with correct password
     succeeds + resets the counter.
  5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword.
  6. Password leak hygiene — the service has zero slog calls; the
     audit-row map literal never includes the password plaintext.
  7. Argon2id hash never appears in logs OR API responses — pinned
     by `json:"-"` tag on BreakglassCredential.PasswordHash + a
     belt-and-braces json.Marshal probe asserting the hash bytes
     never appear in the marshaled output.
  8. Constant-time-compare verified via timing-statistical test —
     wrong-password vs no-credential paths take statistically
     indistinguishable time (within 5x ratio). The verifyDummy()
     hash compute on the no-credential + locked paths is what
     keeps timing parity; absent that, an attacker could side-
     channel "actor doesn't have a credential" via timing.

Plus coverage-lift batch covering: SetPassword first-time vs rotate,
no-caller-id rejection, no-target-id rejection, RNG failure surface,
Authenticate happy-path mints session, no-credential audit row,
session-mint-failure surface, FailureResetInterval recycle, Unlock
+ RemoveCredential happy paths, hash-format unit tests (round-trip,
mismatch, malformed/wrong-version/bad-base64 formats), nil-audit +
nil-session pass-through.

Coverage on internal/auth/breakglass/ at 91.5% per-statement (above
the Phase 7.5 spec ≥ 90% floor).

cmd/server/main.go wiring:

  - Constructs breakglassRepo + breakglassService + breakglassHandler
    after the OIDC service block.
  - breakglassSessionMinterAdapter shim bridges *session.Service.Create
    to the breakglass.SessionMinter port.
  - Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator
    visibility for the deliberate SSO-bypass).

internal/config/config.go gains:

  - AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for
    Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list +
    CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
  - AuthConfig.Breakglass nested struct with 4 env vars
    (CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION
    + LOCKOUT_RESET_INTERVAL).

Router wiring:

  - 4 new breakglass routes registered when reg.AuthBreakglass != nil;
    public login route via direct r.mux.Handle (auth-exempt), 3 admin
    routes via r.Register + rbacGate(auth.breakglass.admin).
  - POST /auth/breakglass/login pinned in AuthExemptRouterRoutes
    allowlist with Phase 7.5 justification.
  - SpecParityExceptions extended with 4 new entries documenting
    the Phase 7.5 deferral of full per-endpoint OpenAPI rows
    (handler doc-block at the top of auth_breakglass.go is the
    operator-facing reference).

Threat model (encoded in service.go + auth_breakglass.go doc-blocks
+ migration 000038 docstrings, to be promoted to docs/operator/auth-
threat-model.md in Phase 12):

  - Break-glass is a deliberate bypass of the SSO security boundary.
    An attacker who phishes the password OR finds it in a compromised
    password manager bypasses MFA, OIDC, and every group-claim gate.
  - Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-
    state. Enable only during SSO-broken incidents. Disable after
    recovery.
  - WebAuthn pairing (v3 per Decision 12) is the load-bearing second
    factor. Without it, break-glass is best treated as an emergency-
    only path.
  - Audit trail surfaces every break-glass action under
    event_category=auth; the auditor role can monitor for unexpected
    break-glass logins.

Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/auth/oidc (3.0s; new
Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives),
internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage
batch passing), internal/config + internal/domain/auth + internal/api/
router + internal/api/handler all green, no regressions in Bundle 1
packages.
2026-05-10 06:51:41 +00:00