Commit Graph

2 Commits

Author SHA1 Message Date
shankar0123 1d01c87663 auth-bundle-2 Phase 7 + Phase 7.5: OIDC first-admin bootstrap +
break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility)

Phase 7 — OIDC first-admin bootstrap (Decision 3):

  - Optional AdminBootstrapHook closure on *oidc.Service. When wired,
    HandleCallback consults the hook AFTER group resolution + user
    upsert and BEFORE the empty-mapping fail-closed check. Hook
    receives (providerID, groups, userID); returns grantAdmin=true
    when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no
    admin exists yet in the tenant.
  - cmd/server/main.go wires the hook as a closure that:
      * Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured).
      * Probes AdminExists via authActorRoleRepo (admin-already-exists
        silently returns false; bootstrap mode is one-shot per tenant).
      * Walks group intersection.
      * On match: grants r-admin via authActorRoleRepo.Grant + emits
        the bootstrap.oidc_first_admin audit row with
        event_category=auth + INFO log.
  - Coexists with the Bundle 1 env-var-token bootstrap. Both paths
    can be configured; first match wins (admin-existence probe
    short-circuits the second).
  - HandleCallback's empty-mapping fail-closed check moved AFTER the
    hook so a fresh deployment with zero group_role_mappings can
    still mint the first admin.
  - 5 tests in service_test.go: hook grants admin on match, hook
    returns false preserves empty-mapping fail-closed, admin-already-
    exists silently falls through to normal mapping, hook-error wraps
    + bubbles, idempotent when admin is already in the mapped role set.

Phase 7.5 — Break-glass admin (Decision 4, default-OFF):

Migration 000038 ships:

  - breakglass_credentials table — at-most-one-credential-per-actor
    (UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout
    state machine (failure_count, locked_until, last_failure_at).
    FK CASCADE on users(id) so deleting a user atomically removes
    their credential.
  - Two new permissions seeded into r-admin only:
      auth.breakglass.admin — set/rotate/unlock/remove credentials.
      auth.breakglass.login — actor uses break-glass to log in.
    CanonicalPermissions extended in lockstep.

internal/auth/breakglass/service.go (~580 LOC):

  - Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED.
  - SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4,
    salt=16 random bytes, output=32 bytes); per-password random salt;
    PHC-format hash output. Min 12 / max 256 byte input.
  - Authenticate: constant-time-compare via subtle.ConstantTimeCompare
    on every code path. Identical 401 + identical timing across the
    wrong-password / locked-account / non-existent-actor paths so an
    attacker cannot probe whether a given actor has break-glass
    configured. Non-existent-actor + locked-account paths run a
    verifyDummy() Argon2id pass for timing parity. Lockout state
    machine: failure_count++ on every wrong attempt; threshold (default
    5) trips locked_until = NOW() + duration (default 15m). Successful
    Authenticate resets the counter. Reset-window: failures aged out
    after CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h)
    auto-reset on next attempt.
  - Unlock + RemoveCredential: admin-only (auth.breakglass.admin
    gated at the router via rbacGate). Audit rows on every operation.
  - All public methods refuse to act when Enabled()==false (returns
    ErrDisabled; the handler maps to HTTP 404 — surface invisibility).

internal/repository/postgres/breakglass.go ships the 5-method
postgres impl with atomic single-statement IncrementFailure (so
concurrent racing wrong-password attempts can't observe an
intermediate state and slip past the threshold) and idempotent
ResetFailureCount.

internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP
surface:

  - POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per
    source IP via the existing rate limiter; returns 404 when
    disabled). On success sets the post-login session cookie + CSRF
    cookie via SessionService.Create + 204. On any failure:
    uniform 401 + identical timing (the service has already audited
    the specific failure category).
  - POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin)
  - POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
    (auth.breakglass.admin)
  - DELETE /api/v1/auth/breakglass/credentials/{actor_id}
    (auth.breakglass.admin)

Admin endpoints share the surface-invisibility property: when
CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns
404 (not 403) so probing via the admin surface gets the same signal
as probing the login endpoint.

Tests (internal/auth/breakglass/service_test.go):

All 8 Phase 7.5 spec-mandated negative cases:

  1. Service.Enabled()==false → all ops return ErrDisabled.
  2. Wrong password → ErrInvalidCredentials, failure_count++,
     audit row with event_category=auth.
  3. Failure_count exceeds threshold → locked, subsequent attempts
     (including with the CORRECT password) return identical-shape
     401 while the lockout window holds.
  4. Lockout window expires → next attempt with correct password
     succeeds + resets the counter.
  5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword.
  6. Password leak hygiene — the service has zero slog calls; the
     audit-row map literal never includes the password plaintext.
  7. Argon2id hash never appears in logs OR API responses — pinned
     by `json:"-"` tag on BreakglassCredential.PasswordHash + a
     belt-and-braces json.Marshal probe asserting the hash bytes
     never appear in the marshaled output.
  8. Constant-time-compare verified via timing-statistical test —
     wrong-password vs no-credential paths take statistically
     indistinguishable time (within 5x ratio). The verifyDummy()
     hash compute on the no-credential + locked paths is what
     keeps timing parity; absent that, an attacker could side-
     channel "actor doesn't have a credential" via timing.

Plus coverage-lift batch covering: SetPassword first-time vs rotate,
no-caller-id rejection, no-target-id rejection, RNG failure surface,
Authenticate happy-path mints session, no-credential audit row,
session-mint-failure surface, FailureResetInterval recycle, Unlock
+ RemoveCredential happy paths, hash-format unit tests (round-trip,
mismatch, malformed/wrong-version/bad-base64 formats), nil-audit +
nil-session pass-through.

Coverage on internal/auth/breakglass/ at 91.5% per-statement (above
the Phase 7.5 spec ≥ 90% floor).

cmd/server/main.go wiring:

  - Constructs breakglassRepo + breakglassService + breakglassHandler
    after the OIDC service block.
  - breakglassSessionMinterAdapter shim bridges *session.Service.Create
    to the breakglass.SessionMinter port.
  - Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator
    visibility for the deliberate SSO-bypass).

internal/config/config.go gains:

  - AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for
    Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list +
    CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
  - AuthConfig.Breakglass nested struct with 4 env vars
    (CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION
    + LOCKOUT_RESET_INTERVAL).

Router wiring:

  - 4 new breakglass routes registered when reg.AuthBreakglass != nil;
    public login route via direct r.mux.Handle (auth-exempt), 3 admin
    routes via r.Register + rbacGate(auth.breakglass.admin).
  - POST /auth/breakglass/login pinned in AuthExemptRouterRoutes
    allowlist with Phase 7.5 justification.
  - SpecParityExceptions extended with 4 new entries documenting
    the Phase 7.5 deferral of full per-endpoint OpenAPI rows
    (handler doc-block at the top of auth_breakglass.go is the
    operator-facing reference).

Threat model (encoded in service.go + auth_breakglass.go doc-blocks
+ migration 000038 docstrings, to be promoted to docs/operator/auth-
threat-model.md in Phase 12):

  - Break-glass is a deliberate bypass of the SSO security boundary.
    An attacker who phishes the password OR finds it in a compromised
    password manager bypasses MFA, OIDC, and every group-claim gate.
  - Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-
    state. Enable only during SSO-broken incidents. Disable after
    recovery.
  - WebAuthn pairing (v3 per Decision 12) is the load-bearing second
    factor. Without it, break-glass is best treated as an emergency-
    only path.
  - Audit trail surfaces every break-glass action under
    event_category=auth; the auditor role can monitor for unexpected
    break-glass logins.

Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/auth/oidc (3.0s; new
Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives),
internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage
batch passing), internal/config + internal/domain/auth + internal/api/
router + internal/api/handler all green, no regressions in Bundle 1
packages.
2026-05-10 06:51:41 +00:00
shankar0123 b0ac24fbf8 auth-bundle-2 Phase 1: OIDC + Session + User + Breakglass domain types
Phase 1 ships the persisted-shape types Bundle 2 needs end-to-end.
No DB migrations, no service layer, no HTTP handlers; Phase 2 ships
the SQL, Phase 3+ ship the consumers. Each type has a Validate()
method that enforces the on-disk invariants the schema will mirror,
and a focused _test.go that pins each invariant's failure mode.

Per-package summary:

internal/auth/oidc/domain/ (OIDCProvider + GroupRoleMapping):
* OIDCProvider carries the operator-configured IdP record. Fields
  match the prompt's Phase 1 list plus IATWindowSeconds and
  JWKSCacheTTLSeconds (Phase 3 references these by name; landing
  them in Phase 1's domain type avoids the lying-field gap).
  ClientSecretEncrypted is opaque from this layer; it is the v2 blob
  produced by internal/crypto/encryption.go and is `json:"-"` so it
  never wire-leaks.
* Validate() rejects: invalid id prefix, empty name, non-https
  issuer_url (matches Phase 3's "JWKS endpoint MUST be HTTPS"),
  empty client_id, empty client_secret_encrypted, non-https
  redirect_uri, invalid groups_claim_format, scopes missing openid,
  IAT window outside (0, 600], JWKS cache TTL below 60s. Defaults
  applied in-place: GroupsClaimPath="groups", GroupsClaimFormat=
  "string-array", Scopes=["openid","profile","email"],
  IATWindowSeconds=300, JWKSCacheTTLSeconds=3600,
  TenantID="t-default".
* GroupRoleMapping carries the operator-configured group-to-role
  rule. Validate() pins prefix conventions ("grm-", "op-", "r-")
  and non-empty group name.
* 18 tests across happy-path + every negative invariant.

internal/auth/session/domain/ (Session + SessionSigningKey):
* Session covers BOTH the post-login row (full 1h-idle/8h-absolute
  cookie lifecycle) AND the Phase 5 pre-login row (10-minute TTL,
  carries OIDC state+nonce+PKCE verifier across the IdP redirect).
  IsPreLogin discriminates. CSRFTokenHash holds SHA-256 of the
  CSRF token plaintext (the plaintext lives in a JS-readable
  certctl_csrf cookie; storing only the hash on the row defends
  against DB-read leaks per the Phase 4 CSRF contract).
* Validate() pins: id prefix "ses-", non-empty actor id/type,
  signing key id prefix "sk-", AbsoluteExpiresAt strictly > Idle,
  IdleExpiresAt strictly > CreatedAt, CSRFTokenHash exactly 64
  lowercase hex chars when set.
* Cookie naming constants pinned by a separate test
  (TestCookieNamingConstants) so a future rename can't silently
  break the GUI's web/src/api/client.ts which reads these names by
  string.
* SessionSigningKey stores the v2-encrypted HMAC key material; the
  retired-before-created invariant catches malformed rows. 14
  tests across both types.

internal/auth/user/domain/ (User):
* Federated-human identity for SSO logins. Distinct from Bundle 1's
  free-form actor_id strings: actor_roles.actor_id = User.ID for
  federated humans (per the prompt's note about how the two
  identity systems intersect).
* WebAuthnCredentials JSONB column reserved for v3 (Decision 12);
  defaults to "[]" on Validate() so Bundle 2 + v3 share the same
  on-disk format from day one.
* Email validation is intentionally loose (basic shape: one @,
  non-empty local + domain, no whitespace, dot in domain). RFC 5321
  / 5322 grammars are not enforced; the IdP issued the email and
  we trust its shape, only rejecting gross corruption.
* 8 tests across happy-path + invalid-id + empty-email +
  malformed-email + invalid-provider-id + tenant defaulting +
  WebAuthn-credentials passthrough.

internal/auth/breakglass/domain/ (BreakglassCredential):
* Phase 7.5 type. Argon2id PHC-format password hash; Validate()
  pins the Argon2id magic prefix so non-Argon2id formats (bcrypt,
  pbkdf2, plaintext) are rejected at the persistence boundary.
* MinPasswordLengthBytes (12) + MaxPasswordLengthBytes (256)
  constants pinned by a dedicated test so the operator-facing
  password-strength contract can't drift silently.
* IsLocked(now) helper exposes the lockout state machine for the
  Phase 7.5 service to consume; the lockout window default is
  15min in the service layer.
* 9 tests across happy-path + per-invariant negative + lockout
  state machine + tenant defaulting.

Cross-cutting:
* Every type has json:"-" on the encrypted-credential field
  (ClientSecretEncrypted, KeyMaterialEncrypted, PasswordHash,
  CSRFTokenHash) so even a misconfigured handler that marshals the
  domain type directly into a response body cannot leak the
  secret. Mirrors Bundle 1's pattern for issuer/target credentials.
* Every type carries TenantID with Validate() defaulting to
  authdomain.DefaultTenantID. Forward-compat for the future
  managed-service multi-tenant activation; Bundle 2 ships
  single-tenant.

Verifications:
* gofmt -l clean across all 8 new files (one round-trip required to
  satisfy Go 1.19+ doc-comment list-formatting rules in
  session/domain/types.go).
* go vet clean on internal/auth/oidc/... + session/... + user/... +
  breakglass/...
* go test -short -count=1 green on all four new domain packages
  (49 test functions total).
* go test -short -count=1 still green on Bundle 1 packages
  (internal/auth, internal/auth/bootstrap, internal/service/auth,
  internal/config).
* govulncheck ./... clean (M-024 hard CI gate).
* All 24 ci-guards pass locally.

Phase 1 exit criteria from cowork/auth-bundle-2-prompt.md:
* All types compile: yes.
* Validators have at least 5 test cases each: yes (smallest is
  User with 8 tests; OIDCProvider has 13).
* make verify equivalent green: gofmt + vet + go test pass
  (golangci-lint deferred to CI per the same operating-rule
  pattern Phase 0 used).
2026-05-10 03:41:46 +00:00