auth-bundle-2 Phase 7 + Phase 7.5: OIDC first-admin bootstrap + break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility)

Phase 7 — OIDC first-admin bootstrap (Decision 3):

  - Optional AdminBootstrapHook closure on *oidc.Service. When wired,
    HandleCallback consults the hook AFTER group resolution + user
    upsert and BEFORE the empty-mapping fail-closed check. Hook
    receives (providerID, groups, userID); returns grantAdmin=true
    when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no
    admin exists yet in the tenant.
  - cmd/server/main.go wires the hook as a closure that:
      * Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured).
      * Probes AdminExists via authActorRoleRepo (admin-already-exists
        silently returns false; bootstrap mode is one-shot per tenant).
      * Walks group intersection.
      * On match: grants r-admin via authActorRoleRepo.Grant + emits
        the bootstrap.oidc_first_admin audit row with
        event_category=auth + INFO log.
  - Coexists with the Bundle 1 env-var-token bootstrap. Both paths
    can be configured; first match wins (admin-existence probe
    short-circuits the second).
  - HandleCallback's empty-mapping fail-closed check moved AFTER the
    hook so a fresh deployment with zero group_role_mappings can
    still mint the first admin.
  - 5 tests in service_test.go: hook grants admin on match, hook
    returns false preserves empty-mapping fail-closed, admin-already-
    exists silently falls through to normal mapping, hook-error wraps
    + bubbles, idempotent when admin is already in the mapped role set.

Phase 7.5 — Break-glass admin (Decision 4, default-OFF):

Migration 000038 ships:

  - breakglass_credentials table — at-most-one-credential-per-actor
    (UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout
    state machine (failure_count, locked_until, last_failure_at).
    FK CASCADE on users(id) so deleting a user atomically removes
    their credential.
  - Two new permissions seeded into r-admin only:
      auth.breakglass.admin — set/rotate/unlock/remove credentials.
      auth.breakglass.login — actor uses break-glass to log in.
    CanonicalPermissions extended in lockstep.
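  For illustration, a Go row type for breakglass_credentials might look
  like the sketch below. Only the `json:"-"` tag on PasswordHash comes
  from this commit (test 7 in the list further down pins it); the other
  field names and types are assumptions, and marshalLeaksHash is a
  hypothetical helper showing the marshal-probe idea.

```go
package main

import (
	"bytes"
	"encoding/json"
	"time"
)

// BreakglassCredential is a hypothetical Go mirror of one
// breakglass_credentials row. Field names other than PasswordHash are
// assumptions.
type BreakglassCredential struct {
	ID            string     `json:"id"`
	ActorID       string     `json:"actor_id"` // UNIQUE(actor_id); FK to users(id) ON DELETE CASCADE
	PasswordHash  string     `json:"-"`        // PHC string; never serialized
	FailureCount  int        `json:"failure_count"`
	LockedUntil   *time.Time `json:"locked_until,omitempty"`
	LastFailureAt *time.Time `json:"last_failure_at,omitempty"`
}

// marshalLeaksHash marshals the struct and reports whether the hash bytes
// appear anywhere in the output (the belt-and-braces probe).
func marshalLeaksHash(c BreakglassCredential) (bool, error) {
	out, err := json.Marshal(c)
	if err != nil {
		return false, err
	}
	return bytes.Contains(out, []byte(c.PasswordHash)), nil
}
```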

internal/auth/breakglass/service.go (~580 LOC):

  - Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED.
  - SetPassword: Argon2id with m=64MiB, t=3, p=4, a 16-byte random salt
    per password, and a 32-byte output (RFC 9106's second recommended
    parameter set, above the OWASP 2024 minimum of m=19MiB, t=2, p=1);
    PHC-format hash output. Min 12 / max 256 byte input.
  - Authenticate: subtle.ConstantTimeCompare on every code path. The
    wrong-password / locked-account / non-existent-actor paths all
    return the same 401 with comparable timing, so an attacker cannot
    probe whether a given actor has break-glass configured; the
    non-existent-actor and locked-account paths run a verifyDummy()
    Argon2id pass for timing parity. Lockout state machine:
    failure_count++ on every wrong attempt; the threshold (default 5)
    trips locked_until = NOW() + duration (default 15m). A successful
    Authenticate resets the counter. Reset window: failures older than
    CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h) are cleared
    on the next attempt.
  - Unlock + RemoveCredential: admin-only (auth.breakglass.admin
    gated at the router via rbacGate). Audit rows on every operation.
  - All public methods refuse to act when Enabled()==false (returns
    ErrDisabled; the handler maps to HTTP 404 — surface invisibility).

internal/repository/postgres/breakglass.go ships the 5-method
postgres impl with atomic single-statement IncrementFailure (so
concurrent racing wrong-password attempts can't observe an
intermediate state and slip past the threshold) and idempotent
ResetFailureCount.

internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP
surface:

  - POST /auth/breakglass/login (auth-exempt; rate-limited to 5/min per
    source IP via the existing rate limiter; returns 404 when
    disabled). On success, mints a session via SessionService.Create,
    sets the post-login session + CSRF cookies, and returns 204. On any
    failure: uniform 401 with comparable timing (the service has
    already audited the specific failure category).
  - POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin)
  - POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
    (auth.breakglass.admin)
  - DELETE /api/v1/auth/breakglass/credentials/{actor_id}
    (auth.breakglass.admin)

Admin endpoints share the surface-invisibility property: when
CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns
404 (not 403) so probing via the admin surface gets the same signal
as probing the login endpoint.
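The status-code rules for both surfaces reduce to two small mapping
functions. The sentinel values (including the assumed ErrLocked) are
modeled on the names in this message, and http.StatusOK stands in for
whatever success code each admin handler really returns. The sketch
also assumes the disabled check runs before the permission check,
which is what lets a disabled deployment answer 404 even to callers
lacking auth.breakglass.admin.

```go
package main

import (
	"errors"
	"net/http"
)

// Hypothetical sentinel errors; ErrLocked is an assumed name.
var (
	ErrDisabled           = errors.New("breakglass: disabled")
	ErrInvalidCredentials = errors.New("breakglass: invalid credentials")
	ErrLocked             = errors.New("breakglass: locked")
)

// loginStatus maps a service result to the status for
// POST /auth/breakglass/login. Every authentication failure collapses
// to 401, so the response does not reveal which check failed.
func loginStatus(err error) int {
	switch {
	case err == nil:
		return http.StatusNoContent
	case errors.Is(err, ErrDisabled):
		return http.StatusNotFound
	default:
		return http.StatusUnauthorized
	}
}

// adminStatus maps the admin endpoints: 404 when disabled (not 403),
// then the permission check, then success.
func adminStatus(enabled, hasPermission bool) int {
	switch {
	case !enabled:
		return http.StatusNotFound
	case !hasPermission:
		return http.StatusForbidden
	default:
		return http.StatusOK
	}
}
```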

Tests (internal/auth/breakglass/service_test.go):

All 8 Phase 7.5 spec-mandated negative cases:

  1. Service.Enabled()==false → all ops return ErrDisabled.
  2. Wrong password → ErrInvalidCredentials, failure_count++,
     audit row with event_category=auth.
  3. failure_count exceeds threshold → account locked; subsequent
     attempts (including with the CORRECT password) return an
     identical-shape 401 while the lockout window holds.
  4. Lockout window expires → next attempt with correct password
     succeeds + resets the counter.
  5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword.
  6. Password leak hygiene — the service has zero slog calls; the
     audit-row map literal never includes the password plaintext.
  7. Argon2id hash never appears in logs OR API responses — pinned
     by `json:"-"` tag on BreakglassCredential.PasswordHash + a
     belt-and-braces json.Marshal probe asserting the hash bytes
     never appear in the marshaled output.
  8. Constant-time-compare verified via a statistical timing test: the
     wrong-password and no-credential paths must stay within a 5x
     timing ratio of each other. The verifyDummy() hash computation on
     the no-credential and locked paths is what keeps timing parity;
     without it, an attacker could side-channel "actor doesn't have a
     credential" via timing.

Plus coverage-lift batch covering: SetPassword first-time vs rotate,
no-caller-id rejection, no-target-id rejection, RNG failure surface,
Authenticate happy-path mints session, no-credential audit row,
session-mint-failure surface, FailureResetInterval recycle, Unlock
+ RemoveCredential happy paths, hash-format unit tests (round-trip,
mismatch, malformed/wrong-version/bad-base64 formats), nil-audit +
nil-session pass-through.

Coverage on internal/auth/breakglass/ at 91.5% per-statement (above
the Phase 7.5 spec ≥ 90% floor).

cmd/server/main.go wiring:

  - Constructs breakglassRepo + breakglassService + breakglassHandler
    after the OIDC service block.
  - breakglassSessionMinterAdapter shim bridges *session.Service.Create
    to the breakglass.SessionMinter port.
  - Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator
    visibility for the deliberate SSO-bypass).

internal/config/config.go gains:

  - AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for
    Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list +
    CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
  - AuthConfig.Breakglass nested struct with 4 env vars
    (CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION
    + LOCKOUT_RESET_INTERVAL).

Router wiring:

  - 4 new breakglass routes registered when reg.AuthBreakglass != nil;
    public login route via direct r.mux.Handle (auth-exempt), 3 admin
    routes via r.Register + rbacGate(auth.breakglass.admin).
  - POST /auth/breakglass/login pinned in AuthExemptRouterRoutes
    allowlist with Phase 7.5 justification.
  - SpecParityExceptions extended with 4 new entries documenting
    the Phase 7.5 deferral of full per-endpoint OpenAPI rows
    (handler doc-block at the top of auth_breakglass.go is the
    operator-facing reference).

Threat model (encoded in service.go + auth_breakglass.go doc-blocks
+ migration 000038 docstrings, to be promoted to docs/operator/auth-
threat-model.md in Phase 12):

  - Break-glass is a deliberate bypass of the SSO security boundary.
    An attacker who phishes the password OR finds it in a compromised
    password manager bypasses MFA, OIDC, and every group-claim gate.
  - Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-
    state. Enable only during SSO-broken incidents. Disable after
    recovery.
  - WebAuthn pairing (v3 per Decision 12) is the load-bearing second
    factor. Without it, break-glass is best treated as an emergency-
    only path.
  - Audit trail surfaces every break-glass action under
    event_category=auth; the auditor role can monitor for unexpected
    break-glass logins.

Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/auth/oidc (3.0s; new
Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives),
internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage
batch passing), internal/config + internal/domain/auth + internal/api/
router + internal/api/handler all green, no regressions in Bundle 1
packages.
Author: shankar0123
Date: 2026-05-10 06:51:41 +00:00
parent 98cb3780d8
commit 5204f1b5fd
16 changed files with 2356 additions and 5 deletions
+77
@@ -0,0 +1,77 @@
// Package oidc — Auth Bundle 2 Phase 7 / OIDC bootstrap hook.
//
// Phase 7 ships the "first OIDC login matching CERTCTL_BOOTSTRAP_ADMIN_GROUPS
// becomes admin" recovery path. This is Decision 3's preferred bootstrap:
// fresh deployments configure the OIDC provider + group mapping, and the
// first user who logs in via OIDC + carries any of the configured
// bootstrap admin groups is auto-granted r-admin. Subsequent logins fall
// through to normal group→role mapping.
//
// The hook is OPTIONAL — when not wired, OIDC behaves byte-identically
// to Phase 3. When wired, it runs after group resolution + user upsert
// and BEFORE the empty-mapping fail-closed check, so a fresh deployment
// with no group_role_mappings can still mint the first admin via the
// bootstrap path. The hook itself is responsible for the AdminExists
// probe (so admin-already-exists deployments fall through to normal
// mapping).
//
// Audit + lockout semantics:
//
//   - The hook emits the bootstrap.oidc_first_admin audit row with
//     event_category=auth on every successful first-admin grant.
//   - The hook is one-shot per tenant: once an admin exists in the
//     tenant, the AdminExists probe returns true and subsequent OIDC
//     logins skip the bootstrap path entirely.
//   - The hook NEVER grants admin to an actor whose groups don't match
//     CERTCTL_BOOTSTRAP_ADMIN_GROUPS. The intersection is not
//     constant-time (it simply walks two slices) and does not need to
//     be; the relevant guarantee is that no group string can be
//     inferred from the hook's pass/fail decision, because the hook
//     always emits the same audit row shape.
package oidc

import "context"

// AdminBootstrapHook is the optional closure HandleCallback consults
// after group resolution + user upsert. The hook decides whether the
// authenticating user should be auto-granted r-admin via the OIDC
// first-admin bootstrap path.
//
// Parameters:
//   - providerID: the OIDCProvider id (so the hook can match against
//     CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
//   - groups: the IdP-supplied group names (so the hook can match
//     against CERTCTL_BOOTSTRAP_ADMIN_GROUPS).
//   - userID: the just-upserted users.id (so the hook can grant r-admin
//     via the ActorRoleRepository).
//
// Returns:
//   - grantAdmin: true => HandleCallback appends r-admin to the user's
//     resolved role IDs (idempotent; r-admin is appended only if not
//     already present from normal mapping).
//   - err: non-nil short-circuits HandleCallback with a wrapped error.
//     The hook should NOT return an error for the non-match case
//     (provider doesn't match / groups don't intersect / admin already
//     exists); those are silent skips returning grantAdmin=false.
type AdminBootstrapHook func(ctx context.Context, providerID string, groups []string, userID string) (grantAdmin bool, err error)

// SetAdminBootstrapHook wires the Phase 7 OIDC bootstrap hook.
// cmd/server/main.go calls this after construction; tests stub it
// inline. Nil resets to no-bootstrap-hook (the default).
func (s *Service) SetAdminBootstrapHook(hook AdminBootstrapHook) {
	s.adminBootstrapHook = hook
}

// appendIfMissing returns ss with v appended IFF v is not already in
// the slice. Used by HandleCallback to extend roleIDs with r-admin
// idempotently when the bootstrap hook fires AND mappings.Map already
// returned r-admin (an unlikely-but-possible config where the same
// role is granted by both paths).
func appendIfMissing(ss []string, v string) []string {
	for _, s := range ss {
		if s == v {
			return ss
		}
	}
	return append(ss, v)
}
+35 -5
@@ -79,6 +79,12 @@ type Service struct {
 	mu       sync.RWMutex
 	cache    map[string]*providerEntry // keyed by provider ID
 	clockNow func() time.Time          // injectable for tests
+
+	// adminBootstrapHook is the optional Phase 7 first-admin bootstrap
+	// closure. When set, HandleCallback consults it after group
+	// resolution + user upsert; on grantAdmin=true the user's resolved
+	// role IDs are extended with r-admin. See bootstrap_hook.go.
+	adminBootstrapHook AdminBootstrapHook
 }
 
 // providerEntry caches the go-oidc Provider + the OAuth2 config + the
@@ -503,14 +509,14 @@ func (s *Service) HandleCallback(
 		}
 	}
 
-	// Step 9: map groups to role IDs. Empty result => fail closed.
+	// Step 9: map groups to role IDs. Phase 7 defers the empty-mapping
+	// fail-closed check until after the bootstrap hook gets a chance to
+	// grant r-admin (Step 11) — a fresh deployment with zero group_role_
+	// mappings still needs to mint the first admin.
 	roleIDs, err := s.mappings.Map(ctx, providerID, groups)
 	if err != nil {
 		return nil, fmt.Errorf("oidc: group-role mapping lookup: %w", err)
 	}
-	if len(roleIDs) == 0 {
-		return nil, ErrGroupsUnmapped
-	}
 
 	// Step 10: upsert the user record. Per Phase 1 contract, identity
 	// is per-(provider, oidc_subject); a person logging in via a new
@@ -520,7 +526,31 @@ func (s *Service) HandleCallback(
 		return nil, fmt.Errorf("oidc: upsert user: %w", err)
 	}
 
-	// Step 11: mint a post-login session via Phase 4's SessionService.
+	// Step 11 — Phase 7: OIDC first-admin bootstrap hook. Optional;
+	// runs after upsertUser. The hook checks AdminExists + group
+	// intersection against CERTCTL_BOOTSTRAP_ADMIN_GROUPS; on first
+	// match it grants r-admin to the user via ActorRoleRepository
+	// + emits a bootstrap.oidc_first_admin audit row + returns
+	// grantAdmin=true so we ensure r-admin lands in the role set.
+	// Subsequent logins (admin-already-exists) silently skip via
+	// grantAdmin=false.
+	if s.adminBootstrapHook != nil {
+		grantAdmin, herr := s.adminBootstrapHook(ctx, providerID, groups, user.ID)
+		if herr != nil {
+			return nil, fmt.Errorf("oidc: admin bootstrap: %w", herr)
+		}
+		if grantAdmin {
+			roleIDs = appendIfMissing(roleIDs, "r-admin")
+		}
+	}
+
+	// Step 12: empty-mapping fail-closed. Phase 3 contract preserved —
+	// deferred from Step 9 only to give the bootstrap hook a chance.
+	if len(roleIDs) == 0 {
+		return nil, ErrGroupsUnmapped
+	}
+
+	// Step 13: mint a post-login session via Phase 4's SessionService.
 	cookieValue, csrfToken, err := s.sessions.MintForUser(ctx, user, roleIDs, ip, userAgent)
 	if err != nil {
 		return nil, fmt.Errorf("oidc: session mint: %w", err)
+144
@@ -1092,6 +1092,150 @@ func TestService_RandomB64URL_ProducesNonEmptyAndUnique(t *testing.T) {
	}
}

// =============================================================================
// Phase 7 — OIDC first-admin bootstrap hook tests.
// =============================================================================

// Phase 7 spec test #1: fresh DB + OIDC login matching bootstrap groups
// → user becomes admin. Pin: when the hook returns grantAdmin=true, the
// resolved roleIDs include r-admin even if mappings.Map returned empty.
func TestService_BootstrapHook_GrantsAdminOnMatch(t *testing.T) {
	idp := newMockIdP(t)
	prov := makeProvider(idp.URL(), "op-bootstrap")
	pl := newStubPreLogin()
	mappings := &stubMappings{roleIDs: nil} // intentionally empty — fresh deploy
	users := newStubUsers()
	sessions := &stubSessions{}
	svc := NewService(&stubProviderLookup{provider: prov}, mappings, users, sessions, pl, "")
	hookCalled := false
	svc.SetAdminBootstrapHook(func(_ context.Context, providerID string, groups []string, userID string) (bool, error) {
		hookCalled = true
		// Verify the hook receives the right inputs.
		if providerID != "op-bootstrap" {
			t.Errorf("hook providerID = %q; want op-bootstrap", providerID)
		}
		if len(groups) == 0 {
			t.Errorf("hook groups empty; expected at least one")
		}
		if userID == "" {
			t.Errorf("hook userID empty; expected upserted user id")
		}
		return true, nil // grant admin
	})
	cookie, _, _ := pl.CreatePreLogin(context.Background(), "op-bootstrap", "s", "test-nonce-fixed", "v-bootstrapxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
	res, err := svc.HandleCallback(context.Background(), cookie, "code", "s", "10.0.0.1", "Mozilla/5.0")
	if err != nil {
		t.Fatalf("HandleCallback: %v", err)
	}
	if !hookCalled {
		t.Errorf("bootstrap hook never invoked")
	}
	if !sliceContains(res.RoleIDs, "r-admin") {
		t.Errorf("expected r-admin in RoleIDs after bootstrap; got %v", res.RoleIDs)
	}
}

// Phase 7 spec test #2: fresh DB + OIDC login NOT matching bootstrap
// groups → user upserted but mapping fails closed (no admin grant).
// The hook returns grantAdmin=false; mappings.Map empty → ErrGroupsUnmapped.
func TestService_BootstrapHook_NoMatchPreservesEmptyMappingFailClosed(t *testing.T) {
	idp := newMockIdP(t)
	svc, pl := newServiceWithProviderAndPLNoMappings(t, idp.URL(), "op-no-match")
	svc.SetAdminBootstrapHook(func(_ context.Context, _ string, _ []string, _ string) (bool, error) {
		return false, nil // not a bootstrap match
	})
	cookie, _, _ := pl.CreatePreLogin(context.Background(), "op-no-match", "s", "test-nonce-fixed", "v-nomatchxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
	_, err := svc.HandleCallback(context.Background(), cookie, "code", "s", "ip", "ua")
	if !errors.Is(err, ErrGroupsUnmapped) {
		t.Errorf("err = %v; want ErrGroupsUnmapped (no bootstrap match + empty mappings)", err)
	}
}

// Phase 7 spec test #3: existing admin + OIDC login matching bootstrap
// groups → bootstrap mode disabled (hook returns grantAdmin=false), normal
// group-role mapping wins. Pin: the hook is ALWAYS called but its
// grantAdmin=false response means the user gets the ordinary mapped
// role set, not r-admin.
func TestService_BootstrapHook_AdminAlreadyExistsFallsThroughToNormalMapping(t *testing.T) {
	idp := newMockIdP(t)
	svc, pl := newServiceWithProviderAndPL(t, idp.URL(), "op-existing-admin")
	// Hook says grantAdmin=false because (in production) an admin already
	// exists; the closure does the AdminExists probe.
	svc.SetAdminBootstrapHook(func(_ context.Context, _ string, _ []string, _ string) (bool, error) {
		return false, nil
	})
	cookie, _, _ := pl.CreatePreLogin(context.Background(), "op-existing-admin", "s", "test-nonce-fixed", "v-existingxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
	res, err := svc.HandleCallback(context.Background(), cookie, "code", "s", "ip", "ua")
	if err != nil {
		t.Fatalf("HandleCallback: %v", err)
	}
	// stubMappings returns r-operator; the hook returned false; r-admin
	// MUST NOT appear in the role set.
	if sliceContains(res.RoleIDs, "r-admin") {
		t.Errorf("admin-already-exists path should not grant r-admin; got %v", res.RoleIDs)
	}
	if !sliceContains(res.RoleIDs, "r-operator") {
		t.Errorf("expected normal mapping (r-operator) to win; got %v", res.RoleIDs)
	}
}

// Phase 7 hook-error path: hook returns an error → HandleCallback wraps it.
func TestService_BootstrapHook_ErrorWraps(t *testing.T) {
	idp := newMockIdP(t)
	svc, pl := newServiceWithProviderAndPL(t, idp.URL(), "op-hook-err")
	svc.SetAdminBootstrapHook(func(_ context.Context, _ string, _ []string, _ string) (bool, error) {
		return false, fmt.Errorf("simulated AdminExists probe failure")
	})
	cookie, _, _ := pl.CreatePreLogin(context.Background(), "op-hook-err", "s", "test-nonce-fixed", "v-errxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
	_, err := svc.HandleCallback(context.Background(), cookie, "code", "s", "ip", "ua")
	if err == nil || !strings.Contains(err.Error(), "admin bootstrap") {
		t.Errorf("err = %v; want admin bootstrap wrap", err)
	}
}

// Phase 7 idempotence: hook returns grantAdmin=true AND mappings.Map
// already includes r-admin → roleIDs has r-admin exactly once.
func TestService_BootstrapHook_IdempotentWhenAdminAlreadyMapped(t *testing.T) {
	idp := newMockIdP(t)
	prov := makeProvider(idp.URL(), "op-idem")
	pl := newStubPreLogin()
	mappings := &stubMappings{roleIDs: []string{"r-admin"}} // already mapped
	users := newStubUsers()
	sessions := &stubSessions{}
	svc := NewService(&stubProviderLookup{provider: prov}, mappings, users, sessions, pl, "")
	svc.SetAdminBootstrapHook(func(_ context.Context, _ string, _ []string, _ string) (bool, error) {
		return true, nil
	})
	cookie, _, _ := pl.CreatePreLogin(context.Background(), "op-idem", "s", "test-nonce-fixed", "v-idempxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
	res, err := svc.HandleCallback(context.Background(), cookie, "code", "s", "ip", "ua")
	if err != nil {
		t.Fatalf("HandleCallback: %v", err)
	}
	count := 0
	for _, rid := range res.RoleIDs {
		if rid == "r-admin" {
			count++
		}
	}
	if count != 1 {
		t.Errorf("expected r-admin to appear exactly once; got %d (RoleIDs=%v)", count, res.RoleIDs)
	}
}

func sliceContains(s []string, v string) bool {
	for _, x := range s {
		if x == v {
			return true
		}
	}
	return false
}

// TestService_SetClockForTest_OverridesNow pins the test seam works.
func TestService_SetClockForTest_OverridesNow(t *testing.T) {
	svc := newServiceForUnitTest(t)