mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 12:41:30 +00:00
1d01c87663
break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility)
Phase 7 — OIDC first-admin bootstrap (Decision 3):
- Optional AdminBootstrapHook closure on *oidc.Service. When wired,
HandleCallback consults the hook AFTER group resolution + user
upsert and BEFORE the empty-mapping fail-closed check. Hook
receives (providerID, groups, userID); returns grantAdmin=true
when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no
admin exists yet in the tenant.
- cmd/server/main.go wires the hook as a closure that:
* Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured).
* Probes AdminExists via authActorRoleRepo (admin-already-exists
silently returns false; bootstrap mode is one-shot per tenant).
* Walks group intersection.
* On match: grants r-admin via authActorRoleRepo.Grant + emits
the bootstrap.oidc_first_admin audit row with
event_category=auth + INFO log.
- Coexists with the Bundle 1 env-var-token bootstrap. Both paths
can be configured; first match wins (admin-existence probe
short-circuits the second).
- HandleCallback's empty-mapping fail-closed check moved AFTER the
hook so a fresh deployment with zero group_role_mappings can
still mint the first admin.
- 5 tests in service_test.go: hook grants admin on match, hook
returns false preserves empty-mapping fail-closed, admin-already-
exists silently falls through to normal mapping, hook-error wraps
+ bubbles, idempotent when admin is already in the mapped role set.
Phase 7.5 — Break-glass admin (Decision 4, default-OFF):
Migration 000038 ships:
- breakglass_credentials table — at-most-one-credential-per-actor
(UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout
state machine (failure_count, locked_until, last_failure_at).
FK CASCADE on users(id) so deleting a user atomically removes
their credential.
- Two new permissions seeded into r-admin only:
auth.breakglass.admin — set/rotate/unlock/remove credentials.
auth.breakglass.login — actor uses break-glass to log in.
CanonicalPermissions extended in lockstep.
internal/auth/breakglass/service.go (~580 LOC):
- Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED.
- SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4,
salt=16 random bytes, output=32 bytes); per-password random salt;
PHC-format hash output. Min 12 / max 256 byte input.
- Authenticate: constant-time-compare via subtle.ConstantTimeCompare
on every code path. Identical 401 + identical timing across the
wrong-password / locked-account / non-existent-actor paths so an
attacker cannot probe whether a given actor has break-glass
configured. Non-existent-actor + locked-account paths run a
verifyDummy() Argon2id pass for timing parity. Lockout state
machine: failure_count++ on every wrong attempt; threshold (default
5) trips locked_until = NOW() + duration (default 15m). Successful
Authenticate resets the counter. Reset window: failures older than
CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h) are aged out
and the counter auto-resets on the next attempt.
- Unlock + RemoveCredential: admin-only (auth.breakglass.admin
gated at the router via rbacGate). Audit rows on every operation.
- All public methods refuse to act when Enabled()==false (returns
ErrDisabled; the handler maps to HTTP 404 — surface invisibility).
internal/repository/postgres/breakglass.go ships the 5-method
postgres impl with atomic single-statement IncrementFailure (so
concurrent racing wrong-password attempts can't observe an
intermediate state and slip past the threshold) and idempotent
ResetFailureCount.
internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP
surface:
- POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per
source IP via the existing rate limiter; returns 404 when
disabled). On success sets the post-login session cookie + CSRF
cookie via SessionService.Create + 204. On any failure:
uniform 401 + identical timing (the service has already audited
the specific failure category).
- POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin)
- POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
(auth.breakglass.admin)
- DELETE /api/v1/auth/breakglass/credentials/{actor_id}
(auth.breakglass.admin)
Admin endpoints share the surface-invisibility property: when
CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns
404 (not 403) so probing via the admin surface gets the same signal
as probing the login endpoint.
Tests (internal/auth/breakglass/service_test.go):
All 8 Phase 7.5 spec-mandated negative cases:
1. Service.Enabled()==false → all ops return ErrDisabled.
2. Wrong password → ErrInvalidCredentials, failure_count++,
audit row with event_category=auth.
3. Failure_count exceeds threshold → locked, subsequent attempts
(including with the CORRECT password) return identical-shape
401 while the lockout window holds.
4. Lockout window expires → next attempt with correct password
succeeds + resets the counter.
5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword.
6. Password leak hygiene — the service has zero slog calls; the
audit-row map literal never includes the password plaintext.
7. Argon2id hash never appears in logs OR API responses — pinned
by `json:"-"` tag on BreakglassCredential.PasswordHash + a
belt-and-braces json.Marshal probe asserting the hash bytes
never appear in the marshaled output.
8. Constant-time-compare verified via timing-statistical test —
wrong-password vs no-credential paths take statistically
indistinguishable time (within 5x ratio). The verifyDummy()
hash compute on the no-credential + locked paths is what
keeps timing parity; absent that, an attacker could side-
channel "actor doesn't have a credential" via timing.
Plus coverage-lift batch covering: SetPassword first-time vs rotate,
no-caller-id rejection, no-target-id rejection, RNG failure surface,
Authenticate happy-path mints session, no-credential audit row,
session-mint-failure surface, FailureResetInterval recycle, Unlock
+ RemoveCredential happy paths, hash-format unit tests (round-trip,
mismatch, malformed/wrong-version/bad-base64 formats), nil-audit +
nil-session pass-through.
Coverage on internal/auth/breakglass/ at 91.5% per-statement (above
the Phase 7.5 spec ≥ 90% floor).
cmd/server/main.go wiring:
- Constructs breakglassRepo + breakglassService + breakglassHandler
after the OIDC service block.
- breakglassSessionMinterAdapter shim bridges *session.Service.Create
to the breakglass.SessionMinter port.
- Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator
visibility for the deliberate SSO-bypass).
internal/config/config.go gains:
- AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for
Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list +
CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
- AuthConfig.Breakglass nested struct with 4 env vars
(CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION
+ LOCKOUT_RESET_INTERVAL).
Router wiring:
- 4 new breakglass routes registered when reg.AuthBreakglass != nil;
public login route via direct r.mux.Handle (auth-exempt), 3 admin
routes via r.Register + rbacGate(auth.breakglass.admin).
- POST /auth/breakglass/login pinned in AuthExemptRouterRoutes
allowlist with Phase 7.5 justification.
- SpecParityExceptions extended with 4 new entries documenting
the Phase 7.5 deferral of full per-endpoint OpenAPI rows
(handler doc-block at the top of auth_breakglass.go is the
operator-facing reference).
Threat model (encoded in service.go + auth_breakglass.go doc-blocks
+ migration 000038 docstrings, to be promoted to docs/operator/auth-
threat-model.md in Phase 12):
- Break-glass is a deliberate bypass of the SSO security boundary.
An attacker who phishes the password OR finds it in a compromised
password manager bypasses MFA, OIDC, and every group-claim gate.
- Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-
state. Enable only during SSO-broken incidents. Disable after
recovery.
- WebAuthn pairing (v3 per Decision 12) is the load-bearing second
factor. Without it, break-glass is best treated as an emergency-
only path.
- Audit trail surfaces every break-glass action under
event_category=auth; the auditor role can monitor for unexpected
break-glass logins.
Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/auth/oidc (3.0s; new
Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives),
internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage
batch passing), internal/config + internal/domain/auth + internal/api/
router + internal/api/handler all green, no regressions in Bundle 1
packages.
200 lines
6.0 KiB
Go
package auth

// Seed identifiers and constants used by the Phase 1 migration and the
// service / handler layers. Centralised here so production code, tests,
// and migration SQL stay in lockstep on the canonical role / permission
// names.

// DefaultTenantID is the seeded tenant created by migration
// 000029_rbac.up.sql. Bundle 1 ships single-tenant; every actor_role
// row carries this tenant_id by default.
const DefaultTenantID = "t-default"

// Seeded role IDs. Stable identifiers used by the migration backfill
// and the demo-mode synthetic-actor seed.
const (
	RoleIDAdmin    = "r-admin"
	RoleIDOperator = "r-operator"
	RoleIDViewer   = "r-viewer"
	RoleIDAgent    = "r-agent"
	RoleIDMCP      = "r-mcp"
	RoleIDCLI      = "r-cli"
	RoleIDAuditor  = "r-auditor"
)

// DemoAnonActorID is the synthetic actor used when
// CERTCTL_AUTH_TYPE=none is configured (the demo path). Phase 1
// migration seeds the actor + admin role assignment unconditionally;
// Phase 3 of Bundle 1 wires the middleware to inject this actor into
// the request context when no-auth mode is active. Reserved system
// actor: the API rejects mutations / deletions targeting this id.
const DemoAnonActorID = "actor-demo-anon"

// CanonicalPermissions is the canonical Bundle 1 permission catalog,
// seeded by migration 000029_rbac.up.sql. Bundle 2 extends with
// auth.session.* and auth.oidc.* permissions (those land in Bundle 2
// Phase 5's migration).
//
// Naming convention: <namespace>.<verb>. Read permissions use
// `<resource>.read`; mutations use `.create`, `.edit`, `.delete`,
// `.assign`, `.revoke`, `.use`, `.export`, etc. The catalog is the
// single source of truth referenced by:
//   - migration 000029_rbac.up.sql (seeds the rows)
//   - service layer (RoleService.Create rejects unknown permissions)
//   - handler layer (auth.RequirePermission perm string)
var CanonicalPermissions = []string{
	// Certificate lifecycle
	"cert.read",
	"cert.issue",
	"cert.revoke",
	"cert.delete",

	// Profile management
	"profile.read",
	"profile.edit",
	"profile.delete",

	// Issuer management
	"issuer.read",
	"issuer.edit",
	"issuer.delete",

	// Target management
	"target.read",
	"target.edit",
	"target.delete",

	// Agent management
	"agent.read",
	"agent.edit",
	"agent.retire",
	"agent.heartbeat",
	"agent.job.poll",
	"agent.job.complete",
	"agent.job.report",

	// Audit access (Phase 8 introduces the auditor split)
	"audit.read",
	"audit.export",

	// RBAC primitive (Phase 4 surfaces these via /v1/auth/roles)
	"auth.role.list",
	"auth.role.create",
	"auth.role.edit",
	"auth.role.delete",
	"auth.role.assign",
	"auth.role.revoke",

	// API-key management (Phase 4 + Phase 7 scope-down)
	"auth.key.list",
	"auth.key.create",
	"auth.key.rotate",
	"auth.key.delete",

	// Bootstrap path (Phase 6)
	"auth.bootstrap.use",

	// Bundle 1 Phase 3.5: admin-only fine-grained perms for the
	// legacy admin handlers, seeded by migration 000030. Wrapped at
	// the router level via auth.RequirePermission middleware; the
	// in-handler auth.IsAdmin checks have been removed in Phase 3.5.
	"cert.bulk_revoke",
	"crl.admin",
	"scep.admin",
	"est.admin",
	"ca.hierarchy.manage",

	// Bundle 2 Phase 5: session + OIDC management permissions
	// seeded by migration 000037. auth.session.list / .revoke gate
	// "list/revoke any session in tenant" (own-session paths bypass
	// the gate via "is path.actor_id == ctx.actor_id?" check at the
	// handler layer); auth.session.list.all gates the all-actors
	// admin view. auth.oidc.{list,create,edit,delete} gates the
	// OIDC-provider-config + group-mapping CRUD endpoints.
	"auth.session.list",
	"auth.session.list.all",
	"auth.session.revoke",
	"auth.oidc.list",
	"auth.oidc.create",
	"auth.oidc.edit",
	"auth.oidc.delete",

	// Bundle 2 Phase 7.5: break-glass admin permissions seeded by
	// migration 000038. auth.breakglass.admin gates set/rotate/unlock/
	// remove operations on any actor's break-glass credential.
	// auth.breakglass.login is granted to each actor when their
	// break-glass credential is set, so they can use the local-
	// password recovery path during SSO outages. The whole surface
	// is gated on CERTCTL_BREAKGLASS_ENABLED at the service layer
	// (Service.Enabled() short-circuits every operation when false).
	"auth.breakglass.admin",
	"auth.breakglass.login",
}

// DefaultRoles describes the seven default roles seeded by the
// migration, mapped to the permissions each role holds at global
// scope. Permissions not in CanonicalPermissions cause the migration
// to fail-closed.
var DefaultRoles = map[string][]string{
	RoleIDAdmin: CanonicalPermissions, // admin gets every permission

	RoleIDOperator: {
		"cert.read", "cert.issue", "cert.revoke", "cert.delete",
		"profile.read", "profile.edit",
		"issuer.read", "issuer.edit",
		"target.read", "target.edit", "target.delete",
		"agent.read", "agent.edit",
		"audit.read",
	},

	RoleIDViewer: {
		"cert.read",
		"profile.read",
		"issuer.read",
		"target.read",
		"agent.read",
		"audit.read",
	},

	RoleIDAgent: {
		"cert.read",
		"agent.heartbeat",
		"agent.job.poll",
		"agent.job.complete",
		"agent.job.report",
	},

	RoleIDMCP: {
		// MCP gets operator-equivalent minus destructive ops.
		// Defense in depth for Claude / IDE integrations where
		// destructive verbs warrant additional scrutiny.
		"cert.read", "cert.issue", "cert.revoke",
		"profile.read", "profile.edit",
		"issuer.read", "issuer.edit",
		"target.read", "target.edit",
		"agent.read",
		"audit.read",
	},

	RoleIDCLI: {
		// CLI = operator-equivalent. Operators can scope down via
		// `certctl auth keys scope-down` if they want narrower CLI
		// access in production.
		"cert.read", "cert.issue", "cert.revoke", "cert.delete",
		"profile.read", "profile.edit",
		"issuer.read", "issuer.edit",
		"target.read", "target.edit", "target.delete",
		"agent.read", "agent.edit",
		"audit.read",
		"auth.key.list", "auth.key.create", "auth.key.rotate",
	},

	RoleIDAuditor: {
		// Phase 8 ships the auditor split. Phase 1 reserves the
		// role id + the read-only permission set so subsequent
		// phases don't have to renumber.
		"audit.read",
		"audit.export",
	},
}
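
The fail-closed rule stated in the DefaultRoles comment (every role permission must exist in CanonicalPermissions) can be checked with a small helper. The sketch below is a hypothetical illustration using a trimmed catalog; unknownPermissions is not a function from this package.

```go
package main

import (
	"fmt"
	"sort"
)

// unknownPermissions returns "role:permission" entries for every role
// permission that is missing from the catalog, sorted for stable output.
func unknownPermissions(catalog []string, roles map[string][]string) []string {
	known := make(map[string]struct{}, len(catalog))
	for _, p := range catalog {
		known[p] = struct{}{}
	}
	var missing []string
	for role, perms := range roles {
		for _, p := range perms {
			if _, ok := known[p]; !ok {
				missing = append(missing, role+":"+p)
			}
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Trimmed stand-ins for CanonicalPermissions and DefaultRoles.
	catalog := []string{"cert.read", "audit.read", "audit.export"}
	roles := map[string][]string{
		"r-viewer":  {"cert.read", "audit.read"},
		"r-auditor": {"audit.read", "audit.export"},
	}
	if missing := unknownPermissions(catalog, roles); len(missing) > 0 {
		fmt.Println("unknown permissions:", missing)
		return
	}
	fmt.Println("all role permissions are canonical")
}
```

A unit test in this package could call such a helper with CanonicalPermissions and DefaultRoles directly, so drift between the catalog and the role map fails the build rather than the migration.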