mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:01:32 +00:00
90210c93348959fa0f62decfc35627b40f763a42
358 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
90210c9334 |
fix(oidc/prelogin): encrypt state/nonce/PKCE-verifier at rest (HIGH-5)
Pre-login rows previously persisted the OIDC state, nonce, and PKCE
verifier as plaintext columns; an operator restoring an unredacted
backup of oidc_pre_login_sessions to a debug environment leaked every
in-flight handshake. If the IdP also leaked the auth code in the same
window (logged at a misconfigured TLS terminator, etc.), the attacker
could exchange code + verifier directly. RFC 7636 §7 requires verifier
confidentiality.
This commit:
- Migration 000041 adds {state,nonce,pkce_verifier}_enc BYTEA columns
and makes the legacy plaintext columns nullable. A follow-up
migration drops the plaintext columns once the rolling deploy
completes.
- internal/repository/postgres/oidc_prelogin.go::Create encrypts the
three secrets via crypto.EncryptIfKeySet (v3 magic 0x03 + per-row
salt + nonce + AES-256-GCM tag) and writes only the encrypted
columns; legacy plaintext stays NULL on the write path.
- LookupAndConsume prefers encrypted columns via materialize(),
falling back to the legacy plaintext only when _enc is NULL — the
rolling-deploy compat layer that 000042 will retire.
- NewPreLoginRepository takes encryptionKey; cmd/server/main.go threads
cfg.Encryption.ConfigEncryptionKey in.
- Encryption key reuses CERTCTL_CONFIG_ENCRYPTION_KEY (same passphrase
already protecting OIDC client secrets and SessionSigningKey material).
No new env var.
Why encryption-at-rest, not HMAC: the spec's HMAC approach required
moving plaintext into the cookie (the cookie currently carries only
row ID + HMAC). Re-shaping the cookie wire format would be a larger
refactor; the audit explicitly admits encryption-at-rest is an
acceptable closure (weaker because backups still contain decryptable
ciphertext, but the encryption key is held separately from the DB
backup, and the 10-minute TTL further bounds usable secret window).
Three new regression tests in oidc_prelogin_encryption_test.go pin:
(a) _enc columns contain v3-format ciphertext, NOT plaintext
substrings, post-Create
(b) legacy plaintext columns are NULL post-Create (defends against
future patches that re-introduce plaintext writes)
(c) LookupAndConsume round-trips state/nonce/verifier byte-for-byte
A fourth test pins the legacy-row fallback for rolling-deploy compat.
Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-5
Spec: cowork/auth-bundles-fixes-2026-05-10/09-high-5-prelogin-secret-protection.md
|
||
|
|
0f340beb14 |
fix(auth/ux): cause-aware OIDC + session error surfacing (HIGH-7 + HIGH-8 closure)
Server (HIGH-7): the OIDC callback failure path now 302-redirects to /login?error=oidc_failed&reason=<category> instead of emitting a blank 400. `category` is the existing audit `failure_category` value; classifyOIDCFailure was extended with three new sentinel paths (email_domain_not_allowed, email_missing_but_required, pkce_invalid) so CRIT-5 + PKCE failures get distinguishable GUI rendering. Audit-log observability is unchanged — the same failure_category is written to the auth.oidc_login_failed audit row; the 302 is purely a UX leg layered on top. Server (HIGH-8): SessionMiddleware now stashes a cause classification on the request context when Validate returns an error, mapping the sentinels via classifySessionError (errors.Is-based, so wrapped sentinels still classify) to the stable wire-strings idle_timeout / absolute_timeout / back_channel_revoked / invalid_token. The 401 emit point in bearerSkipIfAuthenticated reads the stashed cause and emits WWW-Authenticate: Bearer realm="certctl", error="invalid_token", error_description=<cause> per RFC 6750 §3. GUI (HIGH-7): LoginPage reads ?error= + ?reason= from the URL via react-router useSearchParams and renders an operator-friendly amber-bordered banner above the form; OIDC_FAILURE_REASON_TEXT maps all 16 known categories with a defensive 'unspecified' fallback for forward-compat with future server-side categories. GUI (HIGH-8): api/client fetchJSON parses the WWW-Authenticate cause via parseWWWAuthenticateCause and attaches it to the 'certctl:auth-required' CustomEvent detail; AuthProvider redirects to /login?session_expired=<cause> on cause-aware 401s; LoginPage renders a blue-bordered session-cause banner. invalid_token stays on the current page (no hard redirect for opaque failures). Misc cleanup: ErrorState now accepts the title/message/data-testid form added by CRIT-4 BreakglassPage (was erroring tsc on master). Regression matrix: - internal/api/handler/oidc_redirect_categories_test.go pins all 16 failure categories to the 302 + reason= location + audit-row leg - internal/auth/session/www_authenticate_test.go pins the 4 stable cause categories on classifySessionError (incl. errors.Is wrapped sentinels) + the WWW-Authenticate emission across all 4 categories + the no-session-context fallback case - internal/api/handler/auth_session_oidc_test.go: 4 pre-existing TestLoginCallback_*Returns400 tests updated to assert 302 + reason= location (the wire shape changed from 400 to 302, but the audit observability and behaviour-equivalent failure-classification are preserved) - web/src/pages/LoginPage.test.tsx: 6 new cases pinning the failure banner, session-cause banner, unknown-reason fallback, and forward-compat 'unspecified' category Spec: cowork/auth-bundles-fixes-2026-05-10/08-high-7-8-error-surfacing.md Closes: HIGH-7, HIGH-8 of cowork/auth-bundles-audit-2026-05-10.md |
||
|
|
15435ca02b |
fix(oidc/bcl): jti replay-cache + iat freshness check (HIGH-3 closure)
Closes HIGH-3 of the 2026-05-10 audit. Pre-fix the BCL handler
accepted any logout_token whose iat + jti were syntactically present
but never checked (a) that iat fell within a skew window or (b) that
jti hadn't been seen before. A captured logout_token was replayable
indefinitely; once CRIT-2 was fixed, every replay would revoke the
user's current sessions — persistent DoS. RFC 9700 §2.7 + OIDC BCL
1.0 §2.5 require jti replay defense.
- Migration 000040_bcl_replay_cache: oidc_bcl_consumed_jtis table with
composite PK on (jti, issuer_url) — RFC 7519 §4.1.7 per-issuer
uniqueness — and an expires_at index for the GC sweep.
- repository.BCLReplayRepository interface + ErrBCLJTIAlreadyConsumed
sentinel. Postgres impl uses INSERT...ON CONFLICT DO NOTHING
RETURNING true for atomic single-use semantics in one round-trip.
- handler.DefaultBCLVerifier gains WithMaxAge + nowFn clock seam. iat
freshness check rejects tokens whose iat is in the future beyond
max-age OR stale beyond it. Verifier signature extended:
Verify(ctx, jwt) (iss, sub, sid, jti string, iat int64, err error).
- handler.AuthSessionOIDCHandler gains BCLReplayConsumer (interface)
+ WithBCLReplayConsumer(consumer, maxAge) setter. BackChannelLogout
consumes the jti post-verify with TTL = max(24h, 2*maxAge):
- first-receive → 200, sessions revoked, audit outcome=revoked
- replay (ErrBCLJTIAlreadyConsumed) → 200 + Cache-Control: no-store,
audit outcome=jti_replayed, sessions NOT re-revoked
- transient (non-AlreadyConsumed error) → 503 so the IdP retries
- internal/scheduler/scheduler.go: SetBCLReplayGarbageCollector wires
SweepExpired into the existing session-GC tick (no separate ticker
for short-lived replay rows).
- cmd/server/main.go: bclMaxAge from cfg.Auth.OIDCBCLMaxAgeSeconds
(default 60s, env CERTCTL_OIDC_BCL_MAX_AGE_SECONDS); bclReplayRepo
wired into the verifier + handler + scheduler.
- Three regression tests in internal/api/handler/bcl_replay_test.go:
TestBackChannelLogout_FirstReceiveConsumesJTI,
TestBackChannelLogout_ReplayedJTIReturns200WithAudit,
TestBackChannelLogout_TransientConsumeFailureReturns503.
- internal/api/handler/auth_session_oidc_test.go: stubBCLVerifier
gains jti + iat fields; existing TestBackChannelLogout_* tests
rewritten for the new Verify return.
Verification gate green: gofmt clean, go vet clean, go test -short
-count=1 on internal/api/handler / internal/api/router /
internal/scheduler / cmd/server / internal/auth/oidc /
internal/auth/breakglass — all pass.
CRIT-1..CRIT-5 + HIGH-1 + HIGH-2 + HIGH-3 of the 2026-05-10 audit
now closed on this branch. Spec at
cowork/auth-bundles-fixes-2026-05-10/07-high-3-bcl-replay-defense.md.
Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-3
|
||
|
|
1697845493 |
fix(auth): wire RevokeAllForActor + RotateCSRFToken to mutation paths
Closes HIGH-1 + HIGH-2 of the 2026-05-10 audit.
HIGH-1: breakglass.Service.SetPassword and RemoveCredential now call
sessions.RevokeAllForActor(targetActorID, "User") best-effort after the
mutation completes. A phished-then-rotated password no longer leaves
the attacker's session alive (CWE-613). Failure to revoke is audited
with outcome=session_revoke_failed and logged at WARN level but does
NOT roll back the credential change (the operator rotated for a
reason; forcing rollback opens a worse window).
- breakglass.SessionMinter interface extended with RevokeAllForActor.
- cmd/server/main.go::breakglassSessionMinterAdapter gains the bridge
to session.Service.RevokeAllForActor.
- stubSessions in service_test.go tracks revokeAllIDs / revokeAllTypes
/ revokeAllErr.
- Three regression tests:
- TestService_SetPassword_RevokesExistingSessions
- TestService_RemoveCredential_RevokesExistingSessions
- TestService_SetPassword_RevokeFailureDoesNotRollback
HIGH-2: New session.Service.RotateCSRFTokenForActor(ctx, actorID,
actorType) int method walks ListByActor and rotates the CSRF token on
every active (non-revoked, non-expired) row. Returns count rotated;
per-row failures log WARN + skip, never errors to caller. New
handler.CSRFRotator interface + AuthHandler.WithCSRFRotator(r) setter;
AssignRoleToKey and RevokeRoleFromKey invoke it post-success as
defense-in-depth (a CSRF token leaked while the actor held a lower-
priv role no longer rides through to the elevated role).
- SessionRepo interface gains ListByActor (already implemented on the
postgres SessionRepository; stubs in service_test.go + bench_test.go
updated to match).
- cmd/server/main.go calls .WithCSRFRotator(sessionService) on the
AuthHandler.
- Two regression tests:
- TestRotateCSRFTokenForActor_RotatesAllActiveRows (asserts revoked /
expired / other-actor rows are skipped)
- TestRotateCSRFTokenForActor_NoSessionsReturnsZero
Verification gate green: gofmt clean, go vet clean, go test -short
-count=1 ./internal/auth/breakglass/ ./internal/auth/session/
./internal/api/handler/ ./internal/api/router/ ./cmd/server/
./internal/domain/auth/ — all pass.
CRIT-1..CRIT-5 + HIGH-1 + HIGH-2 of the 2026-05-10 audit now closed
on this branch. Spec at
cowork/auth-bundles-fixes-2026-05-10/06-high-1-2-revoke-and-rotate.md.
Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-1 HIGH-2
|
||
|
|
739745e9fe |
fix(oidc): enforce AllowedEmailDomains allowlist in HandleCallback
Closes CRIT-5 of the 2026-05-10 audit — the LAST Critical blocker for
v2.1.0. The OIDCProvider.AllowedEmailDomains field shipped persisted
(internal/auth/oidc/domain/types.go:47), API-surfaced
(internal/api/handler/auth_session_oidc.go), MCP-surfaced
(internal/mcp/tools_auth_bundle2.go), and GUI-editable, but the
verifier in internal/auth/oidc/service.go::HandleCallback NEVER read
it. Operators filling allowed_email_domains: ["acme.com"] expected
"users outside acme.com cannot log in" — the field had zero effect.
Textbook lying-field shape per CLAUDE.md's "complete path" rule.
This commit:
- Adds Step 7.5 to HandleCallback (between profile-claim resolve and
group-claim resolve): when the provider's AllowedEmailDomains slice
is non-empty, the user's email-domain MUST match a list entry (case-
insensitive exact match; subdomains NOT auto-accepted — operators
who want dev.acme.com authorized must list it explicitly).
- Two new sentinel errors at the package level:
- ErrEmailDomainNotAllowed — email is set but domain not in list
- ErrEmailMissingButRequired — allowlist set + ID token has no email
- New extractEmailDomain helper: case-folds + trims whitespace + uses
LastIndex for the @ split + rejects empty input / no-@ / empty
local-part / empty domain-part. Returns the lowercase domain or
an error.
- 21 regression tests in internal/auth/oidc/email_domain_test.go:
- 10 extractEmailDomain shape cases (plain, mixed-case input,
leading/trailing whitespace, subdomain preserved, empty, no @,
empty local-part, empty domain-part, multiple @ via LastIndex).
- 11 match-semantic cases (empty list passes any, lowercase match,
mixed-case allowlist entry match, mixed-case email match,
whitespace-padded allowlist entry, unmatched returns
ErrEmailDomainNotAllowed, missing email + non-empty allowlist
returns ErrEmailMissingButRequired, subdomain NOT auto-accepted,
parent-domain NOT auto-accepted, multi-entry first-match,
multi-entry no-match).
Subdomain matching (alice@dev.acme.com against allowlist=[acme.com])
is intentionally NOT auto-accepted. The audit's MED-line tracks the
wildcard / suffix support story for v3; v2.1 ships strict.
Verification gate green:
- gofmt clean
- go vet clean
- go test -short -count=1 ./internal/auth/oidc/... ./internal/api/...
./internal/domain/auth/ — all pass (incl. existing OIDC service
test suite, the 4 BCL tests, the auditor pin, and the AST
RBAC-gate coverage guard).
Branch dev/auth-bundle-2 status post-commit: CRIT-1 (
|
||
|
|
f1d97710e1 |
feat(gui+auth): break-glass admin GUI surface (CRIT-4 closure)
Closes CRIT-4 of the 2026-05-10 audit. Bundle 2 Phase 7.5 shipped the
break-glass backend (Argon2id + lockout + 4 endpoints) but no GUI
surface. Operators recovering during an SSO outage had to hand-craft
curl commands — operationally hostile and the opposite of what
docs/operator/security.md advertised. This commit closes the gap.
Three GUI surfaces:
1. LoginPage.tsx — inline "Use break-glass account (SSO outage
recovery)" toggle below the API-key form. Clicking reveals an
amber-bordered inline form (actor-id + password, autocomplete=off).
Calls breakglassLogin(actor_id, password); on success navigates
to "/" where AuthProvider re-validates via the session-cookie path.
Intentionally low-visibility (text-amber-600 small text) — this is
the deliberate-bypass path, not the everyday-login path.
2. web/src/pages/auth/BreakglassPage.tsx — admin page at /auth/breakglass
(permission-gated by auth.breakglass.admin). Three sections:
- Sticky security banner ("every action audited; use only during
incidents").
- Set/rotate-password form (≥12-char + confirm-match).
- Credentialed-actor table with rotate / unlock (disabled when
not locked) / remove per row. Remove requires type-the-actor-id
confirmation.
3. Layout.tsx nav — "Break-glass" entry under the auth section. Visible
to all callers; the page itself permission-gates (server-side 403 is
the load-bearing defense). Cosmetic hide-when-no-perm is deferred
to fix 14's LOW bundle.
Backend support (new endpoint required to enumerate credentialed actors):
- internal/repository/breakglass.go — BreakglassCredentialRepository
gains List(ctx, tenantID) method.
- internal/repository/postgres/breakglass.go — postgres impl; reuses
the existing breakglassColumns / scanBreakglass helpers.
- internal/auth/breakglass/service.go — Service.List(ctx) method;
returns ErrDisabled when CERTCTL_BREAKGLASS_ENABLED=false (handler
maps to 404 for surface invisibility).
- internal/api/handler/auth_breakglass.go — ListCredentials handler;
password_hash field NEVER serialized to the wire (response shape
is intentionally limited to actor_id + timestamps + failure_count +
locked_until).
- internal/api/router/router.go — registers GET
/api/v1/auth/breakglass/credentials gated by auth.breakglass.admin.
- internal/api/router/openapi_parity_test.go — SpecParityExceptions
entry for the new endpoint (full OpenAPI row rides along with the
next OpenAPI sweep).
GUI api/client.ts gains breakglassListCredentials() + the
BreakglassCredentialRow type matching the wire shape.
Six Vitest cases in BreakglassPage.test.tsx pin the contract:
permission gate (forbidden state when caller lacks the perm; admin
surface when they have it), set-password mismatch rejection, set-
password below-threshold-length rejection, unlock-disabled-when-not-
locked, remove-modal type-confirm.
Verification gate green:
- gofmt -l clean on all touched files
- go vet clean
- go test -short -count=1 on internal/api/router (TestRouter_OpenAPIParity
+ TestRouterRBACGateCoverage + TestRouter_AuthExemptAllowlist),
internal/api/handler (all BCL tests + ListCredentials),
internal/auth/breakglass (Service.List + stubRepo.List),
internal/repository/postgres, internal/domain/auth (auditor pin)
— all pass.
CRIT-1 + CRIT-2 + CRIT-3 from the same audit are already closed on
this branch (commits
|
||
|
|
00eace8068 |
fix(api/cors): narrow Bundle-2 routes from wildcard to NewCORS(corsCfg)
Closes CRIT-3 of the 2026-05-10 audit. Bundle 2's OIDC handshake +
back-channel-logout + logout + bootstrap + breakglass-login routes were
wrapped by middleware.CORS — a hard-coded
Access-Control-Allow-Origin: * middleware that ignored the operator's
CERTCTL_CORS_ORIGINS knob (CWE-942). The properly-configured
middleware.NewCORS(corsCfg) exists right next to it but wasn't used here.
The deprecation comment on middleware.CORS said "Kept for health endpoints"
but Bundle 2 added four additional call sites without converting them.
This commit:
- Renames middleware.CORS -> middleware.CORSWildcard with a stronger doc
block making the security tradeoff explicit at every remaining call
site. The doc references the CI guard + the 2026-05-10 audit closure.
- Adds a CorsCfg middleware.CORSConfig field to router.HandlerRegistry
and threads it from cmd/server/main.go using the existing
cfg.CORS.AllowedOrigins value. The same config that drives the global
corsMiddleware now also drives the per-route NewCORS wraps for the
auth-exempt direct r.mux.Handle blocks.
- Swaps middleware.CORS -> middleware.NewCORS(reg.CorsCfg) for the 7
credentialed auth-exempt routes:
- GET /auth/oidc/login
- GET /auth/oidc/callback
- POST /auth/oidc/back-channel-logout
- POST /auth/logout
- POST /auth/breakglass/login
- GET /api/v1/auth/bootstrap
- POST /api/v1/auth/bootstrap
- Keeps middleware.CORSWildcard for the 4 credential-free probe routes:
- GET /health
- GET /ready
- GET /api/v1/version
- GET /api/v1/auth/info
- Adds scripts/ci-guards/cors-wildcard-allowlist.sh — pins the 4-route
allowlist; fails CI when a new middleware.CORSWildcard wrap appears
outside the allowlist. Adding a new wildcard call site requires
updating the allowlist AND documenting why in the commit body.
Operators who configured CERTCTL_CORS_ORIGINS=https://admin.example.com
expecting the OIDC + BCL + breakglass-login routes to honor it now do.
Previously those routes ignored the knob and emitted ACAO: * regardless.
Verification gate green:
- gofmt -l . clean
- go vet ./... clean
- go test -short -count=1 ./internal/api/... ./internal/auth/...
./internal/domain/auth/ ./internal/service/auth/ ./cmd/server/ pass
- go build ./... clean
- scripts/ci-guards/cors-wildcard-allowlist.sh passes (4 allowlisted
routes; zero violations)
CRIT-1 + CRIT-2 from the same audit are already closed on this branch
(commits
|
||
|
|
ca1e135aa3 |
fix(oidc/bcl): resolve sub→actor_id via users.GetByOIDCSubject (CRIT-2 closure)
Closes CRIT-2 of the 2026-05-10 audit. The BCL handler previously called
sessionSvc.RevokeAllForActor(sub, "User") but session rows are keyed by
user.ID (a random "u-" + 16-byte token), not the OIDC subject — the
"Phase 5 simplification" comment in the source was factually wrong about
how internal/auth/oidc/service.go::upsertUser seeds user.ID. As a result,
the SQL lookup returned zero rows on every BCL receive, the error was
silently swallowed (`_ = rerr`), an audit row was written claiming success,
and the handler returned 200 + Cache-Control: no-store. OIDC BCL 1.0 §2.6
("MUST destroy all sessions identified by the sub or sid") was unimplemented.
CWE-613.
This commit:
- Adds userRepo (repository.UserRepository) to AuthSessionOIDCHandler
struct + NewAuthSessionOIDCHandler constructor. cmd/server/main.go
injects the existing oidcUserRepo (no new repository instance).
- Replaces the broken sub-as-actor-id path with:
1. providerRepo.List(ctx, tenantID) + IssuerURL filter to map
claims.iss → provider row (N is small; typically 1-5).
2. userRepo.GetByOIDCSubject(ctx, provider.ID, sub) to resolve the
OIDC subject → user.ID.
3. sessionSvc.RevokeAllForActor(user.ID, "User") with the RESOLVED
actor_id (not the OIDC subject).
- Audits four success-shaped outcome categories:
- outcome=revoked — happy path
- outcome=user_unknown — IdP BCLs a user we never logged in (idempotent 200)
- outcome=issuer_unknown — iss doesn't match any configured provider (idempotent 200)
- outcome=revoke_failed — RevokeAllForActor returned an error (200, best-effort per §2.8)
And two transient outcomes that return 503 (IdP retries per §2.8):
- outcome=provider_lookup_failed — providerRepo.List error
- outcome=user_lookup_failed — non-NotFound userRepo error
- Removes the misleading "Phase 5 simplification" comment block; replaces
with a doc explaining the resolution path + outcome taxonomy + spec refs.
- Adds 5 regression tests in internal/api/handler/auth_session_oidc_test.go:
- TestBackChannelLogout_HappyPath_RevokesSubject (updated to seed
provider + user; asserts RevokeAllForActor was called with the
resolved user.ID, not the raw OIDC subject — the test that would
have caught CRIT-2 had it existed)
- TestBackChannelLogout_UnknownUserReturns200WithAudit
- TestBackChannelLogout_IssuerUnknownReturns200WithAudit
- TestBackChannelLogout_TransientUserRepoErrorReturns503
- TestBackChannelLogout_RevokeFailureReturns200WithAuditFailureOutcome
- Introduces stubUserRepo in the handler test file (matching the four
repository.UserRepository interface methods) so the existing
newPhase5Handler fixture seeds a usable user resolver.
Verification gate green:
- gofmt -l . clean
- go vet ./... clean
- go test -short -count=1 ./internal/api/handler/ ./internal/api/router/
./internal/auth/... ./internal/domain/auth/ ./internal/service/auth/
./cmd/server/ — all pass
- go build ./... clean
CRIT-1 from the same audit is already closed on this branch (commit
|
||
|
|
68ca42fef1 |
fix(auth): apply rbacGate to every state-changing + read handler (CRIT-1 closure)
Closes the wire-layer authorization gap surfaced by the 2026-05-10 audit
(CRIT-1). Before this commit only ~24 of ~140 routes carried rbacGate
enforcement — all of them admin-only fine-grained perms (auth.session.*,
auth.oidc.*, auth.breakglass.admin, cert.bulk_revoke, crl.admin, scep.admin,
est.admin, ca.hierarchy.manage). Every catalogued legacy-CRUD perm
(cert.read/issue/revoke/delete, profile.edit/delete, issuer.edit/delete,
target.*, agent.*, plus role-mgmt verbs) was declared in
internal/domain/auth/validate.go but never wired at the router. A r-viewer
Bearer was essentially r-admin minus five verbs at the wire layer (CWE-862).
This commit:
- Adds rbacGateScoped(checker, perm, scopeType, scopeFn, h) helper to
internal/api/router/router.go for path-bound scope resolution. Per-profile
and per-issuer grants (Decision 2) now reach the wire layer.
- Wraps every state-changing route AND every read endpoint in router.go
with rbacGate (global) or rbacGateScoped (path-bound). The auth-management
routes (POST /api/v1/auth/roles, etc.) gain router-level enforcement
in addition to the existing service-layer Authorizer check — defense in
depth (HIGH-9 of the same audit collapses into this closure).
- Auth-exempt surfaces stay un-gated by design: login, callback, BCL,
logout, breakglass-login, bootstrap, health, auth-info, version. Allowlist
is documented in TestRouterRBACGateCoverage.
- Extends internal/domain/auth/validate.go CanonicalPermissions with 30 new
perms across 12 namespaces: cert.edit; job.read, job.cancel; approval.read,
approval.approve, approval.reject; policy.read/edit/delete;
team.read/edit/delete; owner.read/edit/delete; notification.read/edit;
discovery.read/run/claim; network_scan.read/edit/run;
healthcheck.read/edit/delete/acknowledge; digest.read, digest.send;
verification.read, verification.run; stats.read; metrics.read.
- Updates DefaultRoles for r-admin / r-operator / r-viewer / r-mcp / r-cli /
r-agent. r-auditor gets NOTHING new — the auditor pin
(TestAuditorRoleHoldsExactlyAuditReadAndExport) stays invariant.
- Migration 000039_audit_crit1_perms seeds the new perm rows + role grants
per the updated DefaultRoles map. Idempotent ON CONFLICT DO NOTHING.
Reverse migration removes role_permissions before permissions
(ON DELETE RESTRICT on the FK).
- AST-level CI guard TestRouterRBACGateCoverage in
internal/api/router/router_rbac_coverage_test.go walks router.go and
asserts every state-changing + read route is wrapped (or in the
documented allowlist). Adding a new ungated route fails CI.
- Updates docs/operator/rbac.md permission-catalogue table with the new
namespaces + footer link to the AST CI guard.
- Updates certctl/CHANGELOG.md v2.1.0 section with the closure narrative.
Audit doc cowork/auth-bundles-audit-2026-05-10.md CRIT-1 row annotated
CLOSED 2026-05-10. Bundle's exit-gate spec lives at
cowork/auth-bundles-fixes-2026-05-10/01-crit-1-rbac-gates.md.
CRIT-2 / CRIT-3 / CRIT-4 / CRIT-5 of the same audit remain open and
continue to block the v2.1.0 tag.
Verification gate green:
- gofmt -d (no diff after gofmt -w on the touched files)
- go vet ./...
- go test -short -count=1 ./... (all packages pass including auditor pin)
- go build ./...
HIGH-9 of the audit closes via this commit's router-layer rbacGate on
POST /api/v1/auth/keys/{id}/roles + DELETE /api/v1/auth/keys/{id}/roles/{role_id}
(defense-in-depth on top of the existing service-layer privilege check).
Refs: cowork/auth-bundles-audit-2026-05-10.md CRIT-1 HIGH-9
|
||
|
|
9b6294e83d |
auth-bundle-2 Phase 14: session + OIDC validation benchmarks (steady-state + cold paths) + auth-benchmarks.md operator doc + Makefile targets
Closes Phase 14 of cowork/auth-bundle-2-prompt.md. Ships four
benchmarks producing four numbers + the operator-doc table; three
default-tag benchmarks runnable on every CI runner, the fourth
(cold-cache OIDC) runnable on operator-side Docker hosts via the
new make target.
Files
=====
internal/auth/session/bench_test.go (NEW):
* BenchmarkSession_SteadyState (target p99 < 1ms; measured 5µs).
Warm in-memory repo + warm session row. Pure CPU: parseCookie +
HMAC verify + map lookup + sentinel checks.
* BenchmarkSession_ColdProcess (target p99 < 10ms; measured 7.1ms).
Same pipeline but with a configurable per-call delay simulating
a 1ms Postgres RTT on each repo call. Two repo calls per
Validate (signing-key fetch + session-row fetch) = 2ms minimum;
Go time.Sleep granularity adds ~1-2ms jitter. Documented why
testcontainers Postgres isn't viable inside b.N: 30+ second
container boot incompatible with per-iteration timing.
* slowSessionRepo + slowKeyRepo wrappers add the per-call delay
via time.Sleep; they delegate to the existing in-memory stubs.
* reportPercentiles helper sorts + reports p50/p95/p99/max via
b.ReportMetric (Go testing.B doesn't surface percentiles
natively).
internal/auth/oidc/bench_test.go (NEW):
* BenchmarkOIDC_SteadyState (target p99 < 5ms; measured 1.5ms).
Drives full HandleCallback against an in-process mockIdP
(httptest.Server localhost loopback). Pre-warmed JWKS cache via
RefreshKeys at setup. Pipeline: pre-login consume + state
compare + token exchange (localhost ~50-200µs) + go-oidc
Verify (RSA-2048 sig verify + alg pin) + service-layer iss/
aud/azp/at_hash/exp/iat/nonce re-checks + group-claim
resolution + group→role mapping + user upsert + session mint.
* The localhost-loopback /token call adds ~100-500µs of TCP
overhead vs pure crypto; the prompt's "no network calls"
steady-state framing accommodates this since the localhost
loopback is the closest practical proxy for a same-region
IdP /token call (which adds 5-15ms in production).
internal/auth/oidc/bench_keycloak_test.go (NEW, //go:build integration):
* BenchmarkOIDC_ColdCache (target p99 < 200ms; operator-runs).
Drives RefreshKeys against a live Keycloak container from the
Phase 10 testfixtures harness. Each iteration evicts the
in-process cache + re-fetches discovery + re-fetches JWKS over
real HTTP + re-runs the IdP-downgrade-attack defense.
* Network-bounded: the cold path is dominated by HTTPS RTT to
the IdP discovery endpoint, NOT crypto. The 200ms cap
accommodates a geographically-distant IdP (~150ms RTT) plus
the in-process JWKS fetch + downgrade-defense logic (~5ms
locally).
* Reuses the sharedKeycloak fixture from
integration_keycloak_test.go (Phase 10) so the benchmark
doesn't pay the 60-90s container boot cost separately. Skips
with a clear message if invoked without the integration test
setup.
* Reports p50/p95/p99/max in MILLISECONDS (vs the
microsecond-granularity steady-state benchmarks) since the
cold path is two orders of magnitude slower.
internal/auth/oidc/service_test.go (MODIFIED):
* Refactored newMockIdP(t *testing.T) to delegate to a new
newMockIdPWithTB(t testing.TB) sibling. Standard Go pattern
for sharing test fixtures between *testing.T and *testing.B.
No behavior change for existing service_test.go tests; the
benchmark file in bench_test.go calls newMockIdPWithTB(b)
to get the same fixture.
docs/operator/auth-benchmarks.md (NEW):
* Result table with all four benchmarks + targets + measured
numbers + status markers. Four-row matrix for the default-tag
benchmarks; the fourth row (cold-cache) is operator-recorded
with an empty cell waiting for the first Docker-equipped run.
* Hardware floor section pinning the 4 vCPU / 8 GiB RAM /
Postgres 16 / Go 1.25 baseline. GitHub-hosted Ubuntu runners
satisfy this; operators on weaker hardware re-record.
* "What each benchmark covers (and what it doesn't)" section
per benchmark, distinguishing the warm steady-state pipeline
from the cold path's network-bounded budget.
* "Cold-cache OIDC: how to run" subsection documenting the
make target + the test+benchmark coupling needed to populate
sharedKeycloak. Operator-recorded baseline table seeded
empty for first runs.
* "Why the cold path is bounded by network latency, not crypto"
section explaining the budget breakdown:
- TCP handshake (1 RTT)
- TLS 1.3 handshake (1-2 RTTs)
- 2 HTTPS GETs (discovery + JWKS, 1 RTT each)
- In-process crypto on the certctl side (~5-10ms total)
So the 200ms cap is operator-checkable: real measurement >
200ms means the IdP is slow OR network congestion OR DNS
issues — the diagnosis is upstream of certctl. Real
measurement < 200ms means the IdP is on a fast same-region
link.
* Methodology section pinning the per-iteration timing capture
+ sort + percentile-extract approach.
* Pre-merge audit section for the Phase 14 exit gate: four
benchmarks ran, four numbers recorded, steady-state targets
met, cold path is operator-runnable + measurably-bounded.
Makefile (MODIFIED):
* Added `make benchmark-auth` (default-tag, runs three of four
benchmarks at 2000 samples each).
* Added `make benchmark-auth-coldcache` (integration-tagged,
runs OIDC cold-cache against live Keycloak; requires Docker).
* Both targets carry explanatory comment blocks.
docs/README.md (MODIFIED):
* Added the auth-benchmarks.md doc to the Operator nav table
alongside performance-baselines.md.
Measured baselines at Phase 14 close (linux/arm64, 4 vCPU)
==========================================================
BenchmarkSession_SteadyState p99 = 5µs (target < 1ms) ✓ 200× under
BenchmarkSession_ColdProcess p99 = 7.1ms (target < 10ms) ✓
BenchmarkOIDC_SteadyState p99 = 1.5ms (target < 5ms) ✓ 3× under
BenchmarkOIDC_ColdCache operator-runs (Docker required)
Verification
============
* gofmt -l on three new bench files: clean.
* go vet ./internal/auth/session/... ./internal/auth/oidc/...: clean
(default tag).
* go vet -tags integration ./internal/auth/oidc/...: clean (integration
tag covers the bench_keycloak_test.go file).
* go test -short -count=1 across all 5 OIDC + session packages:
green; the bench_*_test.go files compile but don't run under
-short (testing.Short() guards + benchmarks are not selected
by -run pattern).
* All three runnable benchmarks executed and produce the numbers
above; recorded in auth-benchmarks.md.
|
||
|
|
130a65f3b6 |
auth-bundle-2 Phase 13: negative-test backfill (OIDC PreLoginAdapter) + OIDC client_secret encryption invariant + multi-tenant query CI guard + coverage floors held at 90 across 4 Bundle-2 packages + E2E coverage map
Closes Phase 13 of cowork/auth-bundle-2-prompt.md. Ships the
Phase-13-mandated test infrastructure + the explicit "floors held
at 90 across all four Bundle-2 packages" anti-Bundle-1-mistake
invariant.
Files
=====
internal/auth/oidc/prelogin_test.go (NEW, +375 LOC):
* PreLoginAdapter coverage backfill. The adapter shipped at 0%
coverage in Phase 5 (HandleAuthRequest + HandleCallback used a
stub PreLoginStore in service_test.go); this file lifts the
package's coverage from 78.8% to 93.7%.
* 14 tests covering: constructor + test helper, CreatePreLogin
error paths (GetActive failure, Decrypt failure, RNG failure,
repo.Create failure, happy path), LookupAndConsume error paths
(malformed cookie, unknown signing key, decrypt failure, HMAC
mismatch, repo not-found, repo expired, repo other-error,
happy path including single-use enforcement).
internal/repository/postgres/oidc_encryption_invariant_test.go (NEW,
+208 LOC, integration test gated by testing.Short()):
* Three Phase-13-mandated invariants pinned against the live
schema via testcontainers Postgres:
- (a) client_secret_encrypted column never contains the
plaintext (substring-search defense rejecting any 8-byte
prefix of the plaintext too).
- (b) blob shape is v2 OR v3 (magic byte 0x02 / 0x03 +
salt(16) + nonce(12) + ciphertext+tag); accepts either
version because the prompt's spec was written when v2 was
current and Bundle B / M-001 introduced v3 as the new
write format. Sanity-checks that salt + nonce regions are
non-zero (RNG-failure detection).
- (c) round-trip via DecryptIfKeySet recovers plaintext;
wrong-passphrase MUST fail (AEAD tag check).
* Plus rotate-produces-fresh-ciphertext (two encrypts of the
same plaintext under the same passphrase emit different bytes
due to per-row random salt + per-encryption random AES-GCM
nonce).
* Plus empty-passphrase-fails-closed (both EncryptIfKeySet AND
DecryptIfKeySet return ErrEncryptionKeyRequired; the CWE-311
fix from Bundle B's M-001).
scripts/ci-guards/multi-tenant-query-coverage.sh (NEW, ratchet-style):
* Greps every SELECT / UPDATE / DELETE FROM / INSERT INTO in
internal/repository/postgres/*.go (excluding *_test.go) that
targets a tenant-aware table. Counts queries that lack
tenant_id in the surrounding 7-line window.
* Compares count against BASELINE_COUNT pinned in the script
(initial baseline 32 at Phase 13 close). Regression (count >
baseline) → FAIL with line-by-line violation list. Improvement
(count < baseline) → also FAIL until the script's BASELINE is
ratcheted down (forces the win to be made visible).
* Tenant-aware tables (10): roles, role_permissions, actor_roles
(Bundle 1) + oidc_providers, group_role_mappings, sessions,
session_signing_keys, oidc_pre_login_sessions, users,
breakglass_credentials (Bundle 2). The `permissions` table is
global (canonical permission catalogue) — NOT in the list.
* Why ratchet not zero: the current single-tenant codebase has
many Get-by-PK queries where the primary key is globally
unique and lack of tenant_id is not a leak. Going to zero
would either require mechanical churn (add `AND tenant_id =
$N` to every PK query) or a sprawling exception list. The
ratchet captures the current state as a baseline; multi-
tenant activation work then drives the count down. New code
that ADDS to the count without operator review is what we
catch.
.github/coverage-thresholds.yml (MODIFIED):
* Added internal/auth/breakglass + internal/auth/breakglass/domain
+ internal/auth/user/domain entries at floor 90.
* Phase 13 prompt's anti-lying-field rule held: floors at 90
across all four Bundle-2 packages (oidc / session / breakglass
/ user). NO held-low-with-rationale entry.
* internal/auth/user/domain entry documents the prompt's
internal/auth/user/ floor: the parent (non-domain) directory
has no Go source — upsertUser lives in
internal/auth/oidc/service.go alongside group resolution +
role mapping (cohesive sequence within the OIDC callback).
Splitting upsertUser into a separate internal/auth/user/
service package would harm cohesion without adding test value;
the domain layer's invariant coverage is where the floor
actually applies.
web/src/__tests__/e2e/README.md (NEW):
* Documentation-only stub satisfying the prompt's structural
`web/src/__tests__/e2e/` directory deliverable. Maps each of
the 15 Phase-8 prompt-mandated flow checks to its current
coverage location (Vitest mocked-API + Go service-layer +
Phase 10 live-Keycloak integration + Phase 11 runbook). Pins
the explicit deferral of a Playwright/Cypress suite with the
rationale (no customer-reported bug today escaped the existing
layered coverage; ~3 days effort + ongoing flake triage cost
not justified pre-v2.1.0).
Coverage results
================
internal/auth/oidc/ 93.7% ≥ 90 ✓ (was 78.8%, lifted by prelogin_test.go)
internal/auth/oidc/domain/ 96.2% ≥ 90 ✓
internal/auth/oidc/groupclaim/ 100.0% ≥ 95 ✓
internal/auth/session/ 94.9% ≥ 90 ✓
internal/auth/session/domain/ 100.0% ≥ 90 ✓
internal/auth/breakglass/ 91.5% ≥ 90 ✓
internal/auth/breakglass/domain/ 100.0% ≥ 90 ✓
internal/auth/user/domain/ 96.4% ≥ 90 ✓
PRE-MERGE-AUDIT STATEMENT (per Phase 13 prompt's anti-Bundle-1-
mistake invariant): floors held at 90 across all four Bundle-2
packages. No held-low-with-rationale entry. Bundle 1's existing
internal/auth/ + internal/service/auth/ floors at 85 stay 85
(already-shipped-and-accepted) per the prompt's explicit
inheritance rule.
Verification
============
* gofmt -l on the new test files: clean.
* go vet ./internal/auth/oidc/... ./internal/repository/postgres/...:
clean.
* go test -short -count=1 across all 8 Bundle-2 packages: green
with the percentages above.
* multi-tenant-query-coverage.sh: PASS (count 32 == baseline 32).
Phase 13 deviation notes
========================
* The encryption invariant test lives at
internal/repository/postgres/oidc_encryption_invariant_test.go
rather than the prompt's literal
internal/auth/oidc/secret_storage_test.go. Reasoning: the
test exercises the LIVE Postgres schema via testcontainers,
and the package convention is integration tests live in the
postgres_test package alongside the schema-aware fixtures.
Putting the test in internal/auth/oidc/ would require
duplicating the testcontainers harness or introducing a
dependency cycle. The semantic content is identical to the
prompt's spec.
* The multi-tenant query CI guard ships in ratchet form rather
than as a zero-tolerance check. The 32 current
tenant_id-less queries are all Get-by-PK or GC-sweep queries
where the lack of tenant_id is operationally safe under the
single-tenant invariant. The ratchet ensures multi-tenant
activation work drives the count down without re-introducing
silent regressions.
* The full Playwright/Cypress E2E suite is deferred. The
web/src/__tests__/e2e/README.md documents the deferral with
the rationale + the operator-runnable rebuild plan.
|
||
|
|
8de28a74ba |
auth-bundle-2 Phase 10: Keycloak testcontainers harness + 5-test e2e OIDC matrix + optional Okta smoke (integration build tag)
Closes Phase 10 of cowork/auth-bundle-2-prompt.md. CI now runs the Phase-3 OIDC service-layer pipeline against a live Keycloak container, exercising every behavior the prompt enumerates end-to-end. Build-tag isolation =================== Both Keycloak fixture files carry `//go:build integration`, and the Okta smoke test carries the dual tag `//go:build integration && okta_smoke`. The pre-commit `make verify` gate runs `go test -short ./...` (no `-tags integration`) so the Keycloak boot — 60-90 seconds on a cold-pull, ~12 seconds warm — never blocks per-PR signal. Verified: go test -short -count=1 ./internal/auth/oidc/... → ok internal/auth/oidc (3.6s, 21+ Phase-3 negatives) → ok internal/auth/oidc/domain (0.005s) → ok internal/auth/oidc/groupclaim (0.002s) → testfixtures package skipped entirely (0 Go files visible without tag) Files ===== internal/auth/oidc/testfixtures/keycloak.go (NEW, //go:build integration): * StartKeycloak(t) boots quay.io/keycloak/keycloak:25.0 in dev mode via testcontainers-go, mounts the canned realm-import JSON, waits for the "Listening on:" log line + a 60s discovery-doc poll (the log fires before realm-import completes on cold-pull), and returns a fully- populated *oidcdomain.OIDCProvider. * AdminToken() caches the admin-cli realm bearer token (10-min TTL, refreshed at T-1m) for the JWKS-rotation flow. * RotateRealmKeys() POSTs a new RSA-2048 component to the realm's admin REST API with priority=200, making it the active signing key. * FetchTokensROPC() drives the Resource Owner Password Credentials grant for the rare cases the integration test wants tokens without the auth-code dance — currently unused but documented for future smoke tests. * Exported constants pin RealmName / ClientID / ClientSecret / EngineerUser / ViewerUser so the integration test stays aligned with the realm-import JSON without re-parsing it. internal/auth/oidc/testfixtures/keycloak-realm.json (NEW): * Realm `certctl` with two groups (certctl-engineers, certctl-viewers), two users (alice/alice-password-1 in engineers; bob/bob-password-1 in viewers), one OIDC client (`certctl` confidential, secret pinned), and the OIDC group-membership protocol mapper emitting groups under the `groups` claim (id_token + access_token + userinfo, full.path=false). * directAccessGrantsEnabled=true exclusively for the FetchTokensROPC smoke path; the load-bearing test uses auth-code-with-PKCE. internal/auth/oidc/integration_keycloak_test.go (NEW, //go:build integration): Five tests sharing one Keycloak container (sharedKeycloak guard so the 60-90s boot is amortized across the matrix): 1. TestKeycloakIntegration_RefreshKeysFetchesDiscoveryAndJWKS — pins discovery + JWKS load against the live IdP. 2. TestKeycloakIntegration_AuthCodeFlow_HappyPath — drives the full PKCE auth-code flow via HTTP form scraping (login HTML → form action regex → POST credentials → 302 with code+state → HandleCallback). Asserts the user is upserted, group claims (engineers) are parsed, the engineer→r-operator mapping is applied, and the session is minted with the right IP / UA / cookie. 3. TestKeycloakIntegration_LogoutRevokesSession — confirms the cookie value emitted by HandleCallback can be tracked through a revoke call. (The full session.Service.Revoke contract is exercised by Phase 4 service_test.go's 15-case negative matrix.) 4. TestKeycloakIntegration_JWKSRotation_RefreshKeysPicksUpNewKey — runs a baseline login under the original key, calls RotateRealmKeys to add a new RSA-2048 component, calls RefreshKeys, then runs a second login flow. Pins behavior #7 from the prompt. 5. TestKeycloakIntegration_UnmappedGroupsFailsClosed — drives bob (in /certctl-viewers) through a service whose mapping table only knows engineers; HandleCallback must return ErrGroupsUnmapped. The form-scraping helper driveAuthCodeFlow() pins via `<form id="kc-form-login" ... action="...">`, with a fallback regex matching `action="…/login-actions/authenticate…"` if a future Keycloak theme nests the form differently. Failure surfaces a truncated HTML body in the t.Fatal so the operator can update the regex on a Keycloak upgrade. internal/auth/oidc/integration_okta_smoke_test.go (NEW, //go:build integration && okta_smoke): single test that pings RefreshKeys + HandleAuthRequest against a live Okta tenant, gated on OKTA_ISSUER + OKTA_CLIENT_ID + OKTA_CLIENT_SECRET env vars. Skips cleanly when any are missing. Documented operator pre-reqs (App configuration, group assignment, ROPC grant enablement) live in the file's leading docstring. Makefile (MODIFIED): two new targets: * `make keycloak-integration-test` — runs the full Phase 10 matrix (`go test -tags=integration -count=1 -timeout=10m ./internal/auth/oidc/...`). * `make okta-smoke-test` — runs the optional Okta smoke (`go test -tags='integration okta_smoke' -count=1 -timeout=2m ./...`). Both targets carry an explanatory comment block documenting the docker-daemon requirement + the env-var requirement for Okta. Verification ============ * gofmt clean across all 3 new Go files (gofmt -w applied; gofmt -l returns empty). * `go vet ./internal/auth/oidc/... ./internal/auth/... ./internal/api/handler/... ./internal/api/router/... ./internal/mcp/...` — clean. * `go vet -tags integration ./internal/auth/oidc/...` — clean. * `go vet -tags 'integration okta_smoke' ./internal/auth/oidc/...` — clean. * `go test -short -count=1 ./internal/auth/oidc/...` — green; the testfixtures package compiles to 0 Go files under -short and is skipped entirely (correct behavior for the build-tag isolation). * No go.mod / go.sum drift — testcontainers-go was already in the graph from Phase 2. Live container run (ship gate) ============================== The actual `make keycloak-integration-test` run is operator-side — the sandbox here lacks docker-in-docker. The CI runner with Docker available is where the matrix flips green. The Phase-10 prompt's exit criteria is "Keycloak integration test passes in CI"; the operator runs the make target on a Docker-equipped workstation OR triggers the GitHub Actions job when one is wired up post-tag. Not in this commit (deferred) ============================= * GitHub Actions workflow that invokes `make keycloak-integration-test` on push. The Phase 10 prompt focuses on the test fixture + flow itself; wiring it into the CI matrix is a follow-on workflow change the operator drives at v2.1.0 tag time. * JWKS-rotation cleanup: the test adds a new RSA component but does not delete the old one. Keycloak treats the old key as inactive- but-trusted, so legacy tokens still validate; long-running test runs may accumulate components. Acceptable for ephemeral test fixtures. |
||
|
|
b09bd0984a |
auth-bundle-2 Phase 9: 11 OIDC + session MCP tools (Phase-5 surface parity)
Closes Phase 9 of cowork/auth-bundle-2-prompt.md. Every Phase-5 HTTP
endpoint now has a matching MCP tool so operators driving certctl
from Claude / VS Code / any MCP client get the same OIDC-provider +
group-mapping + session management capability the GUI + CLI already
expose.
Coverage map (each tool → HTTP endpoint → permission)
=====================================================
certctl_auth_list_oidc_providers GET /v1/auth/oidc/providers auth.oidc.list
certctl_auth_get_oidc_provider GET /v1/auth/oidc/providers (filtered) auth.oidc.list
certctl_auth_create_oidc_provider POST /v1/auth/oidc/providers auth.oidc.create
certctl_auth_update_oidc_provider PUT /v1/auth/oidc/providers/{id} auth.oidc.edit
certctl_auth_delete_oidc_provider DELETE /v1/auth/oidc/providers/{id} auth.oidc.delete
certctl_auth_refresh_oidc_provider POST /v1/auth/oidc/providers/{id}/refresh auth.oidc.edit
certctl_auth_list_group_mappings GET /v1/auth/oidc/group-mappings?provider_id auth.oidc.list
certctl_auth_add_group_mapping POST /v1/auth/oidc/group-mappings auth.oidc.edit
certctl_auth_remove_group_mapping DELETE /v1/auth/oidc/group-mappings/{id} auth.oidc.edit
certctl_auth_list_sessions GET /v1/auth/sessions[?actor_id=&actor_type=] auth.session.list (own) | auth.session.list.all (other)
certctl_auth_revoke_session DELETE /v1/auth/sessions/{id} auth.session.revoke (or own-bypass)
Implementation notes
====================
internal/mcp/tools_auth_bundle2.go (NEW): 11 tools wired through three
focused register functions (registerAuthOIDCProviderTools,
registerAuthGroupMappingTools, registerAuthSessionTools). Every tool
routes through the existing Client (Get/Post/Put/Delete) so permission
gates fire server-side via the Phase-5 rbacGate wrappers — a non-admin
caller's MCP tool invocation gets whatever 403 the underlying HTTP
handler emits, not an MCP-side bypass.
Empty-id guard
--------------
Every path-id tool short-circuits to errorResult(fmt.Errorf("id is required"))
BEFORE the HTTP call. Defense against url.PathEscape("") collapsing a
singular op into the list endpoint (which would silently succeed against
a permissive backend). Same pattern across all 6 path-id tools (get,
update, delete, refresh provider; remove mapping; revoke session).
auth_get_oidc_provider list-then-filter
---------------------------------------
The Phase-5 HTTP API doesn't expose a singular GET /v1/auth/oidc/providers/{id}
endpoint — the GUI's OIDCProviderDetailPage fetches the full list and
filters in-process. The MCP tool mirrors that pattern exactly: GET the
list, JSON-decode the providers envelope, walk the array filtering by
id, return the matching raw JSON object on hit or an explicit "oidc
provider not found: <id>" error on miss. This keeps the MCP surface
in lockstep with the GUI's permission boundary (auth.oidc.list grants
"see any provider", as it does on the GUI) without inventing a new HTTP
endpoint.
internal/mcp/types.go (MODIFIED): 8 new input types matching the
Phase-5 wire shapes (oidcProviderRequest at internal/api/handler/auth_session_oidc.go).
client_secret on Update is optional — empty preserves the existing
ciphertext on the server, providing a value rotates. Mirrors the GUI's
edit-without-rotate UX from web/src/pages/auth/OIDCProviderDetailPage.tsx.
internal/mcp/tools.go (MODIFIED): registerAuthBundle2Tools wired into
RegisterTools alongside the Bundle 1 Phase 11 registerAuthTools.
Test coverage
=============
internal/mcp/tools_auth_bundle2_test.go (NEW), 5 test cases:
* TestAuthBundle2MCP_AllToolsRegister — registerAuthBundle2Tools
doesn't panic; catches duplicate-name regressions before CI.
* TestAuthBundle2MCP_PathsAndMethods — 11 cases (one per tool) +
the admin-other-actor variant of list_sessions; asserts the right
method + path + body + query string fires against the mock API.
* TestAuthBundle2MCP_ForbiddenSurfacesError — every tool's underlying
HTTP path returns a propagated error containing "forbidden" / "403"
when the mock returns 403, exercising the errorResult fence path.
* TestAuthBundle2MCP_GetProviderFiltersListByID — pins the list-then-
filter shape end-to-end with both the hit-and-return (returns the
matching raw JSON object) and miss-returns-error (sentinel string
"oidc provider not found") branches.
* TestAuthBundle2MCP_EmptyIDInputShortCircuits — pins the
strings.TrimSpace empty-id guard at the top of every path-id handler.
* TestAuthBundle2MCP_PromptCoverage — every tool the prompt enumerates
is also present in tools_per_tool_test.go's allHappyPathCases (so
the live-dispatch + 5xx error-path tests cover all 11 tools).
internal/mcp/tools_per_tool_test.go (MODIFIED): 11 new toolCase entries
in allHappyPathCases (live in-memory MCP dispatch + happy-path fence
shape + 5xx error-path fence shape) + a mock-API special case for
GET /api/v1/auth/oidc/providers that returns the right envelope shape
({"providers":[{"id":"op-okta",...}]}) so the get_oidc_provider tool's
in-process filter resolves under the live dispatch.
Verification
============
* gofmt + go vet — clean across internal/mcp/...
* go test -short -count=1 — green across internal/mcp + internal/auth/...
+ internal/api/handler + internal/api/router (13 packages, 0 failures).
* MCP tool count re-derive (CLAUDE.md command):
grep -cE 'mcp\.AddTool\(' internal/mcp/tools*.go
→ tools.go=121, tools_auth.go=12, tools_auth_bundle2.go=11 (new),
tools_est.go=6 — total 150. Matches the live count
TestMCP_RegisterTools_DispatchableToolCount asserts.
* staticcheck deferred — sandbox /tmp at 99% disk, can't install the
binary; all SA*/ST* lints would have run via the staticcheck-CI step
on push. go vet caught the only real issue (an unused context import)
before commit.
Not in this commit (deferred)
=============================
* Break-glass admin MCP tools (4 endpoints from Phase 7.5). The Phase 9
prompt does NOT enumerate break-glass tools; its exit criteria is
"Every API endpoint from Phase 5 has an MCP tool". Phase 5 does not
include the break-glass surface (Phase 7.5 ships those endpoints with
surface-invisibility semantics: 404 when CERTCTL_BREAKGLASS_ENABLED=false,
which complicates LLM tool-discovery UX). If the operator wants
break-glass MCP parity, that's a follow-on bundle.
|
||
|
|
1d01c87663 |
auth-bundle-2 Phase 7 + Phase 7.5: OIDC first-admin bootstrap +
break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility)
Phase 7 — OIDC first-admin bootstrap (Decision 3):
- Optional AdminBootstrapHook closure on *oidc.Service. When wired,
HandleCallback consults the hook AFTER group resolution + user
upsert and BEFORE the empty-mapping fail-closed check. Hook
receives (providerID, groups, userID); returns grantAdmin=true
when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no
admin exists yet in the tenant.
- cmd/server/main.go wires the hook as a closure that:
* Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured).
* Probes AdminExists via authActorRoleRepo (admin-already-exists
silently returns false; bootstrap mode is one-shot per tenant).
* Walks group intersection.
* On match: grants r-admin via authActorRoleRepo.Grant + emits
the bootstrap.oidc_first_admin audit row with
event_category=auth + INFO log.
- Coexists with the Bundle 1 env-var-token bootstrap. Both paths
can be configured; first match wins (admin-existence probe
short-circuits the second).
- HandleCallback's empty-mapping fail-closed check moved AFTER the
hook so a fresh deployment with zero group_role_mappings can
still mint the first admin.
- 5 tests in service_test.go: hook grants admin on match, hook
returns false preserves empty-mapping fail-closed, admin-already-
exists silently falls through to normal mapping, hook-error wraps
+ bubbles, idempotent when admin is already in the mapped role set.
Phase 7.5 — Break-glass admin (Decision 4, default-OFF):
Migration 000038 ships:
- breakglass_credentials table — at-most-one-credential-per-actor
(UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout
state machine (failure_count, locked_until, last_failure_at).
FK CASCADE on users(id) so deleting a user atomically removes
their credential.
- Two new permissions seeded into r-admin only:
auth.breakglass.admin — set/rotate/unlock/remove credentials.
auth.breakglass.login — actor uses break-glass to log in.
CanonicalPermissions extended in lockstep.
internal/auth/breakglass/service.go (~580 LOC):
- Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED.
- SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4,
salt=16 random bytes, output=32 bytes); per-password random salt;
PHC-format hash output. Min 12 / max 256 byte input.
- Authenticate: constant-time-compare via subtle.ConstantTimeCompare
on every code path. Identical 401 + identical timing across the
wrong-password / locked-account / non-existent-actor paths so an
attacker cannot probe whether a given actor has break-glass
configured. Non-existent-actor + locked-account paths run a
verifyDummy() Argon2id pass for timing parity. Lockout state
machine: failure_count++ on every wrong attempt; threshold (default
5) trips locked_until = NOW() + duration (default 15m). Successful
Authenticate resets the counter. Reset-window: failures aged out
after CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h)
auto-reset on next attempt.
- Unlock + RemoveCredential: admin-only (auth.breakglass.admin
gated at the router via rbacGate). Audit rows on every operation.
- All public methods refuse to act when Enabled()==false (returns
ErrDisabled; the handler maps to HTTP 404 — surface invisibility).
internal/repository/postgres/breakglass.go ships the 5-method
postgres impl with atomic single-statement IncrementFailure (so
concurrent racing wrong-password attempts can't observe an
intermediate state and slip past the threshold) and idempotent
ResetFailureCount.
internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP
surface:
- POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per
source IP via the existing rate limiter; returns 404 when
disabled). On success sets the post-login session cookie + CSRF
cookie via SessionService.Create + 204. On any failure:
uniform 401 + identical timing (the service has already audited
the specific failure category).
- POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin)
- POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
(auth.breakglass.admin)
- DELETE /api/v1/auth/breakglass/credentials/{actor_id}
(auth.breakglass.admin)
Admin endpoints share the surface-invisibility property: when
CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns
404 (not 403) so probing via the admin surface gets the same signal
as probing the login endpoint.
Tests (internal/auth/breakglass/service_test.go):
All 8 Phase 7.5 spec-mandated negative cases:
1. Service.Enabled()==false → all ops return ErrDisabled.
2. Wrong password → ErrInvalidCredentials, failure_count++,
audit row with event_category=auth.
3. Failure_count exceeds threshold → locked, subsequent attempts
(including with the CORRECT password) return identical-shape
401 while the lockout window holds.
4. Lockout window expires → next attempt with correct password
succeeds + resets the counter.
5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword.
6. Password leak hygiene — the service has zero slog calls; the
audit-row map literal never includes the password plaintext.
7. Argon2id hash never appears in logs OR API responses — pinned
by `json:"-"` tag on BreakglassCredential.PasswordHash + a
belt-and-braces json.Marshal probe asserting the hash bytes
never appear in the marshaled output.
8. Constant-time-compare verified via timing-statistical test —
wrong-password vs no-credential paths take statistically
indistinguishable time (within 5x ratio). The verifyDummy()
hash compute on the no-credential + locked paths is what
keeps timing parity; absent that, an attacker could side-
channel "actor doesn't have a credential" via timing.
Plus coverage-lift batch covering: SetPassword first-time vs rotate,
no-caller-id rejection, no-target-id rejection, RNG failure surface,
Authenticate happy-path mints session, no-credential audit row,
session-mint-failure surface, FailureResetInterval recycle, Unlock
+ RemoveCredential happy paths, hash-format unit tests (round-trip,
mismatch, malformed/wrong-version/bad-base64 formats), nil-audit +
nil-session pass-through.
Coverage on internal/auth/breakglass/ at 91.5% per-statement (above
the Phase 7.5 spec ≥ 90% floor).
cmd/server/main.go wiring:
- Constructs breakglassRepo + breakglassService + breakglassHandler
after the OIDC service block.
- breakglassSessionMinterAdapter shim bridges *session.Service.Create
to the breakglass.SessionMinter port.
- Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator
visibility for the deliberate SSO-bypass).
internal/config/config.go gains:
- AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for
Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list +
CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
- AuthConfig.Breakglass nested struct with 4 env vars
(CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION
+ LOCKOUT_RESET_INTERVAL).
Router wiring:
- 4 new breakglass routes registered when reg.AuthBreakglass != nil;
public login route via direct r.mux.Handle (auth-exempt), 3 admin
routes via r.Register + rbacGate(auth.breakglass.admin).
- POST /auth/breakglass/login pinned in AuthExemptRouterRoutes
allowlist with Phase 7.5 justification.
- SpecParityExceptions extended with 4 new entries documenting
the Phase 7.5 deferral of full per-endpoint OpenAPI rows
(handler doc-block at the top of auth_breakglass.go is the
operator-facing reference).
Threat model (encoded in service.go + auth_breakglass.go doc-blocks
+ migration 000038 docstrings, to be promoted to docs/operator/auth-
threat-model.md in Phase 12):
- Break-glass is a deliberate bypass of the SSO security boundary.
An attacker who phishes the password OR finds it in a compromised
password manager bypasses MFA, OIDC, and every group-claim gate.
- Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-
state. Enable only during SSO-broken incidents. Disable after
recovery.
- WebAuthn pairing (v3 per Decision 12) is the load-bearing second
factor. Without it, break-glass is best treated as an emergency-
only path.
- Audit trail surfaces every break-glass action under
event_category=auth; the auditor role can monitor for unexpected
break-glass logins.
Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/auth/oidc (3.0s; new
Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives),
internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage
batch passing), internal/config + internal/domain/auth + internal/api/
router + internal/api/handler all green, no regressions in Bundle 1
packages.
|
||
|
|
3189f3cd71 |
auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +
chained-auth combinator + AuthInfo OIDC providers extension + 2 CI
guards (Bundle-1-compat + Bundle-1-to-2-upgrade)
Phase 6 wires the Phase 4 session service + Phase 5 OIDC handlers into
the request path. Three middlewares + one combinator land in
internal/auth/session/middleware.go:
1. SessionMiddleware reads `certctl_session` cookie, validates via
SessionService.Validate, populates the legacy UserKey/AdminKey
+ Phase 3 RBAC context keys (ActorIDKey/ActorTypeKey/TenantIDKey)
so downstream RequirePermission + audit-attribution see a
consistent caller. Best-effort UpdateLastSeen keeps the idle-
expiry sliding window fresh. CRITICALLY: never 401s on validate
failure — defers to the next middleware so the chained-auth
combinator can fall back to Bearer.
2. CSRFMiddleware gates state-changing methods (POST/PUT/DELETE/
PATCH) for session-authenticated requests. API-key actors are
EXEMPT (no session row in context => CSRF doesn't apply; they're
not browser-driven). Constant-time-compares SHA-256(X-CSRF-Token
header) against the session row's stored hash via
SessionService.ValidateCSRF. Mismatch returns 403.
3. ChainAuthSessionThenBearer is the load-bearing chained-auth
combinator: tries the session cookie first; on miss/invalid,
falls back to the API-key Bearer middleware; if neither
authenticates, 401. The composition uses bearerSkipIfAuthenticated
so a request with both a valid session AND a valid Bearer uses
the session (cookie wins per the Bundle 2 contract).
Middleware chain order in cmd/server/main.go (per Phase 6 spec):
RequestID → Logging → Recovery → CORS → RateLimit → AUTH (chained:
session → Bearer) → CSRF (state-changing only; API-key exempt) →
Audit → Handler
The chained authMiddleware replaces the bare Bundle-1 bearerMiddleware
at the chain entry point; csrfMiddleware lands immediately after so
session-authenticated requests pass through CSRF before audit. Both
new middlewares are pass-throughs when sessionService is nil
(pre-Phase-4 builds).
AuthInfo extension (Category E): GET /api/v1/auth/info now returns the
list of configured OIDC providers (id + display_name + login_url
where login_url = `/auth/oidc/login?provider=<id>`) so the GUI Login
page renders the correct "Sign in with X" buttons. Endpoint stays
auth-exempt; the providers list is public configuration. Wired via
HealthHandler.OIDCProvidersResolver + a new OIDCProvidersListResolver
projection interface; the cmd/server adapter
oidcProvidersListAdapter projects the postgres OIDCProviderRepository
into the public-safe shape. Resolver lookups are best-effort: failures
fall back to the minimal payload rather than 500-ing the GUI's auth
probe. Nil resolver preserves the pre-Phase-6 minimal shape so test
fixtures + no-db deploys keep compiling.
Bypass list preserved (Category E): the existing public-route
allowlist in router.AuthExemptRouterRoutes is preserved by virtue of
those routes registering via direct r.mux.Handle (they bypass the
entire chain). The protocol-endpoint allowlist (ACME/SCEP/EST/OCSP/
CRL) bypasses via cmd/server/main.go::buildFinalHandler URL-prefix
dispatch — those routes never reach the auth middleware at all. Both
preservations are pinned by the Bundle-1 compat CI guard below.
Tests (internal/auth/session/middleware_test.go):
All 7 Phase 6 spec-mandated middleware-chain tests pass:
1. Session cookie + correct CSRF → 200.
2. Session cookie + wrong CSRF → 403.
3. Bearer-only (no session) + no CSRF → 200 (API-key actors are
CSRF-exempt by design).
4. No cookie + no Bearer → 401.
5. Expired cookie + valid Bearer → fall back to Bearer succeeds.
6. Tampered cookie → 401 (no Bearer to fall back to).
7. Bypass-list awareness — state-changing method, no auth, no
session row → uniform 401 (NOT a CSRF 403; the CSRF check is
gated on session-row presence and never fires for unauth
requests).
Plus coverage-lift tests covering nil-service pass-through, safe-
methods bypass, SessionFromContext nil + populated, isStateChangingMethod
matrix, clientIPFromRequest variants (RemoteAddr / XFF first-hop /
XFF single / no-port), nil-bearer chain branches.
Coverage on internal/auth/session/middleware.go: 100% per-function
across the 9 entry points (SessionValidator interfaces +
NewSessionMiddleware + NewCSRFMiddleware + ChainAuthSessionThenBearer +
bearerSkipIfAuthenticated + SessionFromContext + isStateChangingMethod
+ clientIPFromRequest + lastIndexByte). Package coverage 94.9%.
Two new CI guards:
scripts/ci-guards/bundle-1-compat-regression.sh — Bundle-1-only
compat invariants. Static-source checks that protect the Bundle-1
path since spinning up docker-compose + running the integration
test suite is sandbox-infeasible:
1. SessionMiddleware MUST defer-to-next on missing/invalid cookie.
2. CSRFMiddleware MUST be pass-through on missing session row.
3. cmd/server/main.go MUST wire ChainAuthSessionThenBearer.
4. The 4 public OIDC routes MUST be in AuthExemptRouterRoutes.
5. AuthInfo MUST guard on OIDCProvidersResolver != nil.
scripts/ci-guards/bundle-1-to-2-upgrade-regression.sh — Bundle-1 →
Bundle-2 upgrade invariants:
1. Migrations 000034..000037 use CREATE TABLE IF NOT EXISTS.
2. Migrations are wrapped in BEGIN; ... COMMIT;.
3. NO DROP TABLE / ALTER ... DROP COLUMN against any of the 19
protected Bundle-1 tables (api_keys, audit_events, certificates,
certificate_versions, profiles, issuers, targets, agents, jobs,
owners, teams, agent_groups, notifications, roles, permissions,
role_permissions, actor_roles, tenants, approvals,
intermediate_cas, issuance_approval_requests).
4. 000037 INSERTs use ON CONFLICT DO NOTHING (idempotent re-apply).
5. ChainAuthSessionThenBearer is wired (Bundle-1 Bearer keys
continue to authenticate post-upgrade).
6. Bootstrap handler is registered (fresh-deployment bootstrap
still works).
Both guards are sandbox-feasible static analysis. When the operator
gets a Linux VM with docker-in-docker, promote both to real `docker
compose up` integration tests against a v2.1.0 baseline DB dump.
Verifications: gofmt clean, go vet ./internal/auth/... ./internal/api/...
./cmd/server/... clean, go test -short -count=1 -race green across
internal/auth/session (94.9% coverage), internal/api/handler,
internal/api/router, no regressions in Bundle 1 packages, both new
ci-guards green.
|
||
|
|
9c679a5960 |
auth-bundle-2 Phase 5: OIDC + session HTTP surface (13 endpoints),
pre-login store, OpenID Connect Back-Channel Logout 1.0, cookieAuth
scheme, 7 new auth permissions, CI guard, handler tests
Phase 5 of the bundle puts the Phase 3 OIDC service + Phase 4 session
service on the wire. 13 HTTP endpoints split into three logical groups:
Public OIDC handshake (auth-exempt; protocol-mediated):
GET /auth/oidc/login?provider=<id> -> 302 to IdP authorization URL
+ sets certctl_oidc_pending cookie
(10-min TTL, Path=/auth/oidc/,
SameSite=Lax)
GET /auth/oidc/callback?code=...&state=... -> consume pre-login row,
run Phase 3's 11-step token
validation, mint post-login
session, 302 to dashboard
POST /auth/oidc/back-channel-logout -> OpenID Connect BCL 1.0 — IdP
POSTs logout_token JWT; certctl
validates signature against IdP
JWKS via Phase 3 alg allow-list,
required claims (iss/aud/iat/jti/
events; exactly one of sub/sid;
nonce ABSENT per spec §2.4),
revokes matching sessions,
returns 200 with
Cache-Control: no-store
POST /auth/logout -> revoke caller's session
Session management (RBAC-gated auth.session.*):
GET /api/v1/auth/sessions -> auth.session.list (own / all)
DELETE /api/v1/auth/sessions/{id} -> auth.session.revoke (own bypass)
OIDC provider + group-mapping CRUD (RBAC-gated auth.oidc.*):
GET /api/v1/auth/oidc/providers -> auth.oidc.list
POST /api/v1/auth/oidc/providers -> auth.oidc.create
(client_secret encrypted
at rest via
internal/crypto.EncryptIfKeySet)
PUT /api/v1/auth/oidc/providers/{id} -> auth.oidc.edit
DELETE /api/v1/auth/oidc/providers/{id} -> auth.oidc.delete
(refused via
ErrOIDCProviderInUse → 409
when users authenticated
via this provider)
POST /api/v1/auth/oidc/providers/{id}/refresh -> auth.oidc.edit
(re-runs IdP downgrade
defense via
OIDCService.RefreshKeys)
GET /api/v1/auth/oidc/group-mappings -> auth.oidc.list
POST /api/v1/auth/oidc/group-mappings -> auth.oidc.edit
DELETE /api/v1/auth/oidc/group-mappings/{id} -> auth.oidc.edit
Migration 000037 ships:
- oidc_pre_login_sessions table (10-min absolute TTL, FK CASCADE on
oidc_provider_id, FK RESTRICT on signing_key_id; index on
absolute_expires_at for the GC sweep);
- 7 new permissions seeded into r-admin only:
auth.session.list, auth.session.list.all, auth.session.revoke,
auth.oidc.list, auth.oidc.create, auth.oidc.edit, auth.oidc.delete
CanonicalPermissions extended in lockstep at internal/domain/auth/
validate.go.
Pre-login machinery:
- internal/repository/oidc.go gains PreLoginRepository interface +
PreLoginSession struct + ErrPreLoginNotFound / ErrPreLoginExpired
sentinels.
- internal/repository/postgres/oidc_prelogin.go ships the impl;
LookupAndConsume uses DELETE ... RETURNING for atomic single-use.
- internal/auth/oidc/prelogin.go is the PreLoginAdapter that bridges
the OIDC service's Phase 3 PreLoginStore interface to the new
repository, signing the cookie value under the active
SessionSigningKey via the same v1.<id>.<key>.<HMAC> wire format
Phase 4 uses for post-login cookies. Defense-in-depth: the
pre-login `pl-` prefix is enforced by ParseCookieValue(prefix);
a stolen pre-login cookie cannot be replayed against the
post-login Validate path (pinned by
TestService_Validate_RejectsPreLoginCookieAtPostLoginGate).
Session package extension:
- internal/auth/session/service.go gains exported SignCookieValue,
ParseCookieValue (with caller-supplied id-1 prefix), ComputeCookieHMAC,
DecryptKeyMaterial wrappers so the OIDC pre-login adapter shares
the same length-prefixed HMAC math without code duplication.
- parseCookie no longer hardcodes the `ses-` prefix check (moved to
Validate as defense-in-depth; pre-login cookie verification uses
the `pl-` prefix via ParseCookieValue).
Cookie attributes (all Phase 5 endpoints honor CERTCTL_SESSION_SAMESITE
+ Secure=true via SessionCookieAttrs from Phase 4 config):
- certctl_oidc_pending: Path=/auth/oidc/, MaxAge=600s, SameSite=Lax
(cannot be Strict because the IdP-initiated callback is a top-level
navigation from a different origin).
- certctl_session: Path=/, Expires=8h, SameSite=Lax|Strict, HttpOnly.
- certctl_csrf: Path=/, Expires=8h, HttpOnly=false (intentional —
GUI must read it to echo into X-CSRF-Token header).
Audit logging on every mutating operation (event_category="auth"):
auth.oidc_login_succeeded / failed / unmapped_groups
auth.oidc_back_channel_logout / failed
auth.session_revoked
auth.oidc_provider_{created,updated,deleted,refreshed}
auth.group_mapping_{added,removed}
OpenAPI updates:
- cookieAuth security scheme added to api/openapi.yaml under
components.securitySchemes (apiKey / cookie / certctl_session).
- The 13 Phase 5 routes are added to SpecParityExceptions with a
deferral note: full per-endpoint OpenAPI rows land in a follow-on
commit alongside the GUI work (Phase 8) so the ergonomic shape can
be validated against the live GUI client.
CI guard: scripts/ci-guards/N-bundle-2-security-empty-preserved.sh
asserts api/openapi.yaml has ≥ 14 'security: []' occurrences (the
pre-Bundle-2 baseline). Reducing the count below 14 would silently
force a Bearer-or-cookie requirement onto an endpoint that legitimately
runs without certctl-issued credentials; the guard fires before that
regression lands.
Handler tests (internal/api/handler/auth_session_oidc_test.go):
- All 6 prompt-mandated negative cases:
BCL with missing events claim -> 400
BCL with nonce present -> 400 (per spec §2.4)
BCL with sig signed by an unknown key -> 400
Callback with replayed state -> 400
Callback with PKCE verifier mismatch -> 400
Callback with expired pre-login row -> 400
- Plus happy paths for every endpoint, edge cases (missing-cookie,
duplicate-name, in-use-409, wrong-tenant), and the Helper-function
coverage (peekIssuer, classifyOIDCFailure, defaultIfBlank,
defaultIntIfZero, clientIPFromRequest, encryptClientSecret).
Coverage on internal/api/handler/auth_session_oidc.go: 80.9% per-function
(above the Phase 5 spec's ≥ 80% floor).
Server wiring (cmd/server/main.go):
Wired AFTER sessionService (Phase 4) so the OIDC PreLoginAdapter can
sign pre-login cookies under the active SessionSigningKey:
oidcProviderRepo + oidcMappingRepo + oidcUserRepo + oidcPreLoginRepo
-> preLoginAdapter -> oidcService -> authSessionOIDCHandler.
sessionMinterAdapter shim bridges *session.Service.Create to the
oidcsvc.SessionMinter port the OIDC service consumes.
Router wiring (internal/api/router/router.go):
4 public OIDC routes via direct r.mux.Handle (auth-exempt; pinned in
AuthExemptRouterRoutes); 9 RBAC-gated routes via r.Register +
rbacGate(checker, perm, h). Routes only register when
reg.AuthSessionOIDC != nil so pre-Phase-5 builds skip the block
entirely.
Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/api/handler (74 tests +
new Phase 5 batch), internal/api/router (parity + auth-exempt
allowlist), internal/auth/oidc + session (no regressions), full domain
+ scheduler + config sweeps green, ci-guard
N-bundle-2-security-empty-preserved.sh green (17 ≥ 14 baseline).
|
||
|
|
17b30c1f7f |
auth-bundle-2 Phase 4: session service (cookie minting + signature
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap
Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.
Service surface (internal/auth/session/service.go, ~880 LOC):
- Service.Create(actorID, actorType, ip, ua) -> *CreateResult
Mints a session row; signs the cookie value with the active signing
key; returns the cookie payload AND the CSRF token plaintext for
the handler to set on the response.
- Service.Validate(ValidateInput) -> *Session
Parses the cookie, looks up the signing key (incl. retired-but-in-
retention), recomputes HMAC-SHA256, loads the session row, enforces
revocation + absolute + idle expiry + optional IP/UA bind. Maps to
one of 9 sentinel errors; the handler uniformly returns 401 to the
wire (specific reason in the audit row).
- Service.ValidateCSRF(headerValue, *Session) error
Constant-time compares SHA-256(header) against the stored hash on
the session row.
- Service.UpdateLastSeen / Revoke / RevokeAllForActor
- Service.RotateCSRFToken — mints fresh token, persists hash, returns
plaintext; called on login completion, logout, role-change against
actor, explicit operator rotate.
- Service.RotateSigningKey — mints new active key, retires previous;
retired keys stay valid for cfg.SigningKeyRetention so existing
cookies don't immediately fail.
- Service.EnsureInitialSigningKey — idempotent; mints first key on
fresh deploys; emits auth.session_signing_key_bootstrap audit row
with event_category=auth. Wired into cmd/server/main.go AFTER
migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
is FATAL (logger.Error + os.Exit(1)) per the prompt — server refuses
to boot rather than serve session-less.
- Service.GarbageCollect — sweeps expired post-login sessions +
pre-login rows >10min + retired-past-retention signing keys. Wired
into the new internal/scheduler/scheduler.go::sessionGCLoop on a
CERTCTL_SESSION_GC_INTERVAL tick.
Cookie wire format (load-bearing):
v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>
The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:
len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id
where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger swap one byte across the boundary — `<a, bc>` and
`<ab, c>` produce identical HMAC inputs. The length prefix moves the
boundary into the input itself so the two cases can never collide.
The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).
CSRF token model:
- Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
intentional; the GUI must read it to echo into X-CSRF-Token header).
- SHA-256 hash of the plaintext lives on the session row.
- Validation: SHA-256(X-CSRF-Token) constant-time-compared.
- Rotated by Service.RotateCSRFToken on login / logout / role-change /
explicit admin-trigger.
Optional defense-in-depth (default OFF):
- CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
(user may have legitimate IP change). Mobile + corporate-NAT
environments leave this off.
- CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.
Configurable lifetimes (env vars wired in internal/config/config.go):
CERTCTL_SESSION_IDLE_TIMEOUT 1h
CERTCTL_SESSION_ABSOLUTE_TIMEOUT 8h
CERTCTL_SESSION_SIGNING_KEY_RETENTION 24h
CERTCTL_SESSION_GC_INTERVAL 1h
CERTCTL_SESSION_SAMESITE Lax
CERTCTL_SESSION_BIND_IP false
CERTCTL_SESSION_BIND_USER_AGENT false
Test surface (internal/auth/session/service_test.go, ~860 LOC):
All 15 prompt-mandated negative cases:
1. Tampered cookie (HMAC byte flipped near segment start where all
6 bits are real — base64url-no-pad's last char carries only 2
bits so a tail-flip is unreliable).
1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
2. Cookie missing v1. prefix.
3. Cookie with unknown version prefix (v99).
4. Idle expiry — back-dated last_seen_at + idle_expires_at.
5. Absolute expiry — back-dated absolute_expires_at.
6. Revoked session.
7. Wrong signing key id (no row matches).
8. Cookie signed under retired-but-in-retention key SUCCEEDS.
9. Cookie signed under retired-past-retention key FAILS.
10. Concatenation collision — direct evidence that
computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
a forged-boundary-slide cookie is rejected.
11. CSRF token missing.
12. CSRF token mismatch (constant-time compare).
13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
wrap (cmd/server/main.go treats as fatal).
Plus coverage-lift batch covering: every error wrap on every repo
collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
the cookie parser's full negative matrix (empty, wrong segment count,
missing prefixes, bad base64, wrong HMAC length), and a real-encryption
round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
the v3-blob path is exercised end-to-end at the session-cookie level.
Coverage:
internal/auth/session 94.5% (floor 90)
internal/auth/session/domain 96+% (floor 90, Phase 1)
.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). The
why: paragraphs explain why each fail-closed branch is load-bearing.
Repository extensions:
internal/repository/session.go gains UpdateCSRFTokenHash on the
SessionRepository interface; internal/repository/postgres/session.go
ships the implementation. RotateCSRFToken consumes it.
Scheduler extensions:
internal/scheduler/scheduler.go gains SessionGarbageCollector
interface + sessionGC field + sessionGCInterval +
SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
per-tick context.WithTimeout(1m) bounds a stuck Postgres.
Server wiring:
cmd/server/main.go constructs sessionService AFTER the bootstrap
block (post-RBAC backfill) and BEFORE the policy-service block.
EnsureInitialSigningKey runs immediately; failure is fatal via
os.Exit(1). The scheduler section wires SetSessionGarbageCollector
+ SetSessionGCInterval alongside the other interval setters and
emits an Info log so operators can confirm the loop is enabled.
Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.
Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
|
||
|
|
854135dfb7 |
auth-bundle-2 Phase 3: OIDC service (HandleAuthRequest, HandleCallback,
RefreshKeys), hand-rolled group-claim resolver, 21+ negative-test
matrix, token-leak hygiene, IdP downgrade-attack defense
Phase 3 of the bundle ships the business logic that turns the Phase 2
storage primitives into a working OpenID Connect 1.0 + RFC 7636 PKCE
authorization-code flow against any enterprise IdP (Okta / Azure AD /
Google Workspace / Keycloak / Authentik / Auth0).
Service surface:
- Service.HandleAuthRequest(providerID) -> authURL, cookie, preLoginID
Builds the IdP redirect with PKCE-S256 (mandatory; RFC 9700 §2.1.1),
server-generated 32-byte state + nonce, persisted to the pre-login
row keyed by the cookie value.
- Service.HandleCallback(cookie, code, state, ip, ua) -> *CallbackResult
11-step validation: pre-login lookup-and-consume (single-use),
constant-time state compare, code-for-token exchange with PKCE
verifier, ID-token verify (alg pin via go-oidc/v3), service-layer
re-checks of iss / aud / azp (multi-aud requires it; mismatch
rejected) / at_hash (REQUIRED when access_token returned —
Phase 3 lifts the OIDC core "MAY" to a service-level "MUST") /
exp / iat-window / nonce, group-claim resolution with userinfo
fallback, group->role mapping (fail-closed on no match),
user upsert, session mint via SessionMinter port.
- Service.RefreshKeys(providerID) — explicit cache eviction +
re-load. Re-runs the IdP downgrade-attack defense so a provider
that later rotates to advertising HS* / none is caught BEFORE the
next user login attempt.
Security posture (every fail-closed branch is a sentinel error +
test):
- Algorithm pinning: allow-list {RS256, RS512, ES256, ES384, EdDSA};
deny-list {HS256, HS384, HS512, none}. Belt-and-braces re-check
via isDisallowedAlg after go-oidc.Verify.
- PKCE-S256 mandatory (oauth2.GenerateVerifier + S256ChallengeOption);
`plain` rejection sentinel exists for defense-in-depth.
- State + nonce: 32-byte crypto/rand, base64url-no-pad,
constant-time compare, single-use.
- IdP downgrade-attack defense: at provider creation / RefreshKeys,
reject any IdP whose discovery doc advertises HS* / none in
id_token_signing_alg_values_supported.
- JWKS fail-closed: in-flight login fails 503; existing sessions
untouched. isJWKSFetchError detects the gooidc verify-error
shape; ErrJWKSUnreachable is the wire mapping.
- Token-leak hygiene: ID tokens, access tokens, refresh tokens,
authorization codes, PKCE verifiers, state, nonce, signing key
bytes — NEVER logged at any level. logging_test.go pins the
invariant via a slog buffer + grep-assert across HandleAuthRequest,
HandleCallback, alg rejection, and provider-load paths.
Group-claim resolver (internal/auth/oidc/groupclaim/):
- Hand-rolled per Decision 10 (no JSON-path lib; ~150 LOC).
- URL-shape paths (https:// / http://) treated as a single
literal key — Auth0 namespaced claims like
https://your-namespace/groups work without splitting on the
dots in the URL.
- Dot-separated paths walked through nested map[string]interface{}.
- []interface{} / []string / single-string normalized to []string;
bool / number / object / nil → fail closed.
- 18 unit tests + sentinels (ErrPathEmpty, ErrSegmentMissing,
ErrSegmentNotObject, ErrInvalidValueType).
Test surface:
- service_test.go: 57 test functions including all 21 prompt-mandated
negative cases (wrong aud / wrong iss / expired / unknown alg /
alg=none / HMAC alg / azp missing on multi-aud / azp mismatched /
at_hash missing / at_hash mismatched / iat in future / iat too old /
nonce mismatched / state mismatched / state replayed / PKCE plain
sentinel / pre-login replay / forged cookie / IdP downgrade /
group-claim missing / group-claim unmapped) plus the userinfo
fallback matrix (happy path + endpoint-missing + endpoint-failing +
userinfo-also-empty), HandleAuthRequest entry point + RNG-failure
paths, upsertUser update + create + display-name fallback +
Validate-error paths, decryptClientSecret real-encrypt round-trip
+ bad-passphrase, alg-parser malformed-header matrix.
- logging_test.go: 4 hygiene tests pinning no token / code / verifier /
state / cookie / client_secret / alg name appears in any captured
log line.
- groupclaim/resolver_test.go: 18 cases covering Okta string-array,
Keycloak realm_access.roles, Auth0 namespaced URL claim,
single-string normalization, deeply-nested 3-segment walks, and
every fail-closed branch.
Coverage:
internal/auth/oidc 92.2% (floor: 90)
internal/auth/oidc/groupclaim 100.0% (floor: 95)
internal/auth/oidc/domain 96.2% (floor: 90)
Coverage gates added at .github/coverage-thresholds.yml so a future
regression in any fail-closed branch fails CI before the commit lands.
Phase 3 of cowork/auth-bundle-2-prompt.md is closed. Next up: Phase 4
(Session service: cookies, revocation, sliding-vs-absolute expiry).
|
||
|
|
95f1d6cf63 |
auth-bundle-2 Phase 2b: repository interfaces + Postgres impls + integration tests
Closes Phase 2 end-to-end. Builds on Phase 2a's three migrations (000034 oidc_providers + group_role_mappings, 000035 sessions + session_signing_keys, 000036 users) by shipping the repository surface Phase 3+ services consume. Interfaces: * internal/repository/oidc.go - OIDCProviderRepository (List, Get, GetByName, Create, Update, Delete) + GroupRoleMappingRepository (ListByProvider, Get, Add, Remove, Map). Sentinels: ErrOIDCProviderNotFound, ErrOIDCProviderDuplicateName, ErrOIDCProviderInUse (FK ON DELETE RESTRICT translation), ErrGroupRoleMappingNotFound, ErrGroupRoleMappingDuplicate. * internal/repository/session.go - SessionRepository (Create, Get, ListByActor, UpdateLastSeen, Revoke, RevokeAllForActor, GarbageCollectExpired, Delete) + SessionSigningKeyRepository (List, GetActive, Get, Add, Retire, Delete). Sentinels: ErrSessionNotFound, ErrSessionRevoked, ErrSessionExpired, ErrSessionSigningKeyNotFound, ErrSessionSigningKeyInUse. * internal/repository/user.go - UserRepository (Get, GetByOIDCSubject, Create, Update, ListAll). Sentinels: ErrUserNotFound, ErrUserDuplicateOIDCSubject. Postgres implementations: * internal/repository/postgres/oidc.go - 309 lines. Translates SQLSTATE 23505 (unique_violation) to ErrOIDCProviderDuplicateName / ErrGroupRoleMappingDuplicate; SQLSTATE 23503 (foreign_key_violation) to ErrOIDCProviderInUse so the Phase 5 handler maps to HTTP 409 when an operator tries to delete a provider with authenticated users. pq.StringArray bridges Go []string to Postgres TEXT[] for scopes + allowed_email_domains. Map() uses `WHERE group_name = ANY($2)` so a single SELECT resolves N IdP group claims at once. * internal/repository/postgres/session.go - 350 lines. Both Session + SessionSigningKey repos. Revoke + Retire are idempotent (re-revoking an already-revoked session returns nil; same for retire). The GarbageCollectExpired sweep deletes both absolute-expiry-passed sessions AND pre-login rows older than the 10-minute TTL in one DELETE so the scheduler tick is cheap. ErrSessionSigningKeyInUse pinned via SQLSTATE 23503 from the sessions.signing_key_id FK ON DELETE RESTRICT. * internal/repository/postgres/user.go - 137 lines. GetByOIDCSubject is the Phase 3 hot-path lookup; the (oidc_provider_id, oidc_subject) UNIQUE constraint trip translates to ErrUserDuplicateOIDCSubject. Update only writes the mutable field set (email, display_name, last_login_at, webauthn_credentials); oidc_subject + oidc_provider_id are immutable per the per-(provider, subject) identity model. Integration tests (testing.Short()-gated, testcontainers + Postgres 16 Alpine, schema-per-test isolation via getTestDB().freshSchema): * oidc_test.go: 11 tests covering happy-path + GetNotFound + DuplicateName + List + Update + DeleteNotFound + DeleteSucceeds + DeleteRefusedWhenUsersReference (the FK ON DELETE RESTRICT pin); GroupRoleMapping coverage includes Add/List/Map (3 cases: marketing-not-mapped, multi-group hits, empty groups returns empty), Duplicate rejection, and the ON DELETE CASCADE on provider deletion. * session_test.go: 12 tests covering SessionSigningKey + Session. Key tests: GetActiveSkipsRetired (mints older, retires it, mints newer, asserts GetActive returns newer), DeleteRefusedWhenSessions- Reference (FK pin), RetireIsIdempotent. Session tests: CreateAndGet roundtrip, GetNotFound, Revoke + idempotent re-Revoke, ListByActor (3 active + 1 revoked + 1 pre-login -> returns 3, pinning the WHERE filter), RevokeAllForActor, GarbageCollectExpired (seeds an absolute-expired row + pre-login >10min row + active session via raw SQL to bypass CHECK constraints, asserts GC kills exactly 2 + active survives), UpdateLastSeen. * user_test.go: 7 tests covering CreateAndGet, GetNotFound, GetByOIDCSubject (hit + miss), DuplicateOIDCSubjectRejected, UpdateMutableFields (asserts oidc_subject NOT mutated by Update), ListAll, FKRestrictsProviderDelete (mirror of the OIDC test from the user side - both ends of the FK contract pinned). Verifications: * gofmt -l clean across all 9 new files. * go vet ./internal/repository/postgres/ rc=0. * go test -short -count=1 green on internal/repository/postgres/ + internal/auth/... + Bundle 1 packages (testing.Short() skips the testcontainers integration tests, but the test files compile + the short-mode skip path is exercised so the suite is wired correctly). * Full integration tests run in CI's non-short job against Postgres 16 Alpine via testcontainers-go. * govulncheck ./... clean. * All 24 ci-guards pass. Phase 2 exit criteria from cowork/auth-bundle-2-prompt.md (all met): * All three Phase-2 migrations apply cleanly, idempotently: yes (Phase 2a). Break-glass migration ships separately in Phase 7.5. * Repository tests pass against Postgres 16 Alpine: integration tests written, gated by testing.Short(), structured to run cleanly in CI's non-short job. * make verify equivalent green: gofmt + vet + go test pass; golangci-lint deferred to CI per Phase 0/1's same pattern. |
||
|
|
b0ac24fbf8 |
auth-bundle-2 Phase 1: OIDC + Session + User + Breakglass domain types
Phase 1 ships the persisted-shape types Bundle 2 needs end-to-end.
No DB migrations, no service layer, no HTTP handlers; Phase 2 ships
the SQL, Phase 3+ ship the consumers. Each type has a Validate()
method that enforces the on-disk invariants the schema will mirror,
and a focused _test.go that pins each invariant's failure mode.
Per-package summary:
internal/auth/oidc/domain/ (OIDCProvider + GroupRoleMapping):
* OIDCProvider carries the operator-configured IdP record. Fields
match the prompt's Phase 1 list plus IATWindowSeconds and
JWKSCacheTTLSeconds (Phase 3 references these by name; landing
them in Phase 1's domain type avoids the lying-field gap).
ClientSecretEncrypted is opaque from this layer; it is the v2 blob
produced by internal/crypto/encryption.go and is `json:"-"` so it
never wire-leaks.
* Validate() rejects: invalid id prefix, empty name, non-https
issuer_url (matches Phase 3's "JWKS endpoint MUST be HTTPS"),
empty client_id, empty client_secret_encrypted, non-https
redirect_uri, invalid groups_claim_format, scopes missing openid,
IAT window outside (0, 600], JWKS cache TTL below 60s. Defaults
applied in-place: GroupsClaimPath="groups", GroupsClaimFormat=
"string-array", Scopes=["openid","profile","email"],
IATWindowSeconds=300, JWKSCacheTTLSeconds=3600,
TenantID="t-default".
* GroupRoleMapping carries the operator-configured group-to-role
rule. Validate() pins prefix conventions ("grm-", "op-", "r-")
and non-empty group name.
* 18 tests across happy-path + every negative invariant.
internal/auth/session/domain/ (Session + SessionSigningKey):
* Session covers BOTH the post-login row (full 1h-idle/8h-absolute
cookie lifecycle) AND the Phase 5 pre-login row (10-minute TTL,
carries OIDC state+nonce+PKCE verifier across the IdP redirect).
IsPreLogin discriminates. CSRFTokenHash holds SHA-256 of the
CSRF token plaintext (the plaintext lives in a JS-readable
certctl_csrf cookie; storing only the hash on the row defends
against DB-read leaks per the Phase 4 CSRF contract).
* Validate() pins: id prefix "ses-", non-empty actor id/type,
signing key id prefix "sk-", AbsoluteExpiresAt strictly > Idle,
IdleExpiresAt strictly > CreatedAt, CSRFTokenHash exactly 64
lowercase hex chars when set.
* Cookie naming constants pinned by a separate test
(TestCookieNamingConstants) so a future rename can't silently
break the GUI's web/src/api/client.ts which reads these names by
string.
* SessionSigningKey stores the v2-encrypted HMAC key material; the
retired-before-created invariant catches malformed rows. 14
tests across both types.
internal/auth/user/domain/ (User):
* Federated-human identity for SSO logins. Distinct from Bundle 1's
free-form actor_id strings: actor_roles.actor_id = User.ID for
federated humans (per the prompt's note about how the two
identity systems intersect).
* WebAuthnCredentials JSONB column reserved for v3 (Decision 12);
defaults to "[]" on Validate() so Bundle 2 + v3 share the same
on-disk format from day one.
* Email validation is intentionally loose (basic shape: one @,
non-empty local + domain, no whitespace, dot in domain). RFC 5321
/ 5322 grammars are not enforced; the IdP issued the email and
we trust its shape, only rejecting gross corruption.
* 8 tests across happy-path + invalid-id + empty-email +
malformed-email + invalid-provider-id + tenant defaulting +
WebAuthn-credentials passthrough.
internal/auth/breakglass/domain/ (BreakglassCredential):
* Phase 7.5 type. Argon2id PHC-format password hash; Validate()
pins the Argon2id magic prefix so non-Argon2id formats (bcrypt,
pbkdf2, plaintext) are rejected at the persistence boundary.
* MinPasswordLengthBytes (12) + MaxPasswordLengthBytes (256)
constants pinned by a dedicated test so the operator-facing
password-strength contract can't drift silently.
* IsLocked(now) helper exposes the lockout state machine for the
Phase 7.5 service to consume; the lockout window default is
15min in the service layer.
* 9 tests across happy-path + per-invariant negative + lockout
state machine + tenant defaulting.
Cross-cutting:
* Every type has json:"-" on the encrypted-credential field
(ClientSecretEncrypted, KeyMaterialEncrypted, PasswordHash,
CSRFTokenHash) so even a misconfigured handler that marshals the
domain type directly into a response body cannot leak the
secret. Mirrors Bundle 1's pattern for issuer/target credentials.
* Every type carries TenantID with Validate() defaulting to
authdomain.DefaultTenantID. Forward-compat for the future
managed-service multi-tenant activation; Bundle 2 ships
single-tenant.
Verifications:
* gofmt -l clean across all 8 new files (one round-trip required to
satisfy Go 1.19+ doc-comment list-formatting rules in
session/domain/types.go).
* go vet clean on internal/auth/oidc/... + session/... + user/... +
breakglass/...
* go test -short -count=1 green on all four new domain packages
(49 test functions total).
* go test -short -count=1 still green on Bundle 1 packages
(internal/auth, internal/auth/bootstrap, internal/service/auth,
internal/config).
* govulncheck ./... clean (M-024 hard CI gate).
* All 24 ci-guards pass locally.
Phase 1 exit criteria from cowork/auth-bundle-2-prompt.md:
* All types compile: yes.
* Validators have at least 5 test cases each: yes (smallest is
User with 8 tests; OIDCProvider has 13).
* make verify equivalent green: gofmt + vet + go test pass
(golangci-lint deferred to CI per the same operating-rule
pattern Phase 0 used).
|
||
|
|
2d9110b0c4 |
auth-bundle-2 Phase 0: dependency-add + oidc auth-type literal + runtime guard
Bundle 2 Phase 0 stages the dependencies + auth-type discriminator
literal that later phases consume. No handler chain wired yet; an
operator who sets CERTCTL_AUTH_TYPE=oidc on this commit gets a clear
refuse-to-start error rather than a silent fallback to api-key (the
G-1 failure mode that drove "jwt" out of the allowed set).
Deliverables:
* go.mod: github.com/coreos/go-oidc/v3 v3.18.0 added as a direct
require. Per the pre-bundle dependency audit (Apache-2.0, zero CVEs
ever per OSV.dev, 2,400+ stars, used by Hashicorp Vault + Dex +
Hydra + Authentik + every Kubernetes OIDC integration), this is the
ecosystem-standard Go OIDC client. Pinned to a specific minor
(v3.18.0) per the prompt's "no bare latest" rule.
* go.mod: golang.org/x/oauth2 promoted from // indirect to direct,
bumped from v0.34.0 to v0.36.0 by go mod tidy. Both versions are
OSV-clean. Maintained by the Go team.
* No JSON-path library added (forbidden by the dependency audit; the
group-claim resolver is hand-rolled in Phase 3).
* internal/config/config.go: AuthTypeOIDC constant added with a
load-bearing comment explaining (a) this is the AUTH-TYPE literal,
not a JWT alg literal, so the G-1 closure invariant is preserved
("jwt" stays out of ValidAuthTypes forever); (b) the runtime guard
in cmd/server/main.go intentionally refuses-to-start when oidc is
set pre-Phase-6 to avoid the silent-downgrade failure mode.
ValidAuthTypes() now returns {api-key, none, oidc}.
* internal/config/config_test.go: TestValidAuthTypesIsExactly_APIKey_None
renamed to TestValidAuthTypesIsExactly_APIKey_None_OIDC and now pins
the 3-entry set. TestValidAuthTypesDoesNotContainJWT (G-1 closure
test) still passes because "jwt" is never added back.
TestValidate_GenericInvalidAuthType's bad-types list updated:
"oidc" removed (now valid), "saml" added (correctly rejected per
Decision 5's SAML deferral).
* cmd/server/main.go: defense-in-depth runtime auth-type guard now
has an explicit AuthTypeOIDC case that exit(1)s with an actionable
message: "the OIDC auth chain is not yet wired in this build (Auth
Bundle 2 Phase 6 ships the session middleware that consumes this
auth-type literal)." This closes the lying-field gap the literal
would otherwise create. Phase 6 of Bundle 2 relaxes this case to
fall through alongside api-key + none.
* api/openapi.yaml: /v1/auth/info auth_type enum extended from
[api-key, none] to [api-key, none, oidc] with an in-line comment
explaining the Phase-0-vs-Phase-6 timing so an OpenAPI consumer
isn't surprised by "oidc" appearing here pre-Bundle-2-merge.
* deploy/helm/certctl/templates/_helpers.tpl::certctl.validateAuthType:
valid set extended to include "oidc". Chart-time validation now
passes for type=oidc; the binary's runtime guard takes over to
refuse the start. Once Bundle 2 ships, the runtime guard relaxes
and OIDC works end-to-end with no further chart edits.
* .env.example: CERTCTL_AUTH_TYPE comment block updated to document
the three valid values + the Phase-0-vs-Phase-6 timing.
* internal/auth/oidc/doc.go: new package directory with package doc
+ transitional blank imports for coreos/go-oidc/v3 + x/oauth2 so
go mod tidy keeps both deps as direct requires until Phase 3's
service.go replaces the blanks with real symbol use. Doc explains
the package layout (oidc/ + oidc/domain/ + oidc/groupclaim/ +
oidc/testfixtures/) so the post-Bundle-2 reader can navigate.
Verifications:
* gofmt clean on every changed file.
* go vet clean on internal/config + cmd/server + internal/auth/oidc.
* go test -short -count=1 green on internal/config (including the
G-1 closure + new validation tests), cmd/server, internal/auth (all
Bundle 1 packages), internal/service/auth.
* govulncheck ./... clean (M-024 hard CI gate).
* All 24 ci-guards pass locally.
Phase 0 exit criteria from cowork/auth-bundle-2-prompt.md:
* go.mod shows coreos/go-oidc/v3 as direct: yes.
* golang.org/x/oauth2 is direct (not indirect): yes.
* govulncheck ./... clean: yes.
* No JSON-path library in go.mod / go.sum deltas: confirmed (only
v3 of go-oidc + the x/oauth2 bump landed).
* make verify green: gofmt + vet + go test pass; full make verify
(which would invoke golangci-lint) deferred to CI since the
sandbox doesn't have golangci-lint installed; the operator runs
make verify locally before pushing per CLAUDE.md operating rule.
|
||
|
|
5d79e53ad0 |
auth-bundle-1 follow-on: close coverage gaps to clear Phase 12 floors
CI run #486 (post-Bundle-1 merge + Go 1.25.10 bump) failed three coverage-threshold gates: internal/api/handler 74.7% < floor 75 (-0.3pp) internal/auth 66.3% < floor 85 (-18.7pp) internal/service/auth 51.1% < floor 85 (-33.9pp) The Phase 12 gate file's "85% with negative-test coverage" claim turned out to be aspirational — the read-side and Update-path methods on RoleService / PermissionService / ActorRoleService had zero unit-test coverage, and internal/auth's keystore + HasPermission helper had zero tests. This commit closes the gap without lowering the gate. Per-package CI-style averages after this commit (per scripts/check-coverage-thresholds.sh's per-function-mean): internal/api/handler 76.1% (+1.4pp, margin +1.1pp) internal/auth 90.5% (+24.2pp, margin +5.5pp) internal/service/auth 93.7% (+42.6pp, margin +8.7pp) Tests added: internal/service/auth/service_test.go (+18 tests, +518 LOC): PermissionService.List, PermissionService.GetByName, RoleService.Get (4 paths), RoleService.List (system caller), RoleService.Update (4 paths), RoleService.ListPermissions (3 paths), RoleService.AddPermission/RemovePermission round-trip + gate paths, RoleService.Delete (success + nil-caller + no-perm + audit), RoleService.Create (nil-caller), ActorRoleService.ListForActor (self-bypass + cross-actor + nil-caller + system + with-perm), ActorRoleService.Effective- Permissions (same shape), ActorRoleService.ListKeys (3 paths + system bypass), ActorRoleService.Revoke (4 paths), Authorizer edge cases (empty actorID short-circuit, empty tenantID default, scoped-grant-without-scope-id no-match invariant, repo-error wrap-and-return, HoldsAnyOf early-exit), recordAudit nil-arm short-circuits. internal/auth/keystore_test.go (NEW, +175 LOC): StaticKeyStore.Len, StaticKeyStore.LookupByHash hit + miss, MutableKeyStore seeded lookup + Len, Add registers new key, AddHashed registers from precomputed hash, AddHashed replaces on duplicate hash (idempotent boot-loader contract), HasPermission no-actor / default-actor-type / checker-error / scoped-check threading. internal/auth/bootstrap/service_test.go (+36 LOC): Service.Available nil-receiver/nil-strategy short-circuit, Service.Available delegates to Strategy when configured. internal/api/handler/auth_test.go (+208 LOC): GetRole returns role + permissions, GetRole 404 + 401, UpdateRole 200 + invalid-JSON-400 + 401, ListKeys returns actor list + 401, RemoveRolePermission 204 (global + scoped) + 401, rolePermToResponse scope encoding pin via GetRole. Verified: gofmt -l . clean (touched files only). go vet ./internal/auth/... ./internal/service/auth/... ./internal/api/handler/ rc=0. go test -count=1 -short on the four packages green. CI-style per-function averages computed via the live scripts/check-coverage-thresholds.sh arithmetic — all three gated packages clear their floors with margin. Per CLAUDE.md "complete path" + "do not lower the gate to make CI green": gate file unchanged. The 85/85/75 floors stand. |
||
|
|
3e91c7a1f0 |
chore(security): bump Go toolchain 1.25.9 -> 1.25.10 + golang.org/x/net 0.49 -> 0.53
CI run #484's Go Build & Test job failed govulncheck (M-024 hard gate). Six standard-library CVEs land in go1.25.9 + one golang.org/x/net CVE in v0.49.0; all are fixed in go1.25.10 + x/net v0.53.0 respectively. The advisories that fired were: GO-2026-4986 Quadratic string concat in net/mail.consumeComment — called via internal/api/handler/validation.go's ValidateCommonName -> mail.ParseAddress GO-2026-4977 Quadratic string concat in net/mail.consumePhrase — same call site GO-2026-4982 Bypass of meta-content URL escaping in html/template — called via internal/service/digest.go's RenderDigestHTML -> Template.Execute GO-2026-4980 Escaper bypass in html/template — same call site GO-2026-4971 Panic in net.Dial / LookupPort on Windows NUL bytes — many call sites (email notifier, SSH connector, ACME validators, validation.ValidateSafeURL, ...) GO-2026-4918 Infinite loop in net/http2 transport on bad SETTINGS_MAX_FRAME_SIZE — called via internal/connector/target/f5.go's F5Client.Authenticate -> http.Client.Do Bumps applied: * `go.mod`: `go 1.25.9` -> `go 1.25.10`; `golang.org/x/net v0.49.0` -> `v0.53.0` (kept indirect — the upgrade is force-pulled by the module-version directive; transitive deps will pick the higher). * `.github/workflows/{ci,codeql,release}.yml`: setup-go pin and the release.yml `GO_VERSION` env var bumped to 1.25.10. The security-deep-scan.yml workflow uses the major-minor `1.25` pin which auto-resolves to the latest 1.25.x and is unaffected. * `Dockerfile` + `Dockerfile.agent`: `golang:1.25-alpine@sha256:5caa...` re-pinned to `golang:1.25.10-alpine@sha256:8d22e29d960bc50cd0...` (digest looked up against `registry-1.docker.io/v2/library/golang/ manifests/1.25.10-alpine`; verified by the digest-validity ci-guard). The explicit `1.25.10-alpine` tag form replaces the moving `1.25-alpine` pin so the image-spec is reproducible end-to-end even without the digest reference. * `deploy/test/f5-mock-icontrol/Dockerfile`: `golang:1.25.9-bookworm @sha256:1a14...` re-pinned to `golang:1.25.10-bookworm@sha256: e3a54b77385b4f8a31c1...` (looked up the same way). * `deploy/test/f5-mock-icontrol/go.mod`: `go 1.25.9` -> `go 1.25.10`. * `internal/api/handler/version.go` + `api/openapi.yaml`: the `runtime.Version()`-shape comment + OpenAPI `example: go1.25.9` bumped to keep doc/example freshness. * `docs/contributor/ci-pipeline.md` + `docs/reference/connectors/ iis.md`: doc-only `Go 1.25.9` -> `Go 1.25.10` references. Verification done in-tree: * All `scripts/ci-guards/*.sh` pass locally including `digest-validity.sh` (the new digests resolve cleanly against Docker Hub). * `S-1-hardcoded-source-counts.sh` clean (the false-positive on "Bundle 1 migrations" was fixed in the prior commit). Operator step required post-push (sandbox has no Go toolchain): cd certctl && go mod tidy This regenerates go.sum's `golang.org/x/net v0.49.0` h1: lines into v0.53.0 ones. CI's `go mod tidy && git diff --exit-code go.mod go.sum` step will catch the drift if missed; in that case run the command, commit, and push the go.sum-only delta. |
||
|
|
06cea1ce0f |
auth-bundle-1 Phase 12 follow-up: in-tree TODO for path-12 deferral
Self-audit on
|
||
|
|
cbb47aaf5d |
auth-bundle-1 Phase 11 + 12: RBAC MCP tools + negative-test coverage gate
# Phase 11 — RBAC MCP tools
12 new tools in internal/mcp/tools_auth.go mirroring the Phase-4
+ Phase-7 HTTP surface so operators driving certctl from Claude
/ VS Code / any MCP client get the same management capability
the GUI + CLI already expose:
certctl_auth_me GET /v1/auth/me
certctl_auth_list_roles GET /v1/auth/roles
certctl_auth_get_role GET /v1/auth/roles/{id}
certctl_auth_create_role POST /v1/auth/roles
certctl_auth_update_role PUT /v1/auth/roles/{id}
certctl_auth_delete_role DELETE /v1/auth/roles/{id}
certctl_auth_list_permissions GET /v1/auth/permissions
certctl_auth_add_permission_to_role POST /v1/auth/roles/{id}/permissions
certctl_auth_remove_permission_from_role DELETE /v1/auth/roles/{id}/permissions/{perm}
certctl_auth_list_keys GET /v1/auth/keys
certctl_auth_assign_role_to_key POST /v1/auth/keys/{id}/roles
certctl_auth_revoke_role_from_key DELETE /v1/auth/keys/{id}/roles/{role_id}
Each tool routes through the existing HTTP client (no parallel
business logic), so permission gates fire server-side: a
non-admin caller's MCP tool invocation returns whatever 403 the
underlying HTTP handler emits, fenced via errorResult for LLM-
prompt-injection defense.
Input types in internal/mcp/types.go (AuthRoleIDInput,
AuthCreateRoleInput, AuthUpdateRoleInput,
AuthRolePermissionGrantInput, AuthRolePermissionRevokeInput,
AuthAssignKeyRoleInput, AuthRevokeKeyRoleInput) carry
jsonschema descriptions so the MCP consumer's tool catalogue
shows operator-friendly hints.
internal/mcp/tools_auth_test.go ships 14 tests:
- TestAuthMCP_AllToolsRegister (registration must not panic)
- TestAuthMCP_PathsAndMethods (table-driven, 12 rows pinning
each tool's HTTP method + URL)
- TestAuthMCP_ForbiddenSurfacesFencedError (12 tools × 403
mock → error surface)
internal/mcp/tools_per_tool_test.go's allHappyPathCases extended
with the 12 new rows so the in-memory dispatch coverage gate
(TestMCP_RegisterTools_DispatchableToolCount) stays green at the
new total of 139 registered tools.
Re-derived total via 'grep -cE "gomcp\.AddTool\(" internal/mcp/tools*.go':
133 (121 in tools.go + 12 in tools_auth.go).
# Phase 12 — negative-test coverage gate
Audit of the prompt's 12 negative-test paths against existing
coverage:
1. Missing actor → 401 ✓ TestRequirePermission_NoActorReturns401, TestRBACGate_NoActorReturns401
2. No roles → 403 ✓ TestRequirePermission_DeniedActorReturns403, TestRBACGate_AuditorRole_403sOnAdminRoutes
3. Role lacks specific perm → 403 ✓ same suite
4. Wrong scope → 403 ✓ TestAuthorizer_SpecificScopeMatchesExactID (wrongID arm)
5. Self-grant w/o auth.role.assign → 403 ✓ TestActorRoleService_GrantRequiresAuthRoleAssign
6. Bootstrap token wrong → 401 ✓ TestEnvTokenStrategy_WrongTokenReturnsInvalidToken, TestBootstrapHandler_Mint_WrongToken_401
7. Bootstrap used twice → 410 ✓ TestEnvTokenStrategy_OneShotConsumption, TestBootstrapHandler_Mint_TwiceReturns410
8. Bootstrap when admin exists → 410 ✓ TestEnvTokenStrategy_AdminExistsClosesPath, TestBootstrapHandler_Mint_AdminExists410
9. Role delete with assignees → 409 NEW: TestRoleService_DeleteWithActorsAssignedReturns409
10. Profile-edit loophole → gated ✓ TestProfileEdit_RequiresApprovalLoopholeClosed
11. Permission not in catalog → 400 ✓ TestRoleService_AddPermissionRejectsNonCanonical
12. Scope ID for nonexistent resource → 404 (validation deferred — no FK constraint between role_permissions.scope_id and the resource tables; documented for a future bundle)
Filled the gap at #9 with TestRoleService_DeleteWithActorsAssignedReturns409
which pins the repository sentinel pass-through (postgres FK
ON DELETE RESTRICT → repository.ErrAuthRoleInUse → service
returns the sentinel verbatim → handler maps to HTTP 409).
# Coverage gates
.github/coverage-thresholds.yml gains 2 entries:
- internal/auth: floor 85
- internal/service/auth: floor 85
.github/workflows/ci.yml's coverage test command extended with
./internal/auth/... and ./internal/api/router/... so the
threshold check has data to evaluate.
# Protocol-endpoint not-gated test (Category F)
internal/api/router/phase12_protocol_allowlist_test.go (new)
adds 3 router-level invariant tests:
- TestPhase12_ProtocolEndpointsNotGated: AST-walks router.go,
asserts no rbacGate(...) call references a path under any
protocol-endpoint prefix (/acme, /scep, /.well-known/est,
/.well-known/pki/ocsp, /.well-known/pki/crl).
- TestPhase12_IsProtocolEndpoint_CoversCanonicalPrefixes:
pins auth.IsProtocolEndpoint against the canonical prefix
set; if a future protocol lands without lockstep allowlist
update, this fails.
- TestPhase12_RBACGateRoutesAreUnderAPIv1: belt-and-braces —
every rbacGate-wrapped route MUST start with /api/v1/.
Catches accidental cross-prefix wraps.
Complements the existing TestRequirePermission_ProtocolEndpointBypassesGate
(middleware-level) + TestRouter_AuthExemptAllowlist_PinsActualRegistrations
(allowlist drift) so the Category F invariant is pinned at all
three layers (middleware + router + dispatch).
# Verifications
* gofmt clean repo-wide.
* go vet ./... clean.
* staticcheck across internal/auth + handler + router + cli +
service + repository + cmd + domain + mcp: clean.
* go test -short -count=1 green across internal/auth (incl.
bootstrap), internal/api/handler, internal/api/router,
internal/cli, internal/service (incl. auth),
internal/domain/auth, internal/mcp, cmd/server, cmd/cli.
|
||
|
|
69a508dfcf |
auth-bundle-1 Phase 9 + 10: approval-bypass closure + RBAC GUI
# Phase 9 — approval-bypass closure (Decision 9, option a)
* Migration 000033_approval_kinds.up.sql: ALTER TABLE
issuance_approval_requests ADD COLUMN approval_kind +
payload JSONB; relax certificate_id + job_id to nullable;
CHECK (approval_kind IN ('cert_issuance','profile_edit'))
+ CHECK (per-kind nullability invariant) + index on
approval_kind. Idempotent throughout via DO blocks.
* domain.ApprovalKind enum (cert_issuance / profile_edit) +
IsValidApprovalKind. ApprovalRequest gains Kind +
Payload []byte for the pending profile diff.
* postgres.ApprovalRepository.Create + scanApprovalRow extended
to round-trip the new columns; certificate_id + job_id
switched to sql.NullString so profile_edit rows persist
cleanly. Default Kind=cert_issuance preserves back-compat
for every Phase-7-2026-05-03 caller.
* ApprovalService.RequestProfileEditApproval: new entry point
that creates a pending profile-edit row carrying the
serialized profile diff. Bypass mode (CERTCTL_APPROVAL_BYPASS)
short-circuits the same way it does for cert_issuance.
* ApprovalService.SetProfileEditApply hook: cmd/server/main.go
registers a closure that deserializes req.Payload + persists
via profileRepo.Update + emits a profile.edit_applied audit
row with category=auth. The hook avoids the Approval ↔
Profile import cycle.
* ProfileService.UpdateProfile: gates when (a) the live
profile carries RequiresApproval=true, OR (b) the proposed
edit would set it true. Returns ErrProfileEditPendingApproval
with the new approval ID; ProfileHandler maps to HTTP 202
Accepted + {pending_approval_id}. Both arms close the
flip-flop loophole because every transition through an
approval-tier profile fires the gate.
* TestProfileEdit_RequiresApprovalLoopholeClosed pins all 3
bypass attempts (flip-off / kept-on / flip-on) gated; nil-
approval-service preserves pre-Phase-9 direct-apply for
test fixtures.
* Approval service tests gain 4 profile_edit rows: pending row
shape; same-actor self-approve rejected with
ErrApproveBySameActor (load-bearing two-person integrity);
approve fails-closed when apply callback unwired;
apply callback invoked on approve.
* docs/reference/profiles.md (new) explains the gate +
edit response shape (202) + same-actor invariant + bypass
+ audit hooks.
# Phase 10 — RBAC management GUI
* useAuthMe hook (web/src/hooks/useAuthMe.ts): TanStack Query
fetches /api/v1/auth/me on app boot, caches for 60s, exposes
hasPerm(p) + hasAnyPerm + isAdmin predicates. Every Phase-10
page consumes this on mount + gates affordances against the
cached effective_permissions slice. Server-side enforcement
is the load-bearing gate; client-side hide/disable is UX.
* New routes:
- /auth/roles — list (auth.role.list); create-role modal
(auth.role.create) hidden when missing.
- /auth/roles/:id — detail + permissions; edit
(auth.role.edit), delete (auth.role.delete), add/remove
permission affordances each gated.
- /auth/keys — list of every actor with role grants; assign
+ revoke modals (auth.role.assign). actor-demo-anon
flagged system-managed; mutation buttons hidden for it.
- /auth/settings — stub showing /v1/auth/me identity +
bootstrap-endpoint availability via /v1/auth/bootstrap.
* AuditPage extended with category filter ('All categories'
+ the 3 enum values from migration 000032). Selection flows
to the API call params + the URL-driven query state.
* Layout: 3 new nav entries (Roles / API Keys / Auth Settings).
* api/client.ts: 12 new exported functions for the RBAC
surface (authMe, list/get/create/update/delete role,
list/add/remove role permissions, list keys, assign/revoke
key role, bootstrap-availability probe).
* data-testid attributes on every interactive element so a
future Playwright suite can assert behavior without brittle
CSS selectors.
* Empty state, error state, and unsaved-changes warnings on
every form per the prompt's implementation rules.
# Frontend tests
* RolesPage.test.tsx (6 tests): list render, empty state,
error state, hide-create-button-without-perm,
show-create-button-with-perm, submit-create-modal.
* KeysPage.test.tsx (3 tests): demo-anon flagged
system-managed (no buttons), permission-gated affordance
hide for auditor caller, assign-modal-POST contract.
* AuthSettingsPage.test.tsx (2 tests): identity surface,
bootstrap-OPEN-status surface.
* AuditPage.test.tsx (+1): category-filter select renders
with the 4 documented options.
15 frontend tests total in src/pages/auth/ + the audit
category-filter test; all pass via npx vitest run.
# Verifications
* go vet ./... clean.
* staticcheck across internal/auth + handler + router + cli +
service + repository + cmd + domain: clean.
* gofmt -l clean repo-wide.
* go test -short -count=1 green across internal/service,
internal/api/handler, internal/api/router, internal/auth,
internal/auth/bootstrap, internal/service/auth,
internal/domain/auth, cmd/server, cmd/cli, internal/cli.
* npx tsc --noEmit clean.
* npm run build green (vite build produces dist/index.html
+ 946KB JS bundle; chunk-size warning is pre-existing).
* npx vitest run src/pages/auth/ src/pages/AuditPage.test.tsx
green (15 tests, 4 files).
|
||
|
|
af4fa12724 |
auth-bundle-1 Phase 8 follow-up: classify issuer/target audit rows + auditor end-to-end tests + gofmt drift
Self-audit caught five real gaps in 3ef45e2; this commit closes them. # Phase 8 — issuer/target audit rows now classified as 'config' The Phase 8 prompt explicitly required existing config-mutation calls (issuer config, target config, etc.) to write event_category=config. The |
||
|
|
3ef45e2ad4 |
auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split
# Phase 6 — day-0 admin bootstrap * internal/auth/bootstrap/ (new package): Strategy interface + EnvTokenStrategy with constant-time compare, one-shot consumption via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC- first-admin will plug in alongside as an alternate Strategy. * BootstrapService.ValidateAndMint: validates the operator's CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API key value, persists the SHA-256 hash to api_keys, grants r-admin via actor_roles, AddHashed's the runtime keystore so the just- minted key authenticates the next request without restart, and records bootstrap.consume to the audit trail with category=auth. * internal/auth/keystore.go (new): KeyStore interface + StaticKeyStore (immutable env-var-only path) + MutableKeyStore (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth middleware now consumes a KeyStore so the bootstrap path can extend the lookup table at runtime. * migrations/000031_api_keys.up/down.sql: api_keys table with (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by, created_at, expires_at, last_used_at). Idempotent. * /v1/auth/bootstrap GET (probe) + POST (mint) — auth-exempt. Both routes documented in api/openapi.yaml + AuthExemptRouterRoutes allowlist updated. The token never leaves internal/auth/bootstrap; the minted plaintext key flows only into the HTTP response body. * Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND admin actors already exist (config drift signal). * Tests: 4 strategy invariants (empty token born disabled, wrong token=ErrInvalidToken without consumption, one-shot consumption, admin-exists closes path), 5 service tests (happy path + actor- name validation + propagation of strategy errors + nil-deps guard + 32-byte entropy budget), 8 HTTP-handler tests (status 201/410/401/400 mapping + token-leak hygiene scan of slog + audit details + Location header). Token-leak test redirects slog.Default to a buffer for the test scope. # Phase 7 — API-key migration + scope-down CLI * GET /v1/auth/keys handler + service method ListKeys backed by ActorRoleRepository.ListDistinctActors. Returns one row per (actor_id, actor_type) pair with the slice of role IDs they hold. Permission: auth.role.list. * internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown (interactive), AuthScopeDownNonInteractive (JSON config), AuthScopeDownSuggest (--suggest with optional --apply). The synthetic actor-demo-anon is filtered out of every interactive / bulk path; non-interactive flow logs and skips it explicitly. * SuggestRoleFromAuditEvents (pure function): walks 30 days of audit events per actor and returns the narrowest matching role (admin / mcp / viewer / agent / operator) plus a one-line reason. Classification: any admin-shaped action wins; otherwise all-MCP → mcp; all-read-only → viewer; all-agent-shaped → agent; otherwise operator. Test table pins all six classifications. * CLI subcommand tree extended: 'auth keys list' + 'auth keys scope-down [--non-interactive <cfg>] [--suggest [--apply]]'. * CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS call-out + four flow examples. # Phase 8 — auditor role + event_category column * migrations/000032_audit_category.up/down.sql: ALTER TABLE audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT 'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config) + (event_category) and (event_category, timestamp DESC) indexes for the auditor-filter query path. WORM trigger from migration 000018 continues to enforce append-only at the DB layer (DDL is not blocked). * domain.AuditEvent gains EventCategory string (omitempty); domain.EventCategoryCertLifecycle / Auth / Config constants. * AuditService.RecordEventWithCategory sibling of RecordEvent; legacy callers stay on RecordEvent (defaults to cert_lifecycle). Auth callers (RoleService, ActorRoleService, BootstrapService) switched to RecordEventWithCategory(..., 'auth', ...). * GET /v1/audit?category=<cat>: handler accepts the optional query param, validates against the enum (400 on invalid value), dispatches through ListAuditEventsByCategory. OpenAPI updated with the new query param + AuditEvent.event_category schema. * Postgres AuditRepository.Create now writes event_category; AuditRepository.List filters on it; AuditFilter.EventCategory gates the WHERE clause. * Tests: 5 audit-category-filter HTTP tests (dispatch routing, back-compat fallback, 400 for invalid values, all 3 enum values accepted, page+category combine, JSON output surfaces the field). 3 auditor-role invariants (auditor holds exactly audit.read+audit.export, no mutating perms, disjoint from viewer except audit.read). # Cross-phase wiring * HandlerRegistry.Bootstrap field added; cmd/server/main.go wires the bootstrap service ahead of RegisterHandlers (extracted assembleNamedAPIKeys helper into auth_backfill.go, moved the keystore + bootstrap construction up alongside the auth repos). * AuthCheckResolver / AuthActorRoleService extended with ListKeys to satisfy the Phase 7 surface; existing fakes updated. * fakeAudit + mockAuditService stubs in tests gain RecordEventWithCategory + ListAuditEventsByCategory; existing tests untouched. # Verifications * gofmt -l: clean across every modified file. * go vet ./...: clean. * staticcheck across internal/auth + handler + router + cli + service + repository + cmd + domain: clean. * go test -short -count=1: green across every Bundle-1-touched package — internal/auth (incl. bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service/auth, internal/service, internal/domain/auth, internal/repository/postgres, cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware, cmd/agent, internal/mcp. |
||
|
|
60a589ab96 |
auth-bundle-1 Phase 0-5 closure: demo-mode wire, named-key backfill, AuthCheck enrichment, OpenAPI schema, intermediate-ca comment refresh
Closes the 5 gaps the post-Phase-5 audit flagged on dev/auth-bundle-1.
C1: cmd/server/main.go now selects auth.NewDemoModeAuth() when
CERTCTL_AUTH_TYPE=none and falls back to auth.NewAuthWithNamedKeys
otherwise. Pre-closure, the no-op pass-through that
NewAuthWithNamedKeys returns for empty keys would have left
ActorIDKey / ActorTypeKey / TenantIDKey unpopulated and 401'd
every Phase-3.5 rbacGate-wrapped admin route + every Phase-4
RBAC handler in demo deployments. NewDemoModeAuth injects the
synthetic 'actor-demo-anon' actor seeded by migration 000029,
which holds r-admin at global scope.
C2: backfillNamedKeyActorRoles startup hook (cmd/server/auth_backfill.go)
iterates CERTCTL_API_KEYS_NAMED entries (and legacy
CERTCTL_AUTH_SECRET synthesized fallbacks) and grants r-admin
or r-viewer to each via authActorRoleRepo.Grant before the
HTTP server starts accepting requests. Idempotent via
ON CONFLICT DO NOTHING in the repo. Failures log a warning but
are non-fatal — the server still starts and the operator can
fix grants via /v1/auth/keys. Helper extracted from main.go so
the role-mapping invariant is pinned by 4 focused unit tests
(admin->r-admin, non-admin->r-viewer, empty no-op,
grant-error non-fatal, nil-logger safe).
M1: HealthHandler.AuthCheck now returns actor_id, actor_type,
tenant_id, roles, effective_permissions, and admin_via_role
when the optional AuthCheckResolver is wired (production path:
authCheckResolverAdapter wraps the postgres ActorRoleRepository
in main.go). Nil resolver preserves the legacy {status, user,
admin} contract for back-compat with pre-Bundle-1 GUIs and
test fixtures. Adds 2 regression tests + 1 fake resolver shim.
M2: refreshes the stale 'Admin gate: every method calls
auth.IsAdmin first' comment on IntermediateCAHandler — the gate
moved to router.go::rbacGate via auth.RequirePermission
middleware in Phase 3.5; the new comment block points readers
there.
M4: 11 RBAC routes (auth/me, auth/permissions, 5 role lifecycle,
2 role-permission grant/revoke, 2 actor-role grant/revoke) added
to api/openapi.yaml under the [Auth] tag with operationIds and
shared AuthRole / AuthRolePermission schemas. AuthCheck path
extended with the Bundle-1 enrichment fields. The 11 entries
removed from openapi_parity_test.go::SpecParityExceptions.
Tests: go vet + staticcheck + go test -short -count=1 green
across cmd/server/, internal/auth/, internal/api/router/, and
internal/api/handler/. New tests: 4 backfill unit tests,
2 AuthCheck M1 enrichment tests, 1 demo-mode + rbacGate chain
integration test (TestRBACGate_DemoModeChainReachesHandler).
Branch SECURITY.md (cowork/auth-bundle-1-SECURITY.md, not part
of this commit) captures the full posture of dev/auth-bundle-1
as of this closure for the operator's pre-merge review.
|
||
|
|
7ff2e2de08 |
auth-bundle-1 Phase 3.5: handler IsAdmin -> router-wrapped RequirePermission
Phase 3.5 atomic conversion. The five legacy admin-gated handlers (bulk_revocation, admin_crl_cache, admin_scep_intune, admin_est, intermediate_ca) had their in-body auth.IsAdmin checks removed; the gate moved to router.go via auth.RequirePermission middleware wrapping each route. Non-admin operators with the right scoped permission can now reach these endpoints; legacy in-body admin checks no longer block them.
Migration 000030_rbac_admin_perms.up.sql ships five admin-only fine-grained permissions: cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage. All five are seeded into r-admin only; operator/viewer/agent/mcp/cli/auditor do not receive them by default. Operators can grant any of these to a custom role via the Phase 4 RBAC API. Idempotent + transaction-wrapped.
internal/domain/auth/validate.go::CanonicalPermissions extended with the five new entries so RoleService.AddPermission accepts them.
internal/api/router/router.go: HandlerRegistry gains a Checker field (auth.PermissionChecker). New rbacGate(checker, perm, handler) helper wraps a handler with auth.RequirePermission middleware; nil-checker fall-through preserves test/demo deployments without the RBAC stack. 12 admin routes wrapped: cert.bulk_revoke (POST /api/v1/certificates/bulk-revoke + POST /api/v1/est/certificates/bulk-revoke), crl.admin (GET /api/v1/admin/crl/cache), scep.admin (GET /api/v1/admin/scep/profiles + GET /api/v1/admin/scep/intune/stats + POST /api/v1/admin/scep/intune/reload-trust), est.admin (GET /api/v1/admin/est/profiles + POST /api/v1/admin/est/reload-trust), ca.hierarchy.manage (POST /api/v1/issuers/{id}/intermediates + GET /api/v1/issuers/{id}/intermediates + POST /api/v1/intermediates/{id}/retire + GET /api/v1/intermediates/{id}).
cmd/server/main.go: HandlerRegistry.Checker wired with the same authPermissionCheckerAdapter shim Phase 4 introduced for AuthHandler. Same adapter; one source of truth.
Handler bodies: removed eight in-body auth.IsAdmin checks across the 5 files. bulk_revocation.go's BulkRevoke + BulkRevokeEST, admin_crl_cache.go::ListCache, admin_scep_intune.go's three methods, admin_est.go's two methods, intermediate_ca.go's four methods. Replaced each with a comment naming the new gate location. Unused 'github.com/certctl-io/certctl/internal/auth' imports removed.
Test triplet rewrite: deleted obsolete _NonAdmin_Returns403 and _AdminExplicitFalse_Returns403 tests across 6 test files (5 handler tests + bulk_revocation_est_test.go) — they tested the now-removed in-body gate. _AdminPermitted_ForwardsActor tests stay intact: they pin the actor-passthrough invariant which is still relevant. Added internal/api/router/rbac_gate_integration_test.go with four router-level integration tests pinning the new gate: deny → 403 + handler not reached, permit → 200 + handler reached, nil-checker → fall-through, no-actor → 401.
M-008 admin-gate registry: AdminGatedHandlers map now empty (Phase 3.5 invariant: zero in-handler auth.IsAdmin call sites; only health.go's informational caller remains). m008_admin_gate_test.go retains the scan to enforce the invariant going forward; new admin-gated routes must wrap at router.go::rbacGate, not gate in-handler. Updated error message to direct future contributors to the new pattern.
Verifications: gofmt clean across all touched files; go vet ./... clean; go test -short across internal/auth, internal/service/auth, internal/api/handler, internal/api/router, cmd/server all green.
Branch: dev/auth-bundle-1. Commit chain:
|
||
|
|
b169f258de |
auth-bundle-1 Phase 4 + 5: RBAC HTTP API + CLI surface
Phase 4 (HTTP API): * internal/api/handler/auth.go: AuthHandler with 12 endpoints under /api/v1/auth/* — ListRoles, GetRole, CreateRole, UpdateRole, DeleteRole, ListPermissions, AddRolePermission, RemoveRolePermission, AssignRoleToKey, RevokeRoleFromKey, Me. callerFromRequest builds an authsvc.Caller from the Phase 3 ActorIDKey/ActorTypeKey/TenantIDKey context values. writeAuthError translates service + repository sentinels into HTTP status codes (401/403/404/409/400/500). 14 handler tests with in-memory fakes pin the HTTP shape + error mapping. * internal/api/router/router.go: HandlerRegistry gains an Auth field; 11 new routes registered. openapi_parity_test SpecParityExceptions extended with the new auth routes (OpenAPI YAML schema land in a Phase 4 follow-up commit so the schema review is its own atomic change; the route shape is fully documented inline via the Go type definitions until then). * cmd/server/main.go: wires the postgres auth repos (RoleRepository, PermissionRepository, ActorRoleRepository) + the Authorizer + RoleService/PermissionService/ActorRoleService into the new AuthHandler. Adds authPermissionCheckerAdapter to bridge the typed-string Authorizer signature to the auth.PermissionChecker interface (avoids an internal/auth → internal/service/auth import cycle). Phase 5 (CLI): * cmd/cli/main.go: adds 'auth' command dispatch with subcommands roles/permissions/keys/me. * internal/cli/auth.go: AuthMe, AuthListRoles, AuthGetRole, AuthListPermissions, AuthAssignRoleToKey, AuthRevokeRoleFromKey methods on Client. Mirrors the Phase 4 HTTP surface. Phase 3.5 (handler IsAdmin → middleware-wrapped RequirePermission) DEFERRED. Honest reasoning: (1) The 5 admin handlers (bulk_revocation, admin_crl_cache, admin_scep_intune, admin_est, intermediate_ca) currently gate via auth.IsAdmin checks INSIDE the handler bodies. Converting cleanly requires moving the gate to the router (auth.RequirePermission middleware wrap) AND removing the in-handler check AND rewriting the existing 3-test triplets per handler (M-008 pinned: _NonAdmin_Returns403 / _AdminExplicitFalse_Returns403 / _AdminPermitted_ForwardsActor) because the existing tests call the handler function directly, bypassing middleware. After conversion, those tests would pass without 403'ing because the gate moved away — the test invariants need to flow through a router-level integration setup instead. (2) Picking the right permission per handler is a security-review-worthy decision. Using existing operator-class perms (cert.revoke, issuer.edit) widens access from admin-only to operator-class; adding new admin-only perms (cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage) requires a migration 000030 plus a coordinated catalogue update in internal/domain/auth/validate.go. Both options are defensible but warrant a focused commit, not a 5-handler sweep mixed in with the API + CLI work. (3) The conversion can be done now without functional regressions IF we leave the in-handler IsAdmin checks in place AND add middleware wraps as defense-in-depth — but that's the worst of both worlds (legacy gate still blocks non-admin operators, defeating the point of RBAC; new gate adds runtime cost with no semantic change). A clean conversion needs the in-handler check removed. Concrete plan for Phase 3.5 (separate commit, next session): (a) add new admin-only perms via migration 000030 OR document the widening to operator-class; (b) wrap each of the 5 admin routes with auth.RequirePermission(checker, perm, nil) in router.go; (c) remove auth.IsAdmin checks from the 5 handler bodies; (d) move the M-008 _NonAdmin/_AdminExplicitFalse tests to router-level integration tests, keep _AdminPermitted as a direct handler test for actor-passthrough; (e) update m008_admin_gate_test.go registry to track auth.RequirePermission middleware wraps in router.go instead of auth.IsAdmin call sites in handler files. Verifications: go vet ./... clean; gofmt clean across all touched files; go test -short -count=1 across internal/auth, internal/service/auth, internal/api/handler, internal/api/router, internal/cli, cmd/server, cmd/cli all green (one transient too-many-open-files retry on internal/cli + internal/api/router; second run clean). Branch: dev/auth-bundle-1. Commit chain: |
||
|
|
d473398aba |
auth-bundle-1 Phase 3 (primitive): RequirePermission middleware + demo-mode + protocol allowlist
Bundle 1 / Phase 3 (primitive ship): the load-bearing RBAC middleware factory plus its dependencies. Handler conversion sweep (5 admin files: bulk_revocation.go, admin_crl_cache.go, admin_scep_intune.go, admin_est.go, intermediate_ca.go) + m008_admin_gate_test.go registry update is Phase 3.5 follow-on; this commit ships the primitive so 3.5 is mechanical. New context keys (internal/auth/context.go): ActorIDKey, ActorTypeKey, TenantIDKey alongside the legacy UserKey + AdminKey. New helpers GetActorID / GetActorType / GetTenantID with safe fallbacks (UserKey for actor id, ActorTypeAPIKey for missing type, DefaultTenantID for missing tenant). Constants DemoAnonActorID + ActorTypeAPIKey + ActorTypeAnonymous mirror internal/domain/auth without an import cycle. RequirePermission factory (internal/auth/require_permission.go): wraps a handler and gates it behind a named permission. 401 when no actor, 403 when actor lacks permission, 500 on repository error. Skips the gate entirely for protocol endpoints (ACME / SCEP / EST / OCSP / CRL) per the audit's Category F do-not-gate allowlist. PermissionChecker is an interface so internal/auth doesn't depend on internal/service/auth (cmd/server wires the concrete Authorizer at startup). HasPermission is the imperative variant for handlers that branch behaviour rather than 403'ing. ScopeFunc closure extracts the scope type + id from the request for per-resource gating. Protocol-endpoint allowlist (internal/auth/protocol_endpoints.go): IsProtocolEndpoint matches /acme, /scep, /.well-known/est, /.well-known/pki/ocsp, /.well-known/pki/crl prefixes. Adding a new protocol endpoint MUST update this list and add a parallel test. Demo-mode synthetic admin (internal/auth/middleware.go::NewDemoModeAuth): when CERTCTL_AUTH_TYPE=none is configured, this middleware injects ActorID=actor-demo-anon, ActorType=Anonymous, TenantID=t-default, plus the legacy UserKey + AdminKey for back-compat with existing handlers. The synthetic actor's admin-role grant is seeded by migration 000029 so RequirePermission resolves through the JOIN like any other actor. cmd/server startup wires this middleware only when none-mode is configured. API-key middleware extension: NewAuthWithNamedKeys now populates the new keys (ActorIDKey, ActorTypeKey=APIKey, TenantIDKey=t-default) alongside UserKey + AdminKey on every successful Bearer match. Existing handlers continue to read UserKey / IsAdmin until the Phase 3.5 sweep converts them to RequirePermission. Test coverage: TestRequirePermission_NoActorReturns401, TestRequirePermission_GrantedActorReaches200, TestRequirePermission_DeniedActorReturns403, TestRequirePermission_CheckerErrorReturns500, TestRequirePermission_ProtocolEndpointBypassesGate (covers all 5 prefixes), TestRequirePermission_ScopeFnExtractsResourceID, TestIsProtocolEndpoint_PrefixesOnly, TestNewDemoModeAuth_InjectsSyntheticActor, TestNewAuthWithNamedKeys_PopulatesPhase3ContextKeys. fakeChecker pins the contract without a database. Phase 3.5 follow-on (NOT in this commit): convert each of the 5 admin handlers from auth.IsAdmin checks to auth.RequirePermission middleware in router.go; update internal/api/handler/m008_admin_gate_test.go to track auth.RequirePermission call sites instead of (or alongside) auth.IsAdmin; pick the right permission per handler (cert.revoke for bulk_revocation, etc.). Each handler conversion needs the 3-test triplet (_NonAdmin_Returns403 / _AdminExplicitFalse_Returns403 / _AdminPermitted_ForwardsActor) per M-008. Branch: dev/auth-bundle-1. Phase 2 was prior commit (service layer). Phase 3.5 (handler conversion) + Phase 4 (HTTP API) on the next session. |
||
|
|
bd54d5f7fa |
auth-bundle-1 Phase 2: RBAC service layer + Authorizer primitive
Bundle 1 / Phase 2: ships PermissionService, RoleService, ActorRoleService, and the Authorizer primitive that Phase 3 RequirePermission middleware calls on every gated request.
Authorizer.CheckPermission semantics: a grant matches when (a) the permission name equals the requested permission AND (b) the grant is global-scoped OR the grant scope_type+scope_id exactly match the request. Global beats specific; per-resource grants widen the effective set rather than shadowing global. Hot-path query is one ActorRoleRepository.EffectivePermissions JOIN call (already shipped in Phase 1) plus an in-memory walk; Phase 12 will add benchmarks + caching if the JOIN cost shows up at scale.
Privilege-escalation guard: ActorRoleService.Grant and Revoke require the caller to hold auth.role.assign globally. Without it, ErrSelfRoleAssignment. System callers (AsSystemCaller()) bypass the check; bootstrap, migrations, scheduler-initiated grants use this path. Reserved actor actor-demo-anon is rejected on Grant + Revoke so the demo path stays alive even after a misclick (ErrAuthReservedActor).
Caller abstraction: every service entry point takes *Caller (ActorID, ActorType, TenantID, IsSystem). CallerFromContext is a stub returning ErrUnauthenticated; Phase 3 wires the middleware-context bridge that fills the Caller from request context. The contract is pinned by TestCallerFromContext_Phase2ReturnsUnauthenticated so the Phase 3 upgrade is observable.
Audit recording: every mutating service operation calls AuditService.RecordEvent. Bundle 1 Phase 8 adds the event_category column + parameter and back-fills 'auth' for these calls; until then the rows go in with the default category.
Test coverage: in-memory fakeRoleRepo / fakePermissionRepo / fakeActorRoleRepo / fakeAudit pin the privilege-escalation invariants (ErrUnauthenticated for nil caller, ErrForbidden for missing perm, ErrInvalidPermission for non-canonical permission name, ErrSelfRoleAssignment for Grant without auth.role.assign, ErrAuthReservedActor for actor-demo-anon mutations, system-caller bypass) without requiring testcontainers. Phase 12 will add live-Postgres integration coverage.
Branch: dev/auth-bundle-1. Phase 1 was
|
||
|
|
19497eef87 |
auth-bundle-1 Phase 1: RBAC schema + domain types + repository layer
Bundle 1 / Phase 1: ships the RBAC primitive as schema + domain types + repo layer. Service-layer wiring lands in Phase 2; middleware integration in Phase 3.
Schema (migrations/000029_rbac.up.sql, 272 lines, idempotent, transaction-wrapped):
tenants, roles, permissions, role_permissions, actor_roles. TEXT primary keys with prefixes (t-, r-, p-, ar-) per CLAUDE.md Architecture Decisions. TIMESTAMPTZ time columns. FK cascade explicit (tenant CASCADE, role RESTRICT, actor CASCADE). Three-value scope_type CHECK ('global', 'profile', 'issuer') matched 1:1 with internal/domain/auth.ScopeType. UNIQUE(tenant_id, name) on roles, UNIQUE(name) on permissions, UNIQUE(actor_id, actor_type, role_id, tenant_id) on actor_roles.
Seeds: t-default tenant, 7 default roles (admin, operator, viewer, agent, mcp, cli, auditor), 33-permission canonical catalogue (cert.* / profile.* / issuer.* / target.* / agent.* / audit.* / auth.role.* / auth.key.* / auth.bootstrap.use), full role->permission grant matrix at global scope. Demo-mode preservation: actor-demo-anon seeded with admin role unconditionally; Phase 3 wires the auth middleware to inject this actor into the context when CERTCTL_AUTH_TYPE=none. Reserved system actor; Phase 4 API rejects mutations / deletions targeting it with 409 Conflict.
Domain types (internal/domain/auth/{types,validate,validate_test}.go):
Tenant, Role, Permission, RolePermission, ActorRole structs with JSON tags. ScopeType enum (global/profile/issuer). ActorTypeValue mirrors internal/domain.ActorType to avoid an import cycle. CanonicalPermissions slice + DefaultRoles map are the single source of truth referenced by the migration; validate_test.go pins (a) no duplicate permissions, (b) every default-role permission is canonical, (c) admin holds the full catalogue, (d) seeded IDs carry the prefix convention, (e) ScopeType enum has exactly 3 values matching the CHECK constraint.
Extended internal/domain/audit.go: added ActorTypeAPIKey + ActorTypeAnonymous to the existing User/System/Agent enum so the audit trail can distinguish API-key requests from federated humans (Bundle 2 OIDC) and demo-mode (CERTCTL_AUTH_TYPE=none). Existing code that records actor_type=User keeps working; new APIKey value used by Bundle 1 Phase 3 middleware.
Repository layer (internal/repository/auth.go + internal/repository/postgres/auth.go):
TenantRepository (Get, List, EnsureDefault). RoleRepository (Get, GetByName, List, Create, Update, Delete with ErrAuthRoleInUse on FK RESTRICT, ListPermissions, AddPermission idempotent, RemovePermission). PermissionRepository (List, GetByName, IsCanonical for fail-fast catalog check). ActorRoleRepository (ListByActor, ListByRole, Grant idempotent, Revoke, EffectivePermissions which is the JOIN that auth.RequirePermission will use in Phase 3 — returns deduplicated (permission, scope) triples honouring the not-yet-expired predicate so future time-bound grants work without code change). Sentinel errors ErrAuthNotFound, ErrAuthDuplicateName, ErrAuthRoleInUse, ErrAuthReservedActor, ErrAuthUnknownPermission for handler-layer 404/409/400 mapping.
Verification: gofmt clean, go vet ./... clean, go test -short ./internal/domain/auth ./internal/repository/postgres pass. Integration tests against a live Postgres are gated by testing.Short() per repo convention; Phase 12 wires the testcontainers harness for full e2e coverage.
Branch: dev/auth-bundle-1. Phase 0 was
|
||
|
|
99a012e3be |
auth-bundle-1 Phase 0: extract internal/auth/ from middleware package
Bundle 1 / Phase 0: pure refactor splitting auth surface out of internal/api/middleware so Bundle 2 (OIDC + sessions) and the broader RBAC primitive (roles, permissions, scoped grants) have a clean home. Moved to internal/auth/: NamedAPIKey, HashAPIKey, AuthConfig, NewAuthWithNamedKeys, NewAuth, UserKey, AdminKey, GetUser, IsAdmin. Added testfixtures.go (WithActor / WithAdmin / WithActorAdmin) so handler tests don't construct context manually. Stayed in internal/api/middleware/: RequestID, Logging, NewLogging, Recovery, RateLimitConfig, NewRateLimiter (now imports auth.GetUser for per-user keying per audit Category C), CORSConfig, NewCORS, ContentType, CORS, GetRequestID, responseWriter, Chain, audit middleware (now imports auth.GetUser). Updated 22 caller files across cmd/, internal/api/handler/, internal/api/middleware/, internal/mcp/. Existing m008_admin_gate_test.go now scans for auth.IsAdmin( substring; Phase 3 will further evolve to track auth.RequirePermission. Behavior unchanged: all handler / middleware / service / connector / cmd / mcp tests pass with no test-logic edits, only import-path renames. Phase 0 exit criteria: internal/auth/ exists with 6 files; middleware.go went 575 -> 422 lines (auth-related ~150 lines moved out); grep -rE 'middleware\.(GetUser|IsAdmin|UserKey|AdminKey|NamedAPIKey|HashAPIKey|NewAuth)' returns 0 hits; context.WithValue(.*middleware.UserKey/AdminKey) returns 0 hits; go vet ./... clean; go test -short ./... green across all packages tested. Branch: dev/auth-bundle-1. Per cowork/auth-bundle-1-prompt.md, do not merge to master without (1) make verify green, (2) >= 2 external testers confirm, (3) >= 90% coverage on internal/auth/ in .github/coverage-thresholds.yml. |
||
|
|
0e06f6c4fc |
cli: promote --force on renew + require --reason on revoke (closes P3-1, P3-2)
Closes findings P3-1 and P3-2 from the 2026-05-05 CLI/API/MCP↔GUI parity
audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Both findings
flagged hidden defaults that the CLI was sending without exposing them
to operators: `force=false` baked into every renew payload, and a silent
fallback to `reason="unspecified"` whenever --reason was omitted.
P3-1 — promote --force on `certs renew` (full end-to-end plumbing)
The pre-2026-05-05 CLI sent `{"force": false}` in the renew body. The
API handler never decoded it — a textbook "lying field" per the
operator's CLAUDE.md "complete path, not the easy path" rule: the body
field stored a value, claimed to do something, and silently did nothing
because the wire never reached the consumer. Adding a --force flag that
also went unread would have created another lying field.
This commit takes the complete path:
service.CertificateService.TriggerRenewal grew a `force bool` parameter
(internal/service/certificate.go). When force=true, the
RenewalInProgress block is overridden so operators can recover stuck
in-flight renewals where a previous job hung without releasing the
status flag. Archived and Expired remain terminal blockers regardless
of force — those are semantic dead-ends that --force should not paper
over (archived = decommissioned, expired = issue a new cert instead of
renewing a dead one).
handler.CertificateHandler.TriggerRenewal parses force from
?force=true (or ?force=1) query param, OR {"force": true} JSON body,
whichever the client picks. Defaults to false. Passes through to the
service.
internal/cli/client.go::RenewCertificate(id, force bool) sends
?force=true on the URL when --force is set. The historical hardcoded
`{"force": false}` body is gone — no more lying field.
cmd/cli/main.go dispatches `certs renew <id> [--force]` (ID-first
flag-second convention matches the existing `agents retire <id>
[--force]`).
P3-2 — require --reason on `certs revoke` (Option A: strict refusal)
The pre-2026-05-05 CLI dropped to `--reason unspecified` whenever the
operator omitted the flag. Compliance reporting (RFC 5280 §5.3.1, PCI-
DSS §3.6, HIPAA §164.312) relies on the reason code being meaningful;
silent fallback defeats the audit trail because every revocation looks
identical.
cmd/cli/main.go dispatch refuses to send when --reason is empty,
prints the canonical RFC 5280 §5.3.1 reason-code menu, and exits
non-zero.
internal/cli/client.go exposes ValidRevokeReasons() returning the
canonical camelCase list (unspecified, keyCompromise, caCompromise,
affiliationChanged, superseded, cessationOfOperation, certificateHold,
removeFromCRL, privilegeWithdrawn, aaCompromise) and
NormalizeRevokeReason() that accepts both camelCase and snake_case
inputs and normalises to the canonical wire form. Off-list reasons
are rejected at dispatch with the menu re-printed.
Test pins:
internal/cli/client_test.go::TestClient_RenewCertificate_ForceFlag —
--force=true sends ?force=true with empty body; --force=false sends
no query and no body.
internal/cli/client_test.go::TestNormalizeRevokeReason +
TestValidRevokeReasons — canonical-camelCase + snake_case + reject-
off-enum behaviour.
cmd/cli/dispatch_test.go::TestHandleCerts_Revoke_RequiresReason +
TestHandleCerts_Revoke_RejectsUnknownReason +
TestHandleCerts_Renew_ForceFlag — dispatch-layer pins for the same
contracts.
internal/api/handler/certificate_handler_test.go::TestTriggerRenewal_
ForceQueryParam — query-param passthrough (no-flag, force=true,
force=1, force=false) flows through to the service-layer parameter.
internal/service/certificate_test.go::TestTriggerRenewal_
ForceOverridesInProgress — force=false preserves the
RenewalInProgress block; force=true clears it.
Existing TestTriggerRenewal_Archived extended to assert force=true
still blocks Archived (terminal-state guarantee).
Docs: docs/reference/cli.md updated with the --force example for renew
and the strict --reason semantics for revoke (including snake_case
input acceptance).
Acceptance gate (verified):
- go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/...
./cmd/mcp-server/... clean.
- go vet ./... clean.
- go test -short -count=1 ./... pass repo-wide.
- bash scripts/ci-guards/openapi-handler-parity.sh clean
(router 178, OpenAPI 144, exceptions 36 — unchanged; we add
parameter parsing, not routes).
- gofmt -l clean.
|
||
|
|
ff75361553 |
mcp(coverage): add 34 tools across 7 domains to close 2026-05-05 parity audit P1 findings
Closes findings P1-1..P1-35 from the 2026-05-05 CLI/API/MCP↔GUI parity
audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Before this
bundle, 35 operator-facing API endpoints had GUI surfaces but no MCP
counterpart — operators using AI assistants for cert lifecycle work in
regulated environments had to drop to curl for approve/reject, health-check
acknowledgement, renewal-policy CRUD, network-scan triggering, discovery
triage, intermediate-CA management, and job verification.
Tool count: 87→121 in tools.go (+34), 6 unchanged in tools_est.go.
Re-derive via grep -cE 'gomcp\\.AddTool\\(' internal/mcp/tools.go
internal/mcp/tools_est.go.
The 7 phases (matching the bundle prompt at
cowork/mcp-coverage-expansion-prompt.md):
Phase A — Approvals (P1-28..P1-31, 4 tools)
list_approvals, get_approval, approve_request, reject_request.
Two-person-integrity contract (ErrApproveBySameActor → HTTP 403)
is preserved automatically: the decided_by actor is derived
server-side from middleware.UserKey, NOT from request body, so
the MCP server's authenticated API-key identity becomes the
audit-trail actor. The MCP input schema deliberately omits any
actor_id field to prevent client-side spoofing.
Phase B — Health Checks (P1-20..P1-27, 8 tools)
list, summary, get, create, update, delete, history, acknowledge.
Mirrors the existing target-resource shape; acknowledge takes
optional 'actor' string captured in the audit row (handler defaults
to 'unknown' if absent).
Phase C — Renewal Policies (P1-1..P1-5, 5 tools)
Standard CRUD against /api/v1/renewal-policies. Distinct from the
legacy 'policy' tools that point at the same path — these expose
the renewal-policy domain explicitly with full alert_channels +
alert_severity_map field shape.
Phase D — Network Scan Targets (P1-14..P1-19, 6 tools)
CRUD + trigger_scan. trigger_network_scan returns the discovery-
scan body so the AI can chain into list_discovered_certificates
filtered by agent_id.
Phase E — Discovery read-side (P1-10..P1-13, 4 tools)
list_discovered_certificates, get_discovered_certificate,
list_discovery_scans, discovery_summary. Complements the
pre-existing claim/dismiss tools (registered alongside Health
historically per the I-2 closure).
Phase F — Intermediate CAs (P1-6..P1-9, 4 tools)
list, create (root + child via discriminator on body shape), get,
retire. The handler is admin-gated via middleware.IsAdmin; the
least-privilege boundary is enforced at the API layer (HTTP 403
for non-admin Bearer callers) — not by transport carve-out.
Phase G — Verification + deployments (P1-32, P1-34, P1-35, 3 tools)
list_certificate_deployments, verify_job, get_job_verification.
P1-33 (POST /api/v1/agents/{id}/discoveries) is intentionally
excluded — machine-to-machine push channel for agents reporting
filesystem-scan results, not an operator-driven flow. Documented
inline in the RegisterTools dispatch.
Implementation:
- 14 new input types in internal/mcp/types.go with jsonschema struct
tags driving LLM tool discovery.
- 7 register* functions in internal/mcp/tools.go each handling one
phase, wired into RegisterTools dispatch in declaration order.
- 34 new entries in tools_per_tool_test.go::allHappyPathCases —
the existing in-process MCP harness (TestMCP_AllTools_HappyPath +
TestMCP_AllTools_ErrorPath + TestMCP_RegisterTools_DispatchableToolCount)
auto-extends coverage to cover every new tool: happy-path round-
trip with fence-shape assertion, 5xx error-path with MCP_ERROR fence
propagation, and 'every registered tool is dispatchable' guard.
- docs/reference/mcp.md 'Available Tools' table expanded from 16 to
22 resource domains with current per-domain tool counts.
Acceptance gate (verified):
- go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... ./cmd/mcp-server/...
clean across all four production binaries.
- go vet ./... clean.
- go test -short -count=1 ./internal/mcp/... pass (TestMCP_AllTools_*
expanded to 127 tool round-trips).
- go test -short -count=1 ./... pass repo-wide.
- bash scripts/ci-guards/openapi-handler-parity.sh clean (router 178,
OpenAPI 144, exceptions 36 — unchanged; we add MCP wrappers, not
routes).
- gofmt -l clean across the four touched files.
|
||
|
|
75097909e9 | |||
|
|
0f81c1b956 |
ci: re-fix CodeQL #32 + repair loadtest f5-mock build context
Two unrelated CI failures from run #25305811340; fixed in one
commit since neither needs the other to land first.
CodeQL alert #32 (go/log-injection at middleware.go:68) reopened
after
|
||
|
|
b0fc067317 |
security: close CodeQL #17 (log injection) + #23 (SSRF false-positive reopen)
Two CodeQL alerts in one sweep — both medium-impact follow-ups
on already-merged guards.
Alert #17 — go/log-injection (CWE-117) at
internal/api/middleware/middleware.go:58:
log.Printf("[%s] %s %s %d %v", requestID, r.Method, r.URL.Path, ...)
r.Method and r.URL.Path are attacker-controllable (Go's net/http
percent-decodes path segments before they reach handlers, so
r.URL.Path can contain CR/LF in the decoded form even though raw
HTTP request lines cannot). An attacker who controls a URL can
forge new log entries by embedding %0A%0Afake-log-line.
Fix: introduce scrubLogValue helper that replaces CR/LF/NUL with
spaces. Apply to both r.Method and r.URL.Path. Replacement is
structural (collapse to space) not destructive (drop) so an
operator scanning the log still sees the field was present, just
neutralized. Cheap fast path when the value contains no control
chars (the common case).
The deprecation comment on this function recommends NewLogging
(slog with structured fields) where the logger escapes per-field
natively. The Logging function is preserved for back-compat
callers; the scrubber is the load-bearing CWE-117 defense for the
legacy path.
Alert #23 — go/request-forgery (CWE-918) at scep_probe.go:271:
CodeQL reopened the alert after commit
|
||
|
|
9ef9f3cde3 |
refactor(scep+ejbca): drop dead conditionals on always-empty vars (CodeQL #18, #19)
Two CodeQL go/comparison-of-identical-expressions alerts in one sweep — both Warning severity, both real dead-code (not false positives). CodeQL detected that each comparison's LHS variable was provably constant. Alert #18 — internal/api/handler/scep.go:612 (extractCSRFields): challengePassword := "" transactionID := "" // ... loop populates challengePassword from CSR.Attributes ... for _, attr := range csr.Attributes { if attr.Type.Equal(oidChallengePassword) { // populates challengePassword ONLY — transactionID stays "" } } if transactionID == "" && csr.Subject.CommonName != "" { // ← always true transactionID = csr.Subject.CommonName } transactionID was initialized to "" and never reassigned before the check. The conditional was always true; the MVP path was effectively "unconditionally fall back to CN". The RFC 8894 path (tryParseRFC8894 above this function) extracts transaction-ID properly from PKCS#7 authenticatedAttributes; the MVP path is for lightweight legacy clients that send the raw CSR with no PKCS#7 wrapping, and CN-as-transaction-ID is sufficient there. Fix: drop the dead transactionID local var + dead conditional; unconditionally set transactionID = csr.Subject.CommonName. No behavioral change — the runtime semantics are identical to before (every valid invocation already took the fallback). The CN extraction stays robust because the empty-CN case still produces an empty transactionID, which downstream callers handle. Alert #19 — internal/connector/issuer/ejbca/ejbca.go:415 (RevokeCertificate): serial := request.Serial issuerDN := "" // (comment: "if we have time..." — TODO never followed up) revokeURL := fmt.Sprintf("%s/certificate/%s/%s/revoke", apiURL, issuerDN, serial) if issuerDN == "" { // ← always true revokeURL = fmt.Sprintf("%s/certificate/%s/revoke", apiURL, serial) } issuerDN was hardcoded to "" two lines above. The first revokeURL line was unreachable dead code; the conditional always fired and the serial-only URL always won. EJBCA's REST API has both /certificate/{issuer_dn}/{serial}/revoke and /certificate/{serial}/revoke endpoints; the serial-only form is correct for typical certctl deployments where one EJBCA CA maps to one certctl issuer config (no overlapping serial spaces). Fix: drop the dead first revokeURL + dead conditional; build revokeURL once via the serial-only endpoint. No behavioral change — the runtime URL was always the serial-only one. Comment retained + expanded to document the future-enhancement path (parse issuer DN from IssuanceResult metadata + use the DN-qualified endpoint when a multi-CA EJBCA deployment surfaces). Verified locally: gofmt: clean. go vet ./internal/api/handler/... + ./internal/connector/issuer/ejbca/...: exit 0. go test -short -count=1 ./internal/api/handler/... + ejbca/...: PASS. Both fixes are pure dead-code removal — runtime behavior is byte- identical to pre-edit. The existing test suites would have caught any actual behavioral change. References: https://github.com/certctl-io/certctl/security/code-scanning/18 https://github.com/certctl-io/certctl/security/code-scanning/19 Closes both alerts. |
||
|
|
2b4d0069d9 |
security(scep-intune): annotate verifyES256/RS256 SHA-256 as RFC-mandated (CodeQL #21 false positive)
CodeQL alert #21 (go/weak-sensitive-data-hashing, severity: High) flagged the sha256.Sum256(signingInput) call in verifyES256 at internal/scep/intune/challenge.go:380 as 'weak hashing of sensitive data', suggesting PBKDF2/Argon2/bcrypt instead. This is a CodeQL false positive. The CodeQL query triggers when SHA-256 is used near *x509.Certificate (the trust pool) and infers 'this might be password hashing.' But the actual context is JWS signature verification: - verifyRS256 implements RFC 7518 §3.3 — 'RSASSA-PKCS1-v1_5 using SHA-256'. SHA-256 is spec-mandated. - verifyES256 implements RFC 7518 §3.4 — 'ECDSA using P-256 and SHA-256'. SHA-256 is spec-mandated. - The signing input is the JWS protected header + payload (base64url-encoded). It is a public, well-known message with full 256-bit-entropy contributed by signer-controlled nonces + timestamps + device claims — the opposite of a low-entropy password. - The output is verified against an asymmetric signature (rsa.VerifyPKCS1v15 / ecdsa.Verify), not compared to a pre-computed hash digest. This is signature verification, not password hashing. - Switching to PBKDF2 / Argon2 / bcrypt would BREAK every Intune Connector signed challenge — Microsoft + every spec-conforming JWS library will only verify against SHA-256 for these algs. Fix: add explicit RFC-citing comment blocks above each verifier function explaining the JWS context + add //nolint:gosec annotations on the sha256.Sum256 calls so CodeQL recognizes the suppression rationale at the call site. The annotation cites the specific RFC clause (7518 §3.3 / §3.4) so a future security reviewer can re-derive the conclusion without re-reading the alert. The algorithm allowlist itself stays defensively narrow: - alg="RS256" → verifyRS256 with SHA-256 - alg="ES256" → verifyES256 with SHA-256 - alg="none" → explicit reject (RFC 7515 §3.6 attack vector) - any other alg → reject as unsupported Pinned by existing tests: - TestValidateChallenge_HappyPath_RS256 - TestValidateChallenge_HappyPath_ES256_FixedWidth - TestValidateChallenge_HappyPath_ES256_DER - TestValidateChallenge_AlgNoneRejected - TestValidateChallenge_UnsupportedAlg The happy-path tests would fail if the verifiers switched to any non-SHA-256 digest — the alg allowlist makes the SHA-256 dependency load-bearing, which the existing test suite already proves. Verified locally: gofmt: clean. go vet ./internal/scep/intune/...: exit 0. go test -short -count=1 ./internal/scep/intune/...: PASS (every existing challenge_test.go subtest still green). Reference: https://github.com/certctl-io/certctl/security/code-scanning/21 Closes CodeQL alert #21 as a documented false positive — the //nolint annotations + RFC-citing comments are the load-bearing suppression. Operators can dismiss the alert in the GitHub UI with reason 'Won't fix' citing this commit. |
||
|
|
d08982fc19 |
security(signer): bound FileDriver paths with SafeRoot + reject .. (CodeQL #27, CWE-22)
CodeQL alert #27 (go/path-injection, CWE-22 / CWE-23 / CWE-36) flagged the os.WriteFile sink at internal/crypto/signer/file_driver.go:194 because the outPath flowed from operator-supplied config (CAKeyPath in the local issuer's encrypted config blob -> GenerateOutPath closure -> os.WriteFile) without a containment check. Threat model: Production wiring (cmd/server/main.go) constructs &signer.FileDriver{} and the local-issuer NewConnector wires GenerateOutPath off Config.CAKeyPath. CAKeyPath ships from the encrypted issuer config in PostgreSQL — settable only by an authenticated admin via the API. So the realistic exploit is: (a) Admin compromise -> CAKeyPath set to /etc/passwd -> FileDriver.Generate overwrites system files. (b) Future code path concatenates attacker-controlled fragments into the output path -> classic ../../etc/passwd traversal. Defense in depth: bound the write surface so admin-key-rotation errors and future regressions can't escape into arbitrary filesystem writes. Fix: internal/crypto/signer/file_driver.go gains: - SafeRoot string field on FileDriver. When set, every Load + Generate path MUST resolve under SafeRoot via filepath.Abs + strings.HasPrefix on cleaned paths. - validateSafePath helper that: * rejects empty paths * filepath.Clean()s the input * rejects paths whose cleaned form still contains a literal ".." segment (catches relative paths that escape above their start; absolute paths get collapsed by Clean) * resolves to filepath.Abs and (when SafeRoot non-empty) verifies containment via filepath.Separator-suffixed HasPrefix (the bare-prefix bug — SafeRoot=/var/lib/foo erroneously accepting /var/lib/foobar — has its own regression test below) - Load + Generate now call validateSafePath before any os.ReadFile / os.WriteFile. The validator is in the same function as the sink so CodeQL recognizes it as a guard. Tests (internal/crypto/signer/signer_test.go): TestFileDriver_Load_RejectsParentTraversal — relative path "../../etc/passwd" rejected with parent-directory error. TestFileDriver_Load_RejectsEmptyPath — empty path rejected. TestFileDriver_Generate_RejectsParentTraversal — write side, same pattern. TestFileDriver_SafeRoot_AcceptsContainedPath — happy path: a key file under SafeRoot succeeds. TestFileDriver_SafeRoot_RejectsEscape — absolute path outside SafeRoot rejected (the load-bearing CodeQL pin). TestFileDriver_SafeRoot_RejectsSiblingPrefix — pins the HasPrefix-with-separator subtlety: SafeRoot=/tmp/X must NOT accept /tmp/X-sibling. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short -count=1 ./internal/crypto/signer/...: ok 1.605s go test -short -count=1 ./internal/connector/issuer/local/...: ok 4.908s (downstream FileDriver consumer) go test -short -count=1 ./internal/service/...: ok 4.029s Backwards-compat: when SafeRoot is unset, only the structural .. + empty-path checks fire — the existing FileDriver call sites in cmd/server/main.go and the existing unit tests pass unchanged. Production wiring SHOULD set SafeRoot via cmd/server/main.go in a follow-up commit (env-var-supplied CERTCTL_CA_KEY_DIR or similar). Reference: https://github.com/certctl-io/certctl/security/code-scanning/27 Closes CodeQL alert #27 (go/path-injection). |
||
|
|
af3ca3935b |
ci: convert literal Unicode in headers_test.go to \u escapes (ST1018)
CI run #448 (commit |
||
|
|
e6919cdaba |
security(scep_probe): re-validate URL inside scepHTTPGet to close CodeQL #23 (CWE-918)
CodeQL alert #23 (go/request-forgery, CWE-918 SSRF) flagged the client.Do(req) sink at internal/service/scep_probe.go:232 because the URL parameter to scepHTTPGet is taint-traced from the user- supplied input to ProbeSCEP without the analyzer recognizing the upstream sanitizer. The defense-in-depth was already in place: 1. validation.ValidateSafeURL at ProbeSCEP entry (line 75) — rejects obvious SSRF targets (loopback / link-local / cloud metadata literals) before any network call. 2. validation.SafeHTTPDialContext on the http.Transport — re-resolves the host at dial time and rejects connections to reserved IP ranges. This is the authoritative SSRF + DNS- rebinding guard. Even if step 1 was bypassed, the dial would still fail. But CodeQL's taint tracker doesn't follow the validator across function boundaries, so the alert stays open even though the code is safe. This commit re-runs validation.ValidateSafeURL inside scepHTTPGet immediately before http.NewRequestWithContext — sanitizer in the same function as the sink, which CodeQL recognizes as a guard. Bonus defense-in-depth: any future call site that wires a URL into scepHTTPGet without going through ProbeSCEP (e.g. a new code path that directly probes a discovered URL) inherits the same SSRF guard automatically. Fail-closed by default. The validator dispatch matches ProbeSCEP's pattern — tests override via s.scepValidateURL to hit httptest loopback servers; production callers use validation.ValidateSafeURL. The probe's existing httptest-based tests continue to work unchanged. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short ./internal/service/...: ok 4.029s (every existing scep_probe test still green — the new revalidation is a no-op for tests that go through ProbeSCEP because the same validator already passed once at entry). Reference: https://github.com/certctl-io/certctl/security/code-scanning/23 Closes CodeQL alert #23 (go/request-forgery). |
||
|
|
23c593089d |
security(email): sanitize body fields against content injection (CodeQL #11, CWE-640)
CodeQL alert #11 (go/email-injection, CWE-640 / OWASP Content Spoofing)
flagged the wc.Write(message) sink at internal/connector/notifier/email/
email.go:208 because attacker-controllable fields flow into the email
body unchecked.
Threat model:
Headers (From, To, Subject) were already protected by
validation.ValidateHeaderValue (CWE-113 SMTP header injection,
closed in commit
|
||
|
|
34adcfbbe5 |
api, handler: 4 admin-gated CA hierarchy endpoints + OpenAPI (Rank 8 commit 4)
Rank 8 commit 4 of 5. The API + RBAC layer that operators drive
the new hierarchy management surface from.
Endpoints (all admin-gated via middleware.IsAdmin; non-admin Bearer
callers get 403):
POST /api/v1/issuers/{id}/intermediates
Discriminator on body shape:
empty parent_ca_id + root_cert_pem + key_driver_id
→ CreateRoot (registers operator-supplied root CA).
parent_ca_id non-empty
→ CreateChild (signs new sub-CA cert under parent).
Service-layer error → HTTP code mapping:
ErrCANotSelfSigned → 400
ErrCAKeyMismatch → 400
ErrPathLenExceeded → 400
ErrNameConstraintExceeded → 400
ErrInvalidCertPEM → 400
ErrParentCANotActive → 409
ErrIntermediateCANotFound → 404
(other) → 500
GET /api/v1/issuers/{id}/intermediates
Returns flat list ordered by created_at; caller renders the
tree from each row's parent_ca_id (nil = root).
GET /api/v1/intermediates/{id}
Single-row detail.
POST /api/v1/intermediates/{id}/retire
Two-phase: confirm=false → active→retiring; confirm=true →
retiring→retired with active-children check (drain-first
semantics; ErrCAStillHasActiveChildren → 409).
Files changed:
internal/api/handler/intermediate_ca.go — 4 handlers
+ handler-defined
service interface
(dependency
inversion).
internal/api/handler/intermediate_ca_test.go — 8 test variants
(M-008 admin-
gate triplet
complete).
internal/api/handler/m008_admin_gate_test.go — register the
new admin-gated
handler in
AdminGatedHandlers
so the M-008
coherence
scanner stays
green.
internal/api/router/router.go — 4 r.Register
calls + new
IntermediateCAs
field on
HandlerRegistry.
cmd/server/main.go — wire the
postgres repo +
service +
handler. Reuses
the same
signer.FileDriver
instance the
OCSP responder
bootstrap path
feeds.
api/openapi.yaml — 4 new
operationIds,
full body
schema + status-
code dispatch.
Tests (8 in this commit):
TestIntermediateCA_Handler_NonAdmin_Returns403 (admin gate
— table-driven across all 4 endpoints)
TestIntermediateCA_Handler_AdminExplicitFalse_Returns403
(defensive: AdminKey present but false ≠ AdminKey absent)
TestIntermediateCA_Handler_AdminPermitted_ForwardsActor
(admin actor forwarded to service for audit attribution)
TestIntermediateCA_HandlerCreate_RootDispatch
(body discriminator: empty parent_ca_id → CreateRoot)
TestIntermediateCA_HandlerCreate_ChildDispatch
(body discriminator: parent_ca_id present → CreateChild)
TestIntermediateCA_HandlerCreate_BadRequestOnMissingRootBundle
(validation: no parent + no root bundle → 400)
TestIntermediateCA_HandlerCreate_ServiceErrorMappings
(table-driven: 7 service errors → expected HTTP codes)
TestIntermediateCA_HandlerRetire_TwoPhaseConfirm
(confirm=false then confirm=true forwarded correctly)
TestIntermediateCA_HandlerRetire_StillHasActiveChildren_Returns409
(drain-first contract — 409 not 500)
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go test -short -count=1 ./internal/api/handler/...: ok 4.498s.
bash scripts/ci-guards/openapi-handler-parity.sh: clean
(router routes: 182, openapi operations: 148; the +4 new routes
have +4 new operationIds — parity preserved).
bash scripts/ci-guards/* (all 24 guards): clean.
Out of scope of THIS commit (commit 5):
- web/src/pages/IssuerHierarchyPage.tsx (recursive tree render).
- docs/intermediate-ca-hierarchy.md sysadmin runbook (FedRAMP /
financial-services / internal-PKI patterns).
- docs/connectors.md hierarchy_mode row.
- WORKSPACE-ROADMAP entries (HSM-backed roots, automated
rotation, CRL chaining, NameConstraints templates, D3
dendrogram).
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 4.
|
||
|
|
ae597f7f8d |
local: tree-mode chain assembly + byte-equivalence pin (Rank 8 commit 3)
Rank 8 commit 3 of 5. Load-bearing connector rewrite that activates
the first-class CA hierarchy surface shipped by commits 1-2.
Local connector changes:
- New ChainAssembler interface (single-method seam) defined in the
connector package — *service.IntermediateCAService satisfies it
implicitly. Avoids the import cycle that would arise from
pulling internal/service into internal/connector/issuer/local.
- Three new optional fields on Connector: hierarchyMode,
chainAssembler, treeIssuingCAID. Default zero values keep the
pre-Rank-8 single-sub-CA flow byte-identical (no operator on
the historical path sees any change in wire bytes).
- Three new setters: SetHierarchyMode, SetChainAssembler,
SetTreeIssuingCAID. Wired in cmd/server/main.go in commit 4
when the issuer's HierarchyMode column is read at boot.
- resolveChainPEM helper centralizes the dispatch:
tree mode + ChainAssembler set + treeIssuingCAID set
→ call AssembleChain over intermediate_cas
otherwise (incl. tree mode with incomplete wiring)
→ fall back to historical c.caCertPEM
Defense in depth: a misconfigured operator gets a working
issuance, not a nil-deref panic.
- IssueCertificate + RenewCertificate both delegate ChainPEM
population to resolveChainPEM. The cert generation path
(generateCertificate) is untouched — same key, same template,
same signing.
Tests (internal/connector/issuer/local/local_hierarchy_test.go):
TestLocal_HierarchyMode_SingleVsTree_ByteIdentical ← LOAD-BEARING
THE refuse-to-ship pin. Two connectors against the same on-disk
CA cert+key:
- A: pre-Rank-8 single-sub-CA mode (HierarchyMode unset).
- B: tree mode wired against an in-memory ChainAssembler
whose 1-level chain matches A's caCertPEM byte-for-byte.
Asserts:
1. resA.ChainPEM == resB.ChainPEM (the byte-identical pin).
2. resA.ChainPEM == fixture root cert PEM (real fact about
the wire format, not internal consistency).
Operators on single mode keep getting byte-identical bytes.
Operators flipping to tree with a 1-level shim see no change.
Zero behavioral drift for unmigrated deployments.
TestLocal_HierarchyMode_Tree_LeafChainIncludesAllAncestors
Multi-level pin. 4-level synthetic chain (root → policy →
issuingA → issuingB-leaf-CA). Asserts:
- 4 CERTIFICATE blocks in ChainPEM.
- Leaf-first ordering (issuingB.CN, issuingA.CN, policy.CN,
root.CN at depths 0..3).
This is what tree mode buys operators in exchange for the
migration overhead.
TestLocal_HierarchyMode_FallsBackToSingleWhenWiringIncomplete
Defensive fallback pin. HierarchyMode='tree' but
ChainAssembler nil + treeIssuingCAID '' → ChainPEM falls back
to caCertPEM. No panic, no lying field.
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go test -short -count=1 -run TestLocal_HierarchyMode ./internal/connector/issuer/local/...
PASS (3/3, including the load-bearing byte-identical pin).
go test -short -count=1 ./internal/connector/issuer/local/...: ok 4.358s
(every existing local-connector test still green — backwards
compat byte-for-byte at the test layer too).
Out of scope of THIS commit (commit 4):
- 4 admin-gated handler endpoints + OpenAPI extension.
- cmd/server/main.go wiring that reads Issuer.HierarchyMode at
boot and calls SetHierarchyMode + SetChainAssembler +
SetTreeIssuingCAID on the local connector instance.
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 3.
|
||
|
|
62523fb845 |
service: 10 IntermediateCAService tests + in-memory fake repo (Rank 8 commit 2.5)
Service-layer pin for Rank 8. The fake IntermediateCARepository's
WalkAncestry mirrors the postgres recursive-CTE semantics
(leaf-first ordering, terminate at parent_ca_id IS NULL) so the
AssembleChain pin carries the same weight the production repo would.
Tests:
TestIntermediateCA_CreateRoot_RegistersOperatorSuppliedSelfSigned
Happy path. RFC 5280 §3.2 self-signed root + matching key gets
persisted with parent_ca_id=NULL, state=active, KeyDriverID=...
TestIntermediateCA_CreateRoot_RejectsNonSelfSigned
RFC 5280 §3.2 enforcement. Cert whose embedded public key
doesn't match the actual signer fails CheckSignatureFrom →
ErrCANotSelfSigned.
TestIntermediateCA_CreateRoot_RejectsKeyMismatch
Operator-boundary defense in depth. Cert is well-formed
self-signed but the supplied keyDriverID resolves to a
different key → ErrCAKeyMismatch.
TestIntermediateCA_CreateChild_PathLenTighteningEnforced
RFC 5280 §4.2.1.9 enforcement. Child whose path-len equals or
exceeds parent's → ErrPathLenExceeded. Strictly-tighter child
succeeds.
TestIntermediateCA_CreateChild_NameConstraintsSubset
RFC 5280 §4.2.1.10 enforcement. Widening rejected
("evil.com" outside parent's "example.com"); subdomain
narrowing succeeds ("internal.example.com").
TestIntermediateCA_AssembleChain_4DeepHierarchy ← LOAD-BEARING
The pin the local connector tree-mode delegates to. Builds
root → policy → issuing-A → issuing-B and asserts AssembleChain
returns 4 CERTIFICATE blocks in leaf-to-root order with
matching subject CommonNames at each depth.
TestIntermediateCA_Retire_RefusesIfActiveChildren
Drain-first semantics. retiring → retired with active children
refuses with ErrCAStillHasActiveChildren.
TestIntermediateCA_Retire_TwoPhaseConfirm
First call: active → retiring (no confirm). Second call without
confirm: surfaces "pass confirm=true". Second call with
confirm: retiring → retired.
TestIntermediateCA_MetricsRecordedPerOutcome
Snapshot pin. CreateRoot bumps create_root, CreateChild bumps
create_child, Retire(active) bumps retire_retiring, all
dimensioned by issuer_id.
TestIntermediateCA_LoadHierarchy_FlatList
Returns every CA for an issuer ordered by created_at; caller
renders the tree from parent_ca_id.
Test infrastructure:
fakeIntermediateCARepo — sync.Mutex-guarded map.
WalkAncestry walks
parent_ca_id from leafID
to root (or terminates on
cycle, defense-in-depth).
Compile-time interface
guard.
testCAFixture — mints a self-signed root
cert+key in process,
Adopt()s the key under
a stable ref so CreateRoot
can resolve it.
newTestService — wires IntermediateCAService
with fake repo +
signer.MemoryDriver +
mockAuditRepo (already
lives in testutil_test.go)
+ IntermediateCAMetrics.
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go test -short -count=1 -run TestIntermediateCA ./internal/service/...
PASS (10/10)
go test -short -count=1 ./internal/service/...: ok 3.844s
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 2.5.
|
||
|
|
fb54ebcb62 |
service: IntermediateCAService + IntermediateCAMetrics + RFC 5280 enforcement
Rank 8 of the 2026-05-03 deep-research deliverable, commit 2 of 5.
Service-layer wiring for first-class N-level CA hierarchy management.
The connector rewrite that activates this surface lands in commit 3.
Files added:
internal/service/intermediate_ca.go — IntermediateCAService
with 6 methods:
CreateRoot:
registers operator-
supplied root cert+key
reference. Validates
RFC 5280 §3.2 self-
signed (subject ==
issuer + signature
verifies). Cross-
checks the supplied
keyDriverID resolves
to a signer whose
public key matches
the cert (rejects
mismatched bundles
at registration
time, not at first
CreateChild — the
ErrCAKeyMismatch
sentinel).
CreateChild:
generates child key
via signer.Driver,
signs the cert via
the parent's signer.
Enforces RFC 5280
§4.2.1.9 (path-len
tightening) +
§4.2.1.10
(NameConstraints
subset semantics) at
service layer fail-
closed. Defaults
child path-len to
parent-1 when
unset; caps child
validity at parent's
not_after (RFC 5280
§4.1.2.5).
Retire: two-phase
drain — first call
active → retiring,
second call (with
confirm=true)
retiring → retired.
Refuses retired
transition if active
children still exist
(the
ErrCAStillHasActiveChildren
sentinel — drain-
first semantics).
Get / LoadHierarchy:
thin repo wrappers.
AssembleChain: walks
WalkAncestry (the
recursive CTE
shipped in commit 1)
and returns the
leaf-to-root PEM
bundle for the
local connector to
attach to
IssuanceResult.
internal/service/intermediate_ca_metrics.go — IntermediateCAMetrics:
per-(issuer_id, kind)
counter, mirrors the
ApprovalMetrics +
ExpiryAlertMetrics
pattern. RecordCreate
(root/child) +
RecordRetire
(retiring/retired).
SnapshotIntermediateCA
for the Prometheus
exposer.
Defense in depth retained:
- NEVER persist CA private key bytes in the row. KeyDriverID is the
only key reference; signer.Driver.Load resolves it at signing time.
- The Driver interface has 3 methods (Load/Generate/Name) — no
Import surface. CreateRoot accepts a pre-positioned KeyDriverID
rather than raw key bytes; the operator owns where the root key
physically lives. Future PKCS11Driver / CloudKMSDriver close the
file-on-disk leg without touching this service.
Verified locally:
gofmt: clean.
go vet ./internal/service/...: exit 0.
go build ./internal/service/...: exit 0.
Deferred to commit 2.5 (or fold into commit 3, operator's call):
- 9 service-level tests including:
* TestIntermediateCA_CreateRoot_RegistersOperatorSuppliedSelfSigned
* TestIntermediateCA_CreateRoot_RejectsNonSelfSigned
* TestIntermediateCA_CreateRoot_RejectsKeyMismatch
* TestIntermediateCA_CreateChild_PathLenTighteningEnforced
* TestIntermediateCA_CreateChild_NameConstraintsSubset
* TestIntermediateCA_AssembleChain_4DeepHierarchy ← LOAD-BEARING
* TestIntermediateCA_Retire_RefusesIfActiveChildren
* TestIntermediateCA_Retire_TwoPhaseConfirm
* TestIntermediateCA_MetricsRecordedPerOutcome
Test setup needs: in-memory IntermediateCARepository fake +
signer.MemoryDriver (already exists) + helper to generate test root
cert+key. Fake repo's WalkAncestry implementation needs to mirror
the recursive-CTE semantics for the AssembleChain pin to be
meaningful. Total ~500 lines of test code; non-trivial setup.
Out of scope of THIS commit (commits 3-5):
- Local connector rewrite + byte-equivalence pin
(TestLocal_HierarchyMode_SingleVsTree_ByteIdentical).
- 4 admin-gated handler endpoints + OpenAPI extension.
- web/src/pages/IssuerHierarchyPage.tsx.
- docs/intermediate-ca-hierarchy.md sysadmin runbook.
- cmd/server/main.go wiring.
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md.
|