certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 18:01:37 +00:00

Author	SHA1	Message	Date
shankar0123	21aeed4f4e	legal: addlicense headers + normalize legacy variants (Phase 0 RED-4) Phase 0 closure (Path B2, post-rewrite): addlicense sweep that adds the canonical certctl LLC copyright + BUSL-1.1 SPDX header to every production Go file. Template: // Copyright 2026 certctl LLC. All rights reserved. // SPDX-License-Identifier: BUSL-1.1 Coverage: 338 / 338 production Go files (cmd/ + internal/, excluding _test.go and /testdata/). Pre-sweep coverage was 22 / 338 (6.5%); post-sweep is 338 / 338 (100%). Normalized 22 pre-existing legacy headers (`// Copyright (c) certctl` + `// SPDX-License-Identifier: BSL-1.1`) and 1 file using a `Certctl Contributors` attribution. The legacy SPDX ID `BSL-1.1` is non-standard; the official SPDX identifier for Business Source License 1.1 is `BUSL-1.1` (capital U). All 338 files now share the canonical form. Generated via: addlicense -c "certctl LLC" -y 2026 \ -f cowork/legal/copyright-header.tpl \ -ignore '/testdata/' -ignore '/_test.go' \ cmd/ internal/ Verification: find cmd internal -name '*.go' -not -name '*_test.go' \ -not -path '*/testdata/*' \ -exec grep -L '^// Copyright 2026 certctl LLC' {} \; | wc -l Returns: 0 gofmt clean. Header additions are comments only, no compile impact. Closes: cowork/certctl-architecture-diligence-audit.html#fix-RED-4	2026-05-13 21:23:35 +00:00
shankar0123	0152bdf567	fix(auth/rbac): scope-aware ActorRole revoke (A-4) HIGH-10's UNIQUE (actor, role, scope_type, scope_id, tenant) uniqueness extension lets an operator grant the same role to the same actor at multiple scopes (e.g. r-operator on profile=p-acme AND profile=p-globex). But ActorRoleRepository.Revoke's WHERE clause omitted (scope_type, scope_id) — a single call deleted every variant. Selective revoke was unrepresentable; operators had to drop all and re-grant N-1, opening a race window where the actor's access was briefly different. Closure across all layers (handler → service → repo → MCP → GUI client), preserving the legacy "revoke all variants" contract for unmodified callers: internal/repository/auth.go - New ActorRoleRevokeOptions struct. Zero value = legacy semantic; non-empty ScopeType narrows to one variant. - New ErrActorRoleNotFound sentinel for scoped no-match (HTTP 404). internal/repository/postgres/auth.go - Revoke signature extended with opts. Empty opts.ScopeType uses the legacy SQL (no scope WHERE), zero-row delete = no error. - Non-empty narrows with `scope_type = $5 AND scope_id IS NOT DISTINCT FROM $6` — the IS-NOT-DISTINCT-FROM is load-bearing, vanilla `=` would silently miss the (global, NULL) case because NULL ≠ NULL in standard SQL. - Selective revoke with zero matching rows returns ErrActorRoleNotFound; operators get feedback on typos. internal/service/auth/actor_role_service.go - Revoke takes opts. Audit row's details map records the scope so SIEMs can distinguish wide-vs-selective revokes: `scope: "all_variants"` for the legacy path, or `scope_type` + `scope_id` for selective. Privilege check (auth.role.assign) and reserved-actor guard unchanged. internal/api/handler/auth.go - RevokeRoleFromKey parses optional `?scope_type=` / `?scope_id=` query params via new parseRevokeScope helper. - Validation mirrors AssignRoleToKey: scope_id forbidden with scope_type=global, required with profile/issuer, invalid scope_type → 400. 
scope_id without scope_type also → 400. - writeAuthError maps ErrActorRoleNotFound to 404. internal/mcp/tools_auth.go + types.go - AuthRevokeKeyRoleInput gains optional ScopeType + ScopeID with jsonschema descriptions explaining the dual-mode contract. - Tool call site appends URL-encoded query params when ScopeType is set; legacy callers (no scope_type) emit the bare DELETE path unchanged. web/src/api/client.ts - authRevokeKeyRole signature: optional 3rd argument `{ scope_type?, scope_id? }`. Pre-A-4 call sites (no opts arg) keep firing the bare DELETE — fully backward compatible. The GUI KeysPage's per-row revoke button (still one row per role, pre-Fix-12) continues to use the legacy shape; future GUI work can pass scope params for per-variant rows. docs/operator/rbac.md - New "Revoke: legacy 'all variants' vs scope-selective" subsection under "From the HTTP API" with curl examples for both modes plus the audit-row payload shape that lets SOC/SIEM tell them apart. Regression coverage: Repository (testcontainers, skipped under -short — 6 tests in internal/repository/postgres/auth_revoke_scope_test.go): TestRevokeActorRole_NoOpts_RemovesAllVariants TestRevokeActorRole_WithScope_RemovesOnlyMatching TestRevokeActorRole_WithGlobalScope_RemovesOnlyGlobal — pins the IS-NOT-DISTINCT-FROM branch (global, NULL) TestRevokeActorRole_NoMatch_ReturnsNotFound — pins the new sentinel TestRevokeActorRole_NoOpts_NoMatch_IsNoOp — pins the legacy idempotence contract TestRevokeActorRole_IssuerScope_RemovesOnlyMatching — pin the issuer-scope half (profile + issuer are symmetric scope types) Handler (7 new tests in auth_test.go): TestAuthHandler_RevokeRoleFromKey — extended to assert no scope filter is forwarded when query string is empty (legacy behaviour) TestAuthHandler_RevokeRoleFromKey_A4_ScopedProfile TestAuthHandler_RevokeRoleFromKey_A4_ScopedGlobal TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithGlobal TestAuthHandler_RevokeRoleFromKey_A4_RejectsMissingScopeID 
TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithoutScopeType TestAuthHandler_RevokeRoleFromKey_A4_RejectsInvalidScopeType TestAuthHandler_RevokeRoleFromKey_A4_ScopedNotFoundReturns404 MCP (2 new table rows in tools_per_tool_test.go): Scoped revoke with scope_type=profile + scope_id=p-acme → `?scope_type=profile&scope_id=p-acme` Scoped revoke with scope_type=global (no scope_id) → `?scope_type=global` Service-layer test plumbing (service_test.go) updated for new opts arg: 4 existing call sites pass repository.ActorRoleRevokeOptions{} to keep their pre-A-4 semantics; the fakeActorRoleRepo.Revoke implementation now mirrors the postgres scope-aware behaviour (legacy zero-value vs scoped narrowing + ErrActorRoleNotFound on no-match). Verify gate green: gofmt clean, go vet clean, go test -short across repository/postgres, service/auth, api/handler, and mcp. The pre-existing KeysPage.test.tsx failure observed on the baseline commit (reproduced via `git stash` earlier in Fix 03) is unrelated; my client.ts change adds an optional third argument and is fully backward-compatible. Spec at cowork/auth-bundles-fixes-2026-05-11/04-high-actor-role-revoke-scope.md. Audit doc updated: new row A-4 (2026-05-11) CLOSED appended to the status table at the bottom of cowork/auth-bundles-audit-2026-05-10.md. Operator-visible advisory in CHANGELOG.md v2.1.0 release notes under Security (non-BREAKING — legacy callers are unchanged). Depends on Fix 01 (the scope-aware EffectivePermissions read path on branch fix/audit-2026-05-11/crit-actor-role-scope-reads). This fix makes the inverse op selectively reversible; without Fix 01 the read side would mis-evaluate scoped grants anyway, making selective revoke moot at runtime.	2026-05-11 10:50:34 +00:00
shankar0123	a123263498	fix(auth/rbac): close HIGH-10 lying field — EffectivePermissions reads actor-role scope (A-1) Audit 2026-05-11 A-1 closure. Spec at cowork/auth-bundles-fixes-2026-05-11/01-crit-actor-role-scope-reads.md. WHAT. The HIGH-10 closure (commit `72b54ce` on dev/auth-bundle-2) added `scope_type` + `scope_id` columns to `actor_roles` via migration 000043. The handler accepted them on POST /api/v1/auth/keys/{id}/roles. The repo Grant INSERTed them. The uniqueness tuple was extended to include them. The GUI exposed them as form inputs. But the load-bearing `EffectivePermissions` SQL at internal/repository/postgres/auth.go:470 never read them. The query only JOINed against rp.scope_type/rp.scope_id (role-permission scope) and ignored ar.scope_type/ar.scope_id (actor-role scope). Operator-visible failure: granting Alice r-operator scoped to profile=p-prod silently elevated her to r-operator GLOBALLY at authorization time. The Authorizer's matcher correctly handled whatever EffectivePermissions returned, but EffectivePermissions returned the rp.scope (typically global), not the ar.scope narrowing. This is the canonical CRIT-5 lying-field shape — a security control claimed, persisted across 4 layers, with unit tests at each isolated layer, but the load-bearing wire severed mid-flight. CLAUDE.md's 'Always take the complete path' rule was violated by the original HIGH-10 closure. Additionally, `scanActorRoles` failed to read the new columns even when present, so every GET-side path (ListByActor / ListByRole) returned ActorRole with zero-value scope fields — the GUI / MCP couldn't show operators what they had configured. HOW. internal/repository/postgres/auth.go: - EffectivePermissions SQL extended to intersect ar.scope with rp.scope via a CASE-in-subquery. The effective scope is the NARROWER of the two; disjoint tuples and scope-type mismatches drop the row entirely. WHERE filter on effective_scope_type IS NOT NULL excludes dropped rows. 
Match matrix (encoded by the CASE):
ar.scope    rp.scope    effective_scope
─────────   ─────────   ──────────────────
global      global      global / NULL
global      profile=X   profile=X (rp narrows)
profile=X   global      profile=X (ar narrows)
profile=X   profile=X   profile=X (both agree)
profile=X   profile=Y   ROW DROPPED (disjoint)
profile=X   issuer=*    ROW DROPPED (type mismatch)
- ListByActor + ListByRole SELECTs extended with scope_type + scope_id columns so the read-side surfaces what was persisted.
- scanActorRoles reads the new columns into ActorRole.ScopeType + ScopeID via the existing sql.NullString + ScopeType cast pattern (mirrors RolePermission scan).
internal/repository/postgres/auth_scope_test.go (NEW): Testcontainer-backed regression matrix. 8 cases:
1. ActorRoleGlobal_RolePermGlobal: trivial happy path.
2. ActorRoleGlobal_RolePermProfile: rp narrows.
3. ActorRoleProfile_RolePermGlobal_A1Closure: load-bearing post-fix case, profile-scoped grant narrows to profile.
4. BothScopedSameTuple_Matches: exact-match collapse.
5. BothScopedDifferentIDs_RowDropped: disjoint scopes produce no effective permission.
6. ScopeTypeMismatch_RowDropped: profile vs issuer mismatch.
7. ExpiredGrant_Excluded: pre-fix behavior preserved.
8. ListByActor_ReturnsScopeColumns: read-side surface check.
Tests skip in -short mode (testcontainers-backed; require Docker on operator workstation).
internal/service/auth/service_test.go: TestAuthorizer_ActorRoleProfileScope_OnlyNarrowedScopeAuthorizes_A1, a unit-level pin (sandbox-runnable, no Docker). Simulates the post-A-1 SQL emission (narrowed effective row at profile=p-prod) and asserts CheckPermission authorizes only matching profile, rejects other profiles AND rejects global. Existing matcher code is unchanged; this proves the integration point.
CHANGELOG.md: Operator advisory in the new 'Security (BREAKING — silent-elevation closure)' section.
Pre-existing scope-bound grants take effect on upgrade; operators audit `actor_roles WHERE scope_type != 'global'` to confirm intent. cowork/auth-bundles-audit-2026-05-10.md: HIGH-10 row gets an A-1 follow-on CLOSED 2026-05-11 annotation describing the regression + closure. VERIFY. - gofmt -l <changed files> (no diff) - go vet ./internal/repository/postgres/... ./internal/service/auth/... ./internal/api/handler/... ./internal/auth/... ./cmd/server/... PASS - go test -short -count=1 ./internal/service/auth/... ./internal/repository/postgres/... ./internal/api/handler/... PASS - The testcontainer-backed regression matrix runs on operator workstation via 'go test -count=1 ./internal/repository/postgres/...' (skip in -short). Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-10 (A-1 follow-on) cowork/auth-bundles-fixes-2026-05-11/01-crit-actor-role-scope-reads.md CLAUDE.md 'Always take the complete path' rule	2026-05-11 02:02:39 +00:00
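The match matrix in commit a123263498 is implemented in SQL as a CASE expression; the same intersection rule can be sketched in Go to make the six rows easier to check by hand. The `Scope` type and `effectiveScope` function are hypothetical names introduced here, and the sketch models only the narrowing rule, not expiry or tenant filtering.

```go
package main

import "fmt"

// Scope is a hypothetical (type, id) pair; ID is empty for the global scope.
type Scope struct {
	Type string // "global", "profile" or "issuer"
	ID   string
}

// effectiveScope returns the narrower of the actor-role scope (ar) and the
// role-permission scope (rp). ok is false when the row should be dropped:
// disjoint IDs or mismatched scope types.
func effectiveScope(ar, rp Scope) (eff Scope, ok bool) {
	switch {
	case ar.Type == "global":
		return rp, true // rp narrows (or both are global)
	case rp.Type == "global":
		return ar, true // ar narrows
	case ar == rp:
		return ar, true // both agree
	default:
		return Scope{}, false // disjoint or type mismatch
	}
}

func main() {
	global := Scope{Type: "global"}
	prod := Scope{Type: "profile", ID: "p-prod"}
	dev := Scope{Type: "profile", ID: "p-dev"}
	issuer := Scope{Type: "issuer", ID: "i-1"}

	cases := [][2]Scope{
		{global, global}, {global, prod}, {prod, global},
		{prod, prod}, {prod, dev}, {prod, issuer},
	}
	for _, c := range cases {
		eff, ok := effectiveScope(c[0], c[1])
		fmt.Printf("ar=%v rp=%v -> eff=%v kept=%v\n", c[0], c[1], eff, ok)
	}
}
```

The third case (`prod`, `global`) is the one the A-1 closure fixes: before the change, the effective scope for that pair came back as global.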
shankar0123	f5ba17114d	fix(audit): close silence-leg of HIGH-6; emit WARN on audit-write failure Audit 2026-05-10 HIGH-6 partial closure (silence leg). The audit identified two distinct gaps in the auth surface's audit-emit pattern: (1) silence: `_ = audit.RecordEventWithCategory(...)` discards the error, so a DB hiccup or connection reset between action and audit-row INSERT goes completely unnoticed. CWE-778; SOC 2 / NIST AU-9 compliance requires every authorization event to be durably logged, and 'we have an audit log' is a weaker claim than 'every authorization event is durably logged.' (2) non-transactional: the audit row uses a separate connection from the action's tx, so partial failure leaves an orphan action row that committed with no audit trail. Decision 8 of the auth-bundles-index requires action + audit row atomic. This commit closes leg (1) fully across all seven audit-emit call sites in the auth surface: - internal/service/auth/actor_role_service.go::recordAudit - internal/service/auth/role_service.go::recordAudit - internal/auth/bootstrap/service.go::ValidateAndMint - internal/auth/breakglass/service.go::recordAudit - internal/auth/session/service.go::recordAudit - internal/api/handler/auth_session_oidc.go::recordAudit - internal/service/profile.go::Update (Phase 9 approval-bypass) Each `_ = ...` swallow is replaced with: if err := audit.RecordEventWithCategory(...); err != nil { slog.WarnContext(ctx, "<surface> audit write failed (action committed; audit row may be missing)", "action", action, "actor_id", actor, "resource_id", resource, "err", err) } Operators monitoring audit-write failures now see structured WARN logs with action + actor + resource attribution; missing audit rows can be cross-referenced against monitoring without manual SELECT-from-audit-table. 
Infrastructure for leg (2) (transactional commit) is also landed in this commit: - service.AuditService.RecordEventWithCategoryWithTx (new method; accepts repository.Querier from postgres.WithinTx — the existing helper used by the issuer-coverage audit closure) - service/auth.AuditService interface declares the new method - test stub fakeAudit.RecordEventWithCategoryWithTx satisfies the extended interface The eight per-path WithinTx-refactors documented in cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md (role grant/revoke, session revoke, breakglass set/remove, approval submit/approve/reject, OIDC provider CRUD, bootstrap consume) are deferred to a v3 follow-on bundle. Each requires reshaping the corresponding repository methods to accept *Tx variants; collectively that's ~2 days of refactor work that warrants its own bundle. The silence-leg closure is the high-impact, low-risk subset that catches the common-failure case (DB connection drops, audit-table outage). Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-6 Spec: cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md	2026-05-10 21:24:29 +00:00
shankar0123	5d79e53ad0	auth-bundle-1 follow-on: close coverage gaps to clear Phase 12 floors CI run #486 (post-Bundle-1 merge + Go 1.25.10 bump) failed three coverage-threshold gates:
internal/api/handler    74.7% < floor 75 (-0.3pp)
internal/auth           66.3% < floor 85 (-18.7pp)
internal/service/auth   51.1% < floor 85 (-33.9pp)
The Phase 12 gate file's "85% with negative-test coverage" claim turned out to be aspirational: the read-side and Update-path methods on RoleService / PermissionService / ActorRoleService had zero unit-test coverage, and internal/auth's keystore + HasPermission helper had zero tests. This commit closes the gap without lowering the gate. Per-package CI-style averages after this commit (per scripts/check-coverage-thresholds.sh's per-function-mean):
internal/api/handler    76.1% (+1.4pp, margin +1.1pp)
internal/auth           90.5% (+24.2pp, margin +5.5pp)
internal/service/auth   93.7% (+42.6pp, margin +8.7pp)
Tests added: internal/service/auth/service_test.go (+18 tests, +518 LOC): PermissionService.List, PermissionService.GetByName, RoleService.Get (4 paths), RoleService.List (system caller), RoleService.Update (4 paths), RoleService.ListPermissions (3 paths), RoleService.AddPermission/RemovePermission round-trip + gate paths, RoleService.Delete (success + nil-caller + no-perm + audit), RoleService.Create (nil-caller), ActorRoleService.ListForActor (self-bypass + cross-actor + nil-caller + system + with-perm), ActorRoleService.EffectivePermissions (same shape), ActorRoleService.ListKeys (3 paths + system bypass), ActorRoleService.Revoke (4 paths), Authorizer edge cases (empty actorID short-circuit, empty tenantID default, scoped-grant-without-scope-id no-match invariant, repo-error wrap-and-return, HoldsAnyOf early-exit), recordAudit nil-arm short-circuits. 
internal/auth/keystore_test.go (NEW, +175 LOC): StaticKeyStore.Len, StaticKeyStore.LookupByHash hit + miss, MutableKeyStore seeded lookup + Len, Add registers new key, AddHashed registers from precomputed hash, AddHashed replaces on duplicate hash (idempotent boot-loader contract), HasPermission no-actor / default-actor-type / checker-error / scoped-check threading. internal/auth/bootstrap/service_test.go (+36 LOC): Service.Available nil-receiver/nil-strategy short-circuit, Service.Available delegates to Strategy when configured. internal/api/handler/auth_test.go (+208 LOC): GetRole returns role + permissions, GetRole 404 + 401, UpdateRole 200 + invalid-JSON-400 + 401, ListKeys returns actor list + 401, RemoveRolePermission 204 (global + scoped) + 401, rolePermToResponse scope encoding pin via GetRole. Verified: gofmt -l . clean (touched files only). go vet ./internal/auth/... ./internal/service/auth/... ./internal/api/handler/ rc=0. go test -count=1 -short on the four packages green. CI-style per-function averages computed via the live scripts/check-coverage-thresholds.sh arithmetic — all three gated packages clear their floors with margin. Per CLAUDE.md "complete path" + "do not lower the gate to make CI green": gate file unchanged. The 85/85/75 floors stand.	2026-05-10 02:04:36 +00:00
shankar0123	cbb47aaf5d	auth-bundle-1 Phase 11 + 12: RBAC MCP tools + negative-test coverage gate # Phase 11: RBAC MCP tools 12 new tools in internal/mcp/tools_auth.go mirroring the Phase-4 + Phase-7 HTTP surface so operators driving certctl from Claude / VS Code / any MCP client get the same management capability the GUI + CLI already expose:
certctl_auth_me                            GET     /v1/auth/me
certctl_auth_list_roles                    GET     /v1/auth/roles
certctl_auth_get_role                      GET     /v1/auth/roles/{id}
certctl_auth_create_role                   POST    /v1/auth/roles
certctl_auth_update_role                   PUT     /v1/auth/roles/{id}
certctl_auth_delete_role                   DELETE  /v1/auth/roles/{id}
certctl_auth_list_permissions              GET     /v1/auth/permissions
certctl_auth_add_permission_to_role        POST    /v1/auth/roles/{id}/permissions
certctl_auth_remove_permission_from_role   DELETE  /v1/auth/roles/{id}/permissions/{perm}
certctl_auth_list_keys                     GET     /v1/auth/keys
certctl_auth_assign_role_to_key            POST    /v1/auth/keys/{id}/roles
certctl_auth_revoke_role_from_key          DELETE  /v1/auth/keys/{id}/roles/{role_id}
Each tool routes through the existing HTTP client (no parallel business logic), so permission gates fire server-side: a non-admin caller's MCP tool invocation returns whatever 403 the underlying HTTP handler emits, fenced via errorResult for LLM-prompt-injection defense. Input types in internal/mcp/types.go (AuthRoleIDInput, AuthCreateRoleInput, AuthUpdateRoleInput, AuthRolePermissionGrantInput, AuthRolePermissionRevokeInput, AuthAssignKeyRoleInput, AuthRevokeKeyRoleInput) carry jsonschema descriptions so the MCP consumer's tool catalogue shows operator-friendly hints. 
internal/mcp/tools_auth_test.go ships 14 tests: - TestAuthMCP_AllToolsRegister (registration must not panic) - TestAuthMCP_PathsAndMethods (table-driven, 12 rows pinning each tool's HTTP method + URL) - TestAuthMCP_ForbiddenSurfacesFencedError (12 tools × 403 mock → error surface) internal/mcp/tools_per_tool_test.go's allHappyPathCases extended with the 12 new rows so the in-memory dispatch coverage gate (TestMCP_RegisterTools_DispatchableToolCount) stays green at the new total of 139 registered tools. Re-derived total via 'grep -cE "gomcp\.AddTool\(" internal/mcp/tools.go': 133 (121 in tools.go + 12 in tools_auth.go). # Phase 12 — negative-test coverage gate Audit of the prompt's 12 negative-test paths against existing coverage: 1. Missing actor → 401 ✓ TestRequirePermission_NoActorReturns401, TestRBACGate_NoActorReturns401 2. No roles → 403 ✓ TestRequirePermission_DeniedActorReturns403, TestRBACGate_AuditorRole_403sOnAdminRoutes 3. Role lacks specific perm → 403 ✓ same suite 4. Wrong scope → 403 ✓ TestAuthorizer_SpecificScopeMatchesExactID (wrongID arm) 5. Self-grant w/o auth.role.assign → 403 ✓ TestActorRoleService_GrantRequiresAuthRoleAssign 6. Bootstrap token wrong → 401 ✓ TestEnvTokenStrategy_WrongTokenReturnsInvalidToken, TestBootstrapHandler_Mint_WrongToken_401 7. Bootstrap used twice → 410 ✓ TestEnvTokenStrategy_OneShotConsumption, TestBootstrapHandler_Mint_TwiceReturns410 8. Bootstrap when admin exists → 410 ✓ TestEnvTokenStrategy_AdminExistsClosesPath, TestBootstrapHandler_Mint_AdminExists410 9. Role delete with assignees → 409 NEW: TestRoleService_DeleteWithActorsAssignedReturns409 10. Profile-edit loophole → gated ✓ TestProfileEdit_RequiresApprovalLoopholeClosed 11. Permission not in catalog → 400 ✓ TestRoleService_AddPermissionRejectsNonCanonical 12. 
Scope ID for nonexistent resource → 404 (validation deferred — no FK constraint between role_permissions.scope_id and the resource tables; documented for a future bundle) Filled the gap at #9 with TestRoleService_DeleteWithActorsAssignedReturns409 which pins the repository sentinel pass-through (postgres FK ON DELETE RESTRICT → repository.ErrAuthRoleInUse → service returns the sentinel verbatim → handler maps to HTTP 409). # Coverage gates .github/coverage-thresholds.yml gains 2 entries: - internal/auth: floor 85 - internal/service/auth: floor 85 .github/workflows/ci.yml's coverage test command extended with ./internal/auth/... and ./internal/api/router/... so the threshold check has data to evaluate. # Protocol-endpoint not-gated test (Category F) internal/api/router/phase12_protocol_allowlist_test.go (new) adds 3 router-level invariant tests: - TestPhase12_ProtocolEndpointsNotGated: AST-walks router.go, asserts no rbacGate(...) call references a path under any protocol-endpoint prefix (/acme, /scep, /.well-known/est, /.well-known/pki/ocsp, /.well-known/pki/crl). - TestPhase12_IsProtocolEndpoint_CoversCanonicalPrefixes: pins auth.IsProtocolEndpoint against the canonical prefix set; if a future protocol lands without lockstep allowlist update, this fails. - TestPhase12_RBACGateRoutesAreUnderAPIv1: belt-and-braces — every rbacGate-wrapped route MUST start with /api/v1/. Catches accidental cross-prefix wraps. Complements the existing TestRequirePermission_ProtocolEndpointBypassesGate (middleware-level) + TestRouter_AuthExemptAllowlist_PinsActualRegistrations (allowlist drift) so the Category F invariant is pinned at all three layers (middleware + router + dispatch). # Verifications gofmt clean repo-wide. * go vet ./... clean. * staticcheck across internal/auth + handler + router + cli + service + repository + cmd + domain + mcp: clean. * go test -short -count=1 green across internal/auth (incl. 
bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service (incl. auth), internal/domain/auth, internal/mcp, cmd/server, cmd/cli.	2026-05-09 23:46:01 +00:00
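The Category F invariant in commit cbb47aaf5d (protocol endpoints are never RBAC-gated, and every gated route lives under /api/v1/) can be sketched as two small predicates. The prefix list is the one given in the commit message. The function bodies are assumptions, including the segment-aware match that keeps `/acme` from matching `/acmefoo`; the real `auth.IsProtocolEndpoint` may compare prefixes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// protocolPrefixes is the canonical prefix set listed in the commit message.
var protocolPrefixes = []string{
	"/acme",
	"/scep",
	"/.well-known/est",
	"/.well-known/pki/ocsp",
	"/.well-known/pki/crl",
}

// isProtocolEndpoint is a hypothetical re-implementation: a path counts as
// a protocol endpoint when it equals a prefix or sits below it.
func isProtocolEndpoint(path string) bool {
	for _, p := range protocolPrefixes {
		if path == p || strings.HasPrefix(path, p+"/") {
			return true
		}
	}
	return false
}

// mayBeGated encodes both router-level invariants: gated routes must start
// with /api/v1/ and must not be protocol endpoints.
func mayBeGated(path string) bool {
	return strings.HasPrefix(path, "/api/v1/") && !isProtocolEndpoint(path)
}

func main() {
	for _, p := range []string{
		"/acme/directory",
		"/.well-known/pki/ocsp",
		"/api/v1/auth/roles",
		"/healthz",
	} {
		fmt.Printf("%-24s protocol=%-5v gateable=%v\n", p, isProtocolEndpoint(p), mayBeGated(p))
	}
}
```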
shankar0123	3ef45e2ad4	auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split # Phase 6: day-0 admin bootstrap * internal/auth/bootstrap/ (new package): Strategy interface + EnvTokenStrategy with constant-time compare, one-shot consumption via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC-first-admin will plug in alongside as an alternate Strategy. * BootstrapService.ValidateAndMint: validates the operator's CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API key value, persists the SHA-256 hash to api_keys, grants r-admin via actor_roles, AddHashed's the runtime keystore so the just-minted key authenticates the next request without restart, and records bootstrap.consume to the audit trail with category=auth. * internal/auth/keystore.go (new): KeyStore interface + StaticKeyStore (immutable env-var-only path) + MutableKeyStore (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth middleware now consumes a KeyStore so the bootstrap path can extend the lookup table at runtime. * migrations/000031_api_keys.up/down.sql: api_keys table with (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by, created_at, expires_at, last_used_at). Idempotent. * /v1/auth/bootstrap GET (probe) + POST (mint), auth-exempt. Both routes documented in api/openapi.yaml + AuthExemptRouterRoutes allowlist updated. The token never leaves internal/auth/bootstrap; the minted plaintext key flows only into the HTTP response body. * Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND admin actors already exist (config drift signal). 
* Tests: 4 strategy invariants (empty token born disabled, wrong token=ErrInvalidToken without consumption, one-shot consumption, admin-exists closes path), 5 service tests (happy path + actor-name validation + propagation of strategy errors + nil-deps guard + 32-byte entropy budget), 8 HTTP-handler tests (status 201/410/401/400 mapping + token-leak hygiene scan of slog + audit details + Location header). Token-leak test redirects slog.Default to a buffer for the test scope. # Phase 7: API-key migration + scope-down CLI * GET /v1/auth/keys handler + service method ListKeys backed by ActorRoleRepository.ListDistinctActors. Returns one row per (actor_id, actor_type) pair with the slice of role IDs they hold. Permission: auth.role.list. * internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown (interactive), AuthScopeDownNonInteractive (JSON config), AuthScopeDownSuggest (--suggest with optional --apply). The synthetic actor-demo-anon is filtered out of every interactive / bulk path; non-interactive flow logs and skips it explicitly. * SuggestRoleFromAuditEvents (pure function): walks 30 days of audit events per actor and returns the narrowest matching role (admin / mcp / viewer / agent / operator) plus a one-line reason. Classification: any admin-shaped action wins; otherwise all-MCP → mcp; all-read-only → viewer; all-agent-shaped → agent; otherwise operator. Test table pins all six classifications. * CLI subcommand tree extended: 'auth keys list' + 'auth keys scope-down [--non-interactive <cfg>] [--suggest [--apply]]'. * CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS call-out + four flow examples. # Phase 8: auditor role + event_category column * migrations/000032_audit_category.up/down.sql: ALTER TABLE audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT 'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config) + (event_category) and (event_category, timestamp DESC) indexes for the auditor-filter query path. 
WORM trigger from migration 000018 continues to enforce append-only at the DB layer (DDL is not blocked). * domain.AuditEvent gains EventCategory string (omitempty); domain.EventCategoryCertLifecycle / Auth / Config constants. * AuditService.RecordEventWithCategory sibling of RecordEvent; legacy callers stay on RecordEvent (defaults to cert_lifecycle). Auth callers (RoleService, ActorRoleService, BootstrapService) switched to RecordEventWithCategory(..., 'auth', ...). * GET /v1/audit?category=<cat>: handler accepts the optional query param, validates against the enum (400 on invalid value), dispatches through ListAuditEventsByCategory. OpenAPI updated with the new query param + AuditEvent.event_category schema. * Postgres AuditRepository.Create now writes event_category; AuditRepository.List filters on it; AuditFilter.EventCategory gates the WHERE clause. * Tests: 5 audit-category-filter HTTP tests (dispatch routing, back-compat fallback, 400 for invalid values, all 3 enum values accepted, page+category combine, JSON output surfaces the field). 3 auditor-role invariants (auditor holds exactly audit.read+audit.export, no mutating perms, disjoint from viewer except audit.read). # Cross-phase wiring * HandlerRegistry.Bootstrap field added; cmd/server/main.go wires the bootstrap service ahead of RegisterHandlers (extracted assembleNamedAPIKeys helper into auth_backfill.go, moved the keystore + bootstrap construction up alongside the auth repos). * AuthCheckResolver / AuthActorRoleService extended with ListKeys to satisfy the Phase 7 surface; existing fakes updated. * fakeAudit + mockAuditService stubs in tests gain RecordEventWithCategory + ListAuditEventsByCategory; existing tests untouched. # Verifications * gofmt -l: clean across every modified file. * go vet ./...: clean. * staticcheck across internal/auth + handler + router + cli + service + repository + cmd + domain: clean. 
* go test -short -count=1: green across every Bundle-1-touched package — internal/auth (incl. bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service/auth, internal/service, internal/domain/auth, internal/repository/postgres, cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware, cmd/agent, internal/mcp.	2026-05-09 20:15:43 +00:00
shankar0123	bd54d5f7fa	auth-bundle-1 Phase 2: RBAC service layer + Authorizer primitive Bundle 1 / Phase 2: ships PermissionService, RoleService, ActorRoleService, and the Authorizer primitive that Phase 3 RequirePermission middleware calls on every gated request. Authorizer.CheckPermission semantics: a grant matches when (a) the permission name equals the requested permission AND (b) the grant is global-scoped OR the grant scope_type+scope_id exactly match the request. Global beats specific; per-resource grants widen the effective set rather than shadowing global. Hot-path query is one ActorRoleRepository.EffectivePermissions JOIN call (already shipped in Phase 1) plus an in-memory walk; Phase 12 will add benchmarks + caching if the JOIN cost shows up at scale. Privilege-escalation guard: ActorRoleService.Grant and Revoke require the caller to hold auth.role.assign globally. Without it, ErrSelfRoleAssignment. System callers (AsSystemCaller()) bypass the check; bootstrap, migrations, scheduler-initiated grants use this path. Reserved actor actor-demo-anon is rejected on Grant + Revoke so the demo path stays alive even after a misclick (ErrAuthReservedActor). Caller abstraction: every service entry point takes *Caller (ActorID, ActorType, TenantID, IsSystem). CallerFromContext is a stub returning ErrUnauthenticated; Phase 3 wires the middleware-context bridge that fills the Caller from request context. The contract is pinned by TestCallerFromContext_Phase2ReturnsUnauthenticated so the Phase 3 upgrade is observable. Audit recording: every mutating service operation calls AuditService.RecordEvent. Bundle 1 Phase 8 adds the event_category column + parameter and back-fills 'auth' for these calls; until then the rows go in with the default category. 
Test coverage: in-memory fakeRoleRepo / fakePermissionRepo / fakeActorRoleRepo / fakeAudit pin the privilege-escalation invariants (ErrUnauthenticated for nil caller, ErrForbidden for missing perm, ErrInvalidPermission for non-canonical permission name, ErrSelfRoleAssignment for Grant without auth.role.assign, ErrAuthReservedActor for actor-demo-anon mutations, system-caller bypass) without requiring testcontainers. Phase 12 will add live-Postgres integration coverage. Branch: dev/auth-bundle-1. Phase 1 was `19497ee` (RBAC schema + repo). Phase 3 (middleware integration) is the next commit on this branch.	2026-05-09 16:20:04 +00:00
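The `Authorizer.CheckPermission` matching rule described in commit bd54d5f7fa (permission name must match, and the grant is either global or an exact scope match) can be sketched as an in-memory walk. The `Grant` type and `checkPermission` function are hypothetical; the real code operates on whatever `EffectivePermissions` returns. The empty-scope-ID guard reflects the "scoped-grant-without-scope-id no-match invariant" mentioned in a later commit on this page.

```go
package main

import "fmt"

// Grant is a hypothetical effective-permission row.
type Grant struct {
	Permission string
	ScopeType  string // "global", "profile" or "issuer"
	ScopeID    string // empty for global grants
}

// checkPermission reports whether any grant authorizes perm at the
// requested scope. A global grant matches every request; a scoped grant
// matches only an identical (scope type, scope id) pair.
func checkPermission(grants []Grant, perm, scopeType, scopeID string) bool {
	for _, g := range grants {
		if g.Permission != perm {
			continue
		}
		if g.ScopeType == "global" {
			return true
		}
		// A scoped grant with an empty ID never matches anything.
		if g.ScopeID != "" && g.ScopeType == scopeType && g.ScopeID == scopeID {
			return true
		}
	}
	return false
}

func main() {
	grants := []Grant{{Permission: "audit.read", ScopeType: "profile", ScopeID: "p-prod"}}

	fmt.Println(checkPermission(grants, "audit.read", "profile", "p-prod")) // true
	fmt.Println(checkPermission(grants, "audit.read", "profile", "p-dev"))  // false
	fmt.Println(checkPermission(grants, "audit.read", "global", ""))        // false
	fmt.Println(checkPermission(grants, "audit.export", "profile", "p-prod")) // false
}
```

Because any matching grant returns true, per-resource grants add to the effective set and never shadow a global grant, which matches the "global beats specific" wording above.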

8 Commits