Commit Graph

8 Commits

Author SHA1 Message Date
shankar0123 21aeed4f4e legal: addlicense headers + normalize legacy variants (Phase 0 RED-4)
Phase 0 closure (Path B2, post-rewrite):

addlicense sweep — adds the canonical certctl LLC copyright + BUSL-1.1
SPDX header to every production Go file. Template:

  // Copyright 2026 certctl LLC. All rights reserved.
  // SPDX-License-Identifier: BUSL-1.1

Coverage: 338 / 338 production Go files (cmd/ + internal/, excluding
*_test.go and **/testdata/**). Pre-sweep coverage was 22 / 338 (6.5%);
post-sweep is 338 / 338 (100%).

Normalized 22 pre-existing legacy headers (`// Copyright (c) certctl`
+ `// SPDX-License-Identifier: BSL-1.1`) and 1 file using a
`Certctl Contributors` attribution. The legacy SPDX ID `BSL-1.1`
is non-standard; the official SPDX identifier for Business Source
License 1.1 is `BUSL-1.1` (capital U). All 338 files now share the
canonical form.

Generated via:
  addlicense -c "certctl LLC" -y 2026 \
    -f cowork/legal/copyright-header.tpl \
    -ignore '**/testdata/**' -ignore '**/*_test.go' \
    cmd/ internal/

Verification:
  find cmd internal -name '*.go' -not -name '*_test.go' \
    -not -path '*/testdata/*' \
    -exec grep -L '^// Copyright 2026 certctl LLC' {} \; | wc -l

  Returns: 0

gofmt clean. Header additions are comments only, no compile impact.

Closes: cowork/certctl-architecture-diligence-audit.html#fix-RED-4
2026-05-13 21:23:35 +00:00
shankar0123 0152bdf567 fix(auth/rbac): scope-aware ActorRole revoke (A-4)
HIGH-10's UNIQUE (actor, role, scope_type, scope_id, tenant) uniqueness
extension lets an operator grant the same role to the same actor at
multiple scopes (e.g. r-operator on profile=p-acme AND profile=p-globex).
But ActorRoleRepository.Revoke's WHERE clause omitted (scope_type,
scope_id) — a single call deleted every variant. Selective revoke was
unrepresentable; operators had to drop all and re-grant N-1, opening
a race window where the actor's access was briefly different.

Closure across all layers (handler → service → repo → MCP → GUI client),
preserving the legacy "revoke all variants" contract for unmodified
callers:

  internal/repository/auth.go
    - New ActorRoleRevokeOptions struct. Zero value = legacy semantic;
      non-empty ScopeType narrows to one variant.
    - New ErrActorRoleNotFound sentinel for scoped no-match (HTTP 404).

  internal/repository/postgres/auth.go
    - Revoke signature extended with opts. Empty opts.ScopeType uses
      the legacy SQL (no scope WHERE), zero-row delete = no error.
    - Non-empty narrows with `scope_type = $5 AND scope_id IS NOT
      DISTINCT FROM $6` — the IS-NOT-DISTINCT-FROM is load-bearing,
      vanilla `=` would silently miss the (global, NULL) case because
      NULL ≠ NULL in standard SQL.
    - Selective revoke with zero matching rows returns
      ErrActorRoleNotFound; operators get feedback on typos.

  internal/service/auth/actor_role_service.go
    - Revoke takes opts. Audit row's details map records the scope so
      SIEMs can distinguish wide-vs-selective revokes:
      `scope: "all_variants"` for the legacy path, or
      `scope_type` + `scope_id` for selective. Privilege check
      (auth.role.assign) and reserved-actor guard unchanged.

  internal/api/handler/auth.go
    - RevokeRoleFromKey parses optional `?scope_type=` / `?scope_id=`
      query params via new parseRevokeScope helper.
    - Validation mirrors AssignRoleToKey: scope_id forbidden with
      scope_type=global, required with profile/issuer, invalid
      scope_type → 400. scope_id without scope_type also → 400.
    - writeAuthError maps ErrActorRoleNotFound to 404.

  internal/mcp/tools_auth.go + types.go
    - AuthRevokeKeyRoleInput gains optional ScopeType + ScopeID with
      jsonschema descriptions explaining the dual-mode contract.
    - Tool call site appends URL-encoded query params when ScopeType
      is set; legacy callers (no scope_type) emit the bare DELETE
      path unchanged.

  web/src/api/client.ts
    - authRevokeKeyRole signature: optional 3rd argument
      `{ scope_type?, scope_id? }`. Pre-A-4 call sites (no opts arg)
      keep firing the bare DELETE — fully backward compatible. The
      GUI KeysPage's per-row revoke button (still one row per role,
      pre-Fix-12) continues to use the legacy shape; future GUI work
      can pass scope params for per-variant rows.

  docs/operator/rbac.md
    - New "Revoke: legacy 'all variants' vs scope-selective" subsection
      under "From the HTTP API" with curl examples for both modes plus
      the audit-row payload shape that lets SOC/SIEM tell them apart.

Regression coverage:

  Repository (testcontainers, skipped under -short — 6 tests in
  internal/repository/postgres/auth_revoke_scope_test.go):
    TestRevokeActorRole_NoOpts_RemovesAllVariants
    TestRevokeActorRole_WithScope_RemovesOnlyMatching
    TestRevokeActorRole_WithGlobalScope_RemovesOnlyGlobal — pins the
      IS-NOT-DISTINCT-FROM branch (global, NULL)
    TestRevokeActorRole_NoMatch_ReturnsNotFound — pins the new sentinel
    TestRevokeActorRole_NoOpts_NoMatch_IsNoOp — pins the legacy
      idempotence contract
    TestRevokeActorRole_IssuerScope_RemovesOnlyMatching — pin the
      issuer-scope half (profile + issuer are symmetric scope types)

  Handler (7 new tests in auth_test.go):
    TestAuthHandler_RevokeRoleFromKey — extended to assert no scope
      filter is forwarded when query string is empty (legacy behaviour)
    TestAuthHandler_RevokeRoleFromKey_A4_ScopedProfile
    TestAuthHandler_RevokeRoleFromKey_A4_ScopedGlobal
    TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithGlobal
    TestAuthHandler_RevokeRoleFromKey_A4_RejectsMissingScopeID
    TestAuthHandler_RevokeRoleFromKey_A4_RejectsScopeIDWithoutScopeType
    TestAuthHandler_RevokeRoleFromKey_A4_RejectsInvalidScopeType
    TestAuthHandler_RevokeRoleFromKey_A4_ScopedNotFoundReturns404

  MCP (2 new table rows in tools_per_tool_test.go):
    Scoped revoke with scope_type=profile + scope_id=p-acme →
      `?scope_type=profile&scope_id=p-acme`
    Scoped revoke with scope_type=global (no scope_id) →
      `?scope_type=global`

Service-layer test plumbing (service_test.go) updated for new opts
arg: 4 existing call sites pass repository.ActorRoleRevokeOptions{}
to keep their pre-A-4 semantics; the fakeActorRoleRepo.Revoke
implementation now mirrors the postgres scope-aware behaviour
(legacy zero-value vs scoped narrowing + ErrActorRoleNotFound on
no-match).

Verify gate green: gofmt clean, go vet clean, go test -short across
repository/postgres, service/auth, api/handler, and mcp. The
pre-existing KeysPage.test.tsx failure observed on the baseline
commit (reproduced via `git stash` earlier in Fix 03) is unrelated;
my client.ts change adds an optional third argument and is fully
backward-compatible.

Spec at cowork/auth-bundles-fixes-2026-05-11/04-high-actor-role-revoke-scope.md.
Audit doc updated: new row A-4 (2026-05-11) CLOSED appended to the
status table at the bottom of cowork/auth-bundles-audit-2026-05-10.md.
Operator-visible advisory in CHANGELOG.md v2.1.0 release notes under
Security (non-BREAKING — legacy callers are unchanged).

Depends on Fix 01 (the scope-aware EffectivePermissions read path on
branch fix/audit-2026-05-11/crit-actor-role-scope-reads). This fix
makes the inverse op selectively reversible; without Fix 01 the read
side would mis-evaluate scoped grants anyway, making selective revoke
moot at runtime.
2026-05-11 10:50:34 +00:00
shankar0123 a123263498 fix(auth/rbac): close HIGH-10 lying field — EffectivePermissions reads actor-role scope (A-1)
Audit 2026-05-11 A-1 closure. Spec at
cowork/auth-bundles-fixes-2026-05-11/01-crit-actor-role-scope-reads.md.

WHAT.

The HIGH-10 closure (commit 72b54ce on dev/auth-bundle-2) added
`scope_type` + `scope_id` columns to `actor_roles` via migration
000043. The handler accepted them on POST /api/v1/auth/keys/{id}/roles.
The repo Grant INSERTed them. The uniqueness tuple was extended to
include them. The GUI exposed them as form inputs.

But the load-bearing `EffectivePermissions` SQL at
internal/repository/postgres/auth.go:470 never read them. The query
only JOINed against rp.scope_type/rp.scope_id (role-permission
scope) and ignored ar.scope_type/ar.scope_id (actor-role scope).

Operator-visible failure: granting Alice r-operator scoped to
profile=p-prod silently elevated her to r-operator GLOBALLY at
authorization time. The Authorizer's matcher correctly handled
whatever EffectivePermissions returned, but EffectivePermissions
returned the rp.scope (typically global), not the ar.scope
narrowing.

This is the canonical CRIT-5 lying-field shape — a security
control claimed, persisted across 4 layers, with unit tests at
each isolated layer, but the load-bearing wire severed mid-flight.
CLAUDE.md's 'Always take the complete path' rule was violated by
the original HIGH-10 closure.

Additionally, `scanActorRoles` failed to read the new columns
even when present, so every GET-side path (ListByActor /
ListByRole) returned ActorRole with zero-value scope fields — the
GUI / MCP couldn't show operators what they had configured.

HOW.

internal/repository/postgres/auth.go:
  - EffectivePermissions SQL extended to intersect ar.scope with
    rp.scope via a CASE-in-subquery. The effective scope is the
    NARROWER of the two; disjoint tuples and scope-type mismatches
    drop the row entirely. WHERE filter on effective_scope_type
    IS NOT NULL excludes dropped rows.

    Match matrix (encoded by the CASE):
      ar.scope    rp.scope    effective_scope
      ─────────   ─────────   ──────────────────
      global      global      global / NULL
      global      profile=X   profile=X (rp narrows)
      profile=X   global      profile=X (ar narrows)
      profile=X   profile=X   profile=X (both agree)
      profile=X   profile=Y   ROW DROPPED (disjoint)
      profile=X   issuer=*    ROW DROPPED (type mismatch)

  - ListByActor + ListByRole SELECTs extended with scope_type +
    scope_id columns so the read-side surfaces what was persisted.
  - scanActorRoles reads the new columns into ActorRole.ScopeType
    + ScopeID via the existing sql.NullString + ScopeType cast
    pattern (mirrors RolePermission scan).

internal/repository/postgres/auth_scope_test.go (NEW):
  Testcontainer-backed regression matrix. 8 cases:
  1. ActorRoleGlobal_RolePermGlobal — trivial happy path.
  2. ActorRoleGlobal_RolePermProfile — rp narrows.
  3. ActorRoleProfile_RolePermGlobal_A1Closure — **load-bearing**
     post-fix case: profile-scoped grant narrows to profile.
  4. BothScopedSameTuple_Matches — exact-match collapse.
  5. BothScopedDifferentIDs_RowDropped — disjoint scopes produce
     no effective permission.
  6. ScopeTypeMismatch_RowDropped — profile vs issuer mismatch.
  7. ExpiredGrant_Excluded — pre-fix behavior preserved.
  8. ListByActor_ReturnsScopeColumns — read-side surface check.

  Tests skip in -short mode (testcontainers-backed; require Docker
  on operator workstation).

internal/service/auth/service_test.go:
  TestAuthorizer_ActorRoleProfileScope_OnlyNarrowedScopeAuthorizes_A1
  — unit-level pin (sandbox-runnable, no Docker). Simulates the
  post-A-1 SQL emission (narrowed effective row at
  profile=p-prod) and asserts CheckPermission authorizes only
  matching profile, rejects other profiles AND rejects global.
  Existing matcher code is unchanged; this proves the integration
  point.

CHANGELOG.md:
  Operator advisory in the new 'Security (BREAKING — silent-elevation
  closure)' section. Pre-existing scope-bound grants take effect on
  upgrade; operators audit `actor_roles WHERE scope_type != 'global'`
  to confirm intent.

cowork/auth-bundles-audit-2026-05-10.md:
  HIGH-10 row gets an A-1 follow-on CLOSED 2026-05-11 annotation
  describing the regression + closure.

VERIFY.

- gofmt -l <changed files>                                       (no diff)
- go vet ./internal/repository/postgres/... ./internal/service/auth/...
  ./internal/api/handler/... ./internal/auth/... ./cmd/server/...  PASS
- go test -short -count=1 ./internal/service/auth/...
  ./internal/repository/postgres/... ./internal/api/handler/...    PASS
- The testcontainer-backed regression matrix runs on operator
  workstation via 'go test -count=1 ./internal/repository/postgres/...'
  (skip in -short).

Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-10 (A-1 follow-on)
      cowork/auth-bundles-fixes-2026-05-11/01-crit-actor-role-scope-reads.md
      CLAUDE.md 'Always take the complete path' rule
2026-05-11 02:02:39 +00:00
shankar0123 f5ba17114d fix(audit): close silence-leg of HIGH-6; emit WARN on audit-write failure
Audit 2026-05-10 HIGH-6 partial closure (silence leg). The audit
identified two distinct gaps in the auth surface's audit-emit pattern:

  (1) silence — `_ = audit.RecordEventWithCategory(...)` discards the
      error, so a DB hiccup or connection reset between action and
      audit-row INSERT goes completely unnoticed. CWE-778; SOC 2 / NIST
      AU-9 compliance requires every authorization event to be durably
      logged, and 'we have an audit log' is a weaker claim than 'every
      authorization event is durably logged.'

  (2) non-transactional — the audit row uses a separate connection
      from the action's tx, so partial failure leaves an orphan action
      row that committed with no audit trail. Decision 8 of the
      auth-bundles-index requires action + audit row atomic.

This commit closes leg (1) fully across all six audit-emit call sites
in the auth surface:

  - internal/service/auth/actor_role_service.go::recordAudit
  - internal/service/auth/role_service.go::recordAudit
  - internal/auth/bootstrap/service.go::ValidateAndMint
  - internal/auth/breakglass/service.go::recordAudit
  - internal/auth/session/service.go::recordAudit
  - internal/api/handler/auth_session_oidc.go::recordAudit
  - internal/service/profile.go::Update (Phase 9 approval-bypass)

Each `_ = ...` swallow is replaced with:

  if err := audit.RecordEventWithCategory(...); err != nil {
      slog.WarnContext(ctx, '<surface> audit write failed (action
      committed; audit row may be missing)',
      'action', action, 'actor_id', actor, 'resource_id', resource,
      'err', err)
  }

Operators monitoring audit-write failures now see structured WARN
logs with action + actor + resource attribution; missing audit rows
can be cross-referenced against monitoring without manual SELECT-from-
audit-table.

Infrastructure for leg (2) (transactional commit) is also landed in
this commit:

  - service.AuditService.RecordEventWithCategoryWithTx (new method;
    accepts repository.Querier from postgres.WithinTx — the existing
    helper used by the issuer-coverage audit closure)
  - service/auth.AuditService interface declares the new method
  - test stub fakeAudit.RecordEventWithCategoryWithTx satisfies the
    extended interface

The eight per-path WithinTx-refactors documented in
cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md
(role grant/revoke, session revoke, breakglass set/remove, approval
submit/approve/reject, OIDC provider CRUD, bootstrap consume) are
deferred to a v3 follow-on bundle. Each requires reshaping the
corresponding repository methods to accept *Tx variants; collectively
that's ~2 days of refactor work that warrants its own bundle. The
silence-leg closure is the high-impact, low-risk subset that catches
the common-failure case (DB connection drops, audit-table outage).

Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-6
Spec: cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md
2026-05-10 21:24:29 +00:00
shankar0123 5d79e53ad0 auth-bundle-1 follow-on: close coverage gaps to clear Phase 12 floors
CI run #486 (post-Bundle-1 merge + Go 1.25.10 bump) failed three
coverage-threshold gates:

  internal/api/handler   74.7% < floor 75 (-0.3pp)
  internal/auth          66.3% < floor 85 (-18.7pp)
  internal/service/auth  51.1% < floor 85 (-33.9pp)

The Phase 12 gate file's "85% with negative-test coverage" claim
turned out to be aspirational — the read-side and Update-path
methods on RoleService / PermissionService / ActorRoleService had
zero unit-test coverage, and internal/auth's keystore +
HasPermission helper had zero tests. This commit closes the gap
without lowering the gate.

Per-package CI-style averages after this commit (per
scripts/check-coverage-thresholds.sh's per-function-mean):

  internal/api/handler   76.1% (+1.4pp,  margin +1.1pp)
  internal/auth          90.5% (+24.2pp, margin +5.5pp)
  internal/service/auth  93.7% (+42.6pp, margin +8.7pp)

Tests added:

  internal/service/auth/service_test.go (+18 tests, +518 LOC):
    PermissionService.List, PermissionService.GetByName,
    RoleService.Get (4 paths), RoleService.List (system caller),
    RoleService.Update (4 paths), RoleService.ListPermissions
    (3 paths), RoleService.AddPermission/RemovePermission round-trip
    + gate paths, RoleService.Delete (success + nil-caller +
    no-perm + audit), RoleService.Create (nil-caller),
    ActorRoleService.ListForActor (self-bypass + cross-actor +
    nil-caller + system + with-perm), ActorRoleService.Effective-
    Permissions (same shape), ActorRoleService.ListKeys (3 paths +
    system bypass), ActorRoleService.Revoke (4 paths), Authorizer
    edge cases (empty actorID short-circuit, empty tenantID
    default, scoped-grant-without-scope-id no-match invariant,
    repo-error wrap-and-return, HoldsAnyOf early-exit), recordAudit
    nil-arm short-circuits.

  internal/auth/keystore_test.go (NEW, +175 LOC):
    StaticKeyStore.Len, StaticKeyStore.LookupByHash hit + miss,
    MutableKeyStore seeded lookup + Len, Add registers new key,
    AddHashed registers from precomputed hash, AddHashed replaces
    on duplicate hash (idempotent boot-loader contract),
    HasPermission no-actor / default-actor-type / checker-error /
    scoped-check threading.

  internal/auth/bootstrap/service_test.go (+36 LOC):
    Service.Available nil-receiver/nil-strategy short-circuit,
    Service.Available delegates to Strategy when configured.

  internal/api/handler/auth_test.go (+208 LOC):
    GetRole returns role + permissions, GetRole 404 + 401, UpdateRole
    200 + invalid-JSON-400 + 401, ListKeys returns actor list + 401,
    RemoveRolePermission 204 (global + scoped) + 401,
    rolePermToResponse scope encoding pin via GetRole.

Verified:
  gofmt -l . clean (touched files only).
  go vet ./internal/auth/... ./internal/service/auth/...
       ./internal/api/handler/ rc=0.
  go test -count=1 -short on the four packages green.
  CI-style per-function averages computed via the live
       scripts/check-coverage-thresholds.sh arithmetic — all three
       gated packages clear their floors with margin.

Per CLAUDE.md "complete path" + "do not lower the gate to make CI
green": gate file unchanged. The 85/85/75 floors stand.
2026-05-10 02:04:36 +00:00
shankar0123 cbb47aaf5d auth-bundle-1 Phase 11 + 12: RBAC MCP tools + negative-test coverage gate
# Phase 11 — RBAC MCP tools

12 new tools in internal/mcp/tools_auth.go mirroring the Phase-4
+ Phase-7 HTTP surface so operators driving certctl from Claude
/ VS Code / any MCP client get the same management capability
the GUI + CLI already expose:

  certctl_auth_me                          GET    /v1/auth/me
  certctl_auth_list_roles                  GET    /v1/auth/roles
  certctl_auth_get_role                    GET    /v1/auth/roles/{id}
  certctl_auth_create_role                 POST   /v1/auth/roles
  certctl_auth_update_role                 PUT    /v1/auth/roles/{id}
  certctl_auth_delete_role                 DELETE /v1/auth/roles/{id}
  certctl_auth_list_permissions            GET    /v1/auth/permissions
  certctl_auth_add_permission_to_role      POST   /v1/auth/roles/{id}/permissions
  certctl_auth_remove_permission_from_role DELETE /v1/auth/roles/{id}/permissions/{perm}
  certctl_auth_list_keys                   GET    /v1/auth/keys
  certctl_auth_assign_role_to_key          POST   /v1/auth/keys/{id}/roles
  certctl_auth_revoke_role_from_key        DELETE /v1/auth/keys/{id}/roles/{role_id}

Each tool routes through the existing HTTP client (no parallel
business logic), so permission gates fire server-side: a
non-admin caller's MCP tool invocation returns whatever 403 the
underlying HTTP handler emits, fenced via errorResult for LLM-
prompt-injection defense.

Input types in internal/mcp/types.go (AuthRoleIDInput,
AuthCreateRoleInput, AuthUpdateRoleInput,
AuthRolePermissionGrantInput, AuthRolePermissionRevokeInput,
AuthAssignKeyRoleInput, AuthRevokeKeyRoleInput) carry
jsonschema descriptions so the MCP consumer's tool catalogue
shows operator-friendly hints.

internal/mcp/tools_auth_test.go ships 14 tests:
  - TestAuthMCP_AllToolsRegister (registration must not panic)
  - TestAuthMCP_PathsAndMethods (table-driven, 12 rows pinning
    each tool's HTTP method + URL)
  - TestAuthMCP_ForbiddenSurfacesFencedError (12 tools × 403
    mock → error surface)

internal/mcp/tools_per_tool_test.go's allHappyPathCases extended
with the 12 new rows so the in-memory dispatch coverage gate
(TestMCP_RegisterTools_DispatchableToolCount) stays green at the
new total of 139 registered tools.

Re-derived total via 'grep -cE "gomcp\.AddTool\(" internal/mcp/tools*.go':
133 (121 in tools.go + 12 in tools_auth.go).

# Phase 12 — negative-test coverage gate

Audit of the prompt's 12 negative-test paths against existing
coverage:

  1.  Missing actor → 401          ✓ TestRequirePermission_NoActorReturns401, TestRBACGate_NoActorReturns401
  2.  No roles → 403               ✓ TestRequirePermission_DeniedActorReturns403, TestRBACGate_AuditorRole_403sOnAdminRoutes
  3.  Role lacks specific perm → 403 ✓ same suite
  4.  Wrong scope → 403            ✓ TestAuthorizer_SpecificScopeMatchesExactID (wrongID arm)
  5.  Self-grant w/o auth.role.assign → 403 ✓ TestActorRoleService_GrantRequiresAuthRoleAssign
  6.  Bootstrap token wrong → 401  ✓ TestEnvTokenStrategy_WrongTokenReturnsInvalidToken, TestBootstrapHandler_Mint_WrongToken_401
  7.  Bootstrap used twice → 410   ✓ TestEnvTokenStrategy_OneShotConsumption, TestBootstrapHandler_Mint_TwiceReturns410
  8.  Bootstrap when admin exists → 410 ✓ TestEnvTokenStrategy_AdminExistsClosesPath, TestBootstrapHandler_Mint_AdminExists410
  9.  Role delete with assignees → 409 NEW: TestRoleService_DeleteWithActorsAssignedReturns409
  10. Profile-edit loophole → gated ✓ TestProfileEdit_RequiresApprovalLoopholeClosed
  11. Permission not in catalog → 400 ✓ TestRoleService_AddPermissionRejectsNonCanonical
  12. Scope ID for nonexistent resource → 404 (validation deferred — no FK constraint between role_permissions.scope_id and the resource tables; documented for a future bundle)

Filled the gap at #9 with TestRoleService_DeleteWithActorsAssignedReturns409
which pins the repository sentinel pass-through (postgres FK
ON DELETE RESTRICT → repository.ErrAuthRoleInUse → service
returns the sentinel verbatim → handler maps to HTTP 409).

# Coverage gates

.github/coverage-thresholds.yml gains 2 entries:
  - internal/auth: floor 85
  - internal/service/auth: floor 85

.github/workflows/ci.yml's coverage test command extended with
./internal/auth/... and ./internal/api/router/... so the
threshold check has data to evaluate.

# Protocol-endpoint not-gated test (Category F)

internal/api/router/phase12_protocol_allowlist_test.go (new)
adds 3 router-level invariant tests:

  - TestPhase12_ProtocolEndpointsNotGated: AST-walks router.go,
    asserts no rbacGate(...) call references a path under any
    protocol-endpoint prefix (/acme, /scep, /.well-known/est,
    /.well-known/pki/ocsp, /.well-known/pki/crl).
  - TestPhase12_IsProtocolEndpoint_CoversCanonicalPrefixes:
    pins auth.IsProtocolEndpoint against the canonical prefix
    set; if a future protocol lands without lockstep allowlist
    update, this fails.
  - TestPhase12_RBACGateRoutesAreUnderAPIv1: belt-and-braces —
    every rbacGate-wrapped route MUST start with /api/v1/.
    Catches accidental cross-prefix wraps.

Complements the existing TestRequirePermission_ProtocolEndpointBypassesGate
(middleware-level) + TestRouter_AuthExemptAllowlist_PinsActualRegistrations
(allowlist drift) so the Category F invariant is pinned at all
three layers (middleware + router + dispatch).

# Verifications

* gofmt clean repo-wide.
* go vet ./... clean.
* staticcheck across internal/auth + handler + router + cli +
  service + repository + cmd + domain + mcp: clean.
* go test -short -count=1 green across internal/auth (incl.
  bootstrap), internal/api/handler, internal/api/router,
  internal/cli, internal/service (incl. auth),
  internal/domain/auth, internal/mcp, cmd/server, cmd/cli.
2026-05-09 23:46:01 +00:00
shankar0123 3ef45e2ad4 auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split
# Phase 6 — day-0 admin bootstrap

* internal/auth/bootstrap/ (new package): Strategy interface +
  EnvTokenStrategy with constant-time compare, one-shot consumption
  via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC-
  first-admin will plug in alongside as an alternate Strategy.
* BootstrapService.ValidateAndMint: validates the operator's
  CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API
  key value, persists the SHA-256 hash to api_keys, grants r-admin
  via actor_roles, AddHashed's the runtime keystore so the just-
  minted key authenticates the next request without restart, and
  records bootstrap.consume to the audit trail with category=auth.
* internal/auth/keystore.go (new): KeyStore interface +
  StaticKeyStore (immutable env-var-only path) + MutableKeyStore
  (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth
  middleware now consumes a KeyStore so the bootstrap path can
  extend the lookup table at runtime.
* migrations/000031_api_keys.up/down.sql: api_keys table with
  (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by,
  created_at, expires_at, last_used_at). Idempotent.
* /v1/auth/bootstrap GET (probe) + POST (mint) — auth-exempt. Both
  routes documented in api/openapi.yaml + AuthExemptRouterRoutes
  allowlist updated. The token never leaves internal/auth/bootstrap;
  the minted plaintext key flows only into the HTTP response body.
* Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND
  admin actors already exist (config drift signal).
* Tests: 4 strategy invariants (empty token born disabled, wrong
  token=ErrInvalidToken without consumption, one-shot consumption,
  admin-exists closes path), 5 service tests (happy path + actor-
  name validation + propagation of strategy errors + nil-deps
  guard + 32-byte entropy budget), 8 HTTP-handler tests (status
  201/410/401/400 mapping + token-leak hygiene scan of slog +
  audit details + Location header). Token-leak test redirects
  slog.Default to a buffer for the test scope.

# Phase 7 — API-key migration + scope-down CLI

* GET /v1/auth/keys handler + service method ListKeys backed by
  ActorRoleRepository.ListDistinctActors. Returns one row per
  (actor_id, actor_type) pair with the slice of role IDs they hold.
  Permission: auth.role.list.
* internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown
  (interactive), AuthScopeDownNonInteractive (JSON config),
  AuthScopeDownSuggest (--suggest with optional --apply). The
  synthetic actor-demo-anon is filtered out of every interactive /
  bulk path; non-interactive flow logs and skips it explicitly.
* SuggestRoleFromAuditEvents (pure function): walks 30 days of
  audit events per actor and returns the narrowest matching role
  (admin / mcp / viewer / agent / operator) plus a one-line reason.
  Classification: any admin-shaped action wins; otherwise all-MCP
  → mcp; all-read-only → viewer; all-agent-shaped → agent;
  otherwise operator. Test table pins all six classifications.
* CLI subcommand tree extended: 'auth keys list' + 'auth keys
  scope-down [--non-interactive <cfg>] [--suggest [--apply]]'.
* CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS
  call-out + four flow examples.

# Phase 8 — auditor role + event_category column

* migrations/000032_audit_category.up/down.sql: ALTER TABLE
  audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT
  'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config)
  + (event_category) and (event_category, timestamp DESC) indexes
  for the auditor-filter query path. WORM trigger from migration
  000018 continues to enforce append-only at the DB layer (DDL is
  not blocked).
* domain.AuditEvent gains EventCategory string (omitempty);
  domain.EventCategoryCertLifecycle / Auth / Config constants.
* AuditService.RecordEventWithCategory sibling of RecordEvent;
  legacy callers stay on RecordEvent (defaults to cert_lifecycle).
  Auth callers (RoleService, ActorRoleService, BootstrapService)
  switched to RecordEventWithCategory(..., 'auth', ...).
* GET /v1/audit?category=<cat>: handler accepts the optional query
  param, validates against the enum (400 on invalid value),
  dispatches through ListAuditEventsByCategory. OpenAPI updated
  with the new query param + AuditEvent.event_category schema.
* Postgres AuditRepository.Create now writes event_category;
  AuditRepository.List filters on it; AuditFilter.EventCategory
  gates the WHERE clause.
* Tests: 5 audit-category-filter HTTP tests (dispatch routing,
  back-compat fallback, 400 for invalid values, all 3 enum values
  accepted, page+category combine, JSON output surfaces the
  field). 3 auditor-role invariants (auditor holds exactly
  audit.read+audit.export, no mutating perms, disjoint from
  viewer except audit.read).

# Cross-phase wiring

* HandlerRegistry.Bootstrap field added; cmd/server/main.go wires
  the bootstrap service ahead of RegisterHandlers (extracted
  assembleNamedAPIKeys helper into auth_backfill.go, moved the
  keystore + bootstrap construction up alongside the auth repos).
* AuthCheckResolver / AuthActorRoleService extended with ListKeys
  to satisfy the Phase 7 surface; existing fakes updated.
* fakeAudit + mockAuditService stubs in tests gain
  RecordEventWithCategory + ListAuditEventsByCategory; existing
  tests untouched.

# Verifications

* gofmt -l: clean across every modified file.
* go vet ./...: clean.
* staticcheck across internal/auth + handler + router + cli +
  service + repository + cmd + domain: clean.
* go test -short -count=1: green across every Bundle-1-touched
  package — internal/auth (incl. bootstrap), internal/api/handler,
  internal/api/router, internal/cli, internal/service/auth,
  internal/service, internal/domain/auth, internal/repository/postgres,
  cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware,
  cmd/agent, internal/mcp.
2026-05-09 20:15:43 +00:00
shankar0123 bd54d5f7fa auth-bundle-1 Phase 2: RBAC service layer + Authorizer primitive
Bundle 1 / Phase 2: ships PermissionService, RoleService, ActorRoleService, and the Authorizer primitive that Phase 3 RequirePermission middleware calls on every gated request.

Authorizer.CheckPermission semantics: a grant matches when (a) the permission name equals the requested permission AND (b) the grant is global-scoped OR the grant scope_type+scope_id exactly match the request. Global beats specific; per-resource grants widen the effective set rather than shadowing global. Hot-path query is one ActorRoleRepository.EffectivePermissions JOIN call (already shipped in Phase 1) plus an in-memory walk; Phase 12 will add benchmarks + caching if the JOIN cost shows up at scale.

Privilege-escalation guard: ActorRoleService.Grant and Revoke require the caller to hold auth.role.assign globally. Without it, ErrSelfRoleAssignment. System callers (AsSystemCaller()) bypass the check; bootstrap, migrations, scheduler-initiated grants use this path. Reserved actor actor-demo-anon is rejected on Grant + Revoke so the demo path stays alive even after a misclick (ErrAuthReservedActor).

Caller abstraction: every service entry point takes *Caller (ActorID, ActorType, TenantID, IsSystem). CallerFromContext is a stub returning ErrUnauthenticated; Phase 3 wires the middleware-context bridge that fills the Caller from request context. The contract is pinned by TestCallerFromContext_Phase2ReturnsUnauthenticated so the Phase 3 upgrade is observable.

Audit recording: every mutating service operation calls AuditService.RecordEvent. Bundle 1 Phase 8 adds the event_category column + parameter and back-fills 'auth' for these calls; until then the rows go in with the default category.

Test coverage: in-memory fakeRoleRepo / fakePermissionRepo / fakeActorRoleRepo / fakeAudit pin the privilege-escalation invariants (ErrUnauthenticated for nil caller, ErrForbidden for missing perm, ErrInvalidPermission for non-canonical permission name, ErrSelfRoleAssignment for Grant without auth.role.assign, ErrAuthReservedActor for actor-demo-anon mutations, system-caller bypass) without requiring testcontainers. Phase 12 will add live-Postgres integration coverage.

Branch: dev/auth-bundle-1. Phase 1 was 19497ee (RBAC schema + repo). Phase 3 (middleware integration) is the next commit on this branch.
2026-05-09 16:20:04 +00:00