Commit Graph

4 Commits

Author SHA1 Message Date
shankar0123 9cce2ab043 harden(auth): LOW + Nit batch — bootstrap audit, crypto/rand, XFF trust, CSRF check, protocol-prefix unify (Batch 1)
Audit 2026-05-10 — close 8 LOWs + 2 Nits in-bundle. Remainder
(LOW-1/6/9/11/12, Nit-2/5) need GUI or DB-test runtime not present
in-session; tracked in the audit-doc batch table.

LOW-2: bootstrap.ValidateAndMint now emits 'bootstrap.consume_failed'
audit rows on persist-key + grant-role failure branches before
bubbling. Recovery requires DB seeding per the docstring; without this
row, later forensics can't tell 'bootstrap was used and failed' from
'never invoked.'

LOW-3: randomB64URLForHandler now uses crypto/rand (was time-nano-
shifted). Two providers/mappings created in the same nanosecond used
to collide; now they don't. Time-nano fallback retained for the
unlikely crypto/rand-broken path.

LOW-4: breakglass.verifyDummy uses s.readRand(salt) for the dummy
Argon2id verify. Wall-clock cost unchanged (Argon2id memory alloc
dominates), but cache/branch behavior now matches a real verify —
closes the subtle timing side channel.

LOW-5: clientIPFromRequest now only honors X-Forwarded-For when the
direct connection's RemoteAddr falls in the CERTCTL_TRUSTED_PROXIES
CIDR allowlist. Default-deny: empty list means XFF is ignored.
SetTrustedProxies wired in cmd/server/main.go from cfg.Auth.TrustedProxies.

LOW-7: internal/auth/protocol_endpoints.go::ProtocolEndpointPrefixes
now carries /scep-mtls + /.well-known/est-mtls (previously only in
router.AuthExemptDispatchPrefixes; the two lists had drifted). The
canonical-prefix coverage test in Phase 12 still pins the set.

LOW-8: docs/operator/rbac.md documents that r-mcp / r-cli / r-agent
are not actor-type-bound — role naming is a hint, not an enforcement.
Operators wanting hard binding must apply periodic audit queries.
Native binding is on the v2 roadmap.

LOW-10: Session.Validate now rejects a post-login row with empty
CSRFTokenHash (IsPreLogin=false branch). validSession test fixture
updated with a valid 64-hex CSRF hash.

Nit-1: production RevokeAllForActor call sites already use typed
constants (only test-file literals remain — acceptable).

Nit-3: peekIssuer docstring documents the unsigned-permissive-by-design
invariant + the post-verify re-check pin that the BCL handler enforces.
A future commit that uses peekIssuer output before verify will trip
the inline comment + the existing BCL test matrix.

Status table updated in cowork/auth-bundles-audit-2026-05-10.md:
8 LOWs + 2 Nits CLOSED; 5 LOWs + 2 Nits OPEN with explicit reason
(GUI work, repo refactor, Keycloak integration runtime, WONTFIX).

Refs: cowork/auth-bundles-audit-2026-05-10.md LOW-2/3/4/5/7/8/10
      cowork/auth-bundles-audit-2026-05-10.md Nit-1/3
2026-05-10 22:26:12 +00:00
shankar0123 f5ba17114d fix(audit): close silence-leg of HIGH-6; emit WARN on audit-write failure
Audit 2026-05-10 HIGH-6 partial closure (silence leg). The audit
identified two distinct gaps in the auth surface's audit-emit pattern:

  (1) silence — `_ = audit.RecordEventWithCategory(...)` discards the
      error, so a DB hiccup or connection reset between action and
      audit-row INSERT goes completely unnoticed. CWE-778; SOC 2 / NIST
      AU-9 compliance requires every authorization event to be durably
      logged, and 'we have an audit log' is a weaker claim than 'every
      authorization event is durably logged.'

  (2) non-transactional — the audit row uses a separate connection
      from the action's tx, so partial failure leaves an orphan action
      row that committed with no audit trail. Decision 8 of the
      auth-bundles-index requires action + audit row atomic.

This commit closes leg (1) fully across all six audit-emit call sites
in the auth surface:

  - internal/service/auth/actor_role_service.go::recordAudit
  - internal/service/auth/role_service.go::recordAudit
  - internal/auth/bootstrap/service.go::ValidateAndMint
  - internal/auth/breakglass/service.go::recordAudit
  - internal/auth/session/service.go::recordAudit
  - internal/api/handler/auth_session_oidc.go::recordAudit
  - internal/service/profile.go::Update (Phase 9 approval-bypass)

Each `_ = ...` swallow is replaced with:

  if err := audit.RecordEventWithCategory(...); err != nil {
      slog.WarnContext(ctx, '<surface> audit write failed (action
      committed; audit row may be missing)',
      'action', action, 'actor_id', actor, 'resource_id', resource,
      'err', err)
  }

Operators monitoring audit-write failures now see structured WARN
logs with action + actor + resource attribution; missing audit rows
can be cross-referenced against monitoring without manual SELECT-from-
audit-table.

Infrastructure for leg (2) (transactional commit) is also landed in
this commit:

  - service.AuditService.RecordEventWithCategoryWithTx (new method;
    accepts repository.Querier from postgres.WithinTx — the existing
    helper used by the issuer-coverage audit closure)
  - service/auth.AuditService interface declares the new method
  - test stub fakeAudit.RecordEventWithCategoryWithTx satisfies the
    extended interface

The eight per-path WithinTx-refactors documented in
cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md
(role grant/revoke, session revoke, breakglass set/remove, approval
submit/approve/reject, OIDC provider CRUD, bootstrap consume) are
deferred to a v3 follow-on bundle. Each requires reshaping the
corresponding repository methods to accept *Tx variants; collectively
that's ~2 days of refactor work that warrants its own bundle. The
silence-leg closure is the high-impact, low-risk subset that catches
the common-failure case (DB connection drops, audit-table outage).

Refs: cowork/auth-bundles-audit-2026-05-10.md HIGH-6
Spec: cowork/auth-bundles-fixes-2026-05-10/10-high-6-atomic-audit-commit.md
2026-05-10 21:24:29 +00:00
shankar0123 5d79e53ad0 auth-bundle-1 follow-on: close coverage gaps to clear Phase 12 floors
CI run #486 (post-Bundle-1 merge + Go 1.25.10 bump) failed three
coverage-threshold gates:

  internal/api/handler   74.7% < floor 75 (-0.3pp)
  internal/auth          66.3% < floor 85 (-18.7pp)
  internal/service/auth  51.1% < floor 85 (-33.9pp)

The Phase 12 gate file's "85% with negative-test coverage" claim
turned out to be aspirational — the read-side and Update-path
methods on RoleService / PermissionService / ActorRoleService had
zero unit-test coverage, and internal/auth's keystore +
HasPermission helper had zero tests. This commit closes the gap
without lowering the gate.

Per-package CI-style averages after this commit (per
scripts/check-coverage-thresholds.sh's per-function-mean):

  internal/api/handler   76.1% (+1.4pp,  margin +1.1pp)
  internal/auth          90.5% (+24.2pp, margin +5.5pp)
  internal/service/auth  93.7% (+42.6pp, margin +8.7pp)

Tests added:

  internal/service/auth/service_test.go (+18 tests, +518 LOC):
    PermissionService.List, PermissionService.GetByName,
    RoleService.Get (4 paths), RoleService.List (system caller),
    RoleService.Update (4 paths), RoleService.ListPermissions
    (3 paths), RoleService.AddPermission/RemovePermission round-trip
    + gate paths, RoleService.Delete (success + nil-caller +
    no-perm + audit), RoleService.Create (nil-caller),
    ActorRoleService.ListForActor (self-bypass + cross-actor +
    nil-caller + system + with-perm), ActorRoleService.Effective-
    Permissions (same shape), ActorRoleService.ListKeys (3 paths +
    system bypass), ActorRoleService.Revoke (4 paths), Authorizer
    edge cases (empty actorID short-circuit, empty tenantID
    default, scoped-grant-without-scope-id no-match invariant,
    repo-error wrap-and-return, HoldsAnyOf early-exit), recordAudit
    nil-arm short-circuits.

  internal/auth/keystore_test.go (NEW, +175 LOC):
    StaticKeyStore.Len, StaticKeyStore.LookupByHash hit + miss,
    MutableKeyStore seeded lookup + Len, Add registers new key,
    AddHashed registers from precomputed hash, AddHashed replaces
    on duplicate hash (idempotent boot-loader contract),
    HasPermission no-actor / default-actor-type / checker-error /
    scoped-check threading.

  internal/auth/bootstrap/service_test.go (+36 LOC):
    Service.Available nil-receiver/nil-strategy short-circuit,
    Service.Available delegates to Strategy when configured.

  internal/api/handler/auth_test.go (+208 LOC):
    GetRole returns role + permissions, GetRole 404 + 401, UpdateRole
    200 + invalid-JSON-400 + 401, ListKeys returns actor list + 401,
    RemoveRolePermission 204 (global + scoped) + 401,
    rolePermToResponse scope encoding pin via GetRole.

Verified:
  gofmt -l . clean (touched files only).
  go vet ./internal/auth/... ./internal/service/auth/...
       ./internal/api/handler/ rc=0.
  go test -count=1 -short on the four packages green.
  CI-style per-function averages computed via the live
       scripts/check-coverage-thresholds.sh arithmetic — all three
       gated packages clear their floors with margin.

Per CLAUDE.md "complete path" + "do not lower the gate to make CI
green": gate file unchanged. The 85/85/75 floors stand.
2026-05-10 02:04:36 +00:00
shankar0123 3ef45e2ad4 auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split
# Phase 6 — day-0 admin bootstrap

* internal/auth/bootstrap/ (new package): Strategy interface +
  EnvTokenStrategy with constant-time compare, one-shot consumption
  via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC-
  first-admin will plug in alongside as an alternate Strategy.
* BootstrapService.ValidateAndMint: validates the operator's
  CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API
  key value, persists the SHA-256 hash to api_keys, grants r-admin
  via actor_roles, AddHashed's the runtime keystore so the just-
  minted key authenticates the next request without restart, and
  records bootstrap.consume to the audit trail with category=auth.
* internal/auth/keystore.go (new): KeyStore interface +
  StaticKeyStore (immutable env-var-only path) + MutableKeyStore
  (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth
  middleware now consumes a KeyStore so the bootstrap path can
  extend the lookup table at runtime.
* migrations/000031_api_keys.up/down.sql: api_keys table with
  (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by,
  created_at, expires_at, last_used_at). Idempotent.
* /v1/auth/bootstrap GET (probe) + POST (mint) — auth-exempt. Both
  routes documented in api/openapi.yaml + AuthExemptRouterRoutes
  allowlist updated. The token never leaves internal/auth/bootstrap;
  the minted plaintext key flows only into the HTTP response body.
* Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND
  admin actors already exist (config drift signal).
* Tests: 4 strategy invariants (empty token born disabled, wrong
  token=ErrInvalidToken without consumption, one-shot consumption,
  admin-exists closes path), 5 service tests (happy path + actor-
  name validation + propagation of strategy errors + nil-deps
  guard + 32-byte entropy budget), 8 HTTP-handler tests (status
  201/410/401/400 mapping + token-leak hygiene scan of slog +
  audit details + Location header). Token-leak test redirects
  slog.Default to a buffer for the test scope.

# Phase 7 — API-key migration + scope-down CLI

* GET /v1/auth/keys handler + service method ListKeys backed by
  ActorRoleRepository.ListDistinctActors. Returns one row per
  (actor_id, actor_type) pair with the slice of role IDs they hold.
  Permission: auth.role.list.
* internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown
  (interactive), AuthScopeDownNonInteractive (JSON config),
  AuthScopeDownSuggest (--suggest with optional --apply). The
  synthetic actor-demo-anon is filtered out of every interactive /
  bulk path; non-interactive flow logs and skips it explicitly.
* SuggestRoleFromAuditEvents (pure function): walks 30 days of
  audit events per actor and returns the narrowest matching role
  (admin / mcp / viewer / agent / operator) plus a one-line reason.
  Classification: any admin-shaped action wins; otherwise all-MCP
  → mcp; all-read-only → viewer; all-agent-shaped → agent;
  otherwise operator. Test table pins all six classifications.
* CLI subcommand tree extended: 'auth keys list' + 'auth keys
  scope-down [--non-interactive <cfg>] [--suggest [--apply]]'.
* CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS
  call-out + four flow examples.

# Phase 8 — auditor role + event_category column

* migrations/000032_audit_category.up/down.sql: ALTER TABLE
  audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT
  'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config)
  + (event_category) and (event_category, timestamp DESC) indexes
  for the auditor-filter query path. WORM trigger from migration
  000018 continues to enforce append-only at the DB layer (DDL is
  not blocked).
* domain.AuditEvent gains EventCategory string (omitempty);
  domain.EventCategoryCertLifecycle / Auth / Config constants.
* AuditService.RecordEventWithCategory sibling of RecordEvent;
  legacy callers stay on RecordEvent (defaults to cert_lifecycle).
  Auth callers (RoleService, ActorRoleService, BootstrapService)
  switched to RecordEventWithCategory(..., 'auth', ...).
* GET /v1/audit?category=<cat>: handler accepts the optional query
  param, validates against the enum (400 on invalid value),
  dispatches through ListAuditEventsByCategory. OpenAPI updated
  with the new query param + AuditEvent.event_category schema.
* Postgres AuditRepository.Create now writes event_category;
  AuditRepository.List filters on it; AuditFilter.EventCategory
  gates the WHERE clause.
* Tests: 5 audit-category-filter HTTP tests (dispatch routing,
  back-compat fallback, 400 for invalid values, all 3 enum values
  accepted, page+category combine, JSON output surfaces the
  field). 3 auditor-role invariants (auditor holds exactly
  audit.read+audit.export, no mutating perms, disjoint from
  viewer except audit.read).

# Cross-phase wiring

* HandlerRegistry.Bootstrap field added; cmd/server/main.go wires
  the bootstrap service ahead of RegisterHandlers (extracted
  assembleNamedAPIKeys helper into auth_backfill.go, moved the
  keystore + bootstrap construction up alongside the auth repos).
* AuthCheckResolver / AuthActorRoleService extended with ListKeys
  to satisfy the Phase 7 surface; existing fakes updated.
* fakeAudit + mockAuditService stubs in tests gain
  RecordEventWithCategory + ListAuditEventsByCategory; existing
  tests untouched.

# Verifications

* gofmt -l: clean across every modified file.
* go vet ./...: clean.
* staticcheck across internal/auth + handler + router + cli +
  service + repository + cmd + domain: clean.
* go test -short -count=1: green across every Bundle-1-touched
  package — internal/auth (incl. bootstrap), internal/api/handler,
  internal/api/router, internal/cli, internal/service/auth,
  internal/service, internal/domain/auth, internal/repository/postgres,
  cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware,
  cmd/agent, internal/mcp.
2026-05-09 20:15:43 +00:00