Commit Graph

3 Commits

Author SHA1 Message Date
shankar0123 3ef45e2ad4 auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split
# Phase 6 — day-0 admin bootstrap

* internal/auth/bootstrap/ (new package): Strategy interface +
  EnvTokenStrategy with constant-time compare, one-shot consumption
  via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC-
  first-admin will plug in alongside as an alternate Strategy.
* BootstrapService.ValidateAndMint: validates the operator's
  CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API
  key value, persists the SHA-256 hash to api_keys, grants r-admin
  via actor_roles, AddHashed's the runtime keystore so the just-
  minted key authenticates the next request without restart, and
  records bootstrap.consume to the audit trail with category=auth.
* internal/auth/keystore.go (new): KeyStore interface +
  StaticKeyStore (immutable env-var-only path) + MutableKeyStore
  (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth
  middleware now consumes a KeyStore so the bootstrap path can
  extend the lookup table at runtime.
* migrations/000031_api_keys.up/down.sql: api_keys table with
  (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by,
  created_at, expires_at, last_used_at). Idempotent.
* /v1/auth/bootstrap GET (probe) + POST (mint) — auth-exempt. Both
  routes documented in api/openapi.yaml + AuthExemptRouterRoutes
  allowlist updated. The token never leaves internal/auth/bootstrap;
  the minted plaintext key flows only into the HTTP response body.
* Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND
  admin actors already exist (config drift signal).
* Tests: 4 strategy invariants (empty token born disabled, wrong
  token=ErrInvalidToken without consumption, one-shot consumption,
  admin-exists closes path), 5 service tests (happy path + actor-
  name validation + propagation of strategy errors + nil-deps
  guard + 32-byte entropy budget), 8 HTTP-handler tests (status
  201/410/401/400 mapping + token-leak hygiene scan of slog +
  audit details + Location header). Token-leak test redirects
  slog.Default to a buffer for the test scope.

# Phase 7 — API-key migration + scope-down CLI

* GET /v1/auth/keys handler + service method ListKeys backed by
  ActorRoleRepository.ListDistinctActors. Returns one row per
  (actor_id, actor_type) pair with the slice of role IDs they hold.
  Permission: auth.role.list.
* internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown
  (interactive), AuthScopeDownNonInteractive (JSON config),
  AuthScopeDownSuggest (--suggest with optional --apply). The
  synthetic actor-demo-anon is filtered out of every interactive /
  bulk path; non-interactive flow logs and skips it explicitly.
* SuggestRoleFromAuditEvents (pure function): walks 30 days of
  audit events per actor and returns the narrowest matching role
  (admin / mcp / viewer / agent / operator) plus a one-line reason.
  Classification: any admin-shaped action wins; otherwise all-MCP
  → mcp; all-read-only → viewer; all-agent-shaped → agent;
  otherwise operator. Test table pins all six classifications.
* CLI subcommand tree extended: 'auth keys list' + 'auth keys
  scope-down [--non-interactive <cfg>] [--suggest [--apply]]'.
* CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS
  call-out + four flow examples.

# Phase 8 — auditor role + event_category column

* migrations/000032_audit_category.up/down.sql: ALTER TABLE
  audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT
  'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config)
  + (event_category) and (event_category, timestamp DESC) indexes
  for the auditor-filter query path. WORM trigger from migration
  000018 continues to enforce append-only at the DB layer (DDL is
  not blocked).
* domain.AuditEvent gains EventCategory string (omitempty);
  domain.EventCategoryCertLifecycle / Auth / Config constants.
* AuditService.RecordEventWithCategory sibling of RecordEvent;
  legacy callers stay on RecordEvent (defaults to cert_lifecycle).
  Auth callers (RoleService, ActorRoleService, BootstrapService)
  switched to RecordEventWithCategory(..., 'auth', ...).
* GET /v1/audit?category=<cat>: handler accepts the optional query
  param, validates against the enum (400 on invalid value),
  dispatches through ListAuditEventsByCategory. OpenAPI updated
  with the new query param + AuditEvent.event_category schema.
* Postgres AuditRepository.Create now writes event_category;
  AuditRepository.List filters on it; AuditFilter.EventCategory
  gates the WHERE clause.
* Tests: 5 audit-category-filter HTTP tests (dispatch routing,
  back-compat fallback, 400 for invalid values, all 3 enum values
  accepted, page+category combine, JSON output surfaces the
  field). 3 auditor-role invariants (auditor holds exactly
  audit.read+audit.export, no mutating perms, disjoint from
  viewer except audit.read).

# Cross-phase wiring

* HandlerRegistry.Bootstrap field added; cmd/server/main.go wires
  the bootstrap service ahead of RegisterHandlers (extracted
  assembleNamedAPIKeys helper into auth_backfill.go, moved the
  keystore + bootstrap construction up alongside the auth repos).
* AuthCheckResolver / AuthActorRoleService extended with ListKeys
  to satisfy the Phase 7 surface; existing fakes updated.
* fakeAudit + mockAuditService stubs in tests gain
  RecordEventWithCategory + ListAuditEventsByCategory; existing
  tests untouched.

# Verifications

* gofmt -l: clean across every modified file.
* go vet ./...: clean.
* staticcheck across internal/auth + handler + router + cli +
  service + repository + cmd + domain: clean.
* go test -short -count=1: green across every Bundle-1-touched
  package — internal/auth (incl. bootstrap), internal/api/handler,
  internal/api/router, internal/cli, internal/service/auth,
  internal/service, internal/domain/auth, internal/repository/postgres,
  cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware,
  cmd/agent, internal/mcp.
2026-05-09 20:15:43 +00:00
shankar0123 7ff2e2de08 auth-bundle-1 Phase 3.5: handler IsAdmin -> router-wrapped RequirePermission
Phase 3.5 atomic conversion. The five legacy admin-gated handlers (bulk_revocation, admin_crl_cache, admin_scep_intune, admin_est, intermediate_ca) had their in-body auth.IsAdmin checks removed; the gate moved to router.go via auth.RequirePermission middleware wrapping each route. Non-admin operators with the right scoped permission can now reach these endpoints; legacy in-body admin checks no longer block them.

Migration 000030_rbac_admin_perms.up.sql ships five admin-only fine-grained permissions: cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage. All five are seeded into r-admin only; operator/viewer/agent/mcp/cli/auditor do not receive them by default. Operators can grant any of these to a custom role via the Phase 4 RBAC API. Idempotent + transaction-wrapped.

internal/domain/auth/validate.go::CanonicalPermissions extended with the five new entries so RoleService.AddPermission accepts them.

internal/api/router/router.go: HandlerRegistry gains a Checker field (auth.PermissionChecker). New rbacGate(checker, perm, handler) helper wraps a handler with auth.RequirePermission middleware; nil-checker fall-through preserves test/demo deployments without the RBAC stack. 12 admin routes wrapped: cert.bulk_revoke (POST /api/v1/certificates/bulk-revoke + POST /api/v1/est/certificates/bulk-revoke), crl.admin (GET /api/v1/admin/crl/cache), scep.admin (GET /api/v1/admin/scep/profiles + GET /api/v1/admin/scep/intune/stats + POST /api/v1/admin/scep/intune/reload-trust), est.admin (GET /api/v1/admin/est/profiles + POST /api/v1/admin/est/reload-trust), ca.hierarchy.manage (POST /api/v1/issuers/{id}/intermediates + GET /api/v1/issuers/{id}/intermediates + POST /api/v1/intermediates/{id}/retire + GET /api/v1/intermediates/{id}).

cmd/server/main.go: HandlerRegistry.Checker wired with the same authPermissionCheckerAdapter shim Phase 4 introduced for AuthHandler. Same adapter; one source of truth.

Handler bodies: removed eight in-body auth.IsAdmin checks across the 5 files. bulk_revocation.go's BulkRevoke + BulkRevokeEST, admin_crl_cache.go::ListCache, admin_scep_intune.go's three methods, admin_est.go's two methods, intermediate_ca.go's four methods. Replaced each with a comment naming the new gate location. Unused 'github.com/certctl-io/certctl/internal/auth' imports removed.

Test triplet rewrite: deleted obsolete _NonAdmin_Returns403 and _AdminExplicitFalse_Returns403 tests across 6 test files (5 handler tests + bulk_revocation_est_test.go) — they tested the now-removed in-body gate. _AdminPermitted_ForwardsActor tests stay intact: they pin the actor-passthrough invariant which is still relevant. Added internal/api/router/rbac_gate_integration_test.go with four router-level integration tests pinning the new gate: deny → 403 + handler not reached, permit → 200 + handler reached, nil-checker → fall-through, no-actor → 401.

M-008 admin-gate registry: AdminGatedHandlers map now empty (Phase 3.5 invariant: zero in-handler auth.IsAdmin call sites; only health.go's informational caller remains). m008_admin_gate_test.go retains the scan to enforce the invariant going forward; new admin-gated routes must wrap at router.go::rbacGate, not gate in-handler. Updated error message to direct future contributors to the new pattern.

Verifications: gofmt clean across all touched files; go vet ./... clean; go test -short across internal/auth, internal/service/auth, internal/api/handler, internal/api/router, cmd/server all green.

Branch: dev/auth-bundle-1. Commit chain: 99a012e (Phase 0 extract) -> 19497ee (Phase 1 schema + repo) -> bd54d5f (Phase 2 service) -> d473398 (Phase 3 primitive) -> b169f25 (Phase 4 + 5) -> THIS (Phase 3.5 conversion). Phase 6+ (bootstrap, scope-down, auditor, approval-bypass closure, GUI, docs) on subsequent sessions.
2026-05-09 17:00:30 +00:00
shankar0123 19497eef87 auth-bundle-1 Phase 1: RBAC schema + domain types + repository layer
Bundle 1 / Phase 1: ships the RBAC primitive as schema + domain types + repo layer. Service-layer wiring lands in Phase 2; middleware integration in Phase 3.

Schema (migrations/000029_rbac.up.sql, 272 lines, idempotent, transaction-wrapped):

tenants, roles, permissions, role_permissions, actor_roles. TEXT primary keys with prefixes (t-, r-, p-, ar-) per CLAUDE.md Architecture Decisions. TIMESTAMPTZ time columns. FK cascade explicit (tenant CASCADE, role RESTRICT, actor CASCADE). Three-value scope_type CHECK ('global', 'profile', 'issuer') matched 1:1 with internal/domain/auth.ScopeType. UNIQUE(tenant_id, name) on roles, UNIQUE(name) on permissions, UNIQUE(actor_id, actor_type, role_id, tenant_id) on actor_roles.

Seeds: t-default tenant, 7 default roles (admin, operator, viewer, agent, mcp, cli, auditor), 33-permission canonical catalogue (cert.* / profile.* / issuer.* / target.* / agent.* / audit.* / auth.role.* / auth.key.* / auth.bootstrap.use), full role->permission grant matrix at global scope. Demo-mode preservation: actor-demo-anon seeded with admin role unconditionally; Phase 3 wires the auth middleware to inject this actor into the context when CERTCTL_AUTH_TYPE=none. Reserved system actor; Phase 4 API rejects mutations / deletions targeting it with 409 Conflict.

Domain types (internal/domain/auth/{types,validate,validate_test}.go):

Tenant, Role, Permission, RolePermission, ActorRole structs with JSON tags. ScopeType enum (global/profile/issuer). ActorTypeValue mirrors internal/domain.ActorType to avoid an import cycle. CanonicalPermissions slice + DefaultRoles map are the single source of truth referenced by the migration; validate_test.go pins (a) no duplicate permissions, (b) every default-role permission is canonical, (c) admin holds the full catalogue, (d) seeded IDs carry the prefix convention, (e) ScopeType enum has exactly 3 values matching the CHECK constraint.

Extended internal/domain/audit.go: added ActorTypeAPIKey + ActorTypeAnonymous to the existing User/System/Agent enum so the audit trail can distinguish API-key requests from federated humans (Bundle 2 OIDC) and demo-mode (CERTCTL_AUTH_TYPE=none). Existing code that records actor_type=User keeps working; new APIKey value used by Bundle 1 Phase 3 middleware.

Repository layer (internal/repository/auth.go + internal/repository/postgres/auth.go):

TenantRepository (Get, List, EnsureDefault). RoleRepository (Get, GetByName, List, Create, Update, Delete with ErrAuthRoleInUse on FK RESTRICT, ListPermissions, AddPermission idempotent, RemovePermission). PermissionRepository (List, GetByName, IsCanonical for fail-fast catalog check). ActorRoleRepository (ListByActor, ListByRole, Grant idempotent, Revoke, EffectivePermissions which is the JOIN that auth.RequirePermission will use in Phase 3 — returns deduplicated (permission, scope) triples honouring the not-yet-expired predicate so future time-bound grants work without code change). Sentinel errors ErrAuthNotFound, ErrAuthDuplicateName, ErrAuthRoleInUse, ErrAuthReservedActor, ErrAuthUnknownPermission for handler-layer 404/409/400 mapping.

Verification: gofmt clean, go vet ./... clean, go test -short ./internal/domain/auth ./internal/repository/postgres pass. Integration tests against a live Postgres are gated by testing.Short() per repo convention; Phase 12 wires the testcontainers harness for full e2e coverage.

Branch: dev/auth-bundle-1. Phase 0 was 99a012e (extract internal/auth/). Phase 2 (service layer) is the next bundle.
2026-05-09 16:00:08 +00:00