auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split

# Phase 6 — day-0 admin bootstrap

* internal/auth/bootstrap/ (new package): Strategy interface +
  EnvTokenStrategy with constant-time compare, one-shot consumption
  via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC-
  first-admin will plug in alongside as an alternate Strategy.
* BootstrapService.ValidateAndMint: validates the operator's
  CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API
  key value, persists the SHA-256 hash to api_keys, grants r-admin
  via actor_roles, registers the hash with the runtime keystore
  (AddHashed) so the just-minted key authenticates the next request
  without a restart, and records bootstrap.consume to the audit
  trail with category=auth.
* internal/auth/keystore.go (new): KeyStore interface +
  StaticKeyStore (immutable env-var-only path) + MutableKeyStore
  (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth
  middleware now consumes a KeyStore so the bootstrap path can
  extend the lookup table at runtime.
* migrations/000031_api_keys.up/down.sql: api_keys table with
  (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by,
  created_at, expires_at, last_used_at). Idempotent.
* /v1/auth/bootstrap GET (probe) + POST (mint) — auth-exempt. Both
  routes documented in api/openapi.yaml + AuthExemptRouterRoutes
  allowlist updated. The token never leaves internal/auth/bootstrap;
  the minted plaintext key flows only into the HTTP response body.
* Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND
  admin actors already exist (config drift signal).
* Tests: 4 strategy invariants (empty token born disabled, wrong
  token=ErrInvalidToken without consumption, one-shot consumption,
  admin-exists closes path), 5 service tests (happy path + actor-
  name validation + propagation of strategy errors + nil-deps
  guard + 32-byte entropy budget), 8 HTTP-handler tests (status
  201/410/401/400 mapping + token-leak hygiene scan of slog +
  audit details + Location header). Token-leak test redirects
  slog.Default to a buffer for the test scope.
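
A minimal sketch of the Phase 6 token check and key mint, for orientation only. The names used here (envTokenStrategy, Consume, mintKey, ErrBootstrapClosed) are illustrative assumptions, not the package's actual API, and hashing the hex-encoded key rather than the raw bytes is likewise an assumption. The admin-existence probe is omitted.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

var (
	// ErrInvalidToken is named in the commit message.
	ErrInvalidToken = errors.New("bootstrap: invalid token")
	// ErrBootstrapClosed is a hypothetical sentinel for this sketch.
	ErrBootstrapClosed = errors.New("bootstrap: path closed")
)

// envTokenStrategy is an illustrative stand-in for EnvTokenStrategy.
type envTokenStrategy struct {
	mu       sync.Mutex
	token    []byte // empty means the strategy is born disabled
	consumed bool
}

func newEnvTokenStrategy(token string) *envTokenStrategy {
	return &envTokenStrategy{token: []byte(token)}
}

// Consume validates the presented token and closes the path on the
// first success. A wrong token does not consume the path.
func (s *envTokenStrategy) Consume(presented string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.token) == 0 || s.consumed {
		return ErrBootstrapClosed
	}
	// Hash both sides so ConstantTimeCompare always sees equal lengths.
	want := sha256.Sum256(s.token)
	got := sha256.Sum256([]byte(presented))
	if subtle.ConstantTimeCompare(want[:], got[:]) != 1 {
		return ErrInvalidToken
	}
	s.consumed = true
	return nil
}

// mintKey returns a 32-byte random key as 64 hex characters, plus the
// hex-encoded SHA-256 hash that would be persisted instead of the key.
func mintKey() (plaintext, hash string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	plaintext = hex.EncodeToString(raw)
	sum := sha256.Sum256([]byte(plaintext)) // assumption: hash covers the hex form
	return plaintext, hex.EncodeToString(sum[:]), nil
}

func main() {
	s := newEnvTokenStrategy("s3cret")
	fmt.Println(s.Consume("wrong"))  // bootstrap: invalid token
	fmt.Println(s.Consume("s3cret")) // <nil>
	fmt.Println(s.Consume("s3cret")) // bootstrap: path closed
	key, hash, _ := mintKey()
	fmt.Println(len(key), len(hash)) // 64 64
}
```

Holding the mutex across both the comparison and the consumed flip is what makes the consumption one-shot under concurrent requests.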

# Phase 7 — API-key migration + scope-down CLI

* GET /v1/auth/keys handler + service method ListKeys backed by
  ActorRoleRepository.ListDistinctActors. Returns one row per
  (actor_id, actor_type) pair with the slice of role IDs they hold.
  Permission: auth.role.list.
* internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown
  (interactive), AuthScopeDownNonInteractive (JSON config),
  AuthScopeDownSuggest (--suggest with optional --apply). The
  synthetic actor-demo-anon is filtered out of every interactive /
  bulk path; non-interactive flow logs and skips it explicitly.
* SuggestRoleFromAuditEvents (pure function): walks 30 days of
  audit events per actor and returns the narrowest matching role
  (admin / mcp / viewer / agent / operator) plus a one-line reason.
  Classification: any admin-shaped action wins; otherwise all-MCP
  → mcp; all-read-only → viewer; all-agent-shaped → agent;
  otherwise operator. Test table pins all six classifications.
* CLI subcommand tree extended: 'auth keys list' + 'auth keys
  scope-down [--non-interactive <cfg>] [--suggest [--apply]]'.
* CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS
  call-out + four flow examples.
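
The classification precedence described for SuggestRoleFromAuditEvents can be sketched as below. This is not the shipped function: the event struct with pre-computed boolean flags is a hypothetical simplification, and the empty-history branch is an assumption, since the commit message does not say what happens when an actor has no events in the window.

```go
package main

import "fmt"

// event is a hypothetical, pre-classified view of one audit event.
type event struct {
	Admin, MCP, ReadOnly, Agent bool
}

// suggestRole applies the precedence from the commit message: any
// admin-shaped action wins, then all-MCP, all-read-only,
// all-agent-shaped, and finally operator as the fallback.
func suggestRole(events []event) (role, reason string) {
	if len(events) == 0 {
		// Assumption: no suggestion is made without any history.
		return "", "no audit events in window"
	}
	allMCP, allRead, allAgent := true, true, true
	for _, e := range events {
		if e.Admin {
			return "admin", "performed an admin-shaped action"
		}
		allMCP = allMCP && e.MCP
		allRead = allRead && e.ReadOnly
		allAgent = allAgent && e.Agent
	}
	switch {
	case allMCP:
		return "mcp", "every event came through MCP"
	case allRead:
		return "viewer", "every event was read-only"
	case allAgent:
		return "agent", "every event was agent-shaped"
	}
	return "operator", "mixed activity without admin actions"
}

func main() {
	role, reason := suggestRole([]event{{ReadOnly: true}, {ReadOnly: true}})
	fmt.Println(role+":", reason) // viewer: every event was read-only
	role, reason = suggestRole([]event{{ReadOnly: true}, {Agent: true}})
	fmt.Println(role+":", reason) // operator: mixed activity without admin actions
}
```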

# Phase 8 — auditor role + event_category column

* migrations/000032_audit_category.up/down.sql: ALTER TABLE
  audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT
  'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config)
  + (event_category) and (event_category, timestamp DESC) indexes
  for the auditor-filter query path. The WORM trigger from
  migration 000018 continues to enforce append-only at the DB
  layer; it does not block DDL, so the ALTER TABLE applies.
* domain.AuditEvent gains EventCategory string (omitempty);
  domain.EventCategoryCertLifecycle / Auth / Config constants.
* AuditService.RecordEventWithCategory added as a sibling of
  RecordEvent; legacy callers stay on RecordEvent (defaults to
  cert_lifecycle). Auth callers (RoleService, ActorRoleService,
  BootstrapService) switched to
  RecordEventWithCategory(..., "auth", ...).
* GET /v1/audit?category=<cat>: handler accepts the optional query
  param, validates against the enum (400 on invalid value),
  dispatches through ListAuditEventsByCategory. OpenAPI updated
  with the new query param + AuditEvent.event_category schema.
* Postgres AuditRepository.Create now writes event_category;
  AuditRepository.List filters on it; AuditFilter.EventCategory
  gates the WHERE clause.
* Tests: 5 audit-category-filter HTTP tests (dispatch routing,
  back-compat fallback, 400 for invalid values, all 3 enum values
  accepted, page+category combine, JSON output surfaces the
  field). 3 auditor-role invariants (auditor holds exactly
  audit.read+audit.export, no mutating perms, disjoint from
  viewer except audit.read).
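
A rough sketch of the two category rules in Phase 8: the read side rejects unknown ?category= values with 400, and the write side defaults an empty category to cert_lifecycle. The function names (parseCategory, statusForCategory, normalizeCategory) and the error value are hypothetical; only the three enum values come from this commit.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

const (
	categoryCertLifecycle = "cert_lifecycle"
	categoryAuth          = "auth"
	categoryConfig        = "config"
)

var errInvalidCategory = errors.New("invalid event category")

// parseCategory validates the optional ?category= value.
// An empty string means "no filter" and is accepted.
func parseCategory(raw string) (string, error) {
	switch raw {
	case "", categoryCertLifecycle, categoryAuth, categoryConfig:
		return raw, nil
	}
	return "", fmt.Errorf("%w: %q", errInvalidCategory, raw)
}

// statusForCategory maps the validation result to the HTTP status
// the handler would send before touching the repository.
func statusForCategory(raw string) int {
	if _, err := parseCategory(raw); err != nil {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

// normalizeCategory applies the write-side default for callers that
// have not been migrated to the categorized API.
func normalizeCategory(c string) string {
	if c == "" {
		return categoryCertLifecycle
	}
	return c
}

func main() {
	fmt.Println(statusForCategory("auth"))  // 200
	fmt.Println(statusForCategory("bogus")) // 400
	fmt.Println(normalizeCategory(""))      // cert_lifecycle
}
```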

# Cross-phase wiring

* HandlerRegistry.Bootstrap field added; cmd/server/main.go wires
  the bootstrap service ahead of RegisterHandlers (extracted
  assembleNamedAPIKeys helper into auth_backfill.go, moved the
  keystore + bootstrap construction up alongside the auth repos).
* AuthCheckResolver / AuthActorRoleService extended with ListKeys
  to satisfy the Phase 7 surface; existing fakes updated.
* fakeAudit + mockAuditService stubs in tests gain
  RecordEventWithCategory + ListAuditEventsByCategory; existing
  tests untouched.

# Verifications

* gofmt -l: clean across every modified file.
* go vet ./...: clean.
* staticcheck across internal/auth + handler + router + cli +
  service + repository + cmd + domain: clean.
* go test -short -count=1: green across every Bundle-1-touched
  package — internal/auth (incl. bootstrap), internal/api/handler,
  internal/api/router, internal/cli, internal/service/auth,
  internal/service, internal/domain/auth, internal/repository/postgres,
  cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware,
  cmd/agent, internal/mcp.
shankar0123
2026-05-09 20:15:43 +00:00
parent 60a589ab96
commit 3ef45e2ad4
38 changed files with 3159 additions and 140 deletions
+17 -5
@@ -39,14 +39,21 @@ func (r *AuditRepository) CreateWithTx(ctx context.Context, q repository.Querier
 if event.ID == "" {
 event.ID = uuid.New().String()
 }
+// Bundle 1 Phase 8: empty EventCategory defaults to
+// cert_lifecycle (matches the migration's DEFAULT clause + the
+// DB CHECK constraint). The boundary catches callers that
+// haven't yet been migrated to the categorized API.
+if event.EventCategory == "" {
+event.EventCategory = domain.EventCategoryCertLifecycle
+}
 err := q.QueryRowContext(ctx, `
 INSERT INTO audit_events (
-id, actor, actor_type, action, resource_type, resource_id, details, timestamp
-) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+id, actor, actor_type, action, resource_type, resource_id, details, timestamp, event_category
+) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
 RETURNING id
 `, event.ID, event.Actor, event.ActorType, event.Action, event.ResourceType,
-event.ResourceID, event.Details, event.Timestamp).Scan(&event.ID)
+event.ResourceID, event.Details, event.Timestamp, event.EventCategory).Scan(&event.ID)
 if err != nil {
 return fmt.Errorf("failed to create audit event: %w", err)
@@ -104,6 +111,11 @@ func (r *AuditRepository) List(ctx context.Context, filter *repository.AuditFilt
 args = append(args, filter.To)
 argCount++
 }
+if filter.EventCategory != "" {
+whereConditions = append(whereConditions, fmt.Sprintf("event_category = $%d", argCount))
+args = append(args, filter.EventCategory)
+argCount++
+}
 whereClause := ""
 if len(whereConditions) > 0 {
@@ -120,7 +132,7 @@ func (r *AuditRepository) List(ctx context.Context, filter *repository.AuditFilt
 // Get paginated results
 offset := (filter.Page - 1) * filter.PerPage
 query := fmt.Sprintf(`
-SELECT id, actor, actor_type, action, resource_type, resource_id, details, timestamp
+SELECT id, actor, actor_type, action, resource_type, resource_id, details, timestamp, event_category
 FROM audit_events
 %s
 ORDER BY timestamp DESC
@@ -139,7 +151,7 @@ func (r *AuditRepository) List(ctx context.Context, filter *repository.AuditFilt
 for rows.Next() {
 var event domain.AuditEvent
 if err := rows.Scan(&event.ID, &event.Actor, &event.ActorType, &event.Action,
-&event.ResourceType, &event.ResourceID, &event.Details, &event.Timestamp); err != nil {
+&event.ResourceType, &event.ResourceID, &event.Details, &event.Timestamp, &event.EventCategory); err != nil {
 return nil, fmt.Errorf("failed to scan audit event: %w", err)
 }
 events = append(events, &event)
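
The placeholder-numbering pattern used by the List filter can be sketched in isolation as follows. The auditFilter struct here is a hypothetical two-field subset of repository.AuditFilter, and the sketch derives each placeholder index from len(args) rather than a separate argCount counter, which is a design variation and not what the repository code does.

```go
package main

import (
	"fmt"
	"strings"
)

// auditFilter is a hypothetical subset of the real filter type.
type auditFilter struct {
	Actor         string
	EventCategory string
}

// buildWhere assembles a WHERE clause with PostgreSQL-style positional
// placeholders ($1, $2, ...) and the matching argument slice. Values
// are always bound as arguments, never interpolated into the SQL.
func buildWhere(f auditFilter) (string, []any) {
	var conds []string
	var args []any
	add := func(column string, value any) {
		args = append(args, value)
		conds = append(conds, fmt.Sprintf("%s = $%d", column, len(args)))
	}
	if f.Actor != "" {
		add("actor", f.Actor)
	}
	if f.EventCategory != "" {
		add("event_category", f.EventCategory)
	}
	if len(conds) == 0 {
		return "", nil
	}
	return "WHERE " + strings.Join(conds, " AND "), args
}

func main() {
	where, args := buildWhere(auditFilter{Actor: "alice", EventCategory: "auth"})
	fmt.Println(where)     // WHERE actor = $1 AND event_category = $2
	fmt.Println(len(args)) // 2
}
```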
+167
@@ -388,6 +388,61 @@ func (r *ActorRoleRepository) Revoke(ctx context.Context, actorID string, actorT
return nil
}
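// ListDistinctActors returns one row per (actor_id, actor_type) pair
// in the tenant, with the IDs of that actor's unexpired roles. An
// empty tenantID selects the default tenant.
func (r *ActorRoleRepository) ListDistinctActors(ctx context.Context, tenantID string) ([]repository.ActorWithRoles, error) {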
if tenantID == "" {
tenantID = authdomain.DefaultTenantID
}
rows, err := r.db.QueryContext(ctx, `
SELECT actor_id, actor_type,
array_agg(role_id ORDER BY role_id) AS role_ids
FROM actor_roles
WHERE tenant_id = $1
AND (expires_at IS NULL OR expires_at > NOW())
GROUP BY actor_id, actor_type
ORDER BY actor_id ASC
`, tenantID)
if err != nil {
return nil, fmt.Errorf("actorRole.listDistinctActors: %w", err)
}
defer rows.Close()
var out []repository.ActorWithRoles
for rows.Next() {
var a repository.ActorWithRoles
var actorType string
// pq.StringArray decodes the postgres array_agg result.
var roles pq.StringArray
if err := rows.Scan(&a.ActorID, &actorType, &roles); err != nil {
return nil, fmt.Errorf("actorRole.listDistinctActors scan: %w", err)
}
a.ActorType = authdomain.ActorTypeValue(actorType)
a.TenantID = tenantID
a.RoleIDs = []string(roles)
out = append(out, a)
}
return out, rows.Err()
}
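// AdminExists reports whether any actor other than the synthetic demo
// actor holds an unexpired admin role in the tenant.
func (r *ActorRoleRepository) AdminExists(ctx context.Context, tenantID string) (bool, error) {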
if tenantID == "" {
tenantID = authdomain.DefaultTenantID
}
// Exclude the seeded synthetic demo actor so a demo deploy that
// later switches to api-key mode can still bootstrap the first
// real admin. Matches the carve-out documented on the interface.
var count int
err := r.db.QueryRowContext(ctx, `
SELECT COUNT(*) FROM actor_roles
WHERE role_id = $1
AND tenant_id = $2
AND actor_id != $3
AND (expires_at IS NULL OR expires_at > NOW())
`, authdomain.RoleIDAdmin, tenantID, authdomain.DemoAnonActorID).Scan(&count)
if err != nil {
return false, fmt.Errorf("actorRole.adminExists: %w", err)
}
return count > 0, nil
}
func (r *ActorRoleRepository) EffectivePermissions(ctx context.Context, actorID string, actorType authdomain.ActorTypeValue, tenantID string) ([]repository.EffectivePermission, error) {
rows, err := r.db.QueryContext(ctx, `
SELECT DISTINCT p.name, rp.scope_type, rp.scope_id
@@ -440,3 +495,115 @@ func scanActorRoles(rows *sql.Rows) ([]*authdomain.ActorRole, error) {
}
return out, rows.Err()
}
// =============================================================================
// APIKeyRepository (Bundle 1 Phase 6 — bootstrap path)
// =============================================================================
// APIKeyRepository is the postgres implementation of
// repository.APIKeyRepository. Stores SHA-256 hashes only; the
// plaintext key value is never persisted.
type APIKeyRepository struct {
db *sql.DB
}
// NewAPIKeyRepository constructs an APIKeyRepository.
func NewAPIKeyRepository(db *sql.DB) *APIKeyRepository {
return &APIKeyRepository{db: db}
}
// Create inserts an API key row, filling ID, TenantID, and CreatedAt
// when the caller left them unset.
func (r *APIKeyRepository) Create(ctx context.Context, k *authdomain.APIKey) error {
if k.ID == "" {
k.ID = "ak-" + uuid.NewString()
}
if k.TenantID == "" {
k.TenantID = authdomain.DefaultTenantID
}
if k.CreatedAt.IsZero() {
k.CreatedAt = time.Now().UTC()
}
// A nil interface value is written as SQL NULL.
var expires interface{}
if k.ExpiresAt != nil {
expires = *k.ExpiresAt
}
_, err := r.db.ExecContext(ctx, `
INSERT INTO api_keys (id, name, key_hash, tenant_id, admin, created_by, created_at, expires_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
`, k.ID, k.Name, k.KeyHash, k.TenantID, k.Admin, k.CreatedBy, k.CreatedAt, expires)
if err != nil {
// Translate UNIQUE-constraint violations (SQLSTATE 23505) to the
// canonical auth sentinel so the service layer can return 409.
// Note that 23505 fires for the key_hash constraint as well as
// the name constraint. errors.As also matches a wrapped *pq.Error.
var pqErr *pq.Error
if errors.As(err, &pqErr) && pqErr.Code == "23505" {
return repository.ErrAuthDuplicateName
}
return fmt.Errorf("apiKey.create: %w", err)
}
return nil
}
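// GetByName returns the API key with the given name, or
// repository.ErrAuthNotFound when no such row exists.
func (r *APIKeyRepository) GetByName(ctx context.Context, name string) (*authdomain.APIKey, error) {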
row := r.db.QueryRowContext(ctx, `
SELECT id, name, key_hash, tenant_id, admin, created_by, created_at, expires_at, last_used_at
FROM api_keys WHERE name = $1
`, name)
var k authdomain.APIKey
var expires, lastUsed sql.NullTime
if err := row.Scan(&k.ID, &k.Name, &k.KeyHash, &k.TenantID, &k.Admin, &k.CreatedBy, &k.CreatedAt, &expires, &lastUsed); err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, repository.ErrAuthNotFound
}
return nil, fmt.Errorf("apiKey.getByName: %w", err)
}
if expires.Valid {
t := expires.Time
k.ExpiresAt = &t
}
if lastUsed.Valid {
t := lastUsed.Time
k.LastUsedAt = &t
}
return &k, nil
}
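// List returns the tenant's API keys, newest first. An empty tenantID
// selects the default tenant.
func (r *APIKeyRepository) List(ctx context.Context, tenantID string) ([]*authdomain.APIKey, error) {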
if tenantID == "" {
tenantID = authdomain.DefaultTenantID
}
rows, err := r.db.QueryContext(ctx, `
SELECT id, name, key_hash, tenant_id, admin, created_by, created_at, expires_at, last_used_at
FROM api_keys WHERE tenant_id = $1 ORDER BY created_at DESC
`, tenantID)
if err != nil {
return nil, fmt.Errorf("apiKey.list: %w", err)
}
defer rows.Close()
var out []*authdomain.APIKey
for rows.Next() {
var k authdomain.APIKey
var expires, lastUsed sql.NullTime
if err := rows.Scan(&k.ID, &k.Name, &k.KeyHash, &k.TenantID, &k.Admin, &k.CreatedBy, &k.CreatedAt, &expires, &lastUsed); err != nil {
return nil, fmt.Errorf("apiKey.list scan: %w", err)
}
if expires.Valid {
t := expires.Time
k.ExpiresAt = &t
}
if lastUsed.Valid {
t := lastUsed.Time
k.LastUsedAt = &t
}
out = append(out, &k)
}
return out, rows.Err()
}
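// Delete removes the API key with the given name, or returns
// repository.ErrAuthNotFound when no row matched.
func (r *APIKeyRepository) Delete(ctx context.Context, name string) error {
res, err := r.db.ExecContext(ctx, `DELETE FROM api_keys WHERE name = $1`, name)
if err != nil {
return fmt.Errorf("apiKey.delete: %w", err)
}
n, err := res.RowsAffected()
if err != nil {
return fmt.Errorf("apiKey.delete rows affected: %w", err)
}
if n == 0 {
return repository.ErrAuthNotFound
}
return nil
}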
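
As a self-contained illustration of the UNIQUE-violation translation in Create, the sketch below uses a local pgError type as a stand-in for the driver's error type so it runs without a database. pgError, errDuplicateName, translate, and wrapErr are all hypothetical names. It shows that errors.As still matches after an intermediate layer wraps the driver error, whereas a direct type assertion would not.

```go
package main

import (
	"errors"
	"fmt"
)

// uniqueViolation is the SQLSTATE code PostgreSQL reports for a
// violated UNIQUE constraint.
const uniqueViolation = "23505"

// pgError is a stand-in for the driver's error type in this sketch.
type pgError struct{ Code string }

func (e *pgError) Error() string { return "pg: SQLSTATE " + e.Code }

// errDuplicateName plays the role of the repository sentinel.
var errDuplicateName = errors.New("auth: duplicate name")

// translate maps a unique violation to the sentinel and wraps every
// other error with context. A nil error passes through.
func translate(err error) error {
	if err == nil {
		return nil
	}
	var pe *pgError
	if errors.As(err, &pe) && pe.Code == uniqueViolation {
		return errDuplicateName
	}
	return fmt.Errorf("apiKey.create: %w", err)
}

// wrapErr simulates an intermediate layer adding context with %w.
func wrapErr(err error) error { return fmt.Errorf("tx: %w", err) }

func main() {
	dup := &pgError{Code: uniqueViolation}
	fmt.Println(translate(dup))                     // auth: duplicate name
	fmt.Println(translate(wrapErr(dup)))            // auth: duplicate name
	fmt.Println(translate(&pgError{Code: "23503"})) // apiKey.create: pg: SQLSTATE 23503
}
```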