certctl/internal/auth/bootstrap/service.go
shankar0123 3ef45e2ad4 auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split
# Phase 6 — day-0 admin bootstrap

* internal/auth/bootstrap/ (new package): Strategy interface +
  EnvTokenStrategy with constant-time compare, one-shot consumption
  via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC-
  first-admin will plug in alongside as an alternate Strategy.
* BootstrapService.ValidateAndMint: validates the operator's
  CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API
  key value, persists the SHA-256 hash to api_keys, grants r-admin
  via actor_roles, calls AddHashed on the runtime keystore so the
  just-minted key authenticates the next request without a restart,
  and records bootstrap.consume to the audit trail with category=auth.
* internal/auth/keystore.go (new): KeyStore interface +
  StaticKeyStore (immutable env-var-only path) + MutableKeyStore
  (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth
  middleware now consumes a KeyStore so the bootstrap path can
  extend the lookup table at runtime.
* migrations/000031_api_keys.up/down.sql: api_keys table with
  (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by,
  created_at, expires_at, last_used_at). Idempotent.
* /v1/auth/bootstrap GET (probe) + POST (mint) — auth-exempt. Both
  routes documented in api/openapi.yaml + AuthExemptRouterRoutes
  allowlist updated. The token never leaves internal/auth/bootstrap;
  the minted plaintext key flows only into the HTTP response body.
* Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND
  admin actors already exist (config drift signal).
* Tests: 4 strategy invariants (empty token born disabled, wrong
  token=ErrInvalidToken without consumption, one-shot consumption,
  admin-exists closes path), 5 service tests (happy path + actor-
  name validation + propagation of strategy errors + nil-deps
  guard + 32-byte entropy budget), 8 HTTP-handler tests (status
  201/410/401/400 mapping + token-leak hygiene scan of slog +
  audit details + Location header). Token-leak test redirects
  slog.Default to a buffer for the test scope.

# Phase 7 — API-key migration + scope-down CLI

* GET /v1/auth/keys handler + service method ListKeys backed by
  ActorRoleRepository.ListDistinctActors. Returns one row per
  (actor_id, actor_type) pair with the slice of role IDs they hold.
  Permission: auth.role.list.
* internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown
  (interactive), AuthScopeDownNonInteractive (JSON config),
  AuthScopeDownSuggest (--suggest with optional --apply). The
  synthetic actor-demo-anon is filtered out of every interactive /
  bulk path; non-interactive flow logs and skips it explicitly.
* SuggestRoleFromAuditEvents (pure function): walks 30 days of
  audit events per actor and returns the narrowest matching role
  (admin / mcp / viewer / agent / operator) plus a one-line reason.
  Classification: any admin-shaped action wins; otherwise all-MCP
  → mcp; all-read-only → viewer; all-agent-shaped → agent;
  otherwise operator. Test table pins all six classifications.
* CLI subcommand tree extended: 'auth keys list' + 'auth keys
  scope-down [--non-interactive <cfg>] [--suggest [--apply]]'.
* CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS
  call-out + four flow examples.

# Phase 8 — auditor role + event_category column

* migrations/000032_audit_category.up/down.sql: ALTER TABLE
  audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT
  'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config)
  + (event_category) and (event_category, timestamp DESC) indexes
  for the auditor-filter query path. WORM trigger from migration
  000018 continues to enforce append-only at the DB layer (DDL is
  not blocked).
* domain.AuditEvent gains EventCategory string (omitempty);
  domain.EventCategoryCertLifecycle / Auth / Config constants.
* AuditService.RecordEventWithCategory added as a sibling of
  RecordEvent; legacy callers stay on RecordEvent (which defaults to
  cert_lifecycle).
  Auth callers (RoleService, ActorRoleService, BootstrapService)
  switched to RecordEventWithCategory(..., 'auth', ...).
* GET /v1/audit?category=<cat>: handler accepts the optional query
  param, validates against the enum (400 on invalid value),
  dispatches through ListAuditEventsByCategory. OpenAPI updated
  with the new query param + AuditEvent.event_category schema.
* Postgres AuditRepository.Create now writes event_category;
  AuditRepository.List filters on it; AuditFilter.EventCategory
  gates the WHERE clause.
* Tests: 5 audit-category-filter HTTP tests (dispatch routing,
  back-compat fallback, 400 for invalid values, all 3 enum values
  accepted, page+category combine, JSON output surfaces the
  field). 3 auditor-role invariants (auditor holds exactly
  audit.read+audit.export, no mutating perms, disjoint from
  viewer except audit.read).

# Cross-phase wiring

* HandlerRegistry.Bootstrap field added; cmd/server/main.go wires
  the bootstrap service ahead of RegisterHandlers (extracted
  assembleNamedAPIKeys helper into auth_backfill.go, moved the
  keystore + bootstrap construction up alongside the auth repos).
* AuthCheckResolver / AuthActorRoleService extended with ListKeys
  to satisfy the Phase 7 surface; existing fakes updated.
* fakeAudit + mockAuditService stubs in tests gain
  RecordEventWithCategory + ListAuditEventsByCategory; existing
  tests untouched.
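The keystore wiring above relies on looking keys up by hash, with AddHashed extending the table at runtime. Below is a minimal sketch of such a MutableKeyStore, assuming a SHA-256-hex index guarded by a sync.RWMutex. The names mutableKeyStore, keyEntry, Lookup and hashAPIKey are hypothetical, not the real internal/auth API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// hashAPIKey is a hypothetical stand-in for internal/auth.HashAPIKey:
// SHA-256 of the plaintext, hex-encoded.
func hashAPIKey(plain string) string {
	sum := sha256.Sum256([]byte(plain))
	return hex.EncodeToString(sum[:])
}

// keyEntry is what a lookup yields: the actor name and admin flag.
type keyEntry struct {
	Name  string
	Admin bool
}

// mutableKeyStore keeps env-var and DB-loaded keys in one hash-indexed
// map and lets AddHashed extend it at runtime.
type mutableKeyStore struct {
	mu     sync.RWMutex
	byHash map[string]keyEntry
}

func newMutableKeyStore() *mutableKeyStore {
	return &mutableKeyStore{byHash: make(map[string]keyEntry)}
}

// AddHashed stores an already-hashed key, so the plaintext never reaches
// the store.
func (m *mutableKeyStore) AddHashed(name, hashHex string, admin bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.byHash[hashHex] = keyEntry{Name: name, Admin: admin}
}

// Lookup hashes the presented bearer key and checks the map. The map is
// keyed by digest, so lookup timing depends on the digest rather than
// directly on the plaintext.
func (m *mutableKeyStore) Lookup(presented string) (keyEntry, bool) {
	h := hashAPIKey(presented)
	m.mu.RLock()
	defer m.mu.RUnlock()
	e, ok := m.byHash[h]
	return e, ok
}

func main() {
	ks := newMutableKeyStore()
	ks.AddHashed("first-admin", hashAPIKey("example-key"), true)
	e, ok := ks.Lookup("example-key")
	fmt.Println(e.Name, e.Admin, ok) // first-admin true true
}
```

The read/write split lets concurrent request authentication share the read lock. The rare bootstrap AddHashed call takes the write lock, which is what lets a freshly minted key authenticate the very next request.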

# Verifications

* gofmt -l: clean across every modified file.
* go vet ./...: clean.
* staticcheck across internal/auth + handler + router + cli +
  service + repository + cmd + domain: clean.
* go test -short -count=1: green across every Bundle-1-touched
  package — internal/auth (incl. bootstrap), internal/api/handler,
  internal/api/router, internal/cli, internal/service/auth,
  internal/service, internal/domain/auth, internal/repository/postgres,
  cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware,
  cmd/agent, internal/mcp.
2026-05-09 20:15:43 +00:00

205 lines
7.9 KiB
Go

package bootstrap

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"regexp"
	"time"

	"github.com/certctl-io/certctl/internal/domain"
	authdomain "github.com/certctl-io/certctl/internal/domain/auth"
)

// actorNameRe matches the operator-supplied admin-key name. Constraints:
// 3-64 chars, lowercase alphanumeric plus hyphen and underscore, and the
// first character must be a lowercase letter or digit. The strict
// charset prevents audit-attribution shenanigans (control characters,
// log-injection sequences, mixed-case look-alikes of an existing admin
// actor's name).
var actorNameRe = regexp.MustCompile(`^[a-z0-9][a-z0-9_-]{2,63}$`)

// APIKeyMinter is the slice of APIKeyRepository the bootstrap service
// needs. Pulled out as a small interface so the service can be
// unit-tested with an in-memory fake.
type APIKeyMinter interface {
	Create(ctx context.Context, key *authdomain.APIKey) error
	GetByName(ctx context.Context, name string) (*authdomain.APIKey, error)
}

// RoleGranter is the slice of ActorRoleRepository the bootstrap
// service needs.
type RoleGranter interface {
	Grant(ctx context.Context, ar *authdomain.ActorRole) error
}

// AuditRecorder is the slice of AuditService the bootstrap service
// needs. Phase 8 ships RecordEventWithCategory, which classifies the
// row's event_category column directly; the bootstrap path always
// emits with category=auth.
type AuditRecorder interface {
	RecordEventWithCategory(ctx context.Context, actor string, actorType domain.ActorType, action, eventCategory, resourceType, resourceID string, details map[string]interface{}) error
}

// KeyStoreAdder is the runtime hook the bootstrap service uses to
// register the just-minted key with the auth middleware so the next
// request authenticates without a process restart. The HTTP-layer
// auth middleware exposes this via internal/auth.MutableKeyStore.
type KeyStoreAdder interface {
	AddHashed(name, hashHex string, admin bool)
}

// Service ties the bootstrap Strategy to the persistence layer. Kept
// separate from the HTTP handler so unit tests can drive it without
// httptest, and so the same service can back a future
// `certctl auth bootstrap` CLI command.
type Service struct {
	strategy   Strategy
	keys       APIKeyMinter
	roles      RoleGranter
	audit      AuditRecorder
	keyStore   KeyStoreAdder
	hashAPIKey func(string) string // injected (internal/auth.HashAPIKey in production) so this package need not import internal/auth
}

// NewService constructs a bootstrap Service.
//
// hashAPIKey takes the plaintext key and returns the SHA-256 hex used
// by the auth middleware's keystore lookup. Pass internal/auth.HashAPIKey
// at the production wire site; tests can pass a deterministic hash for
// matching against MutableKeyStore lookups.
//
// keyStore is optional. Production wires the same MutableKeyStore the
// auth middleware reads from so the minted key authenticates the next
// request; when nil the bootstrap still persists the key to the DB
// but the operator must restart to pick it up via the boot loader.
func NewService(strategy Strategy, keys APIKeyMinter, roles RoleGranter, audit AuditRecorder, keyStore KeyStoreAdder, hashAPIKey func(string) string) *Service {
	return &Service{
		strategy:   strategy,
		keys:       keys,
		roles:      roles,
		audit:      audit,
		keyStore:   keyStore,
		hashAPIKey: hashAPIKey,
	}
}

// MintResult is the success payload returned to the HTTP handler.
// KeyValue is the plaintext key the operator must capture before the
// response is dropped; the server holds it only for the lifetime of the
// request and never logs it.
type MintResult struct {
	APIKey   *authdomain.APIKey
	KeyValue string
}

// Available reports whether the bootstrap endpoint is currently
// callable. It returns the strategy's verdict, plus a sentinel
// (ErrDisabled) when the endpoint is closed. The HTTP handler maps the
// sentinel to 410 Gone before reading any token from the request body,
// so a probing attacker can't distinguish "no token configured" from
// "wrong token".
func (s *Service) Available(ctx context.Context) (bool, error) {
	if s == nil || s.strategy == nil {
		return false, ErrDisabled
	}
	return s.strategy.Available(ctx)
}

// ValidateAndMint consumes the strategy's credential and persists the
// first admin API key. The response carries the plaintext key value
// once; the operator MUST capture it before the response goes out the
// wire. Subsequent calls return ErrDisabled (one-shot semantics).
//
// Side effects:
//  1. Strategy.Validate atomically flips its consumed state.
//  2. A new row is written to api_keys (id, name, sha256(key), admin=true).
//  3. A new row is written to actor_roles (actor=name, role=r-admin).
//  4. The MutableKeyStore (if wired) gains a runtime entry so the next
//     request authenticates without a restart.
//  5. An audit event records the bootstrap consumption with
//     event_category=auth, action=bootstrap.consume.
//
// The plaintext key is NEVER logged. It exists in three places:
//   - the random buffer this function generates,
//   - the MintResult.KeyValue field (the handler writes it to the
//     response, then discards it),
//   - the HTTP response body itself.
//
// If a persistence call fails AFTER the strategy is consumed, the
// service deliberately does NOT roll back the strategy state. A failed
// ValidateAndMint call leaves bootstrap closed; the operator must
// recover via DB seeding (inserting into actor_roles directly) rather
// than retry. Allowing a retry would open a window in which a
// validate-then-fail sequence followed by a retry mints two admin keys,
// silently widening the trust radius.
func (s *Service) ValidateAndMint(ctx context.Context, token, actorName string) (*MintResult, error) {
	// Check every dependency before Validate consumes the one-shot
	// credential; a nil hashAPIKey would otherwise panic after consumption.
	if s == nil || s.strategy == nil || s.keys == nil || s.roles == nil || s.hashAPIKey == nil {
		return nil, ErrDisabled
	}
	if !actorNameRe.MatchString(actorName) {
		return nil, ErrInvalidActorName
	}
	if err := s.strategy.Validate(ctx, token); err != nil {
		return nil, err
	}
	// Strategy is now consumed; if anything below fails the operator
	// has to recover via the DB. See the ValidateAndMint docstring.
	keyValue, err := generateAPIKey()
	if err != nil {
		return nil, fmt.Errorf("bootstrap: random key generation: %w", err)
	}
	keyHash := s.hashAPIKey(keyValue)
	now := time.Now().UTC()
	apiKey := &authdomain.APIKey{
		Name:      actorName,
		KeyHash:   keyHash,
		TenantID:  authdomain.DefaultTenantID,
		Admin:     true,
		CreatedBy: "bootstrap",
		CreatedAt: now,
	}
	if err := s.keys.Create(ctx, apiKey); err != nil {
		return nil, fmt.Errorf("bootstrap: persist key: %w", err)
	}
	if err := s.roles.Grant(ctx, &authdomain.ActorRole{
		ActorID:   actorName,
		ActorType: authdomain.ActorTypeValue(domain.ActorTypeAPIKey),
		RoleID:    authdomain.RoleIDAdmin,
		TenantID:  authdomain.DefaultTenantID,
		GrantedBy: "bootstrap",
	}); err != nil {
		return nil, fmt.Errorf("bootstrap: grant admin role: %w", err)
	}
	if s.keyStore != nil {
		s.keyStore.AddHashed(actorName, keyHash, true)
	}
	if s.audit != nil {
		// Phase 8 promotes event_category to a first-class column.
		// Bootstrap is unambiguously an auth event. Errors from the
		// audit write are intentionally ignored: the bootstrap mint
		// succeeded and the consequent audit-row miss is preferable
		// to surfacing a 500 to the operator after the admin key
		// already landed in the DB. The audit-row gap is detectable
		// in monitoring (every successful mint should have a paired
		// bootstrap.consume row).
		_ = s.audit.RecordEventWithCategory(ctx, "bootstrap-token", domain.ActorTypeSystem,
			"bootstrap.consume", domain.EventCategoryAuth, "api_key", apiKey.ID,
			map[string]interface{}{
				"actor_name": actorName,
				"role_id":    authdomain.RoleIDAdmin,
			})
	}
	return &MintResult{APIKey: apiKey, KeyValue: keyValue}, nil
}

// generateAPIKey returns 32 random bytes hex-encoded (64-char output).
// Same entropy budget as `openssl rand -hex 32`, which the agent
// bootstrap docs recommend.
func generateAPIKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}