Mirror of https://github.com/shankar0123/certctl.git, synced 2026-06-07 21:01:31 +00:00 (commit 3ef45e2ad4)
# Phase 6: day-0 admin bootstrap

* internal/auth/bootstrap/ (new package): Strategy interface + EnvTokenStrategy with constant-time compare, one-shot consumption via sync.Mutex, and an optional admin-existence probe. Bundle 2's OIDC-first-admin flow will plug in alongside it as an alternate Strategy.
* BootstrapService.ValidateAndMint: validates the operator's CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API key value, persists its SHA-256 hash to api_keys, grants r-admin via actor_roles, calls AddHashed on the runtime keystore so the just-minted key authenticates the next request without a restart, and records bootstrap.consume to the audit trail with category=auth.
* internal/auth/keystore.go (new): KeyStore interface + StaticKeyStore (immutable, env-var-only path) + MutableKeyStore (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth middleware now consumes a KeyStore so the bootstrap path can extend the lookup table at runtime.
* migrations/000031_api_keys.up/down.sql: api_keys table with (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by, created_at, expires_at, last_used_at). Idempotent.
* /v1/auth/bootstrap GET (probe) + POST (mint), both auth-exempt. Both routes are documented in api/openapi.yaml and the AuthExemptRouterRoutes allowlist is updated. The token never leaves internal/auth/bootstrap; the minted plaintext key flows only into the HTTP response body.
* A startup warning is emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND admin actors already exist (config drift signal).
* Tests: 4 strategy invariants (empty token born disabled, wrong token=ErrInvalidToken without consumption, one-shot consumption, admin-exists closes path), 5 service tests (happy path + actor-name validation + propagation of strategy errors + nil-deps guard + 32-byte entropy budget), 8 HTTP-handler tests (status 201/410/401/400 mapping + token-leak hygiene scan of slog + audit details + Location header). The token-leak test redirects slog.Default to a buffer for the test scope.
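A minimal sketch of the mint-then-register flow described for ValidateAndMint and MutableKeyStore, assuming the key is hex-encoded `crypto/rand` output and that the store indexes keys by their hex SHA-256 digest. `mintAPIKey`, `hashKey`, `keyStore` and `Lookup` are hypothetical names; only the `AddHashed` name comes from the description above, and its real signature may differ.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// mintAPIKey (hypothetical) returns a random plaintext key, shown to the
// operator once, plus the hex SHA-256 digest that would be persisted.
func mintAPIKey() (string, string, error) {
	buf := make([]byte, 32) // 32 bytes of entropy
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext := hex.EncodeToString(buf) // 64 hex characters
	return plaintext, hashKey(plaintext), nil
}

// hashKey hex-encodes the SHA-256 digest of a key value.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// keyStore is a hypothetical stand-in for a mutable keystore: it holds
// only hashes, and AddHashed extends the lookup table at runtime.
type keyStore struct {
	mu     sync.RWMutex
	hashes map[string]string // hex digest -> actor name
}

func newKeyStore() *keyStore { return &keyStore{hashes: map[string]string{}} }

// AddHashed registers an already-hashed key for an actor.
func (k *keyStore) AddHashed(hash, actor string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	k.hashes[hash] = actor
}

// Lookup hashes the presented key and returns the owning actor, if any.
func (k *keyStore) Lookup(presented string) (string, bool) {
	h := hashKey(presented)
	k.mu.RLock()
	defer k.mu.RUnlock()
	actor, ok := k.hashes[h]
	return actor, ok
}

func main() {
	store := newKeyStore()
	key, hash, err := mintAPIKey()
	if err != nil {
		panic(err)
	}
	store.AddHashed(hash, "bootstrap-admin") // hypothetical actor name
	actor, ok := store.Lookup(key)
	fmt.Println(len(key), len(hash), actor, ok) // 64 64 bootstrap-admin true
}
```

Storing only the digest means a copy of the api_keys table does not hand out usable keys; the plaintext exists only in the POST response body.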
# Phase 7: API-key migration + scope-down CLI

* GET /v1/auth/keys handler + service method ListKeys backed by ActorRoleRepository.ListDistinctActors. Returns one row per (actor_id, actor_type) pair with the slice of role IDs that actor holds. Permission: auth.role.list.
* internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown (interactive), AuthScopeDownNonInteractive (JSON config), AuthScopeDownSuggest (--suggest with optional --apply). The synthetic actor-demo-anon is filtered out of every interactive and bulk path; the non-interactive flow logs and skips it explicitly.
* SuggestRoleFromAuditEvents (pure function): walks 30 days of audit events per actor and returns the narrowest matching role (admin / mcp / viewer / agent / operator) plus a one-line reason. Classification: any admin-shaped action wins; otherwise all-MCP → mcp; all-read-only → viewer; all-agent-shaped → agent; otherwise operator. A test table pins all six classifications.
* CLI subcommand tree extended: 'auth keys list' + 'auth keys scope-down [--non-interactive <cfg>] [--suggest [--apply]]'.
* CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS call-out + four flow examples.

# Phase 8: auditor role + event_category column

* migrations/000032_audit_category.up/down.sql: ALTER TABLE audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT 'cert_lifecycle', plus a CHECK constraint (cert_lifecycle/auth/config) and (event_category) and (event_category, timestamp DESC) indexes for the auditor-filter query path. The WORM trigger from migration 000018 continues to enforce append-only at the DB layer (DDL is not blocked).
* domain.AuditEvent gains EventCategory string (omitempty); domain.EventCategoryCertLifecycle / Auth / Config constants.
* AuditService.RecordEventWithCategory, a sibling of RecordEvent; legacy callers stay on RecordEvent (which defaults to cert_lifecycle). Auth callers (RoleService, ActorRoleService, BootstrapService) switched to RecordEventWithCategory(..., 'auth', ...).
* GET /v1/audit?category=<cat>: the handler accepts the optional query param, validates it against the enum (400 on invalid value), and dispatches through ListAuditEventsByCategory. OpenAPI updated with the new query param + AuditEvent.event_category schema.
* Postgres AuditRepository.Create now writes event_category; AuditRepository.List filters on it; AuditFilter.EventCategory gates the WHERE clause.
* Tests: 5 audit-category-filter HTTP tests (dispatch routing, back-compat fallback, 400 for invalid values, all 3 enum values accepted, page+category combine, JSON output surfaces the field). 3 auditor-role invariants (auditor holds exactly audit.read+audit.export, no mutating perms, disjoint from viewer except audit.read).

# Cross-phase wiring

* HandlerRegistry.Bootstrap field added; cmd/server/main.go wires the bootstrap service ahead of RegisterHandlers (extracted an assembleNamedAPIKeys helper into auth_backfill.go and moved the keystore + bootstrap construction up alongside the auth repos).
* AuthCheckResolver / AuthActorRoleService extended with ListKeys to satisfy the Phase 7 surface; existing fakes updated.
* fakeAudit + mockAuditService test stubs gain RecordEventWithCategory + ListAuditEventsByCategory; existing tests untouched.

# Verifications

* gofmt -l: clean across every modified file.
* go vet ./...: clean.
* staticcheck across internal/auth + handler + router + cli + service + repository + cmd + domain: clean.
* go test -short -count=1: green across every Bundle-1-touched package: internal/auth (incl. bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service/auth, internal/service, internal/domain/auth, internal/repository/postgres, cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware, cmd/agent, internal/mcp.
195 lines
7.3 KiB
Go
// Package bootstrap ships the day-0 admin-creation primitive for Bundle 1
// Phase 6. The control plane comes up with no admin-roled actors; the
// operator hands the env-var token to a single curl call; the server
// mints the first admin API key, returns the key value once, then locks
// the bootstrap door behind it.
//
// The Strategy interface is the forward-compat seam: Bundle 2 plugs in an
// OIDC-first-admin strategy (the operator logs in via OIDC, the server
// recognizes their group claim, the first such login auto-grants r-admin)
// alongside the env-var-token strategy this file ships. Both implementations
// satisfy the same interface; the boot path picks one based on which
// CERTCTL_BOOTSTRAP_* env var is set.
package bootstrap

import (
	"context"
	"crypto/subtle"
	"errors"
	"sync"
)

// Sentinel errors the HTTP handler maps to status codes.
var (
	// ErrDisabled is returned when the bootstrap path is not callable,
	// either because (a) no token was set, (b) admin actors already
	// exist, or (c) the token was already consumed by an earlier call.
	// Maps to HTTP 410 Gone.
	ErrDisabled = errors.New("bootstrap: endpoint disabled")

	// ErrInvalidToken is returned when the supplied token does not
	// match the env-var token (constant-time compared). Maps to HTTP
	// 401 Unauthorized. Deliberately carries no detail about why the
	// token was rejected, so the response cannot be used to narrow
	// down the token value. An unset token is reported as ErrDisabled
	// instead, because NewEnvTokenStrategy builds that strategy
	// already consumed.
	ErrInvalidToken = errors.New("bootstrap: invalid token")

	// ErrInvalidActorName is returned when the requested admin-key
	// name is empty or contains characters that would break audit
	// attribution. Maps to HTTP 400 Bad Request.
	ErrInvalidActorName = errors.New("bootstrap: invalid actor name")
)

// Strategy is the Bundle 1 -> Bundle 2 forward-compat seam. Each
// strategy gates the day-0 admin path with a different credential type:
// Bundle 1 ships EnvTokenStrategy (CERTCTL_BOOTSTRAP_TOKEN); Bundle 2
// adds OIDCFirstAdminStrategy (CERTCTL_BOOTSTRAP_OIDC_GROUP). The
// service holds whichever strategy was wired at boot.
type Strategy interface {
	// Available reports whether the strategy is currently callable.
	// It returns false once the strategy is consumed (one-shot
	// semantics) or once the strategy detects an existing admin (via
	// the AdminExistenceProbe). The HTTP handler maps !Available to 410
	// Gone before doing any token validation, so probing for "is there
	// a bootstrap path open" is safe.
	Available(ctx context.Context) (bool, error)

	// Validate consumes the credential and returns nil when the caller
	// is permitted to mint the first admin. The strategy MUST atomically
	// flip its consumed state on the first successful Validate so a
	// concurrent racing call gets ErrDisabled. Returning a non-nil
	// error MUST NOT mark the strategy consumed; the operator can
	// retry with the correct credential.
	Validate(ctx context.Context, token string) error
}

// AdminExistenceProbe is the callback the EnvTokenStrategy uses to ask
// the actor-role repository whether any actor holds r-admin. It lives at
// this package boundary so the strategy doesn't import internal/repository
// (doing so would create a cycle: bootstrap -> repository -> postgres ->
// bootstrap once the postgres adapter is wired).
type AdminExistenceProbe func(ctx context.Context) (bool, error)

// EnvTokenStrategy is the env-var-token Bundle 1 implementation. The
// operator sets CERTCTL_BOOTSTRAP_TOKEN, the server boots with this
// strategy, the first valid Validate call atomically flips the
// `consumed` flag, and the next call returns ErrDisabled.
//
// The token comparison is crypto/subtle.ConstantTimeCompare so timing
// attacks can't leak the token byte-by-byte. The token itself never
// leaves this package: the strategy holds it in memory, the handler
// receives only error sentinels, and the audit row records the event
// but not the token value.
type EnvTokenStrategy struct {
	token       string              // set once at construction; never mutated
	probe       AdminExistenceProbe // optional; nil = skip the existence probe
	mu          sync.Mutex          // guards consumed
	consumed    bool                // flipped to true after first successful Validate
	tokenLength int                 // cached for early-reject fast path
}

// NewEnvTokenStrategy constructs the env-var-token strategy. token must
// be the raw value of CERTCTL_BOOTSTRAP_TOKEN. probe is optional; when
// non-nil it gates Available + Validate on "no admin exists yet" so the
// caller can't bootstrap a second admin after the fleet has stabilized.
//
// When token is empty the returned strategy is born consumed:
// Available returns false and Validate returns ErrDisabled. This matches
// the boot-path contract that an unset env var disables the endpoint.
func NewEnvTokenStrategy(token string, probe AdminExistenceProbe) *EnvTokenStrategy {
	s := &EnvTokenStrategy{
		token:       token,
		probe:       probe,
		tokenLength: len(token),
	}
	if token == "" {
		s.consumed = true
	}
	return s
}

// Available implements Strategy.
func (s *EnvTokenStrategy) Available(ctx context.Context) (bool, error) {
	s.mu.Lock()
	consumed := s.consumed
	s.mu.Unlock()
	if consumed {
		return false, nil
	}
	if s.probe != nil {
		exists, err := s.probe(ctx)
		if err != nil {
			return false, err
		}
		if exists {
			return false, nil
		}
	}
	return true, nil
}

// Validate implements Strategy.
func (s *EnvTokenStrategy) Validate(ctx context.Context, token string) error {
	// Fast path: if the strategy is disabled, return ErrDisabled before
	// doing any constant-time compare. The state flip below acquires
	// the same mutex, so this read is safe.
	s.mu.Lock()
	if s.consumed {
		s.mu.Unlock()
		return ErrDisabled
	}
	// Refuse zero-length tokens up front. ConstantTimeCompare returns
	// 1 when both inputs are empty, which would otherwise open a
	// permanent backdoor on misconfigured deployments where token=""
	// at construction. NewEnvTokenStrategy already covers that case;
	// this check is belt-and-braces in case a future caller builds the
	// strategy directly.
	if s.tokenLength == 0 || len(token) == 0 {
		s.mu.Unlock()
		return ErrInvalidToken
	}
	// Constant-time compare. ConstantTimeCompare returns 0 immediately
	// when the lengths differ, so the token's length is not protected;
	// only its contents are compared in constant time.
	if subtle.ConstantTimeCompare([]byte(s.token), []byte(token)) != 1 {
		s.mu.Unlock()
		return ErrInvalidToken
	}
	// External probe: respect the "admin already exists" gate even
	// after a valid token was supplied. This closes the race where a
	// fleet first-admin lands during the gap between Available and
	// Validate.
	if s.probe != nil {
		// Drop the lock for the probe: repo calls may be slow, and
		// holding the mutex through I/O would serialize every
		// concurrent bootstrap attempt. Re-acquire afterwards.
		s.mu.Unlock()
		exists, err := s.probe(ctx)
		if err != nil {
			return err
		}
		if exists {
			return ErrDisabled
		}
		s.mu.Lock()
		// Re-check consumed because a concurrent caller might have
		// flipped it while we were probing.
		if s.consumed {
			s.mu.Unlock()
			return ErrDisabled
		}
	}
	s.consumed = true
	s.mu.Unlock()
	return nil
}

// IsConsumed reports whether the strategy has already been used. Test
// helper; production callers should use Available, which also runs the
// admin-existence probe.
func (s *EnvTokenStrategy) IsConsumed() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.consumed
}