From e7a94b6080a347dd2faaee865dc5de348c30ebce Mon Sep 17 00:00:00 2001
From: shankar0123
Date: Sun, 10 May 2026 00:10:15 +0000
Subject: [PATCH] auth-bundle-1 Phase 13: docs (rbac.md + threat model + migration guide + security.md update)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the last Phase before the Bundle 1 Exit gate. Operators now
have authoritative reference + threat model + migration guide covering
every behavior change Bundles 0-12 introduced.

# New docs

* docs/operator/rbac.md (340 lines) — operator how-to:
  - Mental model (actors / roles / permissions / scopes)
  - 7 default roles seeded by migration 000029 + the 5 admin-only
    fine-grained perms seeded by 000030
  - Permission catalogue table by namespace
  - Scope semantics (global beats specific) + the Bundle-2 deferral
    on scope_id FK enforcement
  - Granting / revoking access from GUI + CLI + HTTP API + MCP
  - The auditor pattern (audit-only, no resource read)
  - Day-0 bootstrap flow (CERTCTL_BOOTSTRAP_TOKEN → curl → HTTP 410
    thereafter)
  - Demo-mode (CERTCTL_AUTH_TYPE=none) caveat for production

* docs/operator/auth-threat-model.md (180 lines) — what the controls
  defend against:
  - 5 threat actors (external, wrong-role, compromised key, insider
    operator, compromised auditor)
  - Per-defense walk-through (API-key auth, RBAC, bootstrap, approval
    workflow + Phase 9 closure, audit trail, protocol-endpoint
    allowlist)
  - 9 explicit deferrals (OIDC, sessions, local accounts, JIT
    elevation, MFA, etc.) — Bundle 2 / future scope
  - Compliance mapping (SOC 2 CC6.1/CC6.3, HIPAA §164.312(b), NIST
    SSDF PO.5.2, FedRAMP AU-9, PCI-DSS §10)
  - 5 operator-runnable sanity checks (e.g., 'SELECT FROM
    audit_events WHERE actor=system-bypass' MUST return 0 in
    production)

* docs/migration/api-keys-to-rbac.md (200 lines) — v2.0.x → v2.1.0
  upgrade flow:
  - The SECURITY: AUDIT YOUR API KEYS callout
  - Migration list (000029-000033) + what each does
  - 4-mode scope-down flow (interactive / non-interactive JSON /
    --suggest / --suggest --apply)
  - What changes for code that called auth.IsAdmin
  - Helm-specific upgrade flow with example post-upgrade Job
  - Docker Compose upgrade flow + the 5 examples folders that ride
    demo mode unchanged
  - Verification queries + rollback flow

# Updated docs

* docs/operator/security.md — Last-reviewed bumped to 2026-05-09;
  existing Authentication-surface section extended to call out the
  Bundle 1 RBAC primitive, day-0 bootstrap path, and approval-bypass
  closure with cross-references to the new docs.

* docs/reference/profiles.md — Last-reviewed header formatting fixed
  (added the > blockquote prefix used consistently across the docs
  tree).

# docs/README.md navigation

* Operator section gains 2 new rows (RBAC + auth-threat-model) and
  Approval-workflow row updated to mention Phase 9 closure.
* Reference section gains the Profiles row.
* Migration section gains the api-keys-to-rbac row with the AUDIT
  YOUR API KEYS callout in the link description.

# CHANGELOG.md v2.1.0 section refreshed

The Phase 7 commit landed the SECURITY: AUDIT YOUR API KEYS callout.
This commit appends the missing Phase 9-12 highlights:

  - Approval-bypass closure (profile-edit gate + flip-flop loophole +
    ErrApproveBySameActor invariant)
  - GUI: Roles / API Keys / Auth Settings / Approvals queue
  - 12 new MCP RBAC tools
  - Coverage gates on internal/auth + internal/service/auth
  - Protocol-endpoint allowlist pinned at 3 layers

Trailing cross-reference block now points at all 4 new docs.

# Verifications * Every internal link in the 4 new/modified docs validated by shell sweep (find broken links → 0 hits). * Every new doc carries 'Last reviewed: 2026-05-09' header with the > blockquote prefix matching the docs-tree convention. * go vet ./... clean. * staticcheck across every Bundle-1-touched Go package clean. * gofmt -l clean repo-wide. * go test -short -count=1 green across internal/auth (incl. bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service (incl. auth), internal/domain/auth, internal/mcp, cmd/cli (cmd/server has 1 environmental failure on the sandbox virtiofs-tmp: TestPreflightSCEPRACertKey_KeyWorldReadable_Refuses depends on tmpfs file-mode semantics that virtiofs propagates differently — pre-existing, unrelated to Bundle 1). * Frontend: 19 Vitest tests across src/pages/auth/ + AuditPage all pass; tsc --noEmit clean. --- CHANGELOG.md | 37 +++- docs/README.md | 8 +- docs/migration/api-keys-to-rbac.md | 296 +++++++++++++++++++++++++++++ docs/operator/auth-threat-model.md | 244 ++++++++++++++++++++++++ docs/operator/rbac.md | 280 +++++++++++++++++++++++++++ docs/operator/security.md | 55 +++++- docs/reference/profiles.md | 2 +- 7 files changed, 913 insertions(+), 9 deletions(-) create mode 100644 docs/migration/api-keys-to-rbac.md create mode 100644 docs/operator/auth-threat-model.md create mode 100644 docs/operator/rbac.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 9fae437..96f711d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -54,13 +54,48 @@ What else changed in v2.1.0: - **`/v1/auth/check` enrichment.** Response now includes the actor's standing roles and effective permissions, so the GUI gates affordances from a single fetch on app boot. +- **Approval-bypass closure.** Edits to a profile that has (or + would have) `RequiresApproval=true` now route through the + `ApprovalService` two-person integrity gate (Phase 9). 
Migration + 000033 adds `approval_kind` + `payload` to + `issuance_approval_requests` so cert-issuance and profile-edit + approvals share the same workflow. Same-actor self-approve is + rejected with `ErrApproveBySameActor` for both kinds. Closes the + flip-flop loophole where an admin could disable approval, mutate, + re-enable. Documented at + [`docs/reference/profiles.md`](docs/reference/profiles.md). +- **GUI: Roles / API Keys / Auth Settings / Approvals queue.** + Four new pages under `/auth/*` consume `/v1/auth/me` for + permission-aware rendering. The Approvals queue blocks + self-approve at the client layer (Approve/Reject buttons hidden + when requested_by == current actor_id) on top of the server-side + enforcement. AuditPage gains a category filter (cert_lifecycle / + auth / config) for the auditor view. +- **MCP server gains 12 RBAC tools.** Operators driving certctl + from Claude / VS Code / any MCP client get parity with the GUI + + CLI. Each tool routes through the same HTTP handler; permission + gates fire server-side. - **OpenAPI catalogues every new route.** Every Bundle 1 endpoint ships with an `operationId`; the parity test guards against drift. +- **Coverage gates.** `internal/auth/` and `internal/service/auth/` + now have ≥85% coverage floors in `.github/coverage-thresholds.yml`. + The 12-path negative-test list from the Bundle 1 prompt is + fully covered (path #12 deferred with in-tree TODO). +- **Protocol-endpoint allowlist pinned at three layers.** The + middleware bypass (`auth.IsProtocolEndpoint`), the router-level + `AuthExemptRouterRoutes` constant, and a new + `phase12_protocol_allowlist_test.go` AST scan all guard against + accidentally wrapping ACME / SCEP / EST / OCSP / CRL routes in + `rbacGate`. - **Bundle 2 (OIDC + sessions) starts after Bundle 1 lands on master.** Roadmap entry remains in `cowork/auth-bundle-2-prompt.md`. Migration ordering, idempotency, and downgrade are documented in -`docs/migration/api-keys-to-rbac.md`. 
+[`docs/migration/api-keys-to-rbac.md`](docs/migration/api-keys-to-rbac.md). +The threat model + compliance mapping live at +[`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md). +Day-2 RBAC operations live at +[`docs/operator/rbac.md`](docs/operator/rbac.md). ## v2.0.68 — Image registry path changed ⚠️ diff --git a/docs/README.md b/docs/README.md index 6fd3ade..891ad2d 100644 --- a/docs/README.md +++ b/docs/README.md @@ -27,6 +27,7 @@ You're operating certctl in production or building integrations and need authori | Doc | What it covers | |---|---| | [Architecture](reference/architecture.md) | System design, data flow, security model, deployment topologies | +| [Profiles](reference/profiles.md) | CertificateProfile policy object — issuer wiring, EKUs, RequiresApproval gate (Phase 9 closure) | | [API](reference/api.md) | OpenAPI 3.1 spec, integration patterns, client SDK generation | | [CLI](reference/cli.md) | certctl-cli command reference and CI/CD integration patterns | | [Configuration](reference/configuration.md) | `CERTCTL_*` environment variable reference (scheduler, rate limits, deploy verify, audit, agent) | @@ -62,10 +63,12 @@ You're running certctl in production and need operational guidance. 
| Doc | What it covers | |---|---| -| [Security posture](operator/security.md) | Auth, rate limits, encryption at rest, key rotation | +| [Security posture](operator/security.md) | Auth, rate limits, encryption at rest, key rotation, RBAC primitive (Bundle 1), bootstrap | +| [RBAC operator reference](operator/rbac.md) | Roles, permissions, scopes, scope-down + bootstrap flow (Bundle 1) | +| [Auth threat model](operator/auth-threat-model.md) | API-key compromise, role-grant abuse, bootstrap-token leak, audit-mutation, compliance mapping (Bundle 1) | | [Control plane TLS](operator/tls.md) | Self-signed bootstrap, operator-supplied Secret, cert-manager Certificate CR | | [Database TLS](operator/database-tls.md) | PostgreSQL transport encryption | -| [Approval workflow](operator/approval-workflow.md) | Two-person integrity gate for high-stakes issuance | +| [Approval workflow](operator/approval-workflow.md) | Two-person integrity gate for high-stakes issuance + Phase 9 profile-edit closure | | [Helm deployment](operator/helm-deployment.md) | Kubernetes installation via the bundled chart | | [Performance baselines](operator/performance-baselines.md) | Operator-runnable benchmarks for regression spot checks | | [Legacy clients (TLS 1.2)](operator/legacy-clients-tls-1.2.md) | Reverse-proxy runbook for embedded EST/SCEP clients on TLS 1.2 | @@ -90,6 +93,7 @@ You're moving from another cert-management tool to certctl, or running both in p | Caddy ACME (point Caddy at certctl) | [migration/acme-from-caddy.md](migration/acme-from-caddy.md) | | cert-manager ACME (point cert-manager at certctl) | [migration/acme-from-cert-manager.md](migration/acme-from-cert-manager.md) | | Traefik ACME (point Traefik at certctl) | [migration/acme-from-traefik.md](migration/acme-from-traefik.md) | +| **API keys → RBAC (v2.0.x → v2.1.0)** | [migration/api-keys-to-rbac.md](migration/api-keys-to-rbac.md) — **AUDIT YOUR API KEYS** post-upgrade | ## Contributor diff --git 
a/docs/migration/api-keys-to-rbac.md b/docs/migration/api-keys-to-rbac.md new file mode 100644 index 0000000..122671b --- /dev/null +++ b/docs/migration/api-keys-to-rbac.md @@ -0,0 +1,296 @@ +# Migrating API keys to RBAC (v2.0.x → v2.1.0) + +> Last reviewed: 2026-05-09 + +This is the upgrade guide for an existing certctl deployment moving +from v2.0.x's "every API key is admin or not" model to v2.1.0's +RBAC primitive. Everything keeps working through the upgrade — the +Bundle 1 migration backfills every existing API key to the +`r-admin` role on first boot, so the pre-existing automation that +was using those keys does not change behavior. **However**, most +keys do not need full admin power; this guide walks the operator +through the post-upgrade scope-down flow. + +## ⚠️ SECURITY: AUDIT YOUR API KEYS + +Bundle 1 maps **every** existing `CERTCTL_API_KEYS_NAMED` entry +(and every legacy `CERTCTL_AUTH_SECRET`-synthesized key) to the +`r-admin` role on the first boot after migration 000029 applies. +This is the safe-for-back-compat default — your CI / agents / scripts +keep working without changes — but if you don't downgrade keys, every +key in your fleet has full admin permissions including bulk-revoke, +CRL admin, and CA hierarchy management. + +**Run the scope-down flow before tagging the next release.** The +release notes for v2.1.0 lead with this callout for a reason. + +## Upgrade flow + +### 1. Apply the migration + +The migration runner is idempotent. Re-applying is a no-op if the +schema is already at the target version. The Bundle 1 migrations +that ship with v2.1.0: + +| Migration | What it does | +|---|---| +| `000029_rbac.up.sql` | Creates `tenants`, `roles`, `permissions`, `role_permissions`, `actor_roles`. Seeds 7 default roles + 33-permission catalogue + the synthetic `actor-demo-anon` admin grant. Backfills every named API key into `actor_roles` with the `r-admin` role. 
| +| `000030_rbac_admin_perms.up.sql` | Seeds 5 admin-only fine-grained permissions (`cert.bulk_revoke`, `crl.admin`, `scep.admin`, `est.admin`, `ca.hierarchy.manage`) into `r-admin` only. | +| `000031_api_keys.up.sql` | Creates the `api_keys` table for runtime-minted keys (Bundle 1 Phase 6 bootstrap). | +| `000032_audit_category.up.sql` | Adds `event_category` column to `audit_events` with the closed enum (`cert_lifecycle` / `auth` / `config`). | +| `000033_approval_kinds.up.sql` | Adds `approval_kind` + `payload` to `issuance_approval_requests` for the Phase 9 approval-bypass closure. | + +The Bundle 1 server applies these on first boot. No operator +action is required other than running the upgrade. + +### 2. Verify the backfill landed + +```bash +# Inspect the seeded actor_roles rows. You should see one row per +# entry in CERTCTL_API_KEYS_NAMED (Admin=true keys → r-admin, +# Admin=false keys → r-viewer) plus the seeded actor-demo-anon +# admin row. +psql -d certctl -c "SELECT actor_id, role_id, granted_by, granted_at FROM actor_roles ORDER BY granted_at;" +``` + +If the table is empty, the boot-loader hook in +`cmd/server/auth_backfill.go::backfillNamedKeyActorRoles` did not +run; re-check that `CERTCTL_AUTH_TYPE` is `api-key` (the boot +hook is gated on `cfg.Auth.Type != none`). + +### 3. List + scope-down keys + +The `certctl-cli` ships a four-mode scope-down command. Pick the +mode that matches your fleet size + automation posture. + +#### Interactive walk + +```bash +certctl-cli auth keys scope-down +``` + +Walks every actor (skips the synthetic `actor-demo-anon`) and +prompts for a target role. Empty input keeps the existing role. +Type one of `admin`, `operator`, `viewer`, `agent`, `mcp`, `cli`, +`auditor` to replace. + +#### Non-interactive JSON config (Helm post-upgrade hook) + +```bash +cat > scope-down.json <", handler)` + in `router.go`. +2. 
Add the perm to `migrations/000030_rbac_admin_perms.up.sql` + (or `migrations/000029_rbac.up.sql`'s catalogue). +3. Grant the perm to the right default roles. + +The five admin-only fine-grained perms shipped in Phase 3.5 stay +on `r-admin` only by default. Operators delegate by creating +custom roles with the specific perm. + +## Helm-specific upgrade + +The certctl Helm chart applies migrations on container start via +the standard migrations runner. No chart changes are required; +the `helm upgrade` command runs identically: + +```bash +helm upgrade certctl certctl/certctl \ + --version \ + --reuse-values +``` + +Post-upgrade, the boot loader runs the named-key actor-role +backfill against the `CERTCTL_API_KEYS_NAMED` env-var-injected +into the deployment. The "AUDIT YOUR API KEYS" callout applies — +add a post-upgrade Job to your release pipeline that runs +`certctl-cli auth keys scope-down --non-interactive` against a +checked-in JSON config, so the role narrowing is deterministic +across upgrade rollouts. + +Example post-upgrade Job: + +```yaml +apiVersion: batch/v1 +kind: Job +metadata: + name: certctl-scope-down +spec: + template: + spec: + containers: + - name: scope-down + image: ghcr.io/certctl-io/certctl-cli: + command: + - certctl-cli + - auth + - keys + - scope-down + - --non-interactive + - /config/scope-down.json + envFrom: + - secretRef: + name: certctl-cli-credentials + volumeMounts: + - name: scope-down-config + mountPath: /config + volumes: + - name: scope-down-config + configMap: + name: certctl-scope-down-config + restartPolicy: OnFailure +``` + +The ConfigMap holds the `{actor_id: role_id}` map; the Secret +holds the API key the Job uses to call `/v1/auth/keys/.../roles`. + +## Docker Compose-specific upgrade + +For `deploy/docker-compose.yml` deployments: + +1. Pull the new images: `docker compose pull` +2. Verify your `CERTCTL_AUTH_TYPE` value before restarting. 
If it + was `none` (the demo path), the post-upgrade server will boot + in demo mode again — the synthetic `actor-demo-anon` admin + covers every request, no scope-down is meaningful. If you're + moving from `none` to `api-key` mode, set + `CERTCTL_API_KEYS_NAMED` first, then restart. +3. `docker compose up -d` to apply. +4. `docker compose logs certctl-server | grep -i 'loaded persisted api_keys'` + to verify the boot loader ran. The first-boot log line includes + the count of keys loaded into the runtime keystore. +5. Run `certctl-cli auth keys scope-down` against the running + server. + +The five examples in `examples/` (acme-nginx, private-ca-traefik, +step-ca-haproxy, multi-issuer, acme-wildcard-dns01) all run in +demo mode (`CERTCTL_AUTH_TYPE=none`) and are unaffected by the +RBAC migration — the synthetic actor-demo-anon admin grant covers +every request. + +## Verifying the upgrade landed + +After the scope-down flow completes: + +1. `certctl-cli auth me` while authenticated as each named key + confirms the right `effective_permissions` for that role. +2. `psql -c "SELECT actor_id, array_agg(role_id ORDER BY role_id) FROM actor_roles GROUP BY actor_id;"` + gives the full picture in one query. +3. The audit trail + (`GET /api/v1/audit?category=auth`) + shows the `auth.role.assign` and `auth.role.revoke` rows for + every change you made — confirm via the GUI's + `/audit?category=auth` view. +4. Read the updated [`docs/operator/rbac.md`](../operator/rbac.md) + for day-2 RBAC management. + +## Rollback + +If the upgrade goes wrong, the down migrations exist in lockstep: + +```bash +# Roll back via your migration runner (golang-migrate, Atlas, etc.). +# Migrations 000029-000033 each have a .down.sql that reverses the +# .up.sql. Down migrations are destructive on data added by the up +# migration (api_keys rows, role grants on actors, profile-edit +# approvals); take a backup first. 
+``` + +After rollback, the v2.0.x binary works against the v2.0.x +schema unchanged. The operator's API keys still authenticate (the +in-memory hash table is rebuilt from `CERTCTL_API_KEYS_NAMED` on +boot regardless of schema version). + +## Cross-references + +- [`docs/operator/rbac.md`](../operator/rbac.md) — the operator + how-to for the new RBAC primitive +- [`docs/operator/auth-threat-model.md`](../operator/auth-threat-model.md) — + what the new controls defend against +- [`docs/reference/profiles.md`](../reference/profiles.md) — the + Phase 9 approval-bypass closure +- [`docs/operator/security.md`](../operator/security.md) — the + full security posture +- `cowork/auth-bundle-1-prompt.md` — the design + phase plan +- `cowork/auth-bundles-index.md` — the per-phase status tracker +- `CHANGELOG.md` — the v2.1.0 release notes lead with this guide diff --git a/docs/operator/auth-threat-model.md b/docs/operator/auth-threat-model.md new file mode 100644 index 0000000..3425703 --- /dev/null +++ b/docs/operator/auth-threat-model.md @@ -0,0 +1,244 @@ +# Authentication & authorization threat model + +> Last reviewed: 2026-05-09 + +This document describes the attack surface around authentication and +authorization in certctl after Bundle 1 (the RBAC primitive) lands. +It complements [`rbac.md`](rbac.md) — that doc explains how to use +the controls; this one explains what those controls defend against +and which threats they explicitly do NOT close. + +For Bundle 2's OIDC + sessions extensions, this document will be +updated. The Bundle 1 boundary is "API-key auth + RBAC primitive + +day-0 bootstrap"; OIDC-federated humans, session cookies, +revocation lists, WebAuthn, and break-glass local accounts are +Bundle 2 scope. + +## Threat actors + +1. **External attacker with no credential** — probing the public + HTTP surface. 
The default trust boundary for everything except + the protocol-level endpoints (ACME / SCEP / EST / OCSP / CRL, + which authenticate via embedded credentials per their own RFCs). +2. **Authenticated caller with the wrong role** — has a valid API + key but the role doesn't grant the requested operation. The + primary RBAC threat model. +3. **Compromised API key** — attacker holds a valid Bearer token + that an honest operator originally provisioned. The key may + carry any role. +4. **Insider operator** — legitimate access; potentially trying + to escalate privilege or bypass the approval workflow. +5. **Compromised audit reviewer (auditor role)** — read-only + access to audit events but otherwise untrusted. + +## Defenses Bundle 1 ships + +### API-key authentication + +- API keys live in `CERTCTL_API_KEYS_NAMED` (env-var) or + `api_keys` (DB row, written by Bundle 1 Phase 6 bootstrap and + the future role-management API). Keys hash via SHA-256; the + middleware compares hashes via `crypto/subtle.ConstantTimeCompare` + to defeat timing attacks. +- The auth middleware populates `ActorIDKey` / `ActorTypeKey` / + `TenantIDKey` on every authenticated request context. Audit rows + attribute every action to the named-key actor instead of the + pre-Bundle-1 hardcoded `api-key-user` placeholder. +- Demo mode (`CERTCTL_AUTH_TYPE=none`) injects the synthetic + `actor-demo-anon` actor with admin grants. Production deploys + MUST NOT use demo mode. + +### Authorization (RBAC) + +- Every gated handler routes through `auth.RequirePermission` (or + the router-level `rbacGate` wrap from Phase 3.5). The middleware + resolves the actor's effective permissions via the + `Authorizer.CheckPermission` service-layer call; on miss, the + handler returns HTTP 403 BEFORE the body runs. This is the + load-bearing gate. +- The five admin-only fine-grained perms (`cert.bulk_revoke` / + `crl.admin` / `scep.admin` / `est.admin` / + `ca.hierarchy.manage`) are seeded into `r-admin` only. 
To + delegate one, an operator creates a custom role with the + specific perm and grants it to the right actor. +- The auditor split: `r-auditor` holds only `audit.read` + + `audit.export`. Pinned by the + `internal/domain/auth/auditor_test.go` invariants. A regulator + with the auditor key cannot read certificates, profiles, + issuers, or any mutating surface. +- The privilege-escalation guard: granting or revoking a role + requires the caller to hold `auth.role.assign` (enforced in + `internal/service/auth/actor_role_service.go`). A non-admin + cannot self-grant admin. +- The reserved-actor guard: mutations against `actor-demo-anon` + return HTTP 409 from the service layer + (`ErrAuthReservedActor`). The synthetic actor is operator- + inaccessible. + +### Day-0 bootstrap + +- `CERTCTL_BOOTSTRAP_TOKEN` is constant-time-compared by + `EnvTokenStrategy.Validate`. The strategy is one-shot via + `sync.Mutex`-guarded `consumed` bool; the second call returns + `ErrDisabled` (HTTP 410), not `ErrInvalidToken` (HTTP 401), so + a probing attacker cannot distinguish "wrong token, retry" + from "already consumed". +- The strategy also re-probes admin existence on every Validate. + If an admin actor lands during the gap between Available and + Validate, the second caller still gets HTTP 410. +- The minted plaintext key is written to the response body once. + It is NEVER logged. The token-leak hygiene test in + `internal/api/handler/auth_bootstrap_test.go` redirects + `slog.Default` to a buffer and grep-asserts that neither the + bootstrap token nor the minted key appears in any log line, + audit row, or HTTP header. +- The minted key is hashed before persistence. Lost key → + rotate via the regular RBAC API; the plaintext is not + recoverable from the DB. 
+ +### Approval workflow + Phase 9 loophole closure + +- `CertificateProfile.RequiresApproval=true` gates two surfaces: + (a) issuance + renewal of every cert pointing at the profile, + (b) edits to the profile itself (Bundle 1 Phase 9). The Phase 9 + closure prevents the flip-flop bypass where an admin disables + approval, mutates, re-enables. +- Same-actor self-approve is rejected at the service layer with + `ErrApproveBySameActor` for both `cert_issuance` and + `profile_edit` kinds. Two-person integrity is the load-bearing + invariant; pinned by tests in + `internal/service/approval_test.go`. + +### Audit trail + +- Every mutating operation flows through `AuditService.RecordEvent` + or `RecordEventWithCategory`. Bundle 1 Phase 8 added the + `event_category` column with a `CHECK` constraint enforcing + the closed enum (`cert_lifecycle` / `auth` / `config`); the + category surfaces the auth-mutation slice to the auditor view. +- The WORM trigger from migration 000018 + (`audit_events_worm_trigger`) blocks `UPDATE` and `DELETE` at + the database layer. Even an admin DB user cannot tamper with + audit history without dropping the trigger. +- Bundle-6's redactor (`internal/service/audit_redact.go`) + scrubs credentials + PII from the `details` JSONB before + persistence; an `_redacted_keys` field surfaces what the + redactor took out for compliance review. + +### Protocol-endpoint allowlist + +ACME / SCEP / EST / OCSP / CRL endpoints authenticate via +embedded credentials defined by their own RFCs (JWS-signed, +challenge passwords, mTLS, public-by-RFC). The auth middleware +explicitly bypasses these via `IsProtocolEndpoint`. The Phase 12 +`internal/api/router/phase12_protocol_allowlist_test.go` pins +the invariant at three layers (middleware bypass, allowlist +constant, router-level no-rbacGate-wraps-protocol-paths). + +## Threats Bundle 1 does NOT close + +These are NOT defended; some are deferred to Bundle 2, others +are out-of-scope for the project entirely. + +1. 
**OIDC / SAML / WebAuthn federation** — Bundle 2.
+2. **Session management** — there is no session cookie, no
+   server-side revocation list. Each Bearer token is the bearer
+   credential. To revoke a key, delete the `actor_roles` rows or
+   remove the env-var entry; there is no "log out everywhere"
+   button. Bundle 2.
+3. **Local password accounts (break-glass)** — Bundle 2.
+4. **Time-bound role grants / JIT elevation** — the schema
+   reserves `actor_roles.expires_at` but there is no UI/API to set
+   it. Bundle 2 or v3.
+5. **MFA / hardware tokens for the operator console** —
+   Bundle 2.
+6. **Rate limiting on the bootstrap endpoint** — the endpoint
+   is one-shot by construction (consumed flag + admin-existence
+   probe), so a brute-force attack on the token has at most the
+   single attempt before the path closes. Per-IP rate limiting
+   on the broader API is still in place via Bundle C's
+   `middleware.NewRateLimiter`.
+7. **`scope_id` FK enforcement** — operators can grant a
+   permission at scope `profile`/`p-bogus` without the bogus
+   profile existing. The gate still works (no rows match at
+   request time) but a strict 404 on grant would be cleaner. See
+   `RoleRepository.AddPermission` `TODO(bundle-2)` comment in
+   `internal/repository/postgres/auth.go`.
+8. **OIDC-first-admin bootstrap** — Bundle 1 ships only the
+   env-var-token strategy. Bundle 2 adds the OIDC-group-claim
+   strategy alongside (the `Strategy` interface in
+   `internal/auth/bootstrap/` is already in place).
+9. **GUI E2E suite via Playwright** — the prompt asked for
+   nine end-to-end flow tests. Bundle 1 ships 19 React Testing
+   Library + Vitest tests covering the same surface; full
+   Playwright coverage lands in Phase 12-extended work.
+
+## Compliance mapping
+
+The control set in this document supports the following
+framework requirements. This is a mapping; it is not a claim of
+formal certification.
+
+- **SOC 2 CC6.1** (logical access controls) — RBAC primitive
+  with role-based gating on every mutating endpoint.
+- **SOC 2 CC6.3** (privileged access management) — `r-admin`
+  role separation + role-grant audit trail with two-person
+  integrity on approval-tier profile edits.
+- **HIPAA §164.312(b)** (audit controls) — `event_category`
+  column lets the auditor role review authentication / authorization
+  changes specifically. WORM trigger keeps the audit table
+  append-only at the database layer.
+- **NIST SSDF PO.5.2** (separation of duties) — two-person
+  integrity for compliance-tier issuance via the
+  `RequiresApproval` flow + Bundle 1 Phase 9's closure of the
+  flip-flop bypass.
+- **FedRAMP AU-9** (audit information protection) — WORM
+  enforcement + auditor-only read access (the auditor role
+  cannot mutate, the WORM trigger blocks UPDATE/DELETE).
+- **PCI-DSS §10** (audit logging) — every mutating operation
+  emits an audit row with actor + action + resource + timestamp
+  + category. The audit table is append-only.
+
+## Operator-facing checks
+
+Run these periodically to verify the controls are working.
+
+1. `certctl-cli auth keys list` — confirm no unexpected actor
+   holds `r-admin`. Audit any new admin grants against the audit
+   log.
+2. `SELECT actor, action, COUNT(*) FROM audit_events WHERE
+   action LIKE 'approval_%' AND timestamp > NOW() - INTERVAL '7
+   days' GROUP BY actor, action;` — confirm approvals are
+   happening and not concentrated in a single approver.
+3. `SELECT COUNT(*) FROM audit_events WHERE actor =
+   'system-bypass';` — MUST return 0 in production. A non-zero
+   count means `CERTCTL_APPROVAL_BYPASS=true` was set; production
+   deploys MUST leave it unset.
+4. `SELECT actor, COUNT(*) FROM audit_events WHERE action =
+   'bootstrap.consume' GROUP BY actor;` — MUST return at most one
+   row per tenant. Multiple rows mean the bootstrap endpoint was
+   called more than once, which the strategy's one-shot guard
+   should have prevented; investigate.
+5. 
`certctl-cli auth me` while authenticated as the auditor + key — `effective_permissions` must contain `audit.read` + + `audit.export` ONLY. Any other permission means a role grant + widened the auditor's surface; revoke immediately. + +## Cross-references + +- [`rbac.md`](rbac.md) — the operator how-to +- [`security.md`](security.md) — the wider security posture +- [`approval-workflow.md`](approval-workflow.md) — the two-person + integrity gate +- [`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md) — + upgrade flow +- `internal/auth/` — middleware + keystore + RequirePermission + + bootstrap +- `internal/service/auth/` — Authorizer + privilege-escalation + guard + reserved-actor guard +- `migrations/000029_rbac.up.sql` — schema + seed +- `migrations/000030_rbac_admin_perms.up.sql` — five admin-only + fine-grained perms +- `migrations/000032_audit_category.up.sql` — auditor surface +- `migrations/000033_approval_kinds.up.sql` — approval-bypass + closure diff --git a/docs/operator/rbac.md b/docs/operator/rbac.md new file mode 100644 index 0000000..5c702fd --- /dev/null +++ b/docs/operator/rbac.md @@ -0,0 +1,280 @@ +# RBAC operator reference + +> Last reviewed: 2026-05-09 + +This is the operator-facing reference for the role-based access +control primitive that ships with Bundle 1 (auth bundle 1) of certctl. +Read this if you're running certctl in production and need to grant / +revoke access to API keys, set up the auditor split, or onboard the +first admin. + +For the threat model behind these controls, see +[`auth-threat-model.md`](auth-threat-model.md). For the migration +flow from a pre-Bundle-1 deployment, see +[`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md). + +## Mental model + +Every action against the certctl HTTP / CLI / MCP / GUI surface is +performed by an **actor** (an API key, an agent's machine identity, +the synthetic demo-anon actor when the server runs in +`CERTCTL_AUTH_TYPE=none` mode). 
Each actor holds zero or more +**roles**. Each role grants a set of **permissions** at a **scope**. +A request to a gated endpoint succeeds when the actor's effective +permission set (the union across all held roles) contains the +permission the endpoint requires. + +The schema lives in `migrations/000029_rbac.up.sql` and ships with +seven seeded default roles + a 33-permission canonical catalogue. +The middleware that gates requests lives at +`internal/auth/require_permission.go`. The service-layer authorizer +that resolves "actor → permissions" lives at +`internal/service/auth/authorizer.go`. + +## Default roles (seeded by migration 000029) + +| Role | ID | Use case | Permission shape | +|---|---|---|---| +| Admin | `r-admin` | Operator with full control | Every permission in the canonical catalogue | +| Operator | `r-operator` | Day-to-day cert lifecycle | `cert.*`, `profile.read`, `issuer.read`, `target.*`, `agent.read`, `audit.read` | +| Viewer | `r-viewer` | Read-only console access | `*.read` for every resource type | +| Agent | `r-agent` | Machine identity for `certctl-agent` | `cert.read` + `agent.heartbeat` + `agent.job.poll` + `agent.job.complete` + `agent.job.report` | +| MCP | `r-mcp` | Operator-equivalent for the MCP server, minus destructive ops | Like Operator without `*.delete` | +| CLI | `r-cli` | Day-to-day operator CLI | Like Operator + `auth.key.list` / `auth.key.create` / `auth.key.rotate` | +| Auditor | `r-auditor` | Compliance reviewer | `audit.read` + `audit.export` ONLY | + +The auditor split is the load-bearing one: an auditor cannot read +certificates, profiles, or issuers — only audit events. That makes the +role legitimate to hand to a SOC 2 / FedRAMP / PCI auditor without +giving them the keys to the kingdom. The +`internal/domain/auth/auditor_test.go` invariants pin this set going +forward. 
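
The mental model above (an actor holds roles, each role grants permissions, and a check passes when the union contains the required permission) can be sketched in Go. This is an illustrative in-memory model, not the code in `internal/service/auth/authorizer.go`: the `rolePermissions` map and the `effectivePermissions` and `allowed` helpers are hypothetical, scopes are ignored, and only the two roles whose permission sets the table spells out in full are included.

```go
package main

import "fmt"

// rolePermissions is a hypothetical in-memory stand-in for the
// role-to-permission mapping. The two entries copy the permission
// sets listed for r-auditor and r-agent in the default roles table.
var rolePermissions = map[string][]string{
	"r-auditor": {"audit.read", "audit.export"},
	"r-agent": {
		"cert.read", "agent.heartbeat", "agent.job.poll",
		"agent.job.complete", "agent.job.report",
	},
}

// effectivePermissions returns the union of permissions across all
// roles an actor holds. Unknown roles contribute nothing.
func effectivePermissions(roles []string) map[string]struct{} {
	perms := make(map[string]struct{})
	for _, role := range roles {
		for _, p := range rolePermissions[role] {
			perms[p] = struct{}{}
		}
	}
	return perms
}

// allowed reports whether the union contains the required permission.
func allowed(roles []string, required string) bool {
	_, ok := effectivePermissions(roles)[required]
	return ok
}

func main() {
	fmt.Println(allowed([]string{"r-auditor"}, "audit.read"))           // true
	fmt.Println(allowed([]string{"r-auditor"}, "cert.read"))            // false
	fmt.Println(allowed([]string{"r-auditor", "r-agent"}, "cert.read")) // true
}
```

The second and third calls show why the auditor split depends on role hygiene: the auditor role alone cannot read certificates, but any additional role grant widens the union.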
+ +The five **admin-only fine-grained perms** seeded by migration +000030 (Phase 3.5 conversion) gate the high-blast-radius endpoints: + +- `cert.bulk_revoke` — `POST /api/v1/certificates/bulk-revoke` and the EST sibling +- `crl.admin` — `/api/v1/admin/crl/cache` +- `scep.admin` — `/api/v1/admin/scep/intune/*` +- `est.admin` — `/api/v1/admin/est/*` +- `ca.hierarchy.manage` — `/api/v1/issuers/{id}/intermediates`, `/api/v1/intermediates/{id}` + +Only `r-admin` holds these by default. To delegate one, create a +custom role with the specific perm and grant it to the right actor. + +## Permission catalogue + +The catalogue is namespaced. Permission strings are stable across +releases; new permissions add to the namespace, never reshape an +existing one. Run +`certctl-cli auth permissions list` (or `GET /api/v1/auth/permissions`) +for the live catalogue. + +| Namespace | Examples | What the namespace gates | +|---|---|---| +| `cert.*` | `cert.read`, `cert.issue`, `cert.revoke`, `cert.delete`, `cert.bulk_revoke` | The certificate lifecycle surface (`/api/v1/certificates`) | +| `profile.*` | `profile.read`, `profile.edit`, `profile.delete` | `CertificateProfile` CRUD | +| `issuer.*` | `issuer.read`, `issuer.edit`, `issuer.delete` | Issuer connector config | +| `target.*` | `target.read`, `target.edit`, `target.delete` | Deployment target config | +| `agent.*` | `agent.read`, `agent.edit`, `agent.retire`, `agent.heartbeat`, `agent.job.*` | Agent fleet + agent self-service endpoints | +| `audit.*` | `audit.read`, `audit.export` | The audit-events surface | +| `auth.role.*` | `auth.role.list`, `auth.role.create`, `auth.role.edit`, `auth.role.delete`, `auth.role.assign` | RBAC management | +| `auth.key.*` | `auth.key.list`, `auth.key.create`, `auth.key.rotate`, `auth.key.delete` | API key management | +| `auth.bootstrap.*` | `auth.bootstrap.use` | Day-0 first-admin path | +| `crl.admin`, `scep.admin`, `est.admin`, `ca.hierarchy.manage` | (single perms) | The five admin-only 
fine-grained perms (see above) | + +## Scope semantics + +Permissions are granted at one of three scopes: + +- **`global`** — applies to every resource in the tenant. The + default for the seeded role grants. A `cert.read` grant at global + scope lets the actor read any certificate. +- **`profile`** — applies only to the named `CertificateProfile` + (matched by ID). `cert.issue` at scope `profile`/`p-corp-cdn` lets + the actor issue against `p-corp-cdn` only. +- **`issuer`** — applies only to the named issuer. Lets you grant + `issuer.edit` on the production issuer to a senior operator + without giving them edit on every issuer. + +Global beats specific: an actor with `cert.read` at global scope +passes a `cert.read` check against any specific profile or issuer +even if no scoped grant exists. The reverse does not hold: a +scoped grant satisfies only requests against its own scope, never +a request against a different scope or a global check. +The Authorizer's `CheckPermission` is the single point of truth. + +> **Note (Bundle 1 deferral):** the `scope_id` column is not +> currently FK-constrained against the resource tables. An +> operator can grant a permission at scope `profile`/`p-bogus` +> without `p-bogus` existing; the gate still works (no rows match +> at request time), but the API does not 404 the grant. Bundle 2 +> tracks the strict-FK closure. See +> `internal/repository/postgres/auth.go::AddPermission`'s +> `TODO(bundle-2)` comment. + +## Granting + revoking access + +### From the GUI + +`/auth/roles` lists every role; click into one to see its +permissions and (if you hold `auth.role.edit`) add or remove a +permission. `/auth/keys` lists every actor with role grants; +click "Assign role" to grant, click the × on a role tag to revoke. + +The synthetic `actor-demo-anon` row is shown but flagged +"system-managed" with the mutation buttons hidden — the server-side +reserved-actor guard rejects mutations against it regardless. 
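When deciding which scope to pick for a grant, the rule from "Scope semantics" (a global grant satisfies any scoped check, a scoped grant satisfies only its own scope) can be sketched as below. This is an illustrative model under stated assumptions, not the real `CheckPermission`: the `Grant` struct and the `allowed` function are hypothetical, and `p-other` is an invented profile ID.

```go
package main

import "fmt"

// Grant is a hypothetical flattened view of one permission grant.
type Grant struct {
	Permission string
	ScopeType  string // "global", "profile", or "issuer"
	ScopeID    string // empty for global grants
}

// allowed reports whether any grant satisfies a check for perm at the
// requested scope. A global grant always matches; a scoped grant matches
// only when both the scope type and the scope ID are identical.
func allowed(grants []Grant, perm, scopeType, scopeID string) bool {
	for _, g := range grants {
		if g.Permission != perm {
			continue
		}
		if g.ScopeType == "global" {
			return true
		}
		if g.ScopeType == scopeType && g.ScopeID == scopeID {
			return true
		}
	}
	return false
}

func main() {
	global := []Grant{{Permission: "cert.read", ScopeType: "global"}}
	scoped := []Grant{{Permission: "cert.issue", ScopeType: "profile", ScopeID: "p-corp-cdn"}}

	// Global grant passes a profile-scoped check.
	fmt.Println(allowed(global, "cert.read", "profile", "p-corp-cdn"))
	// Scoped grant passes only against its own profile.
	fmt.Println(allowed(scoped, "cert.issue", "profile", "p-corp-cdn"))
	fmt.Println(allowed(scoped, "cert.issue", "profile", "p-other"))
}
```

A grant against a non-existent scope ID (the `p-bogus` case in the deferral note) simply never matches in this model, which mirrors the "no rows match at request time" behaviour described there.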
+ +### From the CLI + +```bash +# Identity probe — what can the current API key actually do? +certctl-cli auth me + +# Roles +certctl-cli auth roles list +certctl-cli auth roles get r-admin + +# Permissions catalogue +certctl-cli auth permissions list + +# Key → role assignment +certctl-cli auth keys list +certctl-cli auth keys assign alice --role r-operator +certctl-cli auth keys revoke alice --role r-admin + +# Walk-every-key prompt for downgrade +certctl-cli auth keys scope-down + +# Audit-driven role suggestion (last 30 days of audit events) +certctl-cli auth keys scope-down --suggest +certctl-cli auth keys scope-down --suggest --apply + +# JSON-driven scope-down for automation (Helm post-upgrade hook etc.) +certctl-cli auth keys scope-down --non-interactive ./scope-down.json +``` + +The mutating role-lifecycle commands (`certctl-cli auth roles +create / update / delete` + `roles add-permission / remove-permission`) +are tracked as Bundle 1 Phase 5.5 follow-up; today, manage custom +roles via the HTTP API or GUI. + +### From the HTTP API + +Every endpoint is documented in `api/openapi.yaml` under the `[Auth]` +tag. 
Quick reference: + +| Endpoint | Permission | +|---|---| +| `GET /v1/auth/me` | (none — own data) | +| `GET /v1/auth/roles` | `auth.role.list` | +| `GET /v1/auth/roles/{id}` | `auth.role.list` | +| `POST /v1/auth/roles` | `auth.role.create` | +| `PUT /v1/auth/roles/{id}` | `auth.role.edit` | +| `DELETE /v1/auth/roles/{id}` | `auth.role.delete` | +| `GET /v1/auth/permissions` | `auth.role.list` | +| `POST /v1/auth/roles/{id}/permissions` | `auth.role.edit` | +| `DELETE /v1/auth/roles/{id}/permissions/{perm}` | `auth.role.edit` | +| `GET /v1/auth/keys` | `auth.role.list` | +| `POST /v1/auth/keys/{id}/roles` | `auth.role.assign` | +| `DELETE /v1/auth/keys/{id}/roles/{role_id}` | `auth.role.assign` | +| `GET /v1/auth/check` | (authenticated; surfaces effective perms) | +| `GET /v1/auth/bootstrap` + `POST /v1/auth/bootstrap` | (auth-exempt; gated by env-var token) | + +### From the MCP server + +Bundle 1 Phase 11 ships 12 RBAC tools: +`certctl_auth_me`, `certctl_auth_list_roles`, `certctl_auth_get_role`, +`certctl_auth_create_role`, `certctl_auth_update_role`, +`certctl_auth_delete_role`, `certctl_auth_list_permissions`, +`certctl_auth_add_permission_to_role`, +`certctl_auth_remove_permission_from_role`, +`certctl_auth_list_keys`, `certctl_auth_assign_role_to_key`, +`certctl_auth_revoke_role_from_key`. Each routes through the same +HTTP surface above; permission gates fire server-side. + +## The auditor pattern + +Hand the auditor key to compliance reviewers. They get: + +- `GET /api/v1/audit?category=auth` — every auth/authz mutation + in the system (role creates, role grants on actors, bootstrap + consumption, etc.). +- `GET /api/v1/audit?category=cert_lifecycle` — every cert event. +- `GET /api/v1/audit?category=config` — every issuer / target / + settings edit. +- `GET /api/v1/audit/export` — bulk export. + +They do NOT get cert read, profile read, issuer read, or any +mutating permission. 
The categorization is enforced by the database +CHECK constraint (migration 000032); the WORM trigger from +migration 000018 keeps the audit table append-only at the DB layer. + +To create an auditor key: + +1. `certctl-cli auth keys assign <key-name> --role r-auditor` +2. (Optional) Revoke any other roles the key holds with + `certctl-cli auth keys revoke <key-name> --role r-...` +3. Confirm via `certctl-cli auth me` while authenticated as the + auditor key — the response should show only `audit.read` and + `audit.export` in `effective_permissions`. + +## Day-0 bootstrap (first-admin path) + +Bundle 1 Phase 6 ships a one-shot bootstrap endpoint for fresh +deployments where no admin actor exists yet. + +1. Set `CERTCTL_BOOTSTRAP_TOKEN=$(openssl rand -hex 32)` in the + server environment. +2. Boot the server. Logs include + "bootstrap endpoint enabled — POST /api/v1/auth/bootstrap to + mint the first admin key (one-shot)" when the path is callable. +3. Run a single curl, substituting the token value from step 1 + for `<token>`: + + ```bash + curl -X POST "$URL/api/v1/auth/bootstrap" \ + -H 'Content-Type: application/json' \ + -d '{"token":"<token>","actor_name":"first-admin"}' + ``` + +4. Capture the `key_value` from the response. **It is shown ONCE.** + The server never logs it. +5. Use the new key to authenticate against the rest of the API. + The bootstrap path is now closed: subsequent calls return HTTP + 410 Gone, even with the same valid token, because an admin + actor exists. + +The token is constant-time-compared. The server logs a startup +warning if `CERTCTL_BOOTSTRAP_TOKEN` is set AND admin actors +already exist (config-drift signal). For OIDC-first-admin (the +"first user who signs in via SSO becomes admin" pattern), wait for +Bundle 2. + +## Demo mode (`CERTCTL_AUTH_TYPE=none`) + +When auth is disabled, the server injects a synthetic actor +`actor-demo-anon` into every request context. That actor holds +`r-admin` at global scope (seeded by migration 000029), so every +gated route resolves with a populated actor and admin grants. 
The +synthetic actor is reserved: the API rejects any mutation that +targets it (HTTP 409 with `ErrAuthReservedActor`). + +Production deployments MUST NOT use demo mode — there is no +per-request actor identity for the audit trail, and every request +flows as admin. Use it for the `docker compose up` demo + the five +example folders only. + +## Where to look next + +- [Threat model](auth-threat-model.md) — what attacks this primitive + defends against and which it does not +- [Migration guide](../migration/api-keys-to-rbac.md) — moving + pre-Bundle-1 deployments onto RBAC +- [Profiles](../reference/profiles.md) — the `RequiresApproval=true` + flow that Bundle 1 Phase 9 closure protects from flip-flop +- [Approval workflow](approval-workflow.md) — the Rank 7 Infisical + deep-research deliverable that the Phase 9 closure piggybacks on +- `internal/auth/` — the middleware + keystore + RequirePermission +- `internal/service/auth/` — the service-layer Authorizer +- `cowork/auth-bundle-1-prompt.md` — the design + phase plan +- `cowork/auth-bundles-index.md` — the per-phase status tracker diff --git a/docs/operator/security.md b/docs/operator/security.md index f423dc5..e13f3fb 100644 --- a/docs/operator/security.md +++ b/docs/operator/security.md @@ -1,6 +1,6 @@ # certctl Security Posture & Operator Guidance -> Last reviewed: 2026-05-05 +> Last reviewed: 2026-05-09 This document collects the operator-facing security guidance that the source code's per-finding comment blocks reference. Each section names the audit @@ -75,15 +75,60 @@ the accompanying tests for the format spec. Bundle B / M-002. Two layers decide auth-exempt status: 1. **Router layer:** `internal/api/router/router.go::AuthExemptRouterRoutes` - — the 4 endpoints registered via direct `r.mux.Handle` without going + — the endpoints registered via direct `r.mux.Handle` without going through the middleware chain (`/health`, `/ready`, `/api/v1/auth/info`, - `/api/v1/version`). 
+ `/api/v1/version`, plus `/api/v1/auth/bootstrap` GET + POST per + Bundle 1 Phase 6). 2. **Dispatch layer:** `internal/api/router/router.go::AuthExemptDispatchPrefixes` — URL-prefix routing in `cmd/server/main.go::buildFinalHandler` for - `/.well-known/pki/*`, `/.well-known/est/*`, and `/scep[/...]*`. + `/.well-known/pki/*`, `/.well-known/est/*`, `/.well-known/est-mtls`, + and `/scep[/...]*` (incl. `/scep-mtls`). Both lists have AST-walking regression tests (`auth_exempt_test.go`) that -fail CI if a new bypass lands without an updating the documented constant. +fail CI if a new bypass lands without updating the documented constant. + +### RBAC primitive (Bundle 1) + +Bundle 1 ships role-based authorization on top of API-key +authentication. Every gated handler routes through the +`auth.RequirePermission` middleware (or its router-level wrap +`rbacGate`); the middleware resolves the actor's effective +permissions via the service-layer `Authorizer.CheckPermission` +and returns HTTP 403 BEFORE the handler body runs on miss. The +seven default roles (`admin` / `operator` / `viewer` / `agent` / +`mcp` / `cli` / `auditor`), 33-permission canonical catalogue, +and the auditor split (`r-auditor` holds only `audit.read` + +`audit.export`) are seeded by migration 000029. + +For the operator how-to, see [`rbac.md`](rbac.md). For the +threat model + compliance mapping, see +[`auth-threat-model.md`](auth-threat-model.md). For the upgrade +flow from a pre-Bundle-1 deployment, see +[`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md). + +### Day-0 admin bootstrap (Bundle 1 Phase 6) + +Fresh deployments where no admin actor exists yet can mint the +first admin via `POST /api/v1/auth/bootstrap` — set +`CERTCTL_BOOTSTRAP_TOKEN`, POST a single curl with the token, and +the server returns the plaintext key value once. The token is +constant-time-compared; the strategy is one-shot via mutex; the +admin-existence probe re-closes the path once an admin lands. 
+The token is NEVER logged. The minted plaintext key flows only +into the HTTP response body. See +[`rbac.md`](rbac.md#day-0-bootstrap-first-admin-path) for the +full flow. + +### Approval-bypass closure (Bundle 1 Phase 9) + +`CertificateProfile.RequiresApproval=true` profiles route both +issuance/renewal AND profile edits through the +`ApprovalService` two-person integrity gate (Phase 9 closes the +flip-flop loophole where an admin could disable approval, mutate, +re-enable). Same-actor self-approve is rejected at the service +layer with `ErrApproveBySameActor`. See +[`docs/reference/profiles.md`](../reference/profiles.md) for the +full gate semantics. ## Per-user rate limiting diff --git a/docs/reference/profiles.md b/docs/reference/profiles.md index 2b865db..75d3546 100644 --- a/docs/reference/profiles.md +++ b/docs/reference/profiles.md @@ -1,6 +1,6 @@ # Certificate profiles -Last reviewed: 2026-05-09 +> Last reviewed: 2026-05-09 A `CertificateProfile` is the policy object that groups every cert with the same shape: which issuer mints it, which key algorithm + size are