Pure git mv operations; no content edits. Internal links remain pointing
at old paths and will be fixed in Phase 11. Per the Phase 1 audit
recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
35 files moved across 8 audience-organized subdirectories:
docs/getting-started/ (5):
quickstart.md, concepts.md, examples.md, advanced-demo.md (was
demo-advanced.md), why-certctl.md
docs/reference/ (6):
architecture.md, api.md (was openapi.md), mcp.md,
intermediate-ca-hierarchy.md, deployment-model.md (was
deployment-atomicity.md), vendor-matrix.md (was
deployment-vendor-matrix.md)
docs/reference/protocols/ (6):
acme-server.md, acme-server-threat-model.md, scep-intune.md,
est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md)
docs/operator/ (4):
security.md, tls.md, database-tls.md, approval-workflow.md
docs/operator/runbooks/ (3):
cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md
(was runbook-expiry-alerts.md), disaster-recovery.md
docs/migration/ (3):
from-certbot.md (was migrate-from-certbot.md), from-acmesh.md
(was migrate-from-acmesh.md), cert-manager-coexistence.md (was
certctl-for-cert-manager-users.md)
docs/compliance/ (4):
index.md (was compliance.md), soc2.md (was compliance-soc2.md),
pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was
compliance-nist.md)
docs/contributor/ (4):
testing-strategy.md, test-environment.md (was test-env.md),
ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md)
Deferred to later Phase 2 sub-phases:
- connectors.md split (Phase 4): docs/connectors.md +
docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level
- testing-guide.md prune (Phase 5): docs/testing-guide.md still
at top level
- features.md disperse (Phase 6): docs/features.md still at top
level
- legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md
still at top level
- ACME walkthrough re-homing (Phase 8): three
docs/acme-*-walkthrough.md still at top level
- Upgrade docs archive (Phase 3): two docs/upgrade-*.md still
at top level
Cross-reference updates (Phase 11) will happen after all moves and
content edits land. Internal links to docs/* paths are temporarily
broken until that phase completes.
Issuance approval workflow
certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA) configure this on production-tier CertificateProfile rows so every renewal-loop tick or manual POST /api/v1/certificates/{id}/renew blocks at JobStatusAwaitingApproval until a different actor approves.
This closes the procurement-checklist question "How do you enforce two-person integrity on cert issuance?" Without this surface the answer is "we don't"; with requires_approval=true on the profile, the answer is "here's the RBAC contract + here's the audit query that proves bypass mode is off in production."
End-to-end flow
sequenceDiagram
autonumber
participant A as Operator A<br/>(or scheduler)
participant SVC as CertificateService<br/>.TriggerRenewal
participant JOB as Job + ApprovalRequest
participant B as Operator B
participant APR as ApprovalService.Approve
participant SCH as Scheduler
A->>SVC: POST /api/v1/certificates/{id}/renew<br/>(or renewal-loop tick)
SVC->>JOB: read profile.RequiresApproval;<br/>create Job @ JobStatusAwaitingApproval;<br/>create ApprovalRequest<br/>(state=pending, requested_by=Operator A)
Note over JOB,SCH: Scheduler skips —<br/>AwaitingApproval is NOT a dispatchable status
B->>JOB: GET /api/v1/approvals?state=pending
B->>APR: POST /api/v1/approvals/{id}/approve<br/>(decided_by=Operator B, note=...)
APR->>APR: RBAC: reject if Operator B == Operator A<br/>→ ErrApproveBySameActor (HTTP 403)
APR->>JOB: ApprovalRequest → state=approved;<br/>Job AwaitingApproval → Pending;<br/>audit row (action=approval_approved,<br/>actor=Operator B);<br/>certctl_approval_decisions_total<br/>{outcome=approved,profile_id=...}++
SCH->>JOB: pick up Pending → dispatch to issuer connector
JOB-->>A: cert issues normally
Configuration
Set requires_approval=true on a CertificateProfile:
curl -X PUT https://certctl/api/v1/profiles/p-prod-cdn \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Production CDN",
"requires_approval": true,
...
}'
Every certificate bound to that profile is now gated. The default is requires_approval=false — existing profiles keep the historical unattended renewal path.
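The gate described above can be sketched in a few lines of Go. This is a minimal illustration, not the certctl source: the struct shape, the string values of the statuses, and the helper names initialJobStatus and dispatchable are assumptions made for the example; only the RequiresApproval flag and the two status names come from this document.

```go
package main

import "fmt"

// JobStatus mirrors the status names used in this document.
// The string values are assumptions made for this sketch.
type JobStatus string

const (
	JobStatusPending          JobStatus = "Pending"
	JobStatusAwaitingApproval JobStatus = "AwaitingApproval"
)

// CertificateProfile is a hypothetical, trimmed-down profile that
// carries only the gate flag.
type CertificateProfile struct {
	ID               string
	RequiresApproval bool // zero value false: the unattended path
}

// initialJobStatus returns the status a new renewal job starts in.
func initialJobStatus(p CertificateProfile) JobStatus {
	if p.RequiresApproval {
		return JobStatusAwaitingApproval
	}
	return JobStatusPending
}

// dispatchable reports whether the scheduler may pick the job up.
// AwaitingApproval is deliberately excluded.
func dispatchable(s JobStatus) bool {
	return s == JobStatusPending
}

func main() {
	profiles := []CertificateProfile{
		{ID: "p-prod-cdn", RequiresApproval: true},
		{ID: "p-legacy"}, // hypothetical ungated profile
	}
	for _, p := range profiles {
		s := initialJobStatus(p)
		fmt.Printf("%s: status=%s dispatchable=%t\n", p.ID, s, dispatchable(s))
	}
}
```

Because the flag's zero value is false, a profile that never sets it stays on the unattended path, which matches the default described above.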
RBAC: the two-person integrity rule
The actor that triggers a renewal cannot be the actor that approves it. The check happens at the service layer and surfaces as HTTP 403 at the handler. The error message contains the substring two-person integrity so server-log greps detect attempted self-approvals.
This is the load-bearing compliance contract. Pinned by:
- internal/service/approval_test.go::TestApproval_Approve_RejectsSameActor (service-level pin).
- internal/api/handler/approval_test.go::TestApproval_HandlerApproveAsSameActor_Returns403 (handler-level pin: HTTP 403 + body contains "two-person integrity").
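A rough sketch of the rule these tests pin, under stated assumptions: the ApprovalRequest fields, the approve and httpStatus helpers, and the exact error text are illustrative, not the certctl implementation. Only the ErrApproveBySameActor name, the HTTP 403 mapping, and the "two-person integrity" substring come from this document.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrApproveBySameActor carries the "two-person integrity" substring
// that server-log greps look for. The full message is an assumption.
var ErrApproveBySameActor = errors.New("two-person integrity: approver must differ from requester")

// ApprovalRequest is a hypothetical, trimmed-down request record.
type ApprovalRequest struct {
	ID          string
	State       string
	RequestedBy string
	DecidedBy   string
}

// approve applies the same-actor check before mutating any state,
// so a rejected attempt leaves the request untouched.
func approve(r *ApprovalRequest, actor string) error {
	if actor == r.RequestedBy {
		return ErrApproveBySameActor
	}
	r.State = "approved"
	r.DecidedBy = actor
	return nil
}

// httpStatus maps a service-layer error to the handler's status code.
// The 500 fallback is a placeholder for this sketch.
func httpStatus(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrApproveBySameActor):
		return http.StatusForbidden
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	req := &ApprovalRequest{ID: "ar-abc123", State: "pending", RequestedBy: "operator-a"}

	err := approve(req, "operator-a") // self-approval attempt
	fmt.Println(httpStatus(err), err)

	err = approve(req, "operator-b") // distinct approver
	fmt.Println(httpStatus(err), req.State, req.DecidedBy)
}
```

Keeping the check in the service function rather than the handler means every caller, not just the HTTP path, is subject to the rule.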
Operator playbook: "I need to approve a renewal"
# 1. Find the pending request
curl -s "https://certctl/api/v1/approvals?state=pending" \
-H "Authorization: Bearer $API_KEY" | jq
# 2. Inspect the request — confirm CN, SANs, requester
curl -s "https://certctl/api/v1/approvals/ar-abc123" \
-H "Authorization: Bearer $API_KEY" | jq
# 3. Approve as a different actor than the requester
curl -X POST "https://certctl/api/v1/approvals/ar-abc123/approve" \
-H "Authorization: Bearer $APPROVER_API_KEY" \
-H "Content-Type: application/json" \
-d '{"note":"approved per ticket SECOPS-12345"}'
# 4. Confirm the job transitioned to Pending
curl -s "https://certctl/api/v1/jobs?certificate_id=mc-foo" \
-H "Authorization: Bearer $API_KEY" | jq '.[] | {id,status,type}'
To reject instead, swap the path: POST /api/v1/approvals/{id}/reject with the same body shape. The job transitions to Cancelled and the note is recorded in the audit row.
Operator playbook: "approval timed out"
The scheduler reaper transitions stale pending requests + their linked jobs after CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT (default 168h = 7 days):
- ApprovalRequest.state → expired
- Job.Status → Cancelled (with error_message="approval expired")
- One audit row per expiry (action=approval_expired, actor=system-reaper, actorType=System)
- certctl_approval_decisions_total{outcome="expired",profile_id="..."} increments
Resolve by re-triggering the renewal once the underlying delay is sorted:
curl -X POST "https://certctl/api/v1/certificates/mc-foo/renew" \
-H "Authorization: Bearer $API_KEY"
Tighten the timeout for short-window deployments via the env var, e.g. CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT=24h.
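The reaper behaviour can be sketched as follows. Everything here is illustrative: the struct shapes, the expireStale and runDemo helpers, and the strictly-greater-than comparison are assumptions, and whether certctl parses the env var with time.ParseDuration is also an assumption. The state names, the error message, and the 168h default come from this document.

```go
package main

import (
	"fmt"
	"time"
)

// Job and ApprovalRequest are hypothetical, trimmed-down records.
type Job struct {
	Status       string
	ErrorMessage string
}

type ApprovalRequest struct {
	ID        string
	State     string
	CreatedAt time.Time
	Job       *Job
}

// expireStale expires pending requests older than timeout, cancels
// their linked jobs, and returns the IDs it touched.
func expireStale(reqs []*ApprovalRequest, now time.Time, timeout time.Duration) []string {
	var expired []string
	for _, r := range reqs {
		if r.State != "pending" || now.Sub(r.CreatedAt) <= timeout {
			continue
		}
		r.State = "expired"
		r.Job.Status = "Cancelled"
		r.Job.ErrorMessage = "approval expired"
		expired = append(expired, r.ID)
	}
	return expired
}

// runDemo reaps a fixed sample set using the documented default timeout.
func runDemo() []string {
	timeout, err := time.ParseDuration("168h") // documented default
	if err != nil {
		panic(err)
	}
	now := time.Date(2026, 5, 12, 0, 0, 0, 0, time.UTC)
	reqs := []*ApprovalRequest{
		{ID: "ar-old", State: "pending", CreatedAt: now.Add(-200 * time.Hour), Job: &Job{Status: "AwaitingApproval"}},
		{ID: "ar-new", State: "pending", CreatedAt: now.Add(-2 * time.Hour), Job: &Job{Status: "AwaitingApproval"}},
	}
	return expireStale(reqs, now, timeout)
}

func main() {
	fmt.Println(runDemo())
}
```

If the value is parsed with Go's time.ParseDuration, the largest accepted unit is h, so a seven-day window is written 168h rather than 7d.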
Compliance control mapping
| Standard | Control | What this surface satisfies |
|---|---|---|
| PCI-DSS 4.0 | §6.4.5 (Separation of duties for production change-management) | Same-actor RBAC pin; audit row carries both requested_by and decided_by so reviewers see two distinct identities per change. |
| NIST SP 800-53 | SA-15 (Development process; two-person review for security-relevant changes) | Service-layer ErrApproveBySameActor + TestApproval_Approve_RejectsSameActor pin the contract. Bypass-mode emits a typed audit row (action=approval_bypassed) so compliance reviewers detect dev-mode misuse via SELECT count(*) FROM audit_events WHERE actor='system-bypass' returning > 0. |
| SOC 2 Type II | CC6.1 (Logical access — restrict, monitor, terminate) | Per-decision audit row + certctl_approval_decisions_total{outcome,profile_id} Prometheus counter. Operators alert on sustained outcome="rejected" or outcome="expired" bursts. |
| HIPAA | §164.308(a)(4) (Information access management) | Same surface — the per-policy gating + audit trail is the access-management control. |
Bypass mode (dev / CI ONLY)
Setting CERTCTL_APPROVAL_BYPASS=true short-circuits the workflow: every RequestApproval call auto-approves with decided_by=system-bypass and actorType=System. Used by dev / CI to keep renewal-scheduler tests fast without standing up an approver.
Production deploys MUST leave this unset. The bypass emits a typed audit event (action=approval_bypassed) so compliance auditors detect misuse via:
SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';
The count should be zero in production and high in dev. certctl-server logs a WARN line at boot when bypass is enabled; operators should alert on that log line in production environments.
Prometheus metrics
certctl_approval_decisions_total{outcome,profile_id}   counter
certctl_approval_pending_age_seconds                   histogram (le buckets: 60, 300, 1800, 3600, 21600, 86400, +Inf)
outcome is one of approved, rejected, expired, bypassed. profile_id is the CertificateProfile.ID that triggered the gate (cardinality-bounded — operators have <100 profiles in production).
The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 climbs into hours or days; compliance customers usually have a same-day decision deadline.
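To make the bucket layout concrete, here is a small stand-alone sketch of how cumulative le buckets fill up. It does not use the Prometheus client library; the histogram type and its methods are hypothetical, and only the bucket bounds come from this document.

```go
package main

import "fmt"

// bounds are the finite le bucket upper bounds, in seconds.
var bounds = []float64{60, 300, 1800, 3600, 21600, 86400}

// histogram keeps cumulative counts: counts[i] covers observations
// <= bounds[i]; the final slot is the +Inf bucket.
type histogram struct {
	counts []uint64
}

func newHistogram() *histogram {
	return &histogram{counts: make([]uint64, len(bounds)+1)}
}

// observe increments every bucket whose bound is >= v, plus +Inf.
func (h *histogram) observe(v float64) {
	for i, b := range bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.counts[len(bounds)]++
}

func main() {
	h := newHistogram()
	// Decision latencies: 45s, 15m, 2h, 2 days.
	for _, age := range []float64{45, 900, 7200, 172800} {
		h.observe(age)
	}
	fmt.Println(h.counts) // prints [1 1 2 2 3 3 4]
}
```

Note that 86400 is the largest finite bound, so a decision that takes longer than a day is counted only in +Inf; the histogram can show that such decisions happened but not how much longer they took.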
Future free V2 work
- M-of-N approver chains. Today's primitive is single-approver. Future V2 work adds chains — e.g., "needs 2 of 3 platform-team members."
- Time-windowed auto-approve. Today's reaper hard-cancels at the static deadline. Policy-driven time-windowed auto-approve (T+30m unattended → cancel; T+24h business hours → escalate) is future work.
- External ticketing integration. ServiceNow / JIRA bridging so approval state mirrors the change-management record.
- Per-owner / per-team routing. Today's pool is global. Per-owner / per-team routing matches cert ownership to approver pools.
- Approval delegation. Today the same-actor rule is strict. Time-bounded delegation is future work.
Tracked in WORKSPACE-ROADMAP.md under the Future Free V2 Work section — every item ships free under BSL.