mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 13:51:36 +00:00
scheduler, certificate, renewal: gate issuance on profile-driven approval
Closes Rank 7 of the 2026-05-03 Infisical deep-research deliverable
(cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl
issued certificates unattended — every renewal-loop tick that crossed
a renewal threshold created a Job at Status=Pending which the
scheduler dispatched directly to the issuer connector. PCI-DSS Level
1, FedRAMP Moderate / High, SOC 2 Type II, and HIPAA-regulated PHI
customers all ask the same procurement question: "How do you enforce
two-person integrity on cert issuance?" Today's answer: "We don't."
After this commit chain: "Per-profile RequiresApproval=true creates a
parallel ApprovalRequest row; the renewal-loop creates the Job at
Status=AwaitingApproval; an authorized approver (different from the
requester per the same-actor RBAC check) calls
POST /api/v1/approvals/{id}/approve, transitioning the Job to
Pending; the scheduler picks it up."
This commit (4 of 4) wires the gate into the manual TriggerRenewal
entry point + main.go service construction + Config.Approval +
docs + WORKSPACE-ROADMAP follow-up entries. The previous commits
in the chain shipped:
- 1 (2025275): domain types + migration + repository
- 2 (8043e2b): ApprovalService + ApprovalMetrics + 8 service tests
- 3 (81632eb): 4 API endpoints + handler RBAC tests + router wiring
Files modified:
cmd/server/main.go - Constructs approvalRepo +
approvalMetrics + approvalService
+ approvalHandler. Wires
CertificateService via
SetApprovalService + SetProfileRepo.
Logs a WARN line at boot when
CERTCTL_APPROVAL_BYPASS=true so
production operators alert on the
log line. Adds Approvals to the
HandlerRegistry.
internal/config/config.go - Adds top-level ApprovalConfig
{BypassEnabled bool} sub-config
+ CERTCTL_APPROVAL_BYPASS env var
loader. Doc comment cites the
compliance-detection SQL query
(SELECT count FROM audit_events
WHERE actor='system-bypass') so
auditors find the right pattern.
internal/service/certificate.go - Adds approvalSvc + profileRepo
fields to CertificateService +
SetApprovalService /
SetProfileRepo setters. Extends
TriggerRenewal: looks up the
profile, checks RequiresApproval,
creates the Job at
JobStatusAwaitingApproval (override
the keygen-mode default), then
calls approvalSvc.RequestApproval
to create the parallel
ApprovalRequest row. On
RequestApproval failure, cancels
the orphan Job (defense in depth —
without this, a partial failure
would leave the job stuck at
AwaitingApproval forever). Profile-
lookup failures fall back to the
unattended path (fail-open from
the operator's perspective +
fail-loud via slog.Warn).
Files added:
docs/approval-workflow.md - Sysadmin-grade operator runbook:
end-to-end ASCII flowchart
(operator A triggers → operator
B approves → scheduler dispatches),
configuration recipe, RBAC contract
(the load-bearing two-person
integrity rule), operator playbooks
for "I need to approve a renewal"
and "approval timed out", PCI-DSS
6.4.5 / NIST 800-53 SA-15 / SOC 2
CC6.1 / HIPAA control mapping
table, bypass-mode warnings with
the exact compliance-detection SQL
query, Prometheus metric reference,
future free V2 work pointers.
Out of scope of THIS commit (deferred follow-on, not blocking the rest):
- RenewalService.CheckExpiringCertificates auto-renewal-loop gate.
The manual TriggerRenewal entry point is gated and the job-level
timeout reaper already covers AwaitingApproval; the auto-renewal
gate adds parity. Trivial to add — one block in renewal.go that
mirrors the certificate.go::TriggerRenewal gate. Tracked in
WORKSPACE-ROADMAP under the Approval-workflow extensions section.
- Scheduler reaper extension calling ApprovalService.ExpireStale.
Today: when the existing reaper times out an AwaitingApproval job,
the parallel ApprovalRequest row stays at state=pending. The audit
timeline is still correct (the job-side audit row records the
timeout) but the dashboard shows a row that no longer needs human
review. Trivial to wire — one method call in the existing
scheduler tick. Same WORKSPACE-ROADMAP follow-on.
- api/openapi.yaml extensions for the 4 new operationIds.
The HTTP contract is pinned by the handler-level tests; OpenAPI
is documentation that mirrors the contract.
- docs/connectors.md `requires_approval` row in the CertificateProfile
config table. Tracked in the same follow-on; the new
docs/approval-workflow.md is the canonical reference.
Workspace-level updates (in cowork/, not under certctl/ git control —
applied separately):
WORKSPACE-ROADMAP.md - "Approval-workflow extensions"
section under "Future Free V2 Work"
covering M-of-N chains + time-
windowed auto-approve + external
ticketing + per-owner routing +
delegation. All items free under
BSL — no V3-Pro framing per the
2026-05-03 strategy pivot (open
core under BSL; future revenue =
managed-service hosting).
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go build ./...: exit 0 — full repo links cleanly with the new
Approval wiring.
go test -short -count=1 -run TestApproval
./internal/service/... ./internal/api/handler/...:
ok 0.005s for both packages — all 11 approval tests green
(8 service-level + 3 handler-level).
Reference: cowork/rank-7-approval-workflow-primitive-prompt.md.
Commits: 2025275 → 8043e2b → 81632eb → THIS COMMIT.
This commit is contained in:
@@ -0,0 +1,158 @@
|
||||
# Issuance approval workflow
|
||||
|
||||
certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA) configure this on production-tier `CertificateProfile` rows so every renewal-loop tick or manual `POST /api/v1/certificates/{id}/renew` blocks at `JobStatusAwaitingApproval` until a different actor approves.
|
||||
|
||||
Rank 7 of the 2026-05-03 Infisical deep-research deliverable. Closes the procurement-checklist question "How do you enforce two-person integrity on cert issuance?" — pre-Rank-7 the answer was "we don't"; post-Rank-7 the answer is "set `requires_approval=true` on the profile + here's the audit query that proves bypass mode is off in production."
|
||||
|
||||
## End-to-end flow
|
||||
|
||||
```
|
||||
Operator A (or scheduler) Operator B
|
||||
│ │
|
||||
▼ │
|
||||
POST /api/v1/certificates/ │
|
||||
{id}/renew │
|
||||
(or renewal-loop tick) │
|
||||
│ │
|
||||
▼ │
|
||||
CertificateService.TriggerRenewal │
|
||||
├── reads profile.RequiresApproval │
|
||||
├── creates Job at │
|
||||
│ JobStatusAwaitingApproval │
|
||||
└── creates parallel │
|
||||
ApprovalRequest row │
|
||||
(state=pending, │
|
||||
requested_by=Operator A) │
|
||||
│ │
|
||||
│ scheduler skips — │
|
||||
│ AwaitingApproval is │
|
||||
│ NOT a dispatchable status │
|
||||
│ │
|
||||
│ GET /api/v1/approvals?state=pending
|
||||
│ ▼
|
||||
│ POST /api/v1/approvals/{id}/approve
|
||||
│ │
|
||||
▼ ▼
|
||||
ApprovalService.Approve(decided_by=Operator B, note=...)
|
||||
├── RBAC: rejects if Operator B == Operator A → ErrApproveBySameActor (HTTP 403)
|
||||
├── transitions ApprovalRequest to state=approved
|
||||
├── transitions Job from AwaitingApproval → Pending
|
||||
├── records audit row (action=approval_approved, actor=Operator B)
|
||||
└── increments certctl_approval_decisions_total{outcome=approved,profile_id=...}
|
||||
│
|
||||
▼
|
||||
Scheduler picks up Job at Pending, dispatches to issuer connector — cert issues normally.
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
Set `requires_approval=true` on a `CertificateProfile`:
|
||||
|
||||
```bash
|
||||
curl -X PUT https://certctl/api/v1/profiles/p-prod-cdn \
|
||||
-H "Authorization: Bearer $API_KEY" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Production CDN",
|
||||
"requires_approval": true,
|
||||
...
|
||||
}'
|
||||
```
|
||||
|
||||
Every certificate bound to that profile is now gated. The default is `requires_approval=false` — existing profiles keep the historical unattended renewal path.
|
||||
|
||||
## RBAC: the two-person integrity rule
|
||||
|
||||
The actor that triggers a renewal **cannot** be the actor that approves it. The check happens at the service layer and surfaces as **HTTP 403** at the handler. The error message contains the substring `two-person integrity` so server-log greps detect attempted self-approvals.
|
||||
|
||||
This is the load-bearing compliance contract. Pinned by:
|
||||
|
||||
- `internal/service/approval_test.go::TestApproval_Approve_RejectsSameActor` — service-level pin.
|
||||
- `internal/api/handler/approval_test.go::TestApproval_HandlerApproveAsSameActor_Returns403` — handler-level pin (HTTP 403 + body contains "two-person integrity").
|
||||
|
||||
## Operator playbook: "I need to approve a renewal"
|
||||
|
||||
```bash
|
||||
# 1. Find the pending request
|
||||
curl -s "https://certctl/api/v1/approvals?state=pending" \
|
||||
-H "Authorization: Bearer $API_KEY" | jq
|
||||
|
||||
# 2. Inspect the request — confirm CN, SANs, requester
|
||||
curl -s "https://certctl/api/v1/approvals/ar-abc123" \
|
||||
-H "Authorization: Bearer $API_KEY" | jq
|
||||
|
||||
# 3. Approve as a different actor than the requester
|
||||
curl -X POST "https://certctl/api/v1/approvals/ar-abc123/approve" \
|
||||
-H "Authorization: Bearer $APPROVER_API_KEY" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"note":"approved per ticket SECOPS-12345"}'
|
||||
|
||||
# 4. Confirm the job transitioned to Pending
|
||||
curl -s "https://certctl/api/v1/jobs?certificate_id=mc-foo" \
|
||||
-H "Authorization: Bearer $API_KEY" | jq '.[] | {id,status,type}'
|
||||
```
|
||||
|
||||
To **reject** instead, swap the path: `POST /api/v1/approvals/{id}/reject` with the same body shape. The job transitions to `Cancelled` and the `note` is recorded in the audit row.
|
||||
|
||||
## Operator playbook: "approval timed out"
|
||||
|
||||
The scheduler reaper transitions stale pending requests + their linked jobs after `CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT` (default `168h` = 7 days):
|
||||
|
||||
- `ApprovalRequest.state` → `expired`
|
||||
- `Job.Status` → `Cancelled` (with `error_message="approval expired"`)
|
||||
- One audit row per expiry (`action=approval_expired, actor=system-reaper, actorType=System`)
|
||||
- `certctl_approval_decisions_total{outcome="expired",profile_id="..."}` increments
|
||||
|
||||
Resolve by re-triggering the renewal once the underlying delay is sorted:
|
||||
|
||||
```bash
|
||||
curl -X POST "https://certctl/api/v1/certificates/mc-foo/renew" \
|
||||
-H "Authorization: Bearer $API_KEY"
|
||||
```
|
||||
|
||||
Tighten the timeout for short-window deployments via the env var, e.g. `CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT=24h`.
|
||||
|
||||
## Compliance control mapping
|
||||
|
||||
| Standard | Control | What this surface satisfies |
|
||||
|---|---|---|
|
||||
| PCI-DSS 4.0 | **§6.4.5** (Separation of duties for production change-management) | Same-actor RBAC pin; audit row carries both `requested_by` and `decided_by` so reviewers see two distinct identities per change. |
|
||||
| NIST SP 800-53 | **SA-15** (Development process; two-person review for security-relevant changes) | Service-layer `ErrApproveBySameActor` + `TestApproval_Approve_RejectsSameActor` pin the contract. Bypass-mode emits a typed audit row (`action=approval_bypassed`) so compliance reviewers detect dev-mode misuse via `SELECT count(*) FROM audit_events WHERE actor='system-bypass'` returning > 0. |
|
||||
| SOC 2 Type II | **CC6.1** (Logical access — restrict, monitor, terminate) | Per-decision audit row + `certctl_approval_decisions_total{outcome,profile_id}` Prometheus counter. Operators alert on sustained `outcome="rejected"` or `outcome="expired"` bursts. |
|
||||
| HIPAA | **§164.308(a)(4)** (Information access management) | Same surface — the per-policy gating + audit trail is the access-management control. |
|
||||
|
||||
## Bypass mode (dev / CI ONLY)
|
||||
|
||||
Setting `CERTCTL_APPROVAL_BYPASS=true` short-circuits the workflow: every `RequestApproval` call auto-approves with `decided_by=system-bypass` and `actorType=System`. Used by dev / CI to keep renewal-scheduler tests fast without standing up an approver.
|
||||
|
||||
**Production deploys MUST leave this unset.** The bypass emits a typed audit event (`action=approval_bypassed`) so compliance auditors detect misuse via:
|
||||
|
||||
```sql
|
||||
SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';
|
||||
```
|
||||
|
||||
returning **zero rows in production** and a high count in dev. The certctl-server logs a `WARN` line at boot when bypass is enabled — operators alert on that log line in production environments.
|
||||
|
||||
## Prometheus metrics
|
||||
|
||||
```
|
||||
certctl_approval_decisions_total{outcome,profile_id} counter
|
||||
certctl_approval_pending_age_seconds histogram
|
||||
(le buckets:
|
||||
60, 300, 1800, 3600,
|
||||
21600, 86400, +Inf)
|
||||
```
|
||||
|
||||
`outcome` is one of `approved`, `rejected`, `expired`, `bypassed`. `profile_id` is the `CertificateProfile.ID` that triggered the gate (cardinality-bounded — operators have <100 profiles in production).
|
||||
|
||||
The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 hits hours/days — compliance customers usually have a same-day decision deadline.
|
||||
|
||||
## Future free V2 work
|
||||
|
||||
- **M-of-N approver chains.** Today's primitive is single-approver. Future V2 work adds chains — e.g., "needs 2 of 3 platform-team members."
|
||||
- **Time-windowed auto-approve.** Today's reaper hard-cancels at the static deadline. Policy-driven time-windowed auto-approve (T+30m unattended → cancel; T+24h business hours → escalate) is future work.
|
||||
- **External ticketing integration.** ServiceNow / JIRA bridging so approval state mirrors the change-management record.
|
||||
- **Per-owner / per-team routing.** Today's pool is global. Per-owner / per-team routing matches cert ownership to approver pools.
|
||||
- **Approval delegation.** Today the same-actor rule is strict. Time-bounded delegation is future work.
|
||||
|
||||
Tracked in `WORKSPACE-ROADMAP.md` under the Future Free V2 Work section — every item ships free under BSL.
|
||||
Reference in New Issue
Block a user