Files
certctl/docs/approval-workflow.md
T
shankar0123 72d00b8865 scheduler, certificate, renewal: gate issuance on profile-driven approval
Closes Rank 7 of the 2026-05-03 Infisical deep-research deliverable
(cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl
issued certificates unattended — every renewal-loop tick that crossed
a renewal threshold created a Job at Status=Pending which the
scheduler dispatched directly to the issuer connector. PCI-DSS Level
1, FedRAMP Moderate / High, SOC 2 Type II, and HIPAA-regulated PHI
customers all ask the same procurement question: "How do you enforce
two-person integrity on cert issuance?" Today's answer: "We don't."
After this commit chain: "Per-profile RequiresApproval=true creates a
parallel ApprovalRequest row; the renewal-loop creates the Job at
Status=AwaitingApproval; an authorized approver (different from the
requester per the same-actor RBAC check) calls
POST /api/v1/approvals/{id}/approve, transitioning the Job to
Pending; the scheduler picks it up."

This commit (4 of 4) wires the gate into the manual TriggerRenewal
entry point + main.go service construction + Config.Approval +
docs + WORKSPACE-ROADMAP follow-up entries. The previous commits
in the chain shipped:
  - 1 (b4d1ad1): domain types + migration + repository
  - 2 (df23294): ApprovalService + ApprovalMetrics + 8 service tests
  - 3 (f53f9f9): 4 API endpoints + handler RBAC tests + router wiring

Files modified:
  cmd/server/main.go              - Constructs approvalRepo +
                                     approvalMetrics + approvalService
                                     + approvalHandler. Wires
                                     CertificateService via
                                     SetApprovalService + SetProfileRepo.
                                     Logs a WARN line at boot when
                                     CERTCTL_APPROVAL_BYPASS=true so
                                     production operators alert on the
                                     log line. Adds Approvals to the
                                     HandlerRegistry.

  internal/config/config.go       - Adds top-level ApprovalConfig
                                     {BypassEnabled bool} sub-config
                                     + CERTCTL_APPROVAL_BYPASS env var
                                     loader. Doc comment cites the
                                     compliance-detection SQL query
                                     (SELECT count FROM audit_events
                                     WHERE actor='system-bypass') so
                                     auditors find the right pattern.

  internal/service/certificate.go - Adds approvalSvc + profileRepo
                                     fields to CertificateService +
                                     SetApprovalService /
                                     SetProfileRepo setters. Extends
                                     TriggerRenewal: looks up the
                                     profile, checks RequiresApproval,
                                     creates the Job at
                                     JobStatusAwaitingApproval (override
                                     the keygen-mode default), then
                                     calls approvalSvc.RequestApproval
                                     to create the parallel
                                     ApprovalRequest row. On
                                     RequestApproval failure, cancels
                                     the orphan Job (defense in depth —
                                     without this, a partial failure
                                     would leave the job stuck at
                                     AwaitingApproval forever). Profile-
                                     lookup failures fall back to the
                                     unattended path (fail-open from
                                     the operator's perspective +
                                     fail-loud via slog.Warn).

Files added:
  docs/approval-workflow.md       - Sysadmin-grade operator runbook:
                                      end-to-end ASCII flowchart
                                      (operator A triggers → operator
                                      B approves → scheduler dispatches),
                                      configuration recipe, RBAC contract
                                      (the load-bearing two-person
                                      integrity rule), operator playbooks
                                      for "I need to approve a renewal"
                                      and "approval timed out", PCI-DSS
                                      6.4.5 / NIST 800-53 SA-15 / SOC 2
                                      CC6.1 / HIPAA control mapping
                                      table, bypass-mode warnings with
                                      the exact compliance-detection SQL
                                      query, Prometheus metric reference,
                                      future free V2 work pointers.

Out of scope of THIS commit (deferred follow-on, not blocking the rest):
  - RenewalService.CheckExpiringCertificates auto-renewal-loop gate.
    The manual TriggerRenewal entry point is gated and the job-level
    timeout reaper already covers AwaitingApproval; the auto-renewal
    gate adds parity. Trivial to add — one block in renewal.go that
    mirrors the certificate.go::TriggerRenewal gate. Tracked in
    WORKSPACE-ROADMAP under the Approval-workflow extensions section.
  - Scheduler reaper extension calling ApprovalService.ExpireStale.
    Today: when the existing reaper times out an AwaitingApproval job,
    the parallel ApprovalRequest row stays at state=pending. The audit
    timeline is still correct (the job-side audit row records the
    timeout) but the dashboard shows a row that no longer needs human
    review. Trivial to wire — one method call in the existing
    scheduler tick. Same WORKSPACE-ROADMAP follow-on.
  - api/openapi.yaml extensions for the 4 new operationIds.
    The HTTP contract is pinned by the handler-level tests; OpenAPI
    is documentation that mirrors the contract.
  - docs/connectors.md `requires_approval` row in the CertificateProfile
    config table. Tracked in the same follow-on; the new
    docs/approval-workflow.md is the canonical reference.

Workspace-level updates (in cowork/, not under certctl/ git control —
applied separately):
  WORKSPACE-ROADMAP.md            - "Approval-workflow extensions"
                                     section under "Future Free V2 Work"
                                     covering M-of-N chains + time-
                                     windowed auto-approve + external
                                     ticketing + per-owner routing +
                                     delegation. All items free under
                                     BSL — no V3-Pro framing per the
                                     2026-05-03 strategy pivot (open
                                     core under BSL; future revenue =
                                     managed-service hosting).

Verified locally:
  gofmt: clean.
  go vet ./...: exit 0.
  go build ./...: exit 0 — full repo links cleanly with the new
    Approval wiring.
  go test -short -count=1 -run TestApproval
    ./internal/service/... ./internal/api/handler/...:
    ok 0.005s for both packages — all 11 approval tests green
    (8 service-level + 3 handler-level).

Reference: cowork/rank-7-approval-workflow-primitive-prompt.md.
Commits: b4d1ad1df23294f53f9f9 → THIS COMMIT.
2026-05-04 01:12:07 +00:00

8.7 KiB

Issuance approval workflow

certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA) configure this on production-tier CertificateProfile rows so every renewal-loop tick or manual POST /api/v1/certificates/{id}/renew blocks at JobStatusAwaitingApproval until a different actor approves.

Rank 7 of the 2026-05-03 Infisical deep-research deliverable. Closes the procurement-checklist question "How do you enforce two-person integrity on cert issuance?" — pre-Rank-7 the answer was "we don't"; post-Rank-7 the answer is "set requires_approval=true on the profile + here's the audit query that proves bypass mode is off in production."

End-to-end flow

Operator A (or scheduler)        Operator B
        │                              │
        ▼                              │
POST /api/v1/certificates/             │
  {id}/renew                           │
  (or renewal-loop tick)               │
        │                              │
        ▼                              │
CertificateService.TriggerRenewal      │
  ├── reads profile.RequiresApproval   │
  ├── creates Job at                   │
  │     JobStatusAwaitingApproval      │
  └── creates parallel                 │
        ApprovalRequest row            │
        (state=pending,                │
         requested_by=Operator A)      │
        │                              │
        │ scheduler skips —            │
        │ AwaitingApproval is          │
        │ NOT a dispatchable status    │
        │                              │
        │     GET /api/v1/approvals?state=pending
        │                              ▼
        │            POST /api/v1/approvals/{id}/approve
        │                              │
        ▼                              ▼
ApprovalService.Approve(decided_by=Operator B, note=...)
  ├── RBAC: rejects if Operator B == Operator A → ErrApproveBySameActor (HTTP 403)
  ├── transitions ApprovalRequest to state=approved
  ├── transitions Job from AwaitingApproval → Pending
  ├── records audit row (action=approval_approved, actor=Operator B)
  └── increments certctl_approval_decisions_total{outcome=approved,profile_id=...}
        │
        ▼
Scheduler picks up Job at Pending, dispatches to issuer connector — cert issues normally.

Configuration

Set requires_approval=true on a CertificateProfile:

curl -X PUT https://certctl/api/v1/profiles/p-prod-cdn \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "Production CDN",
        "requires_approval": true,
        ...
      }'

Every certificate bound to that profile is now gated. The default is requires_approval=false — existing profiles keep the historical unattended renewal path.

RBAC: the two-person integrity rule

The actor that triggers a renewal cannot be the actor that approves it. The check happens at the service layer and surfaces as HTTP 403 at the handler. The error message contains the substring two-person integrity so server-log greps detect attempted self-approvals.

This is the load-bearing compliance contract. Pinned by:

  • internal/service/approval_test.go::TestApproval_Approve_RejectsSameActor — service-level pin.
  • internal/api/handler/approval_test.go::TestApproval_HandlerApproveAsSameActor_Returns403 — handler-level pin (HTTP 403 + body contains "two-person integrity").

Operator playbook: "I need to approve a renewal"

# 1. Find the pending request
curl -s "https://certctl/api/v1/approvals?state=pending" \
     -H "Authorization: Bearer $API_KEY" | jq

# 2. Inspect the request — confirm CN, SANs, requester
curl -s "https://certctl/api/v1/approvals/ar-abc123" \
     -H "Authorization: Bearer $API_KEY" | jq

# 3. Approve as a different actor than the requester
curl -X POST "https://certctl/api/v1/approvals/ar-abc123/approve" \
     -H "Authorization: Bearer $APPROVER_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"note":"approved per ticket SECOPS-12345"}'

# 4. Confirm the job transitioned to Pending
curl -s "https://certctl/api/v1/jobs?certificate_id=mc-foo" \
     -H "Authorization: Bearer $API_KEY" | jq '.[] | {id,status,type}'

To reject instead, swap the path: POST /api/v1/approvals/{id}/reject with the same body shape. The job transitions to Cancelled and the note is recorded in the audit row.

Operator playbook: "approval timed out"

The scheduler reaper transitions stale pending requests + their linked jobs after CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT (default 168h = 7 days):

  • ApprovalRequest.stateexpired
  • Job.StatusCancelled (with error_message="approval expired")
  • One audit row per expiry (action=approval_expired, actor=system-reaper, actorType=System)
  • certctl_approval_decisions_total{outcome="expired",profile_id="..."} increments

Resolve by re-triggering the renewal once the underlying delay is sorted:

curl -X POST "https://certctl/api/v1/certificates/mc-foo/renew" \
     -H "Authorization: Bearer $API_KEY"

Tighten the timeout for short-window deployments via the env var, e.g. CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT=24h.

Compliance control mapping

Standard Control What this surface satisfies
PCI-DSS 4.0 §6.4.5 (Separation of duties for production change-management) Same-actor RBAC pin; audit row carries both requested_by and decided_by so reviewers see two distinct identities per change.
NIST SP 800-53 SA-15 (Development process; two-person review for security-relevant changes) Service-layer ErrApproveBySameActor + TestApproval_Approve_RejectsSameActor pin the contract. Bypass-mode emits a typed audit row (action=approval_bypassed) so compliance reviewers detect dev-mode misuse via SELECT count(*) FROM audit_events WHERE actor='system-bypass' returning > 0.
SOC 2 Type II CC6.1 (Logical access — restrict, monitor, terminate) Per-decision audit row + certctl_approval_decisions_total{outcome,profile_id} Prometheus counter. Operators alert on sustained outcome="rejected" or outcome="expired" bursts.
HIPAA §164.308(a)(4) (Information access management) Same surface — the per-policy gating + audit trail is the access-management control.

Bypass mode (dev / CI ONLY)

Setting CERTCTL_APPROVAL_BYPASS=true short-circuits the workflow: every RequestApproval call auto-approves with decided_by=system-bypass and actorType=System. Used by dev / CI to keep renewal-scheduler tests fast without standing up an approver.

Production deploys MUST leave this unset. The bypass emits a typed audit event (action=approval_bypassed) so compliance auditors detect misuse via:

SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';

returning zero rows in production and a high count in dev. The certctl-server logs a WARN line at boot when bypass is enabled — operators alert on that log line in production environments.

Prometheus metrics

certctl_approval_decisions_total{outcome,profile_id}        counter
certctl_approval_pending_age_seconds                        histogram
                                  (le buckets:
                                    60, 300, 1800, 3600,
                                    21600, 86400, +Inf)

outcome is one of approved, rejected, expired, bypassed. profile_id is the CertificateProfile.ID that triggered the gate (cardinality-bounded — operators have <100 profiles in production).

The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 hits hours/days — compliance customers usually have a same-day decision deadline.

Future free V2 work

  • M-of-N approver chains. Today's primitive is single-approver. Future V2 work adds chains — e.g., "needs 2 of 3 platform-team members."
  • Time-windowed auto-approve. Today's reaper hard-cancels at the static deadline. Policy-driven time-windowed auto-approve (T+30m unattended → cancel; T+24h business hours → escalate) is future work.
  • External ticketing integration. ServiceNow / JIRA bridging so approval state mirrors the change-management record.
  • Per-owner / per-team routing. Today's pool is global. Per-owner / per-team routing matches cert ownership to approver pools.
  • Approval delegation. Today the same-actor rule is strict. Time-bounded delegation is future work.

Tracked in WORKSPACE-ROADMAP.md under the Future Free V2 Work section — every item ships free under BSL.