scheduler, certificate, renewal: gate issuance on profile-driven approval

Closes Rank 7 of the 2026-05-03 Infisical deep-research deliverable
(cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl
issued certificates unattended: every renewal-loop tick that crossed
a renewal threshold created a Job at Status=Pending, which the
scheduler dispatched directly to the issuer connector. PCI-DSS Level
1, FedRAMP Moderate / High, SOC 2 Type II, and HIPAA-regulated PHI
customers all ask the same procurement question: "How do you enforce
two-person integrity on cert issuance?" Pre-fix answer: "We don't."
After this commit chain: "Per-profile RequiresApproval=true creates a
parallel ApprovalRequest row; the manual TriggerRenewal path creates
the Job at Status=AwaitingApproval; an authorized approver (different
from the requester per the same-actor RBAC check) calls
POST /api/v1/approvals/{id}/approve, transitioning the Job to
Pending; the scheduler picks it up."
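
A minimal sketch of the approve transition quoted above, for readers
who want the rule in code form. Everything here is illustrative: the
struct shapes, the error names, the "pending" / "approved" state strings
and the Approve function are assumptions, not the ApprovalService API
shipped in commits 2 and 3.

```go
package main

import (
	"errors"
	"fmt"
)

// JobStatus names follow this commit message; the string values are assumed.
type JobStatus string

const (
	JobStatusAwaitingApproval JobStatus = "AwaitingApproval"
	JobStatusPending          JobStatus = "Pending"
)

// Job and ApprovalRequest are hypothetical, trimmed-down shapes.
type Job struct {
	ID     string
	Status JobStatus
}

type ApprovalRequest struct {
	ID          string
	JobID       string
	RequestedBy string
	State       string // assumed values: "pending", "approved"
	DecidedBy   string
}

var (
	ErrSameActor   = errors.New("approver must differ from requester")
	ErrNotPending  = errors.New("approval request is not pending")
	ErrJobNotGated = errors.New("job is not awaiting approval")
)

// Approve enforces the same-actor rule, then releases the job to the
// scheduler by moving it from AwaitingApproval to Pending.
func Approve(req *ApprovalRequest, job *Job, approver string) error {
	if req.State != "pending" {
		return ErrNotPending
	}
	if approver == req.RequestedBy {
		return ErrSameActor
	}
	if job.Status != JobStatusAwaitingApproval {
		return ErrJobNotGated
	}
	req.State = "approved"
	req.DecidedBy = approver
	job.Status = JobStatusPending
	return nil
}

func main() {
	job := &Job{ID: "job-1", Status: JobStatusAwaitingApproval}
	req := &ApprovalRequest{ID: "apr-1", JobID: job.ID, RequestedBy: "alice", State: "pending"}

	// The requester cannot approve their own request.
	fmt.Println(Approve(req, job, "alice"), job.Status)
	// A different actor can; the job becomes dispatchable.
	fmt.Println(Approve(req, job, "bob"), job.Status)
}
```

The checks run before any mutation, so a rejected call leaves both
the request and the job untouched.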

This commit (4 of 4) wires the gate into the manual TriggerRenewal
entry point + main.go service construction + Config.Approval +
docs + WORKSPACE-ROADMAP follow-up entries. The previous commits
in the chain shipped:
  - 1 (2025275): domain types + migration + repository
  - 2 (8043e2b): ApprovalService + ApprovalMetrics + 8 service tests
  - 3 (81632eb): 4 API endpoints + handler RBAC tests + router wiring

Files modified:
  cmd/server/main.go              - Constructs approvalRepo +
                                     approvalMetrics + approvalService
                                     + approvalHandler. Wires
                                     CertificateService via
                                     SetApprovalService + SetProfileRepo.
                                     Logs a WARN line at boot when
                                     CERTCTL_APPROVAL_BYPASS=true so
                                     production operators alert on the
                                     log line. Adds Approvals to the
                                     HandlerRegistry.

  internal/config/config.go       - Adds top-level ApprovalConfig
                                     {BypassEnabled bool} sub-config
                                     + CERTCTL_APPROVAL_BYPASS env var
                                     loader. Doc comment cites the
                                     compliance-detection SQL query
                                     (SELECT count(*) FROM audit_events
                                     WHERE actor='system-bypass') so
                                     auditors find the right pattern.

  internal/service/certificate.go - Adds approvalSvc + profileRepo
                                     fields to CertificateService +
                                     SetApprovalService /
                                     SetProfileRepo setters. Extends
                                     TriggerRenewal: looks up the
                                     profile, checks RequiresApproval,
                                     creates the Job at
                                     JobStatusAwaitingApproval (overriding
                                     the keygen-mode default), then
                                     calls approvalSvc.RequestApproval
                                     to create the parallel
                                     ApprovalRequest row. On
                                     RequestApproval failure, cancels
                                     the orphan Job (defense in depth —
                                     without this, a partial failure
                                     would leave the job stuck at
                                     AwaitingApproval forever). Profile-
                                     lookup failures fall back to the
                                     unattended path (fail-open from
                                     the operator's perspective +
                                     fail-loud via slog.Warn).
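
The TriggerRenewal gate described for certificate.go can be sketched
roughly as follows. This is not the committed code: gateRenewal, the
ProfileRepo / Approver interfaces, the stub types and the "Cancelled"
status string are all hypothetical, chosen only to show the three
branches (ungated, gated, orphan cleanup) plus the fail-open lookup.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// Status names follow this commit message; the string values are assumed.
const (
	StatusPending          = "Pending"
	StatusAwaitingApproval = "AwaitingApproval"
	StatusCancelled        = "Cancelled" // assumed name for the cancelled state
)

type Profile struct{ RequiresApproval bool }

type Job struct {
	ID     string
	Status string
}

// ProfileRepo and Approver are hypothetical stand-ins for the real
// profile repository and ApprovalService.
type ProfileRepo interface {
	Get(profileID string) (*Profile, error)
}

type Approver interface {
	RequestApproval(jobID, requestedBy string) error
}

// gateRenewal creates a job and decides whether it must wait for approval.
func gateRenewal(profiles ProfileRepo, approvals Approver, profileID, actor string) (*Job, error) {
	job := &Job{ID: "job-" + profileID, Status: StatusPending}

	p, err := profiles.Get(profileID)
	if err != nil {
		// Fail open but loud: the renewal proceeds ungated.
		slog.Warn("profile lookup failed; renewal continues ungated",
			"profile", profileID, "error", err)
		return job, nil
	}
	if !p.RequiresApproval {
		return job, nil
	}

	job.Status = StatusAwaitingApproval
	if err := approvals.RequestApproval(job.ID, actor); err != nil {
		// Cancel the orphan so it does not sit at AwaitingApproval forever.
		job.Status = StatusCancelled
		return job, fmt.Errorf("request approval: %w", err)
	}
	return job, nil
}

// Stubs used only to exercise the sketch.
type stubProfiles struct {
	profile *Profile
	err     error
}

func (s stubProfiles) Get(string) (*Profile, error) { return s.profile, s.err }

type stubApprover struct{ err error }

func (s stubApprover) RequestApproval(string, string) error { return s.err }

var errBoom = errors.New("boom")

func main() {
	gated := stubProfiles{profile: &Profile{RequiresApproval: true}}

	job, err := gateRenewal(gated, stubApprover{}, "web-tls", "alice")
	fmt.Println(job.Status, err)

	job, err = gateRenewal(gated, stubApprover{err: errBoom}, "web-tls", "alice")
	fmt.Println(job.Status, err)

	job, err = gateRenewal(stubProfiles{err: errBoom}, stubApprover{}, "web-tls", "alice")
	fmt.Println(job.Status, err)
}
```

Note the trade-off in the lookup branch: failing open keeps renewals
flowing during a repository outage, at the cost of an ungated issuance
that is visible only through the warning log line.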

Files added:
  docs/approval-workflow.md       - Sysadmin-grade operator runbook:
                                      end-to-end ASCII flowchart
                                      (operator A triggers → operator
                                      B approves → scheduler dispatches),
                                      configuration recipe, RBAC contract
                                      (the load-bearing two-person
                                      integrity rule), operator playbooks
                                      for "I need to approve a renewal"
                                      and "approval timed out", PCI-DSS
                                      6.4.5 / NIST 800-53 SA-15 / SOC 2
                                      CC6.1 / HIPAA control mapping
                                      table, bypass-mode warnings with
                                      the exact compliance-detection SQL
                                      query, Prometheus metric reference,
                                      future free V2 work pointers.

Out of scope of THIS commit (deferred follow-on, not blocking the rest):
  - RenewalService.CheckExpiringCertificates auto-renewal-loop gate.
    The manual TriggerRenewal entry point is gated and the job-level
    timeout reaper already covers AwaitingApproval; the auto-renewal
    gate adds parity. Trivial to add — one block in renewal.go that
    mirrors the certificate.go::TriggerRenewal gate. Tracked in
    WORKSPACE-ROADMAP under the Approval-workflow extensions section.
  - Scheduler reaper extension calling ApprovalService.ExpireStale.
    Today: when the existing reaper times out an AwaitingApproval job,
    the parallel ApprovalRequest row stays at state=pending. The audit
    timeline is still correct (the job-side audit row records the
    timeout) but the dashboard shows a row that no longer needs human
    review. Trivial to wire — one method call in the existing
    scheduler tick. Same WORKSPACE-ROADMAP follow-on.
  - api/openapi.yaml extensions for the 4 new operationIds.
    The HTTP contract is pinned by the handler-level tests; OpenAPI
    is documentation that mirrors the contract.
  - docs/connectors.md `requires_approval` row in the CertificateProfile
    config table. Tracked in the same follow-on; the new
    docs/approval-workflow.md is the canonical reference.
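
For the deferred reaper extension, the intended effect of
ApprovalService.ExpireStale can be illustrated with a small sketch.
The real method's signature is not part of this commit, so the
function below, the "expired" state string and the Unix-seconds
timestamp field are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// ApprovalRequest is a hypothetical, trimmed-down row shape.
type ApprovalRequest struct {
	ID            string
	State         string // assumed values: "pending", "expired"
	CreatedAtUnix int64  // Unix seconds; the real column type may differ
}

// expireStale marks every pending request created before cutoffUnix as
// expired and returns how many rows it changed.
func expireStale(reqs []*ApprovalRequest, cutoffUnix int64) int {
	changed := 0
	for _, r := range reqs {
		if r.State == "pending" && r.CreatedAtUnix < cutoffUnix {
			r.State = "expired"
			changed++
		}
	}
	return changed
}

func main() {
	now := time.Now()
	reqs := []*ApprovalRequest{
		{ID: "old", State: "pending", CreatedAtUnix: now.Add(-48 * time.Hour).Unix()},
		{ID: "new", State: "pending", CreatedAtUnix: now.Unix()},
	}
	// Expire anything older than 24 hours (the window is an assumed value).
	n := expireStale(reqs, now.Add(-24*time.Hour).Unix())
	fmt.Println(n, reqs[0].State, reqs[1].State)
}
```

Calling something like this from the existing scheduler tick would
keep the dashboard in step with the job-side timeout.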

Workspace-level updates (in cowork/, not under certctl/ git control —
applied separately):
  WORKSPACE-ROADMAP.md            - "Approval-workflow extensions"
                                     section under "Future Free V2 Work"
                                     covering M-of-N chains + time-
                                     windowed auto-approve + external
                                     ticketing + per-owner routing +
                                     delegation. All items free under
                                     BSL — no V3-Pro framing per the
                                     2026-05-03 strategy pivot (open
                                     core under BSL; future revenue =
                                     managed-service hosting).

Verified locally:
  gofmt: clean.
  go vet ./...: exit 0.
  go build ./...: exit 0 — full repo links cleanly with the new
    Approval wiring.
  go test -short -count=1 -run TestApproval
    ./internal/service/... ./internal/api/handler/...:
    ok 0.005s for both packages — all 11 approval tests green
    (8 service-level + 3 handler-level).

Reference: cowork/rank-7-approval-workflow-primitive-prompt.md.
Commits: 2025275 → 8043e2b → 81632eb → THIS COMMIT.
shankar0123
2026-05-04 01:12:07 +00:00
parent 81632eb0f3
commit 03c61f4c20
4 changed files with 294 additions and 1 deletions
@@ -28,6 +28,11 @@ type Config struct {
 	SCEP SCEPConfig
 	Verification VerificationConfig
 	ACME ACMEConfig
+	// Approval is the issuance approval-workflow primitive's runtime
+	// config. Rank 7 of the 2026-05-03 Infisical deep-research
+	// deliverable. The single field — BypassEnabled — short-circuits
+	// the workflow for dev/CI; production deploys MUST leave it false.
+	Approval ApprovalConfig
 	// ACMEServer is the SERVER-side ACME (RFC 8555 + RFC 9773 ARI)
 	// configuration. Distinct from ACME above (which is the consumer-
 	// side issuer connector that talks UP to Let's Encrypt / pebble).
@@ -1425,6 +1430,29 @@ type SchedulerConfig struct {
 	K8sDeployKubeletSyncTimeout time.Duration
 }
 
+// ApprovalConfig contains issuance approval-workflow runtime configuration.
+// Rank 7 of the 2026-05-03 Infisical deep-research deliverable.
+type ApprovalConfig struct {
+	// BypassEnabled short-circuits the approval workflow — every
+	// RequestApproval call auto-approves with decidedBy="system-bypass"
+	// (see domain.ApprovalActorSystemBypass) and emits an audit row with
+	// ActorType=System. Used by dev / CI to keep renewal-scheduler tests
+	// fast without standing up an approver.
+	//
+	// **PRODUCTION DEPLOYS MUST LEAVE THIS FALSE.** A simple SQL query
+	// detects misuse:
+	//
+	// SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';
+	//
+	// returns zero in production and a high count in dev. The bypass
+	// also emits a typed audit event (action=approval_bypassed) so
+	// compliance auditors can pattern-match without scanning JSON
+	// metadata.
+	//
+	// Setting: CERTCTL_APPROVAL_BYPASS environment variable. Default: false.
+	BypassEnabled bool
+}
+
 // LogConfig contains logging configuration.
 type LogConfig struct {
 	// Level sets the minimum log level for output.
@@ -1839,6 +1867,14 @@ func Load() (*Config, error) {
 				ExternalAccountRequired: getEnvBool("CERTCTL_ACME_SERVER_EAB_REQUIRED", false),
 			},
 		},
+		Approval: ApprovalConfig{
+			// Rank 7. Default: false. Production deploys must leave it false;
+			// the bypass emits a typed audit row (action=approval_bypassed,
+			// actor=system-bypass) so compliance auditors detect misuse via
+			// SELECT count(*) FROM audit_events WHERE actor='system-bypass'
+			// returning > 0.
+			BypassEnabled: getEnvBool("CERTCTL_APPROVAL_BYPASS", false),
+		},
 		Digest: DigestConfig{
 			Enabled: getEnvBool("CERTCTL_DIGEST_ENABLED", false),
 			Interval: getEnvDuration("CERTCTL_DIGEST_INTERVAL", 24*time.Hour),
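
The hunks above read CERTCTL_APPROVAL_BYPASS through getEnvBool, whose
body is not part of this diff. A rough sketch of how such a loader and
the boot-time WARN mentioned in the commit message might fit together,
assuming strconv.ParseBool semantics and fall-back-to-default on a
malformed value (both assumptions); boolOrDefault and warnIfBypass
are hypothetical helper names.

```go
package main

import (
	"log/slog"
	"os"
	"strconv"
)

type ApprovalConfig struct {
	BypassEnabled bool
}

// boolOrDefault parses raw when present; absent or malformed values
// fall back to def. This is an assumed policy, not certctl's.
func boolOrDefault(raw string, present bool, def bool) bool {
	if !present {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

// getEnvBool is an assumed implementation of the helper named in the diff.
func getEnvBool(key string, def bool) bool {
	raw, ok := os.LookupEnv(key)
	return boolOrDefault(raw, ok, def)
}

// warnIfBypass emits one warning when the bypass is on and reports
// whether it did, so callers and tests can observe the decision.
func warnIfBypass(cfg ApprovalConfig, warn func(msg string, args ...any)) bool {
	if !cfg.BypassEnabled {
		return false
	}
	warn("approval workflow bypass is enabled; do not run this in production",
		"env", "CERTCTL_APPROVAL_BYPASS")
	return true
}

func main() {
	cfg := ApprovalConfig{BypassEnabled: getEnvBool("CERTCTL_APPROVAL_BYPASS", false)}
	warnIfBypass(cfg, slog.Warn)
}
```

Passing the warn function in keeps the check testable without
capturing log output; in main it is simply slog.Warn.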