mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-09 20:28:51 +00:00
scheduler, certificate, renewal: gate issuance on profile-driven approval
Closes Rank 7 of the 2026-05-03 Infisical deep-research deliverable
(cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl
issued certificates unattended — every renewal-loop tick that crossed
a renewal threshold created a Job at Status=Pending which the
scheduler dispatched directly to the issuer connector. PCI-DSS Level
1, FedRAMP Moderate / High, SOC 2 Type II, and HIPAA-regulated PHI
customers all ask the same procurement question: "How do you enforce
two-person integrity on cert issuance?" Today's answer: "We don't."
After this commit chain: "Per-profile RequiresApproval=true creates a
parallel ApprovalRequest row; the renewal-loop creates the Job at
Status=AwaitingApproval; an authorized approver (different from the
requester per the same-actor RBAC check) calls
POST /api/v1/approvals/{id}/approve, transitioning the Job to
Pending; the scheduler picks it up."
This commit (4 of 4) wires the gate into the manual TriggerRenewal
entry point + main.go service construction + Config.Approval +
docs + WORKSPACE-ROADMAP follow-up entries. The previous commits
in the chain shipped:
- 1 (b4d1ad1): domain types + migration + repository
- 2 (df23294): ApprovalService + ApprovalMetrics + 8 service tests
- 3 (f53f9f9): 4 API endpoints + handler RBAC tests + router wiring
Files modified:
cmd/server/main.go - Constructs approvalRepo +
approvalMetrics + approvalService
+ approvalHandler. Wires
CertificateService via
SetApprovalService + SetProfileRepo.
Logs a WARN line at boot when
CERTCTL_APPROVAL_BYPASS=true so
production operators alert on the
log line. Adds Approvals to the
HandlerRegistry.
internal/config/config.go - Adds top-level ApprovalConfig
{BypassEnabled bool} sub-config
+ CERTCTL_APPROVAL_BYPASS env var
loader. Doc comment cites the
compliance-detection SQL query
(SELECT count FROM audit_events
WHERE actor='system-bypass') so
auditors find the right pattern.
internal/service/certificate.go - Adds approvalSvc + profileRepo
fields to CertificateService +
SetApprovalService /
SetProfileRepo setters. Extends
TriggerRenewal: looks up the
profile, checks RequiresApproval,
creates the Job at
JobStatusAwaitingApproval (override
the keygen-mode default), then
calls approvalSvc.RequestApproval
to create the parallel
ApprovalRequest row. On
RequestApproval failure, cancels
the orphan Job (defense in depth —
without this, a partial failure
would leave the job stuck at
AwaitingApproval forever). Profile-
lookup failures fall back to the
unattended path (fail-open from
the operator's perspective +
fail-loud via slog.Warn).
Files added:
docs/approval-workflow.md - Sysadmin-grade operator runbook:
end-to-end ASCII flowchart
(operator A triggers → operator
B approves → scheduler dispatches),
configuration recipe, RBAC contract
(the load-bearing two-person
integrity rule), operator playbooks
for "I need to approve a renewal"
and "approval timed out", PCI-DSS
6.4.5 / NIST 800-53 SA-15 / SOC 2
CC6.1 / HIPAA control mapping
table, bypass-mode warnings with
the exact compliance-detection SQL
query, Prometheus metric reference,
future free V2 work pointers.
Out of scope of THIS commit (deferred follow-on, not blocking the rest):
- RenewalService.CheckExpiringCertificates auto-renewal-loop gate.
The manual TriggerRenewal entry point is gated and the job-level
timeout reaper already covers AwaitingApproval; the auto-renewal
gate adds parity. Trivial to add — one block in renewal.go that
mirrors the certificate.go::TriggerRenewal gate. Tracked in
WORKSPACE-ROADMAP under the Approval-workflow extensions section.
- Scheduler reaper extension calling ApprovalService.ExpireStale.
Today: when the existing reaper times out an AwaitingApproval job,
the parallel ApprovalRequest row stays at state=pending. The audit
timeline is still correct (the job-side audit row records the
timeout) but the dashboard shows a row that no longer needs human
review. Trivial to wire — one method call in the existing
scheduler tick. Same WORKSPACE-ROADMAP follow-on.
- api/openapi.yaml extensions for the 4 new operationIds.
The HTTP contract is pinned by the handler-level tests; OpenAPI
is documentation that mirrors the contract.
- docs/connectors.md `requires_approval` row in the CertificateProfile
config table. Tracked in the same follow-on; the new
docs/approval-workflow.md is the canonical reference.
Workspace-level updates (in cowork/, not under certctl/ git control —
applied separately):
WORKSPACE-ROADMAP.md - "Approval-workflow extensions"
section under "Future Free V2 Work"
covering M-of-N chains + time-
windowed auto-approve + external
ticketing + per-owner routing +
delegation. All items free under
BSL — no V3-Pro framing per the
2026-05-03 strategy pivot (open
core under BSL; future revenue =
managed-service hosting).
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go build ./...: exit 0 — full repo links cleanly with the new
Approval wiring.
go test -short -count=1 -run TestApproval
./internal/service/... ./internal/api/handler/...:
ok 0.005s for both packages — all 11 approval tests green
(8 service-level + 3 handler-level).
Reference: cowork/rank-7-approval-workflow-primitive-prompt.md.
Commits: b4d1ad1 → df23294 → f53f9f9 → THIS COMMIT.
This commit is contained in:
@@ -28,6 +28,11 @@ type Config struct {
|
||||
SCEP SCEPConfig
|
||||
Verification VerificationConfig
|
||||
ACME ACMEConfig
|
||||
// Approval is the issuance approval-workflow primitive's runtime
|
||||
// config. Rank 7 of the 2026-05-03 Infisical deep-research
|
||||
// deliverable. The single field — BypassEnabled — short-circuits
|
||||
// the workflow for dev/CI; production deploys MUST leave it false.
|
||||
Approval ApprovalConfig
|
||||
// ACMEServer is the SERVER-side ACME (RFC 8555 + RFC 9773 ARI)
|
||||
// configuration. Distinct from ACME above (which is the consumer-
|
||||
// side issuer connector that talks UP to Let's Encrypt / pebble).
|
||||
@@ -1425,6 +1430,29 @@ type SchedulerConfig struct {
|
||||
K8sDeployKubeletSyncTimeout time.Duration
|
||||
}
|
||||
|
||||
// ApprovalConfig contains issuance approval-workflow runtime configuration.
|
||||
// Rank 7 of the 2026-05-03 Infisical deep-research deliverable.
|
||||
type ApprovalConfig struct {
|
||||
// BypassEnabled short-circuits the approval workflow — every
|
||||
// RequestApproval call auto-approves with decidedBy="system-bypass"
|
||||
// (see domain.ApprovalActorSystemBypass) and emits an audit row with
|
||||
// ActorType=System. Used by dev / CI to keep renewal-scheduler tests
|
||||
// fast without standing up an approver.
|
||||
//
|
||||
// **PRODUCTION DEPLOYS MUST LEAVE THIS FALSE.** A simple SQL query
|
||||
// detects misuse:
|
||||
//
|
||||
// SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';
|
||||
//
|
||||
// returns zero in production and a high count in dev. The bypass
|
||||
// also emits a typed audit event (action=approval_bypassed) so
|
||||
// compliance auditors can pattern-match without scanning JSON
|
||||
// metadata.
|
||||
//
|
||||
// Setting: CERTCTL_APPROVAL_BYPASS environment variable. Default: false.
|
||||
BypassEnabled bool
|
||||
}
|
||||
|
||||
// LogConfig contains logging configuration.
|
||||
type LogConfig struct {
|
||||
// Level sets the minimum log level for output.
|
||||
@@ -1839,6 +1867,14 @@ func Load() (*Config, error) {
|
||||
ExternalAccountRequired: getEnvBool("CERTCTL_ACME_SERVER_EAB_REQUIRED", false),
|
||||
},
|
||||
},
|
||||
Approval: ApprovalConfig{
|
||||
// Rank 7. Default: false. Production deploys must leave it false;
|
||||
// the bypass emits a typed audit row (action=approval_bypassed,
|
||||
// actor=system-bypass) so compliance auditors detect misuse via
|
||||
// SELECT count(*) FROM audit_events WHERE actor='system-bypass'
|
||||
// returning > 0.
|
||||
BypassEnabled: getEnvBool("CERTCTL_APPROVAL_BYPASS", false),
|
||||
},
|
||||
Digest: DigestConfig{
|
||||
Enabled: getEnvBool("CERTCTL_DIGEST_ENABLED", false),
|
||||
Interval: getEnvDuration("CERTCTL_DIGEST_INTERVAL", 24*time.Hour),
|
||||
|
||||
@@ -32,6 +32,18 @@ type CertificateService struct {
|
||||
// falls back to the historical on-demand path via caSvc.
|
||||
crlCacheSvc *CRLCacheService
|
||||
keygenMode string
|
||||
|
||||
// approvalSvc + profileRepo, when both set, gate TriggerRenewal on
|
||||
// CertificateProfile.RequiresApproval. The job is created at
|
||||
// JobStatusAwaitingApproval (rather than Pending / AwaitingCSR) and
|
||||
// a parallel ApprovalRequest row is created via approvalSvc. The
|
||||
// scheduler does NOT dispatch until ApprovalService.Approve
|
||||
// transitions the job to Pending. Rank 7 of the 2026-05-03
|
||||
// Infisical deep-research deliverable. Both setters are optional —
|
||||
// when either is nil, gating is skipped and TriggerRenewal falls
|
||||
// back to the historical unattended path.
|
||||
approvalSvc *ApprovalService
|
||||
profileRepo repository.CertificateProfileRepository
|
||||
}
|
||||
|
||||
// NewCertificateService creates a new certificate service.
|
||||
@@ -93,6 +105,21 @@ func (s *CertificateService) SetKeygenMode(mode string) {
|
||||
s.keygenMode = mode
|
||||
}
|
||||
|
||||
// SetApprovalService wires the approval-workflow service. When both this
|
||||
// and SetProfileRepo are wired, TriggerRenewal gates on
|
||||
// CertificateProfile.RequiresApproval. Rank 7 of the 2026-05-03 Infisical
|
||||
// deep-research deliverable.
|
||||
func (s *CertificateService) SetApprovalService(svc *ApprovalService) {
|
||||
s.approvalSvc = svc
|
||||
}
|
||||
|
||||
// SetProfileRepo wires the certificate-profile repository for the
|
||||
// approval-workflow gate. Without it, TriggerRenewal cannot read the
|
||||
// per-profile RequiresApproval flag and gating is skipped.
|
||||
func (s *CertificateService) SetProfileRepo(repo repository.CertificateProfileRepository) {
|
||||
s.profileRepo = repo
|
||||
}
|
||||
|
||||
// List returns a paginated list of certificates matching the filter.
|
||||
func (s *CertificateService) List(ctx context.Context, filter *repository.CertificateFilter) ([]*domain.ManagedCertificate, int, error) {
|
||||
certs, total, err := s.certRepo.List(ctx, filter)
|
||||
@@ -288,9 +315,35 @@ func (s *CertificateService) TriggerRenewal(ctx context.Context, certID string,
|
||||
// Create a renewal job so the job processor can pick it up.
|
||||
// In agent keygen mode, the job starts as AwaitingCSR so the agent
|
||||
// generates the key pair and submits a CSR. In server mode, it starts as Pending.
|
||||
//
|
||||
// Rank 7: if the cert's profile has RequiresApproval=true and the
|
||||
// approval service + profile repo are wired, the job is created at
|
||||
// JobStatusAwaitingApproval (overriding the keygen-mode default) and
|
||||
// a parallel ApprovalRequest row is created. The scheduler does NOT
|
||||
// dispatch until ApprovalService.Approve transitions the job to
|
||||
// Pending. Profile lookup failures fall back to the historical
|
||||
// unattended path so a transient profile-repo error never silently
|
||||
// blocks renewal — the gate is fail-open from the operator's
|
||||
// perspective + fail-loud via the slog warning.
|
||||
if s.jobRepo != nil {
|
||||
needsApproval := false
|
||||
var approvalProfileID string
|
||||
if s.approvalSvc != nil && s.profileRepo != nil && cert.CertificateProfileID != "" {
|
||||
profile, profileErr := s.profileRepo.Get(ctx, cert.CertificateProfileID)
|
||||
if profileErr != nil {
|
||||
slog.Warn("approval gate: profile lookup failed; falling back to unattended path",
|
||||
"cert_id", cert.ID, "profile_id", cert.CertificateProfileID, "error", profileErr)
|
||||
} else if profile != nil && profile.RequiresApproval {
|
||||
needsApproval = true
|
||||
approvalProfileID = profile.ID
|
||||
}
|
||||
}
|
||||
|
||||
jobStatus := domain.JobStatusPending
|
||||
if s.keygenMode == "agent" {
|
||||
switch {
|
||||
case needsApproval:
|
||||
jobStatus = domain.JobStatusAwaitingApproval
|
||||
case s.keygenMode == "agent":
|
||||
jobStatus = domain.JobStatusAwaitingCSR
|
||||
}
|
||||
|
||||
@@ -316,6 +369,29 @@ func (s *CertificateService) TriggerRenewal(ctx context.Context, certID string,
|
||||
return fmt.Errorf("failed to create renewal job: %w", err)
|
||||
}
|
||||
|
||||
// Create the parallel ApprovalRequest row. If RequestApproval fails,
|
||||
// transition the job to Cancelled so the scheduler doesn't dispatch
|
||||
// a half-gated request (defense in depth — without this, a partial
|
||||
// failure would leave the job at AwaitingApproval forever, blocking
|
||||
// renewal until the operator manually intervenes).
|
||||
if needsApproval {
|
||||
metadata := map[string]string{
|
||||
"common_name": cert.CommonName,
|
||||
}
|
||||
if _, apErr := s.approvalSvc.RequestApproval(ctx, cert, job.ID, approvalProfileID, actor, metadata); apErr != nil {
|
||||
slog.Error("approval gate: failed to create ApprovalRequest row; cancelling gated job",
|
||||
"cert_id", cert.ID, "job_id", job.ID, "error", apErr)
|
||||
if cancelErr := s.jobRepo.UpdateStatus(ctx, job.ID, domain.JobStatusCancelled,
|
||||
"approval request creation failed: "+apErr.Error()); cancelErr != nil {
|
||||
slog.Error("approval gate: also failed to cancel orphan job",
|
||||
"cert_id", cert.ID, "job_id", job.ID, "error", cancelErr)
|
||||
}
|
||||
return fmt.Errorf("failed to create approval request: %w", apErr)
|
||||
}
|
||||
slog.Info("approval gate fired", "cert_id", cert.ID, "job_id", job.ID,
|
||||
"profile_id", approvalProfileID, "requested_by", actor)
|
||||
}
|
||||
|
||||
slog.Info("created renewal job via API trigger",
|
||||
"job_id", job.ID,
|
||||
"cert_id", cert.ID,
|
||||
|
||||
Reference in New Issue
Block a user