diff --git a/docs/approval-workflow.md b/docs/approval-workflow.md
index add8550..e11eb3c 100644
--- a/docs/approval-workflow.md
+++ b/docs/approval-workflow.md
@@ -6,42 +6,25 @@ Closes the procurement-checklist question "How do you enforce two-person integri
## End-to-end flow
-```
-Operator A (or scheduler) Operator B
- │ │
- ▼ │
-POST /api/v1/certificates/ │
- {id}/renew │
- (or renewal-loop tick) │
- │ │
- ▼ │
-CertificateService.TriggerRenewal │
- ├── reads profile.RequiresApproval │
- ├── creates Job at │
- │ JobStatusAwaitingApproval │
- └── creates parallel │
- ApprovalRequest row │
- (state=pending, │
- requested_by=Operator A) │
- │ │
- │ scheduler skips — │
- │ AwaitingApproval is │
- │ NOT a dispatchable status │
- │ │
- │ GET /api/v1/approvals?state=pending
- │ ▼
- │ POST /api/v1/approvals/{id}/approve
- │ │
- ▼ ▼
-ApprovalService.Approve(decided_by=Operator B, note=...)
- ├── RBAC: rejects if Operator B == Operator A → ErrApproveBySameActor (HTTP 403)
- ├── transitions ApprovalRequest to state=approved
- ├── transitions Job from AwaitingApproval → Pending
- ├── records audit row (action=approval_approved, actor=Operator B)
- └── increments certctl_approval_decisions_total{outcome=approved,profile_id=...}
- │
- ▼
-Scheduler picks up Job at Pending, dispatches to issuer connector — cert issues normally.
+```mermaid
+sequenceDiagram
+ autonumber
+ participant A as Operator A<br/>(or scheduler)
+ participant SVC as CertificateService<br/>.TriggerRenewal
+ participant JOB as Job + ApprovalRequest
+ participant B as Operator B
+ participant APR as ApprovalService.Approve
+ participant SCH as Scheduler
+
+ A->>SVC: POST /api/v1/certificates/{id}/renew<br/>(or renewal-loop tick)
+ SVC->>JOB: read profile.RequiresApproval;<br/>create Job @ JobStatusAwaitingApproval;<br/>create ApprovalRequest<br/>(state=pending, requested_by=Operator A)
+ Note over JOB,SCH: Scheduler skips —<br/>AwaitingApproval is NOT a dispatchable status
+ B->>JOB: GET /api/v1/approvals?state=pending
+ B->>APR: POST /api/v1/approvals/{id}/approve<br/>(decided_by=Operator B, note=...)
+ APR->>APR: RBAC: reject if Operator B == Operator A<br/>→ ErrApproveBySameActor (HTTP 403)
+ APR->>JOB: ApprovalRequest → state=approved;<br/>Job AwaitingApproval → Pending;<br/>audit row (action=approval_approved,<br/>actor=Operator B);<br/>certctl_approval_decisions_total<br/>{outcome=approved,profile_id=...}++
+ SCH->>JOB: pick up Pending → dispatch to issuer connector
+ JOB-->>A: cert issues normally
```
## Configuration
diff --git a/docs/intermediate-ca-hierarchy.md b/docs/intermediate-ca-hierarchy.md
index d3ebc61..eef7e3c 100644
--- a/docs/intermediate-ca-hierarchy.md
+++ b/docs/intermediate-ca-hierarchy.md
@@ -43,19 +43,13 @@ reference can leak.
## Lifecycle states
-```
-created (CreateRoot or CreateChild)
- │
- ▼
-active (issuing certs)
- │
- ▼
-retiring (drain — children still active; this CA stops issuing
- NEW children but existing children continue)
- │
- ▼
-retired (terminal — no issuance, OCSP responder keeps responding
- for already-issued leaves until expiry)
+```mermaid
+stateDiagram-v2
+ [*] --> created : CreateRoot / CreateChild
+ created --> active : registration completes
+ active --> retiring : Retire(confirm=false) —<br/>drain start; this CA stops issuing<br/>NEW children but existing children continue
+ retiring --> retired : Retire(confirm=true) —<br/>terminal; refused if active children remain<br/>(ErrCAStillHasActiveChildren → HTTP 409)
+ retired --> [*] : no issuance;<br/>OCSP keeps responding for<br/>already-issued leaves until expiry
```
Drain-first semantics: a CA in `retiring` state cannot terminalize to
@@ -67,11 +61,13 @@ the children first.
### Pattern A — 4-level FedRAMP boundary CA
-```
-Acme Root CA (path_len=3, offline air-gapped)
- └── Acme Policy CA (path_len=2, FedRAMP-Moderate boundary)
- └── Acme Issuing A (path_len=0, prod workload leaves)
- └── Acme Issuing B (path_len=0, ephemeral pod identity)
+```mermaid
+flowchart TD
+ Root["Acme Root CA<br/>path_len=3<br/>offline air-gapped"]
+ Policy["Acme Policy CA<br/>path_len=2<br/>FedRAMP-Moderate boundary"]
+ IssA["Acme Issuing A<br/>path_len=0<br/>prod workload leaves"]
+ IssB["Acme Issuing B<br/>path_len=0<br/>ephemeral pod identity"]
+ Root --> Policy --> IssA --> IssB
```
Operator workflow:
@@ -98,10 +94,12 @@ Operator workflow:
### Pattern B — 3-level financial-services policy CA
-```
-FinCo Root CA (path_len=2)
- └── FinCo Trading Policy CA (path_len=1; permitted DNS = trading.finco.example)
- └── FinCo Trading Issuing CA (path_len=0)
+```mermaid
+flowchart TD
+ Root["FinCo Root CA<br/>path_len=2"]
+ Pol["FinCo Trading Policy CA<br/>path_len=1<br/>permitted DNS = trading.finco.example"]
+ Iss["FinCo Trading Issuing CA<br/>path_len=0"]
+ Root --> Pol --> Iss
```
Per business-unit name constraints: each policy CA carries a
@@ -113,9 +111,11 @@ excluded subtree. Operators submit `name_constraints` on the
### Pattern C — 2-level internal PKI
-```
-Internal Root CA (path_len=0)
- └── Internal Issuing CA (path_len=0; issues leaves directly)
+```mermaid
+flowchart TD
+ Root["Internal Root CA<br/>path_len=0"]
+ Iss["Internal Issuing CA<br/>path_len=0<br/>issues leaves directly"]
+ Root --> Iss
```
The simplest tree-mode deployment. Roughly equivalent to single mode
diff --git a/docs/runbook-cloud-targets.md b/docs/runbook-cloud-targets.md
index 47a9f6f..7391792 100644
--- a/docs/runbook-cloud-targets.md
+++ b/docs/runbook-cloud-targets.md
@@ -15,42 +15,39 @@ install certctl.
## End-to-end flow (cloud targets)
-```
- cert renewed → renewal job created
- │
- ▼
- agent picks up DeployCertificate work item
- │
- ▼
- target.Connector.DeployCertificate(ctx, request)
- │
- ┌──────────────────┴──────────────────┐
- │ │
- ▼ ▼
- AWS ACM path Azure Key Vault path
- │ │
- ▼ ▼
- 1. (rotate-in-place only) 1. GetCertificate(name, "" /* latest */)
- DescribeCertificate(arn) — capture snapshot CER bytes
- 2. GetCertificate(arn) — capture 2. Build PFX from cert+chain+key
- snapshot bytes for rollback (PKCS#12 via go-pkcs12)
- 3. ImportCertificate(arn, new_bytes) 3. ImportCertificate(name, PFX, tags)
- — fresh ARN OR rotate-in-place — ALWAYS creates a new version
- 4. AddTagsToCertificate(arn, 4. (Tags carried forward
- provenance) — ACM strips on automatically)
- re-import; we re-apply
- 5. DescribeCertificate(arn) — verify 5. GetCertificate(name, "" /* latest */)
- serial matches expected — verify serial matches expected
- 6. ON MISMATCH: rollback ←──── (same shape) ────→ 6. ON MISMATCH: rollback
- ImportCertificate(arn, ImportCertificate(name,
- snapshot_bytes) snapshot_PFX) — new version
- │
- ▼
- 7. Audit row + Prometheus counter
- certctl_deploy_attempts_total{target_type="AWSACM"|"AzureKeyVault",
- result="success"|"failure"}
- certctl_deploy_rollback_total{target_type=...,
- outcome="restored"|"also_failed"}
+```mermaid
+flowchart TD
+ Renew["cert renewed → renewal job created"]
+ Pick["agent picks up DeployCertificate work item"]
+ Dispatch["target.Connector.DeployCertificate(ctx, request)"]
+
+ Renew --> Pick --> Dispatch
+ Dispatch --> AWS
+ Dispatch --> AZ
+
+ subgraph AWS["AWS ACM path"]
+ A1["1. rotate-in-place only:<br/>DescribeCertificate(arn)"]
+ A2["2. GetCertificate(arn) —<br/>capture snapshot bytes for rollback"]
+ A3["3. ImportCertificate(arn, new_bytes) —<br/>fresh ARN OR rotate-in-place"]
+ A4["4. AddTagsToCertificate(arn, provenance) —<br/>ACM strips on re-import; we re-apply"]
+ A5["5. DescribeCertificate(arn) —<br/>verify serial matches expected"]
+ A6["6. ON MISMATCH: rollback<br/>ImportCertificate(arn, snapshot_bytes)"]
+ A1 --> A2 --> A3 --> A4 --> A5 --> A6
+ end
+
+ subgraph AZ["Azure Key Vault path"]
+ Z1["1. GetCertificate(name, '' = latest) —<br/>capture snapshot CER bytes"]
+ Z2["2. Build PFX from cert+chain+key<br/>(PKCS#12 via go-pkcs12)"]
+ Z3["3. ImportCertificate(name, PFX, tags) —<br/>ALWAYS creates a new version"]
+ Z4["4. Tags carried forward automatically"]
+ Z5["5. GetCertificate(name, '' = latest) —<br/>verify serial matches expected"]
+ Z6["6. ON MISMATCH: rollback<br/>ImportCertificate(name, snapshot_PFX) —<br/>new version"]
+ Z1 --> Z2 --> Z3 --> Z4 --> Z5 --> Z6
+ end
+
+ A6 --> Audit
+ Z6 --> Audit
+ Audit["7. Audit row + Prometheus counters<br/>certctl_deploy_attempts_total{target_type, result}<br/>certctl_deploy_rollback_total{target_type, outcome}"]
```
---
diff --git a/docs/runbook-expiry-alerts.md b/docs/runbook-expiry-alerts.md
index f5e7db7..a493531 100644
--- a/docs/runbook-expiry-alerts.md
+++ b/docs/runbook-expiry-alerts.md
@@ -14,36 +14,37 @@ walkthrough of how to install certctl — that lives in the README.
## End-to-end flow
-```
- daily ticker (renewalCheckLoop)
- │
- ▼
- RenewalService.CheckExpiringCertificates
- │
- ┌────────────────┴────────────────┐
- │ for cert in expiring (≤30 days):│
- │ 1. Resolve RenewalPolicy │
- │ 2. Compute daysUntil │
- │ 3. updateCertExpiryStatus │
- │ 4. sendThresholdAlerts ──────►│ per threshold:
- │ 5. Create renewal job (if │ a. resolve severity tier
- │ issuer registered + ARI │ via AlertSeverityMap
- │ allows) │ b. resolve channel set
- └──────────────────────────────────┘ via AlertChannels[tier]
- c. for each channel:
- i. dedup via
- notification_events
- (cert,threshold,channel)
- ii. SendThresholdAlertOnChannel
- → notifierRegistry[channel]
- → Send(recipient,subj,body)
- iii. record audit row
- (event_type=expiration_alert_sent,
- metadata.channel,
- metadata.severity_tier)
- iv. bump Prometheus counter
- certctl_expiry_alerts_total
- {channel,threshold,result}
+```mermaid
+flowchart TD
+ Tick["daily ticker (renewalCheckLoop)"]
+ Check["RenewalService.CheckExpiringCertificates"]
+
+ Tick --> Check --> Loop
+
+ subgraph Loop["for cert in expiring (≤30 days)"]
+ L1["1. Resolve RenewalPolicy"]
+ L2["2. Compute daysUntil"]
+ L3["3. updateCertExpiryStatus"]
+ L4["4. sendThresholdAlerts"]
+ L5["5. Create renewal job<br/>(if issuer registered +<br/>ARI allows)"]
+ L1 --> L2 --> L3 --> L4 --> L5
+ end
+
+ L4 --> Threshold
+
+ subgraph Threshold["per threshold"]
+ T1["a. resolve severity tier<br/>via AlertSeverityMap"]
+ T2["b. resolve channel set<br/>via AlertChannels[tier]"]
+ T1 --> T2 --> Channel
+ end
+
+ subgraph Channel["for each channel (fault-isolating)"]
+ C1["i. dedup via notification_events<br/>(cert, threshold, channel)"]
+ C2["ii. SendThresholdAlertOnChannel<br/>→ notifierRegistry[channel]<br/>→ Send(recipient, subj, body)"]
+ C3["iii. record audit row<br/>event_type=expiration_alert_sent<br/>metadata.channel, metadata.severity_tier"]
+ C4["iv. bump Prometheus counter<br/>certctl_expiry_alerts_total<br/>{channel, threshold, result}"]
+ C1 --> C2 --> C3 --> C4
+ end
```
The dispatch loop's per-channel error handling is