certctl/docs/runbook-cloud-targets.md
shankar0123 85649cf983 docs: convert remaining ASCII diagrams to mermaid (audit closure)
Audit pass over docs/ found 4 files with non-mermaid (ASCII
box-drawing) diagrams in fenced code blocks. The other 9 doc files
already used mermaid blocks (architecture.md, demo-advanced.md,
ci-pipeline.md, concepts.md, est.md, legacy-est-scep.md, mcp.md,
qa-test-guide.md, scep-intune.md). Rendering parity for everything
in docs/.

Conversions:

  approval-workflow.md
    1 ASCII swimlane → sequenceDiagram with named participants
    (Operator A / CertificateService / Job+ApprovalRequest /
    Operator B / ApprovalService / Scheduler). Same content: the
    same-actor RBAC reject path, the AwaitingApproval gate, the
    audit + Prometheus side effects.

  intermediate-ca-hierarchy.md
    1 lifecycle ASCII → stateDiagram-v2 (created → active → retiring
    → retired with the drain-first refusal annotation).
    3 ASCII tree patterns → 3 flowchart TD diagrams (FedRAMP 4-level
    boundary CA, financial-services 3-level policy CA, internal-PKI
    2-level). Same depth, same path_len + permitted-DNS labels.

  runbook-cloud-targets.md
    1 dual-column ASCII flow → flowchart TD with two subgraphs
    (AWS ACM path, Azure Key Vault path) joining at the audit +
    Prometheus exposer node. Same 6-step deploy sequence on each
    side with the rollback-on-mismatch step explicit.

  runbook-expiry-alerts.md
    1 nested-loop ASCII flow → flowchart TD with three nested
    subgraphs (per-cert main loop / per-threshold inner / per-channel
    fault-isolating dispatch). Same dedup + Prometheus + audit-row
    side effects per channel.

Verified locally:
  Audit re-run: every fenced block in docs/*.md that does NOT open
    with ```mermaid contains zero ASCII box-drawing characters
    (┌ └ │ ─ ━ ═ ║ ╔ ╚ ▼ ▲).
  Mermaid block tally: 39 across 13 files (up from 32 across 9
    files pre-audit). The +7 new blocks are the 4 conversions plus
    the lifecycle + 3 tree patterns expanded out of the single
    intermediate-ca-hierarchy.md ASCII section.

No code or test changes. Doc-only commit.
2026-05-04 02:40:01 +00:00


Runbook: cloud-target deployment connectors (AWS ACM + Azure Key Vault)

This runbook covers the SDK-driven cloud target connectors that ship in certctl post-2026-05-03 (Rank 5 of the Infisical deep-research deliverable). It complements the operator-facing AWS Certificate Manager and Azure Key Vault sections in docs/connectors.md.

Audience: a platform sysadmin or SRE who needs to configure, debug, or audit certctl's cloud-target deploys. Not a walkthrough of how to install certctl.


End-to-end flow (cloud targets)

flowchart TD
    Renew["cert renewed → renewal job created"]
    Pick["agent picks up DeployCertificate work item"]
    Dispatch["target.Connector.DeployCertificate(ctx, request)"]

    Renew --> Pick --> Dispatch
    Dispatch --> AWS
    Dispatch --> AZ

    subgraph AWS["AWS ACM path"]
        A1["1. rotate-in-place only:<br/>DescribeCertificate(arn)"]
        A2["2. GetCertificate(arn) —<br/>capture snapshot bytes for rollback"]
        A3["3. ImportCertificate(arn, new_bytes) —<br/>fresh ARN OR rotate-in-place"]
        A4["4. AddTagsToCertificate(arn, provenance) —<br/>ACM strips on re-import; we re-apply"]
        A5["5. DescribeCertificate(arn) —<br/>verify serial matches expected"]
        A6["6. ON MISMATCH: rollback<br/>ImportCertificate(arn, snapshot_bytes)"]
        A1 --> A2 --> A3 --> A4 --> A5 --> A6
    end

    subgraph AZ["Azure Key Vault path"]
        Z1["1. GetCertificate(name, '' = latest) —<br/>capture snapshot CER bytes"]
        Z2["2. Build PFX from cert+chain+key<br/>(PKCS#12 via go-pkcs12)"]
        Z3["3. ImportCertificate(name, PFX, tags) —<br/>ALWAYS creates a new version"]
        Z4["4. Tags carried forward automatically"]
        Z5["5. GetCertificate(name, '' = latest) —<br/>verify serial matches expected"]
        Z6["6. ON MISMATCH: rollback<br/>ImportCertificate(name, snapshot_PFX) —<br/>new version"]
        Z1 --> Z2 --> Z3 --> Z4 --> Z5 --> Z6
    end

    A6 --> Audit
    Z6 --> Audit
    Audit["7. Audit row + Prometheus counters<br/>certctl_deploy_attempts_total{target_type, result}<br/>certctl_deploy_rollback_total{target_type, outcome}"]
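
The snapshot, import, verify, and rollback steps above can be sketched in a few lines. This is a minimal illustration, not certctl's connector code: `FakeStore`, `serial_of`, the toy `serial:<hex>` encoding, and the `restored` outcome label are all hypothetical; only the `also_failed` outcome name comes from this runbook.

```python
from dataclasses import dataclass


@dataclass
class FakeStore:
    """In-memory stand-in for ACM or Key Vault; not a real SDK client."""
    live: bytes = b""
    corrupt_next_import: bool = False  # simulate the store serving the wrong cert

    def get(self) -> bytes:
        return self.live

    def import_cert(self, cert: bytes) -> None:
        if self.corrupt_next_import:
            self.corrupt_next_import = False
            self.live = b"serial:WRONG"
        else:
            self.live = cert


def serial_of(cert: bytes) -> str:
    # Toy encoding "serial:<hex>"; a real connector parses the X.509 cert.
    return cert.decode().split(":", 1)[1]


def deploy(store, new_cert: bytes, expected_serial: str):
    """Return (result, rollback_outcome) for one deploy attempt."""
    snapshot = store.get()                         # snapshot for rollback
    store.import_cert(new_cert)                    # import the renewed cert
    if serial_of(store.get()) == expected_serial:  # post-import verify
        return ("success", None)
    if not snapshot:                               # first deploy: nothing to restore
        return ("failure", None)
    try:
        store.import_cert(snapshot)                # rollback on mismatch
    except Exception:
        return ("failure", "also_failed")
    return ("failure", "restored")
```

The returned pair maps onto the two counters in step 7: the first element feeds the `result` label, the second the rollback `outcome` label.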

Configuring an AWS ACM target

Minimum config

curl -X POST https://certctl.example.com/api/v1/targets \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Production ALB cert",
    "type": "AWSACM",
    "agent_id": "ag-server",
    "config": {
      "region": "us-east-1",
      "tags": {"env": "production"}
    }
  }'

Leave certificate_arn empty on the first deploy: ACM creates a fresh ARN, and the deployment record's Metadata captures it. Then set the deployment_targets.config.certificate_arn field (via the GUI, the API, or direct SQL) to pin that ARN for subsequent renewals.
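
The fresh-ARN versus pinned-ARN behaviour can be sketched as a request builder. This is a hedged illustration: `import_kwargs` and `pin_arn` are hypothetical helpers, and the keyword names follow the ACM ImportCertificate request shape as exposed by boto3. Tags are only attached to a fresh import, on the assumption that ACM does not accept them on a re-import (which is why the flow re-applies them with AddTagsToCertificate).

```python
def import_kwargs(config: dict, cert: bytes, key: bytes, chain: bytes) -> dict:
    """Build ImportCertificate keyword arguments from a target config."""
    kwargs = {"Certificate": cert, "PrivateKey": key, "CertificateChain": chain}
    arn = config.get("certificate_arn", "")
    if arn:
        # Pinned ARN: rotate in place, tags are re-applied in a later call.
        kwargs["CertificateArn"] = arn
    else:
        # Empty ARN: ACM allocates a fresh one, tags can ride along.
        kwargs["Tags"] = [
            {"Key": k, "Value": v} for k, v in config.get("tags", {}).items()
        ]
    return kwargs


def pin_arn(config: dict, returned_arn: str) -> dict:
    """Return a copy of the config with the ARN from the first deploy pinned."""
    pinned = dict(config)
    pinned["certificate_arn"] = returned_arn
    return pinned
```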

Minimum IAM policy

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "acm:ImportCertificate",
      "acm:GetCertificate",
      "acm:DescribeCertificate",
      "acm:ListCertificates",
      "acm:AddTagsToCertificate"
    ],
    "Resource": "arn:aws:acm:us-east-1:*:certificate/*"
  }]
}

Pin Resource to the specific region / account where the ALB lives. Cross-account deploys use AssumeRole — configure the agent's role with sts:AssumeRole against the target account's role ARN.
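
A quick way to check whether a policy's Resource is actually pinned is to split the ARN and look for wildcards in the region and account fields. The helper below is a hypothetical sketch (`unpinned_fields` is not part of certctl); it relies only on the standard `arn:partition:service:region:account:resource` layout.

```python
def unpinned_fields(resource_arn: str) -> list:
    """Return which of region/account are still wildcarded or empty."""
    parts = resource_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {resource_arn!r}")
    _, _partition, _service, region, account, _resource = parts
    loose = []
    if not region or "*" in region:
        loose.append("region")
    if not account or "*" in account:
        loose.append("account")
    return loose
```

Run against the Resource in the minimum policy above, it reports the account field as the part still to pin.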

On EKS, bind the role to the agent through IRSA by annotating its ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: certctl-agent
  namespace: certctl-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/certctl-acm-deployer

Trust policy on certctl-acm-deployer:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:certctl-system:certctl-agent"
      }
    }
  }]
}
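
The condition key in the trust policy is the OIDC issuer URL without its scheme, followed by `:sub`, and the value is the Kubernetes ServiceAccount subject. A small sketch of that derivation, assuming Python 3.9+ and a hypothetical helper name `irsa_condition`:

```python
def irsa_condition(oidc_issuer_url: str, namespace: str, service_account: str) -> dict:
    """Build the StringEquals condition for an IRSA trust policy."""
    # The condition key uses the issuer host and path, without "https://".
    issuer = oidc_issuer_url.removeprefix("https://").rstrip("/")
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {"StringEquals": {f"{issuer}:sub": subject}}
```

A mismatch in either the namespace or the ServiceAccount name produces a different subject string, which is a common reason for AssumeRoleWithWebIdentity being denied.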

Configuring an Azure Key Vault target

Minimum config

curl -X POST https://certctl.example.com/api/v1/targets \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Production AGW cert",
    "type": "AzureKeyVault",
    "agent_id": "ag-server",
    "config": {
      "vault_url": "https://prod-vault.vault.azure.net",
      "certificate_name": "api-prod",
      "credential_mode": "managed_identity",
      "tags": {"env": "production"}
    }
  }'
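
Before posting a target, it can help to sanity-check the config locally. The validator below is a hypothetical sketch, not certctl's own validation: it only knows the two credential modes this runbook mentions (certctl may accept others), and it assumes Key Vault's naming rule of 1 to 127 alphanumerics or hyphens for certificate names.

```python
import re
from urllib.parse import urlparse

# Only the modes named in this runbook; treat the set as an assumption.
KNOWN_MODES = {"managed_identity", "workload_identity"}


def config_problems(config: dict) -> list:
    """Return human-readable problems found in an AzureKeyVault target config."""
    problems = []
    url = urlparse(config.get("vault_url", ""))
    if url.scheme != "https" or not url.hostname:
        problems.append("vault_url must be an https URL")
    if not re.fullmatch(r"[0-9A-Za-z-]{1,127}", config.get("certificate_name", "")):
        problems.append("certificate_name must be 1-127 alphanumerics or hyphens")
    if config.get("credential_mode") not in KNOWN_MODES:
        problems.append("credential_mode is not one of the modes in this runbook")
    return problems
```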

Minimum RBAC role

Built-in role: Key Vault Certificates Officer (assign it at the vault scope).

Custom minimum-permission role:

{
  "properties": {
    "roleName": "certctl-keyvault-deployer",
    "description": "Minimum permissions for certctl Key Vault target",
    "assignableScopes": [
      "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>"
    ],
    "permissions": [{
      "actions": [],
      "notActions": [],
      "dataActions": [
        "Microsoft.KeyVault/vaults/certificates/import/action",
        "Microsoft.KeyVault/vaults/certificates/read",
        "Microsoft.KeyVault/vaults/certificates/listversions/read"
      ],
      "notDataActions": []
    }]
  }
}

For workload identity, annotate the agent's ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: certctl-agent
  namespace: certctl-system
  annotations:
    azure.workload.identity/client-id: <app-registration-client-id>
  labels:
    azure.workload.identity/use: "true"

Federated credential on the app registration:

{
  "name": "certctl-agent-federated",
  "issuer": "https://<oidc-issuer-url>",
  "subject": "system:serviceaccount:certctl-system:certctl-agent",
  "audiences": ["api://AzureADTokenExchange"]
}

Set credential_mode: workload_identity on the deployment_target config.


Operator playbook

"Did the cert get imported to ACM / Key Vault?"

AWS:

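aws acm describe-certificate \
  --certificate-arn arn:aws:acm:us-east-1:...:certificate/<id> \
  --query 'Certificate.{Status:Status,Serial:Serial,Imported:ImportedAt,NotAfter:NotAfter}'
# describe-certificate does not return tags; list them separately:
aws acm list-tags-for-certificate \
  --certificate-arn arn:aws:acm:us-east-1:...:certificate/<id>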

Azure:

az keyvault certificate show \
  --vault-name prod-vault \
  --name api-prod \
  --query '{Thumbprint:x509ThumbprintHex, Version:id, NotAfter:attributes.expires, Tags:tags}'

In both cases, the certctl-managed-by tag confirms that certctl imported the cert, rather than someone running the aws or az CLI directly.
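
When comparing the serial a cloud CLI reports against the one certctl expected, formatting differences (colon separators, letter case, leading zeros) can cause false mismatches. A hedged normalisation sketch, assuming both sides are hexadecimal strings; `normalize_serial` and `same_serial` are hypothetical helpers:

```python
def normalize_serial(serial: str) -> str:
    """Reduce a hex serial to lowercase digits without separators or leading zeros."""
    digits = serial.replace(":", "").replace(" ", "").lower()
    int(digits, 16)  # raises ValueError if the input is empty or not hex
    return digits.lstrip("0") or "0"


def same_serial(reported: str, expected: str) -> bool:
    """Compare two serials after normalisation."""
    return normalize_serial(reported) == normalize_serial(expected)
```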

"Why did the rollback fail?"

The Prometheus counter certctl_deploy_rollback_total{outcome="also_failed"} ticks when the rollback's own ImportCertificate / Set call also returns an error. Look at the agent's slog at ERROR level for the per-call diagnostic; the underlying cloud SDK error message tells you whether it was IAM denial, throttling, or a structural input problem.
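
The three failure families named above can be told apart from the SDK's error code. The classifier below is an illustrative sketch only: the substring lists are assumptions based on common AWS and Azure error code names, not an exhaustive mapping, and `classify` is a hypothetical helper.

```python
# Substrings are assumptions; extend them from the codes seen in the agent log.
FAMILIES = [
    ("iam_denial", ("accessdenied", "forbidden", "unauthorized")),
    ("throttling", ("throttl", "toomanyrequests", "limitexceeded")),
    ("structural_input", ("invalid", "validation", "badparameter")),
]


def classify(error_code: str) -> str:
    """Map a cloud SDK error code to one of the failure families, or 'unknown'."""
    code = error_code.lower()
    for family, needles in FAMILIES:
        if any(needle in code for needle in needles):
            return family
    return "unknown"
```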

Manual recovery:

AWS ACM:

# Get the snapshot of a known-good cert from S3 / Vault / wherever the
# operator stores backup PEMs:
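aws acm import-certificate \
  --certificate fileb://known-good.crt \
  --private-key fileb://known-good.key \
  --certificate-chain fileb://known-good.chain \
  --certificate-arn arn:aws:acm:us-east-1:...:certificate/<id>
# ACM does not accept --tags on a re-import, so tag in a second call:
aws acm add-tags-to-certificate \
  --certificate-arn arn:aws:acm:us-east-1:...:certificate/<id> \
  --tags Key=certctl-managed-by,Value=manual-recovery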

Azure Key Vault:

# Import a fresh PFX as a new version under the same name:
az keyvault certificate import \
  --vault-name prod-vault \
  --name api-prod \
  --file known-good.pfx \
  --tags certctl-managed-by=manual-recovery

After the manual recovery, certctl's next renewal-loop tick re-verifies the live cert via ValidateDeployment and resumes normal operation.

"How do I know certctl is the only one writing to this ARN / vault cert?"

AWS — via CloudTrail:

EventName = "ImportCertificate"
Resources.ARN = "arn:aws:acm:us-east-1:...:certificate/<id>"

Filter by user identity to see which principal made each call. The certctl agent's IAM role / IRSA-bound role should be the only writer.
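
Once the matching events are exported, the single-writer check is a group-by on principal. The sketch below assumes each event has already been flattened to a dict with the `EventName` and `Resources[].ARN` fields used in the filter above plus a hypothetical `Principal` key; the raw CloudTrail record layout may differ, so adapt the key names to the export in use.

```python
from collections import Counter


def writers(events: list, cert_arn: str) -> Counter:
    """Count ImportCertificate calls against one ARN, grouped by principal."""
    counts = Counter()
    for event in events:
        if event.get("EventName") != "ImportCertificate":
            continue
        arns = [r.get("ARN") for r in event.get("Resources", [])]
        if cert_arn in arns:
            counts[event.get("Principal", "unknown")] += 1
    return counts


def unexpected_writers(events: list, cert_arn: str, expected_principal: str) -> set:
    """Return every principal other than the certctl agent's role."""
    return set(writers(events, cert_arn)) - {expected_principal}
```

An empty result from `unexpected_writers` is the condition this section is checking for.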

Azure — via Activity Log:

az monitor activity-log list \
  --resource-id /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>/certificates/<name> \
  --offset 30d \
  --query "[?operationName.value=='Microsoft.KeyVault/vaults/certificates/import/action'].{caller:caller, time:eventTimestamp}"

Cardinality + cost

  • Per-target-type Prometheus counters: 2 new target_type values (AWSACM, AzureKeyVault) × 2 result values = 4 new certctl_deploy_attempts_total series. Comfortable.
  • AWS ACM costs: ImportCertificate is free; CloudTrail logs at $2 per GB. Renewing 100 certs/month adds ~10 KB to CloudTrail.
  • Azure Key Vault costs: certificate operations cost $0.03 per 10K operations (V2 pricing as of 2026-05). 100 certs/month at 3 operations per deploy (snapshot read, import, verify read) = 300 operations = $0.0009 in cert-op spend. Activity Log retention is configurable (default 90 days, free).
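
The arithmetic behind the series count and the Key Vault figure can be reproduced directly. This sketch assumes three Key Vault operations per deploy (the snapshot read, the import, and the verify read from the flow above), which is the assumption under which the $0.0009 figure works out; the function names and the result label values in the usage are hypothetical.

```python
def series_count(target_types: list, results: list) -> int:
    """Number of counter series for a target_type x result label pair."""
    return len(target_types) * len(results)


def monthly_cert_op_cost(
    certs_per_month: int,
    ops_per_deploy: int = 3,       # assumed: snapshot read, import, verify read
    price_per_10k: float = 0.03,   # USD per 10K operations, from the bullet above
) -> float:
    """Estimated monthly Key Vault certificate-operation spend in USD."""
    return certs_per_month * ops_per_deploy * price_per_10k / 10_000
```

Rollbacks add one more import per failed deploy, so a fleet with frequent verify mismatches would land slightly above this estimate.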

V3-Pro forward path

Tracked at cowork/WORKSPACE-ROADMAP.md under "Adapter hardening":

  • AWS CloudFront direct-attach — UpdateDistribution after an ACM ImportCertificate so the CloudFront edge picks up the new cert without operator intervention. Requires cloudfront:UpdateDistribution IAM permission on top of the ACM minimum.
  • Azure Front Door direct-attach — UpdateRoutingConfig equivalent.
  • AWS ALB / Azure App Gateway auto-bind — currently operators attach the ARN / KID URI to the LB out-of-band (Terraform); V3-Pro adds the auto-attach step.
  • Soft-delete recovery for Azure Key Vault — V2 always re-imports as a new version; V3 detects soft-deleted prior versions and offers operator-confirmed recovery.
  • GCP Certificate Manager target — Google Cloud's equivalent to ACM; mirrors the AWS ACM connector shape. Separate cloud, separate connector.