Files
certctl/docs/operator/runbooks/cloud-targets.md
T
shankar0123 3a807ae37e docs: Phase 2 mechanical file moves to subdirectory structure
Pure git mv operations; no content edits. Internal links remain pointing
at old paths and will be fixed in Phase 11. Per the Phase 1 audit
recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/.

35 files moved across 8 audience-organized subdirectories:

  docs/getting-started/ (5):
    quickstart.md, concepts.md, examples.md, advanced-demo.md (was
    demo-advanced.md), why-certctl.md

  docs/reference/ (6):
    architecture.md, api.md (was openapi.md), mcp.md,
    intermediate-ca-hierarchy.md, deployment-model.md (was
    deployment-atomicity.md), vendor-matrix.md (was
    deployment-vendor-matrix.md)

  docs/reference/protocols/ (6):
    acme-server.md, acme-server-threat-model.md, scep-intune.md,
    est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md)

  docs/operator/ (4):
    security.md, tls.md, database-tls.md, approval-workflow.md

  docs/operator/runbooks/ (3):
    cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md
    (was runbook-expiry-alerts.md), disaster-recovery.md

  docs/migration/ (3):
    from-certbot.md (was migrate-from-certbot.md), from-acmesh.md
    (was migrate-from-acmesh.md), cert-manager-coexistence.md (was
    certctl-for-cert-manager-users.md)

  docs/compliance/ (4):
    index.md (was compliance.md), soc2.md (was compliance-soc2.md),
    pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was
    compliance-nist.md)

  docs/contributor/ (4):
    testing-strategy.md, test-environment.md (was test-env.md),
    ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md)

Deferred to later Phase 2 sub-phases:
  - connectors.md split (Phase 4): docs/connectors.md +
    docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level
  - testing-guide.md prune (Phase 5): docs/testing-guide.md still
    at top level
  - features.md disperse (Phase 6): docs/features.md still at top
    level
  - legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md
    still at top level
  - ACME walkthrough re-homing (Phase 8): three
    docs/acme-*-walkthrough.md still at top level
  - Upgrade docs archive (Phase 3): two docs/upgrade-*.md still
    at top level

Cross-reference updates (Phase 11) will happen after all moves and
content edits land. Internal links to docs/* paths are temporarily
broken until that phase completes.
2026-05-05 02:49:28 +00:00

9.8 KiB
Raw Blame History

Runbook: cloud-target deployment connectors (AWS ACM + Azure Key Vault)

This runbook covers the SDK-driven cloud target connectors that ship in certctl post-2026-05-03 (Rank 5 of the Infisical deep-research deliverable). It complements the operator-facing AWS Certificate Manager and Azure Key Vault sections in docs/connectors.md.

Audience: a platform sysadmin or SRE who needs to configure, debug, or audit certctl's cloud-target deploys. Not a walkthrough of how to install certctl.


End-to-end flow (cloud targets)

flowchart TD
    Renew["cert renewed → renewal job created"]
    Pick["agent picks up DeployCertificate work item"]
    Dispatch["target.Connector.DeployCertificate(ctx, request)"]

    Renew --> Pick --> Dispatch
    Dispatch --> AWS
    Dispatch --> AZ

    subgraph AWS["AWS ACM path"]
        A1["1. rotate-in-place only:<br/>DescribeCertificate(arn)"]
        A2["2. GetCertificate(arn) —<br/>capture snapshot bytes for rollback"]
        A3["3. ImportCertificate(arn, new_bytes) —<br/>fresh ARN OR rotate-in-place"]
        A4["4. AddTagsToCertificate(arn, provenance) —<br/>ACM strips on re-import; we re-apply"]
        A5["5. DescribeCertificate(arn) —<br/>verify serial matches expected"]
        A6["6. ON MISMATCH: rollback<br/>ImportCertificate(arn, snapshot_bytes)"]
        A1 --> A2 --> A3 --> A4 --> A5 --> A6
    end

    subgraph AZ["Azure Key Vault path"]
        Z1["1. GetCertificate(name, '' = latest) —<br/>capture snapshot CER bytes"]
        Z2["2. Build PFX from cert+chain+key<br/>(PKCS#12 via go-pkcs12)"]
        Z3["3. ImportCertificate(name, PFX, tags) —<br/>ALWAYS creates a new version"]
        Z4["4. Tags carried forward automatically"]
        Z5["5. GetCertificate(name, '' = latest) —<br/>verify serial matches expected"]
        Z6["6. ON MISMATCH: rollback<br/>ImportCertificate(name, snapshot_PFX) —<br/>new version"]
        Z1 --> Z2 --> Z3 --> Z4 --> Z5 --> Z6
    end

    A6 --> Audit
    Z6 --> Audit
    Audit["7. Audit row + Prometheus counters<br/>certctl_deploy_attempts_total{target_type, result}<br/>certctl_deploy_rollback_total{target_type, outcome}"]

Configuring an AWS ACM target

Minimum config

curl -X POST https://certctl.example.com/api/v1/targets \
  -H 'Authorization: Bearer ${TOKEN}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Production ALB cert",
    "type": "AWSACM",
    "agent_id": "ag-server",
    "config": {
      "region": "us-east-1",
      "tags": {"env": "production"}
    }
  }'

Empty certificate_arn on first deploy = ACM creates a fresh ARN; the deployment record's Metadata captures it. Update the deployment_targets.config.certificate_arn field via the GUI / API / direct SQL to pin the ARN for subsequent renewals.

Minimum IAM policy

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "acm:ImportCertificate",
      "acm:GetCertificate",
      "acm:DescribeCertificate",
      "acm:ListCertificates",
      "acm:AddTagsToCertificate"
    ],
    "Resource": "arn:aws:acm:us-east-1:*:certificate/*"
  }]
}

Pin Resource to the specific region / account where the ALB lives. Cross-account deploys use AssumeRole — configure the agent's role with sts:AssumeRole against the target account's role ARN.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: certctl-agent
  namespace: certctl-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/certctl-acm-deployer

Trust policy on certctl-acm-deployer:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:certctl-system:certctl-agent"
      }
    }
  }]
}

Configuring an Azure Key Vault target

Minimum config

curl -X POST https://certctl.example.com/api/v1/targets \
  -H 'Authorization: Bearer ${TOKEN}' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Production AGW cert",
    "type": "AzureKeyVault",
    "agent_id": "ag-server",
    "config": {
      "vault_url": "https://prod-vault.vault.azure.net",
      "certificate_name": "api-prod",
      "credential_mode": "managed_identity",
      "tags": {"env": "production"}
    }
  }'

Minimum RBAC role

Off-the-shelf builtin: Key Vault Certificates Officer (assigns at the vault scope).

Custom minimum-permission role:

{
  "properties": {
    "roleName": "certctl-keyvault-deployer",
    "description": "Minimum permissions for certctl Key Vault target",
    "assignableScopes": [
      "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>"
    ],
    "permissions": [{
      "actions": [],
      "notActions": [],
      "dataActions": [
        "Microsoft.KeyVault/vaults/certificates/import/action",
        "Microsoft.KeyVault/vaults/certificates/read",
        "Microsoft.KeyVault/vaults/certificates/listversions/read"
      ],
      "notDataActions": []
    }]
  }
}

Annotate the agent's ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: certctl-agent
  namespace: certctl-system
  annotations:
    azure.workload.identity/client-id: <app-registration-client-id>
  labels:
    azure.workload.identity/use: "true"

Federated credential on the app registration:

{
  "name": "certctl-agent-federated",
  "issuer": "https://<oidc-issuer-url>",
  "subject": "system:serviceaccount:certctl-system:certctl-agent",
  "audiences": ["api://AzureADTokenExchange"]
}

Set credential_mode: workload_identity on the deployment_target config.


Operator playbook

"Did the cert get imported to ACM / Key Vault?"

AWS:

aws acm describe-certificate \
  --certificate-arn arn:aws:acm:us-east-1:...:certificate/<id> \
  --query 'Certificate.{Status:Status,Serial:Serial,Issued:IssuedAt,NotAfter:NotAfter,Tags:[Tags]}'

Azure:

az keyvault certificate show \
  --vault-name prod-vault \
  --name api-prod \
  --query '{Serial:x509ThumbprintHex, Version:id, NotAfter:attributes.expires}'

In both cases, the certctl-managed-by tag confirms the cert was imported by certctl (and not someone running aws-cli directly).

"Why did the rollback fail?"

The Prometheus counter certctl_deploy_rollback_total{outcome="also_failed"} ticks when the rollback's own ImportCertificate / Set call also returns an error. Look at the agent's slog at ERROR level for the per-call diagnostic; the underlying cloud SDK error message tells you whether it was IAM denial, throttling, or a structural input problem.

Manual recovery:

AWS ACM:

# Get the snapshot of a known-good cert from S3 / Vault / wherever the
# operator stores backup PEMs:
aws acm import-certificate \
  --certificate fileb://known-good.crt \
  --private-key  fileb://known-good.key \
  --certificate-chain fileb://known-good.chain \
  --certificate-arn  arn:aws:acm:us-east-1:...:certificate/<id> \
  --tags Key=certctl-managed-by,Value=manual-recovery

Azure Key Vault:

# Import a fresh PFX as a new version under the same name:
az keyvault certificate import \
  --vault-name prod-vault \
  --name api-prod \
  --file known-good.pfx \
  --tags certctl-managed-by=manual-recovery

After the manual recovery, certctl's next renewal-loop tick re-verifies the live cert via ValidateDeployment and resumes normal operation.

"How do I know certctl is the only one writing to this ARN / vault cert?"

AWS — via CloudTrail:

EventName = "ImportCertificate"
Resources.ARN = "arn:aws:acm:us-east-1:...:certificate/<id>"

Filter by user identity to see which principal made each call. The certctl agent's IAM role / IRSA-bound role should be the only writer.

Azure — via Activity Log:

az monitor activity-log list \
  --resource-id /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>/certificates/<name> \
  --offset 30d \
  --query "[?operationName.value=='Microsoft.KeyVault/vaults/certificates/import/action'].{caller:caller, time:eventTimestamp}"

Cardinality + cost

  • Per-target-type Prometheus counters: 2 new certctl_deploy_attempts_total series (AWSACM + AzureKeyVault) × 2 results = 4 series. Comfortable.
  • AWS ACM costs: ImportCertificate is free; CloudTrail logs at $2 per GB. Renewing 100 certs/month adds ~10 KB to CloudTrail.
  • Azure Key Vault costs: certificate operations $0.03 per 10K operations (V2 pricing as of 2026-05). 100 certs/month = $0.0009 in cert-op spend. Activity Log retention is configurable (default 90 days, free).

V3-Pro forward path

Tracked at cowork/WORKSPACE-ROADMAP.md under "Adapter hardening":

  • AWS CloudFront direct-attach — UpdateDistribution after an ACM ImportCertificate so the CloudFront edge picks up the new cert without operator intervention. Requires cloudfront:UpdateDistribution IAM permission on top of the ACM minimum.
  • Azure Front Door direct-attach — UpdateRoutingConfig equivalent.
  • AWS ALB / Azure App Gateway auto-bind — currently operators attach the ARN / KID URI to the LB out-of-band (Terraform); V3-Pro adds the auto-attach step.
  • Soft-delete recovery for Azure Key Vault — V2 always re-imports as a new version; V3 detects soft-deleted prior versions and offers operator-confirmed recovery.
  • GCP Certificate Manager target — Google Cloud's equivalent to ACM; mirrors the AWS ACM connector shape. Separate cloud, separate connector.