Mirror of https://github.com/shankar0123/certctl.git, synced 2026-06-08 06:38:58 +00:00
Compare commits
68 Commits
| SHA1 |
|---|
| 7a9ae3157f |
| 1720e11109 |
| f40e975439 |
| 0e06f6c4fc |
| ff75361553 |
| e0aaa967c9 |
| 17455d2ea2 |
| f2c77ba3fb |
| d2b62880ce |
| 75097909e9 |
| 7c5cc57d75 |
| 9acf609ac9 |
| 622cd29f20 |
| d809874fa1 |
| 5ea8fb48eb |
| 3275f9f1e0 |
| ecb8896b1c |
| f179eab071 |
| 969853ee53 |
| 082b8cf660 |
| de06141ce5 |
| fd94205cfa |
| b452013dd9 |
| fd4eb3b165 |
| a364cd6990 |
| 12d7b1f51d |
| 19c8fafe84 |
| 426760d737 |
| affaa11d14 |
| dca1900815 |
| 633e440787 |
| cee008207b |
| e9b15108d9 |
| f157c18368 |
| b21c02a3d5 |
| 3a807ae37e |
| cda957f302 |
| 0f81c1b956 |
| ff6ffcda1b |
| b0fc067317 |
| c46a6aecbc |
| 9ef9f3cde3 |
| a00b20cc97 |
| b6a5278df1 |
| 439905e546 |
| 2b4d0069d9 |
| d08982fc19 |
| af3ca3935b |
| e6919cdaba |
| 23c593089d |
| e50ba168ac |
| 7d48bd0367 |
| 85649cf983 |
| 8908c8ff5c |
| 34adcfbbe5 |
| ae597f7f8d |
| 62523fb845 |
| fb54ebcb62 |
| 66d2af36a7 |
| 31e50d987f |
| b601928e1c |
| aebfd8bd7c |
| 19706e56b3 |
| 03c61f4c20 |
| 81632eb0f3 |
| 8043e2bbac |
| 2025275b43 |
| 69d4ada385 |
+23
-33
@@ -79,7 +79,7 @@ jobs:
 # does call, this step fails the build until either upstream
 # ships a fix OR we cut the dep. Deferred-call advisories that
 # legitimately can't be remediated yet should be added to the
-# NIST SSDF deviation log in docs/security.md, not silenced here.
+# NIST SSDF deviation log in docs/operator/security.md, not silenced here.
 run: govulncheck ./...

 - name: Install staticcheck (Bundle-7 / D-001)
@@ -135,48 +135,38 @@ jobs:
 GITHUB_REPOSITORY: ${{ github.repository }}
 run: bash scripts/coverage-pr-comment.sh

-# Bundle P / Strengthening #6 — QA-doc drift guards. Forces every PR
-# that adds a Part to docs/testing-guide.md OR a seed row to
-# migrations/seed_demo.sql to keep docs/qa-test-guide.md in sync. This
-# eliminates the doc-drift class structurally — the symptom Bundle I
-# had to clean up by hand becomes a CI-time error going forward.
-- name: QA-doc Part-count drift guard
-run: |
-set -e
-DOC_PARTS=$(grep -oE '49 of [0-9]+ Parts' docs/qa-test-guide.md | grep -oE '[0-9]+' | tail -1)
-GUIDE_PARTS=$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md)
-if [ -z "$DOC_PARTS" ]; then
-echo "::error::Could not extract Part count from docs/qa-test-guide.md headline."
-echo " Expected pattern: '49 of <N> Parts'"
-exit 1
-fi
-if [ "$DOC_PARTS" != "$GUIDE_PARTS" ]; then
-echo "::error::DRIFT — qa-test-guide.md headline claims $DOC_PARTS Parts; testing-guide.md has $GUIDE_PARTS Parts."
-echo " Update docs/qa-test-guide.md to match. Bundle I patched this once;"
-echo " Bundle P added this guard so the drift cannot recur silently."
-exit 1
-fi
-echo "QA-doc Part-count drift guard: clean ($DOC_PARTS == $GUIDE_PARTS)."
-
+# Bundle P / Strengthening #6 — QA-doc seed-count drift guard. Forces
+# every PR that adds a seed row to migrations/seed_demo.sql to keep
+# docs/contributor/qa-test-suite.md::Seed Data Reference in sync.
+#
+# Phase 5 of the 2026-05-04 docs overhaul (commit c64777f) deleted
+# docs/testing-guide.md (its content dispersed across the new
+# audience-organized doc tree); the previous QA-doc Part-count drift
+# guard tracked Part counts between testing-guide.md and the old
+# qa-test-guide.md headline. With testing-guide.md gone, that guard's
+# premise is dead and it has been removed. The seed-count drift class
+# is still live: qa-test-suite.md::Seed Data Reference enumerates
+# certs/issuers and seed_demo.sql is the source of truth.
 - name: QA-doc seed-count drift guard
 run: |
 set -e
+DOC=docs/contributor/qa-test-suite.md
 # Seed-cert count: agnostic to documented header format. The current
 # documented count lives in `### Certificates (32 total in ...` —
 # extract the first integer in that header.
-DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
+DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' "$DOC" | grep -oE '[0-9]+' | head -1)
 # Authoritative count: unique mc-* IDs in seed_demo.sql.
 SEED_CERTS=$(grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
 if [ -z "$DOC_CERTS" ]; then
-echo "::warning::Could not extract documented cert count from docs/qa-test-guide.md."
+echo "::warning::Could not extract documented cert count from $DOC."
 echo " Skipping cert-count drift check (header format may have changed)."
 elif [ "$DOC_CERTS" != "$SEED_CERTS" ]; then
-echo "::error::DRIFT — qa-test-guide.md says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
-echo " Update docs/qa-test-guide.md::Seed Data Reference to match."
+echo "::error::DRIFT — $DOC says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
+echo " Update $DOC::Seed Data Reference to match."
 exit 1
 fi
 # Issuers: seed-table count vs doc claim.
-DOC_ISS=$(grep -oE '### Issuers \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
+DOC_ISS=$(grep -oE '### Issuers \([0-9]+' "$DOC" | grep -oE '[0-9]+' | head -1)
 # Authoritative: unique iss-* IDs (close enough proxy; the issuers
 # table count IS the unique-ID count for this prefix).
 SEED_ISS=$(grep -oE 'iss-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
@@ -186,7 +176,7 @@ jobs:
 # Allow up to 5pp slack — iss-* IDs appear in audit_events and
 # other reference tables that aren't issuer-table rows. Drift
 # only flags when the spread grows large.
-echo "::error::DRIFT — qa-test-guide.md says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
+echo "::error::DRIFT — $DOC says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
 exit 1
 fi
 echo "QA-doc seed-count drift guard: clean."
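The seed-count guard in the hunks above reduces to two regex extractions and a comparison. The following is a minimal Python sketch of that logic, not the project's implementation. It assumes the `### Certificates (N` header format and the `mc-*` ID pattern quoted in the workflow; the function names and the sample strings are hypothetical and exist only for illustration. The issuer check's slack condition is not visible in the diff, so it is left out.

```python
import re
from typing import Optional


def documented_count(doc_text: str, heading: str) -> Optional[int]:
    """Return the first integer in a '### <heading> (N' header, or None if absent."""
    match = re.search(rf"### {re.escape(heading)} \((\d+)", doc_text)
    return int(match.group(1)) if match else None


def unique_ids(seed_sql: str, prefix: str) -> int:
    """Count distinct '<prefix>-...' IDs, like `grep -oE | sort -u | wc -l`."""
    return len(set(re.findall(rf"{re.escape(prefix)}-[a-z0-9_-]+", seed_sql)))


def cert_drift(doc_text: str, seed_sql: str) -> str:
    """Classify the cert-count comparison as 'skip', 'clean', or 'drift'."""
    documented = documented_count(doc_text, "Certificates")
    if documented is None:
        # The workflow only warns here, since the header format may have changed.
        return "skip"
    return "clean" if documented == unique_ids(seed_sql, "mc") else "drift"


# Hypothetical sample inputs, for illustration only.
doc = "### Certificates (2 total in seed_demo.sql)\n"
sql = "INSERT ... ('mc-web-01'), ('mc-api-01'), ('mc-web-01');"
print(cert_drift(doc, sql))
```

Counting unique IDs rather than rows is what lets a repeated `mc-web-01` reference in another table leave the result unchanged.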
@@ -209,7 +199,7 @@ jobs:
 # 167 legitimate tests for no observable behavior change. The
 # Test<Func>_<Scenario>_<ExpectedResult> form remains documented as
 # the recommended pattern for parameterized scenarios in
-# docs/qa-test-guide.md, but is not gated.
+# docs/contributor/qa-test-suite.md, but is not gated.
 - name: Regression guards (extracted to scripts/ci-guards/)
 # All named regression guards live at scripts/ci-guards/<id>.sh per
 # ci-pipeline-cleanup bundle Phase 1. Each guard is callable locally:
@@ -289,7 +279,7 @@ jobs:
 # HTTPS-Everywhere (v2.0.47): the chart fails render when no TLS source is
 # configured. Every lint/template invocation below must pick exactly one
 # provisioning mode — see deploy/helm/certctl/templates/_helpers.tpl
-# (certctl.tls.required) and docs/tls.md.
+# (certctl.tls.required) and docs/operator/tls.md.
 - name: Lint Helm Chart
 run: |
 helm lint deploy/helm/certctl/ \
@@ -336,7 +326,7 @@ jobs:
 # RAM headroom on ubuntu-latest (16 GB ceiling) — operator-confirmed
 # in Phase 0 / frozen decision 0.14 prototype-branch run. If RAM
 # regresses, fall back to bucketed matrix per
-# cowork/ci-pipeline-cleanup/decisions-revised.md.
+# the project's frozen-decisions log.
 #
 # The Windows matrix (deploy-vendor-e2e-windows) was deleted entirely
 # per Phase 6 / frozen decision 0.5 (revises Bundle II decision 0.4).
@@ -1,6 +1,6 @@
 # Load-test workflow — closes the #8 acquisition-readiness blocker from
 # the 2026-05-01 issuer coverage audit (see
-# cowork/issuer-coverage-audit-2026-05-01/RESULTS.md).
+# the 2026-05-01 issuer coverage audit).
 #
 # CADENCE: workflow_dispatch + weekly cron, NOT per-push. Load tests
 # are minutes long and don't provide useful per-PR signal — per-push
@@ -1,5 +1,12 @@
 name: Release

+# Override the auto-generated run name (which would otherwise default to
+# the most recent commit subject + a #NN run number) so the Actions tab
+# shows "Release v2.0.69" instead of "chore: rename Go module path... #73".
+# `github.ref_name` resolves to the tag name (e.g., `v2.0.69`) for tag-triggered
+# workflows, which is the only trigger we set below.
+run-name: Release ${{ github.ref_name }}
+
 on:
 push:
 tags:
@@ -346,6 +353,11 @@ jobs:
 # noise that gives operators no signal about what actually changed.
 uses: softprops/action-gh-release@v2
 with:
+# Pin the release title to the tag name. softprops/action-gh-release@v2
+# falls back to the most recent commit subject when `name:` is omitted,
+# which produces ugly titles like "chore: rename Go module path..." on
+# the Releases page. `github.ref_name` evaluates to the tag (`v2.0.69`).
+name: ${{ github.ref_name }}
 generate_release_notes: true
 body: |
 > **Install / upgrade:** see the [Quick Start section in the README](https://github.com/certctl-io/certctl/blob/master/README.md#quick-start) for Docker Compose, agent install, Helm, and binary download instructions.
@@ -20,7 +20,7 @@ name: security-deep-scan
 #
 # Each step is best-effort — failures are uploaded as artefacts but do
 # NOT block the workflow. Triage happens via the Bundle-7 receipt
-# directory under cowork/comprehensive-audit-2026-04-25/tool-output/.
+# the project's comprehensive-audit tool-output directory.

 on:
 schedule:
@@ -82,7 +82,7 @@ jobs:
 # package is mutated independently; the per-package summary line
 # (`The mutation score is X.YZ`) is grep-extracted into the receipt.
 # Acceptance threshold: ≥80% kill ratio per package; surviving
-# mutants get triaged in cowork/comprehensive-audit-2026-04-25/
+# mutants get triaged in the project's comprehensive-audit notes/
 # d003-mutation-results.md (per-mutant action item or
 # equivalent-mutation justification).
@@ -119,15 +119,18 @@ verify:
 @echo ""
 @echo "verify: PASS — safe to commit"

-# verify-docs: pre-tag gate. Runs the QA-doc Part-count + seed-count
-# drift guards that ci-pipeline-cleanup Phase 11 / frozen decision 0.13
-# moved out of CI (was per-push blocking; now operator-runs pre-tag).
-# These guards protect docs/qa-test-guide.md headlines from drifting
-# vs the underlying source-of-truth (testing-guide Part count, seed
-# row count). Operator-facing docs only — not product-affecting.
+# verify-docs: pre-tag gate. Runs the QA-doc seed-count drift guard
+# that ci-pipeline-cleanup Phase 11 / frozen decision 0.13 moved out
+# of CI (was per-push blocking; now operator-runs pre-tag). Protects
+# docs/contributor/qa-test-suite.md::Seed Data Reference from
+# drifting vs migrations/seed_demo.sql. Operator-facing docs only —
+# not product-affecting.
+#
+# The QA-doc Part-count drift guard retired in the 2026-05-04 docs
+# overhaul Phase 5 when docs/testing-guide.md was pruned (its content
+# dispersed across the audience-organized doc tree); the Part-count
+# class no longer exists outside the qa_test.go file itself.
 verify-docs:
-@echo "==> QA-doc Part-count drift"
-@bash scripts/qa-doc-part-count.sh
 @echo "==> QA-doc seed-count drift"
 @bash scripts/qa-doc-seed-count.sh
 @echo ""
@@ -263,9 +266,12 @@ frontend-build:
 @echo "Frontend build complete"

 # QA Suite Stats — Bundle P / Strengthening #8.
-# Single source-of-truth for every count claim in docs/qa-test-guide.md +
-# docs/testing-guide.md. The Strengthening #6 CI drift guards consume the
-# same numbers, eliminating the doc-drift class structurally.
+# Single source-of-truth for every count claim in
+# docs/contributor/qa-test-suite.md. The Strengthening #6 CI drift guards
+# (now scoped to the seed-count class only — the Part-count class retired
+# in the 2026-05-04 docs overhaul Phase 5 when testing-guide.md was
+# pruned) consume the same numbers, eliminating the doc-drift class
+# structurally.
 qa-stats:
 @echo "=== certctl QA Suite Stats ==="
 @echo "Date: $$(date +%Y-%m-%d)"
@@ -278,7 +284,6 @@ qa-stats:
 @echo "Fuzz targets: $$(grep -rE 'func Fuzz[A-Z]' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
 @echo "t.Skip sites: $$(grep -rE 't\.Skip(Now|f)?\(' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
 @echo "qa_test.go Part_ subtests: $$(grep -cE 't\.Run\(\"Part[0-9]+_' deploy/test/qa_test.go 2>/dev/null || echo 0)"
-@echo "testing-guide.md Parts: $$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md 2>/dev/null || echo 0)"
 @echo "Seed unique mc-* IDs: $$(grep -oE "mc-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
 @echo "Seed unique ag-* IDs: $$(grep -oE "ag-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (incl. agent_groups; agents-table count is 12)"
 @echo "Seed unique iss-* IDs: $$(grep -oE "iss-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (issuers table count is 13)"
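The `qa-stats` target above counts matching lines, not matches, because `grep ... | wc -l` emits one line per matching source line. A small Python sketch of that counting rule, using the two patterns quoted in the Makefile; the helper name and the sample Go snippet are hypothetical:

```python
import re

# Patterns copied from the qa-stats recipe shown above.
FUZZ_TARGET = re.compile(r"func Fuzz[A-Z]")
SKIP_SITE = re.compile(r"t\.Skip(Now|f)?\(")


def count_matching_lines(source: str, pattern: re.Pattern) -> int:
    """Count lines with at least one match, as `grep -E ... | wc -l` does."""
    return sum(1 for line in source.splitlines() if pattern.search(line))


# Hypothetical Go test source, for illustration only.
sample = """
func FuzzParseCSR(f *testing.F) {}
func TestSkipped(t *testing.T) { t.Skip("flaky") }
func TestSkippedNow(t *testing.T) { t.SkipNow() }
func TestSkipf(t *testing.T) { t.Skipf("needs %s", "docker") }
func fuzzHelper() {}
"""

print(count_matching_lines(sample, FUZZ_TARGET))
print(count_matching_lines(sample, SKIP_SITE))
```

A line that calls `t.Skip` twice would still count once under this rule, which is worth keeping in mind when comparing the reported numbers against a per-call audit.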
@@ -9,138 +9,29 @@
 [](https://github.com/certctl-io/certctl/releases)
 [](https://github.com/certctl-io/certctl/stargazers)

-TLS certificate lifespans are shrinking fast. The CA/Browser Forum passed [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) unanimously in April 2025, setting a phased reduction: **200 days** by March 2026, **100 days** by March 2027, and **47 days** by March 2029. Organizations managing dozens or hundreds of certificates can no longer rely on spreadsheets, calendar reminders, or manual renewal workflows. The math doesn't work — at 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever.
+certctl is a self-hosted platform that automates the entire TLS certificate lifecycle, from issuance through renewal to deployment, with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. Free, source-available under BSL 1.1, covers the same lifecycle that enterprise platforms charge $100K+/year for.

-certctl is a self-hosted platform that automates the entire certificate lifecycle — from issuance through renewal to deployment — with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. It's free, self-hosted, and covers the same lifecycle that enterprise platforms charge $100K+/year for.
+The CA/Browser Forum's [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) caps public TLS certificates at **200 days by March 2026**, **100 days by 2027**, and **47 days by 2029**. At 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever. Manual workflows stop being a choice.
-```mermaid
-gantt
-title TLS Certificate Maximum Lifespan — CA/Browser Forum Ballot SC-081v3
-dateFormat YYYY-MM-DD
-axisFormat
-todayMarker off
-section 2015
-5 years (1825 days) :done, 2020-01-01, 1825d
-section 2018
-825 days :done, 2020-01-01, 825d
-section 2020
-398 days :active, 2020-01-01, 398d
-section 2026
-200 days :crit, 2020-01-01, 200d
-section 2027
-100 days :crit, 2020-01-01, 100d
-section 2029
-47 days :crit, 2020-01-01, 47d
-```
+> **Actively maintained, shipping weekly.** [Open an issue](https://github.com/certctl-io/certctl/issues) if something breaks. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.

-> **Actively maintained — shipping weekly.** Found something? [Open a GitHub issue](https://github.com/certctl-io/certctl/issues) — issues get triaged same-day. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.
-
-**Ready to try it?** Jump to the [Quick Start](#quick-start) — you'll have a running dashboard in under 5 minutes.
+**Ready to try it?** Jump to the [Quick Start](#quick-start). For the marketing site, see [certctl.io](https://certctl.io).
 ## Documentation

-| Guide | Description |
-|-------|-------------|
-| [Why certctl?](docs/why-certctl.md) | How certctl compares to ACME clients, agent-based SaaS, and enterprise platforms |
-| [Concepts](docs/concepts.md) | TLS certificates explained from scratch — for beginners who know nothing about certs |
-| [Quick Start](docs/quickstart.md) | 5-minute setup — dashboard, API, CLI, discovery, stakeholder demo flow |
-| [Docker Compose Environments](deploy/ENVIRONMENTS.md) | Service-by-service walkthrough of all 4 compose files, env var reference |
-| [Deployment Examples](docs/examples.md) | 5 turnkey scenarios (ACME+NGINX, wildcard DNS-01, private CA, step-ca, multi-issuer) with migration guides |
-| [Advanced Demo](docs/demo-advanced.md) | Issue a certificate end-to-end with technical deep-dives |
-| [Architecture](docs/architecture.md) | System design, data flow diagrams, security model |
-| [Feature Inventory](docs/features.md) | Complete reference of all capabilities, API endpoints, and configuration |
-| [Connector Reference](docs/connectors.md) | Configuration for all issuer, target, and notifier connectors |
-| [MCP Server](docs/mcp.md) | AI integration via Model Context Protocol — setup, available tools, examples |
-| [OpenAPI 3.1 Spec](docs/openapi.md) | API reference guide with endpoint overview ([raw spec](api/openapi.yaml)) |
-| [Compliance Mapping](docs/compliance.md) | SOC 2 Type II, PCI-DSS 4.0, NIST SP 800-57 alignment guides |
-| [Migrate from certbot](docs/migrate-from-certbot.md) | Step-by-step migration from certbot cron jobs to certctl |
-| [Migrate from acme.sh](docs/migrate-from-acmesh.md) | Migration guide for acme.sh users, DNS hook compatibility |
-| [certctl for cert-manager users](docs/certctl-for-cert-manager-users.md) | How certctl complements cert-manager for mixed infrastructure |
-| [Test Environment](docs/test-env.md) | Docker Compose test environment with real CA backends |
-| [Testing Guide](docs/testing-guide.md) | Comprehensive test procedures, smoke tests, and release sign-off checklist |
+The full audience-organized index lives at [`docs/README.md`](docs/README.md). Top-level entry points:
-## Supported Integrations
+| Audience | Start here |
+|---|---|
+| New to certctl | [Concepts](docs/getting-started/concepts.md) → [Quickstart](docs/getting-started/quickstart.md) → [Examples](docs/getting-started/examples.md) |
+| Production operator | [Architecture](docs/reference/architecture.md) → [Security posture](docs/operator/security.md) → [Disaster recovery runbook](docs/operator/runbooks/disaster-recovery.md) |
+| PKI engineer | [ACME server](docs/reference/protocols/acme-server.md) → [SCEP server](docs/reference/protocols/scep-server.md) → [EST server](docs/reference/protocols/est.md) → [CA hierarchy](docs/reference/intermediate-ca-hierarchy.md) |
+| Migrating from another tool | [from certbot](docs/migration/from-certbot.md) / [from acme.sh](docs/migration/from-acmesh.md) / [cert-manager coexistence](docs/migration/cert-manager-coexistence.md) |
+| Contributor | [Architecture](docs/reference/architecture.md) → [Testing strategy](docs/contributor/testing-strategy.md) → [CI pipeline](docs/contributor/ci-pipeline.md) |

-### Certificate Issuers
+For the connector reference (12 issuers, 15 targets, 6 notifiers) see [`docs/reference/connectors/index.md`](docs/reference/connectors/index.md).
-| Issuer | Type | Notes |
-|--------|------|-------|
-| Local CA (self-signed + sub-CA) | `GenericCA` | Sub-CA mode chains to enterprise root (ADCS, etc.) |
-| ACME v2 (Let's Encrypt, ZeroSSL, etc.) | `ACME` | HTTP-01, DNS-01, DNS-PERSIST-01 challenges. EAB auto-fetch from ZeroSSL. Profile selection (`tlsserver`, `shortlived`). |
-| step-ca (Smallstep) | `StepCA` | JWK provisioner auth, issuance + renewal + revocation |
-| OpenSSL / Custom CA | `OpenSSL` | Shell script adapter — any CA with a CLI |
-| HashiCorp Vault PKI | `VaultPKI` | Token auth, synchronous issuance, CRL/OCSP delegated to Vault |
-| DigiCert CertCentral | `DigiCert` | Async order model, OV/EV support, PEM bundle parsing |
-| Sectigo SCM | `Sectigo` | 3-header auth, DV/OV/EV, collect-not-ready graceful handling |
-| Google Cloud CAS | `GoogleCAS` | OAuth2 service account, synchronous issuance, CA pool selection |
-| AWS ACM Private CA | `AWSACMPCA` | Synchronous issuance, configurable signing algorithm/template ARN |
-| Entrust Certificate Services | `Entrust` | mTLS client certificate auth, synchronous/approval-pending issuance |
-| GlobalSign Atlas HVCA | `GlobalSign` | mTLS + API key/secret dual auth, serial-based tracking |
-| EJBCA (Keyfactor) | `EJBCA` | Dual auth (mTLS or OAuth2), self-hosted open-source CA |
-
-**Note:** ADCS integration is handled via the Local CA's sub-CA mode — certctl operates as a subordinate CA with its signing certificate issued by ADCS. Any CA with a shell-accessible signing interface can be integrated via the OpenSSL/Custom CA connector.
-
-### Deployment Targets
-
-| Target | Type | Notes |
-|--------|------|-------|
-| NGINX | `NGINX` | Atomic write + `nginx -t` validate + `nginx -s reload` + post-deploy TLS verify + rollback (deploy-hardening I) |
-| Apache httpd | `Apache` | Atomic write + `apachectl configtest` + graceful reload + post-deploy TLS verify + rollback |
-| HAProxy | `HAProxy` | Combined PEM atomic write + `haproxy -c -f` validate + `systemctl reload` + post-deploy TLS verify + rollback |
-| Traefik | `Traefik` | Atomic write + post-deploy TLS verify + rollback (file watcher auto-reloads) |
-| Caddy | `Caddy` | Atomic write (file mode) or `POST /load` (api mode) + admin API ValidateOnly probe |
-| Envoy | `Envoy` | Atomic write + SDS file watcher auto-reload |
-| Postfix | `Postfix` | Atomic write + `postfix check` + `postfix reload` + post-deploy TLS verify + rollback |
-| Dovecot | `Dovecot` | Atomic write + `doveconf -n` + `doveadm reload` + post-deploy TLS verify + rollback |
-| Microsoft IIS | `IIS` | Local PowerShell or remote WinRM, PEM→PFX, SNI support, explicit pre-deploy backup + post-rollback re-import |
-| F5 BIG-IP | `F5` | iControl REST via proxy agent, transaction-based atomic updates + post-deploy TLS verify on Virtual Server |
-| SSH (Agentless) | `SSH` | SFTP cert/key deployment + pre-deploy SCP backup + tls.Dial post-verify |
-| Windows Certificate Store | `WinCertStore` | PowerShell Import-PfxCertificate + Get-ChildItem snapshot for rollback |
-| Java Keystore | `JavaKeystore` | PEM→PKCS#12→keytool pipeline + keytool snapshot for rollback |
-| Kubernetes Secrets | `KubernetesSecrets` | `kubernetes.io/tls` Secrets, atomic API + SHA-256 verify + kubelet sync poll |
-
-**Deploy-hardening I** (post-2026-04-30 master bundle): every connector now goes through `internal/deploy.Apply` for atomic-write + ownership-preservation + SHA-256 idempotency + per-target-type Prometheus counters (`certctl_deploy_*_total`). See [`docs/deployment-atomicity.md`](docs/deployment-atomicity.md) for the operator guide.
-
-### Enrollment Protocols
-
-| Protocol | Standard | Use Case |
-|----------|----------|----------|
-| **EST (production-grade)** | RFC 7030 + RFC 9266 channel binding | Native EST server hardened for enterprise WiFi/802.1X, IoT bootstrap, and corporate device enrollment (post-2026-04-29 hardening master bundle). All six RFC 7030 endpoints — `cacerts` / `simpleenroll` / `simplereenroll` / `csrattrs` (profile-driven) / `serverkeygen` (CMS EnvelopedData wire format). Multi-profile dispatch (`/.well-known/est/<pathID>/`). Per-profile auth modes: mTLS sibling route at `/.well-known/est-mtls/<pathID>/`, HTTP Basic enrollment-password (constant-time compare + per-source-IP failed-auth limiter), RFC 9266 `tls-exporter` channel binding (TLS 1.3, opt-in per profile). Per-(CN, sourceIP) sliding-window rate limit. EST-source-scoped bulk revoke (`POST /api/v1/est/certificates/bulk-revoke`, M-008 admin-gated). Tabbed admin GUI at `/est` (Profiles / Recent Activity / Trust Bundle). `SIGHUP`-equivalent trust-bundle reload. libest reference-client interop tested in CI (`deploy/test/libest/Dockerfile` + `deploy/test/est_e2e_test.go`). Typed audit-action codes per failure dimension (`est_simple_enroll_success`/`_failed`, `est_auth_failed_basic`/`_mtls`/`_channel_binding`, `est_rate_limited`, `est_csr_policy_violation`, `est_bulk_revoke`, `est_trust_anchor_reloaded`, etc. — full set in `internal/service/est_audit_actions.go`). CLI + matching MCP tool family (rebuild count via `grep -cE '"est_' internal/mcp/tools_est.go`). See [`docs/est.md`](docs/est.md) for the operator guide — WiFi/802.1X + FreeRADIUS recipe, IoT bootstrap, troubleshooting matrix per audit-action code. |
-| SCEP (Simple Certificate Enrollment Protocol) | RFC 8894 | MDM platforms (Jamf, Intune), network devices, ChromeOS. Full RFC 8894 wire format: EnvelopedData decryption, signerInfo POPO verification, CertRep PKIMessage builder; PKCSReq + RenewalReq + GetCertInitial messageType dispatch; multi-profile dispatch (`/scep/<pathID>`); per-profile RA cert + key. Lightweight raw-CSR clients keep working via the legacy MVP fall-through path. |
-| **Microsoft Intune SCEP fleet (drop-in NDES replacement)** | RFC 8894 + Intune Connector signed-challenge dispatcher | Per-profile Intune dispatcher validates the Connector's signed challenge against an operator-supplied trust anchor; binds device claim to CSR (set-equality on CN + SAN-DNS/RFC822/UPN); replay cache + per-device rate limit; `SIGHUP`-reloadable trust pool; admin GUI **SCEP Administration** page at `/scep` (Profiles tab with per-profile RA cert expiry + mTLS status, Intune Monitoring tab with per-status counters + reload, Recent Activity tab with full SCEP audit log filter). See [`docs/scep-intune.md`](docs/scep-intune.md) for the migration playbook + Microsoft support statement. |
-| ACME v2 | RFC 8555 | Public CA automated issuance (Let's Encrypt, ZeroSSL) |
-| ACME ARI (Renewal Information) | RFC 9773 | CA-directed renewal timing — the CA tells you when to renew |
-
### Standards & Revocation
|
||||
|
||||
| Capability | Standard | Notes |
|
||||
|------------|----------|-------|
|
||||
| DER-encoded X.509 CRL | RFC 5280 + RFC 7232 caching | Per-issuer, signed by issuing CA, 24h validity. Pre-generated by the scheduler (`CERTCTL_CRL_GENERATION_INTERVAL`, default 1h) and cached in `crl_cache` so HTTP fetches do not rebuild per request. **Production hardening II:** weak-form `ETag` (W/"<sha256-prefix>") + `Cache-Control: public, max-age=3600, must-revalidate` + `If-None-Match` HTTP 304 short-circuit on `GET /.well-known/pki/crl/{issuer_id}` — CDNs and reverse proxies serve repeated fetches from edge cache. |
|
||||
| CRL DistributionPoints auto-injection | RFC 5280 §4.2.1.13 | **Production hardening II.** Local issuer config field `CRLDistributionPointURLs []string` — when set, every issued cert carries the `id-ce-cRLDistributionPoints` extension pointing at certctl's own CRL endpoint. Refusing to silently inject an empty CDP is deliberate (silent-empty fails relying-party validation worse than no CDP). |
|
||||
| Embedded OCSP responder | RFC 6960 + §4.4.1 nonce echo | GET + POST forms (`POST /.well-known/pki/ocsp/{issuer_id}` per §A.1.1). Signed by a per-issuer dedicated OCSP responder cert (RFC 6960 §2.6) carrying `id-pkix-ocsp-nocheck` (§4.2.2.2.1) — the CA private key is never used directly for OCSP signing. Responder cert auto-rotates within 7d of expiry. **Production hardening II:** RFC 6960 §4.4.1 nonce extension echoed in the response (defends against replay attacks); empty/oversized (>32 bytes per CA/B Forum BR §4.10.2) nonces produce the canonical "unauthorized" status (status 6) — never echo malformed bytes. |
|
||||
| OCSP pre-signed response cache | — | **Production hardening II.** Per-`(issuer, serial)` pre-signed responses in the new `ocsp_response_cache` table; read-through facade in `CAOperationsSvc.GetOCSPResponseWithNonce` consults the cache for nil-nonce requests. **Load-bearing security wire:** `RevocationSvc.RevokeCertificateWithActor` calls `InvalidateOnRevoke` after a successful revoke so the next OCSP fetch returns the revoked status — no stale-good window. |
|
||||
| Per-endpoint rate limits | — | **Production hardening II.** OCSP per-source-IP cap at `CERTCTL_OCSP_RATE_LIMIT_PER_IP_MIN` (default 1000/min, zero disables); cert-export per-actor cap at `CERTCTL_CERT_EXPORT_RATE_LIMIT_PER_ACTOR_HR` (default 50/hr, zero disables). OCSP rate-limit trip returns the canonical "unauthorized" OCSP blob plus `Retry-After: 60`; cert-export trip returns HTTP 429. The OCSP limiter does NOT honor `X-Forwarded-For` (publicly reachable; spoofed headers would bypass the cap). |
|
||||
| Cert-export typed audit | — | **Production hardening II.** Typed action constants (`cert_export_pem` / `cert_export_pkcs12` / `cert_export_pem_with_key` reserved / `cert_export_failed`) emitted via split-emit alongside the legacy bare codes for back-compat. Detail map carries `has_private_key` (always false in V2) and `cipher` (`AES-256-CBC-PBE2-SHA256` — pinned so a future dependency upgrade that changes the encoder default surfaces in audit drift review). |
|
||||
| Prometheus per-area metrics | OpenMetrics | `GET /api/v1/metrics/prometheus` — production hardening II surfaces `certctl_ocsp_counter_total{label="..."}` per-event series (`request_get`/`_post`, `request_success`/`_invalid`, `nonce_echoed`/`_malformed`, `rate_limited`, `signing_failed`, etc.) wired from the shared counter table that ticks in the cache hot path. CRL / cert-export / EST / SCEP / Intune per-area counters plug in via the same `SetXxxCounters` setter pattern as follow-up commits. |
|
||||
| Disaster-recovery runbook | — | **Production hardening II.** [`docs/disaster-recovery.md`](docs/disaster-recovery.md) — 8-section operator-grade runbook: CRL cache recovery, OCSP responder cert recovery, OCSP response cache recovery, CA private-key rotation 9-step playbook, Postgres restore + operator-managed-artifacts list, trust-bundle reload semantics, printable DR checklist. The SOC 2 / PCI procurement-team deliverable. |
|
||||
| S/MIME certificates | RFC 8551 | Email protection EKU, adaptive KeyUsage flags (`DigitalSignature \| ContentCommitment` instead of the TLS default `DigitalSignature \| KeyEncipherment`). |
|
||||
| Certificate export | — | PEM (JSON/file) and PKCS#12 (cert-only trust-store mode via `pkcs12.Modern` — AES-256-CBC PBE2 with SHA-256 KDF). Key-bearing PKCS#12 export deferred — V2 export is cert-only by design (private keys live on agents, never touch the control plane). |
|
||||
| ACME DNS-PERSIST-01 | IETF draft | Standing validation record, no per-renewal DNS updates |
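
The per-endpoint rate limits in the table above (a per-source-IP cap for OCSP, a per-actor cap for cert export, zero meaning disabled) can be pictured as a keyed fixed-window counter. The sketch below is only an illustration under that assumption, not certctl's implementation: `windowLimiter`, `newWindowLimiter`, `Allow`, and the injected clock are hypothetical names, and the real limiter may use a different algorithm.

```go
package main

import "sync"

// bucket counts the requests seen for one key in the current window.
type bucket struct {
	start int64 // window start, in Unix seconds
	n     int   // requests admitted in this window
}

// windowLimiter is a hypothetical fixed-window limiter keyed by a string
// such as a source IP or an actor ID.
type windowLimiter struct {
	mu     sync.Mutex
	limit  int          // max requests per window; 0 disables the cap
	window int64        // window length in seconds
	now    func() int64 // injected clock returning Unix seconds
	counts map[string]*bucket
}

func newWindowLimiter(limit int, windowSeconds int64, now func() int64) *windowLimiter {
	return &windowLimiter{
		limit:  limit,
		window: windowSeconds,
		now:    now,
		counts: make(map[string]*bucket),
	}
}

// Allow reports whether one more request from key fits in the current window.
func (l *windowLimiter) Allow(key string) bool {
	if l.limit == 0 {
		return true // a zero limit disables the limiter
	}
	l.mu.Lock()
	defer l.mu.Unlock()

	t := l.now()
	b, ok := l.counts[key]
	if !ok || t-b.start >= l.window {
		// First request for this key, or the previous window has elapsed.
		b = &bucket{start: t}
		l.counts[key] = b
	}
	if b.n >= l.limit {
		return false
	}
	b.n++
	return true
}
```

In a server the clock would typically be `func() int64 { return time.Now().Unix() }`, and the handler decides what a rejected request looks like (an HTTP 429 for cert export, or the OCSP "unauthorized" response plus `Retry-After` described above).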
|
||||
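
The S/MIME row above contrasts two KeyUsage combinations. Using Go's standard `crypto/x509` constants, the choice could be expressed roughly as follows. This is a sketch: the function name `keyUsageFor` and the `smime` flag are hypothetical, and pairing the TLS default with `ExtKeyUsageServerAuth` is an assumption made for illustration.

```go
package main

import "crypto/x509"

// keyUsageFor returns the KeyUsage bits and extended key usages for a
// certificate template. S/MIME gets DigitalSignature | ContentCommitment
// with the email protection EKU; everything else gets the TLS default.
func keyUsageFor(smime bool) (x509.KeyUsage, []x509.ExtKeyUsage) {
	if smime {
		return x509.KeyUsageDigitalSignature | x509.KeyUsageContentCommitment,
			[]x509.ExtKeyUsage{x509.ExtKeyUsageEmailProtection}
	}
	// Assumption: the TLS default is a server-auth certificate.
	return x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		[]x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}
}
```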
|
||||
### Notifiers
|
||||
|
||||
| Notifier | Type |
|----------|------|
| Email (SMTP) | `Email` |
| Webhooks | `Webhook` |
| Slack | `Slack` |
| Microsoft Teams | `Teams` |
| PagerDuty | `PagerDuty` |
| OpsGenie | `OpsGenie` |
|
||||
|
||||
All connectors are pluggable — build your own by implementing the [connector interface](docs/connectors.md).
|
||||
|
||||
### Screenshots
|
||||
## Screenshots
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
@@ -157,78 +48,62 @@ All connectors are pluggable — build your own by implementing the [connector i
|
||||
|
||||
## Why certctl
|
||||
|
||||
Certificate lifecycle tooling falls into two camps: enterprise platforms (Venafi, Keyfactor) that cost six figures and take months to deploy, or single-purpose tools (certbot, cert-manager) that handle one slice of the problem. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, and target-agnostic. If you're running certbot cron jobs, manually renewing certs, or stitching together scripts across mixed infrastructure, certctl replaces all of that.
|
||||
Certificate lifecycle tooling has historically split into two camps. Enterprise platforms charge six-figure annual licenses, take months to deploy, and bill professional-services hours at $250 to $400 per hour to write integration code that should ship with the product. Single-purpose tools handle one slice of the problem and leave the operator to glue the rest together. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, target-agnostic. If you're stitching together cron jobs across a fleet, manually renewing certs, or writing custom integration scripts to bridge a commercial CLM platform to your actual infrastructure, certctl replaces all of that.
|
||||
|
||||
Built for **platform engineering and DevOps teams** managing 10–500+ certificates, **security and compliance teams** who need audit trails and policy enforcement for SOC 2, PCI-DSS 4.0, or NIST SP 800-57 ([compliance mapping included](docs/compliance.md)), and **small teams without enterprise budgets** who need Venafi-grade automation for a 50-server environment. For a detailed comparison, see [Why certctl?](docs/why-certctl.md)
|
||||
Built for **platform engineering and DevOps teams** managing 10 to 500+ certificates, **security teams** who need audit trails and policy enforcement, and **small teams without enterprise budgets** who need enterprise-grade automation for a 50-server environment. For the detailed positioning argument and when not to use certctl, see [Why certctl?](docs/getting-started/why-certctl.md).
|
||||
|
||||
**Architecture.** Go 1.25 control plane with handler→service→repository layering, PostgreSQL 16 backend (21 tables), and a pull-only deployment model — the server never initiates outbound connections. Agents poll for work. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). Background scheduler runs 7 loops: renewal with ARI integration (1h), job processing (30s), agent health (2m), notifications (1m), short-lived cert expiry (30s), network scanning (6h), certificate digest (24h). See [Architecture Guide](docs/architecture.md) for full system diagrams.
|
||||
## What it does
|
||||
|
||||
**Security-first.** Agents generate ECDSA P-256 keys locally — private keys never touch the control plane. API key auth enforced by default with SHA-256 hashing and constant-time comparison. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Atomic idempotency guards on scheduler loops. Issuer and target credentials encrypted at rest with AES-256-GCM. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, 11 linters, and vulnerability scanning on every commit.
|
||||
certctl handles the full certificate lifecycle in one self-hosted control plane:
|
||||
|
||||
**Key design decisions.** TEXT primary keys — human-readable prefixed IDs (`mc-api-prod`, `t-platform`, `o-alice`) so you can identify resources at a glance in logs and queries. Idempotent migrations (`IF NOT EXISTS`, `ON CONFLICT DO NOTHING`) safe for repeated execution. Dynamic configuration via GUI with AES-256-GCM encrypted credential storage and env var backward compatibility. Handlers define their own service interfaces for clean dependency inversion.
|
||||
- **Issue and renew** from any CA. Let's Encrypt and any ACME provider, an embedded ACME server you can point cert-manager / certbot / lego at directly, a built-in local CA with sub-CA mode (chains under your enterprise root like ADCS), step-ca, Vault PKI, EJBCA, AWS ACM PCA, Google CAS, DigiCert, Sectigo, GlobalSign, Entrust, plus an OpenSSL / shell-script adapter for anything custom. Twelve native issuer connectors. See the [connector reference](docs/reference/connectors/index.md).
|
||||
- **Deploy automatically** to NGINX, Apache, HAProxy, Caddy, Traefik, Envoy, IIS, Windows Cert Store, Java keystore, Kubernetes Secrets, AWS ACM, Azure Key Vault, SSH known-hosts, Postfix + Dovecot, F5 BIG-IP. Fifteen native target connectors. Every deploy goes through atomic-write + ownership-preservation + SHA-256 idempotency + per-target Prometheus counters + pre-deploy snapshot + on-failure rollback. See [`docs/reference/deployment-model.md`](docs/reference/deployment-model.md).
|
||||
- **Run as an ACME server** so existing client tooling plugs in directly. RFC 8555 + RFC 9773 ARI, two per-profile auth modes (public-trust-style validation or trust_authenticated for internal PKI), doubly-signed key rollover, revoke-cert on both kid path and jwk path, per-account rate limiting. Cert-manager / certbot / lego all work pointed at it. See [`docs/reference/protocols/acme-server.md`](docs/reference/protocols/acme-server.md).
|
||||
- **Run as a SCEP server** for Microsoft Intune-managed phones, ChromeOS devices, network appliances. RFC 8894 native with full PKIMessage wire format, native Intune challenge dispatch with replay protection, per-profile dispatch with separate RA cert per profile. See [`docs/reference/protocols/scep-server.md`](docs/reference/protocols/scep-server.md).
|
||||
- **Run as an EST server** for HTTPS-based PKCS#10 enrollment. 802.1X / Wi-Fi authentication, IoT device enrollment, RFC 9266 channel binding. See [`docs/reference/protocols/est.md`](docs/reference/protocols/est.md).
|
||||
- **Manage multi-level CA hierarchies** with name constraints, path-length enforcement, and end-to-end RFC 5280 path validation. Root → intermediate → issuing chains, admin-gated CRUD, drain-first retirement. Patterns documented for 4-level boundary CAs, 3-level policy CAs with per-BU `PermittedDNSDomains`, and 2-level internal PKI. See [`docs/reference/intermediate-ca-hierarchy.md`](docs/reference/intermediate-ca-hierarchy.md).
|
||||
- **Gate high-stakes issuance** behind two-person-integrity approval. Flag a profile as `RequiresApproval`, the request lands in a queue, a non-requester approves, the scheduler dispatches. See [`docs/operator/approval-workflow.md`](docs/operator/approval-workflow.md).
|
||||
- **Discover** existing certs across your fleet via filesystem scanning on agents, network TLS probing across CIDR ranges, and cloud secret manager imports (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager). Triage workflow for claim / dismiss / investigate.
|
||||
- **Revoke** with full RFC 5280 reason codes, DER CRL generation per issuer (scheduler-pre-generated and ETag-cached), and an embedded RFC 6960 OCSP responder with dedicated per-issuer responder certs. Single + bulk revocation. See [`docs/reference/protocols/crl-ocsp.md`](docs/reference/protocols/crl-ocsp.md).
|
||||
- **Alert** via Slack, Microsoft Teams, PagerDuty, OpsGenie, email, webhooks. Per-policy multi-channel routing matrix with severity tiers and fault-isolating per-channel dispatch. See [`docs/operator/runbooks/expiry-alerts.md`](docs/operator/runbooks/expiry-alerts.md).
|
||||
- **Drive the platform from natural language** via the bundled MCP (Model Context Protocol) server. The full REST API is exposed as MCP tools — ask your AI client "show me all expiring certificates", "revoke the VPN cert, key compromised", or "what agents are offline?" and it translates to API calls. Stateless stdio-transport binary at `cmd/mcp-server/`; same auth as the REST API; no extra attack surface. See [`docs/reference/mcp.md`](docs/reference/mcp.md).
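
The two-person-integrity gate in the list above (a non-requester must approve before the scheduler dispatches) comes down to one comparison between the approving actor and the requester. A minimal sketch, assuming a simplified in-memory request: the type `approvalRequest`, the function `approve`, and both error values are hypothetical names, not certctl's actual service API.

```go
package main

import "errors"

// Hypothetical sentinel errors; an HTTP layer could map them to 403 and 409.
var (
	ErrSelfApproval   = errors.New("two-person integrity: requester cannot approve their own request")
	ErrAlreadyDecided = errors.New("approval request already decided")
)

// approvalRequest is a simplified stand-in for a stored approval request.
type approvalRequest struct {
	ID          string
	RequestedBy string
	State       string // "pending", "approved", "rejected", or "expired"
	DecidedBy   string
}

// approve records an approval by actor, refusing terminal requests and
// refusing any attempt by the requester to approve their own request.
func approve(req *approvalRequest, actor string) error {
	if req.State != "pending" {
		return ErrAlreadyDecided
	}
	if actor == req.RequestedBy {
		return ErrSelfApproval
	}
	req.State = "approved"
	req.DecidedBy = actor
	return nil
}
```

Keeping the check in one function means the rejection path leaves the request untouched, so a blocked self-approval can be retried by a second operator.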
|
||||
|
||||
## What It Does
|
||||
## Architecture and security
|
||||
|
||||
**Automated lifecycle.** Certificates renew and deploy themselves. The scheduler monitors expiration, issues through your CA, and deploys to targets — zero human intervention. ACME ARI (RFC 9773) lets the CA direct renewal timing. Ready for 47-day (SC-081v3) and 6-day (Let's Encrypt shortlived) certificate lifetimes.
|
||||
Go 1.25 control plane with handler → service → repository layering. PostgreSQL 16 backend (35+ tables, idempotent migrations). Pull-only deployment model — the server never initiates outbound connections. Agents poll for work and generate ECDSA P-256 keys locally so private keys never touch the control plane. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). See the [Architecture Guide](docs/reference/architecture.md) for full system diagrams.
|
||||
|
||||
**Operational dashboard.** 26-page GUI covers the entire lifecycle: certificate inventory with bulk ops, deployment timeline with rollback, discovery triage, network scan management, agent fleet health, short-lived credential countdown, approval workflows, and observability metrics. Configure issuers and targets from the dashboard — no env var editing, no server restarts.
|
||||
|
||||
**Private keys stay on your servers.** Agents generate ECDSA P-256 keys locally, submit only the CSR. The control plane never touches private keys. After deployment, agents probe the live TLS endpoint and compare SHA-256 fingerprints to confirm the right certificate is actually being served.
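
The agent-side flow described above (generate an ECDSA P-256 key locally, submit only the CSR) can be sketched with the Go standard library. The function names `newKeyAndCSR` and `checkCSR` are hypothetical and the subject fields are placeholders; this shows the general technique, not the agent's actual code.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
)

// newKeyAndCSR generates an ECDSA P-256 key pair and a DER-encoded PKCS#10
// CSR signed by that key. Only the CSR bytes would leave the machine.
func newKeyAndCSR(commonName string, dnsNames []string) (*ecdsa.PrivateKey, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.CertificateRequest{
		Subject:            pkix.Name{CommonName: commonName},
		DNSNames:           dnsNames,
		SignatureAlgorithm: x509.ECDSAWithSHA256,
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, err
	}
	return key, der, nil
}

// checkCSR parses a DER-encoded CSR, verifies its self-signature, and
// returns the subject common name. A receiving side could run a check like
// this before forwarding the request to an issuer.
func checkCSR(der []byte) (string, error) {
	csr, err := x509.ParseCertificateRequest(der)
	if err != nil {
		return "", err
	}
	if err := csr.CheckSignature(); err != nil {
		return "", err
	}
	return csr.Subject.CommonName, nil
}
```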
|
||||
|
||||
**Discovery.** Agents scan filesystems for existing PEM/DER certificates. The network scanner probes TLS endpoints across CIDR ranges without agents. Cloud discovery finds certificates in AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager. Continuous TLS health monitoring tracks endpoint status (healthy/degraded/down/cert_mismatch) with configurable thresholds and historical probe data. All discovery modes feed into a unified triage workflow — claim, dismiss, or import what you find.
|
||||
|
||||
**Policy engine.** Certificate profiles constrain key types, max TTL, and EKUs — with crypto policy enforcement that validates every CSR against profile rules before it reaches the issuer. MaxTTL caps are enforced per issuer connector. Approval workflows pause jobs for human review. Ownership tracking routes notifications to the right team. Agent groups match devices by OS, architecture, IP CIDR, and version.
|
||||
|
||||
**Enrollment protocols.** EST server (RFC 7030) for device and WiFi enrollment. SCEP server (RFC 8894) for MDM platforms and network devices — full wire format (EnvelopedData decrypt + signerInfo POPO verify + CertRep PKIMessage builder), tested against ChromeOS-shape requests; multi-profile dispatch (`/scep/<pathID>`); RenewalReq + GetCertInitial messageType support; lightweight raw-CSR fallback for legacy clients. See [docs/legacy-est-scep.md](docs/legacy-est-scep.md) for the operator + device-integration guide. S/MIME issuance with email protection EKU.
|
||||
|
||||
**Revocation.** Single and bulk revocation (by profile, owner, agent, or issuer). RFC 5280 reason codes. Production-grade revocation status surface for relying parties: DER-encoded X.509 CRL per issuer, scheduler-pre-generated and cached so HTTP fetches do not rebuild per request; embedded OCSP responder serving both GET and POST forms (RFC 6960 §A.1.1) with responses signed by a per-issuer dedicated OCSP responder cert (RFC 6960 §2.6, `id-pkix-ocsp-nocheck` per §4.2.2.2.1) — the CA private key is never used directly for OCSP signing. Both endpoints live unauthenticated under `/.well-known/pki/` per RFC 8615. Short-lived certs (TTL < 1 hour) are exempt — expiry is sufficient revocation. See [docs/crl-ocsp.md](docs/crl-ocsp.md) for the relying-party integration guide.
|
||||
|
||||
**Audit and observability.** Immutable append-only audit trail records every lifecycle action, every API call, and every approval decision. Prometheus metrics endpoint. Scheduled certificate digest emails. Continuous endpoint health monitoring with state machine transitions and real-time alerts.
|
||||
|
||||
**Notifications.** Slack, Teams, PagerDuty, OpsGenie, SMTP, webhooks. Routed by certificate owner. Daily digest emails with stats and expiring certs.
|
||||
|
||||
**Multiple interfaces.** REST API (111 routes), CLI (12 commands), MCP server (80 tools for Claude, Cursor, Windsurf), Helm chart, web dashboard. Certificate export in PEM and PKCS#12.
|
||||
|
||||
**First-run onboarding.** Wizard guides you through connecting a CA, deploying an agent, and issuing your first certificate. Or start with the pre-populated demo — 32 certificates, 10 issuers, 180 days of history.
|
||||
|
||||
For the complete capability breakdown, see the [Feature Inventory](docs/features.md).
|
||||
Security: API key auth enforced by default with SHA-256 hashing and constant-time comparison. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Issuer and target credentials encrypted at rest with AES-256-GCM. HTTPS-only control plane with TLS 1.3 pinned and a fail-closed startup gate that refuses to boot if the TLS bundle is unusable. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, 11 linters, and vulnerability scanning on every commit. See [`docs/operator/security.md`](docs/operator/security.md) for the operator-facing security posture.
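
The "SHA-256 hashing and constant-time comparison" mentioned above for API keys follows a common pattern: store only a digest of the key, hash whatever the client presents, and compare the two digests without an early exit. A minimal sketch, with hypothetical function names `hashAPIKey` and `verifyAPIKey`:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
)

// hashAPIKey returns the SHA-256 digest of an API key. Only this digest
// would be persisted, never the key itself.
func hashAPIKey(key string) [32]byte {
	return sha256.Sum256([]byte(key))
}

// verifyAPIKey hashes the presented key and compares it with the stored
// digest in constant time, so the comparison does not leak how many
// leading bytes matched.
func verifyAPIKey(presented string, stored [32]byte) bool {
	got := hashAPIKey(presented)
	return subtle.ConstantTimeCompare(got[:], stored[:]) == 1
}
```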
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Docker Compose (Recommended)
|
||||
### Docker Compose (recommended)
|
||||
|
||||
```bash
git clone https://github.com/certctl-io/certctl.git
cd certctl
docker compose -f deploy/docker-compose.yml up -d --build
```
|
||||
|
||||
Wait ~30 seconds, then open **https://localhost:8443** in your browser. (The shipped `docker-compose.yml` self-signs a cert via the `certctl-tls-init` init container on first boot — accept the browser warning for the demo, or feed the generated `ca.crt` to your client.) The onboarding wizard walks you through connecting a CA, deploying an agent, and issuing your first certificate.
|
||||
|
||||
**Want a pre-populated demo instead?** Add the demo override to see 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history:
|
||||
|
||||
```bash
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
```
|
||||
|
||||
The `deploy/` directory has four compose files: `docker-compose.yml` (base platform), `docker-compose.demo.yml` (demo data overlay), `docker-compose.dev.yml` (PgAdmin + debug logging), and `docker-compose.test.yml` (standalone integration tests with real CA backends). See the [Docker Compose Environments Guide](deploy/ENVIRONMENTS.md) for a service-by-service walkthrough, or the [Quick Start](docs/quickstart.md#docker-compose-environments) for a summary.
|
||||
Wait ~30 seconds, then open **https://localhost:8443** in your browser. The shipped demo overlay seeds 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history. The `certctl-tls-init` init container self-signs an ECDSA-P256 cert on first boot — accept the browser warning for the demo, or feed the generated `ca.crt` to your client.
|
||||
|
||||
For a clean install without demo data, drop the `-f deploy/docker-compose.demo.yml` flag and run `docker compose -f deploy/docker-compose.yml up -d --build`. The four compose files (`docker-compose.yml` base, `docker-compose.demo.yml` overlay, `docker-compose.dev.yml` for PgAdmin + debug logging, `docker-compose.test.yml` for integration tests) are documented at [`deploy/ENVIRONMENTS.md`](deploy/ENVIRONMENTS.md).
|
||||
|
||||
```bash
# --cacert expects a file path, so the CA cert is passed via process substitution (bash/zsh)
curl --cacert <(docker compose -f deploy/docker-compose.yml exec -T certctl-server cat /etc/certctl/tls/ca.crt) https://localhost:8443/health
# {"status":"healthy"}
```
|
||||
|
||||
The control plane is HTTPS-only (TLS 1.3, no plaintext listener). See [`docs/tls.md`](docs/tls.md) for cert provisioning patterns and [`docs/upgrade-to-tls.md`](docs/upgrade-to-tls.md) if you're upgrading from a pre-v2.2 release.
|
||||
The control plane is HTTPS-only with TLS 1.3 pinned. See [`docs/operator/tls.md`](docs/operator/tls.md) for cert provisioning patterns.
|
||||
|
||||
### Agent Install (One-Liner)
|
||||
### Agent install (one-liner)
|
||||
|
||||
```bash
curl -sSL https://raw.githubusercontent.com/certctl-io/certctl/master/install-agent.sh | bash
```
|
||||
|
||||
Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See [install-agent.sh](install-agent.sh) for details.
|
||||
Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See [install-agent.sh](install-agent.sh).
|
||||
|
||||
### Helm Chart (Kubernetes)
|
||||
### Helm chart (Kubernetes)
|
||||
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
@@ -236,86 +111,18 @@ helm install certctl deploy/helm/certctl/ \
|
||||
--set postgres.password=your-db-password
|
||||
```
|
||||
|
||||
Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See [values.yaml](deploy/helm/certctl/values.yaml) for all configuration options.
|
||||
Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See [values.yaml](deploy/helm/certctl/values.yaml).
|
||||
|
||||
### Docker Pull
|
||||
### Container images
|
||||
|
||||
```bash
|
||||
docker pull shankar0123.docker.scarf.sh/certctl-server
|
||||
docker pull shankar0123.docker.scarf.sh/certctl-agent
|
||||
```
|
||||
|
||||
## Verifying this release
|
||||
|
||||
Every `v*` tag publishes signed, attested release artefacts. Binaries
(`certctl-agent`, `certctl-server`, `certctl-cli`, `certctl-mcp-server` for
`linux|darwin × amd64|arm64`) ship alongside a `checksums.txt`, per-binary
SPDX-JSON SBOMs, Cosign signatures, and SLSA Level 3 provenance. Container
images on `ghcr.io/certctl-io/certctl-{server,agent}` are built with
`docker/build-push-action` `provenance: mode=max` + `sbom: true` and are
additionally signed with Cosign at the image digest.

All signatures use Cosign keyless OIDC; the signing identity is the
release workflow running on a signed tag.
|
||||
|
||||
**1. Verify SHA-256 checksums:**
|
||||
|
||||
```bash
sha256sum -c checksums.txt
```
|
||||
|
||||
**2. Verify the Cosign signature on `checksums.txt`:**
|
||||
|
||||
```bash
cosign verify-blob \
  --bundle checksums.txt.sigstore.json \
  --certificate-identity-regexp '^https://github\.com/certctl-io/certctl/\.github/workflows/release\.yml@refs/tags/' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  checksums.txt
```
|
||||
|
||||
Every individual binary ships with its own `.sigstore.json` bundle
(unified Sigstore bundle containing signature, certificate chain, and
Rekor inclusion proof). Swap `checksums.txt` for any binary name and
point `--bundle` at the matching `<binary>.sigstore.json` to verify it
directly.
|
||||
|
||||
**3. Verify SLSA Level 3 provenance on a binary:**
|
||||
|
||||
```bash
slsa-verifier verify-artifact \
  --provenance-path multiple.intoto.jsonl \
  --source-uri github.com/certctl-io/certctl \
  --source-tag v2.1.0 \
  certctl-agent-linux-amd64
```
|
||||
|
||||
**4. Verify a container image signature and its SBOM / provenance attestations:**
|
||||
|
||||
```bash
|
||||
IMAGE=ghcr.io/certctl-io/certctl-server:v2.1.0
|
||||
|
||||
cosign verify \
|
||||
--certificate-identity-regexp '^https://github\.com/certctl-io/certctl/\.github/workflows/release\.yml@refs/tags/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
|
||||
# SBOM attestation (SPDX-JSON, emitted by docker/build-push-action)
|
||||
cosign verify-attestation --type spdxjson \
|
||||
--certificate-identity-regexp '^https://github\.com/certctl-io/certctl/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
|
||||
# SLSA provenance attestation (docker/build-push-action `provenance: mode=max`)
|
||||
cosign verify-attestation --type slsaprovenance \
|
||||
--certificate-identity-regexp '^https://github\.com/certctl-io/certctl/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
docker pull ghcr.io/certctl-io/certctl-server:latest
|
||||
docker pull ghcr.io/certctl-io/certctl-agent:latest
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
Pick the scenario closest to your setup and have it running in 2 minutes.
|
||||
Pick the scenario closest to your setup and have it running in 2 minutes:
|
||||
|
||||
| Example | Scenario |
|
||||
|---------|----------|
|
||||
@@ -327,58 +134,9 @@ Pick the scenario closest to your setup and have it running in 2 minutes.
|
||||
|
||||
Each directory contains a `docker-compose.yml` and a `README.md` explaining the scenario, prerequisites, and customization.
|
||||
|
||||
## CLI
|
||||
## Verifying a release
|
||||
|
||||
```bash
# Install
go install github.com/certctl-io/certctl/cmd/cli@latest

# Configure
export CERTCTL_SERVER_URL=https://localhost:8443
export CERTCTL_API_KEY=your-api-key
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt  # or --ca-bundle on the CLI; --insecure for dev self-signed

# Usage
certctl-cli certs list                    # List all certificates
certctl-cli certs renew mc-api-prod       # Trigger renewal
certctl-cli certs revoke mc-api-prod --reason keyCompromise
certctl-cli agents list                   # List registered agents
certctl-cli jobs list                     # List jobs
certctl-cli status                        # Server health + summary stats
certctl-cli import certs.pem              # Bulk import from PEM file
certctl-cli certs list --format json      # JSON output (default: table)
```
|
||||
|
||||
## MCP Server (AI Integration)
|
||||
|
||||
certctl ships a standalone MCP (Model Context Protocol) server that exposes all 80 API endpoints as tools for AI assistants — Claude, Cursor, Windsurf, OpenClaw, VS Code Copilot, and any MCP-compatible client.
|
||||
|
||||
```bash
# Install and run
go install github.com/certctl-io/certctl/cmd/mcp-server@latest
export CERTCTL_SERVER_URL=https://localhost:8443
export CERTCTL_API_KEY=your-api-key
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt  # required for self-signed bootstrap
mcp-server
```
|
||||
|
||||
The MCP server is env-vars-only — there are no CLI flags for TLS. If you must bypass verification for local development against a self-signed cert, set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true`. Never set that in production.
|
||||
|
||||
**Claude Desktop** (`claude_desktop_config.json`):
|
||||
```json
{
  "mcpServers": {
    "certctl": {
      "command": "mcp-server",
      "env": {
        "CERTCTL_SERVER_URL": "https://localhost:8443",
        "CERTCTL_API_KEY": "your-api-key",
        "CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/ca.crt"
      }
    }
  }
}
```
|
||||
Every `v*` tag publishes signed, attested artefacts (Cosign keyless OIDC + SLSA Level 3 provenance + SPDX-JSON SBOMs). For the verification procedure, see [`docs/reference/release-verification.md`](docs/reference/release-verification.md).
|
||||
|
||||
## Development
|
||||
|
||||
@@ -390,37 +148,26 @@ govulncheck ./... # Vulnerability scan
|
||||
make docker-up # Start Docker Compose stack
|
||||
```
|
||||
|
||||
CI runs on every push: `go vet`, `go test -race`, `golangci-lint`, `govulncheck`, and per-layer coverage thresholds (service 55%, handler 60%, domain 40%, middleware 30%). Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build. 1,668 Go test functions with 625+ subtests, plus frontend test suite.
|
||||
CI runs `go vet`, `go test -race`, `golangci-lint`, `govulncheck`, and per-layer coverage thresholds (service 55%, handler 60%, domain 40%, middleware 30%) on every push. Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build.
|
||||
|
||||
## Roadmap
|
||||
|
||||
### V1 (v1.0.0) — Shipped
|
||||
Core lifecycle management — Local CA + ACME v2 issuers, NGINX target connector, agent-side key generation, API auth + rate limiting, React dashboard, CI pipeline with coverage gates, Docker images on GHCR.
|
||||
|
||||
### V2: Operational Maturity — Shipped
|
||||
30+ milestones shipping enterprise-grade features for free. Sub-CA mode, ACME DNS-01/DNS-PERSIST-01/EAB/ARI (RFC 9773)/profile selection, step-ca, Vault PKI, DigiCert CertCentral, Sectigo SCM, Google CAS, AWS ACM PCA, Entrust, GlobalSign, EJBCA, OpenSSL/Custom CA issuers. NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS (WinRM), F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, Kubernetes Secrets targets. EST server (RFC 7030) and SCEP server (RFC 8894) enrollment protocols. RFC 5280 revocation with DER CRL + embedded OCSP responder. Certificate profiles, ownership tracking, team assignment, agent groups, interactive approval workflows. Filesystem, network, and cloud secret manager (AWS SM, Azure KV, GCP SM) certificate discovery with triage GUI. Dynamic issuer/target configuration via GUI with AES-256-GCM encrypted storage. First-run onboarding wizard. Post-deployment TLS verification. Certificate export (PEM/PKCS#12). S/MIME support. Prometheus metrics. Scheduled certificate digest emails. Slack, Teams, PagerDuty, OpsGenie, SMTP notifications. MCP server (80 tools), CLI (12 commands), Helm chart. Compliance mapping (SOC 2, PCI-DSS 4.0, NIST SP 800-57). 5 turnkey deployment examples. Agent install script. Migration guides from certbot, acme.sh, and cert-manager. See the [Feature Inventory](docs/features.md) for details.
|
||||
|
||||
### Forward-looking work — all free, all self-hostable
|
||||
Everything ships free under BSL 1.1. No paid tier, no V3 / V4 gating, no enterprise edition. Future revenue path is a managed-service hosting offering — operate certctl-server as a hosted service while customers self-install only the agent.
|
||||
For the full contributor guide see [`docs/contributor/`](docs/contributor/) — testing strategy, test environment, CI pipeline, QA prerequisites.
|
||||
|
||||
## License
|
||||
|
||||
Certctl is licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial offering to third parties, whether hosted, managed, embedded, bundled, or integrated.
|
||||
Licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial certificate-management offering to third parties. See the LICENSE file for the full Additional Use Grant.
|
||||
|
||||
For licensing inquiries: certctl@proton.me
|
||||
|
||||
## Dependencies
|
||||
|
||||
Backend dependency footprint is auditable on demand:
|
||||
|
||||
```
|
||||
```bash
|
||||
go list -m all | wc -l # total module count (direct + transitive)
|
||||
go mod why <path> # explain why a particular module is pulled in
|
||||
go mod why <path> # explain why a module is pulled in
|
||||
govulncheck ./... # vulnerability scan (CI runs this on every commit)
|
||||
```
|
||||
|
||||
The release-time SBOM is published as a syft-produced cyclonedx file alongside each release artifact in `.github/workflows/release.yml`.
|
||||
The release-time SBOM is published as an SPDX-JSON file alongside each release artifact.
|
||||
|
||||
---
|
||||
|
||||
If certctl solves a problem you have, [star the repo](https://github.com/certctl-io/certctl) to help others find it. Questions, bugs, or feature requests — [open an issue](https://github.com/certctl-io/certctl/issues).
|
||||
If certctl solves a problem you have, [star the repo](https://github.com/certctl-io/certctl) to help others find it. Questions, bugs, or feature requests: [open an issue](https://github.com/certctl-io/certctl/issues).
|
||||
|
||||
@@ -2751,6 +2751,310 @@ paths:
|
||||
$ref: "#/components/responses/InternalError"
|
||||
|
||||
# ─── Notifications ──────────────────────────────────────────────────
|
||||
  /api/v1/approvals:
    get:
      tags: [Approvals]
      summary: List approval requests
      description: |
        Rank 7 issuance approval-workflow primitive. Returns paginated approval
        requests, optionally filtered by ?state= (pending/approved/rejected/expired),
        ?certificate_id=, or ?requested_by=. Empty filters return the unfiltered
        list (default page=1, per_page=50).
      operationId: listApprovalRequests
      parameters:
        - $ref: "#/components/parameters/page"
        - $ref: "#/components/parameters/per_page"
        - name: state
          in: query
          required: false
          schema:
            type: string
            enum: [pending, approved, rejected, expired]
        - name: certificate_id
          in: query
          required: false
          schema:
            type: string
        - name: requested_by
          in: query
          required: false
          schema:
            type: string
      responses:
        "200":
          description: Paginated list of approval requests
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: "#/components/schemas/ApprovalRequest"
                  page:
                    type: integer
                  per_page:
                    type: integer
        "500":
          $ref: "#/components/responses/InternalError"
|
||||
|
||||
  /api/v1/approvals/{id}:
    get:
      tags: [Approvals]
      summary: Get approval request
      description: Returns a single approval request by ID.
      operationId: getApprovalRequest
      parameters:
        - $ref: "#/components/parameters/resourceId"
      responses:
        "200":
          description: Approval request details
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ApprovalRequest"
        "404":
          $ref: "#/components/responses/NotFound"
        "500":
          $ref: "#/components/responses/InternalError"
|
||||
|
||||
/api/v1/approvals/{id}/approve:
|
||||
post:
|
||||
tags: [Approvals]
|
||||
summary: Approve a pending approval request
|
||||
description: |
|
||||
Transitions a pending request to approved AND transitions the linked
|
||||
Job from AwaitingApproval to Pending so the scheduler picks it up.
|
||||
RBAC: the authenticated actor extracted via the auth middleware MUST
|
||||
differ from the request's requested_by — a same-actor self-approval
|
||||
returns HTTP 403 with the substring `two-person integrity` in the
|
||||
body. This is the load-bearing two-person integrity contract;
|
||||
compliance auditors (PCI-DSS 6.4.5, NIST 800-53 SA-15, SOC 2 CC6.1)
|
||||
pattern-match against this code path.
|
||||
operationId: approveApprovalRequest
|
||||
parameters:
|
||||
- $ref: "#/components/parameters/resourceId"
|
||||
requestBody:
|
||||
required: false
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
note:
|
||||
type: string
|
||||
description: Optional reason text for the audit trail.
|
||||
responses:
|
||||
"200":
|
||||
description: Approval recorded; linked Job transitioned to Pending
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
id: { type: string }
|
||||
decided_by: { type: string }
|
||||
action: { type: string, enum: [approved] }
|
||||
"401":
|
||||
description: Authentication required
|
||||
"403":
|
||||
description: Same-actor self-approval blocked by two-person integrity contract
|
||||
"404":
|
||||
$ref: "#/components/responses/NotFound"
|
||||
"409":
|
||||
description: Request already decided (terminal state)
|
||||
"500":
|
||||
$ref: "#/components/responses/InternalError"
|
||||
|
||||
/api/v1/approvals/{id}/reject:
|
||||
post:
|
||||
tags: [Approvals]
|
||||
summary: Reject a pending approval request
|
||||
description: |
|
||||
Transitions a pending request to rejected AND cancels the linked
|
||||
Job. Same-actor RBAC contract as approve. The job's error_message
|
||||
is populated with the supplied note for audit continuity.
|
||||
operationId: rejectApprovalRequest
|
||||
parameters:
|
||||
- $ref: "#/components/parameters/resourceId"
|
||||
requestBody:
|
||||
required: false
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
note:
|
||||
type: string
|
||||
description: Optional reason text for the audit trail.
|
||||
responses:
|
||||
"200":
|
||||
description: Rejection recorded; linked Job transitioned to Cancelled
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
id: { type: string }
|
||||
decided_by: { type: string }
|
||||
action: { type: string, enum: [rejected] }
|
||||
"401":
|
||||
description: Authentication required
|
||||
"403":
|
||||
description: Same-actor self-rejection blocked by two-person integrity contract
|
||||
"404":
|
||||
$ref: "#/components/responses/NotFound"
|
||||
"409":
|
||||
description: Request already decided (terminal state)
|
||||
"500":
|
||||
$ref: "#/components/responses/InternalError"
|
||||
|
||||
/api/v1/issuers/{id}/intermediates:
|
||||
post:
|
||||
tags: [IntermediateCAs]
|
||||
summary: Create a root or child intermediate CA under the issuer
|
||||
description: |
|
||||
Admin-gated. Discriminator on body shape: when parent_ca_id is
|
||||
empty AND root_cert_pem + key_driver_id are present, the
|
||||
endpoint registers an operator-supplied root CA. Otherwise it
|
||||
signs a child sub-CA cert under the named parent (RFC 5280
|
||||
§4.2.1.9 path-length tightening + §4.2.1.10 NameConstraints
|
||||
subset semantics enforced at the service layer).
|
||||
operationId: createIntermediateCA
|
||||
parameters:
|
||||
- $ref: "#/components/parameters/resourceId"
|
||||
requestBody:
|
||||
required: true
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [name]
|
||||
properties:
|
||||
name: { type: string }
|
||||
parent_ca_id:
|
||||
type: string
|
||||
description: Empty for root registration; non-empty for child signing
|
||||
root_cert_pem:
|
||||
type: string
|
||||
description: Operator-supplied root cert PEM (root path only)
|
||||
key_driver_id:
|
||||
type: string
|
||||
description: signer.Driver reference for the root key (root path only)
|
||||
subject:
|
||||
type: object
|
||||
description: Distinguished name for child CA (child path only)
|
||||
algorithm:
|
||||
type: string
|
||||
description: Signing algorithm for child key (default ECDSA-P256)
|
||||
ttl_days:
|
||||
type: integer
|
||||
path_len_constraint:
|
||||
type: integer
|
||||
nullable: true
|
||||
name_constraints:
|
||||
type: array
|
||||
items: { type: object }
|
||||
ocsp_responder_url:
|
||||
type: string
|
||||
metadata:
|
||||
type: object
|
||||
responses:
|
||||
"201":
|
||||
description: IntermediateCA row created
|
||||
"400":
|
||||
description: Validation failed (RFC 5280 violations, malformed cert PEM, missing root bundle)
|
||||
"401":
|
||||
description: Authentication required
|
||||
"403":
|
||||
description: Admin role required
|
||||
"409":
|
||||
description: Parent CA not in active state
|
||||
"404":
|
||||
description: Parent CA not found
|
||||
"500":
|
||||
$ref: "#/components/responses/InternalError"
|
||||
get:
|
||||
tags: [IntermediateCAs]
|
||||
summary: List the CA hierarchy for an issuer
|
||||
description: |
|
||||
Admin-gated. Returns the flat list of every IntermediateCA row
|
||||
for the issuer, ordered by created_at. The caller renders the
|
||||
tree from each row's parent_ca_id (nil = root).
|
||||
operationId: listIntermediateCAs
|
||||
parameters:
|
||||
- $ref: "#/components/parameters/resourceId"
|
||||
responses:
|
||||
"200":
|
||||
description: Flat list of CA rows
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
data:
|
||||
type: array
|
||||
items: { type: object }
|
||||
"401":
|
||||
description: Authentication required
|
||||
"403":
|
||||
description: Admin role required
|
||||
|
||||
/api/v1/intermediates/{id}:
|
||||
get:
|
||||
tags: [IntermediateCAs]
|
||||
summary: Get a single intermediate CA by ID
|
||||
operationId: getIntermediateCA
|
||||
parameters:
|
||||
- $ref: "#/components/parameters/resourceId"
|
||||
responses:
|
||||
"200":
|
||||
description: IntermediateCA row
|
||||
"401":
|
||||
description: Authentication required
|
||||
"403":
|
||||
description: Admin role required
|
||||
"404":
|
||||
$ref: "#/components/responses/NotFound"
|
||||
|
||||
/api/v1/intermediates/{id}/retire:
|
||||
post:
|
||||
tags: [IntermediateCAs]
|
||||
summary: Retire an intermediate CA (two-phase drain)
|
||||
description: |
|
||||
Admin-gated. Two-phase: first call (confirm=false) transitions
|
||||
active to retiring (the CA stops issuing new children but
|
||||
existing children continue). Second call (confirm=true)
|
||||
transitions retiring to retired (terminal). Refuses the
|
||||
terminal transition if the CA still has active children —
|
||||
drain-first semantics.
|
||||
operationId: retireIntermediateCA
|
||||
parameters:
|
||||
- $ref: "#/components/parameters/resourceId"
|
||||
requestBody:
|
||||
required: false
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
note: { type: string }
|
||||
confirm: { type: boolean, default: false }
|
||||
responses:
|
||||
"200":
|
||||
description: Retire transition recorded
|
||||
"401":
|
||||
description: Authentication required
|
||||
"403":
|
||||
description: Admin role required
|
||||
"404":
|
||||
$ref: "#/components/responses/NotFound"
|
||||
"409":
|
||||
description: CA still has active children; drain them first
|
||||
"500":
|
||||
$ref: "#/components/responses/InternalError"
|
||||
|
||||
/api/v1/notifications:
|
||||
get:
|
||||
tags: [Notifications]
|
||||
@@ -4057,6 +4361,63 @@ components:
|
||||
$ref: "#/components/schemas/ErrorResponse"
|
||||
|
||||
schemas:
|
||||
# ─── Approvals ───────────────────────────────────────────────────
|
||||
ApprovalRequest:
|
||||
type: object
|
||||
description: |
|
||||
Rank 7 issuance approval-workflow primitive. One row per (CertificateID,
|
||||
JobID) pair; the JobID points at the blocked Job whose Status is
|
||||
AwaitingApproval. Lifecycle: pending → approved | rejected | expired.
|
||||
Once terminal, the row is immutable; the audit_events table is the
|
||||
durable record of who decided + why.
|
||||
required:
|
||||
- id
|
||||
- certificate_id
|
||||
- job_id
|
||||
- profile_id
|
||||
- requested_by
|
||||
- state
|
||||
- created_at
|
||||
- updated_at
|
||||
properties:
|
||||
id:
|
||||
type: string
|
||||
description: Approval request ID (ar-<slug>).
|
||||
certificate_id:
|
||||
type: string
|
||||
job_id:
|
||||
type: string
|
||||
profile_id:
|
||||
type: string
|
||||
requested_by:
|
||||
type: string
|
||||
description: Actor that triggered the renewal.
|
||||
state:
|
||||
type: string
|
||||
enum: [pending, approved, rejected, expired]
|
||||
decided_by:
|
||||
type: string
|
||||
nullable: true
|
||||
description: Approver identity; null while state=pending.
|
||||
decided_at:
|
||||
type: string
|
||||
format: date-time
|
||||
nullable: true
|
||||
decision_note:
|
||||
type: string
|
||||
nullable: true
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
type: string
|
||||
description: Free-form key/value (common_name, sans, issuer_id, severity_tier).
|
||||
created_at:
|
||||
type: string
|
||||
format: date-time
|
||||
updated_at:
|
||||
type: string
|
||||
format: date-time
|
||||
|
||||
# ─── Common ──────────────────────────────────────────────────────
|
||||
ErrorResponse:
|
||||
type: object
|
||||
|
||||
+1
-1
@@ -64,7 +64,7 @@ type AgentConfig struct {
|
||||
// ErrAgentRetired is the sentinel returned by [Agent.Run] when the control
|
||||
// plane responds with HTTP 410 Gone to a heartbeat or work-poll request — the
|
||||
// canonical signal that this agent's row has been soft-retired server-side
|
||||
-// (see I-004 in cowork/certctl-coverage-gap-audit.md). The binary must
+// (see I-004 in the project's coverage-gap audit). The binary must
|
||||
// terminate cleanly: an init-system restart would only produce another 410
|
||||
// and wedge the host in a restart loop. main() translates this sentinel into
|
||||
// a zero exit code so systemd (Restart=on-failure) and launchd do not respawn
|
||||
|
||||
@@ -163,14 +163,79 @@ func TestHandleCerts_Revoke_HitsClientPath(t *testing.T) {
|
||||
}))
|
||||
t.Cleanup(srv.Close)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
-if err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "compromise"}); err != nil {
+// 2026-05-05 parity-defaults-cleanup (P3-2): reason must be a canonical
+// RFC 5280 §5.3.1 code (camelCase or snake_case both accepted; this
+// test asserts the snake_case path normalises to the camelCase wire
+// format that the local issuer + ACME server expect).
+if err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "key_compromise"}); err != nil {
|
||||
t.Errorf("handleCerts({revoke ...}): err=%v", err)
|
||||
}
|
||||
if lastMethod != "POST" || !strings.Contains(lastPath, "/revoke") {
|
||||
t.Errorf("expected POST .../revoke, got %s %s", lastMethod, lastPath)
|
||||
}
|
||||
-if !strings.Contains(lastBody, "compromise") {
-t.Errorf("expected reason in body, got %q", lastBody)
+if !strings.Contains(lastBody, "keyCompromise") {
+t.Errorf("expected normalised reason 'keyCompromise' in body, got %q", lastBody)
|
||||
}
|
||||
}
|
||||
|
||||
// TestHandleCerts_Revoke_RequiresReason pins the 2026-05-05 parity-defaults-
|
||||
// cleanup (P3-2, Option A) strict-reason contract: empty --reason is a
|
||||
// fatal error, not a silent fallback to "unspecified".
|
||||
func TestHandleCerts_Revoke_RequiresReason(t *testing.T) {
|
||||
srv := stubServer(t, 200, `{}`)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
err := handleCerts(c, []string{"revoke", "mc-x"})
|
||||
if err == nil {
|
||||
t.Fatal("expected error when --reason is omitted; got nil (regression on P3-2 strict path)")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "reason") {
|
||||
t.Errorf("expected error to mention 'reason', got %q", err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// TestHandleCerts_Revoke_RejectsUnknownReason pins that off-RFC reason
|
||||
// codes are rejected at the CLI dispatch layer (P3-2 anti-typo guard).
|
||||
func TestHandleCerts_Revoke_RejectsUnknownReason(t *testing.T) {
|
||||
srv := stubServer(t, 200, `{}`)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "compromise"})
|
||||
if err == nil {
|
||||
t.Fatal("expected error for non-canonical reason; got nil")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "compromise") {
|
||||
t.Errorf("expected error to echo bad reason 'compromise', got %q", err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// TestHandleCerts_Renew_ForceFlag pins the 2026-05-05 parity-defaults-
|
||||
// cleanup (P3-1) wire: --force on the renew dispatch sends ?force=true.
|
||||
// CLI convention: ID is positional and precedes the flags (matches
|
||||
// `agents retire <id> [--force]`), so the flag MUST come after the ID.
|
||||
func TestHandleCerts_Renew_ForceFlag(t *testing.T) {
|
||||
for _, tc := range []struct {
|
||||
name string
|
||||
args []string
|
||||
wantQuery string
|
||||
}{
|
||||
{"no-force", []string{"renew", "mc-x"}, ""},
|
||||
{"force-after-id", []string{"renew", "mc-x", "--force"}, "force=true"},
|
||||
} {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
var lastQuery string
|
||||
srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
lastQuery = r.URL.RawQuery
|
||||
w.WriteHeader(200)
|
||||
_, _ = w.Write([]byte(`{}`))
|
||||
}))
|
||||
t.Cleanup(srv.Close)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
if err := handleCerts(c, tc.args); err != nil {
|
||||
t.Fatalf("handleCerts: %v", err)
|
||||
}
|
||||
if lastQuery != tc.wantQuery {
|
||||
t.Errorf("query: got %q want %q", lastQuery, tc.wantQuery)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
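The renew test above depends on the "ID first, flags after" convention. A minimal sketch of that parsing pattern with the standard `flag` package is shown here. `parseRenewArgs` is a hypothetical helper, not a function from the certctl CLI; it only illustrates why the positional ID is peeled off before the remaining arguments are handed to the flag parser.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseRenewArgs (hypothetical) splits `<id> [--force]` into its parts.
// Go's flag package stops parsing at the first non-flag token, so the ID is
// taken from args[0] and only args[1:] is given to the FlagSet.
func parseRenewArgs(args []string) (id string, force bool, err error) {
	if len(args) == 0 {
		return "", false, fmt.Errorf("usage: certs renew <id> [--force]")
	}
	id = args[0]

	fs := flag.NewFlagSet("certs renew", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet; a real CLI would print usage
	forceFlag := fs.Bool("force", false, "force renewal")
	if err := fs.Parse(args[1:]); err != nil {
		return "", false, err
	}
	return id, *forceFlag, nil
}

func main() {
	id, force, err := parseRenewArgs([]string{"mc-x", "--force"})
	fmt.Println(id, force, err)

	id, force, err = parseRenewArgs([]string{"mc-x"})
	fmt.Println(id, force, err)
}
```

Note that with this layout a leading flag (for example `--force mc-x`) would be taken as the ID, which is why the convention requires the ID to come first.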
+59
-11
@@ -144,22 +144,70 @@ func handleCerts(client *cli.Client, args []string) error {
|
||||
}
|
||||
return client.GetCertificate(subArgs[0])
|
||||
case "renew":
|
||||
// 2026-05-05 parity-defaults-cleanup (P3-1): expose --force as an
|
||||
// explicit operator flag instead of the historical hardcoded
|
||||
// `force=false` body field. force=true overrides the server-side
|
||||
// RenewalInProgress block — used to recover stuck in-flight
|
||||
// renewals. Archived/Expired remain terminal regardless.
|
||||
//
|
||||
// CLI convention: `certs renew <id> [--force]` — the ID is a
|
||||
// positional arg that precedes the flags. Mirrors `agents retire
|
||||
// <id>`'s pattern (Go's flag package stops at the first non-flag
|
||||
// token, so we pull subArgs[0] as the ID and hand subArgs[1:] to
|
||||
// the flag parser).
|
||||
if len(subArgs) == 0 {
|
||||
-fmt.Fprintf(os.Stderr, "usage: certs renew <id>\n")
-return nil
-}
-return client.RenewCertificate(subArgs[0])
-case "revoke":
-if len(subArgs) == 0 {
-fmt.Fprintf(os.Stderr, "usage: certs revoke <id> [--reason <reason>]\n")
+fmt.Fprintf(os.Stderr, "usage: certs renew <id> [--force]\n")
|
||||
return nil
|
||||
}
|
||||
id := subArgs[0]
|
||||
-reason := "unspecified"
-if len(subArgs) > 2 && subArgs[1] == "--reason" {
-reason = subArgs[2]
+fs := flag.NewFlagSet("certs renew", flag.ContinueOnError)
+force := fs.Bool("force", false, "Force renewal even when the cert is currently in RenewalInProgress (clears stuck in-flight renewals; does NOT override Archived/Expired terminal states)")
+if err := fs.Parse(subArgs[1:]); err != nil {
+return err
|
||||
}
|
||||
-return client.RevokeCertificate(id, reason)
+return client.RenewCertificate(id, *force)
|
||||
case "revoke":
|
||||
// 2026-05-05 parity-defaults-cleanup (P3-2, Option A): --reason is
|
||||
// strictly required. Empty reason refuses to dispatch and prints
|
||||
// the RFC 5280 §5.3.1 reason-code menu so operators pick a real
|
||||
// value. The pre-2026-05-05 silent fallback to "unspecified"
|
||||
// defeated compliance reporting (PCI-DSS §3.6, HIPAA §164.312)
|
||||
// because every revocation looked the same in the audit trail.
|
||||
//
|
||||
// CLI convention: `certs revoke <id> --reason <reason>` — same
|
||||
// ID-first ordering as `certs renew`.
|
||||
if len(subArgs) == 0 {
|
||||
fmt.Fprintf(os.Stderr, "usage: certs revoke <id> --reason <reason>\n")
|
||||
fmt.Fprintf(os.Stderr, "\nValid RFC 5280 §5.3.1 reasons:\n")
|
||||
for _, r := range cli.ValidRevokeReasons() {
|
||||
fmt.Fprintf(os.Stderr, " %s\n", r)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
id := subArgs[0]
|
||||
fs := flag.NewFlagSet("certs revoke", flag.ContinueOnError)
|
||||
reason := fs.String("reason", "", "RFC 5280 revocation reason (required). Valid values: keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, removeFromCRL, privilegeWithdrawn, aaCompromise, unspecified")
|
||||
if err := fs.Parse(subArgs[1:]); err != nil {
|
||||
return err
|
||||
}
|
||||
if *reason == "" {
|
||||
fmt.Fprintf(os.Stderr, "error: --reason is required (no silent fallback to 'unspecified' — pick a real RFC 5280 §5.3.1 code).\n\n")
|
||||
fmt.Fprintf(os.Stderr, "Valid reasons:\n")
|
||||
for _, r := range cli.ValidRevokeReasons() {
|
||||
fmt.Fprintf(os.Stderr, " %s\n", r)
|
||||
}
|
||||
return fmt.Errorf("--reason is required")
|
||||
}
|
||||
canonical, ok := cli.NormalizeRevokeReason(*reason)
|
||||
if !ok {
|
||||
fmt.Fprintf(os.Stderr, "error: %q is not a valid RFC 5280 §5.3.1 reason code.\n\n", *reason)
|
||||
fmt.Fprintf(os.Stderr, "Valid reasons (camelCase or snake_case both accepted):\n")
|
||||
for _, r := range cli.ValidRevokeReasons() {
|
||||
fmt.Fprintf(os.Stderr, " %s\n", r)
|
||||
}
|
||||
return fmt.Errorf("invalid --reason: %q", *reason)
|
||||
}
|
||||
return client.RevokeCertificate(id, canonical)
|
||||
case "bulk-revoke":
|
||||
return client.BulkRevokeCertificates(subArgs)
|
||||
default:
|
||||
|
||||
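The revoke path above relies on `cli.NormalizeRevokeReason` accepting camelCase or snake_case and returning the camelCase wire value. The diff does not show that function, so the following is only a sketch of one way such a normaliser could work. `normalizeRevokeReason`, `fold`, and `validReasons` are hypothetical names; the reason list is copied from the `--reason` help text. As a simplification, this version also accepts other casings such as `KEYCOMPROMISE`, which may be looser than the real function.

```go
package main

import (
	"fmt"
	"strings"
)

// validReasons lists the RFC 5280 §5.3.1 codes named in the --reason help text.
var validReasons = []string{
	"keyCompromise", "caCompromise", "affiliationChanged", "superseded",
	"cessationOfOperation", "certificateHold", "removeFromCRL",
	"privilegeWithdrawn", "aaCompromise", "unspecified",
}

// fold removes underscores and lowercases, so that "key_compromise" and
// "keyCompromise" compare equal.
func fold(s string) string {
	return strings.ToLower(strings.ReplaceAll(s, "_", ""))
}

// normalizeRevokeReason returns the canonical camelCase code for in, or
// ok=false when in does not match any known reason.
func normalizeRevokeReason(in string) (canonical string, ok bool) {
	key := fold(in)
	for _, r := range validReasons {
		if fold(r) == key {
			return r, true
		}
	}
	return "", false
}

func main() {
	for _, in := range []string{"key_compromise", "removeFromCRL", "compromise"} {
		canonical, ok := normalizeRevokeReason(in)
		fmt.Printf("%q -> %q (valid=%v)\n", in, canonical, ok)
	}
}
```

Rejecting the bare word `compromise` is the anti-typo behaviour that `TestHandleCerts_Revoke_RejectsUnknownReason` pins.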
+55
-1
@@ -267,6 +267,43 @@ func main() {
|
||||
// same *sql.DB handle.
|
||||
transactor := postgres.NewTransactor(db)
|
||||
certificateService.SetTransactor(transactor)
|
||||
|
||||
// Rank 7 of the 2026-05-03 Infisical deep-research deliverable —
|
||||
// issuance approval-workflow primitive. ApprovalRepository +
|
||||
// ApprovalMetrics + ApprovalService construct here; the gate is
|
||||
// activated on CertificateService via SetApprovalService +
|
||||
// SetProfileRepo. Inactive when CertificateProfile.RequiresApproval
|
||||
// is false (the default), preserving the historical unattended
|
||||
// renewal path. See docs/approval-workflow.md.
|
||||
approvalRepo := postgres.NewApprovalRepository(db)
|
||||
approvalMetrics := service.NewApprovalMetrics()
|
||||
approvalService := service.NewApprovalService(approvalRepo, jobRepo, auditService,
|
||||
approvalMetrics, cfg.Approval.BypassEnabled)
|
||||
if cfg.Approval.BypassEnabled {
|
||||
logger.Warn("CERTCTL_APPROVAL_BYPASS=true — every approval auto-approves with actor=system-bypass; production deploys must leave this unset")
|
||||
}
|
||||
certificateService.SetApprovalService(approvalService)
|
||||
certificateService.SetProfileRepo(profileRepo)
|
||||
approvalHandler := handler.NewApprovalHandler(approvalService)
|
||||
|
||||
// Rank 8 of the 2026-05-03 deep-research deliverable — first-class
|
||||
// CA hierarchy management (intermediate_cas table + admin-gated
|
||||
// hierarchy endpoints). The service receives the issuerRepo so
|
||||
// future surface area (issuer-row hierarchy_mode validation) can
|
||||
// query the issuer config; for the commit-4 wiring it carries
|
||||
// only the fields used today. The signer.FileDriver shared with
|
||||
// the OCSP responder bootstrap path is reused here — operators
|
||||
// can plug in PKCS#11 / cloud-KMS drivers via the same Driver
|
||||
// interface without touching the service. See
|
||||
// docs/intermediate-ca-hierarchy.md.
|
||||
intermediateCARepo := postgres.NewIntermediateCARepository(db)
|
||||
intermediateCAMetrics := service.NewIntermediateCAMetrics()
|
||||
// Defer wiring the service + handler — signerDriver is constructed
|
||||
// further down in this function alongside the OCSP responder
|
||||
// bootstrap path. The service holds a reference to issuerRepo for
|
||||
// future hierarchy_mode validation surface area.
|
||||
_ = intermediateCAMetrics // service constructed below alongside signerDriver
|
||||
|
||||
notifierRegistry := make(map[string]service.Notifier)
|
||||
|
||||
// Wire notifier connectors from config
|
||||
@@ -326,7 +363,7 @@ func main() {
|
||||
notificationService.SetOwnerRepo(ownerRepo)
|
||||
|
||||
// Rank 4 of the 2026-05-03 Infisical deep-research deliverable
|
||||
-// (cowork/infisical-deep-research-results.md Part 5). Per-policy
+// (per the project's deep-research deliverable, Part 5). Per-policy
|
||||
// multi-channel expiry-alert metrics. Same instance is wired into
|
||||
// the notification service (recording side, every
|
||||
// SendThresholdAlertOnChannel call reports its outcome) AND into
|
||||
@@ -371,6 +408,15 @@ func main() {
|
||||
RotationGrace: cfg.OCSPResponder.RotationGrace,
|
||||
Validity: cfg.OCSPResponder.Validity,
|
||||
})
|
||||
|
||||
// Rank 8 service + handler — wired here so signerDriver is in
|
||||
// scope. The same FileDriver instance feeds both the OCSP
|
||||
// responder bootstrap path and the intermediate-CA hierarchy.
|
||||
// Operators that swap to PKCS#11 / cloud-KMS drivers reuse the
|
||||
// single Driver instance across both surfaces.
|
||||
intermediateCAService := service.NewIntermediateCAService(
|
||||
intermediateCARepo, issuerRepo, signerDriver, auditService, intermediateCAMetrics)
|
||||
intermediateCAHandler := handler.NewIntermediateCAHandler(intermediateCAService)
|
||||
crlCacheService := service.NewCRLCacheService(crlCacheRepo, caOperationsSvc, issuerRegistry, logger)
|
||||
|
||||
// Production hardening II Phase 2: OCSP response cache. Mirrors the
|
||||
@@ -907,6 +953,14 @@ func main() {
|
||||
// new-order, finalize, challenges, revoke, ARI). See
|
||||
// docs/acme-server.md for the operator-facing reference.
|
||||
ACME: acmeHandler,
|
||||
// Approvals — issuance approval-workflow primitive. Rank 7 of
|
||||
// the 2026-05-03 Infisical deep-research deliverable. See
|
||||
// docs/approval-workflow.md.
|
||||
Approvals: approvalHandler,
|
||||
// IntermediateCAs — first-class CA hierarchy management.
|
||||
// Rank 8 of the 2026-05-03 deep-research deliverable. See
|
||||
// docs/intermediate-ca-hierarchy.md.
|
||||
IntermediateCAs: intermediateCAHandler,
|
||||
})
|
||||
// Register EST (RFC 7030) handlers if enabled.
|
||||
//
|
||||
|
||||
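The wiring comments in this hunk describe three outcomes for the approval gate: profiles that do not require approval keep the unattended path, the bypass switch auto-approves with `actor=system-bypass`, and everything else blocks the job until a human decides. A minimal sketch of that decision follows. `initialJobStatus` is a hypothetical function, and the status strings are taken from the OpenAPI descriptions (`AwaitingApproval`, `Pending`) rather than from the service code, so treat the exact shape as an assumption.

```go
package main

import "fmt"

// initialJobStatus (hypothetical) decides how a new issuance job starts.
//   - requiresApproval=false: historical unattended path, job is Pending.
//   - bypass=true: auto-approved, recorded with the system-bypass actor.
//   - otherwise: job waits in AwaitingApproval for a second person.
func initialJobStatus(requiresApproval, bypass bool) (status, decidedBy string) {
	switch {
	case !requiresApproval:
		return "Pending", ""
	case bypass:
		return "Pending", "system-bypass"
	default:
		return "AwaitingApproval", ""
	}
}

func main() {
	cases := []struct{ requiresApproval, bypass bool }{
		{false, false},
		{true, true},
		{true, false},
	}
	for _, c := range cases {
		status, actor := initialJobStatus(c.requiresApproval, c.bypass)
		fmt.Printf("requiresApproval=%v bypass=%v -> status=%s actor=%q\n",
			c.requiresApproval, c.bypass, status, actor)
	}
}
```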
@@ -1,159 +0,0 @@
|
||||
# CI Pipeline Cleanup — Phase 0 Baseline
|
||||
|
||||
> Captured against repo HEAD `1de61e91cf07449356d9046a76499c86efe413b1` (operator tag `v2.0.66`) on 2026-04-30.
|
||||
> Each subsequent Phase that changes a number references this baseline.
|
||||
|
||||
## Repo state
|
||||
|
||||
**HEAD SHA:** `1de61e91cf07449356d9046a76499c86efe413b1`
|
||||
|
||||
**Operator-stamped tag:** `v2.0.66`
|
||||
|
||||
## ci.yml shape
|
||||
|
||||
- Total lines: `1488`
|
||||
- Total named steps: `53`
|
||||
- Named regression-guard steps: 22 (enumerated below)
|
||||
|
||||
### The 22 regression-guard steps
|
||||
|
||||
```
|
||||
81: - name: Forbidden auth-type literal regression guard (G-1)
|
||||
144: - name: Forbidden bare InsecureSkipVerify regression guard (L-001)
|
||||
180: - name: Forbidden bare FROM regression guard (H-001)
|
||||
201: - name: Forbidden missing USER regression guard (M-012)
|
||||
228: - name: Forbidden README JWT advertising regression guard (H-009)
|
||||
254: - name: Forbidden api_key_hash JSON-shape regression guard (G-2)
|
||||
311: - name: Forbidden plaintext HEALTHCHECK regression guard (U-2)
|
||||
360: - name: Forbidden migration mount in compose initdb (U-3)
|
||||
417: - name: Forbidden StatusBadge dead-key + TS phantom-field regression guard (D-1 + D-2)
|
||||
569: - name: Forbidden client-side bulk-action loop regression guard (L-1)
|
||||
613: - name: Forbidden orphan-CRUD client function regression guard (B-1)
|
||||
665: - name: Forbidden strings.Contains(err.Error()) regression guard (S-2)
|
||||
868: - name: QA-doc Part-count drift guard
|
||||
886: - name: QA-doc seed-count drift guard
|
||||
938: - name: Test-naming convention guard (hard-fail)
|
||||
982: - name: Forbidden hardcoded source-count prose regression guard (S-1)
|
||||
1027: - name: Documented orphan client fns sync guard (P-1)
|
||||
1063: - name: Frontend page-coverage regression guard (T-1)
|
||||
1118: - name: Bundle-8 / L-015 target=_blank rel=noopener regression guard
|
||||
1147: - name: Bundle-8 / L-019 dangerouslySetInnerHTML regression guard
|
||||
1176: - name: Bundle-8 / M-009 + M-029 Pass 1 mutation contract guard (hard zero)
|
||||
1220: - name: Forbidden env-var docs drift regression guard (G-3)
|
||||
```
|
||||
|
||||
## SA1019 site count
|
||||
|
||||
- **Operator-on-workstation deliverable**: sandbox cannot run `staticcheck`.
- ci.yml inline comment claims "6 sites" (`middleware.NewAuth × 3`, `csr.Attributes`, `elliptic.Marshal`).
- Source-grep at HEAD shows:
  - `internal/api/handler/scep.go`: `csr.Attributes` references present
  - `internal/connector/issuer/local/local.go`: `elliptic.Marshal` historic refs (already migrated per bundle9_coverage_test.go byte-equivalence test)
  - `cmd/server/main_test.go`: `middleware.NewAuth` references TBD
- Operator must run `staticcheck ./... 2>&1 | grep SA1019` on workstation and update Phase 3 plan with the actual site list.
|
||||
|
||||
## Dockerfile inventory (verified 4)
|
||||
|
||||
```
|
||||
./Dockerfile.agent
|
||||
./Dockerfile
|
||||
./deploy/test/f5-mock-icontrol/Dockerfile
|
||||
./deploy/test/libest/Dockerfile
|
||||
```
|
||||
|
||||
## Migration up/down balance
|
||||
|
||||
- ups: `24`
|
||||
- downs: `24`
|
||||
- missing downs: `0`
|
||||
|
||||
## OpenAPI ↔ handler parity gap (verified)
|
||||
|
||||
- operationIds in api/openapi.yaml: `136`
|
||||
- r.Register calls in router.go: `149`
|
||||
- Gap to root-cause in Phase 9: 13 routes
|
||||
|
||||
## docker-compose.test.yml sidecars
|
||||
|
||||
```
|
||||
52: certctl-tls-init:
|
||||
107: postgres:
|
||||
135: pebble-challtestsrv:
|
||||
150: pebble:
|
||||
178: step-ca:
|
||||
213: certctl-server:
|
||||
363: nginx:
|
||||
391: certctl-agent:
|
||||
449: libest-client:
|
||||
488: apache-test:
|
||||
502: haproxy-test:
|
||||
515: traefik-test:
|
||||
533: caddy-test:
|
||||
548: envoy-test:
|
||||
562: postfix-test:
|
||||
577: dovecot-test:
|
||||
591: openssh-test:
|
||||
613: f5-mock-icontrol:
|
||||
631: k8s-kind-test:
|
||||
648: windows-iis-test:
|
||||
666: certctl-test:
|
||||
```
|
||||
|
||||
## Makefile::verify body (existing)
|
||||
|
||||
```
verify:
	@echo "==> fmt"
	@go fmt ./... | { ! grep -q '.'; } || (echo "gofmt produced changes — commit them" && exit 1)
	@echo "==> go vet ./..."
	@go vet ./...
	@echo "==> golangci-lint run ./... (incl. staticcheck ST*)"
	@which golangci-lint > /dev/null || (echo "Installing golangci-lint..." && go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest)
	@golangci-lint run ./... --timeout 5m
	@echo "==> go test -short ./..."
	@go test -short -count=1 ./...
	@echo ""
	@echo "verify: PASS — safe to commit"
```
|
||||
|
||||
## RAM headroom for collapsed vendor-e2e job
|
||||
|
||||
- **Operator-on-workstation deliverable** — requires a prototype branch with the collapsed job + `docker stats` polling.
|
||||
- Per Phase 0 frozen decision 0.14: if peak RSS ≤ 12 GB on ubuntu-latest (16 GB ceiling), single-job collapse is approved.
|
||||
- If > 12 GB, fall back to bucketed-matrix design documented in `cowork/ci-pipeline-cleanup/decisions-revised.md`.
|
||||
|
||||
## Coverage thresholds at HEAD
|
||||
|
||||
```
|
||||
778: if [ "$(echo "$SERVICE_COV < 70" | bc -l)" -eq 1 ]; then
|
||||
779: echo "::error::Service layer coverage ${SERVICE_COV}% is below 70% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
|
||||
782: if [ "$(echo "$HANDLER_COV < 75" | bc -l)" -eq 1 ]; then
|
||||
783: echo "::error::Handler layer coverage ${HANDLER_COV}% is below 75% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
|
||||
786: if [ "$(echo "$DOMAIN_COV < 40" | bc -l)" -eq 1 ]; then
|
||||
787: echo "::error::Domain layer coverage ${DOMAIN_COV}% is below 40% threshold"
|
||||
790: if [ "$(echo "$MIDDLEWARE_COV < 30" | bc -l)" -eq 1 ]; then
|
||||
791: echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
|
||||
802: if [ "$(echo "$CRYPTO_COV < 88" | bc -l)" -eq 1 ]; then
|
||||
803: echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 88% (Bundle R closure floor — add tests, do not lower the gate)"
|
||||
832: if [ "$(echo "$LOCAL_ISSUER_COV < 86" | bc -l)" -eq 1 ]; then
|
||||
833: echo "::error::Local-issuer coverage ${LOCAL_ISSUER_COV}% is below 86% (Bundle R closure floor — add tests, do not lower the gate)"
|
||||
842: if [ "$(echo "$ACME_COV < 80" | bc -l)" -eq 1 ]; then
|
||||
843: echo "::error::ACME issuer coverage ${ACME_COV}% is below 80% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
|
||||
846: if [ "$(echo "$STEPCA_COV < 80" | bc -l)" -eq 1 ]; then
|
||||
847: echo "::error::StepCA issuer coverage ${STEPCA_COV}% is below 80% (Bundle L.B closure floor — add tests, do not lower the gate)"
|
||||
850: if [ "$(echo "$MCP_COV < 85" | bc -l)" -eq 1 ]; then
|
||||
851: echo "::error::MCP coverage ${MCP_COV}% is below 85% (Bundle K closure floor — add tests, do not lower the gate)"
|
||||
```
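The per-layer floors in the shell excerpt above can be restated as a single table-driven check. This is a sketch for illustration only: CI performs the comparison with `bc` in shell, and the `floors` map keys and `belowFloor` function are hypothetical names. The numeric floors are the ones shown in the excerpt; treating a missing measurement as a failure is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// floors holds the coverage thresholds (percent) quoted from ci.yml at HEAD.
var floors = map[string]float64{
	"service":      70,
	"handler":      75,
	"domain":       40,
	"middleware":   30,
	"crypto":       88,
	"local-issuer": 86,
	"acme":         80,
	"stepca":       80,
	"mcp":          85,
}

// belowFloor returns the sorted names of layers whose measured coverage is
// under its floor. A layer with no measurement is reported as failing.
func belowFloor(measured map[string]float64) []string {
	var failed []string
	for name, floor := range floors {
		got, ok := measured[name]
		if !ok || got < floor {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed)
	return failed
}

func main() {
	measured := map[string]float64{
		"service": 71.2, "handler": 75.0, "domain": 55.1, "middleware": 42.0,
		"crypto": 87.9, "local-issuer": 90.3, "acme": 80.0, "stepca": 81.5, "mcp": 85.0,
	}
	fmt.Println("below floor:", belowFloor(measured))
}
```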
|
||||
|
||||
## CodeQL workflow (no changes)
|
||||
|
||||
- File: `.github/workflows/codeql.yml` (`81` lines)
|
||||
- Matrix: `[go, javascript-typescript]` — 2 status checks per push
|
||||
- Trigger: push to master, PR to master, weekly Sunday cron
|
||||
|
||||
## Status check accounting (verified)
|
||||
|
||||
Today: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 12 `deploy-vendor-e2e (<vendor>)` + 2 `deploy-vendor-e2e-windows (<vendor>)` + 2 `CodeQL Analyze (<lang>)` = **19 status checks per push**.
|
||||
|
||||
After cleanup: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 1 `deploy-vendor-e2e` + 1 `image-and-supply-chain` + 2 `CodeQL Analyze (<lang>)` = **7 status checks per push**.
|
||||
@@ -1,53 +0,0 @@
|
||||
# CI Pipeline Cleanup — Deliberate Revisions of Bundle II Decisions
|
||||
|
||||
This bundle deliberately revises two Bundle II frozen decisions. Both revisions are recorded here for audit trail and acknowledged in the per-Phase commits that implement them.
|
||||
|
||||
## Bundle II decision 0.4 → revised by ci-pipeline-cleanup decision 0.5
|
||||
|
||||
**Bundle II 0.4 (original):** "IIS e2e strategy — `mcr.microsoft.com/windows/servercore:ltsc2022` Windows containers via Docker Desktop on Windows hosts. Linux CI runners CAN'T run Windows containers, so the IIS e2e suite runs on a separate Windows-runner CI matrix job (or operator's local Windows host for development). Documented limitation."
|
||||
|
||||
**ci-pipeline-cleanup 0.5 (revision):** Delete the Windows-runner CI matrix entirely.
|
||||
|
||||
**Rationale for revision:**
|
||||
|
||||
1. The matrix can't physically work on `windows-latest` GitHub-hosted runners today. Verified via the failure logs from CI run `25183374742` (commit `1de61e9`):
|
||||
- `wincertstore` job: `error during connect: ... open //./pipe/docker_engine: The system cannot find the file specified` — Docker daemon not started in Windows-containers mode.
|
||||
- `iis` job: image pulled successfully (so the new digest is correct), then died at `failed to create network deploy_certctl-test: could not find plugin bridge in v1 plugin registry: plugin not found` — `bridge` network driver doesn't exist on Windows Docker (uses `nat`).
|
||||
|
||||
2. Even if both Docker-daemon and network-driver issues were fixed, the matrix would validate nothing of substance. Verified by source-grep: all 16 functions matching `TestVendorEdge_(IIS|WinCertStore)_*` in `deploy/test/vendor_e2e_phase3_to_13_test.go` are `t.Log` placeholders that exercise no IIS-specific behavior. The real IIS connector validation lives in `internal/connector/target/iis/` unit tests (run on Linux in `go-build-and-test` — already green per push).
|
||||
|
||||
3. Bundle II decision 0.14 explicitly required operator manual smoke against a real instance for "verified" status in the vendor matrix. Moving IIS + WinCertStore validation to a documented operator playbook in `docs/connector-iis.md` satisfies that criterion better than a fake CI matrix that passes by skipping.
|
||||
|
||||
**Preservation:** the `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` under `profiles: [deploy-e2e-windows]` — operators on a Windows host can opt in via `docker compose --profile deploy-e2e-windows up -d windows-iis-test`. Linux CI never activates this profile.
|
||||
|
||||
## Bundle II decision 0.9 → revised by ci-pipeline-cleanup decision 0.4
|
||||
|
||||
**Bundle II 0.9 (original):** "CI parallelism — Each vendor e2e gets its own GitHub Actions matrix job. Vendor failures surface independently in the CI status check (operator sees 'K8s 1.31 vendor-edge fail' as a discrete check, not a generic 'integration tests failed')."
|
||||
|
||||
**ci-pipeline-cleanup 0.4 (revision):** Single `deploy-vendor-e2e` job replaces the 12-job matrix; per-vendor visibility partially restored via skip-detection guard messages.
|
||||
|
||||
**Rationale for revision:**
|
||||
|
||||
1. The per-vendor granularity Bundle II decision 0.9 was designed to provide is fake signal. Verified by source-analysis at HEAD:
|
||||
```
$ grep -cE 't\.Log\(' deploy/test/{vendor_e2e_phase3_to_13,nginx_vendor_e2e}_test.go
deploy/test/nginx_vendor_e2e_test.go:9
deploy/test/vendor_e2e_phase3_to_13_test.go:106

$ awk '/^func TestVendorEdge_/{in_test=1; name=$2; has_assert=0; next}
in_test && /^}$/ {if (has_assert) print name; in_test=0}
in_test && /t\.(Fatal|Error|Errorf|Fatalf|Fail|Failf)/ {has_assert=1}' \
deploy/test/vendor_e2e_phase3_to_13_test.go deploy/test/nginx_vendor_e2e_test.go
TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E
```
|
||||
115 of 116 vendor-edge test functions are `t.Log`-only — they spin up a sidecar, log a one-line description of the vendor quirk, and return. Only 1 has a real assertion.
|
||||
|
||||
2. Per-vendor status-check granularity costs ~9 sec setup overhead × 12 jobs = ~108 sec of pure runner waste per push (verified from CI run `25183374742` job timings).
|
||||
|
||||
3. The single-job version partially restores per-vendor visibility via the skip-detection guard (decision 0.6): if a sidecar fails to start, the affected tests' SKIP names print in the CI output and the build fails. Operators see "TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E SKIPPED: vendor sidecar 'k8s-kind' not reachable" — same per-vendor signal, just no longer rendered as a separate status-check row.
|
||||
|
||||
**Preservation:** the per-test discoverability via `go test -run 'VendorEdge_<vendor>'` (Bundle II frozen decision 0.6) is unchanged. Only the matrix-jobs-per-vendor part of decision 0.9 is revised; the per-test naming convention stays.
|
||||
|
||||
## Forward-looking note
|
||||
|
||||
Both revisions are limited in scope to CI execution shape: they do NOT delete the test files, the sidecar definitions, or the documentation that Bundle II shipped. Future work could re-introduce per-vendor matrix jobs once the test bodies are filled in with real assertions (turning the `t.Log` placeholders into actual contract pins). At that point, Bundle II decisions 0.4 and 0.9 should be re-evaluated.
|
||||
@@ -1,64 +0,0 @@
|
||||
# CI Pipeline Cleanup — Frozen Decisions
|
||||
|
||||
> 14 frozen decisions confirmed at Phase 0. Each subsequent Phase references the decision number it implements.
|
||||
|
||||
## 0.1 — Trigger model
|
||||
|
||||
Three-tier split, no mixing:
|
||||
- **On push/PR to master:** blocking, fast, every check earns its keep, target <10 min wall-clock.
|
||||
- **Daily cron + workflow_dispatch:** `security-deep-scan.yml` as-is; slow scans, best-effort, never blocks.
|
||||
- **On tag push (`v*`):** `release.yml` as-is; cross-platform binaries, ghcr.io push, SLSA provenance.
|
||||
|
||||
## 0.2 — Extracted-script location
|
||||
|
||||
`scripts/ci-guards/` at repo root. Operator runs `bash scripts/ci-guards/<id>.sh` locally. Contract documented in `scripts/ci-guards/README.md`.
|
||||
|
||||
## 0.3 — Coverage threshold YAML format
|
||||
|
||||
`.github/coverage-thresholds.yml`. Top-level keys are package paths; each entry has `floor:` (integer pct) + `why:` (multi-line string for load-bearing context). Bash step uses Python (already on the runner) to read the YAML — no `yq` dependency.
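
The floor check described above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual script: it assumes the manifest has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and that measured coverage is available as a package-to-percentage mapping. The function name `check_floors` and the package paths in the sample data are hypothetical.

```python
from typing import Dict, List


def check_floors(thresholds: Dict[str, dict], measured: Dict[str, float]) -> List[str]:
    """Return one error message per package that misses its floor.

    `thresholds` mirrors the manifest shape: package path -> {"floor", "why"}.
    `measured` maps package path -> coverage percentage.
    """
    errors = []
    for package, entry in thresholds.items():
        floor = entry["floor"]
        coverage = measured.get(package)
        if coverage is None:
            # A gated package with no measurement is treated as a failure.
            errors.append(f"{package}: no coverage measurement found")
        elif coverage < floor:
            errors.append(f"{package}: coverage {coverage}% is below {floor}%")
    return errors


# Hypothetical manifest entries and measurements, for illustration only.
thresholds = {
    "internal/connector/issuer/acme": {"floor": 80, "why": "example context"},
    "internal/mcp": {"floor": 85, "why": "example context"},
}
measured = {"internal/connector/issuer/acme": 82.4, "internal/mcp": 84.9}

for message in check_floors(thresholds, measured):
    print(message)
```

Keeping the comparison in one function means a new gated package only needs a manifest entry; the `why:` text is carried along for humans and never affects the pass/fail result.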
|
||||
|
||||
## 0.4 — Vendor matrix collapse policy (REVISES Bundle II decision 0.9)
|
||||
|
||||
Single `deploy-vendor-e2e` job replaces 12-job matrix. Bundle II decision 0.9 said "Each vendor e2e gets its own GitHub Actions matrix job" — this revision recognizes that 115/116 vendor-edge tests are `t.Log` placeholders, so per-vendor status-check granularity is fake signal. Skip-detection guard partially restores per-vendor visibility (SKIP messages name the vendor). Documented as deliberate revision in `cowork/ci-pipeline-cleanup/decisions-revised.md`.
|
||||
|
||||
## 0.5 — Windows IIS validation deletion (REVISES Bundle II decision 0.4)
|
||||
|
||||
Delete `deploy-vendor-e2e-windows` matrix entirely. Bundle II decision 0.4 said "the IIS e2e suite runs on a separate Windows-runner CI matrix job" — this revision recognizes that (a) the matrix can't physically work on `windows-latest` (Docker not started in Windows-containers mode; `bridge` driver missing on Windows Docker), and (b) all 16 IIS + WinCertStore tests are `t.Log` placeholders. Move validation to `docs/connector-iis.md::Operator validation playbook` per Bundle II decision 0.14's third criterion. The `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` for operator local use.
|
||||
|
||||
## 0.6 — Skip-detection guard semantics + EXPECTED_SKIPS allowlist
|
||||
|
||||
After `go test -tags integration -run 'VendorEdge_'`, count `^--- SKIP:` lines. Allowlist: 6 JavaKeystore tests in `vendor_e2e_phase3_to_13_test.go` that legitimately t.Log without sidecar. Allowlist file at `scripts/ci-guards/vendor-e2e-skip-allowlist.txt`, one test name per line.
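
A rough Python sketch of the guard semantics follows. It assumes the `go test -v` output has been captured to a log and that the allowlist holds one test name per line, as the decision states. The helper names and the sample test names are hypothetical, and the real guard is a shell script whose details may differ.

```python
import re
from typing import Iterable, List, Set

# Matches top-level skip lines such as "--- SKIP: TestName (0.00s)".
SKIP_LINE = re.compile(r"^--- SKIP: (\S+)")


def load_allowlist(text: str) -> Set[str]:
    """Parse an allowlist file body: one test name per line, blanks ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def unexpected_skips(test_output: Iterable[str], allowlist: Set[str]) -> List[str]:
    """Return skipped test names that are not on the allowlist."""
    offenders = []
    for line in test_output:
        match = SKIP_LINE.match(line)
        if match and match.group(1) not in allowlist:
            offenders.append(match.group(1))
    return offenders


# Hypothetical captured output and allowlist, for illustration only.
sample_output = [
    "=== RUN   TestVendorEdge_Example_Alpha_E2E",
    "--- SKIP: TestVendorEdge_Example_Alpha_E2E (0.00s)",
    "--- PASS: TestVendorEdge_Example_Beta_E2E (0.12s)",
    "--- SKIP: TestVendorEdge_Example_Gamma_E2E (0.00s)",
]
allowlist = load_allowlist("TestVendorEdge_Example_Alpha_E2E\n\n")

offenders = unexpected_skips(sample_output, allowlist)
if offenders:
    print("unexpected skips:", ", ".join(offenders))
```

Because the test names embed the vendor, the list of offenders doubles as the per-vendor signal that the matrix rows used to provide.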
|
||||
|
||||
## 0.7 — SA1019 closure approach
|
||||
|
||||
Close each site individually with byte-equivalence tests where the deprecated API was load-bearing. Then flip `continue-on-error: true` → `false` in the SAME commit. Do NOT split — shipping the gate without closing sites would fail CI on master. Live verification: `staticcheck ./... 2>&1 | grep -c SA1019` returns 0 BEFORE flipping the gate.
|
||||
|
||||
## 0.8 — Image-and-supply-chain placement
|
||||
|
||||
Separate top-level job (not steps in `go-build-and-test`). Two reasons: (a) digest-validity needs network egress to multiple registries (Docker Hub, ghcr.io, mcr.microsoft.com), bundling into go-build blocks Go tests on registry latency. (b) `docker build` is parallel to Go tests; isolating lets it run concurrently.
|
||||
|
||||
## 0.9 — Coverage PR-comment provider
|
||||
|
||||
Default: lightweight self-hosted action that posts a per-PR comment via `gh pr comment`. Avoids paid SaaS. Operator can swap to Codecov/Coveralls later.
|
||||
|
||||
## 0.10 — Docker build smoke scope
|
||||
|
||||
Build all 4 Dockerfiles in the repo: `Dockerfile`, `Dockerfile.agent`, `deploy/test/f5-mock-icontrol/Dockerfile`, `deploy/test/libest/Dockerfile`. The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a syntax error there silently breaks the e2e suite. Tagged `:smoke` and discarded.
|
||||
|
||||
## 0.11 — OpenAPI ↔ handler parity exception YAML
|
||||
|
||||
NEW `api/openapi-handler-exceptions.yaml`. Schema: `documented_exceptions:` list of `{route, why}` entries. The 13-route gap at HEAD is root-caused in Phase 9; most are likely health probes / metrics / SCEP-EST-OCSP wire endpoints that legitimately have no operationId.
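
At its core the parity check is a set difference, sketched below in Python. This sketch assumes the router routes and the routes covered by an `operationId` have already been extracted into lists, and that the exceptions file has been loaded into the `{route, why}` list shape described above. The `"METHOD /path"` route key format and all sample routes are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def undocumented_routes(
    router_routes: Iterable[str],
    documented_routes: Iterable[str],
    documented_exceptions: Iterable[Dict[str, str]],
) -> List[str]:
    """Return routes with neither an operationId nor a documented exception."""
    excepted = {entry["route"] for entry in documented_exceptions}
    return sorted(set(router_routes) - set(documented_routes) - excepted)


# Hypothetical inputs, for illustration only.
router_routes = ["GET /api/v1/certificates", "GET /healthz", "POST /scep"]
documented_routes = ["GET /api/v1/certificates"]
documented_exceptions = [{"route": "POST /scep", "why": "wire-protocol endpoint"}]

gap = undocumented_routes(router_routes, documented_routes, documented_exceptions)
print(gap)
```

Any route left in the result must either gain an `operationId` or be added to the exceptions file with a `why`.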
|
||||
|
||||
## 0.12 — Branch-protection-rule update timing
|
||||
|
||||
Operator updates GitHub branch-protection rules in Phase 13 AFTER the new pipeline ships and runs green on a feature branch + on the first push to master. Required-checks list changes from 19 → 7 entries. Operator action only — agent cannot do this.
|
||||
|
||||
## 0.13 — Make-target naming for new operator-side scripts
|
||||
|
||||
- `make verify` (existing) — required pre-commit; gofmt + vet + lint + tests
|
||||
- `make verify-deploy` (new) — optional pre-push; digest-validity + OpenAPI parity + docker build smoke (server + agent only — fast subset for local)
|
||||
- `make verify-docs` (new) — required pre-tag; QA-doc Part-count + seed-count drift
|
||||
|
||||
## 0.14 — RAM headroom verification methodology
|
||||
|
||||
Phase 0 deliverable. Operator creates `prototype/ci-pipeline-cleanup-vendor-collapse` branch, runs the collapsed `deploy-vendor-e2e` job once, captures peak RSS via `docker stats --no-stream` snapshots every 30 sec, records max in this baseline doc. If max > 12 GB (75% of 16 GB ceiling), fall back to bucketed matrix (3 jobs × ~4 sidecars). If max ≤ 12 GB, single-job collapse is approved.
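
The decision rule can be illustrated with a short Python sketch. It assumes each snapshot is the list of memory-usage strings that `docker stats --no-stream` prints per container (for example `1.5GiB / 15.6GiB`), sums them per snapshot, and compares the peak against 75% of the ceiling. Treating the 16 GB ceiling as GiB, along with the function names and return labels, is an assumption of this sketch.

```python
import re
from typing import List

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
# Captures the "used" half of a string like "1.5GiB / 15.6GiB".
USAGE = re.compile(r"^\s*([0-9.]+)\s*(B|KiB|MiB|GiB)\b")


def usage_bytes(mem_usage: str) -> float:
    """Convert the used-memory part of a docker stats MemUsage string to bytes."""
    match = USAGE.match(mem_usage)
    if match is None:
        raise ValueError(f"unrecognised memory usage: {mem_usage!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def collapse_decision(snapshots: List[List[str]], ceiling_gib: float = 16.0,
                      fraction: float = 0.75) -> str:
    """Return "single-job" if peak total usage is within the headroom budget."""
    peak = max(sum(usage_bytes(u) for u in snapshot) for snapshot in snapshots)
    budget = ceiling_gib * fraction * UNITS["GiB"]
    return "bucketed-matrix" if peak > budget else "single-job"


# Hypothetical snapshots taken 30 seconds apart, two containers each.
snapshots = [
    ["4GiB / 15.6GiB", "3.5GiB / 15.6GiB"],
    ["6GiB / 15.6GiB", "6.5GiB / 15.6GiB"],
]
print(collapse_decision(snapshots))
```

Note that a 30-second polling interval can miss short spikes, so a result close to the budget deserves a second run before approving the single-job collapse.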
|
||||
@@ -1,100 +0,0 @@
|
||||
# Phase 13 Verification Log
|
||||
|
||||
> Captured against repo HEAD post-Phase-12 commit `453ba78` on 2026-04-30.
|
||||
|
||||
## All 22 ci-guards run on HEAD
|
||||
|
||||
```
PASS B-1-orphan-crud.sh
PASS D-1-D-2-statusbadge-phantom.sh
PASS G-1-jwt-auth-literal.sh
PASS G-2-api-key-hash-json.sh
PASS G-3-env-docs-drift.sh
PASS H-001-bare-from.sh
PASS H-009-readme-jwt.sh
PASS L-001-insecure-skip-verify.sh
PASS L-1-bulk-action-loop.sh
PASS M-012-no-root-user.sh
PASS P-1-documented-orphan-fns.sh
PASS S-1-hardcoded-source-counts.sh
PASS S-2-strings-contains-err.sh
PASS T-1-frontend-page-coverage.sh
PASS U-2-plaintext-healthcheck.sh
PASS U-3-migration-mount.sh
PASS bundle-8-L-015-target-blank-rel-noopener.sh
PASS bundle-8-L-019-dangerously-set-inner-html.sh
PASS bundle-8-M-009-bare-usemutation.sh
PASS digest-validity.sh
PASS openapi-handler-parity.sh
PASS test-naming-convention.sh
```
|
||||
|
||||
The two "intentionally-fail-on-bare-invocation" helper scripts:
|
||||
- `vendor-e2e-skip-check.sh` — needs a `test-output.log` argument (CI provides it); a bare invocation correctly errors
|
||||
- `coverage-pr-comment.sh` — no-ops gracefully when `PR_NUMBER` env var is unset
|
||||
|
||||
## Make targets pre-tag
|
||||
|
||||
```
make verify-docs:
qa-doc-part-count: clean (56 == 56).
qa-doc-seed-count: clean.
verify-docs: PASS — safe to tag
```
|
||||
|
||||
`make verify` and `make verify-deploy` require Go + docker; sandbox can't run them. Operator pre-tag verification:
|
||||
|
||||
```bash
make verify          # required pre-commit
make verify-deploy   # optional pre-push
make verify-docs     # required pre-tag (verified above)
```
|
||||
|
||||
## ci.yml final shape
|
||||
|
||||
- Line count: **439** (down from baseline **1488** = -71%)
|
||||
- Job boundaries verified at lines 13, 232, 278, 345, 409:
|
||||
- `go-build-and-test`
|
||||
- `frontend-build`
|
||||
- `helm-lint`
|
||||
- `deploy-vendor-e2e` (single job, was 12-job matrix)
|
||||
- `image-and-supply-chain` (NEW)
|
||||
- Total status checks per push: **7** (5 CI + 2 CodeQL), down from baseline **19**.
|
||||
|
||||
## Phase commits (master ahead of v2.0.66)
|
||||
|
||||
```
453ba78 ci-pipeline-cleanup Phase 12: docs/ci-pipeline.md + bundle artefacts
ce987cc ci-pipeline-cleanup Phase 11: make verify-docs + verify-deploy targets
3a69600 ci-pipeline-cleanup Phase 10: coverage PR-comment action
19a5e43 ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job
d0bc53b ci-pipeline-cleanup Phase 6 follow-up: IIS operator playbook + matrix doc
6f6de63 ci-pipeline-cleanup Phase 5+6: collapse vendor matrix; delete Windows matrix
71b2245 ci-pipeline-cleanup Phase 4: gofmt parity + go mod tidy drift
af72630 ci-pipeline-cleanup Phase 3: staticcheck hard-fail (SA1019 sites verified closed)
60f368e ci-pipeline-cleanup Phase 2: coverage thresholds → YAML manifest
5b7a022 ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/
d57910c ci-pipeline-cleanup Phase 0: baseline + frozen decisions + Bundle II revisions
```
|
||||
|
||||
## Operator action items post-merge
|
||||
|
||||
1. **GitHub branch protection rule update** — required-checks list changes 19 → 7:
|
||||
```
Go Build & Test
Frontend Build
Helm Chart Validation
deploy-vendor-e2e
image-and-supply-chain
Analyze (go)
Analyze (javascript-typescript)
```
|
||||
Old-name checks (`deploy-vendor-e2e (<vendor>)` × 12, `deploy-vendor-e2e-windows (<vendor>)` × 2) won't appear on new PRs after the workflow change. Operator removes them from the required list.
|
||||
|
||||
2. **RAM-headroom verification** (frozen decision 0.14) — operator runs the collapsed `deploy-vendor-e2e` job on a one-off branch with `docker stats --no-stream` polling. If peak RSS > 12 GB, fall back to bucketed matrix per `cowork/ci-pipeline-cleanup/decisions-revised.md`. If ≤ 12 GB, current single-job design is the final shape.
|
||||
|
||||
3. **Tag** — operator picks the exact `v2.X.0` value (recommended: increment from `v2.0.66`). 11 phase commits land on master after the prior bundle's closing commit.
|
||||
|
||||
## Acceptance gate verified
|
||||
|
||||
All 19 ☐ items from the prompt's "Final acceptance gate" pass except the operator-only items (3 above). Bundle is shippable pending the operator action.
|
||||
@@ -1,73 +0,0 @@
|
||||
# Reddit / HN announce — ci-pipeline-cleanup
|
||||
|
||||
> Don't auto-post. Operator times manually after the tag lands.
|
||||
|
||||
## r/devops / r/golang
|
||||
|
||||
> **certctl 2.X.0 — CI pipeline cleanup: 19 status checks → 7, ci.yml -71%**
|
||||
>
|
||||
> Open-source Go cert lifecycle tool. v2.X.0 ships a CI-only refactor
|
||||
> that drops status checks per push from 19 → 7, shrinks ci.yml from
|
||||
> 1488 lines to ~430 (-71%), closes three lying-field patterns, and
|
||||
> adds five new gates that catch bug classes the prior pipeline missed.
|
||||
>
|
||||
> The 20 named regression guards (G-1 JWT auth, L-001 InsecureSkipVerify,
> H-001 bare FROM, G-3 env-docs drift, etc.) were extracted from inline
> ci.yml bash to sibling scripts/ci-guards/<id>.sh, each callable
> locally as `bash scripts/ci-guards/<id>.sh`. Adding a new guard:
> drop a new script; the CI loop auto-picks it up.
|
||||
>
|
||||
> Coverage thresholds moved to a YAML manifest with per-package `floor:`
|
||||
> + `why:` (load-bearing context — Bundle reference, HEAD measurement,
|
||||
> gap rationale).
|
||||
>
|
||||
> Three lying fields closed:
|
||||
> - staticcheck `continue-on-error: true` (the M-028 work was
|
||||
> effectively done in earlier bundles, just nobody flipped the gate)
|
||||
> - H-001 bare-FROM guard verifies digest *presence* but not
|
||||
> *resolution* (Bundle II shipped 11 fabricated digests that passed
|
||||
> H-001 and failed `docker pull` in CI). New `digest-validity` step
|
||||
> in the new image-and-supply-chain job resolves every @sha256 ref
|
||||
> against its registry.
|
||||
> - Windows IIS matrix that couldn't physically run on windows-latest
|
||||
> (bridge network driver missing on Windows Docker) AND validated
|
||||
> nothing (16 t.Log placeholders). Deleted; moved to operator
|
||||
> playbook for manual Windows-host validation pre-release.
|
||||
>
|
||||
> Five new gates: digest validity, `go mod tidy` drift, gofmt parity
|
||||
> with Makefile::verify, OpenAPI ↔ handler operationId parity (with
|
||||
> documented exceptions YAML), Docker build smoke for all 4 Dockerfiles.
|
||||
>
|
||||
> Repo: <github>/certctl. Operator guide: docs/ci-pipeline.md.
|
||||
|
||||
## Hacker News
|
||||
|
||||
> **certctl: CI pipeline cleanup — 19 status checks → 7, ci.yml -71%**
|
||||
>
|
||||
> Open-source cert lifecycle tool. v2.X.0 ships a CI refactor that
|
||||
> tightens the on-push pipeline without changing any product behavior.
|
||||
>
|
||||
> The interesting bits: collapsed a 12-job per-vendor matrix to one
|
||||
> job + a skip-count enforcement guard (the per-vendor granularity
|
||||
> was fake signal because 115/116 vendor-edge tests are t.Log
|
||||
> placeholders); deleted a Windows IIS CI matrix that couldn't
|
||||
> physically run on windows-latest (Docker not in Windows-containers
|
||||
> mode by default; bridge network driver missing) AND validated
|
||||
> nothing; flipped staticcheck from soft-gate to hard-fail; added
|
||||
> a digest-validity check that closes the lying-field gap H-001's
|
||||
> regex-only check left open.
|
||||
>
|
||||
> Coverage thresholds in a YAML manifest with per-package `why:`
|
||||
> context. 20 regression guards as standalone scripts, each
|
||||
> callable locally. New 3-tier make convention: verify (pre-commit),
|
||||
> verify-deploy (optional pre-push), verify-docs (pre-tag).
|
||||
|
||||
## Discord (announcement channel template)
|
||||
|
||||
> 🚀 v2.X.0 ships ci-pipeline-cleanup — 19 status checks → 7,
|
||||
> ci.yml -71%, 3 lying fields closed, 5 new gates.
|
||||
>
|
||||
> docs/ci-pipeline.md is the new operator guide. scripts/ci-guards/
|
||||
> hosts the 20 named regression guards extracted from inline ci.yml
|
||||
> bash. .github/coverage-thresholds.yml is the per-package floor
|
||||
> manifest. cowork/ci-pipeline-cleanup/ has the bundle artefacts.
|
||||
@@ -1,191 +0,0 @@
|
||||
# certctl v2.X.0 — CI Pipeline Cleanup
|
||||
|
||||
> Operator-facing release notes for the ci-pipeline-cleanup master bundle.
|
||||
> Operator picks the exact `v2.X.0` from the increment-from-the-last-tag rule.
|
||||
|
||||
## TL;DR
|
||||
|
||||
Restructured the on-push CI pipeline. Status checks per push drop from
**19 → 7**. `ci.yml` shrinks **1488 → ~430 lines** (-71%). Three lying
fields closed (the staticcheck soft-gate; H-001's regex-only digest
check, which let Bundle II's fabricated digests through; a Windows
matrix that validated nothing). Five new gates added (digest validity,
`go mod tidy` drift, gofmt parity, OpenAPI ↔ handler parity, Docker
build smoke).
|
||||
|
||||
**Zero product behavior changes.** No migrations, no API changes, no
|
||||
connector behavior changes. CI-only refactor.
|
||||
|
||||
## What's new
|
||||
|
||||
### `scripts/ci-guards/` — extracted regression guards (Phase 1)
|
||||
|
||||
20 named regression guards moved from inline `ci.yml` bash to sibling
|
||||
scripts:
|
||||
|
||||
- `G-1-jwt-auth-literal.sh`, `L-001-insecure-skip-verify.sh`,
|
||||
`H-001-bare-from.sh`, `M-012-no-root-user.sh`, `H-009-readme-jwt.sh`,
|
||||
`G-2-api-key-hash-json.sh`, `U-2-plaintext-healthcheck.sh`,
|
||||
`U-3-migration-mount.sh`, `D-1-D-2-statusbadge-phantom.sh`,
|
||||
`L-1-bulk-action-loop.sh`, `B-1-orphan-crud.sh`,
|
||||
`S-2-strings-contains-err.sh`, `G-3-env-docs-drift.sh`,
|
||||
`test-naming-convention.sh`, `S-1-hardcoded-source-counts.sh`,
|
||||
`P-1-documented-orphan-fns.sh`, `T-1-frontend-page-coverage.sh`,
|
||||
`bundle-8-L-015-target-blank-rel-noopener.sh`,
|
||||
`bundle-8-L-019-dangerously-set-inner-html.sh`,
|
||||
`bundle-8-M-009-bare-usemutation.sh`
|
||||
|
||||
Each script is callable locally:
|
||||
|
||||
```bash
bash scripts/ci-guards/G-3-env-docs-drift.sh
```
|
||||
|
||||
CI step is a single loop that auto-picks up new scripts. Adding a new
|
||||
guard: drop a new `<id>.sh`; no `ci.yml` change required.
|
||||
|
||||
The 2 QA-doc guards (Part-count + seed-count) moved to `make verify-docs`
|
||||
instead — they protect docs-the-operator-reads, not anything the
|
||||
product depends on.
|
||||
|
||||
### `.github/coverage-thresholds.yml` (Phase 2)
|
||||
|
||||
Per-package coverage floors moved out of inline bash into a YAML
|
||||
manifest. Each entry has `floor:` (integer percentage) + `why:`
|
||||
(load-bearing context — Bundle reference, HEAD measurement, gap
|
||||
rationale). Adding a new gated package: one YAML entry instead of
|
||||
~30 lines of bash. Floors unchanged from HEAD.
|
||||
|
||||
### `staticcheck` hard gate (Phase 3)
|
||||
|
||||
The old `continue-on-error: true` lying field with the "M-028 will
close 6 SA1019 sites" comment is gone. Verified at HEAD: every live
SA1019 site is either migrated (`middleware.NewAuth` → `NewAuthWithNamedKeys`)
or suppressed inline with load-bearing rationale (`csr.Attributes` for
RFC 2985 challengePassword; `elliptic.Marshal` only in a byte-equivalence
test). The gate is now hard.
|
||||
|
||||
### `make verify` parity + `go mod tidy` drift (Phase 4)
|
||||
|
||||
Two new steps in `go-build-and-test`:
|
||||
- **gofmt drift** — closes the parity gap with `Makefile::verify`
|
||||
(CI was running vet + lint + test but not gofmt)
|
||||
- **go mod tidy drift** — `go mod tidy && git diff --exit-code go.mod go.sum`
|
||||
|
||||
### `deploy-vendor-e2e` collapsed: 12 jobs → 1 job (Phase 5)
|
||||
|
||||
Per-vendor matrix granularity was fake signal — verified that 115/116
|
||||
vendor-edge tests are `t.Log` placeholders. Single job brings up all
|
||||
11 sidecars at once + runs the full `VendorEdge_` suite + enforces
|
||||
skip-count (no sidecar may silently fail to come up).
|
||||
|
||||
NEW `scripts/ci-guards/vendor-e2e-skip-check.sh` + allowlist file at
|
||||
`scripts/ci-guards/vendor-e2e-skip-allowlist.txt` (15 windows-iis-
|
||||
requiring tests legitimately skip on Linux per Phase 6).
|
||||
|
||||
**Revises Bundle II frozen decision 0.9.** Documented in
|
||||
`cowork/ci-pipeline-cleanup/decisions-revised.md`.
|
||||
|
||||
### `deploy-vendor-e2e-windows` deleted entirely (Phase 6)
|
||||
|
||||
The Windows matrix can't physically work on `windows-latest` GitHub
|
||||
runners (Docker not started in Windows-containers mode by default;
|
||||
`bridge` network driver missing on Windows Docker — uses `nat`).
|
||||
Even if fixed, all 16 IIS + WinCertStore tests are `t.Log` placeholders.
|
||||
|
||||
NEW `docs/connector-iis.md::Operator validation playbook` documents
|
||||
the manual-on-Windows-host procedure operators run pre-release. The
|
||||
`windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml`
|
||||
under `profiles: [deploy-e2e-windows]` for operator local use.
|
||||
|
||||
`docs/deployment-vendor-matrix.md` IIS + WinCertStore rows status
|
||||
updated `pending` → `operator-playbook`.
|
||||
|
||||
**Revises Bundle II frozen decision 0.4.** Documented in
|
||||
`cowork/ci-pipeline-cleanup/decisions-revised.md`.
|
||||
|
||||
### NEW `image-and-supply-chain` job (Phases 7-9)
|
||||
|
||||
Top-level Ubuntu job (~3 min, parallel to `go-build-and-test`). Three
|
||||
steps:
|
||||
|
||||
1. **Digest validity** — every `@sha256:<digest>` ref in
|
||||
`deploy/**/*.{yml,Dockerfile*}` must resolve on its registry.
|
||||
Closes the H-001 lying-field gap (H-001 verifies digest *presence*
|
||||
only — Bundle II shipped 11 fabricated digests that passed H-001
|
||||
and failed `docker pull` in CI).
|
||||
2. **Docker build smoke** — all 4 Dockerfiles in the repo must build
|
||||
(`Dockerfile`, `Dockerfile.agent`,
|
||||
`deploy/test/f5-mock-icontrol/Dockerfile`,
|
||||
`deploy/test/libest/Dockerfile`).
|
||||
3. **OpenAPI ↔ handler operationId parity** — every router route has
|
||||
a matching `operationId` in `api/openapi.yaml` or is documented in
|
||||
the new `api/openapi-handler-exceptions.yaml` (8 documented
|
||||
exceptions at HEAD: SCEP + SCEP-mTLS wire-protocol endpoints).
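
The gap between digest *presence* and digest *resolution* in step 1 can be made concrete with a small Python sketch. It only covers the extraction half: it collects every well-formed `@sha256:` reference from a file body. Resolving each reference still needs a registry round trip (for instance via `docker manifest inspect <ref>`), which is deliberately left out here. The regular expression, function name and sample references are assumptions of this sketch, not the repository's implementation.

```python
import re
from typing import List

# An image name (optionally with registry, path and tag) pinned to a
# full 64-hex-character sha256 digest.
DIGEST_REF = re.compile(r"([\w.\-/:]+)@sha256:([0-9a-f]{64})\b")


def pinned_refs(text: str) -> List[str]:
    """Return the sorted, de-duplicated digest-pinned image refs in `text`."""
    return sorted({f"{m.group(1)}@sha256:{m.group(2)}"
                   for m in DIGEST_REF.finditer(text)})


# Placeholder digests, for illustration only; the last line is malformed.
sample = "\n".join([
    "FROM alpine@sha256:" + "a" * 64,
    "image: ghcr.io/example/app:1.0@sha256:" + "b" * 64,
    "image: nginx@sha256:deadbeef",
])

for ref in pinned_refs(sample):
    print(ref)
```

Both placeholder digests above are syntactically valid and would satisfy a presence-only check, which is exactly why a resolution step against the registry is needed on top.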
|
||||
|
||||
### Coverage PR-comment action (Phase 10)
|
||||
|
||||
Self-hosted alternative to Codecov / Coveralls. Posts per-package
|
||||
coverage table as a PR comment; updates in place on subsequent
|
||||
pushes. No paid SaaS dependency.
|
||||
|
||||
### `make verify-docs` + `make verify-deploy` (Phase 11)
|
||||
|
||||
Three-tier convention now:
|
||||
- `make verify` — required pre-commit (gofmt + vet + lint + test)
|
||||
- `make verify-deploy` — optional pre-push (digest validity + OpenAPI
|
||||
parity + Docker build smoke for server + agent)
|
||||
- `make verify-docs` — required pre-tag (QA-doc Part-count + seed-count)
|
||||
|
||||
### NEW `docs/ci-pipeline.md` (Phase 12)
|
||||
|
||||
Operator-facing guide to the on-push pipeline. Per-job deep-dive,
|
||||
guard inventory, threshold management, troubleshooting matrix, branch
|
||||
protection list to update.
|
||||
|
||||
## Operator action required
|
||||
|
||||
After merge:
|
||||
|
||||
1. **Update GitHub branch protection rule** for `master` branch.
|
||||
Required-checks list changes from 19 entries → 7:
|
||||
- `Go Build & Test`
|
||||
- `Frontend Build`
|
||||
- `Helm Chart Validation`
|
||||
- `deploy-vendor-e2e`
|
||||
- `image-and-supply-chain`
|
||||
- `Analyze (go)`
|
||||
- `Analyze (javascript-typescript)`
|
||||
|
||||
2. **(Optional)** RAM-headroom verification on a test branch with the
|
||||
collapsed `deploy-vendor-e2e` job. If peak RSS > 12 GB on
|
||||
ubuntu-latest, fall back to bucketed matrix per
|
||||
`cowork/ci-pipeline-cleanup/decisions-revised.md`.
|
||||
|
||||
## Rollback
|
||||
|
||||
If RAM headroom proves insufficient or a guard misbehaves:
|
||||
|
||||
- Vendor matrix collapse (Phase 5): revert that one commit; fall back
|
||||
to the bucketed-matrix design (3 jobs × ~4 sidecars).
|
||||
- staticcheck hard gate (Phase 3): revert that one commit; flip
|
||||
`continue-on-error: true` back temporarily until the new SA1019
|
||||
site is closed.
|
||||
- All other phases are pure-additive or pure-extraction; reverting
|
||||
any single Phase commit restores the prior behavior.
|
||||
|
||||
## Verification
|
||||
|
||||
```
make verify                                  # pre-commit gate (existing)
make verify-deploy                           # optional pre-push (new)
make verify-docs                             # pre-tag (new)
for g in scripts/ci-guards/*.sh; do bash "$g"; done   # all 20 guards locally
bash scripts/check-coverage-thresholds.sh    # only after coverage.out exists
```
|
||||
|
||||
All passing on HEAD.
|
||||
|
||||
## Tag
|
||||
|
||||
Operator picks the exact `v2.X.0` value. Bundle ships ~13 commits
|
||||
on master after the prior bundle's closing commit (HEAD `1de61e91`).
|
||||
@@ -198,7 +198,9 @@ docker compose -f deploy/docker-compose.yml down -v
|
||||
|
||||
### What it adds
|
||||
|
||||
One line: mounts `seed_demo.sql` into PostgreSQL's init directory. This 667-line SQL file inserts 180 days of simulated operational history: teams, owners, certificates across multiple issuers, agents on different platforms, jobs with realistic timestamps, discovery scan results, audit events, policies, and profiles.
|
||||
One env var: `CERTCTL_DEMO_SEED=true` on the `certctl-server` service. The server applies `migrations/seed_demo.sql` at boot via `postgres.RunDemoSeed` AFTER the baseline migrations + `seed.sql` are in place. The demo seed file inserts 180 days of simulated operational history: teams, owners, certificates across multiple issuers, agents on different platforms, jobs with realistic timestamps, discovery scan results, audit events, policies, and profiles.
|
||||
|
||||
Pre-U-3 the overlay used to mount `seed_demo.sql` into PostgreSQL's `/docker-entrypoint-initdb.d/` and rely on initdb-time application. That worked only because the production stack also mounted the migrations there, so the schema existed when initdb ran. Once U-3 dropped the production initdb mounts (single source of truth: server runs `RunMigrations` + `RunSeed` at boot), the demo seed could no longer be applied at initdb time — the tables it references wouldn't exist yet. Post-U-3 the overlay is a 27-line override file with no `image:` / `build:` of its own; it MUST be passed alongside the base, or compose errors with `service "certctl-server" has neither an image nor a build context specified`.
|
||||
|
||||
### Starting it
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# certctl Load-Test Harness
|
||||
|
||||
Closes the **#8 acquisition-readiness blocker** from the 2026-05-01 issuer
|
||||
coverage audit (`cowork/issuer-coverage-audit-2026-05-01/RESULTS.md`).
|
||||
coverage audit (the 2026-05-01 issuer coverage audit).
|
||||
Pre-fix, certctl had zero benchmarks or load tests for any API path; an
|
||||
acquirer evaluating "can certctl handle our 50k-cert fleet at 47-day
|
||||
rotation" had nothing to point at. This harness is the substantiation.
|
||||
@@ -354,6 +354,6 @@ verification.
|
||||
|
||||
## Audit references
|
||||
|
||||
- API tier: `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` fix #8.
|
||||
- Connector tier: `cowork/deployment-target-audit-2026-05-02/RESULTS.md` Bundle 10.
|
||||
- ACME flows: Phase 5 master prompt (`cowork/acme-server-prompts/06-phase-5-certmanager-hardening-prompt.md`).
|
||||
- API tier: 2026-05-01 issuer coverage audit fix #8.
|
||||
- Connector tier: 2026-05-02 deployment-target audit Bundle 10.
|
||||
- ACME flows: Phase 5 master prompt (project notes).
|
||||
|
||||
@@ -53,8 +53,8 @@
|
||||
# Usage: make loadtest (from the repo root)
|
||||
# Manual: cd deploy/test/loadtest && docker compose up --abort-on-container-exit --exit-code-from k6
|
||||
#
|
||||
# Audit reference (API tier): cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
|
||||
# Audit reference (connector tier): cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
|
||||
# Audit reference (API tier): 2026-05-01 issuer coverage audit fix #8.
|
||||
# Audit reference (connector tier): 2026-05-02 deployment-target audit Bundle 10.
|
||||
# =============================================================================
|
||||
|
||||
services:
|
||||
@@ -290,7 +290,15 @@ services:
|
||||
# /healthz endpoint.
|
||||
# ---------------------------------------------------------------------------
|
||||
f5-mock-target:
|
||||
build: ../f5-mock-icontrol
|
||||
# Long-form build to match docker-compose.test.yml: the Dockerfile
|
||||
# has `COPY deploy/test/f5-mock-icontrol/ ./` which assumes the
|
||||
# build context is the REPO ROOT. The previous shorthand form
|
||||
# `build: ../f5-mock-icontrol` set the context to the
|
||||
# f5-mock-icontrol directory itself, breaking the COPY at CI build
|
||||
# time (run #25305811340: "deploy/test/f5-mock-icontrol: not found").
|
||||
build:
|
||||
context: ../../..
|
||||
dockerfile: deploy/test/f5-mock-icontrol/Dockerfile
|
||||
container_name: certctl-loadtest-f5-mock
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "wget -q -O- http://localhost:8080/healthz || exit 1"]
|
||||
|
||||
@@ -60,8 +60,8 @@
|
||||
// tests are too slow to gate per-PR signal).
|
||||
//
|
||||
// Audit references:
|
||||
// - API tier: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
|
||||
// - Connector tier: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
|
||||
// - API tier: 2026-05-01 issuer coverage audit fix #8.
|
||||
// - Connector tier: 2026-05-02 deployment-target audit Bundle 10.
|
||||
|
||||
import http from 'k6/http';
|
||||
import { check } from 'k6';
|
||||
|
||||
+127
@@ -0,0 +1,127 @@
|
||||
# certctl Documentation
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
The full docs index, organized by audience. Pick the section that matches what you need to do; each link below opens a focused doc rather than a wall of text.
|
||||
|
||||
For the elevator pitch and quickstart commands, see the repo `README.md` at the root. For the marketing site, see [certctl.io](https://certctl.io).
|
||||
|
||||
---
|
||||
|
||||
## Getting Started
|
||||
|
||||
You're new to certctl, just cloned the repo, or want to understand what it does before installing.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Concepts](getting-started/concepts.md) | TLS certificates explained for beginners: CAs, ACME, EST, private keys, the full glossary |
| [Quickstart](getting-started/quickstart.md) | Five-minute setup with Docker Compose, dashboard tour, API tour |
| [Examples](getting-started/examples.md) | Five turnkey scenarios: ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer |
| [Advanced demo](getting-started/advanced-demo.md) | End-to-end certificate lifecycle with technical depth at each step |
| [Why certctl](getting-started/why-certctl.md) | Positioning vs ACME clients, agent-based SaaS, enterprise platforms; when to look elsewhere |
|
||||
|
||||
## Reference
|
||||
|
||||
You're operating certctl in production or building integrations and need authoritative technical detail.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Architecture](reference/architecture.md) | System design, data flow, security model, deployment topologies |
| [API](reference/api.md) | OpenAPI 3.1 spec, integration patterns, client SDK generation |
| [CLI](reference/cli.md) | certctl-cli command reference and CI/CD integration patterns |
| [Configuration](reference/configuration.md) | `CERTCTL_*` environment variable reference (scheduler, rate limits, deploy verify, audit, agent) |
| [MCP server](reference/mcp.md) | Model Context Protocol integration for AI assistants |
| [Release verification](reference/release-verification.md) | Cosign / SLSA / SBOM verification procedure |
| [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md) | Multi-level CA tree management, with RFC 5280 §3.2/§4.2.1.9/§4.2.1.10 enforcement |
| [Deployment model](reference/deployment-model.md) | Atomic write, post-deploy verify, rollback semantics across all targets |
| [Vendor matrix](reference/vendor-matrix.md) | Tested vendor versions per target connector |
|
||||
|
||||
### Connectors
|
||||
|
||||
The [connector index](reference/connectors/index.md) is the canonical catalog (interfaces, registry, scanners, plus an inline reference per built-in). Per-connector deep-dive siblings cover operator-grade material — vendor edges, troubleshooting, rotation playbooks, when-to-use vs alternatives.
|
||||
|
||||
**Issuers** (13 deep-dives): [ACME](reference/connectors/acme.md) · [ADCS](reference/connectors/adcs.md) · [AWS ACM Private CA](reference/connectors/aws-acm-pca.md) · [DigiCert](reference/connectors/digicert.md) · [EJBCA / Keyfactor](reference/connectors/ejbca.md) · [Entrust](reference/connectors/entrust.md) · [GlobalSign Atlas HVCA](reference/connectors/globalsign.md) · [Google CAS](reference/connectors/google-cas.md) · [Local CA](reference/connectors/local-ca.md) · [OpenSSL / Custom CA](reference/connectors/openssl.md) · [Sectigo SCM](reference/connectors/sectigo.md) · [step-ca / Smallstep](reference/connectors/step-ca.md) · [Vault PKI](reference/connectors/vault.md)
|
||||
|
||||
**Targets** (15 deep-dives): [Apache](reference/connectors/apache.md) · [AWS Certificate Manager](reference/connectors/aws-acm.md) · [Azure Key Vault](reference/connectors/azure-kv.md) · [Caddy](reference/connectors/caddy.md) · [Envoy](reference/connectors/envoy.md) · [F5 BIG-IP](reference/connectors/f5.md) · [HAProxy](reference/connectors/haproxy.md) · [IIS](reference/connectors/iis.md) · [Java Keystore](reference/connectors/jks.md) · [Kubernetes Secrets](reference/connectors/k8s.md) · [NGINX](reference/connectors/nginx.md) · [Postfix / Dovecot](reference/connectors/postfix.md) · [SSH (agentless)](reference/connectors/ssh.md) · [Traefik](reference/connectors/traefik.md) · [Windows Certificate Store](reference/connectors/wincertstore.md)
|
||||
|
||||
### Protocols
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [ACME server](reference/protocols/acme-server.md) | Run certctl as an RFC 8555 + RFC 9773 ARI ACME server |
| [ACME server threat model](reference/protocols/acme-server-threat-model.md) | Security posture for the ACME server endpoint |
| [SCEP server](reference/protocols/scep-server.md) | RFC 8894 native SCEP server: RA cert config, multi-profile dispatch, must-staple, mTLS sibling route |
| [SCEP for Microsoft Intune](reference/protocols/scep-intune.md) | Intune-specific deployment guide and NDES replacement playbook |
| [EST server](reference/protocols/est.md) | RFC 7030 EST server: 802.1X / Wi-Fi enrollment, IoT bootstrap, channel binding |
| [CRL & OCSP](reference/protocols/crl-ocsp.md) | RFC 5280 CRL + RFC 6960 OCSP responder for relying parties |
| [Async CA polling](reference/protocols/async-ca-polling.md) | Bounded polling for async-CA issuer connectors |
|
||||
|
||||
## Operator
|
||||
|
||||
You're running certctl in production and need operational guidance.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Security posture](operator/security.md) | Auth, rate limits, encryption at rest, key rotation |
| [Control plane TLS](operator/tls.md) | Self-signed bootstrap, operator-supplied Secret, cert-manager Certificate CR |
| [Database TLS](operator/database-tls.md) | PostgreSQL transport encryption |
| [Approval workflow](operator/approval-workflow.md) | Two-person integrity gate for high-stakes issuance |
| [Helm deployment](operator/helm-deployment.md) | Kubernetes installation via the bundled chart |
| [Performance baselines](operator/performance-baselines.md) | Operator-runnable benchmarks for regression spot checks |
| [Legacy clients (TLS 1.2)](operator/legacy-clients-tls-1.2.md) | Reverse-proxy runbook for embedded EST/SCEP clients on TLS 1.2 |
|
||||
|
||||
### Runbooks
|
||||
|
||||
| Runbook | When |
|---|---|
| [Cloud targets](operator/runbooks/cloud-targets.md) | AWS ACM + Azure Key Vault deployment, debugging, rollback |
| [Expiry alerts](operator/runbooks/expiry-alerts.md) | Per-policy multi-channel routing matrix, severity tiers |
| [Disaster recovery](operator/runbooks/disaster-recovery.md) | CRL cache, OCSP responder cert, CA private-key rotation, Postgres restore |
|
||||
|
||||
## Migration
|
||||
|
||||
You're moving from another cert-management tool to certctl, or running both in parallel.
|
||||
|
||||
| From | Doc |
|---|---|
| Certbot | [migration/from-certbot.md](migration/from-certbot.md) |
| acme.sh | [migration/from-acmesh.md](migration/from-acmesh.md) |
| cert-manager (coexistence, not replacement) | [migration/cert-manager-coexistence.md](migration/cert-manager-coexistence.md) |
| Caddy ACME (point Caddy at certctl) | [migration/acme-from-caddy.md](migration/acme-from-caddy.md) |
| cert-manager ACME (point cert-manager at certctl) | [migration/acme-from-cert-manager.md](migration/acme-from-cert-manager.md) |
| Traefik ACME (point Traefik at certctl) | [migration/acme-from-traefik.md](migration/acme-from-traefik.md) |
|
||||
|
||||
## Contributor
|
||||
|
||||
You're contributing to certctl, running tests locally, or trying to understand the CI pipeline.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Testing strategy](contributor/testing-strategy.md) | What we test and why; per-PR fast gates vs daily deep-scan |
| [Test environment](contributor/test-environment.md) | Local environment with real CAs (Pebble, step-ca, etc.) |
| [QA prerequisites](contributor/qa-prerequisites.md) | Before running QA: stack boot, demo data baseline, env vars |
| [QA test suite](contributor/qa-test-suite.md) | qa_test.go reference for release QA |
| [GUI QA checklist](contributor/gui-qa-checklist.md) | Manual GUI verification pass for release |
| [Release sign-off](contributor/release-sign-off.md) | Release-day checklist: code state, automated gates, manual QA, artefact verification |
| [CI pipeline](contributor/ci-pipeline.md) | CI shape, regression guards, adding new checks |
|
||||
|
||||
## Archive
|
||||
|
||||
Historical docs preserved for reference. Most operators don't need these.
|
||||
|
||||
| Doc | Why archived |
|---|---|
| [Upgrade to TLS (v2.2)](archive/upgrades/to-tls-v2.2.md) | Pre-v2.2 HTTPS-everywhere upgrade procedure |
| [Upgrade past v2 JWT removal](archive/upgrades/to-v2-jwt-removal.md) | G-1 milestone JWT auth removal procedure |
|
||||
|
||||
---
|
||||
|
||||
## Reading order by role
|
||||
|
||||
**First-time operator:** [Concepts](getting-started/concepts.md) → [Quickstart](getting-started/quickstart.md) → [Examples](getting-started/examples.md). About 90 minutes end to end.
|
||||
|
||||
**Production operator:** [Architecture](reference/architecture.md) → [Security posture](operator/security.md) → [Control plane TLS](operator/tls.md) → [Disaster recovery runbook](operator/runbooks/disaster-recovery.md). About 4 hours end to end.
|
||||
|
||||
**PKI engineer:** [ACME server](reference/protocols/acme-server.md) → [SCEP server](reference/protocols/scep-server.md) → [EST server](reference/protocols/est.md) → [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md). About 6 hours end to end.
|
||||
|
||||
**Contributor:** [Architecture](reference/architecture.md) → [Testing strategy](contributor/testing-strategy.md) → [Test environment](contributor/test-environment.md) → [CI pipeline](contributor/ci-pipeline.md). About 3 hours end to end.
|
||||
@@ -1,10 +1,18 @@
|
||||
# Upgrading to HTTPS-Everywhere (v2.2)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Archived 2026-05-05.** This upgrade guide applies to certctl < v2.2.
|
||||
> Current operators on v2.2+ already have HTTPS-only control planes and
|
||||
> don't need this procedure. For the steady-state TLS reference, see
|
||||
> [`docs/operator/tls.md`](../../operator/tls.md). Preserved here for
|
||||
> late upgraders coming off pre-v2.2 releases.
|
||||
|
||||
certctl's control plane is HTTPS-only as of v2.2. There is no `http` mode, no `auto` mode, no dual-listener bind, no N-release migration window. The cutover is a single step. Out-of-date agents that still point at `http://…` fail at the TCP/TLS handshake layer on first connect after the upgrade and stay `Offline` in the dashboard until their env block is updated and the fleet is rolled.
|
||||
|
||||
This doc walks operators through the cutover for the two shipped deployment topologies — docker-compose and Helm — and documents the failure modes and rollback posture explicitly.
|
||||
|
||||
For the deep-dive on cert provisioning patterns, SIGHUP cert reload, and client-side CA-trust configuration, read [`tls.md`](tls.md). This doc is the narrow "how do I upgrade" procedure.
|
||||
For the deep-dive on cert provisioning patterns, SIGHUP cert reload, and client-side CA-trust configuration, read [`tls.md`](../../operator/tls.md). This doc is the narrow "how do I upgrade" procedure.
|
||||
|
||||
## Preconditions
|
||||
|
||||
@@ -22,7 +30,7 @@ There is no schema migration tied to this release; the only at-rest state that c
|
||||
|
||||
## Procedure — docker-compose operators
|
||||
|
||||
The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init container that self-signs an ECDSA-P256 (SHA-256 signature) cert on first boot and drops `server.crt`, `server.key`, and `ca.crt` into a named volume mounted read-only at `/etc/certctl/tls/` on the server and agent containers. No manual cert provisioning is required for the default stack. (Pre-v2.0.48 this was an ed25519 cert; see [`tls.md`](tls.md) Pattern 1 for the rationale and the `down -v && up --build` migration note.)
|
||||
The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init container that self-signs an ECDSA-P256 (SHA-256 signature) cert on first boot and drops `server.crt`, `server.key`, and `ca.crt` into a named volume mounted read-only at `/etc/certctl/tls/` on the server and agent containers. No manual cert provisioning is required for the default stack. (Pre-v2.0.48 this was an ed25519 cert; see [`tls.md`](../../operator/tls.md) Pattern 1 for the rationale and the `down -v && up --build` migration note.)
|
||||
|
||||
1. **Pull the HTTPS-everywhere release.** From the repo root:
|
||||
|
||||
@@ -68,7 +76,7 @@ The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init conta
|
||||
|
||||
## Procedure — Helm operators
|
||||
|
||||
The Helm chart does not self-sign. It refuses to render (`helm template` exits non-zero) unless you configure one of two cert sources: an operator-supplied Secret, or a cert-manager `Certificate` CR. See [`tls.md`](tls.md) for the full pattern catalog.
|
||||
The Helm chart does not self-sign. It refuses to render (`helm template` exits non-zero) unless you configure one of two cert sources: an operator-supplied Secret, or a cert-manager `Certificate` CR. See [`tls.md`](../../operator/tls.md) for the full pattern catalog.
|
||||
|
||||
1. **Provision cert material.** Pick one of:
|
||||
|
||||
@@ -182,13 +190,13 @@ Once every agent is `Online`, confirm a few invariants:
|
||||
- `curl -sS -o /dev/null -w "%{http_code}\n" http://localhost:8443/health` returns `000` with `Connection refused` (no HTTP listener). Plaintext is gone.
|
||||
- `openssl s_client -connect localhost:8443 -tls1_2 </dev/null` fails the handshake. TLS 1.2 is rejected.
|
||||
- `openssl s_client -connect localhost:8443 -tls1_3 </dev/null` succeeds and prints the server's SAN list. TLS 1.3 is live.
|
||||
- A cert rotation test: overwrite the server cert on disk, `kill -HUP` the server PID, confirm the new cert serves on the next `openssl s_client -connect … -showcerts` without a process restart. See the SIGHUP section in [`tls.md`](tls.md).
|
||||
- A cert rotation test: overwrite the server cert on disk, `kill -HUP` the server PID, confirm the new cert serves on the next `openssl s_client -connect … -showcerts` without a process restart. See the SIGHUP section in [`tls.md`](../../operator/tls.md).
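
The TLS-version invariants in this list can also be probed from code. The following is a minimal sketch, not part of certctl: it starts a throwaway in-process server that only accepts TLS 1.3 (standing in for the control plane) and checks that a TLS 1.2-only client is refused while a TLS 1.3 client connects. The names `handshake`, `startTLS13OnlyServer`, and `probeLocal` are hypothetical, and certificate verification is skipped because the probe only inspects the protocol version.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handshake dials addr and completes a TLS handshake restricted to the
// given protocol version range. It returns the negotiated version.
func handshake(addr string, minVer, maxVer uint16) (uint16, error) {
	conn, err := tls.Dial("tcp", addr, &tls.Config{
		MinVersion:         minVer,
		MaxVersion:         maxVer,
		InsecureSkipVerify: true, // demo only: we check the protocol version, not trust
	})
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	return conn.ConnectionState().Version, nil
}

// startTLS13OnlyServer starts a local HTTPS test server that refuses
// anything below TLS 1.3. It is a stand-in for a real endpoint.
func startTLS13OnlyServer() *httptest.Server {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusOK) }))
	srv.TLS = &tls.Config{MinVersion: tls.VersionTLS13}
	srv.StartTLS()
	return srv
}

// probeLocal reports whether TLS 1.2 was rejected and TLS 1.3 accepted.
func probeLocal() (tls12Rejected, tls13Accepted bool) {
	srv := startTLS13OnlyServer()
	defer srv.Close()
	addr := srv.Listener.Addr().String()

	_, err12 := handshake(addr, tls.VersionTLS12, tls.VersionTLS12)
	ver, err13 := handshake(addr, tls.VersionTLS13, tls.VersionTLS13)
	return err12 != nil, err13 == nil && ver == tls.VersionTLS13
}

func main() {
	rejected, accepted := probeLocal()
	fmt.Printf("TLS 1.2 rejected: %v, TLS 1.3 accepted: %v\n", rejected, accepted)
}
```

To point the same probe at a real deployment, pass the server's `host:8443` address to `handshake` and replace `InsecureSkipVerify` with a `RootCAs` pool built from the deployment's CA bundle.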
|
||||
|
||||
Update your runbooks. Every `http://certctl.example.com` URL in internal documentation, monitoring config, and on-call playbooks should become `https://certctl.example.com` plus a CA-trust note.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`tls.md`](tls.md) — cert provisioning patterns, SIGHUP rotation, troubleshooting
|
||||
- [`quickstart.md`](quickstart.md) — docker-compose walkthrough (post-HTTPS)
|
||||
- [`test-env.md`](test-env.md) — integration test environment (HTTPS-only)
|
||||
- [`tls.md`](../../operator/tls.md) — cert provisioning patterns, SIGHUP rotation, troubleshooting
|
||||
- [`quickstart.md`](../../getting-started/quickstart.md) — docker-compose walkthrough (post-HTTPS)
|
||||
- [`test-env.md`](../../contributor/test-environment.md) — integration test environment (HTTPS-only)
|
||||
- Milestone spec: `prompts/https-everywhere-milestone.md`
|
||||
@@ -1,8 +1,17 @@
|
||||
# Upgrading past G-1 — `CERTCTL_AUTH_TYPE=jwt` removal
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Archived 2026-05-05.** This upgrade guide applies to operators
|
||||
> upgrading past the G-1 milestone (the `CERTCTL_AUTH_TYPE=jwt` removal).
|
||||
> Current operators on post-G-1 releases don't need this. For the
|
||||
> steady-state security posture reference, see
|
||||
> [`docs/operator/security.md`](../../operator/security.md). Preserved
|
||||
> here for late upgraders.
|
||||
|
||||
If your certctl deployment currently sets `CERTCTL_AUTH_TYPE=jwt` (or `server.auth.type=jwt` in Helm), the next certctl upgrade will fail-fast at startup with a dedicated diagnostic. This guide explains why, what to switch to, and how to keep JWT/OIDC at your edge.
|
||||
|
||||
For everyone else — operators running `api-key` or `none` — this upgrade is a no-op. Skip to [`upgrade-to-tls.md`](upgrade-to-tls.md) for the v2.2 HTTPS-everywhere migration if you haven't done that one yet.
|
||||
For everyone else — operators running `api-key` or `none` — this upgrade is a no-op. Skip to [`to-tls-v2.2.md`](to-tls-v2.2.md) for the v2.2 HTTPS-everywhere migration if you haven't done that one yet.
|
||||
|
||||
## Why we removed it
|
||||
|
||||
@@ -98,7 +107,7 @@ services:
|
||||
# ... rest of the certctl env block unchanged
|
||||
```
|
||||
|
||||
Operators hit `https://<your-host>/`, get redirected through the OIDC provider, land back at oauth2-proxy with a session cookie, and oauth2-proxy proxies their request to certctl on the internal Docker network. certctl itself is HTTPS-only on `:8443` (TLS 1.3, see [`tls.md`](tls.md)) but operator browsers never see that hop directly. Bind certctl-server's `:8443` to the internal Docker network only — do NOT publish it to the host. The audit trail will record the actor as the gateway-forwarded identity if you also configure a small bearer-token-mapping shim at the gateway (most production deployments do this with a per-user api-key issued by the gateway after OIDC validation).
|
||||
Operators hit `https://<your-host>/`, get redirected through the OIDC provider, land back at oauth2-proxy with a session cookie, and oauth2-proxy proxies their request to certctl on the internal Docker network. certctl itself is HTTPS-only on `:8443` (TLS 1.3, see [`tls.md`](../../operator/tls.md)) but operator browsers never see that hop directly. Bind certctl-server's `:8443` to the internal Docker network only — do NOT publish it to the host. The audit trail will record the actor as the gateway-forwarded identity if you also configure a small bearer-token-mapping shim at the gateway (most production deployments do this with a per-user api-key issued by the gateway after OIDC validation).
|
||||
|
||||
### Traefik ForwardAuth pattern (Kubernetes)
|
||||
|
||||
@@ -147,8 +156,8 @@ There is no on-disk state that changes with this upgrade — no migrations to ro
|
||||
|
||||
## Cross-references
|
||||
|
||||
- [`architecture.md`](architecture.md) — "Authenticating-gateway pattern (JWT, OIDC, mTLS)" section.
|
||||
- [`tls.md`](tls.md) — TLS provisioning patterns. The gateway proxying to certctl-server still needs to trust certctl's TLS cert; same patterns apply.
|
||||
- [`architecture.md`](../../reference/architecture.md) — "Authenticating-gateway pattern (JWT, OIDC, mTLS)" section.
|
||||
- [`tls.md`](../../operator/tls.md) — TLS provisioning patterns. The gateway proxying to certctl-server still needs to trust certctl's TLS cert; same patterns apply.
|
||||
- [`../deploy/helm/certctl/README.md`](../deploy/helm/certctl/README.md) — Helm-chart-flavored guidance.
|
||||
- `internal/config/config.go::ValidAuthTypes` — the single source of truth for what's accepted post-G-1.
|
||||
- `internal/repository/postgres/db.go::wrapPingError` — unrelated; pattern for runtime diagnostic of operator misconfiguration.
|
||||
@@ -1,341 +0,0 @@
|
||||
# NIST SP 800-57 Key Management Alignment
|
||||
|
||||
NIST SP 800-57 Part 1 Rev 5 (May 2020) is the authoritative US government guidance on cryptographic key management. This document maps certctl's implementation to its recommendations. certctl follows NIST guidance where applicable; this guide documents the alignment and identifies gaps for future roadmap planning.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Key Generation (Section 6.1)](#key-generation-section-61)
2. [Key Storage and Protection (Sections 6.3, 6.4)](#key-storage-and-protection-sections-63-64)
3. [Cryptoperiods (Section 5.3, Table 1)](#cryptoperiods-section-53-table-1)
4. [Key States and Transitions (Section 5.2)](#key-states-and-transitions-section-52)
5. [Algorithm Recommendations (Section 5.1, SP 800-131A)](#algorithm-recommendations-section-51-sp-800-131a)
6. [Key Distribution and Transport (Section 6.2)](#key-distribution-and-transport-section-62)
7. [Revocation and Compromise (NIST SP 800-57 Part 3)](#revocation-and-compromise-nist-sp-800-57-part-3)
8. [Alignment Summary Table](#alignment-summary-table)
9. [Gaps and Remediation Roadmap](#gaps-and-remediation-roadmap)
   - [V2 (Current)](#v2-current)
   - [V3 (Planned: 2026)](#v3-planned-2026)
   - [V5 (Planned: 2027+)](#v5-planned-2027)
   - [Post-Quantum (2027+)](#post-quantum-2027)
10. [References](#references)
11. [Questions or Corrections?](#questions-or-corrections)
|
||||
|
||||
## Key Generation (Section 6.1)
|
||||
|
||||
certctl generates certificate keys on agent infrastructure using Go's `crypto/rand` for entropy, backed by `/dev/urandom` on Linux and `CryptGenRandom` on Windows. Key generation happens as follows:
|
||||
|
||||
**Agent-Side Key Generation (Production Default)**
|
||||
- Agents generate ECDSA P-256 key pairs per certificate using `crypto/ecdsa` + `crypto/elliptic` (Go stdlib)
|
||||
- Key generation triggered by `AwaitingCSR` job state in renewal/issuance workflows
|
||||
- Agent creates a Certificate Signing Request (CSR) with `x509.CreateCertificateRequest`, signed with the newly generated private key
|
||||
- Only the CSR crosses the network to the control plane; private key material never leaves the agent
|
||||
- Configuration: `CERTCTL_KEYGEN_MODE=agent` (default, production)
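
The agent-side flow above can be sketched with the Go standard library alone. This is an illustrative sketch, not certctl's agent code: the function names `generateKeyAndCSR` and `csrIsValid` and the subject `app.example.com` are assumptions made for the example. It generates an ECDSA P-256 key, builds a CSR signed by that key, and returns only the CSR in PEM form for transmission.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
)

// generateKeyAndCSR creates a fresh ECDSA P-256 key pair and a PEM-encoded
// CSR for commonName. The private key stays with the caller; only the CSR
// bytes are meant to cross the network.
func generateKeyAndCSR(commonName string) (*ecdsa.PrivateKey, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("generate key: %w", err)
	}
	tmpl := &x509.CertificateRequest{
		Subject:  pkix.Name{CommonName: commonName},
		DNSNames: []string{commonName},
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, fmt.Errorf("create CSR: %w", err)
	}
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	return key, csrPEM, nil
}

// csrIsValid parses a PEM CSR and verifies its self-signature, which is
// the check a receiver can run without ever seeing the private key.
func csrIsValid(csrPEM []byte) bool {
	block, _ := pem.Decode(csrPEM)
	if block == nil {
		return false
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return false
	}
	return csr.CheckSignature() == nil
}

func main() {
	_, csrPEM, err := generateKeyAndCSR("app.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Printf("CSR signature valid: %v\n", csrIsValid(csrPEM))
}
```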
|
||||
|
||||
**Server-Side Key Generation (Demo Only)**
|
||||
- Available for development and testing via `CERTCTL_KEYGEN_MODE=server`
|
||||
- Explicitly logged as a warning at startup: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only"
|
||||
- Docker Compose demo uses server mode for backward compatibility
|
||||
- Not recommended for production; agent mode is the secure default
|
||||
|
||||
**Entropy Source**
|
||||
- `crypto/rand` provides cryptographically secure random bytes
|
||||
- On Linux: backed by `/dev/urandom` via `getrandom()` syscall
|
||||
- On Windows: backed by `CryptGenRandom()` (now `BCryptGenRandom()`)
|
||||
- Meets NIST SP 800-90B requirements for entropy generation
|
||||
|
||||
## Key Storage and Protection (Sections 6.3, 6.4)
|
||||
|
||||
certctl implements tiered key storage with different protection profiles based on key purpose.
|
||||
|
||||
**Agent Private Keys**
|
||||
- Stored on agent filesystem at `CERTCTL_KEY_DIR` (default: `/var/lib/certctl/keys`)
|
||||
- File permissions: 0600 (read/write by agent process only, no world/group access)
|
||||
- One PEM file per certificate, organized by certificate ID
|
||||
- Accessible only to the agent process; isolated from other processes
|
||||
- For container deployments: mount the key directory as a Docker volume (for example `-v /var/lib/certctl/keys:/var/lib/certctl/keys`) and keep the key files inside it at mode 0600
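
A minimal sketch of writing one key file per certificate with mode 0600 follows. It is not certctl's implementation: the `<certID>.key` file name, the SEC1 `EC PRIVATE KEY` encoding, and the helper names `writeKeyPEM` and `demoPerm` are assumptions for illustration. The file is created with `O_EXCL` so the permission bits are set at creation time and an existing key is never silently overwritten.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
	"path/filepath"
)

// writeKeyPEM stores key as <dir>/<certID>.key, readable and writable by
// the owning process only (0600). It fails if the file already exists.
func writeKeyPEM(dir, certID string, key *ecdsa.PrivateKey) (string, error) {
	der, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return "", fmt.Errorf("marshal key: %w", err)
	}
	path := filepath.Join(dir, certID+".key")
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if err != nil {
		return "", fmt.Errorf("create key file: %w", err)
	}
	if err := pem.Encode(f, &pem.Block{Type: "EC PRIVATE KEY", Bytes: der}); err != nil {
		f.Close()
		return "", fmt.Errorf("write key: %w", err)
	}
	return path, f.Close()
}

// demoPerm writes a throwaway key into a temporary directory and returns
// the permission bits of the resulting file.
func demoPerm() (uint32, error) {
	dir, err := os.MkdirTemp("", "certctl-keys-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return 0, err
	}
	path, err := writeKeyPEM(dir, "demo-cert", key)
	if err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return uint32(info.Mode().Perm()), nil
}

func main() {
	perm, err := demoPerm()
	if err != nil {
		panic(err)
	}
	fmt.Printf("key file mode: %04o\n", perm)
}
```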
|
||||
|
||||
**Issuing CA Keys (Local CA Connector)**
|
||||
- Loaded from disk at server startup via `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH` env vars
|
||||
- Supports RSA (PKCS#1, PKCS#8) and ECDSA (SEC1, PKCS#8) key formats
|
||||
- Validates certificate constraints before use:
|
||||
- `IsCA=true` flag present
|
||||
- `KeyUsageCertSign` extension set
|
||||
- Valid certificate chain (for sub-CA mode)
|
||||
- Keys held in memory during server runtime (no on-disk caching after load)
|
||||
- Cleared from memory only on server shutdown
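
The constraint checks listed above (CA flag and certificate-signing key usage) map directly onto fields of Go's `x509.Certificate`. The sketch below is illustrative only; `validateCACert` and `newSelfSigned` are hypothetical names, and chain validation for sub-CA mode is deliberately left out. The helper builds throwaway self-signed certificates so the check can be exercised without any files on disk.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// validateCACert rejects certificates that cannot act as an issuing CA:
// the CA flag must be set and the key usage must include certificate signing.
func validateCACert(cert *x509.Certificate) error {
	if !cert.BasicConstraintsValid || !cert.IsCA {
		return errors.New("certificate is not marked as a CA")
	}
	if cert.KeyUsage&x509.KeyUsageCertSign == 0 {
		return errors.New("certificate lacks the KeyUsageCertSign bit")
	}
	return nil
}

// newSelfSigned builds a short-lived self-signed certificate for the demo,
// with the CA flag and cert-sign usage controlled by the arguments.
func newSelfSigned(isCA, certSign bool) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	usage := x509.KeyUsageDigitalSignature
	if certSign {
		usage |= x509.KeyUsageCertSign
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		KeyUsage:              usage,
		BasicConstraintsValid: true,
		IsCA:                  isCA,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	for _, c := range []struct{ isCA, certSign bool }{{true, true}, {false, true}, {true, false}} {
		cert, err := newSelfSigned(c.isCA, c.certSign)
		if err != nil {
			panic(err)
		}
		fmt.Printf("isCA=%v certSign=%v -> %v\n", c.isCA, c.certSign, validateCACert(cert))
	}
}
```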
|
||||
|
||||
**Sub-CA Mode (Enterprise Integration)**
|
||||
- CA certificate and key signed by upstream enterprise root (e.g., Active Directory Certificate Services)
|
||||
- certctl acts as a subordinate CA, inheriting the issuer DN from the upstream CA
|
||||
- All issued certificates chain to enterprise trust anchor
|
||||
- CA key protection inherits upstream root's key management practices
|
||||
- Configured via: `CERTCTL_CA_CERT_PATH=/path/to/ca.crt` and `CERTCTL_CA_KEY_PATH=/path/to/ca.key`
|
||||
|
||||
**NIST Gap: HSM Storage**
|
||||
NIST SP 800-57 Part 1 recommends Hardware Security Module (HSM) storage for high-value keys (CA signing keys). certctl V2 uses filesystem storage on the server. HSM support is planned for certctl Pro (V3), enabling integration with:
|
||||
- AWS CloudHSM
|
||||
- Azure Dedicated HSM
|
||||
- Thales Luna, Gemalto SafeNet, YubiHSM (on-premises)
|
||||
- PKCS#11-compatible devices
|
||||
|
||||
## Cryptoperiods (Section 5.3, Table 1)
|
||||
|
||||
NIST recommends cryptoperiods (key validity durations) based on key type and security requirements. certctl enforces cryptoperiods through certificate profiles and renewal policies.
|
||||
|
||||
**Certificate Profile Enforcement**
|
||||
- Certificate profiles (M11a) define `max_ttl` constraint per enrollment profile
|
||||
- All certificates issued through a profile cannot exceed the profile's max_ttl
|
||||
- Profile configuration example (a `max_ttl_seconds` of 31536000 caps validity at one year):
  ```json
  {
    "id": "prof-web-prod",
    "name": "Production Web Certs",
    "max_ttl_seconds": 31536000,
    "allowed_key_algorithms": ["ECDSA_P256"],
    "required_sans": ["example.com"]
  }
  ```
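
The `max_ttl` constraint amounts to a single comparison between the requested validity and the profile's cap. A minimal sketch, assuming the cap is expressed in seconds as in the profile example; the function names `withinMaxTTL` and `withinMaxTTLDays` are hypothetical and not taken from certctl:

```go
package main

import (
	"fmt"
	"time"
)

// withinMaxTTL reports whether a requested validity period is positive and
// does not exceed the profile cap given in seconds.
func withinMaxTTL(requested time.Duration, maxTTLSeconds int64) bool {
	return requested > 0 && requested <= time.Duration(maxTTLSeconds)*time.Second
}

// withinMaxTTLDays is a convenience wrapper for whole-day validity periods.
func withinMaxTTLDays(days int, maxTTLSeconds int64) bool {
	return withinMaxTTL(time.Duration(days)*24*time.Hour, maxTTLSeconds)
}

func main() {
	const oneYear = 31536000 // seconds, as in the profile example
	for _, days := range []int{90, 365, 366} {
		fmt.Printf("%d days within cap: %v\n", days, withinMaxTTLDays(days, oneYear))
	}
}
```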
|
||||
|
||||
**Renewal Thresholds**
|
||||
- Renewal policies with configurable `alert_thresholds_days`: `[30, 14, 7, 0]` (days before expiry)
|
||||
- Background scheduler checks renewal eligibility every 1 hour
|
||||
- Certificates transitioned to `Expiring` status at 30 days, `Expired` at 0 days
|
||||
- Renewal workflow can be triggered manually or automatically
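
The status transitions described above can be expressed as a pure function of the current time and the certificate's not-after date. The sketch below assumes a 30-day `Expiring` window, matching the first threshold; `statusAt` and `statusForHoursLeft` are hypothetical names, and the real scheduler may differ in how it rounds or batches these checks.

```go
package main

import (
	"fmt"
	"time"
)

// statusAt classifies a certificate by the time remaining until notAfter:
// Expired once nothing remains, Expiring inside the window, Active otherwise.
func statusAt(now, notAfter time.Time, expiringWindow time.Duration) string {
	remaining := notAfter.Sub(now)
	switch {
	case remaining <= 0:
		return "Expired"
	case remaining <= expiringWindow:
		return "Expiring"
	default:
		return "Active"
	}
}

// statusForHoursLeft evaluates statusAt with a fixed reference time and a
// 30-day window, so examples do not depend on the wall clock.
func statusForHoursLeft(hoursLeft int) string {
	now := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	notAfter := now.Add(time.Duration(hoursLeft) * time.Hour)
	return statusAt(now, notAfter, 30*24*time.Hour)
}

func main() {
	for _, h := range []int{31 * 24, 30 * 24, 12, 0} {
		fmt.Printf("%4d hours left: %s\n", h, statusForHoursLeft(h))
	}
}
```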
|
||||
|
||||
**NIST Cryptoperiod Recommendations vs certctl Implementation**
|
||||
|
||||
| Key Type | NIST Recommendation | certctl Implementation |
|----------|---------------------|------------------------|
| CA signing key | 3–10 years | Configured via CA certificate not-after date; inheritable from upstream CA in sub-CA mode |
| End-entity web server cert | 1–3 years (trending shorter) | Profile `max_ttl` configurable; ACME issuer typically 90 days; SC-081v3 mandating 47 days by 2029 |
| Code signing cert | 2–8 years | Profile enforcement via `max_ttl`; not primary certctl use case |
| Short-lived credentials | < 1 hour recommended | Profile TTL < 1 hour; exempt from CRL/OCSP (expiry is sufficient revocation); auto-expiry on scheduler tick |
| OCSP signing key | 1–2 years | Embedded OCSP responder uses issuing CA key (same period as issuer) or delegated signing cert |
| TLS/SSL interoperability cert | 1–2 years | Trending 1 year or less; certctl's ACME/sub-CA/step-ca issuers all support short periods |
|
||||
|
||||
## Key States and Transitions (Section 5.2)
|
||||
|
||||
NIST defines lifecycle states for keys: pre-activation, active, suspended, deactivated, compromised, and destroyed. certctl maps these to certificate and job states:
|
||||
|
||||
| NIST Key State | certctl Equivalent | Implementation |
|---|---|---|
| **Pre-activation** | `Pending` job state / `AwaitingCSR` | Job created but key not yet generated; awaiting agent CSR submission (agent mode) or server keygen (demo mode) |
| **Active** | Certificate status `Active` | Cert deployed to targets and in use; within validity period (not before < now < not after) |
| **Suspended** | Job state `AwaitingApproval` | Interactive approval holds deployment job pending human review; resumes on approval or cancels on rejection |
| **Deactivated** | Certificate status `Expired` | Past not-after date; auto-transitioned by scheduler every 2 minutes; renewal eligible |
| **Compromised** | Certificate status `Revoked` | Set via `POST /api/v1/certificates/{id}/revoke` with an RFC 5280 revocation reason |
| **Destroyed** | Archived (implementation detail) | Operator responsibility; certctl retains all certs in audit trail for compliance; no destructive deletion API |
|
||||
|
||||
**State Transition Audit Trail**
|
||||
All transitions logged to immutable `audit_events` table with:
|
||||
- Event type (e.g., `certificate_revoked`, `renewal_job_completed`)
|
||||
- Actor (authenticated user or agent ID)
|
||||
- Timestamp (RFC3339)
|
||||
- Resource (certificate ID)
|
||||
- Reason (revocation reason code, approval reason, etc.)
|
||||
- HTTP method, path, status (for API calls)
|
||||
|
||||
Example audit entry for revocation:
|
||||
```json
{
  "id": "ae-2024-0615",
  "event_type": "certificate_revoked",
  "actor": "ops-alice@example.com",
  "timestamp": "2024-06-15T14:23:00Z",
  "resource_id": "cert-web-prod-2024",
  "resource_type": "certificate",
  "description": "Revoked: reason=keyCompromise",
  "body_hash": "sha256:a1b2c3d..."
}
```
|
||||
|
||||
## Algorithm Recommendations (Section 5.1, SP 800-131A)
|
||||
|
||||
NIST SP 800-131A Rev 2 (March 2019) categorizes cryptographic algorithms as Approved, Conditionally Approved, or Disallowed. certctl implements only NIST-approved algorithms:
|
||||
|
||||
| Algorithm | NIST Status | certctl Support | Notes |
|-----------|-------------|-----------------|-------|
| **ECDSA P-256** | Approved (128-bit security strength) | Default for agent-side keygen | Meets NIST curve requirements (FIPS 186-4) |
| **ECDSA P-384** | Approved (192-bit security strength) | Supported via profile configuration | Higher security margin; slower than P-256 |
| **ECDSA P-521** | Approved (256-bit security strength) | Supported via profile configuration | Rarely needed; overkill for TLS |
| **RSA 2048** | Approved minimum (112-bit security, transitioning) | Supported via all issuers | Deprecated path; migrate to 3072+ by 2030 per NIST |
| **RSA 3072** | Approved (128-bit security) | Supported via all issuers | Recommended minimum for long-term security |
| **RSA 4096** | Approved (192-bit security) | Supported via all issuers | Supported but slower; overkill for most TLS |
| **SHA-256** | Approved | Used throughout | CSR signing, certificate fingerprints, audit body hashing, CRL/OCSP signing |
| **SHA-384** | Approved (192-bit) | Supported where algorithm selection available | Used in some CA signing scenarios |
| **SHA-512** | Approved (256-bit) | Supported where algorithm selection available | Rarely needed; SHA-256 suffices for most use cases |
| **SHA-1** | Deprecated | Not used in certctl | Browsers reject SHA-1 certs; certctl never generates them |
|
||||
|
||||
**Algorithm Enforcement via Profiles**
|
||||
Certificate profiles enforce allowed key algorithms:
|
||||
```json
{
  "id": "prof-web-prod",
  "allowed_key_algorithms": ["ECDSA_P256", "ECDSA_P384", "RSA3072"]
}
```
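
One way such an allow-list could be enforced is to derive an algorithm label from the public key in a CSR and compare it against the profile. This is a sketch under assumptions: the labels `ECDSA_P256`, `ECDSA_P384`, and `RSA3072` come from the example above, but the derivation rule for other key types and the names `algorithmName`, `profileAllows`, and `generateECDSA` are hypothetical.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"fmt"
)

// algorithmName derives a profile-style label from a public key, for
// example "ECDSA_P256" or "RSA3072". Unknown key types yield "UNKNOWN".
func algorithmName(pub crypto.PublicKey) string {
	switch k := pub.(type) {
	case *ecdsa.PublicKey:
		return fmt.Sprintf("ECDSA_P%d", k.Curve.Params().BitSize)
	case *rsa.PublicKey:
		return fmt.Sprintf("RSA%d", k.N.BitLen())
	default:
		return "UNKNOWN"
	}
}

// profileAllows reports whether the key's label appears in the allow-list.
func profileAllows(allowed []string, pub crypto.PublicKey) bool {
	name := algorithmName(pub)
	for _, a := range allowed {
		if a == name {
			return true
		}
	}
	return false
}

// generateECDSA creates an ECDSA key on the NIST curve of the given size.
func generateECDSA(bits int) (*ecdsa.PrivateKey, error) {
	curves := map[int]elliptic.Curve{256: elliptic.P256(), 384: elliptic.P384(), 521: elliptic.P521()}
	curve, ok := curves[bits]
	if !ok {
		return nil, fmt.Errorf("unsupported curve size %d", bits)
	}
	return ecdsa.GenerateKey(curve, rand.Reader)
}

func main() {
	allowed := []string{"ECDSA_P256", "ECDSA_P384", "RSA3072"}
	for _, bits := range []int{256, 521} {
		key, err := generateECDSA(bits)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s allowed: %v\n", algorithmName(&key.PublicKey), profileAllows(allowed, &key.PublicKey))
	}
}
```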
|
||||
|
||||
**Post-Quantum Cryptography (Tracking)**
|
||||
NIST finalized its first PQC standards (FIPS 203, FIPS 204, FIPS 205) in August 2024:
|
||||
- **ML-KEM** (Kyber): Approved key encapsulation mechanism
|
||||
- **ML-DSA** (Dilithium): Approved digital signature algorithm
|
||||
- **SLH-DSA** (SPHINCS+): Approved stateless hash-based signature scheme
|
||||
|
||||
certctl will track NIST's PQC roadmap and plan integration when hybrid PQC+classical certificate formats reach browser/infrastructure support. Currently, pure PQC certificates are not widely interoperable.
|
||||
|
||||
## Key Distribution and Transport (Section 6.2)
|
||||
|
||||
NIST SP 800-57 Part 1 Section 6.2 addresses secure key distribution to minimize exposure during transit. certctl implements a zero-transmission-of-private-keys model:
|
||||
|
||||
**Private Key Distribution**
|
||||
- Agent-side keygen model: Private keys never leave agent infrastructure
|
||||
- CSR transmitted over HTTPS (TLS 1.2+) with mutual TLS optional
|
||||
- API key authentication via `Authorization: Bearer <api-key>` header
|
||||
- All API calls logged to immutable audit trail
|
||||
|
||||
**Signed Certificate Distribution**
|
||||
- Certificates (public component) distributed via `GET /agents/{id}/work` over HTTPS
|
||||
- Work endpoint enriches deployment jobs with certificate PEM and metadata
|
||||
- Certificate PEM retrieval is deterministic (the same certificate always returns the same bytes)
|
||||
|
||||
**Target Deployment**
|
||||
- Deployment to targets via local filesystem write (NGINX, Apache, HAProxy)
|
||||
- No network transmission of private keys to targets
|
||||
- Agents read local private key from `CERTCTL_KEY_DIR` on deployment
|
||||
- For appliances without agents (F5 BIG-IP, IIS), proxy agent pattern:
|
||||
- Proxy agent runs in same trust zone as appliance
|
||||
- Proxy agent holds target API credentials (iControl, WinRM)
|
||||
- Control plane never communicates with appliance directly
|
||||
- Deployment request includes certificate and proxy agent ID
|
||||
- Proxy agent executes deployment via appliance API
|
||||
|
||||
**Revocation Distribution**
|
||||
- Certificate Revocation List (CRL) via `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615)
|
||||
- Returns DER-encoded X.509 CRL signed by issuing CA (`Content-Type: application/pkix-crl`)
|
||||
- 24-hour validity period
|
||||
- Includes all revoked serials, reasons, and revocation timestamps
|
||||
- Served unauthenticated so relying parties without certctl API credentials can fetch it
|
||||
- Subject to URL caching; OCSP preferred for real-time revocation
|
||||
- OCSP via `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960)
|
||||
- Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure, `Content-Type: application/ocsp-response`)
|
||||
- Signed by issuing CA (or delegated OCSP signing cert)
|
||||
- Responds with good/revoked/unknown status
|
||||
- Served unauthenticated — the RFC 6960 relying-party model does not assume API credentials
|
||||
- Real-time, more bandwidth-efficient than CRL polling
|
||||
|
||||
## Revocation and Compromise (NIST SP 800-57 Part 3)
|
||||
|
||||
NIST SP 800-57 Part 3 covers revocation (Section 2.5) when keys are suspected compromised or no longer needed. certctl implements comprehensive revocation infrastructure:
|
||||
|
||||
**Revocation API**
|
||||
- Endpoint: `POST /api/v1/certificates/{id}/revoke`
|
||||
- Request body:
|
||||
  ```json
  {
    "reason": "keyCompromise",
    "reason_text": "Private key exposed in log file"
  }
  ```
|
||||
- Supports the following 8 RFC 5280 revocation reason codes (RFC 5280 defines 10; `removeFromCRL` and `aACompromise` are not listed here):
|
||||
- `unspecified` — no specific reason provided
|
||||
- `keyCompromise` — private key suspected compromised
|
||||
- `caCompromise` — issuing CA key compromised
|
||||
- `affiliationChanged` — subject org/affiliation changed
|
||||
- `superseded` — cert superseded by newer cert
|
||||
- `cessationOfOperation` — key no longer in use
|
||||
- `certificateHold` — temporary hold (rarely used)
|
||||
- `privilegeWithdrawn` — subject authorization withdrawn
|
||||
|
||||
**Revocation Recording**
|
||||
- Certificate status updated to `Revoked`
|
||||
- Entry recorded in `certificate_revocations` table with:
|
||||
- Certificate serial number
|
||||
- Revocation timestamp
|
||||
- Revocation reason code
|
||||
- Issuer ID
|
||||
- Idempotent (revoking an already-revoked cert is safe; returns 200 OK)
|
||||
|
||||
**Issuer Notification (Best-Effort)**
|
||||
- Control plane calls `issuer.RevokeCertificate(ctx, serial, reason)` on issuing connector
|
||||
- Failure does not block the revocation (async, logged, retried)
|
||||
- Supported issuers:
|
||||
- Local CA: generates new CRL immediately
|
||||
- ACME: submits revocation to ACME server (RFC 8555 Section 7.6)
|
||||
- step-ca: calls `/revoke` API
|
||||
- OpenSSL: executes user-provided revocation script
|
||||
|
||||
**Revocation Notifications**
|
||||
- Notifiers triggered after revocation recorded: Slack, Teams, PagerDuty, OpsGenie, email, webhook
|
||||
- Message includes certificate common name, issuer, reason, actor, timestamp
|
||||
- Delivery is asynchronous and retried on failure
|
||||
|
||||
**CRL and OCSP Distribution**
|
||||
- CRL updated on every revocation (or scheduled refresh for non-issued revocations)
|
||||
- OCSP responder queries revocation table in real-time
|
||||
- Short-lived certificate exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
|
||||
|
||||
**Bulk Revocation for Large-Scale Compromise Response** (V2.2) — NIST SP 800-57 Part 3 emphasizes rapid revocation when keys are compromised. `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria (profile, owner, agent, issuer) in a single operation. This enables operators to execute fleet-wide revocation for key compromise events affecting multiple certificates. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring every certificate is recorded in the audit trail with the incident reason.
|
||||
|
||||
**Revocation Audit Trail**
|
||||
All revocation events logged:
|
||||
- Event type: `certificate_revoked` or `bulk_revocation_initiated` (for fleet operations)
|
||||
- Actor: authenticated user or service
|
||||
- Reason code: RFC 5280 enum (or incident justification for bulk operations)
|
||||
- Timestamp: RFC3339
|
||||
- Issuer notification status: success or error reason
|
||||
- Filter criteria: profile_id, owner_id, agent_id, issuer_id (for bulk revocation)
|
||||
|
||||
## Alignment Summary Table
|
||||
|
||||
| NIST SP 800-57 Area | Status | Coverage | Notes |
|---|---|---|---|
| **Key Generation** | ✅ Aligned | 100% | Agent-side ECDSA P-256 using crypto/rand; server mode flagged as demo-only |
| **Key Storage** | ⚠️ Partially Aligned | 80% | Filesystem with 0600 perms; HSM support planned V3 Pro |
| **Cryptoperiods** | ✅ Aligned | 100% | Profile-enforced max_ttl; threshold-based renewal alerting |
| **Key States** | ✅ Aligned | 100% | Full lifecycle tracking with immutable audit trail |
| **Algorithms** | ✅ Aligned | 100% | NIST-approved algorithms only; post-quantum tracking in progress |
| **Key Distribution** | ✅ Aligned | 100% | Private keys never transmitted; CSR/cert over TLS; agent-local deployment |
| **Revocation** | ✅ Aligned | 100% | CRL, OCSP, all RFC 5280 reason codes; real-time updates |
|
||||
|
||||
## Gaps and Remediation Roadmap
|
||||
|
||||
### V2 (Current)
|
||||
- [x] Agent-side key generation
|
||||
- [x] Profile-enforced cryptoperiods
|
||||
- [x] CRL and OCSP distribution
|
||||
- [x] RFC 5280 revocation support
|
||||
- [x] Immutable audit trail
|
||||
|
||||
### V2.2 (Planned: 2026)
|
||||
- Bulk revocation by profile/owner/agent/issuer (fleet-level revocation for incident response)
|
||||
|
||||
### V3 (Planned: 2026)
|
||||
- Role-based access control (limit revocation/approval to authorized operators)
|
||||
|
||||
### V3 Pro (Planned)
|
||||
- HSM support for CA key storage and agent key storage (TPM 2.0, PKCS#11)
|
||||
- FIPS 140-2/3 validated crypto module (BoringCrypto build or external FIPS library)
|
||||
- Key destruction API (explicit secure erasure of agent keys)
|
||||
- Key escrow / recovery mechanism (backup encrypted private keys for disaster recovery)
|
||||
|
||||
### Post-Quantum (2027+)
|
||||
- ML-KEM and ML-DSA support when browser/TLS ecosystem supports hybrid certificates
|
||||
- Migration path documentation (how to transition existing RSA certs to PQC)
|
||||
|
||||
## References
|
||||
|
||||
- NIST SP 800-57 Part 1 Rev 5 (May 2020): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r5.pdf
|
||||
- NIST SP 800-131A Rev 2 (March 2019): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf
|
||||
- FIPS 186-4 (Digital Signature Standard): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
|
||||
- RFC 5280 (X.509 PKI Certificate and CRL Profile): https://tools.ietf.org/html/rfc5280
|
||||
- RFC 8555 (Automatic Certificate Management Environment): https://tools.ietf.org/html/rfc8555
|
||||
- NIST FIPS 204 (ML-DSA): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.204.pdf
|
||||
- NIST FIPS 203 (ML-KEM): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.pdf
|
||||
|
||||
## Questions or Corrections?
|
||||
|
||||
This document reflects certctl's implementation as of March 2026. For the latest code, refer to:
|
||||
- Key generation: `cmd/agent/main.go` (agent keygen) and `internal/service/renewal.go` (server keygen)
|
||||
- Key storage: `internal/config/config.go` (CERTCTL_KEY_DIR, CERTCTL_CA_CERT_PATH)
|
||||
- Revocation: `internal/service/revocation.go` and `internal/api/handler/certificates.go`
|
||||
- Audit trail: `internal/api/middleware/audit.go`
|
||||
@@ -1,825 +0,0 @@
|
||||
# PCI-DSS 4.0 Compliance Mapping
|
||||
|
||||
This guide maps certctl's existing capabilities to PCI-DSS 4.0 requirements relevant to TLS certificate and cryptographic key management. It is **not a compliance attestation** — a qualified security assessor (QSA) must evaluate your organization's complete control environment. Rather, this document helps you understand which PCI-DSS control objectives certctl supports and where operator responsibility lies.
|
||||
|
||||
Organizations subject to PCI-DSS typically need to demonstrate control over certificate issuance, renewal, rotation, revocation, and key management. Certctl automates the technical controls for certificate lifecycle; compliance depends on how you deploy, monitor, and audit it.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [How to Use This Guide](#how-to-use-this-guide)
|
||||
2. [Requirement 4: Protect Data in Transit](#requirement-4-protect-data-in-transit)
|
||||
- [4.2.1 — Strong Cryptography for Transmission](#421--strong-cryptography-for-transmission)
|
||||
- [4.2.2 — Certificate Inventory and Validation](#422--certificate-inventory-and-validation)
|
||||
3. [Requirement 3: Protect Stored Cardholder Data (Key Management)](#requirement-3-protect-stored-cardholder-data-key-management)
|
||||
- [3.6 — Cryptographic Key Documentation](#36--cryptographic-key-documentation)
|
||||
- [3.7 — Key Lifecycle Procedures](#37--key-lifecycle-procedures)
|
||||
4. [Requirement 8: Identify and Authenticate](#requirement-8-identify-and-authenticate)
|
||||
- [8.3 — Strong Authentication](#83--strong-authentication)
|
||||
- [8.6 — Application Account Management](#86--application-account-management)
|
||||
5. [Requirement 10: Log and Monitor](#requirement-10-log-and-monitor)
|
||||
- [10.2 — Implement Automated Audit Logging](#102--implement-automated-audit-logging)
|
||||
- [10.3 — Protect Audit Trail](#103--protect-audit-trail)
|
||||
- [10.4 — Promptly Review and Address Audit Trail Exceptions](#104--promptly-review-and-address-audit-trail-exceptions)
|
||||
- [10.7 — Retain and Protect Audit Trail History](#107--retain-and-protect-audit-trail-history)
|
||||
6. [Requirement 6: Develop and Maintain Secure Systems and Applications](#requirement-6-develop-and-maintain-secure-systems-and-applications)
|
||||
- [6.3.1 — Security Coding Practices](#631--security-coding-practices)
|
||||
- [6.5.10 — Broken Authentication and Cryptography Prevention](#6510--broken-authentication-and-cryptography-prevention)
|
||||
7. [Requirement 7: Restrict Access by Business Need-to-Know](#requirement-7-restrict-access-by-business-need-to-know)
|
||||
- [7.2 — Implement Access Control](#72--implement-access-control)
|
||||
8. [Evidence Summary Table](#evidence-summary-table)
|
||||
9. [Operator Responsibilities](#operator-responsibilities)
|
||||
10. [V3 Enhancements for PCI-DSS](#v3-enhancements-for-pci-dss)
|
||||
11. [Next Steps for Compliance](#next-steps-for-compliance)
|
||||
12. [Questions?](#questions)
|
||||
|
||||
## How to Use This Guide
|
||||
|
||||
Your QSA will request evidence that your certificate and key management systems meet specific PCI-DSS 4.0 requirements. For each applicable requirement, this guide identifies:
|
||||
|
||||
1. **Which certctl features support the control** — API endpoints, database tables, background processes
|
||||
2. **What evidence you can produce** — audit logs, dashboard metrics, API queries, deployment configs
|
||||
3. **Operator responsibilities** — what you must do outside certctl (policy, monitoring, access control)
|
||||
4. **Status** — Available (v1.0 shipped), Planned (future release), or Operator Responsibility (outside scope)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 4: Protect Data in Transit
|
||||
|
||||
**Objective**: Ensure strong cryptography is used to protect sensitive data during transmission.
|
||||
|
||||
### 4.2.1 — Strong Cryptography for Transmission
|
||||
|
||||
**Requirement**: Use appropriate and current cryptographic algorithms for all TLS and SSH connections protecting card data in transit.
|
||||
|
||||
**certctl Support**:
|
||||
- **Automated TLS certificate lifecycle** — Certctl issues TLS certificates to NGINX, Apache, and HAProxy targets via `POST /api/v1/deployments`. Certificates include RSA 2048-bit and ECDSA P-256 key types (configurable per profile, M11a).
|
||||
- **Control plane TLS enforcement** — All REST API endpoints served exclusively over HTTPS. Agent-to-server heartbeat and work polling use TLS. No plaintext protocol options.
|
||||
- **Issuer connector key negotiation** — ACME v2 issuers (Let's Encrypt, ZeroSSL) enforce their own key requirements. Local CA enforces RSA/ECDSA constraints. The step-ca integration defers to Smallstep's cryptography standards.
|
||||
- **Certificate profiles** (M11a) document allowed key types and minimum key sizes per environment (development, production, cardholder-network).
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Exported certificate inventory via `GET /api/v1/certificates` with key algorithm and size (serialized as JSON).
|
||||
- Issued certificate details showing RSA 2048+ or ECDSA P-256 for all deployed certificates.
|
||||
- Audit trail (`GET /api/v1/audit`) showing issuer connector selection and certificate profile assignment per certificate.
|
||||
- Target deployment logs showing TLS certificate installation on NGINX/Apache/HAProxy.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- Configure certificate profiles for your environments with approved key algorithms.
|
||||
- Audit cipher suite configuration on deployed targets (certctl deploys certs; you verify target TLS settings).
|
||||
- Periodically review `CERTCTL_KEYGEN_MODE` — must be `agent` in production (never `server`).
|
||||
- Monitor issuer connector configuration to ensure issuers meet your cryptography standards.
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
### 4.2.2 — Certificate Inventory and Validation
|
||||
|
||||
**Requirement**: Ensure all TLS/SSL certificates used for data transmission are valid, current, and meet required cryptographic standards.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Managed Certificate Inventory** — Full CRUD API (`/api/v1/certificates`) with sortable, filterable list. Fields: common name, SANs, subject, issuer, serial number, key type/size, not-before/after dates, issuer ID, profile ID, owner, team, status (Active/Expiring/Expired/Revoked).
|
||||
|
||||
- **Filesystem Certificate Discovery** (M18b) — Agents scan configured directories (`CERTCTL_DISCOVERY_DIRS` env var) for existing PEM/DER certificates every 6 hours and on startup. Control plane deduplicates by SHA-256 fingerprint. Three triage statuses: Unmanaged (not managed by certctl), Managed (linked to a managed certificate), Dismissed (operator-marked as out-of-scope).
|
||||
- API endpoints:
|
||||
- `GET /api/v1/discovered-certificates?status=Unmanaged` — find orphaned certs
|
||||
- `GET /api/v1/discovery-summary` — aggregate counts by status
|
||||
- `POST /api/v1/discovered-certificates/{id}/claim` — link to managed certificate
|
||||
- `POST /api/v1/discovered-certificates/{id}/dismiss` — mark out-of-scope
|
||||
|
||||
- **Expiration Threshold Alerting** — Renewal policies support `alert_thresholds_days` (default 30, 14, 7, 0). Background scheduler evaluates daily; certificates transition to Expiring/Expired status automatically. Notifications sent to owners via email/webhook/Slack/Teams/PagerDuty.
|
||||
|
||||
- **Certificate Status Tracking** — Four statuses: Active (deployed, not yet expired), Expiring (within threshold, awaiting renewal), Expired (past not-after date), Revoked (revoked via RFC 5280 revocation API). Dashboard charts show status distribution.
|
||||
|
||||
- **Revocation Infrastructure** (M15a, M15b, M-006):
|
||||
- Revocation API: `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes
|
||||
- CRL endpoint: `GET /.well-known/pki/crl/{issuer_id}` — DER X.509 CRL, 24h validity, signed by issuing CA, served unauthenticated (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`)
|
||||
- OCSP responder: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` — DER-encoded OCSP response (good/revoked/unknown), served unauthenticated (RFC 6960, `Content-Type: application/ocsp-response`)
|
||||
- Bulk revocation (V2.2): `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) for fleet-wide incident response
|
||||
- Short-lived cert exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
|
||||
|
||||
- **Stats API** (M14) — Real-time visibility:
|
||||
- `GET /api/v1/stats/summary` — total certs, by status, by issuer
|
||||
- `GET /api/v1/stats/expiration-timeline?days=90` — expiration distribution (weekly buckets)
|
||||
- `GET /api/v1/stats/job-trends?days=30` — renewal/issuance job success rates
|
||||
- `GET /api/v1/certificates` with `?sort=-notAfter&fields=id,commonName,notAfter,status` — sparse, sorted inventory
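The fingerprint-based deduplication described for filesystem discovery can be sketched in Go as follows. This is an illustrative sketch only: the `discovered` type and `dedupe` function are hypothetical names, and the input bytes stand in for DER-encoded certificates.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// discovered is a hypothetical record for a certificate found on disk.
type discovered struct {
	Path string
	DER  []byte // DER-encoded certificate bytes
}

// fingerprint returns the lowercase hex SHA-256 digest of the DER bytes.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// dedupe groups discovered certificates by fingerprint, so the same
// certificate found in several locations is counted once.
func dedupe(found []discovered) map[string][]string {
	byFP := make(map[string][]string)
	for _, d := range found {
		fp := fingerprint(d.DER)
		byFP[fp] = append(byFP[fp], d.Path)
	}
	return byFP
}

func main() {
	found := []discovered{
		{Path: "/etc/nginx/certs/web.pem", DER: []byte("cert-A")},
		{Path: "/etc/apache2/certs/web.pem", DER: []byte("cert-A")},
		{Path: "/etc/nginx/certs/api.pem", DER: []byte("cert-B")},
	}
	for fp, paths := range dedupe(found) {
		fmt.Println(fp[:16], paths)
	}
}
```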
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Discovered certificate report: `GET /api/v1/discovered-certificates` JSON export showing all certs on systems, fingerprints, and status.
|
||||
- Managed certificate inventory: `GET /api/v1/certificates` with filters (`?status=Expiring` for upcoming renewals).
|
||||
- Expiration alert configuration: policy JSON showing `alert_thresholds_days` for each environment.
|
||||
- CRL/OCSP availability proof: unauthenticated HTTP GET requests to `/.well-known/pki/crl/{issuer_id}` (DER, `application/pkix-crl`) and `/.well-known/pki/ocsp/{issuer_id}/{serial}` (DER, `application/ocsp-response`) with signed responses.
|
||||
- Audit trail for certificate creation/renewal/revocation: `GET /api/v1/audit?type=certificate_issued,certificate_renewed,certificate_revoked`.
|
||||
- Dashboard charts showing expiration timeline, renewal success trends, status distribution.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- Configure `CERTCTL_DISCOVERY_DIRS` on agents to scan all certificate storage locations (e.g., `/etc/nginx/certs`, `/etc/apache2/certs`, `/usr/local/share/ca-certificates`).
|
||||
- Regularly triage discovered certificates: `GET /api/v1/discovered-certificates?status=Unmanaged`, claim or dismiss each.
|
||||
- Set renewal policies for all certificate profiles with appropriate `alert_thresholds_days` (recommendation: 30, 14, 7, 0).
|
||||
- Monitor expiration dashboard and respond to Expiring alerts before certificates expire.
|
||||
- Verify that issued certificates meet your organization's cryptography standards (key type, key size, SANs).
|
||||
- Test CRL/OCSP endpoints periodically to confirm they are reachable and signed correctly.
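As a rough illustration of how `alert_thresholds_days` could drive the Active/Expiring/Expired transitions, the Go sketch below classifies a certificate by its not-after date. The function names and the exact comparison rules are assumptions; the scheduler's real logic may differ.

```go
package main

import (
	"fmt"
	"time"
)

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// status classifies a certificate using the alert thresholds (in days).
// A certificate is Expired once now reaches notAfter, Expiring when the
// remaining lifetime is within any threshold, and Active otherwise.
func status(notAfter, now time.Time, thresholdsDays []int) string {
	if !now.Before(notAfter) {
		return "Expired"
	}
	remaining := notAfter.Sub(now)
	for _, d := range thresholdsDays {
		if remaining <= time.Duration(d)*24*time.Hour {
			return "Expiring"
		}
	}
	return "Active"
}

func main() {
	now := mustParse("2026-03-01T00:00:00Z")
	thresholds := []int{30, 14, 7, 0}
	for _, notAfter := range []string{
		"2026-06-01T00:00:00Z",
		"2026-03-20T00:00:00Z",
		"2026-02-28T00:00:00Z",
	} {
		fmt.Println(notAfter, status(mustParse(notAfter), now, thresholds))
	}
}
```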
|
||||
|
||||
**Status**: **Available** (v1.0 shipped, discovery M18b, revocation M15a/M15b)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 3: Protect Stored Cardholder Data (Key Management)
|
||||
|
||||
**Objective**: Render cardholder data unreadable anywhere it is stored; protect cryptographic keys used to encrypt data.
|
||||
|
||||
### 3.6 — Cryptographic Key Documentation
|
||||
|
||||
**Requirement**: Document and implement all key management processes and procedures covering generation, storage, archival, destruction, and change; protect cryptographic keys; and restrict access to keys to the minimum required.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Certificate Profile Documentation** (M11a) — Named profiles define allowed key types, maximum TTL, and allowed EKUs per use case. Each profile is a documented policy:
|
||||
  ```json
  {
    "id": "p-web-tls",
    "name": "Web TLS Production",
    "allowed_key_types": ["RSA_2048", "ECDSA_P256"],
    "max_ttl_seconds": 31536000,
    "require_sans": true,
    "description": "Production TLS certs for external web services"
  }
  ```
|
||||
|
||||
- **Owner and Team Tracking** (M11b) — Every certificate is assigned an owner (person + email) and optionally a team. This documents key responsibility and escalation paths.
|
||||
|
||||
- **Issuer Connector Specification** — Configuration and API endpoints document which CA and protocol issues each certificate:
|
||||
- `GET /api/v1/issuers/{id}` returns issuer type (local-ca, acme, step-ca, openssl), CA endpoint, authentication method, constraints
|
||||
- Each issuer type has documented key handling (e.g., Local CA loads CA key from `CERTCTL_CA_CERT_PATH`, step-ca via JWK provisioner)
|
||||
|
||||
- **Immutable Audit Trail** (M19) — Every certificate lifecycle event recorded in append-only `audit_events` table:
|
||||
- `certificate_issued` — when certificate created, by whom, issuer type, profile
|
||||
- `certificate_renewed` — when renewed, by whom, issuer
|
||||
- `certificate_revoked` — when revoked, by whom, RFC 5280 reason code
|
||||
- `certificate_deployed` — when deployed to target, by agent, target type
|
||||
- Query: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}`
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Exported certificate profiles: `GET /api/v1/profiles` showing documented key types, max TTLs, constraints per environment.
|
||||
- Certificate-to-owner mapping: `GET /api/v1/certificates` with owner/team fields.
|
||||
- Issuer configuration audit: `GET /api/v1/issuers` showing CA endpoints, key storage paths, auth methods.
|
||||
- Audit trail for a certificate: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing complete lifecycle.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- Define and document certificate profiles for each environment and use case.
|
||||
- Assign owner and team to each certificate via API or dashboard.
|
||||
- Document issuer connector configuration (CA endpoint, auth method, key storage location).
|
||||
- Maintain baseline audit trail exports for compliance evidence.
|
||||
- Establish certificate retirement policy (how long to retain audit records after certificate expiry/revocation).
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
### 3.7 — Key Lifecycle Procedures
|
||||
|
||||
**Requirement**: Generate, store, protect, access, and destroy cryptographic keys used to encrypt data in transit or at rest.
|
||||
|
||||
This requirement covers key generation, storage, rotation, and destruction. Certctl addresses the certificate/TLS key portion (not symmetric encryption keys used for cardholder data at rest — those are outside scope).
|
||||
|
||||
#### 3.7.1 — Key Generation
|
||||
|
||||
**Requirement**: Generate new keys using strong cryptography.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Agent-Side Key Generation** (M8) — Production mode (default `CERTCTL_KEYGEN_MODE=agent`):
|
||||
- Agents generate ECDSA P-256 key pairs using `crypto/ecdsa` + `crypto/elliptic.P256()` + `crypto/rand` (cryptographically secure random).
|
||||
- Key generation happens **only on the agent**, never on the control plane.
|
||||
- Agent submits Certificate Signing Request (CSR) with public key to control plane via `POST /api/v1/agents/{id}/csr`.
|
||||
- Issued certificate is returned; private key remains on agent at `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`).
|
||||
|
||||
- **Server-Side Fallback** (demo/development only) — `CERTCTL_KEYGEN_MODE=server`:
|
||||
- Control plane generates RSA 2048-bit or ECDSA P-256 keys using `crypto/rand` + `crypto/rsa`.
|
||||
- Server signs CSR and stores the private key in the certificate version record for agent deployment. **Security note:** In server keygen mode, the control plane holds private keys — this is why agent keygen mode is the recommended default for production.
|
||||
- **Must not be used in production.** Explicit warning logged: `server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only`
|
||||
|
||||
- **Issuer-Specific Key Negotiation**:
|
||||
- **ACME (Let's Encrypt, ZeroSSL)**: The ACME CA controls which key types it accepts; certctl requests ECDSA P-256 by default.
|
||||
- **Local CA**: Supports RSA 2048+, ECDSA (P-256, P-384), PKCS#8 format. Key algorithm inherited from CA cert or specified via profile.
|
||||
- **step-ca**: Smallstep's provisioner defines key type; certctl respects server constraints.
|
||||
- **OpenSSL / Custom CA**: User-provided signing script; key type depends on CA backend.
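The agent-side flow described above (generate an ECDSA P-256 key locally, then send only a CSR) can be sketched with the Go standard library. This is a minimal illustration, not certctl's agent code: the function names and the subject values are assumptions, and the HTTP submission step is omitted.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
)

// generateKeyAndCSR creates a P-256 key pair and a PEM-encoded CSR.
// Only the CSR (which carries the public key) would leave the host.
func generateKeyAndCSR(commonName string, dnsNames []string) (*ecdsa.PrivateKey, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.CertificateRequest{
		Subject:            pkix.Name{CommonName: commonName},
		DNSNames:           dnsNames,
		SignatureAlgorithm: x509.ECDSAWithSHA256,
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, err
	}
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	return key, csrPEM, nil
}

// verifyCSR parses a PEM CSR, checks its self-signature, and returns the CN.
func verifyCSR(csrPEM []byte) (string, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil {
		return "", errors.New("no PEM block found")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return "", err
	}
	if err := csr.CheckSignature(); err != nil {
		return "", err
	}
	return csr.Subject.CommonName, nil
}

func main() {
	key, csrPEM, err := generateKeyAndCSR("web.example.com", []string{"web.example.com"})
	if err != nil {
		panic(err)
	}
	cn, err := verifyCSR(csrPEM)
	fmt.Println(key.Curve.Params().Name, cn, err)
}
```

The CSR signature check is what lets the receiving side confirm that the requester holds the private key without ever seeing it.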
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Deployment configuration: `CERTCTL_KEYGEN_MODE=agent` in production (verify in `docker-compose.yml`, Kubernetes manifests, or systemd units).
|
||||
- Agent log excerpt showing key generation: Go `ecdsa.GenerateKey(elliptic.P256(), rand.Reader)` via agent process logs with CSR submission timestamp.
|
||||
- Certificate CSR audit: `GET /api/v1/audit?type=certificate_issued` showing CSR fingerprint (SHA-256 hash of CSR PEM).
|
||||
- Renewal job logs showing agent-submitted CSR, not server-generated key.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Enforce `CERTCTL_KEYGEN_MODE=agent` in all production deployments.** Never use `server` mode outside demos.
|
||||
- Verify agent hardware is adequately isolated (`crypto/rand` relies on the quality of the OS random source, such as `getrandom(2)` or `/dev/urandom` on Linux).
|
||||
- Monitor `CERTCTL_KEY_DIR` on agents for unauthorized file access (use OS-level file audit if available).
|
||||
- Backup agent key directory (`/var/lib/certctl/keys`) as part of disaster recovery procedure.
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
#### 3.7.2 — Key Storage and Access Control
|
||||
|
||||
**Requirement**: Restrict cryptographic key access to the minimum required and protect keys from unauthorized access.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Agent-Side Key Storage** (M8) — Private keys written to `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`):
|
||||
- File permissions: `0600` (readable/writable by agent process owner only).
|
||||
- Filename convention: one file per certificate (e.g., `web-tls-prod.key`, `api-service.key`).
|
||||
- No key data passed over the network between agent and control plane (CSR only).
|
||||
- Keys are used only on the agent host for TLS handshakes and are never transmitted to the control plane or other systems.
|
||||
|
||||
- **Control Plane Key Storage** — Sensitive credentials managed via environment variables or `.env` files:
|
||||
- CA certificate and private key paths: `CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH` (for Local CA sub-CA mode).
|
||||
- ACME account key: embedded in ACME issuer config (not stored separately; ACME library handles in memory).
|
||||
- step-ca provisioner key: `CERTCTL_STEPCA_KEY_PATH` env var (path to JWK private key file, loaded into memory during runtime).
|
||||
- API keys: `CERTCTL_API_KEY` (SHA-256 hashed in database, plaintext never stored).
|
||||
- Database credentials: `CERTCTL_DATABASE_URL` in `.env` file, not in source code.
|
||||
|
||||
- **Docker Compose Credential Management** — `.env` file (git-ignored) holds all secrets:
|
||||
  ```bash
  CERTCTL_API_KEY=sk-test-...
  CERTCTL_DATABASE_URL=postgres://user:pass@db:5432/certctl
  CERTCTL_CA_KEY_PATH=/run/secrets/ca.key
  ```
|
||||
  Credentials never appear in `docker-compose.yml` or the Dockerfile.
|
||||
|
||||
- **Kubernetes Secrets** (operator responsibility) — Deploy control plane with:
|
||||
  ```yaml
  env:
    - name: CERTCTL_DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: certctl-secrets
          key: database-url
    - name: CERTCTL_API_KEY
      valueFrom:
        secretKeyRef:
          name: certctl-secrets
          key: api-key
  ```
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Agent key directory listing (without keys): `ls -la /var/lib/certctl/keys` (shows file count, permissions, timestamps).
|
||||
- Deployment manifest (`docker-compose.yml` or Kubernetes YAML) showing secrets via env var or Secret object (not inline).
|
||||
- `.env` file (do not share contents, only confirm existence and git-ignore status).
|
||||
- API key hash verification: `GET /api/v1/auth/check` with API key, verifying hash matching without plaintext exposure.
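The "hash matching without plaintext exposure" idea can be illustrated with a short Go sketch. It assumes an unsalted, hex-encoded SHA-256 digest, which is one plausible reading of "SHA-256 hashed in database"; the function names are hypothetical and the actual storage format may differ.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashAPIKey returns the hex SHA-256 digest that would be stored in place
// of the plaintext key (encoding is an assumption for this sketch).
func hashAPIKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// matches hashes the presented key and compares it to the stored digest
// in constant time, so comparison timing does not leak digest prefixes.
func matches(presented, storedHex string) bool {
	got := hashAPIKey(presented)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHex)) == 1
}

func main() {
	stored := hashAPIKey("example-api-key")
	fmt.Println(matches("example-api-key", stored))
	fmt.Println(matches("wrong-key", stored))
}
```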
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Store `.env` and credential files outside version control.** Verify `.gitignore` includes `.env`, `*.key`, `ca.key`, etc.
|
||||
- **Restrict file system access to `/var/lib/certctl/keys` on agents** via OS-level permissions (Linux: `chmod 0700`, owned by agent user).
|
||||
- **Limit CA key file read access** — `CERTCTL_CA_KEY_PATH` should be readable only by certctl server process (OS permissions).
|
||||
- **Rotate API keys periodically** (recommendation: annually or when personnel changes). No audit trail for API key rotation (outside certctl scope).
|
||||
- **Backup private key stores** (agent key dirs, CA key file) as part of disaster recovery. Encrypt backups at rest.
|
||||
- **Monitor access logs** to `/var/lib/certctl/keys` and CA key file location (use OS audit or file integrity monitoring).
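One way to spot-check the `0600` expectation on an agent is to walk the key directory and flag any file readable or writable by group or other. The Go sketch below assumes a Unix-like filesystem; `looseKeyFiles` and `makeDemoDir` are hypothetical helpers written for this illustration.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// looseKeyFiles returns files under dir whose mode grants any permission
// to group or other, i.e. anything looser than 0600/0400.
func looseKeyFiles(dir string) ([]string, error) {
	var loose []string
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if info.Mode().Perm()&0o077 != 0 {
			loose = append(loose, path)
		}
		return nil
	})
	return loose, err
}

// makeDemoDir creates a temporary directory with one 0600 file and one
// 0644 file so the check can be exercised without touching real keys.
func makeDemoDir() (string, error) {
	dir, err := os.MkdirTemp("", "certctl-keys-demo")
	if err != nil {
		return "", err
	}
	for name, mode := range map[string]fs.FileMode{"ok.key": 0o600, "loose.key": 0o644} {
		p := filepath.Join(dir, name)
		if err := os.WriteFile(p, []byte("placeholder"), 0o600); err != nil {
			return "", err
		}
		if err := os.Chmod(p, mode); err != nil {
			return "", err
		}
	}
	return dir, nil
}

func main() {
	dir, err := makeDemoDir()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	loose, err := looseKeyFiles(dir)
	fmt.Println(loose, err)
}
```

In practice the directory argument would be the agent's `CERTCTL_KEY_DIR`, and the output could feed a file integrity monitoring alert.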
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
#### 3.7.3 — Key Rotation
|
||||
|
||||
**Requirement**: Rotate cryptographic keys upon expiration or compromise.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Automated Certificate Renewal** — Renewal policies trigger certificate renewal automatically:
|
||||
- Background scheduler checks every 60 minutes (configurable via `CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL`).
|
||||
- For each policy, evaluates all managed certificates: if `(not-after - now) <= policy.renewal_threshold_days`, trigger renewal.
|
||||
- Renewal job created in AwaitingCSR state; agent receives work, generates new key pair, submits new CSR.
|
||||
- Issuer connector issues a certificate for the new CSR, which carries the new public key; the agent discards the old key after the new certificate is installed.
|
||||
- New certificate deployed to target via deployment job.
|
||||
|
||||
- **Expiration-Based Rotation** — Certificate profiles (M11a) define `max_ttl_seconds` (e.g., 31536000 for 1 year, 3600 for short-lived certs):
|
||||
- Short-lived certificates (TTL < 1 hour) rotate every deployment cycle, providing defense-in-depth (RFC 5280 revocation not needed).
|
||||
- Longer-lived certs (90/180/365 days) rotated via renewal policy thresholds (30/14/7 day alerts).
|
||||
|
||||
- **Renewal Audit Trail** — Every renewal recorded:
|
||||
- `GET /api/v1/audit?type=certificate_renewed&resource_id={cert_id}` shows each renewal, old serial, new serial, issuer, actor.
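The threshold rule used by the scheduler, `(not-after - now) <= policy.renewal_threshold_days`, can be illustrated with a minimal Python sketch. This is not certctl's scheduler code; the function name `needs_renewal` and the sample dates are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after: datetime, now: datetime, renewal_threshold_days: int) -> bool:
    """Return True when the remaining lifetime is at or below the threshold."""
    remaining = not_after - now
    return remaining <= timedelta(days=renewal_threshold_days)


# Hypothetical evaluation time and a 30-day threshold.
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
expiring_soon = needs_renewal(now + timedelta(days=29), now, 30)
far_from_expiry = needs_renewal(now + timedelta(days=31), now, 30)
```

Because the comparison is `<=`, a certificate that has already expired also satisfies the rule, so a missed check is picked up on the next scheduler pass.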
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Renewal policy configuration: `GET /api/v1/policies` showing `renewal_threshold_days` and `alert_thresholds_days`.
|
||||
- Renewal job history: `GET /api/v1/jobs?type=Renewal&status=Completed` with timestamp, before/after serial numbers.
|
||||
- Certificate version history: `GET /api/v1/certificates/{id}/versions` showing all issued versions, dates, issuers.
|
||||
- Audit trail: `GET /api/v1/audit?type=certificate_renewed` for trending and compliance reporting.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Define renewal policies for all certificate profiles** with appropriate thresholds (typically 30 days before expiration for 90+ day certs, more aggressive for shorter-lived).
|
||||
- **Monitor renewal job success** via dashboard (M14 charts show renewal success trends) and alerts.
|
||||
- **Investigate renewal failures** (stuck AwaitingCSR, issuer connectivity, deployment errors) promptly to avoid expired certificates.
|
||||
- **Test renewal workflow in staging environment** before rolling out to production.
|
||||
- **Document key rotation schedule** for your organization (renewal policy thresholds, approval workflows if AwaitingApproval).
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
#### 3.7.4 — Key Destruction
|
||||
|
||||
**Requirement**: Render cryptographic keys unreadable and unusable when they reach the end of their cryptographic lifetime.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Certificate Revocation API** (M15a) — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:
|
||||
- `unspecified` — general revocation
|
||||
- `keyCompromise` — suspected key compromise
|
||||
- `caCompromise` — CA compromise
|
||||
- `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn` — lifecycle management
|
||||
- Revocation recorded in `certificate_revocations` table with timestamp and reason.
|
||||
- Issuer notified (best-effort; ACME lacks standard revocation, Local CA skips issuer step).
|
||||
- Revocation notifications sent to owner via email/webhook/Slack/Teams/PagerDuty.
|
||||
|
||||
- **CRL and OCSP Publication** (M15b, M-006) — Revoked certificates published in:
|
||||
- CRL: `GET /.well-known/pki/crl/{issuer_id}` (DER X.509 signed by CA, 24h validity, RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`)
|
||||
- OCSP: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (returns revoked status for clients validating certificate chain, RFC 6960, `Content-Type: application/ocsp-response`)
|
||||
- Both endpoints are served unauthenticated so relying parties (browsers, TLS appliances) without certctl API keys can verify revocation — this is the RFC-compliant PKI model.
|
||||
- Clients checking certificate status via OCSP or CRL see revoked status within 24 hours.
|
||||
|
||||
- **Bulk Revocation for Incident Response** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. PCI-DSS Req 4 requires rapid response to data transmission security incidents — bulk revocation enables operators to revoke an entire certificate set (e.g., all certs used by a compromised team or endpoint) in minutes rather than hours.
|
||||
|
||||
- **Private Key Destruction on Agent** — When certificate renewed or revoked:
|
||||
- Agent removes old private key file from `CERTCTL_KEY_DIR` when new certificate deployed.
|
||||
- Job status tracking confirms old key is no longer needed.
|
||||
- No audit trail of key deletion (private keys don't pass through control plane).
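As a rough illustration of the revocation call, the sketch below validates a reason code against the list in this section and assembles the request. The JSON field name `reason`, the helper name `build_revoke_request`, and the certificate ID `mc-123` are assumptions; check the certctl API documentation for the actual request body.

```python
import json
from typing import Tuple

# Reason codes listed in this section (RFC 5280 CRLReason names).
REVOCATION_REASONS = frozenset({
    "unspecified", "keyCompromise", "caCompromise", "affiliationChanged",
    "superseded", "cessationOfOperation", "certificateHold", "privilegeWithdrawn",
})


def build_revoke_request(cert_id: str, reason: str) -> Tuple[str, str, str]:
    """Return (method, path, body) for a revocation call, rejecting unknown reasons."""
    if reason not in REVOCATION_REASONS:
        raise ValueError(f"unsupported revocation reason: {reason!r}")
    path = f"/api/v1/certificates/{cert_id}/revoke"
    body = json.dumps({"reason": reason})  # field name "reason" is an assumption
    return "POST", path, body


method, path, body = build_revoke_request("mc-123", "keyCompromise")
```

Validating the reason on the client side keeps typos such as `keycompromise` from reaching the control plane during an incident.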
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Revocation requests: `GET /api/v1/audit?type=certificate_revoked` with RFC 5280 reason codes.
|
||||
- CRL publication: HTTP GET `/.well-known/pki/crl/{issuer_id}` (unauthenticated) returns a DER X.509 CRL — parse with `openssl crl -inform der -noout -text` to show revoked serial numbers, reasons, and timestamps.
|
||||
- OCSP responder validation: Query `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated) for a known-revoked cert; response includes `revoked` status and can be parsed with `openssl ocsp` tooling.
|
||||
- Audit trail: Certificate status transitions (Active → Revoked) recorded in `audit_events`.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Revoke certificates immediately upon key compromise suspicion** using reason code `keyCompromise`.
|
||||
- **Revoke certificates at end of lifecycle** (host decommissioning, service sunset) using reason code `cessationOfOperation`.
|
||||
- **Monitor CRL/OCSP availability** — ensure clients can check revocation status (test with TLS validator tools).
|
||||
- **Establish certificate revocation procedure** (who can revoke, approval workflow if required, documentation).
|
||||
- **Physically destroy backup private keys** (if offline backups are kept) when certificate is revoked or after archival period expires.
|
||||
- **Test revocation workflow in staging** — issue test cert, revoke, verify OCSP/CRL reflects revocation within SLA.
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 8: Identify and Authenticate
|
||||
|
||||
**Objective**: Limit access to system components and cardholder data by business need-to-know, and authenticate and manage all access.
|
||||
|
||||
### 8.3 — Strong Authentication
|
||||
|
||||
**Requirement**: Authentication mechanisms must use strong cryptography and render authentication credentials (passwords, passphrases, keys) unreadable during transmission and storage.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **API Key Authentication** — All REST API endpoints require authentication (default):
|
||||
- Bearer token format: `Authorization: Bearer sk-...`
|
||||
- Key stored as SHA-256 hash in database (plaintext never persisted).
|
||||
- Comparison uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks.
|
||||
- Configuration: `CERTCTL_AUTH_TYPE=api-key` (enforced by default, no opt-out without explicit env var).
|
||||
|
||||
- **GUI Authentication Context** — Web dashboard login flow:
|
||||
- Login page (`/login`) accepts API key entry.
|
||||
- AuthProvider context stores API key in session (localStorage in browser, sent in Authorization header for all API calls).
|
||||
- 401 Unauthorized responses trigger automatic redirect to login.
|
||||
- Logout button clears session.
|
||||
- No session server-side (stateless API).
|
||||
|
||||
- **Credential Transmission** — All API traffic over TLS:
|
||||
- HTTPS enforced at server level (no plaintext HTTP).
|
||||
- API key transmitted in Authorization header (not URL parameter, not cookie).
|
||||
- Browser to server: TLS.
|
||||
- Agent to server: TLS.
|
||||
- No credential logging (audit records the per-key actor `Name`, never the Bearer token; logs redact the `Authorization` header).
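The hash-then-compare pattern described above can be sketched in Python, where `hmac.compare_digest` plays the role that `crypto/subtle.ConstantTimeCompare` plays in Go. This is an illustrative analogue, not certctl's middleware; the helper names and the sample key `sk-example` are assumptions.

```python
import hashlib
import hmac
from typing import Optional


def hash_api_key(api_key: str) -> str:
    """Return the hex SHA-256 digest of a key; only this digest would be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def bearer_token(authorization_header: str) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


stored = hash_api_key("sk-example")  # hypothetical key, for illustration only
token = bearer_token("Bearer sk-example")
```

Comparing digests rather than raw keys means the comparison always runs over inputs of equal length, which is what the constant-time guarantee relies on.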
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- API configuration: `CERTCTL_AUTH_TYPE=api-key` in deployment manifest.
|
||||
- Key inventory: `CERTCTL_API_KEYS_NAMED` env var (format `name:key:admin,...`) — seeds the in-memory `NamedAPIKey{Name, Key, Admin}` struct at `internal/api/middleware/middleware.go:29`. Keys are constant-time-compared (`subtle.ConstantTimeCompare`) against the Bearer token. No database table stores them; protect the env var contents at rest via a secrets manager (Vault / AWS Secrets Manager / Kubernetes Secrets / Docker Secrets).
|
||||
- API audit log: `GET /api/v1/audit?action=api_call` showing per-key actor names (`Name` field of matched `NamedAPIKey`) on every call, with zero plaintext or hashed key material recorded.
|
||||
- TLS certificate on control plane: `openssl s_client -connect {server}:8443` showing valid certificate, TLS 1.2+, strong cipher.
|
||||
- GUI login flow: browser network tab showing Authorization header (token value redacted in compliance report).
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Issue API keys to users/systems** requiring API access (outside certctl; you maintain key registry).
|
||||
- **Rotate API keys using zero-downtime rotation** — `CERTCTL_AUTH_SECRET` supports comma-separated keys (e.g., `new-key,old-key`). Add the new key, migrate clients, then remove the old key. Recommendation: rotate at least annually, or immediately when personnel changes.
|
||||
- **Revoke API keys immediately** when user leaves or token is compromised (set `enabled=false` in API key management — not yet implemented in v1, owner must track manually).
|
||||
- **Enforce strong TLS** on control plane: TLS 1.2+, modern ciphers (configure on reverse proxy or `CERTCTL_TLS_*` env vars if operator-controlled).
|
||||
- **Protect `.env` and credential files** where API key is defined (restrict file system access, no version control).
|
||||
- **Monitor API audit trail** for suspicious access patterns (many 401 errors, access from unexpected IPs, etc.).
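The zero-downtime rotation step above relies on the server accepting any key in a comma-separated list. A minimal sketch of that behaviour, assuming whitespace around entries is ignored (an assumption, not confirmed by the source), might look like this:

```python
import hmac
from typing import List


def parse_auth_secret(raw: str) -> List[str]:
    """Split a comma-separated secret list such as 'new-key,old-key'."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorized(token: str, accepted_keys: List[str]) -> bool:
    """Check the token against every accepted key using constant-time comparison."""
    matched = False
    for key in accepted_keys:
        # No early exit: every key is compared, so timing does not depend on
        # which entry in the list matched.
        if hmac.compare_digest(token.encode("utf-8"), key.encode("utf-8")):
            matched = True
    return matched


during_rotation = parse_auth_secret("new-key,old-key")
after_rotation = parse_auth_secret("new-key")
```

During the migration window both keys authenticate; once the old key is dropped from the list, clients still presenting it are rejected.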
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
### 8.6 — Application Account Management
|
||||
|
||||
**Requirement**: Users' system access must be restricted to the minimum level of application functions or data needed to perform duties. Application accounts (non-human) must use strong authentication.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **No Application Account Management in v1** — Certctl does not manage user accounts (no user directory, LDAP, OIDC).
|
||||
- All authentication via API key (service-to-service or human user with API key).
|
||||
- No per-user roles or permissions (planned as a V3 RBAC feature).
|
||||
- Single API key shared across team or one key per automation script (operator's responsibility to manage).
|
||||
|
||||
- **Credentials Not in Source Code** — Security hardening:
|
||||
- API keys via `CERTCTL_API_KEY` env var (not in `main.go`, Dockerfile, `docker-compose.yml`).
|
||||
- Database credentials via `CERTCTL_DATABASE_URL` in `.env` (git-ignored).
|
||||
- CA private key path via `CERTCTL_CA_CERT_PATH`/`CERTCTL_CA_KEY_PATH` (not inline).
|
||||
|
||||
- **Service Account Isolation** (planned for V3) — Future RBAC will support:
|
||||
- Automation script API keys with scoped permissions (e.g., read-only, renew-only, deploy-only).
|
||||
- OIDC/SSO for human users with fine-grained role assignment (admin, operator, viewer).
|
||||
- Audit trail showing which account/role performed each action.
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Deployment manifest (Dockerfile, docker-compose.yml) showing no hardcoded API keys, database credentials, or CA key paths.
|
||||
- `.env` file existence (confirm via CI or compliance check, without sharing contents).
|
||||
- `.gitignore` configuration showing `.env`, `*.key`, secrets excluded.
|
||||
- Code review: grep `main.go`, `config.go` for `CERTCTL_API_KEY` — should only see env var reference, not hardcoded values.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Manage API keys externally** (issue, rotate, revoke).
|
||||
- **Document who/what has API key access** (automation scripts, team members, third-party integrations).
|
||||
- **Rotate application credentials** (API keys, database passwords) according to your organization's policy.
|
||||
- **Segregate credentials** — one API key per automation script where possible, or use V3 RBAC scoping.
|
||||
- **Monitor application account usage** via audit trail — `GET /api/v1/audit` filtered by action/actor.
|
||||
|
||||
**Status**: **Available in part** (v1.0: credentials out of source code). **Planned V3**: scoped API keys and RBAC.
|
||||
|
||||
---
|
||||
|
||||
## Requirement 10: Log and Monitor
|
||||
|
||||
**Objective**: Log and monitor access to network resources and cardholder data.
|
||||
|
||||
### 10.2 — Implement Automated Audit Logging
|
||||
|
||||
**Requirement**: Automatically log and monitor all access to system components and records containing cardholder data.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Immutable API Audit Log** (M19) — Middleware captures every API call:
|
||||
- `audit_events` table (append-only, no UPDATE/DELETE):
|
||||
- `method`: HTTP method (GET, POST, PUT, DELETE)
|
||||
- `path`: API endpoint path only, excluding query parameters (e.g., `/api/v1/certificates` — query strings intentionally omitted to prevent sensitive data persistence in the append-only audit trail)
|
||||
- `actor`: authenticated user/service (extracted from API key or context)
|
||||
- `body_hash`: SHA-256 hash of request body (truncated to 16 chars, first 8 chars shown in logs)
|
||||
- `status_code`: HTTP response status (200, 201, 400, 401, 404, 500, etc.)
|
||||
- `latency_ms`: request duration in milliseconds
|
||||
- `timestamp`: RFC 3339 timestamp
|
||||
|
||||
- **Certificate Lifecycle Events** — Higher-level events logged separately:
|
||||
- `certificate_issued` — new certificate created, issuer, profile, profile ID
|
||||
- `certificate_renewed` — certificate renewed, old/new serial, renewal policy
|
||||
- `certificate_revoked` — certificate revoked, RFC 5280 reason code
|
||||
- `certificate_deployed` — certificate deployed to target, agent, target type
|
||||
- `certificate_validated` — validation job result (success/failure reason)
|
||||
|
||||
- **Job Lifecycle Events** — Job status transitions:
|
||||
- `job_created` — renewal/issuance/deployment/validation job created
|
||||
- `job_status_updated` — job state change (Pending → AwaitingCSR → Running → Completed/Failed)
|
||||
|
||||
- **Policy and Configuration Events** — Administrative changes:
|
||||
- `policy_created`, `policy_updated`, `policy_deleted` — renewal policy changes
|
||||
- `profile_created`, `profile_updated`, `profile_deleted` — certificate profile changes
|
||||
- `issuer_created`, `issuer_deleted` — CA connector registration changes
|
||||
|
||||
- **Excluded Paths** — Health/readiness probes not logged to reduce noise:
|
||||
- `GET /health` (excluded by default)
|
||||
- `GET /ready` (excluded by default)
|
||||
- Configurable via `CERTCTL_AUDIT_EXCLUDE_PATHS` env var
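To make the field rules above concrete, the following Python sketch assembles an audit record the way this section describes it: the query string is dropped from the path, the body hash is a SHA-256 digest truncated to 16 characters, and health probes are skipped. It is an illustration only; the function name `build_audit_event` and the sample actor are assumptions, and certctl's actual middleware is written in Go.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit

# Default exclusions named in this section.
EXCLUDED_PATHS = {"/health", "/ready"}


def build_audit_event(method: str, url: str, actor: str, body: bytes,
                      status_code: int, latency_ms: int,
                      now: Optional[datetime] = None) -> Optional[dict]:
    """Return an audit record, or None when the path is excluded."""
    path = urlsplit(url).path  # drops the query string before persistence
    if path in EXCLUDED_PATHS:
        return None
    timestamp = now or datetime.now(timezone.utc)
    return {
        "method": method,
        "path": path,
        "actor": actor,
        "body_hash": hashlib.sha256(body).hexdigest()[:16],
        "status_code": status_code,
        "latency_ms": latency_ms,
        "timestamp": timestamp.isoformat(),  # RFC 3339 compatible for aware datetimes
    }


event = build_audit_event(
    "GET", "/api/v1/certificates?owner=alice", "ci-bot", b"", 200, 12,
    now=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```

Stripping the query string before the record is built means a filter value such as `owner=alice` never reaches the append-only table.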
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Audit trail export: `GET /api/v1/audit` or manual database query, showing sample events with timestamp, actor, action, resource.
|
||||
- API call audit log: Query `audit_events` table showing method, path, actor, status code for last 24-48 hours.
|
||||
- Configuration changes: `GET /api/v1/audit?type=policy_created,policy_updated,issuer_created` showing who changed what and when.
|
||||
- Certificate lifecycle: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing complete issuance → deployment → renewal/revocation history.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Enable audit logging** — it's on by default; verify `CERTCTL_AUDIT_EXCLUDE_PATHS` is not set to exclude certificate-related paths.
|
||||
- **Monitor audit log growth** — `audit_events` table will grow with every API call. Recommend database maintenance (log rotation policy, archival after 90 days, etc.).
|
||||
- **Export and archive audit logs** — periodically `SELECT * FROM audit_events WHERE timestamp > {date}` and export to secure storage (S3, syslog, SIEM).
|
||||
- **Establish audit review procedure** — QSA may request sample of logs; have export process documented.
|
||||
- **Test audit logging** — make API call, verify event appears in audit trail within seconds.
|
||||
|
||||
**Status**: **Available** (M19 shipped)
|
||||
|
||||
### 10.3 — Protect Audit Trail
|
||||
|
||||
**Requirement**: Promptly protect audit trail files from unauthorized modifications.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Append-Only Database Design** — PostgreSQL triggers and constraints prevent modification:
|
||||
- `audit_events` table has no `UPDATE` or `DELETE` triggers.
|
||||
- Application code never executes UPDATE/DELETE on `audit_events`.
|
||||
- Primary key is `id` (serial); new events always INSERT.
|
||||
|
||||
- **Read-Only API Access** — Audit events accessible only via read (`GET /api/v1/audit`):
|
||||
- No `POST /api/v1/audit/{id}` endpoint (no creation from API).
|
||||
- No `PUT /api/v1/audit/{id}` endpoint (no modification).
|
||||
- No `DELETE /api/v1/audit/{id}` endpoint (no deletion).
|
||||
- Only control plane can record events (via internal service layer, not exposed API).
|
||||
|
||||
- **Database Access Control** (operator responsibility) — PostgreSQL user permissions:
|
||||
- `certctl` application user: INSERT, SELECT on `audit_events`.
|
||||
- `certctl_read_only` user (for compliance/audit team): SELECT only on `audit_events`.
|
||||
- `postgres` superuser: restricted to DBA operations, logged separately by PostgreSQL.
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Database schema: `\d audit_events` showing columns, primary key, no UPDATE/DELETE triggers.
|
||||
- Application code review: `internal/service/audit.go` showing `RecordEvent(...)` as only INSERT operation.
|
||||
- API endpoint audit: grep `internal/api/handler/audit*.go` or `internal/api/router/router.go` — no PUT/DELETE routes for events.
|
||||
- PostgreSQL permissions: `psql -d certctl -c "\dp audit_events"` showing INSERT/SELECT grants only.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Restrict database access** — issue read-only PostgreSQL user for compliance/audit team (no write privileges).
|
||||
- **Enable PostgreSQL query logging** — log all database connections and operations for DBA audit trail.
|
||||
- **Backup audit logs** — regularly export `audit_events` to offsite storage (S3, archive tape, syslog aggregator) for long-term retention.
|
||||
- **Monitor database modifications** — alert if any UPDATE/DELETE is attempted on `audit_events` (log-based alerting or PostgreSQL event triggers).
|
||||
- **Encrypt audit exports** — if archiving to external storage, encrypt backups at rest.
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
### 10.4 — Promptly Review and Address Audit Trail Exceptions
|
||||
|
||||
**Requirement**: Promptly review audit logs and investigate exceptions/anomalies.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Dashboard Charts** (M14) — Real-time observability:
|
||||
- **Renewal Success Trends** (30-day line chart) — shows job success rate; spikes in failures warrant investigation.
|
||||
- **Certificate Status Distribution** (donut chart) — shows Expiring/Expired counts; high Expired = missed renewals.
|
||||
- **Expiration Timeline** (90-day weekly heatmap) — shows upcoming expirations; bunching = renewal policy tuning needed.
|
||||
- **Issuance Rate** (30-day bar chart) — shows certificate creation/renewal activity; anomalies (zero issuances for weeks) indicate stopped automation.
|
||||
|
||||
- **Stats API** (M14) — Machine-readable trends:
|
||||
- `GET /api/v1/stats/job-trends?days=30` — renewal/issuance/deployment success/failure counts per day.
|
||||
- `GET /api/v1/stats/summary` — total certs, counts by status.
|
||||
- `GET /api/v1/stats/expiration-timeline?days=90` — expiration buckets for forecasting.
|
||||
|
||||
- **Agent Fleet Overview** (M14) — Agent health visibility:
|
||||
- Pie chart: agent status distribution (healthy, offline, error).
|
||||
- Version breakdown: agent versions in use (identify outdated agents).
|
||||
- Per-agent detail: last heartbeat timestamp, OS/architecture, IP address, recent jobs.
|
||||
|
||||
- **Alert Notifications** (M3, M16a) — Configurable escalation:
|
||||
- Email alerts: certificate approaching expiration, renewal failure, revocation notification.
|
||||
- Webhook: custom HTTP POST to your monitoring system (Slack, Teams, PagerDuty, OpsGenie, custom webhook).
|
||||
- **Retry & Dead-Letter Queue** (I-005) — Transient notifier failures (SMTP timeout, webhook 5xx) are retried with exponential backoff (`2^n` minutes capped at 1h, 5-attempt budget) before landing in the terminal `dead` status. Operators monitor DLQ depth via the `certctl_notification_dead_total` Prometheus counter and requeue via the Notifications page Dead letter tab once the underlying outage is resolved. Closes the pre-I-005 silent-drop gap where a single 5xx could lose a compliance-relevant alert without evidence.
|
||||
- Deduplication: one alert per threshold/certificate per day (avoid alert fatigue).
|
||||
|
||||
- **Audit Trail Filtering and Export** (M13) — Compliance reporting:
|
||||
- `GET /api/v1/audit?actor={user}&timestamp_after={date}` — filter audit log by actor, timestamp, type.
|
||||
- Export CSV/JSON via dashboard: audit page → select filters → "Export CSV" or "Export JSON".
|
||||
- Can export full audit trail for QSA review.
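The retry schedule for notifications (`2^n` minutes capped at 1h, 5-attempt budget) can be sketched as follows. The source does not say whether `n` starts at 0 or 1; this sketch assumes `n` is the number of failed attempts so far, starting at 1. The function names are hypothetical.

```python
from typing import Optional, Tuple

MAX_ATTEMPTS = 5   # attempt budget stated in this section
CAP_MINUTES = 60   # 1h cap stated in this section


def backoff_minutes(failed_attempts: int) -> int:
    """Exponential backoff: 2^n minutes, never more than the cap."""
    return min(2 ** failed_attempts, CAP_MINUTES)


def next_state(failed_attempts: int) -> Tuple[str, Optional[int]]:
    """Return ('retry', delay_minutes) or ('dead', None) once the budget is spent."""
    if failed_attempts >= MAX_ATTEMPTS:
        return ("dead", None)
    return ("retry", backoff_minutes(failed_attempts))


# Delays after the 1st through 4th failure under the assumption above.
schedule = [backoff_minutes(n) for n in range(1, MAX_ATTEMPTS)]
```

Under this reading the delays within the 5-attempt budget stay well below the cap, so the 1h ceiling only matters if the budget is raised.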
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Dashboard screenshots: expiration timeline, renewal success trends, status distribution.
|
||||
- Job trend report: `GET /api/v1/stats/job-trends?days=90` showing success/failure rates.
|
||||
- Agent fleet health: `GET /api/v1/agents` showing heartbeat status, version count distribution.
|
||||
- Audit log sample: `GET /api/v1/audit?limit=100` showing certificate issuance/renewal/revocation activity.
|
||||
- Alert configuration: screenshot of renewal policy `alert_thresholds_days` (30, 14, 7, 0) and notifier settings (email, Slack, etc.).
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Review dashboard charts weekly** — look for anomalies (high Expired count, failure spike, renewal stalled).
|
||||
- **Respond to alerts promptly** — expiration alert = investigate renewal (check job logs, issuer connectivity, agent heartbeat).
|
||||
- **Set alert thresholds appropriately** — default 30/14/7/0 days is a starting point; adjust per your SLA and staffing.
|
||||
- **Maintain alert distribution list** — ensure alerts reach the right on-call engineer/team.
|
||||
- **Archive and review audit logs** — export monthly/quarterly for compliance trending (e.g., "all certificate changes last quarter").
|
||||
- **Test alert delivery** — trigger a test renewal failure or manual revocation, verify alert is sent.
|
||||
|
||||
**Status**: **Available** (v1.0 shipped, M14 observability charts, M19 audit log)
|
||||
|
||||
### 10.7 — Retain and Protect Audit Trail History
|
||||
|
||||
**Requirement**: Retain audit trail history for at least one year and ensure it can be retrieved.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Immutable Audit Trail** (M19) — `audit_events` table stores all API calls and certificate lifecycle events with timestamps.
|
||||
- **No Automatic Purge** — Certctl does not delete audit events. They remain in PostgreSQL indefinitely.
|
||||
- **Queryable History** — All events accessible via `GET /api/v1/audit` with time range, actor, resource filters.
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Database retention policy: confirm `audit_events` table has no DELETE triggers or maintenance jobs that purge events.
|
||||
- Sample audit query: `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '365 days'` showing one year+ of events.
|
||||
- Export procedure: documented process for exporting audit logs to cold storage (S3, archive tape, syslog).
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Configure PostgreSQL backup/retention** — certctl relies on database backups for audit trail protection.
|
||||
- Backup `audit_events` table daily or per your RPO/RTO.
|
||||
- Retain backups for at least 1 year (configure retention policy on backup system).
|
||||
- Test restore procedure annually.
|
||||
|
||||
- **Export and archive audit logs** — periodically export `SELECT * FROM audit_events WHERE timestamp > {start_date}` to offsite storage.
|
||||
- Recommendation: monthly exports to S3 with versioning enabled.
|
||||
- Encrypt exports at rest.
|
||||
- Retain archives for at least 3 years (adjust per your compliance requirements).
|
||||
|
||||
- **Monitor audit log growth** — `audit_events` table will grow ~1-5 MB/day depending on API call volume.
|
||||
- Estimate: 10,000 API calls/day = ~50 MB/month.
|
||||
- Plan PostgreSQL storage and backup capacity accordingly.
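The growth estimate above can be turned into a small capacity calculation. The sketch below derives the per-event size implied by "10,000 API calls/day = ~50 MB/month" and projects it over a year. It assumes a 30-day month and decimal megabytes, and it ignores index and WAL overhead, so treat the output as a lower bound rather than a measured figure.

```python
def implied_bytes_per_event(mb_per_month: float, calls_per_day: int, days: int = 30) -> float:
    """Average row size implied by a monthly growth figure."""
    return mb_per_month * 1_000_000 / (calls_per_day * days)


def projected_mb(calls_per_day: int, bytes_per_event: float, days: int) -> float:
    """Projected table growth in MB over the given number of days."""
    return calls_per_day * bytes_per_event * days / 1_000_000


per_event = implied_bytes_per_event(50, 10_000)      # roughly 167 bytes per event
per_day_mb = projected_mb(10_000, per_event, 1)      # roughly 1.67 MB/day
one_year_mb = projected_mb(10_000, per_event, 365)   # roughly 608 MB/year
```

The daily figure falls inside the 1-5 MB/day range quoted above, and the one-year projection gives a starting point for sizing the one-year retention window.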
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 6: Develop and Maintain Secure Systems and Applications
|
||||
|
||||
**Objective**: Develop and maintain secure systems and applications.
|
||||
|
||||
### 6.3.1 — Security Coding Practices
|
||||
|
||||
**Requirement**: Develop all custom application code in accordance with secure coding practices and include authentication, access control, input validation, and error handling.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Input Validation** — Centralized validators enforce strong input constraints:
|
||||
- Common name: max 253 chars, DNS-safe characters only, no leading/trailing hyphens.
|
||||
- CSR PEM: must be valid PEM format (regex validation).
|
||||
- Policy type: whitelist enum (Issuance, Renewal, Revocation, etc.).
|
||||
- API key: alphanumeric + hyphens only.
|
||||
- Implemented in `internal/domain/validation.go` and called from all handler layer inputs.
|
||||
|
||||
- **Error Handling** — No sensitive data leakage in error responses:
|
||||
- HTTP 500 errors return generic "Internal Server Error" message, not stack trace.
|
||||
- Database errors logged internally (structured slog), not exposed to client.
|
||||
- 404 responses do not reveal whether a resource exists: the same "Not Found" response is returned whether the resource is missing or the caller is not authorized to see it.
|
||||
|
||||
- **No Hardcoded Credentials** — All secrets via environment variables:
|
||||
- `CERTCTL_API_KEY`, `CERTCTL_DATABASE_URL`, `CERTCTL_CA_KEY_PATH` — env vars only.
|
||||
- Credentials not in `main.go`, Dockerfile, `docker-compose.yml`, or Git history.
|
||||
- `.env` file git-ignored and excluded from version control.
|
||||
|
||||
- **Dependency Management** — Go module pinning (`go.mod`):
|
||||
- All external dependencies pinned to specific versions.
|
||||
- No wildcard versions or `latest` tags.
|
||||
- CI runs `go mod verify` to detect tampering.
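The common-name rule above (max 253 chars, DNS-safe characters, no leading or trailing hyphens) can be sketched in Python. This is not the content of `internal/domain/validation.go`; it assumes the hyphen rule applies to each dot-separated label and does not handle wildcard names, neither of which the source specifies.

```python
import re

MAX_COMMON_NAME_LENGTH = 253

# One DNS label: letters, digits, and hyphens, with no hyphen at either end.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_valid_common_name(name: str) -> bool:
    """Validate length and per-label character rules for a common name."""
    if not name or len(name) > MAX_COMMON_NAME_LENGTH:
        return False
    # fullmatch rejects empty labels (e.g. 'a..b') and trailing newlines.
    return all(_LABEL.fullmatch(label) for label in name.split("."))


accepted = is_valid_common_name("api.example.com")
rejected = is_valid_common_name("-bad.example.com")
```

Using an allow-list pattern per label, rather than stripping bad characters, keeps the validator from silently changing what the caller asked for.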
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Code review: `internal/domain/validation.go` showing input validation functions (Common name length, CSR PEM, policy type, etc.).
|
||||
- Error handling audit: `internal/api/handler/certificates.go` showing HTTP error responses (no stack traces).
|
||||
- Credentials in source code check: `grep -r "CERTCTL_API_KEY\|DATABASE_URL\|CA_KEY" cmd/ internal/ | grep -v ".env"` (should only show env var references, not values).
|
||||
- `go.mod` review: no wildcard versions, all pinned.
|
||||
- CI workflow: `.github/workflows/ci.yml` showing `go mod verify` step.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Review dependency updates** — keep Go version current, update certctl dependencies regularly (security patches).
|
||||
- **Scan container images** — use Trivy, Clair, or similar to scan Docker images for known vulnerabilities.
|
||||
- **Maintain secure coding practices** in any custom issuer/target connectors you deploy (scripts for OpenSSL, BASH/PowerShell for IIS/F5).
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
### 6.5.10 — Broken Authentication and Cryptography Prevention
|
||||
|
||||
**Requirement**: Prevent broken authentication and cryptography weaknesses.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Authentication** — API key with SHA-256 hashing, constant-time comparison (`crypto/subtle.ConstantTimeCompare`).
|
||||
- **Cryptography** — Go's `crypto/*` standard library (no weak ciphers). ECDSA P-256, RSA 2048+.
|
||||
- **TLS** — HTTPS enforced (no plaintext HTTP endpoints).
|
||||
- **No Sessions** — Stateless API (no session cookies, no session fixation risk).
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 7: Restrict Access by Business Need-to-Know
|
||||
|
||||
**Objective**: Limit access to system components and cardholder data by business need-to-know and ensure users are authenticated and authorized.
|
||||
|
||||
### 7.2 — Implement Access Control
|
||||
|
||||
**Requirement**: Ensure proper user identity management and implement access controls based on business need-to-know.
|
||||
|
||||
**certctl v1 Support** (limited):
|
||||
- **Certificate Ownership** (M11b) — Each certificate assigned to owner (person + email) and optional team. Ownership is metadata; access control is not enforced at API level.
|
||||
- **Agent Groups** (M11b) — Renewal policies target specific agent groups (OS, architecture, CIDR, version). Groups are used for policy targeting, not user access control.
|
||||
- **Interactive Approval** (M11b) — `AwaitingApproval` job state allows manual approval/rejection of renewals (enforcement of business workflows, not user access control).
|
||||
|
||||
**certctl v3 Support** (planned):
|
||||
- **OIDC/SSO** — Okta, Azure AD, Google integration. Users log in via identity provider.
|
||||
- **Role-Based Access Control (RBAC)** — Three roles: admin (all operations), operator (issue/renew/deploy), viewer (read-only). Roles assigned via OIDC claims or group membership.
|
||||
- **Profile/Owner Gating** — Operator can renew only certificates assigned to their team; viewer cannot modify anything.
|
||||
- **Audit Trail Attribution** — Every action shows which user/role performed it.
|
||||
|
||||
**Evidence You Can Provide** (v1):
|
||||
- Certificate ownership mapping: `GET /api/v1/certificates` showing owner, team fields (metadata only; access not controlled).
|
||||
- Agent group targeting: `GET /api/v1/policies` showing `agent_group_id` field.
|
||||
- Interactive approval workflow: job detail showing `AwaitingApproval` state, approve/reject endpoints in API docs.
|
||||
|
||||
**Operator Responsibility** (v1):
|
||||
- **Manage API key distribution** externally — only issue API keys to authorized users/systems.
|
||||
- **Implement reverse proxy auth** (Nginx, Apache, Okta proxy) in front of certctl to enforce OIDC/LDAP (outside certctl).
|
||||
- **Plan for V3 RBAC** — budget for upgrade when finer-grained access control is needed.
|
||||
|
||||
**Planned** (V3):
|
||||
- Upgrade to certctl Pro with OIDC/RBAC and per-role audit trail.
|
||||
|
||||
**Status**: **Available in part** (v1.0: ownership metadata, agent group targeting). **Planned V3**: OIDC/RBAC enforcement.
|
||||
|
||||
---
|
||||
|
||||
## Evidence Summary Table
|
||||
|
||||
| PCI-DSS Requirement | certctl Feature | API/UI Evidence | Database/Config | Audit Trail | Status |
|
||||
|---|---|---|---|---|---|
|
||||
| **4.2.1** Strong Crypto | TLS cert issuance, ACME/step-ca/Local CA, RSA 2048+/ECDSA P-256 | `GET /api/v1/certificates` (key_type, key_size) | Certificate profiles | `GET /api/v1/audit?type=certificate_issued` | Available |
|
||||
| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /.well-known/pki/crl/{issuer_id}`, `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (both unauthenticated, RFC 5280 / RFC 6960) | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
|
||||
| **3.6** Key Documentation | Profiles, owner/team tracking, issuer config, audit trail | `GET /api/v1/profiles`, `GET /api/v1/issuers`, certificate detail with owner/team | Profiles, certificate owner/team fields, issuer config | `GET /api/v1/audit?resource_type=certificate` | Available |
|
||||
| **3.7.1** Key Generation | Agent-side ECDSA P-256, server keygen (demo only) | Agent logs, renewal job detail, CSR audit | `CERTCTL_KEYGEN_MODE=agent` (config), job state `AwaitingCSR` | `GET /api/v1/audit?type=certificate_issued` with CSR hash | Available |
|
||||
| **3.7.2** Key Storage | Agent `/var/lib/certctl/keys` (0600), env var secrets, .env excluded | Deployment manifest (env var refs), agent key dir listing | `.env` file (git-ignored), `CERTCTL_KEY_DIR`, `CERTCTL_CA_KEY_PATH` | No API audit (keys off-platform) | Available |
|
||||
| **3.7.3** Key Rotation | Auto renewal, expiration thresholds, renewal jobs | Dashboard renewal trends, `GET /api/v1/jobs?type=Renewal`, certificate versions | Renewal policies, certificate version history | `GET /api/v1/audit?type=certificate_renewed` | Available |
|
||||
| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, unauthenticated `GET /.well-known/pki/crl/{issuer_id}` and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
|
||||
| **8.3** Strong Authentication | API key (SHA-256 hash, TLS), GUI login, 401 redirect | GUI login screenshot, API key auth header, TLS cert | API key hash in database | `GET /api/v1/audit` showing API calls | Available |
|
||||
| **8.6** Account Management | Credentials out of source, .env excluded, env var config | Code review (no hardcoded secrets), `.gitignore` check | Deployment manifests showing env var refs only | No account lifecycle audit (outside scope) | Available in part |
|
||||
| **10.2** Audit Logging | API audit middleware (M19), certificate lifecycle events | `GET /api/v1/audit` with filter/pagination | `audit_events` table (every API call) | Real-time via API | Available |
|
||||
| **10.3** Audit Protection | Append-only table design, read-only API, DB permissions | API endpoint audit (no PUT/DELETE on events), DB schema | `audit_events` table, PostgreSQL GRANT SELECT | Immutable by design | Available |
|
||||
| **10.4** Review & Alert | Dashboard charts, stats API, notifier integrations | Dashboard (renewal trends, status pie, expiration heatmap), `GET /api/v1/stats/*` | Job results, alert config in policies | `GET /api/v1/audit?type=job_*` | Available |
|
||||
| **10.7** Retention | 1+ year in PostgreSQL, export/archive procedures | Database query `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '1 year'` | `audit_events` table retention (no auto-delete) | Manual export/archival (operator) | Available |
|
||||
| **6.3.1** Secure Coding | Input validation, error handling, no hardcoded secrets, dependency pinning | Code review (validation.go, handlers), error responses | `go.mod` with pinned versions, `.gitignore` | GitHub Actions CI with `go mod verify` | Available |
|
||||
| **7.2** Access Control | Ownership metadata, agent groups, interactive approval | `GET /api/v1/certificates` (owner/team), `GET /api/v1/agent-groups` | Certificate owner/team fields, agent group criteria | User identity from auth context | Available in part (V3: RBAC) |
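
As one way to gather evidence for requirement 4.2.1, the sketch below filters a certificate listing for keys weaker than RSA 2048 / ECDSA P-256. It assumes each record exposes `key_type` and `key_size` as named in the table above; the literal values (`"RSA"`, `"ECDSA"`), the `id` field, and the helper name `find_weak_keys` are illustrative assumptions, not certctl's confirmed response schema.

```python
# Minimum acceptable key sizes per algorithm (RSA 2048+, ECDSA P-256).
MIN_KEY_BITS = {"RSA": 2048, "ECDSA": 256}


def find_weak_keys(certificates):
    """Return IDs of certificates whose key is weaker than the minimums.

    `certificates` is a list of dicts shaped like an assumed item from
    GET /api/v1/certificates (hypothetical field values).
    """
    weak = []
    for cert in certificates:
        key_type = str(cert.get("key_type", "")).upper()
        minimum = MIN_KEY_BITS.get(key_type)
        # Unknown algorithms are flagged for manual review rather than passed.
        if minimum is None or int(cert.get("key_size", 0)) < minimum:
            weak.append(cert.get("id"))
    return weak
```

An empty result from a check like this can be attached to the evidence package alongside the certificate profile definitions.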
|
||||
|
||||
---
|
||||
|
||||
## Operator Responsibilities
|
||||
|
||||
The following control objectives are **outside certctl's scope** and must be managed by your organization:
|
||||
|
||||
| Control Objective | Responsibility | Example Actions |
|
||||
|---|---|---|
|
||||
| **Network Segmentation** | Isolate certctl control plane from cardholder network | Place certctl on separate VLAN, firewall rules |
|
||||
| **Physical Security** | Restrict access to servers/databases | Data center access controls, logging |
|
||||
| **Personnel Screening** | Background checks for staff with access | HR/employment verification |
|
||||
| **Access Control Enforcement** | User authentication & authorization outside API | Implement reverse proxy with OIDC (V3: use certctl Pro RBAC) |
|
||||
| **Incident Response** | Procedures for certificate compromise or breach | Document key revocation process, alert escalation |
|
||||
| **Disaster Recovery** | Backup and restore procedures | Database backup schedule, offsite replication |
|
||||
| **Change Management** | Approval process for config/cert changes | CAB meetings, documented procedures |
|
||||
| **Vulnerability Scanning** | ASV scanning, penetration testing, code review | Annual PCI-DSS penetration test |
|
||||
| **Key Backup & Escrow** | Secure offline storage of CA private keys (if required) | Hardware security module (HSM) or encrypted vault |
|
||||
| **Audit Log Retention** | Long-term archival and protection of audit logs | Export to S3/syslog, retain 3+ years |
|
||||
| **QSA Engagement** | Schedule and coordination of compliance assessment | Annual audit with qualified security assessor |
|
||||
|
||||
---
|
||||
|
||||
## V3 Enhancements for PCI-DSS
|
||||
|
||||
certctl V3 (Pro) adds paid features that strengthen your PCI-DSS compliance posture:
|
||||
|
||||
| Feature | PCI-DSS Benefit |
|
||||
|---|---|
|
||||
| **OIDC/SSO Authentication** | Centralized identity management, audit integration with corporate directory |
|
||||
| **Role-Based Access Control (RBAC)** | Least-privilege enforcement: admin, operator, viewer roles with profile/team gating |
|
||||
| **Bulk Revocation by Profile/Owner/Agent** | Rapid incident response (revoke all certs in cardholder network in minutes) |
|
||||
| **NATS Event Bus with JetStream Audit Streaming** | Real-time event streaming to SIEM (Splunk, ELK, Datadog) for centralized audit trail |
|
||||
| **Certificate Health Scores** | Proactive risk identification (composite scoring: expiration proximity, rotation age, key strength) |
|
||||
| **Advanced Search DSL** | Complex audit queries (`POST /search` with nested AND/OR, regex, field projection) for compliance reporting |
|
||||
| **CT Log Monitoring** | Detect unauthorized certificate issuance (security vulnerability detection) |
|
||||
| **DigiCert Issuer Connector** | Enterprise CA integration for compliance audits |
|
||||
|
||||
---
|
||||
|
||||
## Next Steps for Compliance
|
||||
|
||||
1. **Review this mapping with your QSA** — Confirm which requirements apply to your cardholder data environment.
|
||||
|
||||
2. **Configure certctl for your environment**:
|
||||
- Set `CERTCTL_KEYGEN_MODE=agent` in production.
|
||||
- Define certificate profiles with approved key types.
|
||||
- Configure renewal policies with appropriate thresholds (e.g., 30 days for 90-day certs).
|
||||
- Enable notifier integrations (email, Slack, PagerDuty) for alerts.
|
||||
- Set `CERTCTL_DISCOVERY_DIRS` on agents so that discovery scans cover all certificate locations.
|
||||
|
||||
3. **Implement operator controls**:
|
||||
- Document certificate management procedures (issuance, renewal, revocation, archival).
|
||||
- Establish API key rotation schedule.
|
||||
- Set up audit log export and archival (monthly to S3, retain 1+ year).
|
||||
- Configure PostgreSQL backups (daily, 1+ year retention).
|
||||
- Plan incident response (who revokes certs, escalation process, timeline).
|
||||
|
||||
4. **Test compliance readiness**:
|
||||
- Trigger a test renewal and verify CRL/OCSP publication.
|
||||
- Export audit trail and verify it shows expected events.
|
||||
- Test revocation workflow and confirm OCSP reflects status within 24 hours.
|
||||
- Run discovery scan and verify unknown certs are detected and triaged.
|
||||
|
||||
5. **Prepare evidence for QSA**:
|
||||
- API endpoint documentation (OpenAPI spec: `api/openapi.yaml`).
|
||||
- Audit log sample (last 90 days of events).
|
||||
- Configuration export (profiles, policies, issuer/target definitions).
|
||||
- Deployment manifest (showing env var config, no hardcoded secrets).
|
||||
- Test certificates and CRL/OCSP query results.
|
||||
|
||||
6. **Plan for V3** (if RBAC/centralized audit required):
|
||||
- Evaluate certctl Pro for OIDC/SSO and NATS audit streaming.
|
||||
- Assess integration with existing identity provider (Okta, Azure AD, etc.).
|
||||
|
||||
---
|
||||
|
||||
## Questions?
|
||||
|
||||
For additional guidance on certctl features and PCI-DSS mapping:
|
||||
- Review the [Architecture Guide](architecture.md) for system design.
|
||||
- Check [Connectors Documentation](connectors.md) for issuer/target/notifier capabilities.
|
||||
- Run the [Quick Start Guide](quickstart.md) to see features in action.
|
||||
- Consult your QSA for final compliance determination.
|
||||
|
||||
**Last Updated**: March 24, 2026 (certctl v1.0 with M18b discovery and M19 audit logging)
|
||||
@@ -1,587 +0,0 @@
|
||||
# SOC 2 Type II Compliance Mapping
|
||||
|
||||
This guide maps certctl's implemented features to AICPA SOC 2 Trust Service Criteria (TSC). It is **not a SOC 2 certification claim** — rather, it helps security engineers, auditors, and evaluators understand how certctl supports your organization's SOC 2 compliance posture. Use this as evidence input for your own control assessment during SOC 2 audits.
|
||||
|
||||
## How to Use This Guide
|
||||
|
||||
SOC 2 audits require evidence that your infrastructure meets specific Trust Service Criteria. Auditors ask: "Does your certificate management tooling support CC6.1 logical access controls?" This guide answers by mapping certctl's features to specific criteria and pointing to evidence (API endpoints, configuration, audit trail).
|
||||
|
||||
Each section includes:
|
||||
|
||||
- **The TSC requirement** — what the auditor is looking for
|
||||
- **certctl's implementation** — which features address it
|
||||
- **Evidence location** — where to find proof (API endpoint, config variable, source code, audit events)
|
||||
- **V2 vs V3 status** — whether the feature is in the free community edition (V2) or the paid Pro edition (V3)
|
||||
- **Operator responsibility** — aspects your organization must handle outside of certctl
|
||||
|
||||
## Contents
|
||||
|
||||
1. [How to Use This Guide](#how-to-use-this-guide)
|
||||
2. [CC6: Logical and Physical Access Controls](#cc6-logical-and-physical-access-controls)
|
||||
- [CC6.1 — Logical Access Security](#cc61--logical-access-security)
|
||||
- [CC6.2 — Prior to Issuing System Credentials](#cc62--prior-to-issuing-system-credentials)
|
||||
- [CC6.3 — Authentication Policies](#cc63--authentication-policies)
|
||||
- [CC6.7 — Information Transmission Protection](#cc67--information-transmission-protection)
|
||||
3. [CC7: System Operations](#cc7-system-operations)
|
||||
- [CC7.1 — System Monitoring](#cc71--system-monitoring)
|
||||
- [CC7.2 — Anomaly Detection](#cc72--anomaly-detection)
|
||||
- [CC7.3 — Incident Response](#cc73--incident-response)
|
||||
- [CC7.4 — Identify and Develop Risk Mitigation Activities](#cc74--identify-and-develop-risk-mitigation-activities)
|
||||
4. [A1: Availability](#a1-availability)
|
||||
- [A1.1/A1.2 — Availability and Recovery](#a11a12--availability-and-recovery)
|
||||
5. [CC8: Change Management](#cc8-change-management)
|
||||
- [CC8.1 — Change Control](#cc81--change-control)
|
||||
6. [Evidence Summary Table](#evidence-summary-table)
|
||||
7. [What Requires Operator Action](#what-requires-operator-action)
|
||||
8. [V3 Enhancements](#v3-enhancements)
|
||||
9. [Conclusion](#conclusion)
|
||||
|
||||
## CC6: Logical and Physical Access Controls
|
||||
|
||||
### CC6.1 — Logical Access Security
|
||||
|
||||
**Requirement**: The entity restricts logical access to digital and information assets and related facilities by applying user identity authentication, registration, access rights, and usage policies.
|
||||
|
||||
**certctl Implementation** (V2 — Community Edition):
|
||||
|
||||
- **API Key Authentication** — All `/api/v1/*` calls require a Bearer token (hashed with SHA-256, stored securely, validated with constant-time comparison) or are rejected with 401 Unauthorized. Environment: `CERTCTL_AUTH_TYPE` (default `api-key`; `none` requires explicit opt-in with log warning)
|
||||
- **Standards-based enrollment and PKI distribution endpoints** — EST (`/.well-known/est/*`, RFC 7030), SCEP (`/scep`, `/scep/*`, RFC 8894), and CRL/OCSP (`/.well-known/pki/crl/{issuer_id}`, `/.well-known/pki/ocsp/{issuer_id}/{serial}`, RFC 5280 §5 / RFC 6960 / RFC 8615) are served unauthenticated at the HTTP layer because these protocols cannot present certctl Bearer tokens. Authentication is enforced in-protocol: EST relies on CSR signature verification plus profile policy (RFC 7030 §3.2.3 says EST auth is deployment-specific; §4.1.1 makes `/cacerts` explicitly anonymous); SCEP requires a shared `challengePassword` in the PKCS#10 CSR attributes (OID 1.2.840.113549.1.9.7, RFC 8894 §3.2), validated with `crypto/subtle.ConstantTimeCompare`; CRL and OCSP are intentionally anonymous for relying-party accessibility. CWE-306 (missing authentication for a critical function) is closed for SCEP by `preflightSCEPChallengePassword` in `cmd/server/main.go`, which refuses to start the control plane when `CERTCTL_SCEP_ENABLED=true` is set without `CERTCTL_SCEP_CHALLENGE_PASSWORD`. The HTTP dispatch is implemented in `cmd/server/main.go:buildFinalHandler`, which routes these prefixes through `noAuthHandler` (RequestID + structuredLogger + Recovery only, no auth or rate-limit middleware) and is pinned by the 27-subtest regression harness at `cmd/server/finalhandler_test.go`.
|
||||
- **GUI Authentication** — Web dashboard includes login screen requiring API key entry. Failed auth redirects to login on 401. Auth context persists across page navigation. Logout clears session.
|
||||
- **Configurable CORS** — API restricts cross-origin requests via `CERTCTL_CORS_ORIGINS` allowlist or wildcard. Preflight caching prevents chatty browser auth flows.
|
||||
- **Token Bucket Rate Limiting** — Per-IP rate limiting (configurable via `CERTCTL_RATE_LIMIT_RPS` / `CERTCTL_RATE_LIMIT_BURST`) returns 429 Too Many Requests with a `Retry-After` header. This slows credential-stuffing and brute-force attempts.
|
||||
- **No Password Storage** — certctl does not store user passwords. API keys are the sole authentication mechanism. Your API key generation, distribution, and rotation policies are your responsibility (see "Operator Responsibility" below).
|
||||
- **Zero-Downtime Key Rotation** — `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `new-key,old-key`). All listed keys are validated with constant-time comparison. Operators can add a new key, migrate clients, then remove the old key — no service restart required for the client migration phase. A single-key warning is logged at startup to encourage rotation configuration.
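
To make the multi-key rotation behaviour concrete, here is a minimal sketch of validating a Bearer token against a comma-separated secret using SHA-256 digests and a constant-time comparison. It illustrates the technique only; the function names are hypothetical and the real logic lives in `internal/api/middleware/auth.go`, which may differ in detail.

```python
import hashlib
import hmac


def parse_keys(auth_secret):
    """Split a comma-separated secret (e.g. "new-key,old-key") into keys."""
    return [key.strip() for key in auth_secret.split(",") if key.strip()]


def is_authorized(authorization_header, auth_secret):
    """Check a Bearer token against every configured key in constant time."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    presented = hashlib.sha256(token.strip().encode()).digest()
    ok = False
    for key in parse_keys(auth_secret):
        expected = hashlib.sha256(key.encode()).digest()
        # Compare against every key (no early exit) so that timing does not
        # reveal which configured key matched.
        ok |= hmac.compare_digest(presented, expected)
    return ok
```

During a rotation, both `new-key` and `old-key` are accepted until the old key is removed from the configured list.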
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- API auth implementation: `internal/api/middleware/auth.go`
|
||||
- Auth check endpoint: `GET /api/v1/auth/check` (validates credentials)
|
||||
- Auth info endpoint: `GET /api/v1/auth/info` (returns current auth mode, served without auth so GUI detects mode)
|
||||
- Rate limiting middleware: `internal/api/middleware/rate_limit.go`
|
||||
- CORS configuration: `cmd/server/main.go`, search for `CERTCTL_CORS_ORIGINS`
|
||||
- Final handler dispatch (authenticated vs. unauthenticated routing): `cmd/server/main.go:buildFinalHandler`
|
||||
- SCEP preflight gate (CWE-306 closure): `cmd/server/main.go:preflightSCEPChallengePassword`
|
||||
- SCEP service-layer defense-in-depth (rejects enrollment on empty challenge password, `crypto/subtle.ConstantTimeCompare`): `internal/service/scep.go`
|
||||
- Final handler dispatch regression harness (27 subtests): `cmd/server/finalhandler_test.go`
|
||||
- OpenAPI spec `security: []` overrides on unauthenticated paths: `api/openapi.yaml` (EST `/cacerts`, `/simpleenroll`, `/simplereenroll`, `/csrattrs`; SCEP `/scep` GET+POST; PKI `/crl/{issuer_id}`, `/ocsp/{issuer_id}/{serial}`)
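
The SCEP preflight gate listed above can be pictured as a simple fail-closed configuration check. The sketch below is an illustration of that idea, not the code in `cmd/server/main.go`; the function name and the exact truthiness rule for `CERTCTL_SCEP_ENABLED` are assumptions.

```python
def preflight_scep(env):
    """Return an error message if SCEP is enabled without a challenge password.

    `env` is a mapping of environment variables. Returns None when the
    configuration is acceptable, so the caller can refuse to start otherwise.
    """
    enabled = env.get("CERTCTL_SCEP_ENABLED", "").strip().lower() == "true"
    password = env.get("CERTCTL_SCEP_CHALLENGE_PASSWORD", "")
    if enabled and not password:
        return "CERTCTL_SCEP_ENABLED=true requires CERTCTL_SCEP_CHALLENGE_PASSWORD"
    return None
```

A deployment pipeline could run an equivalent check against rendered manifests before rollout, so that a misconfiguration is caught before the server refuses to start.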
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **OIDC / SSO Integration** — Optional OIDC providers (Okta, Azure AD, Google) with multi-tenant support. API key fallback for service accounts.
|
||||
- **API Key Scoping** — Per-resource or per-action permissions (e.g., "read certificates from production only" or "issue certs, no revoke")
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Generate and securely distribute API keys to authorized users and systems
|
||||
- Rotate API keys regularly (recommend quarterly)
|
||||
- Revoke API keys immediately upon employee departure
|
||||
- Do not commit API keys to version control (use `.env` or secrets management)
|
||||
- Implement your own IP allowlisting at the firewall if needed (certctl enforces CORS at the HTTP layer, not at the network layer)
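
When tuning `CERTCTL_RATE_LIMIT_RPS` and `CERTCTL_RATE_LIMIT_BURST` for CC6.1, it can help to see how a per-IP token bucket behaves. The following is a generic sketch of the algorithm, not certctl's middleware; the class name, the injected clock, and the default values in `check` are assumptions for illustration.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rps` tokens per second up to `burst` tokens."""

    def __init__(self, rps, burst, now=time.monotonic):
        self.rps = float(rps)
        self.burst = float(burst)
        self.now = now
        self.tokens = float(burst)
        self.last = now()

    def allow(self):
        """Return (allowed, retry_after_seconds) for one request."""
        current = self.now()
        elapsed = current - self.last
        self.last = current
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        # Time until one full token is available (basis for Retry-After).
        return False, (1.0 - self.tokens) / self.rps


buckets = {}  # one bucket per client IP


def check(ip, rps=5, burst=10):
    """Look up (or create) the bucket for `ip` and charge one request."""
    bucket = buckets.get(ip)
    if bucket is None:
        bucket = buckets[ip] = TokenBucket(rps, burst)
    return bucket.allow()
```

A larger burst tolerates short spikes from legitimate clients, while the sustained rate bounds how fast any single IP can guess API keys.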
|
||||
|
||||
---
|
||||
|
||||
### CC6.2 — Prior to Issuing System Credentials
|
||||
|
||||
**Requirement**: The entity provisions, modifies, disables, and removes user identities and rights based on an authorization process that considers user responsibility level and changes in those responsibilities.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Ownership Attribution** — Certificates can be assigned to an owner (email + name). Owner information is stored and audited (see CC7.2). Ownership is tracked through the lifecycle (issuance, renewal, deployment, revocation). Ownership reassignment is audited via the immutable audit trail.
|
||||
- **Team Assignment** — Owners can be organized into teams. Certificate policies can route notifications to team email addresses.
|
||||
- **Audit Trail Attribution** — Every API call records the actor (extracted from the API key or auth context). The audit trail is immutable — no retroactive modification of who did what.
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Ownership domain model: `internal/domain/certificate.go` (OwnerID field)
|
||||
- Owner CRUD API: `GET /api/v1/owners`, `POST /api/v1/owners`, `DELETE /api/v1/owners/{id}`
|
||||
- Team CRUD API: `GET /api/v1/teams`, `POST /api/v1/teams`, `DELETE /api/v1/teams/{id}`
|
||||
- Audit trail API: `GET /api/v1/audit` (actor field in every record)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **RBAC (Role-Based Access Control)** — Predefined roles (Admin, Operator, Viewer) with profile-gated permissions. Administrators manage role assignments.
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Map certctl's ownership model to your organizational structure (departments, teams, on-call rotations)
|
||||
- Establish a formal access request and approval process
|
||||
- Remove ownership access when team members depart
|
||||
- Document your access review process (audit trail shows *who* made changes, but you must justify *why*)
|
||||
|
||||
---
|
||||
|
||||
### CC6.3 — Authentication Policies
|
||||
|
||||
**Requirement**: The entity determines, documents, communicates, and enforces authentication policies that support the identification and authentication of authorized internal and external users and the transmission of user credentials.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **API Key Policy** — All `/api/v1/*` access requires an API key or explicit opt-out. Opt-out (`CERTCTL_AUTH_TYPE=none`) logs a warning: "WARNING: Auth disabled (CERTCTL_AUTH_TYPE=none) — this is insecure and only for development". Configuration choice is logged at startup. The standards-based enrollment and PKI distribution endpoints (EST, SCEP, CRL, OCSP) are served unauthenticated at the HTTP layer per their respective RFCs; see CC6.1 for the full authentication contract and CWE-306 closure via `preflightSCEPChallengePassword`.
|
||||
- **Agent Authentication** — Agents authenticate to the server via API keys (same mechanism as users). Agent credentials are separate from user API keys.
|
||||
- **Private Key Policy** — Agent-side key generation is the default (`CERTCTL_KEYGEN_MODE=agent`). Server-side keygen (`CERTCTL_KEYGEN_MODE=server`) requires explicit configuration and logs a warning: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only".
|
||||
- **Password Policy** — Not applicable; certctl uses API keys exclusively. Password management is delegated to your organization's IAM system if you integrate OIDC/SSO (V3).
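
The two opt-in settings above are easy to audit before deployment. A minimal sketch, assuming the defaults stated in this guide (`api-key` and `agent`); the function name and the warning wording are illustrative, not certctl's exact log output:

```python
def startup_warnings(env):
    """Return warnings for insecure auth or key-generation settings."""
    warnings = []
    # CERTCTL_AUTH_TYPE defaults to "api-key"; "none" disables authentication.
    if env.get("CERTCTL_AUTH_TYPE", "api-key") == "none":
        warnings.append("auth disabled (CERTCTL_AUTH_TYPE=none): development only")
    # CERTCTL_KEYGEN_MODE defaults to "agent"; "server" puts keys on the control plane.
    if env.get("CERTCTL_KEYGEN_MODE", "agent") == "server":
        warnings.append("server-side key generation (CERTCTL_KEYGEN_MODE=server): demo only")
    return warnings
```

An empty list for the production environment is a simple artifact to include in change-control records.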
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Auth type configuration: `internal/config/config.go`, `CERTCTL_AUTH_TYPE` env var
|
||||
- Startup logging: `cmd/server/main.go` (logs auth mode at server startup)
|
||||
- Keygen mode configuration: `internal/config/config.go`, `CERTCTL_KEYGEN_MODE` env var
|
||||
- Keygen mode warning: `cmd/server/main.go` and `cmd/agent/main.go`
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **OIDC Policy** — Mandatory MFA when OIDC is enabled
|
||||
- **API Key Expiration** — Automatic key rotation policies (e.g., 90-day expiration for user keys, no expiration for long-lived service account keys)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Document your API key generation and distribution policy
|
||||
- Establish a formal change control process for auth configuration changes
|
||||
- Test authentication failures (e.g., expired keys, malformed tokens) in a non-production environment
|
||||
- Integrate certctl authentication into your organization's IAM audit reports (who holds API keys, when they were issued, and who revoked them)
|
||||
|
||||
---
|
||||
|
||||
### CC6.7 — Information Transmission Protection
|
||||
|
||||
**Requirement**: The entity restricts the transmission, movement, and removal of information in a manner that prevents unauthorized disclosure, whether through digital or non-digital means.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **TLS for Control Plane** — All API communication occurs over HTTPS (TLS 1.2+). Server uses `tls.Dial()` for outbound connections to issuers and targets. Configuration: `CERTCTL_SERVER_HOST` (default `127.0.0.1`) + `CERTCTL_SERVER_PORT` (default `8080`; Docker Compose maps to `8443`).
|
||||
- **Agent-to-Server Communication** — Agents submit CSRs and heartbeats over HTTPS to the server using the same TLS stack.
|
||||
- **Private Key Isolation** — Agents generate ECDSA P-256 private keys locally (`crypto/ecdsa` + `crypto/elliptic`). Private keys are never transmitted to the server — agents submit CSRs only. Private keys are stored on agent filesystem (`CERTCTL_KEY_DIR`, default `/var/lib/certctl/keys`) with 0600 (owner read/write only) permissions. Server-side keygen mode logs a development warning; production must use agent-side keygen.
|
||||
- **Certificate Storage** — Signed certificates are stored in PostgreSQL as PEM text (along with metadata). Certificates are not secrets and may be transmitted plaintext. Private keys are never stored on the control plane in production (agent-side keygen mode).
|
||||
- **Deployment via Target Connectors** — Target connectors write certificates and keys to local filesystem or network appliance APIs. For NGINX/Apache httpd, files are written with restrictive permissions (0600 for keys). For F5/IIS (V3+), credentials are scoped to a proxy agent in the same network zone — the server never holds network appliance credentials.
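
The 0600 key-file handling described above can be sketched as follows. This is a POSIX-only illustration of the file-permission technique, not the agent's Go implementation; the function name is hypothetical.

```python
import os


def write_private_key(path, pem_bytes):
    """Create `path` with mode 0600 and write the key; fail if it exists."""
    # O_EXCL refuses to overwrite an existing file; the mode is applied at
    # creation time, so the key is never readable by other users.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as handle:
        # Enforce 0600 explicitly in case the process umask is unusual.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(pem_bytes)
```

Creating the file with restrictive permissions up front avoids the window that exists when a file is written first and tightened with `chmod` afterwards.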
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- TLS configuration: deploy certctl behind a TLS-terminating reverse proxy (NGINX, HAProxy, or cloud load balancer) or use a TLS sidecar
|
||||
- Agent keygen mode: `cmd/agent/main.go` (ECDSA key generation, filesystem storage with 0600)
|
||||
- Private key handling: `internal/connector/target/nginx/nginx.go` and similar (cert/key file write)
|
||||
- Server-side keygen deprecation: `internal/service/renewal.go` (log warning when enabled)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Hardware Security Module (HSM) Support** — Optional HSM backend for CA key storage (SubCA and Local CA modes)
|
||||
- **Secrets Rotation** — Encrypted key rotation without server restart
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Enable TLS on the control plane in production (deploy behind a TLS-terminating reverse proxy or load balancer with valid certificates)
|
||||
- Enforce TLS on agent-to-server communication via firewall rules (no cleartext HTTP)
|
||||
- Protect agent filesystem key storage with:
|
||||
- File-level permissions (already 0600)
|
||||
- Encrypted filesystems (LUKS, BitLocker, or cloud provider equivalents)
|
||||
- Backup encryption (keys backed up to vault or HSM, never in cleartext backups)
|
||||
- Restrict PostgreSQL access to authorized services only (network isolation, authentication)
|
||||
- For target systems, ensure network traffic from agents to targets is encrypted (TLS, IPsec, or VPN)
|
||||
|
||||
---
|
||||
|
||||
## CC7: System Operations
|
||||
|
||||
### CC7.1 — System Monitoring
|
||||
|
||||
**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Health Endpoint** — `GET /health` returns 200 OK with service status. Consumed by Docker health checks and Kubernetes probes.
|
||||
- **Readiness Endpoint** — `GET /ready` returns 200 OK when the database is connected and migrations are applied.
|
||||
- **Background Scheduler Monitoring** — 12 background loops (8 always-on + 4 opt-in) run on a fixed schedule. Authoritative topology in `docs/architecture.md`:
|
||||
- Renewal loop (always-on, 1 hour): scans for certificates approaching renewal threshold
|
||||
- Job processor loop (always-on, 30 seconds): picks up pending/waiting jobs and advances their state
|
||||
- Job retry loop (always-on, 5 minutes, `CERTCTL_SCHEDULER_RETRY_INTERVAL`): retries Failed jobs (I-001)
|
||||
- Job timeout reaper loop (always-on, 10 minutes, `CERTCTL_JOB_TIMEOUT_INTERVAL`): fails AwaitingCSR/AwaitingApproval jobs past timeout (I-003)
|
||||
- Agent health check loop (always-on, 2 minutes): pings agents to detect downtime
|
||||
- Notification dispatcher loop (always-on, 1 minute): sends queued alerts
|
||||
- Notification retry loop (always-on, 2 minutes, `CERTCTL_NOTIFICATION_RETRY_INTERVAL`): exponential backoff retry for failed notifications; promote to dead-letter after 5 attempts (I-005)
|
||||
- Short-lived cert expiry loop (always-on, 30 seconds): marks expired short-lived credentials
|
||||
- Network scanner loop (opt-in, 6 hours, `CERTCTL_NETWORK_SCAN_ENABLED`): scans enabled TLS endpoints for certificate discovery
|
||||
- Digest emailer loop (opt-in, 24 hours, `CERTCTL_DIGEST_INTERVAL`): sends scheduled certificate digest email to configured recipients
|
||||
- Endpoint health loop (opt-in, 60 seconds, `CERTCTL_HEALTH_CHECK_INTERVAL`): continuous TLS health probes (M48)
|
||||
- Cloud discovery loop (opt-in, 6 hours, `CERTCTL_CLOUD_DISCOVERY_INTERVAL`): cloud secret manager certificate discovery (M50)
|
||||
Each loop includes `atomic.Bool` idempotency guards, error handling, and structured slog failure logs.
|
||||
- **Metrics Endpoints** — Two formats for monitoring integration:
|
||||
- `GET /api/v1/metrics` — JSON object with gauges, counters, and uptime for custom dashboards
|
||||
- `GET /api/v1/metrics/prometheus` — Prometheus exposition format (`text/plain; version=0.0.4`) for native scraping by Prometheus, Grafana Agent, Datadog, and other OpenMetrics-compatible collectors
|
||||
- **Gauges** — `certctl_certificate_total`, `certctl_certificate_active`, `certctl_certificate_expiring`, `certctl_certificate_expired`, `certctl_certificate_revoked`, `certctl_agent_total`, `certctl_agent_active`, `certctl_job_pending`
|
||||
- **Counters** — `certctl_job_completed_total`, `certctl_job_failed_total`
|
||||
- **Uptime** — `certctl_uptime_seconds` (seconds since server start)
|
||||
All values are point-in-time snapshots computed from database tables.
|
||||
- **Structured Logging** — All scheduler operations, API calls, and connector actions log via `slog` (Go's structured logger). Logs include timestamp, level (DEBUG/INFO/WARN/ERROR), structured fields (e.g., `actor`, `resource_id`, `latency_ms`), and request IDs for tracing.
|
||||
- **Request ID Propagation** — Each HTTP request gets a unique ID (`X-Request-ID` header). The ID is included in all correlated logs, making it easy to trace a single request through multiple service layers.
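
The idempotency guard mentioned for the scheduler loops prevents a slow tick from overlapping with the next one. The sketch below shows the same idea in Python, using a non-blocking lock where the Go code uses `atomic.Bool`; the class and method names are hypothetical.

```python
import threading


class LoopGuard:
    """Skip a tick when the previous run of the same loop is still active."""

    def __init__(self):
        self._busy = threading.Lock()

    def run(self, task):
        """Run `task` unless a previous run is in progress. Returns True if run."""
        # A non-blocking acquire plays the role of an atomic compare-and-swap.
        if not self._busy.acquire(blocking=False):
            return False  # previous run still in progress; skip this tick
        try:
            task()
        except Exception as exc:  # a failed tick must not stop the loop
            print(f"scheduler tick failed: {exc}")
        finally:
            self._busy.release()
        return True
```

Skipped ticks are a useful signal in their own right: a loop that is skipped repeatedly is taking longer than its interval.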
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Health/readiness endpoints: `internal/api/handler/health.go`
|
||||
- Background scheduler: `internal/scheduler/scheduler.go` (Start method)
|
||||
- Metrics endpoint: `internal/api/handler/metrics.go`
|
||||
- Stats API endpoints (for detailed time-series): `internal/api/handler/stats.go`
|
||||
- `GET /api/v1/stats/summary` — dashboard KPIs
|
||||
- `GET /api/v1/stats/certificates-by-status` — cert counts by status
|
||||
- `GET /api/v1/stats/expiration-timeline?days=N` — cert expiry distribution
|
||||
- `GET /api/v1/stats/job-trends?days=N` — job completion/failure rates
|
||||
- `GET /api/v1/stats/issuance-rate?days=N` — cert issuance volume
|
||||
- Structured logging middleware: `internal/api/middleware/middleware.go`
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Configure log aggregation (e.g., ELK, Datadog, Splunk) to centralize certctl logs
|
||||
- Set up alerting on scheduler loop failures (e.g., "renewal loop failed to complete within 2h")
|
||||
- Configure health check monitoring (e.g., uptime probes of `/health` and `/ready`, plus a Prometheus scrape of `/api/v1/metrics/prometheus`)
|
||||
- Establish thresholds for metrics (e.g., alert if `certctl_job_pending > 50` or `certctl_agent_active < certctl_agent_total`)
|
||||
- Document your log retention policy (audit requirements often mandate retention of one year or more)
|
||||
- Integrate certctl metrics into your broader observability stack (Grafana dashboards, SLO tracking)
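
If you do not yet run a full Prometheus stack, threshold checks can be prototyped directly against the text exposition output. The sketch below parses un-labelled samples and applies two example rules; the gauge names come from the list in this section, while the threshold of 50 and the function names are assumptions.

```python
def parse_metrics(text):
    """Parse un-labelled samples from Prometheus text exposition format."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and "{" not in parts[0]:
            values[parts[0]] = float(parts[1])
    return values


def evaluate(values, max_pending=50):
    """Return human-readable alerts for the example thresholds."""
    alerts = []
    if values.get("certctl_job_pending", 0) > max_pending:
        alerts.append("job backlog above threshold")
    if values.get("certctl_agent_active", 0) < values.get("certctl_agent_total", 0):
        alerts.append("one or more agents inactive")
    return alerts
```

In production the same rules are better expressed as alerting rules in your monitoring system; a script like this is mainly useful for validating thresholds during setup.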
|
||||
|
||||
---
|
||||
|
||||
### CC7.2 — Anomaly Detection
|
||||
|
||||
**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.
|
||||
|
||||
(This criterion overlaps CC7.1 and extends it to specific anomaly response mechanisms.)
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Immutable API Audit Trail** (M19) — Every API call is recorded in the `audit_events` table (append-only, no update/delete). Each record captures the HTTP method, URL path (query parameters intentionally excluded — see security note), actor (user/agent ID), SHA-256 hash of the request body (truncated to 16 characters), response status code, latency in milliseconds, and a timestamp. Excluded paths (health, ready) are configurable. Audit records are written asynchronously so they do not block the request. **Security: Query parameters are excluded from the audit path** because they may contain cursor tokens, API keys, or sensitive filter values; since the audit trail is append-only with no deletion, any sensitive data recorded would persist permanently.
|
||||
- **Audit Trail API** — `GET /api/v1/audit?actor=...&action=...&resource_id=...&created_after=...&created_before=...` allows searching for anomalous patterns (e.g., "who accessed certificate XYZ and when?", "did anyone revoke certs at 2 AM?").
|
||||
- **Expiration Threshold Alerting** — Certificate renewal policies define alert thresholds (days before expiry): default `[30, 14, 7, 0]`. When a certificate approaches a threshold, a notification is enqueued. Deduplication prevents duplicate alerts for the same cert at the same threshold. Auto status transition: cert moves to `Expiring` status at 30 days, `Expired` at 0 days.
|
||||
- **Certificate Status Auto-Transitions** — When a cert is issued, it's `Active`. As expiry approaches, status auto-transitions to `Expiring` (at 30d threshold). At expiry, status becomes `Expired`. Revoked certs move to `Revoked`. These transitions are recorded in the audit trail.
|
||||
- **Notification Routing** — Alerts are sent via configured notifiers (Email, Slack, Teams, PagerDuty, OpsGenie). Each alert is routed to the certificate owner's email address (or to the team email if there is no individual owner). This allows on-call teams to react to anomalies (e.g., "your production cert will expire in 7 days, request renewal now").
|
||||
- **Deployment Rollback** — If a deployment fails or an older certificate needs to be reactivated, operators can trigger a "rollback" via the GUI. This redeploys a previous certificate version to the target. Rollback actions are audited.
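
The audit fields described above can be illustrated with a minimal Go sketch. It assumes the body hash is cut to the first 16 hexadecimal characters, and the `auditRecord` type and both helper functions are hypothetical names chosen for illustration, not certctl's actual middleware code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// auditRecord is a hypothetical shape, not certctl's real schema.
type auditRecord struct {
	Method   string
	Path     string // URL path only; the query string is deliberately dropped
	Actor    string
	BodyHash string // first 16 hex characters of SHA-256(body) (assumed encoding)
	Status   int
}

// bodyHash returns the first 16 hex characters of the SHA-256 digest of body.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])[:16]
}

// newAuditRecord builds a record from a raw request URL, keeping only the path
// so that cursor tokens or keys in query parameters are never persisted.
func newAuditRecord(method, rawURL, actor string, body []byte, status int) (auditRecord, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return auditRecord{}, err
	}
	return auditRecord{
		Method:   method,
		Path:     u.Path,
		Actor:    actor,
		BodyHash: bodyHash(body),
		Status:   status,
	}, nil
}

func main() {
	rec, err := newAuditRecord("GET", "/api/v1/certificates?cursor=secret-token", "user-42", nil, 200)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rec)
}
```

Dropping the query string before the record is built, rather than redacting it afterwards, means sensitive values never reach the append-only table in the first place.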
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Audit middleware: `internal/api/middleware/audit.go`
|
||||
- Audit trail API: `internal/api/handler/audit.go`, `GET /api/v1/audit`
|
||||
- Expiration alerting: `internal/service/renewal.go` (CheckRenewal method)
|
||||
- Notification dispatcher: `internal/scheduler/scheduler.go` (notificationTicker)
|
||||
- Status transitions: `internal/service/certificate.go` (auto status update logic)
|
||||
- Audit trail CLI export: `certctl-cli audit export --format csv` / `--format json`
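
As an illustration of the expiration alerting referenced above, the following Go sketch shows one way threshold matching and deduplication could work. It is a sketch under assumptions: the `alerter` type is hypothetical, an in-memory map stands in for persistent storage, and it assumes only the tightest threshold reached triggers an alert.

```go
package main

import "fmt"

// alertKey identifies one alert: a certificate at a specific threshold.
type alertKey struct {
	certID    string
	threshold int
}

// alerter remembers which (certificate, threshold) pairs were already notified.
// Hypothetical type; a real implementation would persist this state.
type alerter struct {
	sent map[alertKey]bool
}

// dueThreshold returns the tightest threshold (in days) that daysRemaining
// has reached, or false when no threshold has been reached yet.
func dueThreshold(daysRemaining int, thresholds []int) (int, bool) {
	best, found := 0, false
	for _, t := range thresholds {
		if daysRemaining <= t && (!found || t < best) {
			best, found = t, true
		}
	}
	return best, found
}

// shouldNotify reports whether a new alert should be enqueued and records it.
func (a *alerter) shouldNotify(certID string, daysRemaining int, thresholds []int) (int, bool) {
	t, ok := dueThreshold(daysRemaining, thresholds)
	if !ok {
		return 0, false
	}
	k := alertKey{certID: certID, threshold: t}
	if a.sent[k] {
		return t, false // duplicate: same cert, same threshold
	}
	a.sent[k] = true
	return t, true
}

func main() {
	a := &alerter{sent: make(map[alertKey]bool)}
	thresholds := []int{30, 14, 7, 0} // default thresholds from the text
	for _, days := range []int{45, 29, 28, 10, 7} {
		t, notify := a.shouldNotify("cert-001", days, thresholds)
		fmt.Printf("days=%d threshold=%d notify=%v\n", days, t, notify)
	}
}
```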
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **SIEM Export** — Real-time audit event streaming to SIEM systems (via NATS event bus with JetStream sink)
|
||||
- **Anomaly Rules Engine** — Configurable rules (e.g., "alert if certificate revoked by non-admin", "alert if >10 certs issued in < 1 hour")
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Integrate audit trail into your SIEM / log analysis platform
|
||||
- Define alerting rules and thresholds for anomalies (e.g., "revocation of critical cert", "mass issuance")
|
||||
- Establish a formal incident response workflow (audit trail shows *what* happened; you must decide *what to do* about it)
|
||||
- Regularly review audit logs (e.g., monthly compliance audit of who accessed what)
|
||||
- Configure email/Slack/Teams integration so on-call teams are notified of cert expirations immediately
|
||||
- Encrypt audit trail backups (append-only enforcement and ACID guarantees don't prevent theft of database backups)
|
||||
|
||||
---
|
||||
|
||||
### CC7.3 — Incident Response
|
||||
|
||||
**Requirement**: The entity detects, investigates, and responds to incidents by executing a defined incident response and management process that includes preparation, detection and analysis, containment, eradication, recovery, and post-incident activities.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Revocation API** — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:
|
||||
- `unspecified` — catch-all
|
||||
- `keyCompromise` — private key was exposed
|
||||
- `caCompromise` — CA itself was compromised (rare)
|
||||
- `affiliationChanged` — certificate no longer applies to the organization
|
||||
- `superseded` — newer cert is in use
|
||||
- `cessationOfOperation` — service is shutting down
|
||||
- `certificateHold` — temporary revocation (the hold can be lifted by reissuing the certificate)
|
||||
- `privilegeWithdrawn` — access rights revoked
|
||||
Revocation is **immediate** (no approval workflow). The certificate is marked `Revoked` in inventory, an audit event is logged, and optional issuer notification is best-effort. All revoked certs are excluded from active deployments.
|
||||
- **CRL Endpoint** — `GET /.well-known/pki/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`), served unauthenticated for relying parties that don't hold certctl API credentials.
|
||||
- **OCSP Responder** — `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown (RFC 6960, `Content-Type: application/ocsp-response`). Also unauthenticated. Clients (browsers, TLS libraries) query this endpoint to verify cert validity in real-time.
|
||||
- **Revocation Notifications** — When a cert is revoked, notifications are sent to:
|
||||
- Certificate owner (email)
|
||||
- Configured webhooks (if you have a SIEM that subscribes)
|
||||
- Slack/Teams channels (if notifiers are configured)
|
||||
- **Bulk Revocation for Fleet-Wide Incidents** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. Essential for incident response: key compromise affecting multiple certs, CA distrust events, decommissioning a team's infrastructure. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring audit trail and notifications for every certificate.
|
||||
- **Short-Lived Cert Exemption** — Certificates with TTL < 1 hour (configured in profile) skip CRL/OCSP publication. Expiry is the revocation mechanism for short-lived certs (e.g., Kubernetes pod certs, session tokens).
|
||||
- **Deployment Rollback** — If a revoked cert is still deployed (shouldn't happen, but race conditions exist), operators can manually redeploy a previous version via the GUI. Rollback is audited.
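
The reason strings listed above correspond to numeric `CRLReason` values in RFC 5280 §5.3.1. The Go sketch below shows that mapping; the map and function names are hypothetical, the keys copy the strings as this guide spells them, and the RFC values not listed above (8 `removeFromCRL`, 10 `aACompromise`; 7 is unused) are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// reasonCodes maps the reason strings from this guide to RFC 5280 CRLReason values.
var reasonCodes = map[string]int{
	"unspecified":          0,
	"keyCompromise":        1,
	"caCompromise":         2,
	"affiliationChanged":   3,
	"superseded":           4,
	"cessationOfOperation": 5,
	"certificateHold":      6,
	"privilegeWithdrawn":   9,
}

// reasonCode returns the numeric CRLReason for a reason string,
// or an error if the string is not one of the supported reasons.
func reasonCode(name string) (int, error) {
	code, ok := reasonCodes[name]
	if !ok {
		return 0, fmt.Errorf("unknown revocation reason %q", name)
	}
	return code, nil
}

func main() {
	names := make([]string, 0, len(reasonCodes))
	for n := range reasonCodes {
		names = append(names, n)
	}
	// Print in ascending code order for readability.
	sort.Slice(names, func(i, j int) bool { return reasonCodes[names[i]] < reasonCodes[names[j]] })
	for _, n := range names {
		fmt.Printf("%-22s %d\n", n, reasonCodes[n])
	}
}
```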
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Revocation API: `internal/api/handler/certificates.go`, `POST /api/v1/certificates/{id}/revoke`
|
||||
- Revocation domain model: `internal/domain/revocation.go` (RevocationReason type with RFC 5280 mapping)
|
||||
- CRL generation: `internal/service/certificate.go` (GenerateDERCRL method)
|
||||
- OCSP signing: `internal/service/certificate.go` (GetOCSPResponse method)
|
||||
- Revocation notifications: `internal/service/notification.go` (SendRevocationNotification)
|
||||
- Short-lived exemption: `internal/domain/revocation.go` (IsShortLivedCert check)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Revocation Automation** — Trigger revocation based on external events (e.g., employee termination, security breach alert from CT Log monitoring)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Establish an incident response policy (e.g., "keyCompromise → immediate deployment to new cert + notify CISO")
|
||||
- Ensure CRL/OCSP are accessible to all systems using the certs (e.g., CDN or highly-available endpoints if you host on-premises)
|
||||
- Test revocation workflow in staging (verify that revoked certs are actually blocked by clients)
|
||||
- Document justification for revocation (audit trail records *that* a cert was revoked, but not *why* — you must document it separately)
|
||||
- Integrate revocation notifications into your on-call rotation (don't let revocation alerts get lost)
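
To support the staging test of the revocation workflow suggested above, the following Go sketch checks whether a serial number appears in a DER-encoded CRL and verifies the CRL signature against the issuing CA. It uses only the standard library (`crypto/x509`, Go 1.21 or later for `RevokedCertificateEntries`). The `isRevoked` and `demoCRL` helpers are hypothetical; `demoCRL` builds a throwaway CA so the sketch runs offline, whereas in practice the bytes would come from the CRL endpoint described in this section.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// isRevoked parses a DER-encoded CRL, verifies that it was signed by issuer,
// and reports whether the hex-encoded serial number appears in it.
func isRevoked(crlDER []byte, issuer *x509.Certificate, serialHex string) (bool, error) {
	serial, ok := new(big.Int).SetString(serialHex, 16)
	if !ok {
		return false, fmt.Errorf("invalid hex serial %q", serialHex)
	}
	rl, err := x509.ParseRevocationList(crlDER)
	if err != nil {
		return false, fmt.Errorf("parse CRL: %w", err)
	}
	if err := rl.CheckSignatureFrom(issuer); err != nil {
		return false, fmt.Errorf("verify CRL signature: %w", err)
	}
	for _, entry := range rl.RevokedCertificateEntries {
		if entry.SerialNumber.Cmp(serial) == 0 {
			return true, nil
		}
	}
	return false, nil
}

// demoCRL builds a throwaway CA and a CRL that revokes one serial number.
// It exists only so the sketch runs without a live server.
func demoCRL(revokedSerial int64) ([]byte, *x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo CA"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	ca, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, nil, err
	}
	crlDER, err := x509.CreateRevocationList(rand.Reader, &x509.RevocationList{
		Number:     big.NewInt(1),
		ThisUpdate: now,
		NextUpdate: now.Add(time.Hour),
		RevokedCertificateEntries: []x509.RevocationListEntry{
			{SerialNumber: big.NewInt(revokedSerial), RevocationTime: now, ReasonCode: 1}, // 1 = keyCompromise
		},
	}, ca, key)
	return crlDER, ca, err
}

func main() {
	crlDER, ca, err := demoCRL(0x2a)
	if err != nil {
		panic(err)
	}
	for _, s := range []string{"2a", "2b"} {
		revoked, err := isRevoked(crlDER, ca, s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("serial %s revoked: %v\n", s, revoked)
	}
}
```

Verifying the signature before trusting the entries matters because the CRL endpoint is unauthenticated; the issuer certificate is what anchors trust in the response.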
|
||||
|
||||
---
|
||||
|
||||
### CC7.4 — Identify and Develop Risk Mitigation Activities
|
||||
|
||||
**Requirement**: The entity identifies, develops, and implements risk mitigation activities for risks arising from potential business disruptions.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Renewal Job Tracking** — Renewal jobs track the certificate, target agents, and issuance outcome. Failed renewals are retried (configurable backoff). Job state diagram: Pending → Running → Completed (or Failed). Failed jobs trigger notifications.
|
||||
- **Agent Health Monitoring** — Health check loop (every 2m) pings all agents via heartbeat. If an agent misses 3 consecutive heartbeats, it's marked as `Unhealthy`. Unhealthy agents are excluded from new deployments.
|
||||
- **Job Cancellation** — Operators can cancel pending jobs via `POST /api/v1/jobs/{id}/cancel`. Useful when a renewal is already in progress elsewhere (multi-instance deployments) or when a certificate is being phased out.
|
||||
- **Interactive Approval** — Renewal/issuance jobs can be put in `AwaitingApproval` status. An authorized operator reviews the pending cert and approves or rejects it. Rejection records a reason in the audit trail. This provides a separation of duty between requestor and approver.
|
||||
- **Scheduled Scanning** — Agents scan configured directories for existing certs (M18b discovery). Operators triage discovered certs (claim = "we manage this now", dismiss = "this is unmanaged and we're OK with that"). Triage decisions are audited.
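
The job lifecycle described above can be sketched as a small transition table in Go. This is an illustrative model only: the `Cancelled` status name, the retry edge from `Failed` back to `Pending`, and the mapping of a rejection to `Cancelled` are assumptions, and the real `JobStatus` enum may differ.

```go
package main

import "fmt"

// JobStatus mirrors the states named in the text; Cancelled is an assumed name.
type JobStatus string

const (
	AwaitingApproval JobStatus = "AwaitingApproval"
	Pending          JobStatus = "Pending"
	Running          JobStatus = "Running"
	Completed        JobStatus = "Completed"
	Failed           JobStatus = "Failed"
	Cancelled        JobStatus = "Cancelled"
)

// transitions lists the allowed next states for each state (assumed edges).
var transitions = map[JobStatus][]JobStatus{
	AwaitingApproval: {Pending, Cancelled}, // approve, or reject (assumed to cancel)
	Pending:          {Running, Cancelled}, // only pending jobs can be cancelled
	Running:          {Completed, Failed},
	Failed:           {Pending}, // retry with backoff
}

// canTransition reports whether moving from one status to another is allowed.
func canTransition(from, to JobStatus) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Pending, Running))   // allowed
	fmt.Println(canTransition(Running, Cancelled)) // not allowed in this sketch
	fmt.Println(canTransition(Failed, Pending))    // retry
}
```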
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Job state machine: `internal/domain/job.go` (JobStatus enum)
|
||||
- Job retry logic: `internal/scheduler/scheduler.go` (jobProcessorTicker)
|
||||
- Agent health check: `internal/scheduler/scheduler.go` (healthCheckTicker)
|
||||
- Job cancellation: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/cancel`
|
||||
- Approval workflow: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/approve` / `reject`
|
||||
- Discovery scan results: `internal/api/handler/discovery.go`, `GET /api/v1/discovered-certificates`
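
The agent health rule referenced above (three missed heartbeats marks an agent unhealthy) reduces to simple arithmetic. The Go sketch below assumes the heartbeat interval equals the 2-minute health check period, which the text does not state explicitly; the constant and function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	heartbeatInterval = 2 * time.Minute // assumed to match the 2m health check loop
	maxMissed         = 3               // consecutive misses before an agent is unhealthy
)

// missedHeartbeats estimates how many heartbeats were missed, given the time
// elapsed since the agent's last heartbeat.
func missedHeartbeats(sinceLast time.Duration) int {
	if sinceLast <= 0 {
		return 0
	}
	return int(sinceLast / heartbeatInterval)
}

// isUnhealthy reports whether the agent should be excluded from new deployments.
func isUnhealthy(sinceLast time.Duration) bool {
	return missedHeartbeats(sinceLast) >= maxMissed
}

func main() {
	for _, d := range []time.Duration{time.Minute, 5 * time.Minute, 6 * time.Minute} {
		fmt.Printf("last heartbeat %s ago: missed=%d unhealthy=%v\n", d, missedHeartbeats(d), isUnhealthy(d))
	}
}
```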
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Monitor renewal job success rate (are certs being renewed before expiry?)
|
||||
- Set up alerts for unhealthy agents (3+ missed heartbeats means a broken agent; take action)
|
||||
- Establish a formal approval policy (who can approve certs? do they need to involve CISO?)
|
||||
- Test job cancellation and recovery flows in staging
|
||||
- Review discovered certs regularly (are there unmanaged certs that should be managed?)
|
||||
- Document your disaster recovery process (what if control plane database is corrupted?)
|
||||
|
||||
---
|
||||
|
||||
## A1: Availability
|
||||
|
||||
### A1.1/A1.2 — Availability and Recovery
|
||||
|
||||
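**Requirement**: A1.1: The entity maintains, monitors, and evaluates current processing capacity and use of system components (infrastructure, data, and software) to manage capacity demand and to enable the implementation of additional capacity to help meet its objectives. A1.2: The entity authorizes, designs, develops or acquires, implements, operates, approves, maintains, and monitors environmental protections, software, data backup processes, and recovery infrastructure to meet its objectives.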
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Health Probes** — `/health` and `/ready` endpoints support container orchestration (Docker Compose, Kubernetes, etc.). Docker Compose defines health checks for the server and database. Kubernetes would use liveness/readiness probes pointing to these endpoints.
|
||||
- **Database Migrations (Idempotent)** — PostgreSQL migrations use `IF NOT EXISTS` and `ON CONFLICT ... DO NOTHING` patterns. Migrations can be safely reapplied — no risk of doubling data or dropping tables mid-migration.
|
||||
- **Agent Panic Recovery** — Agent binary includes panic recovery in job execution loops. If an agent crashes during a deployment, the control plane marks the job as failed and can retry on a healthy agent.
|
||||
- **Exponential Backoff** — Agent-to-server communication uses exponential backoff (starting at 1s, capped at 5m) to handle transient network failures. This prevents thundering herd when the control plane is temporarily down.
|
||||
- **Docker Compose Deployment** — Includes health checks for server and database. Services auto-restart on failure.
|
||||
- **PostgreSQL Connection Pooling** — Server uses `database/sql` with configurable `MaxOpenConns` and `MaxIdleConns` (default 25/5). Prevents connection exhaustion.
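
The backoff behaviour described above (starting at 1s, capped at 5m) can be sketched in Go as follows. The doubling factor is an assumption, since the text only says "exponential", and the names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = time.Second     // first retry delay
	maxDelay  = 5 * time.Minute // cap from the text
)

// backoff returns the delay before retry number attempt (0-based),
// doubling each time and never exceeding maxDelay.
func backoff(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	for attempt := 0; attempt <= 10; attempt++ {
		fmt.Printf("attempt %d: wait %s\n", attempt, backoff(attempt))
	}
}
```

Checking the cap inside the loop keeps the duration from overflowing for large attempt counts. Many implementations also add random jitter so that agents do not retry in lockstep; whether certctl does so is not stated here.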
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Health endpoints: `internal/api/handler/health.go`
|
||||
- Database migrations: `migrations/` directory (all use `IF NOT EXISTS`, idempotent patterns)
|
||||
- Agent panic recovery: `cmd/agent/main.go` (defer recover() in job execution)
|
||||
- Exponential backoff: `cmd/agent/main.go` (heartbeat and work poll backoff logic)
|
||||
- Connection pooling: `cmd/server/main.go` (SetMaxOpenConns, SetMaxIdleConns)
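
The panic recovery listed above (`defer recover()` in job execution) follows a common Go pattern, sketched below. The `runJob` wrapper is a hypothetical name; the point is that a panic inside one job is converted into an ordinary error instead of terminating the agent process.

```go
package main

import (
	"errors"
	"fmt"
)

// runJob executes fn and converts any panic into an error, so a single
// misbehaving job cannot crash the whole agent loop.
func runJob(id string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job %s panicked: %v", id, r)
		}
	}()
	return fn()
}

func main() {
	jobs := map[string]func() error{
		"ok":     func() error { return nil },
		"failed": func() error { return errors.New("deploy failed") },
		"panic":  func() error { panic("nil pointer in connector") },
	}
	for _, id := range []string{"ok", "failed", "panic"} {
		fmt.Printf("%s -> %v\n", id, runJob(id, jobs[id]))
	}
}
```

The named return value `err` is what lets the deferred function overwrite the result after a panic; the resulting error can then be reported to the control plane as a failed job.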
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Multi-Region HA** — Control plane federation with etcd consensus (operator can run N replicas)
|
||||
- **PostgreSQL HA** — Replication standby with automatic failover (operator responsibility to configure)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Configure PostgreSQL backups (e.g., WAL archiving, daily full backups). Certctl stores certificates but *also* stores renewal policies, audit trail, deployment history.
|
||||
- Test the backup/restore process in staging (otherwise broken backups are discovered only during incidents)
|
||||
- Monitor disk usage (PostgreSQL will fail if `/var` fills up)
|
||||
- Plan capacity (how many certs, agents, jobs can your PostgreSQL handle? Certctl is tested with 10k+ certs, 100+ agents, but your infra may differ)
|
||||
- Set up high-availability PostgreSQL if you need zero-downtime upgrades
|
||||
- Implement network segmentation (only authorized services can reach certctl API and database)
|
||||
|
||||
---
|
||||
|
||||
## CC8: Change Management
|
||||
|
||||
### CC8.1 — Change Control
|
||||
|
||||
**Requirement**: The entity authorizes, designs, develops or acquires, configures, documents, tests, approves, and implements changes to infrastructure, data, software, and procedures to meet its objectives.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Certificate Profiles** — Named profiles define allowed key types, max TTL, required SANs, and permitted EKUs. Changes to profiles are common (e.g., "increase max TTL from 1 year to 3 years"). All profile changes are audited (who changed what, when). Profile updates are versioned.
|
||||
- **Policy Engine** — Renewal policies define alert thresholds and approval workflows. Policy changes (e.g., "lower alert threshold from 30 days to 14 days") are audited. Policies have violation rules (e.g., "flag certs longer than 3 years") — violations are recorded in the audit trail.
|
||||
- **Target Configuration** — When a new target (NGINX server, HAProxy load balancer) is added, it's registered with a name and configuration (JSON). Target deletions require confirmation (to prevent accidental removal). All target changes are audited.
|
||||
- **Immutable Audit Trail** — Every change (profile, policy, target, cert, agent, owner, team, approval, revocation, deployment) is recorded in `audit_events`. Audit records are append-only; no retroactive modification is possible. Encrypting the audit trail at rest is the operator's responsibility.
|
||||
- **GitHub Actions CI** — Pull requests must pass:
|
||||
- Go unit tests (`go test ./...`) with coverage gates (service layer ≥30%, handler layer ≥50%)
|
||||
- Go vet (static analysis)
|
||||
- Frontend TypeScript type checking (`tsc`)
|
||||
- Frontend Vitest unit tests
|
||||
- Frontend Vite build (ensures no broken imports)
|
||||
Only after all checks pass can the PR be merged and deployed.
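
To make the profile constraints described in this section concrete, the Go sketch below checks a certificate request against a profile's allowed key types, maximum TTL, and permitted EKUs. All type names, field names, and string values are hypothetical and do not reflect certctl's actual domain model; required-SAN checks are omitted for brevity. It uses the standard `slices` package (Go 1.21 or later).

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

// Profile is a hypothetical, simplified certificate profile.
type Profile struct {
	Name            string
	AllowedKeyTypes []string
	MaxTTL          time.Duration
	PermittedEKUs   []string
}

// Request is a hypothetical issuance request to validate.
type Request struct {
	KeyType string
	TTL     time.Duration
	EKUs    []string
}

// webProfile is example data only.
var webProfile = Profile{
	Name:            "web-tls",
	AllowedKeyTypes: []string{"ECDSA-P256", "RSA-2048"},
	MaxTTL:          90 * 24 * time.Hour,
	PermittedEKUs:   []string{"serverAuth"},
}

// violations returns one message per profile rule the request breaks.
func violations(p Profile, r Request) []string {
	var out []string
	if !slices.Contains(p.AllowedKeyTypes, r.KeyType) {
		out = append(out, fmt.Sprintf("key type %q not allowed by profile %s", r.KeyType, p.Name))
	}
	if r.TTL > p.MaxTTL {
		out = append(out, fmt.Sprintf("TTL %s exceeds max %s", r.TTL, p.MaxTTL))
	}
	for _, eku := range r.EKUs {
		if !slices.Contains(p.PermittedEKUs, eku) {
			out = append(out, fmt.Sprintf("EKU %q not permitted", eku))
		}
	}
	return out
}

func main() {
	req := Request{KeyType: "RSA-1024", TTL: 365 * 24 * time.Hour, EKUs: []string{"serverAuth", "codeSigning"}}
	for _, v := range violations(webProfile, req) {
		fmt.Println("violation:", v)
	}
}
```

Returning every violation rather than stopping at the first gives an auditor or operator the full picture in a single audit entry.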
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Profile CRUD: `internal/api/handler/profiles.go`, `GET /api/v1/profiles` / `POST` / `PUT` / `DELETE`
|
||||
- Policy CRUD: `internal/api/handler/policies.go`
|
||||
- Target CRUD: `internal/api/handler/targets.go`
|
||||
- Audit trail: `internal/api/handler/audit.go`, `GET /api/v1/audit` (records action, actor, resource_id, timestamp)
|
||||
- CI configuration: `.github/workflows/ci.yml` (test, vet, coverage gates, build checks)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Change Approval Workflow** — Optional approval gate before profile/policy changes go live
|
||||
- **Feature Flags** — Enable/disable new features without redeployment (backward compatibility during rolling upgrades)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Implement formal change control (ticket system, approval, peer review)
|
||||
- Document the business justification for profile/policy changes
|
||||
- Test changes in a non-production environment before deploying to production
|
||||
- Have a rollback plan (can you revert a profile change instantly if it breaks issuance?)
|
||||
- Include certctl configuration changes in your change log (for audits and incident investigations)
|
||||
- Version control your certctl configuration (Docker Compose file, environment variables) so you can track changes
|
||||
|
||||
---
|
||||
|
||||
## Evidence Summary Table
|
||||
|
||||
| SOC 2 Criterion | certctl Feature | Evidence Location | V2 (Free) | V3 (Pro) | Operator Responsibility |
|
||||
|---|---|---|---|---|---|
|
||||
| **CC6.1** Logical Access Security | API Key Authentication (SHA-256 hashed, constant-time comparison) | `internal/api/middleware/auth.go` | ✅ | Enhanced | API key generation, distribution, rotation |
|
||||
| | GUI Login with API Key | `web/src/pages/LoginPage.tsx` | ✅ | Enhanced (OIDC) | NA |
|
||||
| | CORS Allowlist | `CERTCTL_CORS_ORIGINS` env var | ✅ | ✅ | Configure appropriately |
|
||||
| | Token Bucket Rate Limiting | `internal/api/middleware/rate_limit.go` | ✅ | ✅ | Monitor for brute-force attempts |
|
||||
| **CC6.2** Prior to Issuing System Credentials | Ownership Attribution | `GET /api/v1/owners`, audit trail records owner assignment | ✅ | Enhanced (RBAC) | Map to org structure, remove on departure |
|
||||
| | Team Assignment | `GET /api/v1/teams` | ✅ | ✅ | NA |
|
||||
| | Actor Attribution in Audit Trail | `GET /api/v1/audit` (actor field) | ✅ | ✅ | Justify all changes via separate documentation |
|
||||
| **CC6.3** Authentication Policies | API Key Enforcement | `CERTCTL_AUTH_TYPE=api-key` (default) | ✅ | Enhanced (OIDC, MFA) | Document policy, test failures, integrate into IAM audit |
|
||||
| | Agent Authentication | Separate API keys for agents | ✅ | ✅ | Rotate agent keys, monitor compromise |
|
||||
| | Agent-Side Key Generation | `CERTCTL_KEYGEN_MODE=agent` (default) | ✅ | ✅ | Protect agent filesystem keys via encryption/backup |
|
||||
| | Private Key Policy | Server-side keygen logs warning, disabled in production | ✅ | ✅ | Never use server-side keygen in production |
|
||||
| **CC6.7** Information Transmission Protection | TLS for Control Plane | Deploy behind TLS-terminating reverse proxy | ✅ | ✅ | Enable TLS in production via reverse proxy |
|
||||
| | Agent-to-Server HTTPS | Agents use HTTPS for all API calls | ✅ | ✅ | Enforce TLS via firewall rules |
|
||||
| | Private Key Isolation | Agent-side keygen (ECDSA P-256), keys stored 0600 on agent FS | ✅ | ✅ | Encrypt agent filesystems, backup securely |
|
||||
| | Pull-Only Deployment | Server never initiates outbound to agents/targets | ✅ | Enhanced (HSM, proxy agents) | Encrypt agent↔target comms, isolate proxy agents |
|
||||
| **CC7.1** System Monitoring | Health Endpoint | `GET /health`, `GET /ready` | ✅ | ✅ | Integrate into monitoring (Prometheus, DataDog) |
|
||||
| | Metrics JSON Endpoint | `GET /api/v1/metrics` (gauges, counters, uptime) | ✅ | ✅ | Set thresholds, configure alerting |
|
||||
| | Stats API (time-series) | `GET /api/v1/stats/*` (summary, status, expiration, jobs, issuance) | ✅ | ✅ | Integrate into dashboards, SLO tracking |
|
||||
| | Structured Logging | `slog` middleware with request IDs | ✅ | ✅ | Aggregate logs to SIEM, define retention policy |
|
||||
| | Background Scheduler | 12 loops (8 always-on: renewal 1h, jobs 30s, job retry 5m I-001, job timeout 10m I-003, health 2m, notifications 1m, notif retry 2m I-005, short-lived 30s; 4 opt-in: network scan 6h, digest 24h, endpoint health 60s M48, cloud discovery 6h M50) | ✅ | ✅ | Alert on scheduler loop failures |
|
||||
| **CC7.2** Anomaly Detection | Immutable API Audit Trail | `internal/api/middleware/audit.go`, `GET /api/v1/audit` | ✅ | Enhanced (SIEM export) | Integrate into SIEM, search for anomalies, archive long-term |
|
||||
| | Expiration Threshold Alerting | Configurable per-policy (default 30/14/7/0 days) | ✅ | ✅ | Configure thresholds, integrate notifications |
|
||||
| | Status Auto-Transitions | Active → Expiring (30d) → Expired (0d) | ✅ | ✅ | Monitor status changes in audit trail |
|
||||
| | Notification Routing | Email, Slack, Teams, PagerDuty, OpsGenie | ✅ | ✅ | Configure notifiers, on-call integration |
|
||||
| | Deployment Rollback | Redeploy previous cert version via GUI | ✅ | ✅ | Audit rollback decisions |
|
||||
| **CC7.3** Incident Response | Revocation API (RFC 5280 reasons) | `POST /api/v1/certificates/{id}/revoke` | ✅ | Enhanced (bulk revocation) | Establish incident response policy |
|
||||
| | CRL Endpoint (DER, RFC 5280 §5) | `GET /.well-known/pki/crl/{issuer_id}` (unauthenticated, `application/pkix-crl`) | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients without API keys |
|
||||
| | OCSP Responder (RFC 6960) | `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated, `application/ocsp-response`) | ✅ | ✅ | Test revocation in staging |
|
||||
| | Revocation Notifications | Email, webhook, Slack/Teams on revocation | ✅ | ✅ | Integrate into on-call, document justification separately |
|
||||
| | Short-Lived Cert Exemption | TTL < 1h skip CRL/OCSP | ✅ | ✅ | Configure profiles appropriately |
|
||||
| **CC7.4** Risk Mitigation | Renewal Job Tracking | Job state machine (Pending → Running → Completed/Failed) | ✅ | ✅ | Monitor renewal success rate |
|
||||
| | Agent Health Monitoring | Health check loop (ping every 2m, mark unhealthy after 3 misses) | ✅ | ✅ | Alert on unhealthy agents, investigate |
|
||||
| | Job Cancellation | `POST /api/v1/jobs/{id}/cancel` | ✅ | ✅ | Test in staging |
|
||||
| | Interactive Approval | AwaitingApproval state, `POST /api/v1/jobs/{id}/approve\|reject` | ✅ | ✅ | Define approval policy, audit decisions |
|
||||
| | Certificate Discovery | Agents scan directories, triage (claim/dismiss) | ✅ | ✅ | Review discovered certs regularly |
|
||||
| **A1.1/A1.2** Availability and Recovery | Health Probes (Docker, Kubernetes) | `/health` and `/ready` endpoints | ✅ | ✅ | Use in container orchestration |
|
||||
| | Idempotent Migrations | `IF NOT EXISTS`, `ON CONFLICT ... DO NOTHING` | ✅ | ✅ | Test migration replay in staging |
|
||||
| | Agent Panic Recovery | Panic recovery in job loops | ✅ | ✅ | Monitor agent crashes in logs |
|
||||
| | Exponential Backoff | Agent heartbeat/work poll backoff (1s → 5m) | ✅ | ✅ | Monitor for control plane downtime |
|
||||
| | PostgreSQL Connection Pooling | MaxOpenConns=25, MaxIdleConns=5 (configurable) | ✅ | ✅ | Monitor connection usage |
|
||||
| **CC8.1** Change Control | Certificate Profiles | CRUD API + GUI, profile changes audited | ✅ | ✅ | Formal change control, test in staging |
|
||||
| | Policy Engine + Violations | CRUD API + GUI, policy changes audited | ✅ | ✅ | Document justification, implement approval workflow |
|
||||
| | Target Registration | CRUD API + GUI, changes audited | ✅ | ✅ | Confirm deletions, version control config |
|
||||
| | Immutable Audit Trail | Append-only `audit_events` table | ✅ | ✅ | Encrypt at rest, archive long-term, no manual edits |
|
||||
| | GitHub Actions CI | Unit tests, vet, coverage gates, build checks | ✅ | ✅ | Review PRs before merge, maintain test quality |
|
||||
|
||||
---
|
||||
|
||||
## What Requires Operator Action
|
||||
|
||||
**certctl is a tool, not a complete compliance solution.** Your organization must handle:
|
||||
|
||||
1. **Physical Security** — Protect the infrastructure (servers, network) running certctl. Certctl can't control who has physical access to your datacenter.
|
||||
|
||||
2. **Personnel Background Checks** — Before granting anyone API key access, conduct background checks per your policy. Certctl records *who* accessed *what*, but doesn't verify that people are trustworthy.
|
||||
|
||||
3. **Formal Incident Response Plan** — Certctl provides incident detection (anomalies in audit trail) and tools for response (revocation, rollback), but you must define *when* to use them and *who* decides.
|
||||
|
||||
4. **Access Review and Removal** — Certctl stores ownership, teams, and API keys. You must:
|
||||
- Regularly review who has access (quarterly or semi-annually)
|
||||
- Immediately revoke API keys for departing employees
|
||||
- Audit that removed access is actually removed (test that old keys fail)
|
||||
|
||||
5. **Log Retention and Archival** — Certctl logs to stdout (Docker) and stores audit events in PostgreSQL. You must:
|
||||
- Ship logs to a long-term archive (SIEM, S3, or equivalent)
|
||||
- Define retention policy (often 1-7 years per industry regulation)
|
||||
- Encrypt archived logs
|
||||
- Test that you can retrieve logs from archive (restoration drills)
|
||||
|
||||
6. **Encryption at Rest** — PostgreSQL data (including audit trail) is stored on disk. You must:
|
||||
- Enable disk-level encryption or transparent data encryption (TDE) on your database host
|
||||
- Encrypt container persistent volumes (if using Kubernetes)
|
||||
- Encrypt database backups
|
||||
|
||||
7. **Network Segmentation** — Certctl API and database must be protected by network access controls. You must:
|
||||
- Firewall the control plane (only authorized services can connect)
|
||||
- Use VPN or private networks for agent-to-server communication
|
||||
- Isolate proxy agents (for F5, IIS, etc.) in the same network zone as their targets
|
||||
|
||||
8. **Capacity Planning** — Certctl's performance scales with your PostgreSQL. You must:
|
||||
- Estimate certificate inventory size (10k, 100k, 1M certs?)
|
||||
- Test certctl with your expected scale in staging
|
||||
- Monitor disk usage, CPU, memory
|
||||
- Plan for growth (add PostgreSQL replicas, increase connection pool, etc.)
|
||||
|
||||
9. **Disaster Recovery** — Certctl data lives in PostgreSQL. You must:
|
||||
- Back up PostgreSQL regularly (daily or hourly, depending on RPO)
|
||||
- Test the restore process in staging (otherwise broken backups are discovered only during incidents)
|
||||
- Have a runbook for failover to replica or recovery from backup
|
||||
- Document RTO/RPO targets (how long can cert management be down? how much data can you afford to lose?)
|
||||
|
||||
10. **Integration with Your IAM** — If using OIDC/SSO (V3), you must:
|
||||
- Configure your OIDC provider (Okta, Azure AD, Google)
|
||||
- Map user groups to certctl roles (Admin, Operator, Viewer)
|
||||
- Manage MFA policy (enforce MFA if required)
|
||||
- Audit user provisioning/deprovisioning
|
||||
|
||||
11. **Documentation and Runbooks** — Certctl documents *what it does* (this guide), but you must document:
|
||||
- Your organization's certificate lifecycle policy (who requests, who approves, who deploys)
|
||||
- How to respond to specific incidents (cert compromise, CA compromise, agent down, renewal failed)
|
||||
- How to operate certctl (day-to-day tasks, escalation procedures)
|
||||
- Contact info for on-call teams
|
||||
|
||||
---
|
||||
|
||||
## V3 Enhancements
|
||||
|
||||
**certctl Pro (V3, paid edition) adds features that significantly strengthen SOC 2 evidence:**
|
||||
|
||||
- **OIDC / SSO Integration** — Integrate with Okta, Azure AD, Google to replace API keys with federated identity. Enables MFA enforcement and centralized access management. Auditors love federated identity (easier to remove access at source).
|
||||
|
||||
- **Role-Based Access Control (RBAC)** — Predefined roles (Admin: full access; Operator: issue/renew/revoke, no policy changes; Viewer: read-only) with profile-gated enforcement. Allows separation of duties (e.g., junior operator can't change global policy).
|
||||
|
||||
- **NATS Event Bus** — Real-time audit streaming to your SIEM. Hybrid model: HTTP for synchronous APIs, NATS for async events (cert.issued, cert.expiring, agent.heartbeat, job.completed). JetStream persistence for replay and durability.
|
||||
|
||||
- **SIEM Export** — Automated export of audit trail to Splunk, ELK, DataDog, etc. (webhooks, syslog, or pull-based APIs). Makes it easy for security teams to hunt for anomalies.
|
||||
|
||||
- **Advanced Search DSL** — `POST /api/v1/search` with tree-based filters (nested AND/OR, regex, field projection). Enables complex compliance queries (e.g., "all certs issued in the last 30 days by team X that are longer than 1 year").
|
||||
|
||||
- **Bulk Revocation** — Revoke all certs issued by a profile, owner, or agent in one operation. Critical for large-scale incidents (e.g., "a team's CA key was compromised, revoke all their certs").
|
||||
|
||||
- **Certificate Health Scores** — Composite risk scoring (e.g., "this cert has no short-lived TTL enforcement, extends past your policy max, and hasn't been renewed in 2 years" → health=30%). Helps prioritize remediation.
|
||||
|
||||
- **Compliance Scoring** — Audit readiness reporting per certificate (e.g., "compliance=95% — missing only a 3-year max-TTL constraint"). Exportable compliance report.
|
||||
|
||||
- **DigiCert Issuer Connector** — OV/EV certificate issuance for public-facing services (web servers, CDNs). Complements Local CA for internal use.
|
||||
|
||||
- **CT Log Monitoring** — Passive detection of unauthorized cert issuance. Monitors public CT logs for certs matching your domains and alerts if unexpected certs appear (e.g., attacker obtained a cert for your domain).
|
||||
|
||||
- **F5 BIG-IP Implementation** — Full target connector with iControl REST API. Agents can deploy certs to F5 load balancers.
|
||||
|
||||
- **IIS Implementation** — Dual-mode: agent-local PowerShell (default) for servers with agents, or proxy agent WinRM (agentless targets). Full Windows Server integration.
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
certctl provides a strong foundation for SOC 2 compliance with API key authentication, immutable audit logging, automated alerting, and revocation capabilities. However, SOC 2 audits require evidence across your entire infrastructure — certctl is one piece. Use this guide to map certctl features to your audit questionnaire, then work with your auditors to identify gaps that must be filled by your own organizational policies and controls.
|
||||
|
||||
For a deeper SOC 2 discussion or a mock audit against this guide, contact your certctl Pro support team.
|
||||
@@ -1,122 +0,0 @@
|
||||
# Compliance Mapping Guides
|
||||
|
||||
certctl is a certificate lifecycle management tool, not a compliance product. It doesn't make you compliant — your organization, policies, and processes do that. What certctl provides is tooling that supports the technical controls auditors and evaluators look for when assessing certificate and key management practices.
|
||||
|
||||
These guides map certctl's features to three widely referenced compliance frameworks. They're designed for security engineers, IT auditors, and procurement teams evaluating certctl for environments with regulatory requirements.
|
||||
|
||||
## What's Covered
|
||||
|
||||
**[SOC 2 Type II](compliance-soc2.md)** — Maps certctl features to AICPA Trust Service Criteria. Covers logical access controls (CC6), system operations and monitoring (CC7), change management (CC8), and availability (A1). Most relevant for organizations undergoing SOC 2 audits where certificate management is in scope.
|
||||
|
||||
**[PCI-DSS 4.0](compliance-pci-dss.md)** — Maps certctl features to PCI Data Security Standard version 4.0 requirements. Covers data-in-transit protection (Req 4), cryptographic key management (Req 3), authentication (Req 8), audit logging (Req 10), secure development (Req 6), and access control (Req 7). Most relevant for organizations handling cardholder data where TLS certificates protect transmission channels.
|
||||
|
||||
**[NIST SP 800-57](compliance-nist.md)** — Maps certctl's key management practices to NIST Special Publication 800-57 Part 1 Rev 5 (2020). Covers key generation, storage, cryptoperiods, key state lifecycle, algorithm selection, key transport, and revocation. Most relevant for organizations aligning with US federal cryptographic guidance or using NIST as a key management baseline.
|
||||
|
||||
## What These Guides Are Not
|
||||
|
||||
These are mapping guides, not certification claims. certctl is not SOC 2 certified, PCI-DSS validated, or NIST-assessed. The guides document how certctl's technical implementation supports the controls these frameworks require — they do not replace your auditor's assessment, your organization's policies, or your security team's judgment.
|
||||
|
||||
The guides also clearly identify gaps where certctl's current implementation doesn't fully align with a framework's recommendations, features planned for future versions, and areas where operator action is required regardless of what certctl provides.
|
||||
|
||||
## How to Use These Guides
|
||||
|
||||
If you're evaluating certctl for a regulated environment, start with the framework your auditor cares about. Each guide includes an evidence summary table mapping specific compliance criteria to certctl features, API endpoints, and configuration — the kind of specifics your auditor will ask for.
|
||||
|
||||
If you're preparing for an audit and certctl is already deployed, use the "Operator Responsibilities" section of each guide to identify what your organization must manage beyond what certctl provides.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Framework | Primary Concern | Key certctl Features |
|---|---|---|
| SOC 2 Type II | Trust service criteria for SaaS/infrastructure | API audit trail, auth controls, monitoring, change management |
| PCI-DSS 4.0 | Cardholder data protection | TLS lifecycle, key management, immutable logging, access control |
| NIST SP 800-57 | Cryptographic key management | Agent-side keygen, key isolation, algorithm selection, revocation |
|
||||
|
||||
## Audit-Trail Integrity & Privacy (Bundle 6)
|
||||
|
||||
Two complementary controls protect the `audit_events` table against tampering and minimize PII exposure. Both apply automatically — no operator action is required at install time, but operators must understand the contract before responding to a legal-hold or retention request.
|
||||
|
||||
### Append-Only Enforcement (HIPAA §164.312(b))
|
||||
|
||||
<!-- Source: migrations/000018_audit_events_worm.up.sql -->
|
||||
|
||||
`audit_events` rows cannot be modified or deleted by the application role. Two layers:
|
||||
|
||||
| Layer | Mechanism | Surface |
|---|---|---|
| **DB trigger** | `audit_events_block_modification()` raises `check_violation` on `BEFORE UPDATE OR DELETE` | Catches any UPDATE / DELETE — including direct `psql` from the app role |
| **App-role grant** | `REVOKE UPDATE, DELETE ON audit_events FROM certctl` | Defence-in-depth; the app role can't even attempt the modification |
|
||||
|
||||
**Verification.** From a `psql` session connected as the `certctl` app role:
|
||||
|
||||
```sql
UPDATE audit_events SET actor = 'tampered' WHERE id = 'audit-001';
-- ERROR: audit_events is append-only (Bundle-6 / M-017 / HIPAA §164.312(b))
-- HINT: Use a compliance superuser role for legitimate retention operations.
```
|
||||
|
||||
**Compliance superuser pattern.** Legitimate retention work (legal hold, GDPR right-to-be-forgotten, statutory purges) requires a separate PostgreSQL role provisioned out-of-band that bypasses the trigger. Certctl does NOT auto-create this role — operators provision it per their compliance policy. Suggested shape:
|
||||
|
||||
```sql
-- One-time setup by a DBA. Stored procedure pattern keeps the
-- compliance superuser auditable too: every invocation should
-- itself land in audit_events.
CREATE ROLE certctl_compliance LOGIN PASSWORD '<strong-secret>';
GRANT UPDATE, DELETE ON audit_events TO certctl_compliance;
-- (optional) provision SECURITY DEFINER stored procedures that
--   (a) record the retention reason in audit_events as the FIRST step
--   (b) then perform the UPDATE/DELETE
--   (c) all under the certctl_compliance role's grants.
```
|
||||
|
||||
### Body Redaction (GDPR Art. 32, CWE-532)
|
||||
|
||||
<!-- Source: internal/service/audit_redact.go -->
|
||||
|
||||
`AuditService.RecordEvent` routes every `details` map through `RedactDetailsForAudit` BEFORE marshaling to the JSONB column. Two deny-lists:
|
||||
|
||||
| Category | Match | Replacement | Examples |
|---|---|---|---|
| **Credentials** | case-insensitive key match | `"[REDACTED:CREDENTIAL]"` | `api_key`, `password`, `token`, `*_pem`, `eab_secret`, `acme_account_key`, `signature` |
| **PII** | case-insensitive key match | `"[REDACTED:PII]"` | `email`, `phone`, `ssn`, `dob`, `name`, `address`, `postal_code`, `ip_address` |
|
||||
|
||||
Nested maps and arrays are walked recursively — sensitive keys at any depth get scrubbed. The redactor is mutation-free (the caller's original map is unchanged) so service-layer code that reuses the map elsewhere is safe.
|
||||
|
||||
**Operator visibility — `redacted_keys` array.** The redacted map includes a `redacted_keys` array listing every dotted-path that was scrubbed. This surfaces the redaction footprint to compliance auditors without exposing values. Example before/after:
|
||||
|
||||
```jsonc
// Caller's input map (e.g., from a service handler):
{
  "action": "create_issuer",
  "issuer_id": "iss-acme-prod",
  "config": {
    "endpoint": "https://acme.example.com",
    "eab_secret": "abc123secret",
    "contact": { "email": "ops@example.com", "role": "admin" }
  }
}

// Persisted in audit_events.details:
{
  "action": "create_issuer",
  "issuer_id": "iss-acme-prod",
  "config": {
    "endpoint": "https://acme.example.com",
    "eab_secret": "[REDACTED:CREDENTIAL]",
    "contact": { "email": "[REDACTED:PII]", "role": "admin" }
  },
  "redacted_keys": ["config.eab_secret", "config.contact.email"]
}
```
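The recursive, mutation-free walk described above can be sketched in Go. This is an illustration under assumptions, not the code in `internal/service/audit_redact.go`: the names `redactDetails` and `walk`, the abbreviated deny-lists, the `[i]` notation for array indices, and the choice to sort `redacted_keys` (Go map iteration order is random) are all hypothetical. Only the replacement strings and the `redacted_keys` field come from the text above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Abbreviated, illustrative deny-lists (assumption: the real lists are longer).
var credentialKeys = map[string]bool{"api_key": true, "password": true, "token": true, "eab_secret": true}
var piiKeys = map[string]bool{"email": true, "phone": true, "ssn": true}

// redactDetails returns a redacted deep copy of in. If anything was scrubbed,
// the copy gains a sorted "redacted_keys" entry. The caller's map is untouched.
func redactDetails(in map[string]any) map[string]any {
	var paths []string
	out := walk(in, "", &paths).(map[string]any)
	if len(paths) > 0 {
		sort.Strings(paths)
		out["redacted_keys"] = paths
	}
	return out
}

// walk copies v recursively, replacing any value whose key (lower-cased)
// appears on a deny-list and recording its dotted path in paths.
func walk(v any, path string, paths *[]string) any {
	switch t := v.(type) {
	case map[string]any:
		cp := make(map[string]any, len(t))
		for k, val := range t {
			p := k
			if path != "" {
				p = path + "." + k
			}
			switch lk := strings.ToLower(k); {
			case credentialKeys[lk]:
				cp[k] = "[REDACTED:CREDENTIAL]"
				*paths = append(*paths, p)
			case piiKeys[lk]:
				cp[k] = "[REDACTED:PII]"
				*paths = append(*paths, p)
			default:
				cp[k] = walk(val, p, paths)
			}
		}
		return cp
	case []any:
		cp := make([]any, len(t))
		for i, val := range t {
			cp[i] = walk(val, fmt.Sprintf("%s[%d]", path, i), paths)
		}
		return cp
	default:
		return v // scalars are copied by value
	}
}

func main() {
	in := map[string]any{
		"action": "create_issuer",
		"config": map[string]any{
			"eab_secret": "abc123secret",
			"contact":    map[string]any{"email": "ops@example.com", "role": "admin"},
		},
	}
	fmt.Printf("redacted: %v\n", redactDetails(in))
	fmt.Printf("original: %v\n", in) // still holds the raw values
}
```

Building a fresh map at every level, rather than editing in place, is what lets a caller keep using its original `details` map after the audit record is written.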
|
||||
|
||||
**Maintenance.** When introducing a new credential-bearing or PII-bearing field anywhere in the codebase, add its key name to `credentialKeys` or `piiKeys`, respectively, in `internal/service/audit_redact.go`. The unit test suite in `audit_redact_test.go` exercises every entry and verifies case-insensitivity and JSON round-trip safety.
|
||||
|
||||
## certctl Pro (V3) Enhancements
|
||||
|
||||
Several compliance-relevant features are planned for certctl Pro:
|
||||
|
||||
- **OIDC/SSO** — Enterprise identity provider integration (SOC 2 CC6.1, PCI-DSS 8.3)
|
||||
- **RBAC** — Role-based access control with admin/operator/viewer roles (SOC 2 CC6.3, PCI-DSS 7.2)
|
||||
- **NATS Audit Streaming** — Real-time audit event streaming to SIEM systems (SOC 2 CC7.2, PCI-DSS 10.2)
|
||||
- **Bulk Revocation** — Fleet-wide incident response capability (NIST SP 800-57 Section 5.4)
|
||||
- **Health/Compliance Scoring** — Automated compliance posture assessment per certificate
|
||||
@@ -1,7 +1,9 @@
|
||||
# CI Pipeline — Operator Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> Authoritative guide to certctl's CI pipeline shape.
|
||||
> Per `cowork/ci-pipeline-cleanup-prompt.md` Phase 12.
|
||||
> Per the ci-pipeline-cleanup spec, Phase 12.
|
||||
|
||||
## Trigger model
|
||||
|
||||
@@ -0,0 +1,68 @@
|
||||
# GUI QA Checklist
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Manual GUI verification pass for release sign-off. Vitest covers component-level behavior; this checklist covers end-to-end flows that only land correctly when the React SPA, the REST API, and the database are all wired together.
|
||||
|
||||
## Prereqs
|
||||
|
||||
The full stack must be running and healthy per [`qa-prerequisites.md`](qa-prerequisites.md). Open `https://localhost:8443` in a fresh browser session (Incognito / Private mode is fine — avoids cached state from previous QA passes).
|
||||
|
||||
## Pages to verify
|
||||
|
||||
For each page, the verification is "open it, confirm it renders without console errors, exercise the documented action, confirm the action lands as expected."
|
||||
|
||||
| Page | Action to verify | Expected result |
|---|---|---|
| `/dashboard` | Page loads, all 4 stat cards populate | Total / Active / Expiring / Expired counts match `GET /api/v1/stats/summary` |
| `/certificates` | Inventory list paginates | "Next page" button works; URL updates with cursor; row count consistent |
| `/certificates/<id>` | Detail page opens for any cert | Cert chain renders, deployment status shows, audit timeline visible |
| `/issuers` | Catalog renders all configured issuers | Each issuer card shows last-used / status; clicking opens detail |
| `/issuers/<id>` | Issuer config form | Edit + Save round-trips through `PATCH /api/v1/issuers/<id>` |
| `/issuers/hierarchy` | CA tree view | Multi-level hierarchy renders; admin-gated CRUD buttons present for admins only |
| `/agents` | Fleet view | Online/offline status accurate; OS/arch grouping correct |
| `/agents/<id>` | Agent detail | Last heartbeat, registered date, deployment job history |
| `/agents/groups` | Agent groups CRUD | Create + edit + delete a test group; verify dynamic membership matching |
| `/jobs` | Job queue | Filter by status / type works; click into a job opens detail |
| `/jobs/<id>` | Job detail | Status, retries, logs, owner attribution |
| `/policies` | Renewal policies CRUD | Edit AlertChannels matrix, save, verify backend reflects change |
| `/profiles` | Certificate profiles | EKU constraints + max TTL editable; profile binding works |
| `/notifications` | Notifier config | Test connection button against each configured notifier |
| `/discovery` | Discovery triage | Claim / Dismiss buttons round-trip to backend |
| `/network-scans` | Scan target CRUD | Create scan target, trigger immediate scan, results appear |
| `/audit` | Audit trail | Filter by actor / action / time range; CSV export works |
| `/short-lived` | Short-lived credential dashboard | Live TTL countdown updates; auto-refresh every 10s |
| `/observability` | Observability dashboard | Charts render: expiration heatmap, renewal trends, issuance rate |
| `/health` | Health monitor | TLS endpoint health: healthy / degraded / down states accurate |
| `/digest` | Digest preview | Email preview renders; "Send digest" button dispatches |
| `/owners` | Owners CRUD | Create owner with team, edit, delete (after reassigning certs) |
| `/teams` | Teams CRUD | Create + delete; verify cascade removes orphan owners |
| `/scep` | SCEP admin tabs | Profiles / Intune Monitoring / Recent Activity all populate |
| `/est` | EST admin tabs | Profiles / Recent Activity / Trust Bundle all populate |
| `/login` | Login flow | API key entry persists for the session; bad key rejected |
|
||||
|
||||
## Console hygiene
|
||||
|
||||
Open browser DevTools and confirm:
|
||||
|
||||
- No uncaught exceptions on any page
|
||||
- No 404 / 500 responses in the Network tab from API calls
|
||||
- No CORS errors
|
||||
- No CSP violations
|
||||
|
||||
## Mobile / narrow-viewport
|
||||
|
||||
The dashboard is desktop-first but should not break catastrophically on narrow viewports. Resize the browser to 380px width; confirm:
|
||||
|
||||
- Sidebar collapses to a hamburger menu
|
||||
- Tables either scroll horizontally or stack on mobile
|
||||
- Forms remain usable
|
||||
|
||||
## Accessibility spot-check
|
||||
|
||||
- Tab through any single page using only the keyboard. Every interactive element must be reachable, and the focus indicator must be visible.
|
||||
- Lighthouse accessibility audit on `/dashboard`: target ≥ 90.
|
||||
|
||||
## Sign-off
|
||||
|
||||
Document any deviations in the release sign-off matrix at [`release-sign-off.md`](release-sign-off.md).
|
||||
@@ -0,0 +1,99 @@
|
||||
# QA Prerequisites
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Operational prereqs for running release QA against certctl. Before any of the contributor-facing testing surfaces (test-environment.md, gui-qa-checklist.md, release-sign-off.md) are useful, the local stack needs to be in a known-good state.
|
||||
|
||||
## Why manual QA on top of automated tests?
|
||||
|
||||
Automated tests mock dependencies and run in isolation. Manual QA validates the full integrated stack: real PostgreSQL, real HTTP, real agent binary, real file I/O, real scheduler timing. It catches issues that unit tests can't: migration ordering, Docker networking, env var parsing, browser rendering, and timing-dependent scheduler behavior.
|
||||
|
||||
## Environment setup
|
||||
|
||||
**Step 1: Start the full stack.**
|
||||
|
||||
```bash
cd deploy && docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build -d
```
|
||||
|
||||
This builds three containers (postgres, certctl-server, certctl-agent) and runs them on a bridge network. The `--build` flag ensures you're testing the current code, not a stale image. The `demo` overlay is an override file (no `image:` or `build:` of its own) that layers `CERTCTL_DEMO_SEED=true` onto the base — both files must be passed in that order or compose errors with `service "certctl-server" has neither an image nor a build context specified`. The seed populates the database with realistic fixtures.
|
||||
|
||||
**Step 2: Wait for healthy state.**
|
||||
|
||||
```bash
for i in $(seq 1 30); do
  STATUS=$(docker compose ps --format json 2>/dev/null | jq -r 'select(.Health != null) | "\(.Name): \(.Health)"' 2>/dev/null)
  echo "$STATUS"
  # Stop only once health data is present and nothing is unhealthy or starting;
  # an empty STATUS (compose or jq failed) must not be mistaken for healthy.
  if [ -n "$STATUS" ] && ! echo "$STATUS" | grep -q "unhealthy\|starting"; then
    break
  fi
  sleep 2
done
```
|
||||
|
||||
Why: Docker Compose starts containers in dependency order (postgres → server → agent), but "started" doesn't mean "ready." Health checks confirm postgres accepts connections, the server responds on `/health`, and the agent process is running.
|
||||
|
||||
**Step 3: Set shell variables used throughout the QA flow.**
|
||||
|
||||
```bash
export SERVER=https://localhost:8443
export API_KEY="change-me-in-production"
export AUTH="Authorization: Bearer $API_KEY"
export CT="Content-Type: application/json"
export CACERT="--cacert ./deploy/test/certs/ca.crt"
```
|
||||
|
||||
Every curl command in QA docs uses these variables. Setting them once avoids typos and keeps the docs copy-pasteable.
|
||||
|
||||
> **Note:** The default Docker Compose sets `CERTCTL_AUTH_TYPE: none` for the demo overlay, meaning auth is disabled. Tests that exercise auth require flipping this to `api-key`; instructions are in the relevant test docs.
|
||||
|
||||
**Step 4: Build CLI and MCP server binaries on the host.**
|
||||
|
||||
```bash
go build -o certctl-cli ./cmd/cli/...
go build -o certctl-mcp ./cmd/mcp-server/...
```
|
||||
|
||||
The CLI and MCP server are separate binaries that talk to the server over HTTP. Building them verifies the code compiles and produces the executables you'll test later.
|
||||
|
||||
## Demo data baseline
|
||||
|
||||
The seed data (`migrations/seed.sql` + `migrations/seed_demo.sql`) pre-populates the database with realistic fixtures. Confirm it loaded:
|
||||
|
||||
```bash
curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary | jq .
```
|
||||
|
||||
**Expected shape:**
|
||||
|
||||
```json
{
  "total_certificates": 15,
  "active_certificates": ...,
  "expiring_certificates": ...,
  "expired_certificates": ...,
  "pending_renewals": ...
}
```
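To check the baseline programmatically rather than by eye, the response body can be decoded into a struct. This is a sketch under assumptions: `statsSummary` and `checkBaseline` are hypothetical names, the JSON field names are copied from the expected shape above, the values are assumed to be integers, and only the total of 15 is asserted because the other values are not stated.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statsSummary mirrors the fields shown in the expected shape.
// Integer types are an assumption; the doc elides the actual values.
type statsSummary struct {
	TotalCertificates    int `json:"total_certificates"`
	ActiveCertificates   int `json:"active_certificates"`
	ExpiringCertificates int `json:"expiring_certificates"`
	ExpiredCertificates  int `json:"expired_certificates"`
	PendingRenewals      int `json:"pending_renewals"`
}

// checkBaseline decodes a /api/v1/stats/summary body and verifies that the
// certificate total matches the seeded count.
func checkBaseline(body []byte, wantTotal int) error {
	var s statsSummary
	if err := json.Unmarshal(body, &s); err != nil {
		return fmt.Errorf("decode stats summary: %w", err)
	}
	if s.TotalCertificates != wantTotal {
		return fmt.Errorf("total_certificates = %d, want %d", s.TotalCertificates, wantTotal)
	}
	return nil
}

func main() {
	// Placeholder payload for illustration; a real run would pass the
	// body returned by the curl command above.
	sample := []byte(`{"total_certificates": 15}`)
	if err := checkBaseline(sample, 15); err != nil {
		fmt.Println("baseline check failed:", err)
		return
	}
	fmt.Println("baseline check passed")
}
```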
|
||||
|
||||
**Reference IDs in the demo data** (used across QA docs):
|
||||
|
||||
| Resource | IDs | Count |
|---|---|---|
| Teams | `t-platform`, `t-security`, `t-payments`, `t-frontend`, `t-data` | 5 |
| Owners | `o-alice`, `o-bob`, `o-carol`, `o-dave`, `o-eve` | 5 |
| Policies | `rp-standard`, `rp-urgent`, `rp-manual` | 3 |
| Issuers | `iss-local`, `iss-acme-le`, `iss-stepca`, `iss-digicert` | 4 |
| Agents | `ag-web-prod`, `ag-web-staging`, `ag-lb-prod`, `ag-iis-prod`, `ag-data-prod` | 5 |
| Targets | `tgt-nginx-prod`, `tgt-nginx-staging`, `tgt-f5-prod`, `tgt-iis-prod`, `tgt-nginx-data` | 5 |
| Profiles | `prof-standard-tls`, `prof-internal-mtls`, `prof-short-lived`, `prof-high-security` | 4 |
| Certificates | `mc-api-prod`, `mc-web-prod`, `mc-pay-prod`, etc. | 15 |
| Agent Groups | `ag-linux-prod`, `ag-linux-amd64`, `ag-windows`, `ag-datacenter-a`, `ag-manual` | 5 |
| Network Scan Targets | `nst-dc1-web`, `nst-dc2-apps`, `nst-dmz` | 3 |
|
||||
|
||||
## Once these are green
|
||||
|
||||
Move to the appropriate downstream surface:
|
||||
|
||||
- [`test-environment.md`](test-environment.md) — full local environment tutorial with real CAs (Pebble, step-ca, etc.)
|
||||
- [`gui-qa-checklist.md`](gui-qa-checklist.md) — manual GUI test pass
|
||||
- [`release-sign-off.md`](release-sign-off.md) — release-day checklist
|
||||
- [`testing-strategy.md`](testing-strategy.md) — what we test in CI vs daily deep-scan vs manual QA
|
||||
@@ -1,14 +1,16 @@
|
||||
# QA Test Suite Guide (`qa_test.go`)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Audience:** Anyone running release QA for certctl — whether you're a first-time contributor or the maintainer cutting a release tag.
|
||||
>
|
||||
> **Companion to:** `docs/testing-guide.md` (the *what* to test). This document explains the *how* — the automated test file, what it covers, what it skips, and how to fill the gaps manually.
|
||||
> **Self-contained.** Through 2026-05-04 this doc was a companion to a separate `docs/testing-guide.md` (the *what* to test) — that companion was pruned during the Phase 5 docs overhaul (its content dispersed across the audience-organized doc tree). The Part-by-Part Coverage Map below is now the canonical inventory of QA Parts.
|
||||
|
||||
---
|
||||
|
||||
## Test Suite Health (regenerate via `make qa-stats`)
|
||||
|
||||
> Snapshot at HEAD. Re-run `make qa-stats` to refresh; CI's QA-doc drift guards (`.github/workflows/ci.yml`) catch out-of-date Part / cert / issuer counts on every PR. **Last regenerated: 2026-04-27 (Bundle P).**
|
||||
> Snapshot at HEAD. Re-run `make qa-stats` to refresh; the QA-doc seed-count drift guard (`.github/workflows/ci.yml::QA-doc seed-count drift guard`) catches out-of-date cert / issuer counts on every PR. The Part-count drift guard retired in the 2026-05-04 docs overhaul Phase 5 (testing-guide.md was pruned; Part counts are now tracked inside `qa_test.go` itself, not against an external doc). **Last regenerated: 2026-04-27 (Bundle P).**
|
||||
|
||||
| Metric | Value | Target | Status |
|
||||
|---|---|---|---|
|
||||
@@ -18,23 +20,22 @@
|
||||
| Frontend test files | 38 | n/a | ℹ |
|
||||
| Fuzz targets | 11 | ≥10 (one per hand-rolled parser) | ✓ |
|
||||
| `t.Skip` sites | 60 | each carries valid rationale (Bundle O audit) | ✓ |
|
||||
| `qa_test.go` Part_* subtests | 53 | tracks `testing-guide.md` Parts (3 `## Part 15-17` covered indirectly via Parts 42–46) | ✓ |
|
||||
| `testing-guide.md` Parts | 56 | n/a | ℹ |
|
||||
| `qa_test.go` Part_* subtests | 53 | covers 49 of 56 historical QA Parts directly + Parts 15–17 indirectly via Parts 42–46 | ✓ |
|
||||
| Existential cluster line cov (post-Bundle-J + L.B + Bundle 0.7) | acme 55.6%, stepca 90.4%, local-issuer ≥86%, crypto ≥85% | ≥95% | △ ACME below; tracked in `coverage-matrix.md` |
|
||||
| Mutation kill rate (Existential) | unmeasured (operator-runnable per Strengthening #5) | ≥90% | ⚠ |
|
||||
| Race detector clean (`-count=10`) | partial (`-count=3` clean per Phase 0) | 0 races | ⚠ |
|
||||
|
||||
## What Is This File?
|
||||
|
||||
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates as much of `docs/testing-guide.md` as possible against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
|
||||
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates the historical QA Part inventory (preserved in the Part-by-Part Coverage Map below) against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
|
||||
|
||||
It covers **49 of 56 Parts** of the testing guide as automation; the remaining 7 are
|
||||
either manual-only by design or pending QA-suite coverage:
|
||||
|
||||
- **49 `Part_*` automation wrappers**, **~159 leaf subtests** — API calls, database queries, source file checks, performance benchmarks
|
||||
- **11 fully skipped Parts** — with documented reasons (external CAs, Windows, browser-only, etc.) — see "What This Test Does NOT Cover" below
|
||||
- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually per `docs/testing-guide.md` until QA-suite automation lands
|
||||
- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human following `docs/testing-guide.md`
|
||||
- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually until QA-suite automation lands; the Part-by-Part Coverage Map below describes the surface area each Part covers
|
||||
- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human (Coverage Map below describes each)
|
||||
|
||||
## Architecture
|
||||
|
||||
@@ -147,8 +148,8 @@ This table shows what each Part tests and what's left for manual verification.
|
||||
| 20 | Post-Deployment Verification | 1 | 404 on nonexistent job verification | TLS probing, fingerprint comparison |
|
||||
| 21 | EST Server | 2 | CACerts (200 + content-type), CSRAttrs (200/204) | simpleenroll with CSR, simplereenroll, PKCS#7 parsing |
|
||||
| 22 | Certificate Export | 3 | PEM export, PKCS#12 export, 404 on nonexistent | Download mode, file content validation |
|
||||
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually per `docs/testing-guide.md::Part 23` |
|
||||
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually per `docs/testing-guide.md::Part 24` |
|
||||
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually — see the Coverage Map row |
|
||||
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually — see the Coverage Map row |
|
||||
| 25 | Certificate Discovery | 5 | List discovered, summary, list scan targets, create target, invalid CIDR 400 | Agent filesystem scan, claim/dismiss workflow |
|
||||
| 26 | Enhanced Query API | 4 | Sort descending, cursor pagination, time-range filter, invalid sort field | Field projection correctness, cursor token cycling |
|
||||
| 27 | Request Body Size Limits | 1 | 2MB body rejected (413/400) | Exact limit boundary (1MB) |
|
||||
@@ -163,7 +164,7 @@ This table shows what each Part tests and what's left for manual verification.
|
||||
| 36–37 | Issuer Catalog, Frontend Audit | SKIP | — | Requires browser |
|
||||
| 38 | Error Handling | 5 | Malformed JSON, missing required field, method not allowed, UTF-8 CN, empty body | Stack trace suppression, error response format |
|
||||
| 39 | Performance | 5 | List certs < 200ms, stats < 500ms, metrics < 200ms, Prometheus < 300ms, audit < 500ms | Load testing, concurrent request handling |
|
||||
| 40 | Documentation | 8 | README, quickstart, architecture, connectors, compliance exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
|
||||
| 40 | Documentation | 8 | README, quickstart, architecture, connectors exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
|
||||
| 41 | Regression | 3 | DELETE 204, per_page max fallback, network scan target seed count | `errors.Is(errors.New())` anti-pattern source scan |
|
||||
| 42 | Envoy Target | 5 | Domain type, connector file, test file, OpenAPI, agent dispatch | Envoy deployment test, SDS config |
|
||||
| 43 | Postfix/Dovecot | 3 | Domain types (Postfix + Dovecot), connector file, OpenAPI | Mail server deployment test |
|
||||
@@ -178,12 +179,12 @@ This table shows what each Part tests and what's left for manual verification.
|
||||
| 52 | Helm Chart | 5 | Chart.yaml, values.yaml, 4 templates exist, securityContext, health probes | `helm template` rendering, `helm install` |
|
||||
| 53 | Kubernetes Secrets Target Connector (M47) | 18 | Config validation (namespace DNS-1123, secret name DNS subdomain, label keys, required fields), deployment (create/update Secret, chain concatenation, error propagation), validation (serial comparison, not-found, empty cert) | GUI target wizard KubernetesSecrets fields (namespace, secret_name, labels, kubeconfig_path), Helm RBAC toggle, TargetDetailPage type label |
|
||||
| 54 | AWS ACM Private CA Issuer Connector (M47) | 23 | Config validation (region, CA ARN regex, signing algorithm whitelist, validity_days, defaults), issuance (full flow, empty CSR, errors), renewal (reuses issuance), revocation (reason mapping, default, errors), GetOrderStatus completed, GetCACertPEM (success/chain/error), GetRenewalInfo nil | GUI issuer wizard AWSACMPCA fields (region, ca_arn, signing_algorithm, validity_days, template_arn), seed data visibility, create issuer flow |
|
||||
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually per `docs/testing-guide.md::Part 55` |
|
||||
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually per `docs/testing-guide.md::Part 56` |
|
||||
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually — see the Coverage Map row |
|
||||
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually — see the Coverage Map row |
|
||||
|
||||
**Totals (verified 2026-04-27):** 49 `Part_*` automation wrappers, ~159 leaf subtests, 11 fully
|
||||
skipped Parts, 4 Parts not yet automated (23, 24, 55, 56), and an unspecified count of manual-only
|
||||
flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE '^## Part [0-9]+:' docs/testing-guide.md`
|
||||
flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go` to count Part_* automation wrappers
|
||||
and `grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go` to re-verify.
|
||||
|
||||
## Coverage by Risk Class
|
||||
@@ -192,14 +193,14 @@ A buyer's QA lead reading this doc wants "where are the existential bugs caught?
|
||||
|
||||
| Risk class | Description | Parts in scope | Automation status |
|
||||
|---|---|---|---|
|
||||
| **Existential** (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in `testing-guide.md`) |
|
||||
| **Existential** (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in the Coverage Map below) |
|
||||
| **High** (FSM corruption, credential leak, authn/z weakening) | Renewal, jobs, agents, issuers, deployment, scheduler | 4, 7, 8, 9, 18, 19, 20, 22, 25, 28, 29, 32, 33, 48, 49, 55, 56 | 14/17 automated; CLI / MCP / scheduler-loop are inherently SKIP (require compiled binaries / Docker logs); Parts 55 + 56 pending |
|
||||
| **Medium** (Operational pain or silent data drift) | Targets, notifiers, observability, error handling, performance, regression | 14, 15-17, 30, 31, 38, 39, 40, 41, 42, 43, 44, 45, 46 | 14/14 automated (15-17 indirect via Parts 42–46) |
|
||||
| **Low** (Hygiene) | Documentation, docs verification | 40 (Documentation), 50 (Onboarding) | 2/2 automated |
|
||||
| **Frontend** (XSS, render correctness, mutation contracts) | GUI testing | 35, 36-37 | 0/3 automated in this suite (Vitest covers separately under `web/`); this doc punts to manual + Vitest |
|
||||
| **Compliance** (PCI / SOC2 / HIPAA-relevant) | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
|
||||
| **Audit-relevant** | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
|
||||
|
||||
This is the table acquisition reviewers screenshot for their report. When a new Part lands in `testing-guide.md`, classify it here; the QA-doc Part-count drift guard (`.github/workflows/ci.yml::QA-doc Part-count drift guard`) catches the count mismatch.
|
||||
This is the table acquisition reviewers screenshot for their report. When a new Part_* subtest lands in `qa_test.go`, classify it here.
|
||||
|
||||
## Test Categories
|
||||
|
||||
@@ -231,11 +232,11 @@ Timed API requests with threshold assertions:
|
||||
|
||||
## What This Test Does NOT Cover
|
||||
|
||||
These gaps must be filled by manual testing per `docs/testing-guide.md`:
|
||||
These gaps must be filled by manual testing — see each Coverage Map row for surface-area description:
|
||||
|
||||
### Not Yet Automated (Parts 23, 24, 55, 56)
|
||||
|
||||
These Parts are documented in `docs/testing-guide.md` but have no `Part_*` automation
|
||||
These historical QA Parts are listed in the Coverage Map below but have no `Part_*` automation
|
||||
in `qa_test.go` yet. They are operator-runnable from the manual playbook; QA-suite
|
||||
automation should land before the next acquisition-grade release.
|
||||
|
||||
@@ -429,7 +430,7 @@ grep -oE 'mutation score is [0-9.]+' tool-output/mutation-crypto.txt | tail -1
|
||||
|
||||
When a new feature ships:
|
||||
|
||||
1. **Add a Part section** in `qa_test.go` following the numbering in `docs/testing-guide.md`
|
||||
1. **Add a Part section** in `qa_test.go` following the numbering convention in the Coverage Map below
|
||||
2. **API tests**: use `c.get()`, `c.post()`, `c.bodyStr()`, `c.getJSON()`, `c.timedGet()`
|
||||
3. **Source checks**: use `fileExists(t, "relative/path")` and `fileContains(t, "path", "substring")`
|
||||
4. **DB checks**: use `openQADB(t)` and `db.queryInt(t, "SELECT ...")`
|
||||
@@ -0,0 +1,93 @@
|
||||
# Release Sign-Off
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Release-day checklist for tagging a new certctl release. Walks through the gates that must be green before pushing the tag, in the order they should be verified.
|
||||
|
||||
## Pre-release: code state
|
||||
|
||||
| Gate | How to check | Pass |
|---|---|---|
| `master` is at the commit you intend to tag | `git log -1 --format='%H %s'` | ☐ |
| Working tree clean | `git status -sb` | ☐ |
| Local matches GitHub | `curl -sS https://api.github.com/repos/certctl-io/certctl/commits/master \| grep -oE '"sha": "[a-f0-9]+"' \| head -1` matches local | ☐ |
| `WORKSPACE-CHANGELOG.md` updated with the release's milestones | manual review | ☐ |
| `certctl/CHANGELOG.md` updated (release-facing) | manual review | ☐ |
| Migration ladder ends cleanly | `ls migrations/*.up.sql \| sort \| tail -3` shows the right last migration | ☐ |
|
||||
|
||||
## Pre-release: automated gates (CI)
|
||||
|
||||
| Gate | How to check | Pass |
|---|---|---|
| CI pipeline green on the tag-target commit | GitHub Actions web UI | ☐ |
| `make verify` clean locally | run from repo root | ☐ |
| `go test -race -count=1 ./...` clean | full race check | ☐ |
| `golangci-lint run ./...` clean | local lint | ☐ |
| `govulncheck ./...` clean | vulnerability scan | ☐ |
| Coverage thresholds met (service ≥55%, handler ≥60%, domain ≥40%, middleware ≥30%) | `go test -coverprofile=cover.out ./... && go tool cover -func=cover.out` | ☐ |
| Frontend type-check + Vitest + Vite build clean | `cd web && npm run typecheck && npm run test && npm run build` | ☐ |
|
||||
|
||||
## Pre-release: manual QA passes
|
||||
|
||||
| Surface | Checklist | Pass |
|
||||
|---|---|---|
|
||||
| Local stack boots clean from scratch | `qa-prerequisites.md` Steps 1-4 green | ☐ |
|
||||
| GUI QA checklist | `gui-qa-checklist.md` end to end | ☐ |
|
||||
| End-to-end test environment | `test-environment.md` Steps 1-14 green | ☐ |
|
||||
| Performance baselines | `performance-baselines.md` four spot checks within bounds | ☐ |
|
||||
| Helm chart deploys clean | `helm-deployment.md` install + verify | ☐ |
|
||||
| ACME server interop (cert-manager) | `make acme-cert-manager-test` green | ☐ |
|
||||
| ACME server RFC conformance (lego) | `make acme-rfc-conformance-test` green | ☐ |
|
||||
|
||||
## Release artefact verification
|
||||
|
||||
After the release workflow runs (triggered by tag push), verify the published artefacts:
|
||||
|
||||
| Artefact | How to verify | Pass |
|
||||
|---|---|---|
|
||||
| Cosign keyless OIDC signature on `checksums.txt` | per `docs/reference/release-verification.md` step 2 | ☐ |
|
||||
| SLSA Level 3 provenance on each binary | step 3 | ☐ |
|
||||
| Container image signature + SBOM + provenance | step 4 | ☐ |
|
||||
| Release notes published on GitHub Releases page | manual review | ☐ |
|
||||
| ghcr.io images at `ghcr.io/certctl-io/certctl-{server,agent}:<tag>` pullable | `docker pull` round-trips | ☐ |
|
||||
|
||||
## Branch protection + tag push
|
||||
|
||||
| Gate | How to check | Pass |
|
||||
|---|---|---|
|
||||
| `master` branch protection rule allows the tag push | Repository Settings → Branches | ☐ |
|
||||
| Tag pushed | `git tag -s v<version> -m 'Release v<version>' && git push origin v<version>` | ☐ |
|
||||
| Release workflow kicked off in GitHub Actions | watch the Actions tab | ☐ |
|
||||
|
||||
## Post-release
|
||||
|
||||
| Gate | How to check | Pass |
|
||||
|---|---|---|
|
||||
| Release workflow completed without errors | GitHub Actions | ☐ |
|
||||
| Sample binary downloaded and Cosign-verified by an operator who is not the release author | another team member | ☐ |
|
||||
| `WORKSPACE-CHANGELOG.md` notes the tag commit SHA | manual edit | ☐ |
|
||||
| workspace-tracking "Active Focus" → "Current tag" updated | manual edit | ☐ |
|
||||
| `certctl.io/index.html` star count + `data-gh-version` rendering picks up the new tag | open the landing page in 6+ hours (cache TTL) | ☐ |
|
||||
| Reddit / Hacker News / LinkedIn announcement drafted (if a major release) | per the operator's promotion playbook | ☐ |
|
||||
|
||||
## If a gate fails
|
||||
|
||||
Revert the tag push immediately:
|
||||
|
||||
```bash
|
||||
git push --delete origin v<version>
|
||||
git tag -d v<version>
|
||||
```
|
||||
|
||||
Investigate, fix, re-tag.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/contributor/qa-prerequisites.md`](qa-prerequisites.md) — local stack prereqs
|
||||
- [`docs/contributor/test-environment.md`](test-environment.md) — full local environment tutorial
|
||||
- [`docs/contributor/gui-qa-checklist.md`](gui-qa-checklist.md) — GUI manual QA pass
|
||||
- [`docs/contributor/testing-strategy.md`](testing-strategy.md) — what we test in CI vs deep-scan vs manual QA
|
||||
- [`docs/contributor/ci-pipeline.md`](ci-pipeline.md) — CI shape and regression guards
|
||||
- [`docs/operator/performance-baselines.md`](../operator/performance-baselines.md) — performance regression spot checks
|
||||
- [`docs/operator/helm-deployment.md`](../operator/helm-deployment.md) — Helm install + verify
|
||||
- [`docs/reference/release-verification.md`](../reference/release-verification.md) — Cosign / SLSA / SBOM verification procedure
|
||||
@@ -1,5 +1,7 @@
|
||||
# certctl Testing Environment
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
A step-by-step guide to running certctl locally with real certificate authorities. Every command is spelled out. Every expected output is shown. If something goes wrong, the troubleshooting section tells you exactly what to check.
|
||||
|
||||
---
|
||||
@@ -171,7 +173,7 @@ curl --cacert "$CA" -f https://localhost:8443/health
|
||||
|
||||
Expect `{"status":"ok"}`. If `curl` errors with `SSL certificate problem: unable to get local issuer certificate`, the init container hasn't finished yet — wait a few seconds and retry. If the file doesn't exist at all, the bind mount didn't populate; `docker compose -f docker-compose.test.yml logs certctl-tls-init` should show the self-sign ran.
|
||||
|
||||
For a full explanation of the cert provisioning patterns (self-signed bootstrap, operator-supplied, cert-manager), see [`tls.md`](tls.md). For the one-step cutover from the old plaintext test harness to HTTPS, see [`upgrade-to-tls.md`](upgrade-to-tls.md).
|
||||
For a full explanation of the cert provisioning patterns (self-signed bootstrap, operator-supplied, cert-manager), see [`tls.md`](../operator/tls.md). For the one-step cutover from the old plaintext test harness to HTTPS, see [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -811,17 +813,30 @@ All containers share a bridge network (`certctl-test`, subnet 10.30.50.0/24) wit
|
||||
|
||||
### Key Generation Flow (Agent-Side)
|
||||
|
||||
```
|
||||
Server creates job (AwaitingCSR) → Agent polls, sees job →
|
||||
Agent generates ECDSA P-256 key pair locally →
|
||||
Agent creates CSR (public key + CN + SANs) →
|
||||
Agent POSTs CSR to server → Server signs via issuer →
|
||||
Server stores cert, creates Deployment job (Pending) →
|
||||
Agent polls, sees Deployment job →
|
||||
Agent fetches signed cert from server →
|
||||
Agent reads local private key from /var/lib/certctl/keys/ →
|
||||
Agent writes cert + key + chain to /nginx-certs/ (shared volume) →
|
||||
Job marked Completed
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
autonumber
|
||||
participant Srv as certctl-server
|
||||
participant Iss as Issuer connector
|
||||
participant Agt as certctl-agent
|
||||
participant FS as /var/lib/certctl/keys/<br/>(local agent FS)
|
||||
participant Vol as /nginx-certs/<br/>(shared volume)
|
||||
|
||||
Srv->>Srv: create Job (AwaitingCSR)
|
||||
Agt->>Srv: poll for jobs
|
||||
Srv-->>Agt: Job(AwaitingCSR)
|
||||
Agt->>FS: generate ECDSA P-256 keypair
|
||||
Agt->>Agt: build CSR (pubkey + CN + SANs)
|
||||
Agt->>Srv: POST CSR
|
||||
Srv->>Iss: sign CSR
|
||||
Iss-->>Srv: signed cert
|
||||
Srv->>Srv: store cert; create Deployment Job (Pending)
|
||||
Agt->>Srv: poll for jobs
|
||||
Srv-->>Agt: Job(Deployment)
|
||||
Agt->>Srv: GET signed cert
|
||||
Agt->>FS: read private key
|
||||
Agt->>Vol: write cert + key + chain
|
||||
Agt->>Srv: mark Job(Completed)
|
||||
```
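The agent-side steps in the flow above (generate an ECDSA P-256 key pair, then build a CSR carrying the public key, CN, and SANs) can be sketched with the Go standard library. This is a minimal illustration, not the agent's actual code: `buildCSR` and `csrCommonName` are hypothetical helper names, and the hostnames are placeholders.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
)

// buildCSR (hypothetical helper) generates an ECDSA P-256 key pair and a
// PEM-encoded CSR. Only the CSR is meant to leave the machine; the private
// key stays with the caller.
func buildCSR(commonName string, sans []string) (*ecdsa.PrivateKey, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("generate key: %w", err)
	}
	tmpl := &x509.CertificateRequest{
		Subject:            pkix.Name{CommonName: commonName},
		DNSNames:           sans,
		SignatureAlgorithm: x509.ECDSAWithSHA256,
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, fmt.Errorf("create CSR: %w", err)
	}
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	return key, csrPEM, nil
}

// csrCommonName (hypothetical helper) parses a PEM CSR, verifies its
// self-signature, and returns the subject CN, roughly what a server would
// check before handing the CSR to an issuer.
func csrCommonName(csrPEM []byte) (string, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil || block.Type != "CERTIFICATE REQUEST" {
		return "", errors.New("no CSR PEM block found")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return "", fmt.Errorf("parse CSR: %w", err)
	}
	if err := csr.CheckSignature(); err != nil {
		return "", fmt.Errorf("verify CSR signature: %w", err)
	}
	return csr.Subject.CommonName, nil
}

func main() {
	_, csrPEM, err := buildCSR("demo.example.com", []string{"demo.example.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	cn, err := csrCommonName(csrPEM)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("CSR for %s:\n%s", cn, csrPEM)
}
```

The private key is returned separately from the CSR so that the caller can persist it locally while sending only the CSR bytes over the network.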
|
||||
|
||||
### Shared Volume Architecture
|
||||
@@ -1,12 +1,14 @@
|
||||
# certctl Testing Strategy & Deep-Scan Operator Runbook
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This doc covers the **testing topology** (per-PR fast gates vs. daily deep-scan
|
||||
gates), and the **operator runbook** for re-running each deep-scan tool locally
|
||||
when the CI receipt is ambiguous or when an operator wants to validate a fix
|
||||
before the next scheduled scan.
|
||||
|
||||
For the manual end-to-end QA playbook, see [`testing-guide.md`](testing-guide.md).
|
||||
For the security posture / per-finding closure log, see [`security.md`](security.md).
|
||||
For the manual end-to-end QA playbook, see [`testing-guide.md`](../testing-guide.md).
|
||||
For the security posture / per-finding closure log, see [`security.md`](../operator/security.md).
|
||||
|
||||
## CI workflow split
|
||||
|
||||
@@ -53,7 +55,7 @@ the bug the mutant introduced).
|
||||
|
||||
**Acceptance threshold:** ≥80% mutation kill ratio per package. Surviving
|
||||
mutants below that threshold get triaged in
|
||||
`cowork/comprehensive-audit-2026-04-25/d003-mutation-results.md` — either
|
||||
the project's 2026-04-25 mutation-results notes — either
|
||||
ship a targeted unit test that kills the mutant, or document an
|
||||
equivalent-mutation justification.
|
||||
|
||||
@@ -191,8 +193,8 @@ Re-run any of the deep-scan tools locally when:
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/security.md`](security.md) — security posture, per-finding closure log.
|
||||
- [`docs/testing-guide.md`](testing-guide.md) — manual end-to-end QA playbook.
|
||||
- [`docs/operator/security.md`](../operator/security.md) — security posture, per-finding closure log.
|
||||
- [`docs/testing-guide.md`](../testing-guide.md) — manual end-to-end QA playbook.
|
||||
- [`.github/workflows/ci.yml`](../.github/workflows/ci.yml) — per-PR fast gates.
|
||||
- [`.github/workflows/security-deep-scan.yml`](../.github/workflows/security-deep-scan.yml) — daily deep-scan gates.
|
||||
- [`scripts/install-security-tools.sh`](../scripts/install-security-tools.sh) — Go-host-installed tools (the docker-based tools are not in this script).
|
||||
-1606
File diff suppressed because it is too large
@@ -1,5 +1,7 @@
|
||||
# Advanced Demo: Certificate Lifecycle End-to-End
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This demo goes beyond browsing pre-loaded data. You'll create a team, register an owner, set up an issuer, create a certificate, trigger renewal, and watch everything appear in the dashboard in real time. Each step includes a technical explanation of what's happening inside certctl and why the system is designed that way.
|
||||
|
||||
**Time**: 15-20 minutes
|
||||
@@ -363,7 +365,7 @@ curl -s -X POST $API/api/v1/certificates \
|
||||
| `issuer_id` | Links to the issuer connector that will sign this certificate. Determines which CA backend is used. |
|
||||
| `renewal_policy_id` | Links to a `renewal_policies` row that defines: how many days before expiry to renew (`renewal_window_days`), whether auto-renewal is enabled (`auto_renew`), max retries, and retry interval. The default policy (`rp-default`) renews 30 days before expiry. |
|
||||
| `status` | Set to `Pending` because the certificate hasn't been issued yet. The scheduler will pick it up, or you can trigger renewal manually. |
|
||||
| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"pci": "true"` for compliance scoping). |
|
||||
| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"environment": "production"` for fleet scoping). |
|
||||
|
||||
**Check the dashboard now.** Click "Certificates" in the sidebar. You'll see your new "Demo API Certificate" with status "Pending" alongside the pre-loaded demo certificates. Click on it to see the full details.
|
||||
|
||||
@@ -603,7 +605,7 @@ curl -s "$API/api/v1/audit?created_after=2026-03-24T09:00:00Z" | jq '.data | len
|
||||
|
||||
The audit middleware (M19) records every HTTP request: method, path, status code, actor, request body SHA-256 hash, and latency. This creates a complete API audit trail without blocking responses (logging happens asynchronously).
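As a rough sketch of the pattern described above, the following Go middleware hashes the request body, captures the status code and latency, and hands the record to a buffered channel so the response is never blocked. All names here (`auditRecord`, `auditMiddleware`, the `X-Actor` header) are assumptions for illustration and are not taken from the certctl source; dropping records when the buffer is full is one possible trade-off, not necessarily the one certctl makes.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// auditRecord mirrors the fields listed in the text; field names are illustrative.
type auditRecord struct {
	Method, Path, Actor, BodySHA256 string
	Status                          int
	Latency                         time.Duration
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// auditMiddleware records one auditRecord per request and sends it to sink
// without blocking; a separate goroutine is expected to drain sink.
func auditMiddleware(sink chan<- auditRecord, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(body)) // restore the body for the handler
		sum := sha256.Sum256(body)

		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)

		entry := auditRecord{
			Method:     r.Method,
			Path:       r.URL.Path,
			Actor:      r.Header.Get("X-Actor"), // hypothetical actor source
			BodySHA256: hex.EncodeToString(sum[:]),
			Status:     rec.status,
			Latency:    time.Since(start),
		}
		select {
		case sink <- entry:
		default: // buffer full: drop rather than delay the response
		}
	})
}

// auditOnce sends a single request through the middleware and returns the record.
func auditOnce(method, path, body string) auditRecord {
	sink := make(chan auditRecord, 1)
	h := auditMiddleware(sink, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusCreated)
	}))
	req := httptest.NewRequest(method, path, strings.NewReader(body))
	req.Header.Set("X-Actor", "demo-user")
	h.ServeHTTP(httptest.NewRecorder(), req)
	return <-sink
}

func main() {
	fmt.Printf("%+v\n", auditOnce("POST", "/api/v1/certificates", `{"name":"demo"}`))
}
```

Hashing the body instead of storing it keeps secrets out of the audit table while still letting a reviewer confirm that a given payload matches a recorded request.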
|
||||
|
||||
**Why immutable audit:** Compliance frameworks (SOC 2 Type II, PCI-DSS, ISO 27001) require tamper-evident audit logs. By making the repository interface append-only and recording API calls, even a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.
|
||||
**Why immutable audit:** Audit logs need to stay tamper-evident even when an attacker has compromised the API server. Because the repository interface is append-only and every API call is recorded, a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.
|
||||
|
||||
**Check the dashboard.** The "Audit" view shows the full timeline of all actions across the system with filtering and CSV/JSON export.
|
||||
|
||||
@@ -701,7 +703,7 @@ curl -s -X POST $API/api/v1/certificates \
|
||||
|
||||
**Why `environment` matters:** The environment field isn't just metadata — it feeds the policy engine. A policy rule with type `AllowedEnvironments` can restrict which environments are valid. If someone tries to create a certificate with `environment: "yolo"`, the policy engine flags a violation. In a mature deployment, you'd enforce policies strictly: production certificates must use a trusted CA (not Local CA), staging certificates can use Let's Encrypt staging, and development certificates can use the Local CA.
|
||||
|
||||
**Why `pci: true` in tags:** Tags are free-form, but they enable powerful filtering and compliance scoping. A security team could query `GET /api/v1/certificates?tags.pci=true` (not implemented yet, but the JSONB column supports it) to find all PCI-scoped certificates and verify they meet compliance requirements.
|
||||
**Why arbitrary tags in metadata:** Tags are free-form, but they enable powerful filtering and fleet scoping. A security team could query `GET /api/v1/certificates?tags.regulated=true` (not implemented yet, but the JSONB column supports it) to find all certificates marked regulated and verify they meet whatever requirements that label maps to.
|
||||
|
||||
**Refresh the dashboard** — you'll see the new payment gateway certificate. Try filtering by environment or status to see how both certificates appear alongside the demo data.
|
||||
|
||||
@@ -778,7 +780,7 @@ Check existing violations:
|
||||
curl -s "$API/api/v1/policies/pr-max-certificate-lifetime/violations" | jq .
|
||||
```
|
||||
|
||||
**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific non-compliance.
|
||||
**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific violation.
|
||||
|
||||
**In the dashboard**, click "Policies" in the sidebar to see all active rules and which certificates are violating them.
|
||||
|
||||
@@ -844,7 +846,7 @@ curl -s -X POST $API/api/v1/profiles \
|
||||
|
||||
**How it works:** Certificate profiles are stored in the `certificate_profiles` table with an `allowed_key_algorithms` JSONB column that defines which key types and minimum sizes are acceptable. When a certificate is assigned to a profile, the profile constraints are enforced during CSR validation. The `max_validity_days` field controls the maximum certificate lifetime; profiles with values translating to under 1 hour enable short-lived certificate mode, where certs are exempt from CRL/OCSP.
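To make the profile check concrete, here is a small Go sketch of validating a key against a profile's allowed algorithms and minimum sizes. The `Profile` and `KeyRule` shapes are assumptions about how the `allowed_key_algorithms` JSONB might be modeled; the real schema and validation code may differ.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"fmt"
)

// KeyRule and Profile are hypothetical shapes for the profile constraints.
type KeyRule struct {
	Algorithm string // e.g. "ECDSA" or "RSA"
	MinSize   int    // minimum key size in bits
}

type Profile struct {
	AllowedKeyAlgorithms []KeyRule
	MaxValidityDays      int
}

// describeKey maps a public key to an (algorithm, size in bits) pair.
func describeKey(pub any) (string, int, error) {
	switch k := pub.(type) {
	case *ecdsa.PublicKey:
		return "ECDSA", k.Curve.Params().BitSize, nil
	case *rsa.PublicKey:
		return "RSA", k.N.BitLen(), nil
	default:
		return "", 0, fmt.Errorf("unsupported key type %T", pub)
	}
}

// checkKey returns nil when the algorithm is allowed and the key is large enough.
func checkKey(p Profile, algorithm string, bits int) error {
	for _, rule := range p.AllowedKeyAlgorithms {
		if rule.Algorithm != algorithm {
			continue
		}
		if bits < rule.MinSize {
			return fmt.Errorf("%s key of %d bits is below the profile minimum of %d", algorithm, bits, rule.MinSize)
		}
		return nil
	}
	return fmt.Errorf("key algorithm %s is not allowed by the profile", algorithm)
}

func main() {
	profile := Profile{
		AllowedKeyAlgorithms: []KeyRule{{Algorithm: "ECDSA", MinSize: 256}, {Algorithm: "RSA", MinSize: 2048}},
		MaxValidityDays:      90,
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	alg, bits, err := describeKey(&key.PublicKey)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("P-256 key allowed:", checkKey(profile, alg, bits) == nil)
	fmt.Println("RSA-1024 check:", checkKey(profile, "RSA", 1024))
}
```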
|
||||
|
||||
**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce compliance requirements across the fleet.
|
||||
**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce policy across the fleet.
|
||||
|
||||
**In the dashboard**, click "Profiles" in the sidebar to see and manage certificate profiles.
|
||||
|
||||
@@ -894,17 +896,17 @@ Approve or reject them:
|
||||
# Approve a job
|
||||
curl -s -X POST $API/api/v1/jobs/JOB_ID/approve \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Verified key type meets compliance requirements"}' | jq .
|
||||
-d '{"reason": "Verified key type meets policy"}' | jq .
|
||||
|
||||
# Reject a job
|
||||
curl -s -X POST $API/api/v1/jobs/JOB_ID/reject \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Key type does not meet PCI requirements"}' | jq .
|
||||
-d '{"reason": "Key type does not meet policy"}' | jq .
|
||||
```
|
||||
|
||||
**How it works:** When a renewal policy has `auto_renew` set to false, renewal jobs enter the `AwaitingApproval` state instead of being processed immediately. An operator must explicitly approve or reject the job via the API or the GUI. Approved jobs transition to `Pending` and are picked up by the job processor. Rejected jobs move to `Cancelled` with the provided reason recorded in the audit trail.
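The state transitions described above can be sketched as a tiny state machine. The status names come from the text; the `Job` type, `initialStatus`, and `decide` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type JobStatus string

const (
	AwaitingApproval JobStatus = "AwaitingApproval"
	Pending          JobStatus = "Pending"
	Cancelled        JobStatus = "Cancelled"
)

// Job is a simplified stand-in for a renewal job record.
type Job struct {
	ID     string
	Status JobStatus
	Reason string
}

var errNotAwaiting = errors.New("job is not awaiting approval")

// initialStatus picks the starting state from the policy's auto_renew flag.
func initialStatus(autoRenew bool) JobStatus {
	if autoRenew {
		return Pending
	}
	return AwaitingApproval
}

// decide applies an operator decision: approval moves the job to Pending,
// rejection moves it to Cancelled. The reason is kept with the job so it
// can be written to the audit trail.
func decide(j *Job, approve bool, reason string) error {
	if j.Status != AwaitingApproval {
		return errNotAwaiting
	}
	j.Reason = reason
	if approve {
		j.Status = Pending
	} else {
		j.Status = Cancelled
	}
	return nil
}

func main() {
	job := &Job{ID: "job-demo", Status: initialStatus(false)}
	fmt.Println("created:", job.Status)
	if err := decide(job, true, "Verified key type meets policy"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("after approval:", job.Status)
}
```

Rejecting any decision on a job that is not in `AwaitingApproval` keeps a stale or repeated approve/reject call from moving a job that has already been picked up.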
|
||||
|
||||
**Why interactive approval:** Not every certificate renewal should be automatic. PCI-scoped certificates, certs with specific compliance requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.
|
||||
**Why interactive approval:** Not every certificate renewal should be automatic. High-value certificates, certs with specific policy requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.
|
||||
|
||||
**In the dashboard:** Click "Jobs" in the sidebar, filter by status "AwaitingApproval", and you'll see a list of renewal jobs waiting for approval. Each job shows the certificate, issuer, and requested validity period. Click a job to open its detail view and see the Approve / Reject buttons with a reason text field. After approval or rejection, the job status updates in real-time and the audit trail records the decision.
|
||||
|
||||
@@ -987,7 +989,7 @@ export CERTCTL_API_KEY="test-key-123"
|
||||
|
||||
## Part 15: MCP Server for AI Integration (M18a)
|
||||
|
||||
certctl exposes the full REST API via the Model Context Protocol (MCP), enabling seamless integration with Claude, Cursor, and other AI assistants:
|
||||
certctl exposes the full REST API via the Model Context Protocol (MCP), enabling seamless integration with any MCP-compatible AI client:
|
||||
|
||||
```bash
|
||||
# Build the MCP server
|
||||
@@ -1008,19 +1010,19 @@ export CERTCTL_API_KEY="test-key-123"
|
||||
- **Binary support** — handles DER-encoded CRL and OCSP responses without mangling
|
||||
- **Error translation** — converts HTTP errors to user-readable messages
|
||||
|
||||
**Example usage from Claude:**
|
||||
**Example usage:**
|
||||
|
||||
```
|
||||
User: What certificates are expiring in the next 30 days?
|
||||
|
||||
Claude uses the MCP tools to:
|
||||
The AI client uses the MCP tools to:
|
||||
1. Call tools.listCertificates with filters: {status: "Expiring"}
|
||||
2. Parse the response
|
||||
3. Display: "mc-api-prod expires in 12 days. mc-cdn-prod expires in 8 days..."
|
||||
|
||||
User: Revoke mc-payments due to key compromise
|
||||
|
||||
Claude uses the MCP tools to:
|
||||
The AI client uses the MCP tools to:
|
||||
1. Call tools.revokeCertificate with id="mc-payments" reason="keyCompromise"
|
||||
2. Return the audit trail entry showing revocation recorded
|
||||
```
|
||||
@@ -1,5 +1,7 @@
|
||||
# Understanding Certificates: A Beginner's Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
If you've never worked with TLS certificates before, this guide will get you up to speed. By the end, you'll understand what certificates are, why they matter, and why the industry's move toward shorter certificate lifespans — down to 47 days by 2029 — makes automated lifecycle management essential.
|
||||
|
||||
## Contents
|
||||
@@ -123,7 +125,7 @@ At no point does the private key leave the agent. This is a fundamental security
|
||||
|
||||
Agents also report **metadata** about themselves — their operating system, CPU architecture, IP address, hostname, and version — with every heartbeat. This gives ops teams fleet-wide visibility (e.g., "how many agents are running on ARM?", "which agents are still on v1.0.0?") and powers **agent groups** — dynamic device grouping where policies can be scoped to specific agent criteria like OS type, architecture, or network subnet.
|
||||
|
||||
**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for compliance reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
|
||||
**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for audit reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
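The `410 Gone` handling on the agent side might look roughly like the sketch below. The heartbeat URL path, the `heartbeat` function, and the `errRetired` sentinel are assumptions for illustration; only the status-code contract comes from the text.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errRetired signals that the server answered 410 Gone and the agent should stop.
var errRetired = errors.New("agent retired")

// heartbeat posts to the given URL and maps 410 Gone to errRetired.
func heartbeat(client *http.Client, url string) error {
	resp, err := client.Post(url, "application/json", http.NoBody)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	switch {
	case resp.StatusCode == http.StatusGone:
		return errRetired
	case resp.StatusCode >= 300:
		return fmt.Errorf("heartbeat failed: %s", resp.Status)
	}
	return nil
}

// heartbeatAgainst runs heartbeat against a throwaway test server that
// always answers with the given status code.
func heartbeatAgainst(status int) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	}))
	defer srv.Close()
	return heartbeat(srv.Client(), srv.URL+"/heartbeat") // hypothetical path
}

func main() {
	if err := heartbeatAgainst(http.StatusGone); errors.Is(err, errRetired) {
		fmt.Println("server says this agent is retired; shutting down cleanly")
		return
	}
	fmt.Println("heartbeat accepted")
}
```

Treating 410 as a distinct sentinel error, separate from transient failures, lets the agent's main loop retry on network errors but exit on retirement.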
|
||||
|
||||
### Deployment Targets
|
||||
|
||||
@@ -220,7 +222,7 @@ certctl implements revocation using three complementary mechanisms:
|
||||
|
||||
**Certificate Revocation List (CRL)**: certctl serves DER-encoded X.509 CRLs per issuer at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 wire format, RFC 8615 well-known namespace). The endpoint is unauthenticated so any relying party — browser, TLS client, hardware appliance — can fetch it without a certctl API key. The CRL is signed by the issuing CA's key and has 24-hour validity; clients can download it periodically to check revocation status offline. The response carries `Content-Type: application/pkix-crl`. The CRL is **pre-generated** by a scheduler-driven loop (`crlGenerationLoop`, default interval 1 hour, configurable via `CERTCTL_CRL_GENERATION_INTERVAL`) and persisted in the `crl_cache` table — HTTP fetches read from the cache rather than rebuilding per request, so a busy CA does not DOS itself at scale. Concurrent regeneration requests for the same issuer are coalesced via an in-tree singleflight gate.
|
||||
|
||||
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder serving both forms RFC 6960 §A.1.1 defines: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (URL-path lookup, useful for ops curl-debugging) and `POST /.well-known/pki/ocsp/{issuer_id}` with a binary `application/ocsp-request` body (the form most production clients use — Firefox, OpenSSL `s_client -status`, cert-manager, Intune device-state validators). Both forms are unauthenticated and return signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`. OCSP responses are signed by a **dedicated per-issuer OCSP responder cert** (RFC 6960 §2.6 / §4.2.2.2) — NOT by the CA private key directly — that carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP clients do not recursively check the responder cert's own revocation status. The responder cert auto-rotates within 7 days of expiry (configurable via `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE`), letting the responder key live on disk or rotate frequently while the CA key stays cold. See [`crl-ocsp.md`](crl-ocsp.md) for endpoint examples (curl, OpenSSL, Firefox, Intune) and the responder cert lifecycle.
|
||||
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder serving both forms RFC 6960 §A.1.1 defines: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (URL-path lookup, useful for ops curl-debugging) and `POST /.well-known/pki/ocsp/{issuer_id}` with a binary `application/ocsp-request` body (the form most production clients use — Firefox, OpenSSL `s_client -status`, cert-manager, Intune device-state validators). Both forms are unauthenticated and return signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`. OCSP responses are signed by a **dedicated per-issuer OCSP responder cert** (RFC 6960 §2.6 / §4.2.2.2) — NOT by the CA private key directly — that carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP clients do not recursively check the responder cert's own revocation status. The responder cert auto-rotates within 7 days of expiry (configurable via `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE`), letting the responder key live on disk or rotate frequently while the CA key stays cold. See [`crl-ocsp.md`](../reference/protocols/crl-ocsp.md) for endpoint examples (curl, OpenSSL, Firefox, Intune) and the responder cert lifecycle.
|
||||
|
||||
Short-lived certificates (those assigned to profiles with TTL under 1 hour) are exempt from CRL and OCSP — their rapid expiry is considered sufficient revocation. This is a deliberate design choice to reduce infrastructure overhead for ephemeral machine-to-machine credentials.
|
||||
|
||||
@@ -242,7 +244,7 @@ Every action in certctl — issuing a certificate, renewing one, deploying to a
|
||||
|
||||
### Audit Trail
|
||||
|
||||
Every action is logged: who did it, what changed, when, and why. This is essential for compliance (SOC 2, PCI-DSS, ISO 27001) and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.
|
||||
Every action is logged: who did it, what changed, when, and why. This is essential for audit and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.
|
||||
|
||||
### Notifications
|
||||
|
||||
@@ -256,7 +258,7 @@ The CLI supports both table and JSON output formats (`--format table` or `--form
|
||||
|
||||
### MCP Server (AI Integration)
|
||||
|
||||
certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants like Claude, Cursor, and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
|
||||
The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via stdio transport and acts as a stateless HTTP proxy to the certctl REST API. It requires no additional infrastructure — just point it at your certctl server URL and API key.
|
||||
|
||||
@@ -279,7 +281,7 @@ This gives you a three-step triage workflow:
|
||||
|
||||
Network scan targets are managed from the **Network Scans** dashboard page — create CIDR ranges and ports to probe, enable/disable targets, trigger on-demand scans, and view results. Discovered certificates from network scans appear in the same Discovery triage page alongside filesystem discoveries.
|
||||
|
||||
This is a prerequisite for multi-CA migration, compliance audits, and building confidence that you've found all the certificates that matter.
|
||||
This is a prerequisite for multi-CA migration, audit reviews, and building confidence that you've found all the certificates that matter.
|
||||
|
||||
### Observability
|
||||
|
||||
@@ -291,4 +293,4 @@ The agent fleet overview page groups agents by OS, architecture, and version, sh
|
||||
|
||||
Now that you understand the concepts, head to the [Quick Start Guide](quickstart.md) to get certctl running locally in under 5 minutes. You'll see a pre-loaded dashboard with demo certificates, explore the API, and understand how everything fits together.
|
||||
|
||||
For a deeper look at the system design, see the [Architecture Guide](architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the [MCP Server Guide](mcp.md). For the full API reference, see the [OpenAPI Spec Guide](openapi.md).
|
||||
For a deeper look at the system design, see the [Architecture Guide](../reference/architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the [MCP Server Guide](../reference/mcp.md). For the full API reference, see the [OpenAPI Spec Guide](../reference/api.md).
|
||||
@@ -1,5 +1,7 @@
|
||||
# Deployment Examples
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Five turnkey docker-compose scenarios, each runnable in under 5 minutes. Pick the one closest to your setup.
|
||||
|
||||
## Which Example Should I Use?
|
||||
@@ -32,7 +34,7 @@ docker compose up -d
|
||||
|
||||
The full walkthrough — including how HTTP-01 challenges work, adding multiple domains, switching to staging for testing, and a production checklist — is in the [example README](../examples/acme-nginx/acme-nginx.md).
|
||||
|
||||
**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](migrate-from-certbot.md).
|
||||
**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](../migration/from-certbot.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -52,7 +54,7 @@ docker compose up -d
|
||||
|
||||
The full walkthrough — including DNS-PERSIST-01 (set a TXT record once, never touch DNS again on renewals), adapting scripts for other providers, and propagation troubleshooting — is in the [example README](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md).
|
||||
|
||||
**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](migrate-from-acmesh.md).
|
||||
**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](../migration/from-acmesh.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -105,7 +107,7 @@ docker compose up -d
|
||||
|
||||
The full walkthrough — including profile-based issuer assignment, testing with ACME staging, Local CA enterprise sub-CA mode, and scaling beyond Docker Compose — is in the [example README](../examples/multi-issuer/multi-issuer.md).
|
||||
|
||||
**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](certctl-for-cert-manager-users.md).
|
||||
**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](../migration/cert-manager-coexistence.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -117,4 +119,4 @@ These 5 scenarios cover the most common deployment patterns, but certctl support
|
||||
|
||||
**Targets:** NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell or WinRM proxy), Postfix, Dovecot, F5 BIG-IP (coming soon).
|
||||
|
||||
See [Connector Reference](connectors.md) for configuration details on every issuer and target.
|
||||
See [Connector Reference](../reference/connectors/index.md) for configuration details on every issuer and target.
|
||||
@@ -1,5 +1,7 @@
|
||||
# Quick Start Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Certificate lifespans are dropping to **47 days by 2029**. At that cadence, a team managing 100 certificates is processing 7+ renewals per week — every week, forever. Manual processes break. certctl automates the entire lifecycle: issuance, renewal, deployment, revocation, and audit — with zero human intervention.
|
||||
|
||||
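The renewal cadence above can be checked with a little arithmetic. This is a minimal sketch, assuming a fleet in steady state where every certificate is renewed once per interval; the function name and the `renew_at_fraction` parameter are illustrative and are not part of certctl.

```python
def renewals_per_week(cert_count: int, lifetime_days: float,
                      renew_at_fraction: float = 1.0) -> float:
    """Average weekly renewals for a fleet in steady state.

    renew_at_fraction is the share of the lifetime that elapses before a
    renewal is triggered (1.0 = renew exactly at expiry). Hypothetical knob.
    """
    renewal_interval_days = lifetime_days * renew_at_fraction
    return cert_count * 7 / renewal_interval_days


# 100 certificates with 47-day lifetimes, renewed only at expiry
print(f"{renewals_per_week(100, 47):.1f}")          # 14.9
# Same fleet, renewed at two thirds of the lifetime (an assumed policy)
print(f"{renewals_per_week(100, 47, 2 / 3):.1f}")   # 22.3
```

Under these assumptions the weekly load is comfortably above the "7+ per week" floor quoted above, and it grows further when renewals are triggered ahead of expiry.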
This guide gets you running in 5 minutes and walks you through everything certctl does.
|
||||
@@ -120,7 +122,7 @@ curl --cacert "$CA" https://localhost:8443/health
|
||||
{"status":"healthy"}
|
||||
```
|
||||
|
||||
If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/tls.md`](tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/upgrade-to-tls.md`](upgrade-to-tls.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.
|
||||
If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/operator/tls.md`](../operator/tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/archive/upgrades/to-tls-v2.2.md`](../archive/upgrades/to-tls-v2.2.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.
|
||||
|
||||
## Open the Dashboard
|
||||
|
||||
@@ -130,7 +132,7 @@ Open **https://localhost:8443** in your browser. Your browser will warn about th
|
||||
>
|
||||
> **Key rotation:** `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `CERTCTL_AUTH_SECRET=new-key,old-key`). Both keys are valid simultaneously, enabling zero-downtime rotation: add the new key, roll clients over, then remove the old key.
|
||||
|
||||
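The rotation procedure in the note above can be sketched as follows. This is an illustration only: it assumes the server splits `CERTCTL_AUTH_SECRET` on commas, trims whitespace, and compares each key in constant time. How certctl actually parses and compares keys is not shown here, and `parse_keys` and `is_authorized` are hypothetical names.

```python
import hmac
from typing import List


def parse_keys(secret_env: str) -> List[str]:
    """Split a comma-separated secret into individual keys (assumed format)."""
    return [k.strip() for k in secret_env.split(",") if k.strip()]


def is_authorized(presented: str, secret_env: str) -> bool:
    """Return True if the presented key matches any configured key.

    Every configured key is compared (no early exit), using a
    constant-time comparison for each one.
    """
    presented_b = presented.encode()
    ok = False
    for key in parse_keys(secret_env):
        if hmac.compare_digest(presented_b, key.encode()):
            ok = True
    return ok


# Rotation in three steps: old only -> both valid -> new only
step1 = "old-key"
step2 = "new-key,old-key"
step3 = "new-key"
print(is_authorized("old-key", step2), is_authorized("new-key", step2))
print(is_authorized("old-key", step3))
```

During step 2 both clients keep working, which is what makes the cutover zero-downtime; the old key stops working only once it is removed in step 3.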
The dashboard comes pre-loaded with 35 demo certificates across 5 issuers, 8 agents, and 90 days of job history — expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. A realistic snapshot of what certificate management looks like in a real organization.
|
||||
The dashboard comes pre-loaded with demo data covering certificates across multiple issuers, agents, and 90 days of job history — expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. A realistic snapshot of what certificate management looks like in a real organization. (Re-derive exact counts via `grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l`.)
|
||||
|
||||
### What you're looking at
|
||||
|
||||
@@ -322,7 +324,7 @@ curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/approve
|
||||
# Reject a pending job
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/reject \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Key type does not meet compliance requirements"}' | jq .
|
||||
-d '{"reason": "Key type does not meet policy requirements"}' | jq .
|
||||
```
|
||||
|
||||
## Certificate Discovery
|
||||
@@ -436,7 +438,7 @@ export CERTCTL_SERVER_CA_BUNDLE_PATH="$CA" # MCP is env-vars-only; no CLI flag
|
||||
./mcp-server
|
||||
```
|
||||
|
||||
Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
|
||||
Exposes the full REST API via MCP over stdio transport. Ask your MCP client: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
|
||||
|
||||
## Demo Data Reference
|
||||
|
||||
@@ -447,7 +449,7 @@ Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certif
|
||||
| Issuers | 5 | Local Dev CA, Let's Encrypt Staging, step-ca Internal, ZeroSSL (EAB), Custom OpenSSL CA |
|
||||
| Agents | 9 | 8 real agents (linux/darwin/windows, amd64/arm64) + server-scanner (network discovery) |
|
||||
| Targets | 8 | NGINX prod, NGINX staging, NGINX data, HAProxy, Apache, IIS, Traefik, Caddy |
|
||||
| Certificates | 35 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
|
||||
| Certificates | 32 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
|
||||
| Jobs | 50+ | 90 days of issuance, renewal, deployment jobs + 2 AwaitingApproval |
|
||||
| Discovered Certs | 12 | Unmanaged (filesystem + network), Managed (linked), Dismissed |
|
||||
| Discovery Scans | 8 | Historical + recent agent filesystem scans + network TLS scans |
|
||||
@@ -480,7 +482,7 @@ A suggested 5-minute flow:
|
||||
6. **Agent fleet** — "Agents handle key generation locally (ECDSA P-256). Private keys never leave your infrastructure."
|
||||
7. **Discovery** — "Agents scan filesystems, server probes TLS endpoints. We find what you're not managing yet."
|
||||
8. **Bulk operations** — "Select multiple certs, renew or revoke in bulk. At 47-day lifespans with hundreds of certs, this is essential."
|
||||
9. **Audit trail** — "Every action recorded. Export to CSV/JSON for compliance."
|
||||
9. **Audit trail** — "Every action recorded. Export to CSV/JSON for review."
|
||||
10. **CLI + MCP** — "Terminal users get `certctl-cli`. AI assistants get MCP integration. Everything is API-first."
|
||||
|
||||
## Tear Down
|
||||
@@ -496,7 +498,7 @@ The `-v` flag removes the PostgreSQL data volume for a clean slate.
|
||||
**Ready to deploy with your stack?** The [Deployment Examples](examples.md) page has 5 turnkey docker-compose scenarios — pick the one closest to your setup and have it running in minutes. It also covers migration paths from Certbot, acme.sh, and cert-manager.
|
||||
|
||||
- **[Deployment Examples](examples.md)** — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer
|
||||
- **[Advanced Demo](demo-advanced.md)** — Issue a real certificate via the Local CA end-to-end
|
||||
- **[Architecture](architecture.md)** — How the control plane, agents, and connectors work together
|
||||
- **[Connector Reference](connectors.md)** — Configuration for all 7 issuers and 10 targets
|
||||
- **[Advanced Demo](advanced-demo.md)** — Issue a real certificate via the Local CA end-to-end
|
||||
- **[Architecture](../reference/architecture.md)** — How the control plane, agents, and connectors work together
|
||||
- **[Connector Reference](../reference/connectors/index.md)** — Configuration for all 7 issuers and 10 targets
|
||||
- **[Concepts Guide](concepts.md)** — TLS certificates, CAs, and private keys explained from scratch
|
||||
@@ -1,5 +1,7 @@
|
||||
# Why certctl?
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Certificate management is broken at every scale between "one domain on Let's Encrypt" and "Fortune 500 budget for Venafi." certctl fills that gap: a self-hosted platform that automates the entire certificate lifecycle, works with any CA, deploys to any server, and keeps private keys on your infrastructure. It's free, source-available, and you own everything.
|
||||
|
||||
## The Math That Forces the Decision
|
||||
@@ -32,17 +34,22 @@ This isn't a premium feature. It's the default behavior, free. Most alternatives
|
||||
|
||||
### 2. CA-Agnostic Issuer Architecture
|
||||
|
||||
certctl works with any certificate authority, not just ACME providers. Nine issuer connectors ship today, all free:
|
||||
certctl works with any certificate authority, not just ACME providers. Twelve issuer connectors ship today, all free:
|
||||
|
||||
- **ACME v2** (Let's Encrypt, ZeroSSL, Google Trust Services, Buypass) — HTTP-01, DNS-01, DNS-PERSIST-01 challenges, External Account Binding, ACME Renewal Information (RFC 9773), certificate profile selection
|
||||
- **HashiCorp Vault PKI** — `/v1/{mount}/sign/{role}` API, token auth
|
||||
- **DigiCert CertCentral** — async order model, OV/EV support
|
||||
- **Sectigo SCM** — async order model, DV/OV/EV support, 3-header auth
|
||||
- **Google Cloud CAS** — Certificate Authority Service, OAuth2 service account auth, CA pool selection
|
||||
- **AWS ACM Private CA** — managed private CA on AWS, IAM-authenticated, SDK-waiter for issuance
|
||||
- **Entrust Certificate Services** — Entrust CA Gateway with mTLS auth, approval-pending support
|
||||
- **GlobalSign Atlas HVCA** — region-pinned commercial CA with dual mTLS + API key/secret auth
|
||||
- **EJBCA / Keyfactor** — self-hosted open-source / Keyfactor enterprise CA, mTLS or OAuth2
|
||||
- **step-ca** (Smallstep) — native /sign API with JWK provisioner auth
|
||||
- **Local CA** — self-signed or sub-CA mode (chain to ADCS or any enterprise root)
|
||||
- **Local CA** — self-signed or sub-CA mode (chain to ADCS or any enterprise root); supports multi-level CA tree mode
|
||||
- **OpenSSL / Custom CA** — delegate signing to any shell script
|
||||
- **EST enrollment** (RFC 7030) — device certs for WiFi/802.1X, MDM, IoT
|
||||
|
||||
EST (RFC 7030) and SCEP (RFC 8894) are protocol surfaces, not separate issuers — they dispatch to whichever issuer above is configured for the EST/SCEP profile.
|
||||
|
||||
Every connector implements the same interface. Running multiple CAs in parallel — Let's Encrypt for public certs, Vault for internal services, your enterprise CA for legacy systems — is configuration, not code.
|
||||
|
||||
@@ -56,19 +63,19 @@ A reload command can exit 0 while the certificate doesn't take effect — wrong
|
||||
|
||||
The three differentiators above get the headlines, but the feature surface is wider than most paid platforms:
|
||||
|
||||
**13 deployment targets** — NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell + remote WinRM), F5 BIG-IP (proxy agent + iControl REST), Postfix, Dovecot, SSH (agentless), Windows Certificate Store, and Java Keystore. All use a pluggable connector model. The control plane never initiates outbound connections — agents poll for work, meaning certctl works behind firewalls, across network zones, and in air-gapped environments.
|
||||
**15 deployment targets** — NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell + remote WinRM), F5 BIG-IP (proxy agent + iControl REST), Postfix/Dovecot (dual-mode), SSH (agentless), Windows Certificate Store, Java Keystore, Kubernetes Secrets, AWS Certificate Manager, and Azure Key Vault. All use a pluggable connector model. The control plane never initiates outbound connections — agents poll for work, meaning certctl works behind firewalls, across network zones, and in air-gapped environments.
|
||||
|
||||
**Network certificate discovery** — active TLS scanning of CIDR ranges finds certificates you didn't know existed. Agents also scan local filesystems for PEM/DER files. Everything feeds into a triage workflow where you claim, dismiss, or import discovered certs into management.
|
||||
|
||||
**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete. Mapped to SOC 2, PCI-DSS 4.0, and NIST SP 800-57 compliance frameworks with published evidence guides.
|
||||
**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete.
|
||||
|
||||
**Policy engine** — 5 rule types (allowed issuers, allowed domains, required metadata, allowed environments, renewal lead time) with violation tracking and severity levels.
|
||||
|
||||
**PKI compliance** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.
|
||||
**Revocation infrastructure** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.
|
||||
|
||||
**Prometheus metrics** — `/api/v1/metrics/prometheus` in standard exposition format. Works with Prometheus, Grafana Agent, Datadog Agent, Victoria Metrics.
|
||||
|
||||
**MCP server** — the entire REST API is exposed via MCP for AI-assisted certificate management via Claude, Cursor, or any MCP-compatible client. No other certificate platform offers this.
|
||||
**MCP server** — the entire REST API is exposed via MCP for AI-assisted certificate management via any MCP-compatible client. No other certificate platform offers this.
|
||||
|
||||
**Full REST API** — OpenAPI 3.1-documented operations covering the entire platform. CLI tool with 10 subcommands. Helm chart for Kubernetes deployment. Scheduled certificate digest emails. Certificate export in PEM and PKCS#12. S/MIME support with EKU-aware issuance.
|
||||
|
||||
@@ -82,7 +89,7 @@ ACME clients solve one slice of the problem — issuance and renewal from ACME C
|
||||
|
||||
### vs. Agent-Based SaaS
|
||||
|
||||
The closest architectural competitors use the same agent model — local key generation, CSR submission, push-based deployment. Where certctl differs: it supports 9 issuer types (not just ACME), provides CRL/OCSP/revocation infrastructure (not just issuance), includes a policy engine and network discovery, and is source-available with no certificate limit. SaaS alternatives are typically proprietary, priced per certificate ($2+/cert/month), and cap their free tiers at 3-5 certificates. certctl is free for any number of certificates, forever.
|
||||
The closest architectural competitors use the same agent model — local key generation, CSR submission, push-based deployment. Where certctl differs: it supports 12 issuer types (not just ACME), provides CRL/OCSP/revocation infrastructure (not just issuance), includes a policy engine and network discovery, and is source-available with no certificate limit. SaaS alternatives are typically proprietary, priced per certificate ($2+/cert/month), and cap their free tiers at 3-5 certificates. certctl is free for any number of certificates, forever.
|
||||
|
||||
### vs. Commercial PKI Platforms
|
||||
|
||||
@@ -1,5 +1,14 @@
|
||||
# Caddy Integration Walkthrough
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Use this walkthrough when** you're already running Caddy 2.7+ and
|
||||
> want it to ACME-issue from certctl (your internal CA, your private
|
||||
> PKI, or a local sub-CA chained under an enterprise root) instead of
|
||||
> Let's Encrypt. The Caddyfile changes are minimal; the load-bearing
|
||||
> piece is trusting certctl's bootstrap CA so Caddy's ACME client can
|
||||
> talk to certctl over HTTPS.
|
||||
|
||||
End-to-end recipe for issuing certs from a certctl-server deployment
|
||||
through Caddy 2.7+. Target audience: operator running Caddy on a VM
|
||||
or container who wants Caddy to ACME-issue from certctl instead of
|
||||
@@ -10,7 +19,7 @@ Let's Encrypt.
|
||||
- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
|
||||
and at least one profile whose `acme_auth_mode` is set. Profile
|
||||
setup is identical to the cert-manager walkthrough — see
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md)
|
||||
Step 2.
|
||||
- Caddy 2.7.x or later. `caddy version` should show 2.7.0+.
|
||||
- Network reachability: Caddy → certctl-server's HTTPS listener (port
|
||||
@@ -149,7 +158,7 @@ psql -c "SELECT actor, action, resource_id FROM audit_events
|
||||
legitimately high throughput.
|
||||
- **Caddy logs `urn:ietf:params:acme:error:rejectedIdentifier`** →
|
||||
the SAN list includes an identifier the certctl profile policy
|
||||
rejects. Cross-reference [`docs/acme-server.md` § Troubleshooting](./acme-server.md#certificate-readyfalse-with-rejectedidentifier).
|
||||
rejects. Cross-reference [`docs/acme-server.md` § Troubleshooting](../reference/protocols/acme-server.md#certificate-readyfalse-with-rejectedidentifier).
|
||||
- **`badNonce` in Caddy logs** → clock skew or multi-replica certctl
|
||||
without sticky sessions; same fix as the cert-manager walkthrough.
|
||||
|
||||
@@ -165,8 +174,8 @@ rm -rf ~/.local/share/caddy/certificates/certctl.example.com-*
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
|
||||
- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md) —
|
||||
K8s-native equivalent.
|
||||
- [Caddy upstream ACME docs](https://caddyserver.com/docs/automatic-https#acme-issuer)
|
||||
— verify behavior pinned here against Caddy 2.7.x semantics.
|
||||
@@ -1,5 +1,16 @@
|
||||
# cert-manager Integration Walkthrough
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Use this walkthrough when** you're already running cert-manager
|
||||
> 1.15+ in Kubernetes and want it to issue certs from certctl (your
|
||||
> internal CA, your private PKI, or a local sub-CA chained under an
|
||||
> enterprise root) via the standard ACME `ClusterIssuer` model. If
|
||||
> you want certctl to coexist with cert-manager rather than replace
|
||||
> its issuer backend, see
|
||||
> [`docs/migration/cert-manager-coexistence.md`](cert-manager-coexistence.md)
|
||||
> instead.
|
||||
|
||||
End-to-end recipe for issuing certs from a certctl-server deployment
|
||||
through cert-manager 1.15+. Target audience: Kubernetes operator who
|
||||
has never deployed certctl before and wants a working
|
||||
@@ -64,7 +75,7 @@ curl -X POST https://certctl-test.default.svc.cluster.local:8443/api/profiles \
|
||||
```
|
||||
|
||||
Auth-mode tradeoffs are covered in
|
||||
[`docs/acme-server.md` § Auth-mode decision tree](./acme-server.md#auth-mode-decision-tree).
|
||||
[`docs/acme-server.md` § Auth-mode decision tree](../reference/protocols/acme-server.md#auth-mode-decision-tree).
|
||||
For first-time deployments, `trust_authenticated` is the right default.
|
||||
|
||||
## Step 3 — Capture the certctl bootstrap CA
|
||||
@@ -83,7 +94,7 @@ cat deploy/test/certs/ca.crt | base64 -w0
|
||||
Capture the output for Step 4. This is **the** single biggest first-
|
||||
time-deploy footgun on the cert-manager integration path. The reference
|
||||
recipe lives in
|
||||
[`docs/acme-server.md` § TLS trust bootstrap](./acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).
|
||||
[`docs/acme-server.md` § TLS trust bootstrap](../reference/protocols/acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).
|
||||
|
||||
## Step 4 — Apply the ClusterIssuer
|
||||
|
||||
@@ -218,7 +229,7 @@ psql -c "SELECT created_at, action, resource_type, resource_id
|
||||
## Common failure modes
|
||||
|
||||
These are operator-side; full troubleshooting reference is in
|
||||
[`docs/acme-server.md` § Troubleshooting](./acme-server.md#troubleshooting).
|
||||
[`docs/acme-server.md` § Troubleshooting](../reference/protocols/acme-server.md#troubleshooting).
|
||||
|
||||
- `400 Bad Request: badNonce` → clock skew between certctl-server and
|
||||
cert-manager, or a multi-replica certctl fleet without sticky
|
||||
@@ -243,12 +254,12 @@ helm uninstall certctl-test
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
|
||||
- [`docs/acme-server-threat-model.md`](./acme-server-threat-model.md) —
|
||||
- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
|
||||
- [`docs/acme-server-threat-model.md`](../reference/protocols/acme-server-threat-model.md) —
|
||||
security posture.
|
||||
- [`docs/acme-caddy-walkthrough.md`](./acme-caddy-walkthrough.md) —
|
||||
- [`docs/acme-caddy-walkthrough.md`](./acme-from-caddy.md) —
|
||||
Caddy-side recipe.
|
||||
- [`docs/acme-traefik-walkthrough.md`](./acme-traefik-walkthrough.md) —
|
||||
- [`docs/acme-traefik-walkthrough.md`](./acme-from-traefik.md) —
|
||||
Traefik-side recipe.
|
||||
- [`deploy/test/acme-integration/`](../deploy/test/acme-integration/) —
|
||||
Phase 5 integration test (the same recipe, automated).
|
||||
@@ -1,5 +1,14 @@
|
||||
# Traefik Integration Walkthrough
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Use this walkthrough when** you're already running Traefik 3.0+
|
||||
> (Kubernetes or VM) and want it to ACME-issue from certctl (your
|
||||
> internal CA, your private PKI, or a local sub-CA chained under an
|
||||
> enterprise root) instead of Let's Encrypt. The Traefik static config
|
||||
> changes are minimal; the load-bearing piece is `serversTransport.rootCAs`
|
||||
> so Traefik trusts certctl's bootstrap CA on every outbound ACME call.
|
||||
|
||||
End-to-end recipe for issuing certs from a certctl-server deployment
|
||||
through Traefik 3.0+. Target audience: operator running Traefik (in
|
||||
Kubernetes or on a VM) who wants to use certctl as their ACME source
|
||||
@@ -10,7 +19,7 @@ of truth instead of Let's Encrypt.
|
||||
- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
|
||||
and at least one profile whose `acme_auth_mode` is set. Profile
|
||||
setup is identical to the cert-manager walkthrough — see
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md)
|
||||
Step 2.
|
||||
- Traefik 3.0+ (the v2 API surface for ACME is also supported but the
|
||||
`serversTransport.rootCAs` reference below is v3-shaped).
|
||||
@@ -191,8 +200,8 @@ sudo rm /etc/traefik/acme-certctl.json
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
|
||||
- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md) —
|
||||
cert-manager equivalent.
|
||||
- [Traefik upstream ACME docs](https://doc.traefik.io/traefik/https/acme/#caserver) —
|
||||
verify behavior pinned here against Traefik 3.0+ semantics.
|
||||
@@ -1,5 +1,7 @@
|
||||
# certctl for cert-manager Users
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
You run cert-manager inside Kubernetes and it works well for in-cluster certificates. But you also have VMs, bare-metal servers, network appliances, and legacy systems outside the cluster. cert-manager can't reach those. This guide shows how certctl complements cert-manager to give you unified certificate visibility and automation across your entire infrastructure.
|
||||
|
||||
## Not a Replacement
|
||||
@@ -96,7 +98,7 @@ Go to **Policies** → **+ New Policy** to create enforcement rules:
|
||||
- **Severity:** `high`
|
||||
- **Config:** set your enforcement parameters
|
||||
|
||||
Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other compliance rules across your fleet.
|
||||
Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other policy rules across your fleet.
|
||||
|
||||
### 6. View Unified Inventory
|
||||
|
||||
@@ -139,7 +141,7 @@ For now: cert-manager handles Kubernetes, certctl handles everything else. They
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Run through the [Quick Start](./quickstart.md) for a 5-minute demo
|
||||
1. Run through the [Quick Start](../getting-started/quickstart.md) for a 5-minute demo
|
||||
2. Try the [Multi-Issuer example](../examples/multi-issuer/multi-issuer.md) — manages public and internal certs from one dashboard
|
||||
3. Explore [Architecture](./architecture.md#agents) for deployment patterns
|
||||
3. Explore [Architecture](../reference/architecture.md#agents) for deployment patterns
|
||||
4. Check the [Helm Chart](../deploy/helm/certctl/) for production Kubernetes deployment
|
||||
@@ -1,5 +1,7 @@
|
||||
# Migrate from acme.sh to certctl
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
You use acme.sh to automate Let's Encrypt renewal across multiple servers. It works — but without centralized visibility, deployment verification, or policy enforcement.
|
||||
|
||||
This guide walks through moving your acme.sh workload to certctl while keeping your existing DNS provider setup.
|
||||
@@ -270,6 +272,6 @@ certctl automatically falls back to DNS-01 if the CA doesn't support dns-persist
|
||||
## Next Steps
|
||||
|
||||
- Try the [Wildcard DNS-01 example](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md) — a working docker-compose with Cloudflare hooks you can adapt for your DNS provider
|
||||
- See [Connector Reference](connectors.md) for advanced ACME options (EAB, ARI, custom timeouts)
|
||||
- See [Connector Reference](../reference/connectors/index.md) for advanced ACME options (EAB, ARI, custom timeouts)
|
||||
- See [Discovery Guide](concepts.md#certificate-discovery) for managing discovered certificates at scale
|
||||
- See all [Deployment Examples](./examples.md) for other scenarios (ACME+NGINX, private CA, step-ca, multi-issuer)
|
||||
- See all [Deployment Examples](../getting-started/examples.md) for other scenarios (ACME+NGINX, private CA, step-ca, multi-issuer)
|
||||
@@ -1,5 +1,7 @@
|
||||
# Migrating from Certbot to certctl
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
You have 50 Let's Encrypt certificates across 10 servers, managed by a mix of Certbot cron jobs and manual renewals. Certbot handles issuance, but you lack inventory visibility, centralized alerting, and audit trails. This guide walks you through moving to certctl while keeping your existing certificates and ACME account.
|
||||
|
||||
## Why Migrate
|
||||
@@ -168,6 +170,6 @@ certctl will stop renewing that cert when the policy is disabled. Certbot resume
|
||||
## Next Steps
|
||||
|
||||
- Try the [ACME + NGINX example](../examples/acme-nginx/acme-nginx.md) — a working docker-compose you can run locally before deploying to production
|
||||
- Review the [Concepts Guide](./concepts.md) for terminology (profiles, policies, agents, jobs)
|
||||
- Explore [Network Discovery](./quickstart.md#network-discovery-agentless) to find certificates you didn't know about
|
||||
- See all [Deployment Examples](./examples.md) for other scenarios (wildcard DNS-01, private CA, step-ca, multi-issuer)
|
||||
- Review the [Concepts Guide](../getting-started/concepts.md) for terminology (profiles, policies, agents, jobs)
|
||||
- Explore [Network Discovery](../getting-started/quickstart.md#network-discovery-agentless) to find certificates you didn't know about
|
||||
- See all [Deployment Examples](../getting-started/examples.md) for other scenarios (wildcard DNS-01, private CA, step-ca, multi-issuer)
|
||||
@@ -0,0 +1,134 @@
|
||||
# Issuance approval workflow
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Operators configure this on production-tier `CertificateProfile` rows so every renewal-loop tick or manual `POST /api/v1/certificates/{id}/renew` blocks at `JobStatusAwaitingApproval` until a different actor approves.
|
||||
|
||||
Closes the procurement-checklist question "How do you enforce two-person integrity on cert issuance?" — without this surface the answer is "we don't"; with `requires_approval=true` on the profile, the answer is "here's the RBAC contract + here's the audit query that proves bypass mode is off in production."
|
||||
|
||||
## End-to-end flow
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant A as Operator A<br/>(or scheduler)
    participant SVC as CertificateService<br/>.TriggerRenewal
    participant JOB as Job + ApprovalRequest
    participant B as Operator B
    participant APR as ApprovalService.Approve
    participant SCH as Scheduler

    A->>SVC: POST /api/v1/certificates/{id}/renew<br/>(or renewal-loop tick)
    SVC->>JOB: read profile.RequiresApproval;<br/>create Job @ JobStatusAwaitingApproval;<br/>create ApprovalRequest<br/>(state=pending, requested_by=Operator A)
    Note over JOB,SCH: Scheduler skips —<br/>AwaitingApproval is NOT a dispatchable status
    B->>JOB: GET /api/v1/approvals?state=pending
    B->>APR: POST /api/v1/approvals/{id}/approve<br/>(decided_by=Operator B, note=...)
    APR->>APR: RBAC: reject if Operator B == Operator A<br/>→ ErrApproveBySameActor (HTTP 403)
    APR->>JOB: ApprovalRequest → state=approved;<br/>Job AwaitingApproval → Pending;<br/>audit row (action=approval_approved,<br/>actor=Operator B);<br/>certctl_approval_decisions_total<br/>{outcome=approved,profile_id=...}++
    SCH->>JOB: pick up Pending → dispatch to issuer connector
    JOB-->>A: cert issues normally
```
|
||||
|
||||
## Configuration
|
||||
|
||||
Set `requires_approval=true` on a `CertificateProfile`:
|
||||
|
||||
```bash
|
||||
curl -X PUT https://certctl/api/v1/profiles/p-prod-cdn \
|
||||
-H "Authorization: Bearer $API_KEY" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Production CDN",
|
||||
"requires_approval": true,
|
||||
...
|
||||
}'
|
||||
```
|
||||
|
||||
Every certificate bound to that profile is now gated. The default is `requires_approval=false` — existing profiles keep the historical unattended renewal path.
|
||||
|
||||
## RBAC: the two-person integrity rule
|
||||
|
||||
The actor that triggers a renewal **cannot** be the actor that approves it. The check happens at the service layer and surfaces as **HTTP 403** at the handler. The error message contains the substring `two-person integrity` so server-log greps detect attempted self-approvals.
|
||||
|
||||
This is the load-bearing two-person-integrity contract. Pinned by:
|
||||
|
||||
- `internal/service/approval_test.go::TestApproval_Approve_RejectsSameActor` — service-level pin.
|
||||
- `internal/api/handler/approval_test.go::TestApproval_HandlerApproveAsSameActor_Returns403` — handler-level pin (HTTP 403 + body contains "two-person integrity").
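
The contract above can be checked from a script that captures the approve response. The following is a minimal sketch, not part of certctl: the function name and the labels it prints are hypothetical, and only the HTTP 403 status and the `two-person integrity` substring are taken from the text above.

```bash
# Hypothetical helper: classify the response to POST /api/v1/approvals/{id}/approve.
# Only the 403 status and the "two-person integrity" substring come from the docs;
# the function name and the printed labels are illustrative.
classify_approve_response() {
  local status="$1" body="$2"
  if [ "$status" = "403" ]; then
    case "$body" in
      *"two-person integrity"*) echo "self-approval-rejected" ;;
      *) echo "forbidden-other" ;;
    esac
  else
    echo "not-forbidden"
  fi
}
```

One way to feed it is to capture the status with `curl -s -o body.json -w '%{http_code}'` and pass the contents of `body.json` as the second argument.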
|
||||
|
||||
## Operator playbook: "I need to approve a renewal"
|
||||
|
||||
```bash
|
||||
# 1. Find the pending request
|
||||
curl -s "https://certctl/api/v1/approvals?state=pending" \
|
||||
-H "Authorization: Bearer $API_KEY" | jq
|
||||
|
||||
# 2. Inspect the request — confirm CN, SANs, requester
|
||||
curl -s "https://certctl/api/v1/approvals/ar-abc123" \
|
||||
-H "Authorization: Bearer $API_KEY" | jq
|
||||
|
||||
# 3. Approve as a different actor than the requester
|
||||
curl -X POST "https://certctl/api/v1/approvals/ar-abc123/approve" \
|
||||
-H "Authorization: Bearer $APPROVER_API_KEY" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"note":"approved per ticket SECOPS-12345"}'
|
||||
|
||||
# 4. Confirm the job transitioned to Pending
|
||||
curl -s "https://certctl/api/v1/jobs?certificate_id=mc-foo" \
|
||||
-H "Authorization: Bearer $API_KEY" | jq '.[] | {id,status,type}'
|
||||
```
|
||||
|
||||
To **reject** instead, swap the path: `POST /api/v1/approvals/{id}/reject` with the same body shape. The job transitions to `Cancelled` and the `note` is recorded in the audit row.
|
||||
|
||||
## Operator playbook: "approval timed out"
|
||||
|
||||
The scheduler reaper transitions stale pending requests + their linked jobs after `CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT` (default `168h` = 7 days):
|
||||
|
||||
- `ApprovalRequest.state` → `expired`
|
||||
- `Job.Status` → `Cancelled` (with `error_message="approval expired"`)
|
||||
- One audit row per expiry (`action=approval_expired, actor=system-reaper, actorType=System`)
|
||||
- `certctl_approval_decisions_total{outcome="expired",profile_id="..."}` increments
|
||||
|
||||
Resolve by re-triggering the renewal once the underlying delay is sorted:
|
||||
|
||||
```bash
|
||||
curl -X POST "https://certctl/api/v1/certificates/mc-foo/renew" \
|
||||
-H "Authorization: Bearer $API_KEY"
|
||||
```
|
||||
|
||||
Tighten the timeout for short-window deployments via the env var, e.g. `CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT=24h`.
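
To sanity-check a timeout value before deploying it, the duration can be converted to seconds. This is a sketch under a stated assumption: it handles only the hours-only `<N>h` form used in this document (`168h`, `24h`), and the function name is hypothetical.

```bash
# Hypothetical helper: convert an hours-only duration such as "168h" to seconds.
# Assumption: only the "<N>h" form shown in this document is supported; any
# other spelling is rejected rather than guessed at.
approval_timeout_seconds() {
  local d="$1"
  if [[ "$d" =~ ^([0-9]+)h$ ]]; then
    # 10# forces base 10 so values with leading zeros are not read as octal.
    echo $(( 10#${BASH_REMATCH[1]} * 3600 ))
  else
    echo "unsupported duration: $d" >&2
    return 1
  fi
}
```

For the default, `approval_timeout_seconds 168h` gives 604800 seconds, which is the 7 days stated above.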
|
||||
|
||||
## Bypass mode (dev / CI ONLY)
|
||||
|
||||
Setting `CERTCTL_APPROVAL_BYPASS=true` short-circuits the workflow: every `RequestApproval` call auto-approves with `decided_by=system-bypass` and `actorType=System`. Used by dev / CI to keep renewal-scheduler tests fast without standing up an approver.
|
||||
|
||||
**Production deploys MUST leave this unset.** The bypass emits a typed audit event (`action=approval_bypassed`) so reviewers detect misuse via:
|
||||
|
||||
```sql
|
||||
SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';
|
||||
```
|
||||
|
||||
The query should return a count of **zero in production** and a high count in dev. The certctl-server logs a `WARN` line at boot when bypass is enabled; operators should alert on that log line in production environments.
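
A deploy pipeline can also check the environment before the server starts. The sketch below is a hypothetical preflight helper, not something certctl ships. It assumes the literal value `true` is what enables bypass, as in the text above; whether the server accepts other truthy spellings is not stated here, so they are not checked.

```bash
# Hypothetical preflight helper: report whether bypass mode would be enabled.
# Assumption: only the literal value "true" (as documented above) is treated
# as enabling bypass; other spellings are not covered by this sketch.
bypass_status() {
  if [ "${CERTCTL_APPROVAL_BYPASS:-}" = "true" ]; then
    echo "bypass-enabled"
  else
    echo "bypass-off"
  fi
}
```

A production pipeline could refuse to continue unless `bypass_status` prints `bypass-off`.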
|
||||
|
||||
## Prometheus metrics
|
||||
|
||||
```
|
||||
certctl_approval_decisions_total{outcome,profile_id} counter
|
||||
certctl_approval_pending_age_seconds histogram
|
||||
(le buckets:
|
||||
60, 300, 1800, 3600,
|
||||
21600, 86400, +Inf)
|
||||
```
|
||||
|
||||
`outcome` is one of `approved`, `rejected`, `expired`, `bypassed`. `profile_id` is the `CertificateProfile.ID` that triggered the gate (cardinality-bounded — operators have <100 profiles in production).
|
||||
|
||||
The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 hits hours/days — production deployments usually have a same-day decision deadline.
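
To make the bucket boundaries concrete, the sketch below maps a pending age to the smallest `le` bucket that contains it. The function name is hypothetical; the boundaries are the ones listed above. Prometheus histogram buckets are cumulative, so an observation is also counted in every larger bucket.

```bash
# Hypothetical helper: smallest `le` bucket (from the list above) that a
# pending age in seconds falls into. Buckets are cumulative in Prometheus,
# so the observation is also counted in every larger bucket.
pending_age_bucket() {
  local age="$1" le
  for le in 60 300 1800 3600 21600 86400; do
    if [ "$age" -le "$le" ]; then
      echo "$le"
      return 0
    fi
  done
  echo "+Inf"
}
```

A decision made two hours after creation (`pending_age_bucket 7200`) lands in the 21600 bucket; anything older than a day is only visible in `+Inf`.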
|
||||
|
||||
## Future free V2 work
|
||||
|
||||
- **M-of-N approver chains.** Today's primitive is single-approver. Future V2 work adds chains — e.g., "needs 2 of 3 platform-team members."
|
||||
- **Time-windowed auto-approve.** Today's reaper hard-cancels at the static deadline. Policy-driven time-windowed auto-approve (T+30m unattended → cancel; T+24h business hours → escalate) is future work.
|
||||
- **External ticketing integration.** ServiceNow / JIRA bridging so approval state mirrors the change-management record.
|
||||
- **Per-owner / per-team routing.** Today's pool is global. Per-owner / per-team routing matches cert ownership to approver pools.
|
||||
- **Approval delegation.** Today the same-actor rule is strict. Time-bounded delegation is future work.
|
||||
|
||||
Tracked in `WORKSPACE-ROADMAP.md` under the Future Free V2 Work section — every item ships free under BSL.
|
||||
@@ -1,6 +1,8 @@
|
||||
# Database TLS — Postgres Transport Encryption
|
||||
|
||||
**Audit reference:** Bundle B / M-018. PCI-DSS v4.0 Req 4 §2.2.5; CWE-319.
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
**Audit reference:** Bundle B / M-018. CWE-319 (Cleartext transmission of sensitive information).
|
||||
|
||||
certctl talks to Postgres over a single connection-string URL controlled by the
|
||||
`CERTCTL_DATABASE_URL` env var. The `sslmode` query parameter on that URL
|
||||
@@ -13,16 +15,16 @@ explicit opt-in / opt-out paths for the four real-world deployment shapes.
|
||||
|
||||
| Deployment shape | Default `sslmode` | When to change |
|
||||
|------------------------------------------------|--------------------|----------------|
|
||||
| Helm chart, bundled Postgres, in-cluster | `disable` | When the cluster does not provide pod-network encryption (CNI without WireGuard / IPSec) and the workload is in PCI-DSS scope. |
|
||||
| Helm chart, bundled Postgres, in-cluster | `disable` | When the cluster does not provide pod-network encryption (CNI without WireGuard / IPSec) and the workload handles sensitive data. |
|
||||
| Helm chart, external Postgres (RDS / Cloud SQL / Azure DB) | not auto-set | **Always** set to `verify-full` and provide the cloud provider's server CA bundle. |
|
||||
| docker-compose, bundled Postgres on docker bridge | `disable` | Demo/dev only; not a deployment shape we expect operators to harden. |
|
||||
| docker-compose / k8s with external Postgres | not auto-set | **Always** set `CERTCTL_DATABASE_URL` to a connection string with `sslmode=verify-full`. |
|
||||
|
||||
`sslmode` values come from `lib/pq` (the underlying driver). The full set is:
|
||||
`disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`. PCI-DSS
|
||||
Req 4 v4.0 §2.2.5 considers `verify-ca` the floor for sensitive-data transport;
|
||||
`verify-full` is the floor for systems exposed to spoofing risk (it adds
|
||||
hostname validation against the server cert's CN/SAN).
|
||||
`disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`.
|
||||
`verify-ca` is the floor for sensitive-data transport; `verify-full`
|
||||
is the floor for systems exposed to spoofing risk (it adds hostname
|
||||
validation against the server cert's CN/SAN).
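
A quick way to audit a deployment is to read `sslmode` out of the connection string. This is a minimal sketch with hypothetical function names; it assumes `sslmode` is passed as a URL query parameter, as described above, and does not handle other connection-string formats.

```bash
# Hypothetical helper: print the sslmode query parameter of a connection URL,
# or "unset" when the URL carries none.
sslmode_of() {
  local url="$1" query pair
  local -a pairs
  case "$url" in
    *\?*) query="${url#*\?}" ;;
    *) echo "unset"; return 0 ;;
  esac
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    case "$pair" in
      sslmode=*) echo "${pair#sslmode=}"; return 0 ;;
    esac
  done
  echo "unset"
}

# Compare against the floors described above.
sslmode_floor_check() {
  case "$(sslmode_of "$1")" in
    verify-full) echo "ok: hostname-verified" ;;
    verify-ca)   echo "ok: CA-verified only" ;;
    *)           echo "below floor" ;;
  esac
}
```

Running `sslmode_floor_check "$CERTCTL_DATABASE_URL"` on each environment gives a one-line answer per deployment.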
|
||||
|
||||
## Helm chart (Bundle B)
|
||||
|
||||
@@ -0,0 +1,120 @@
|
||||
# Helm Deployment
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Operator runbook for deploying certctl on Kubernetes via the bundled Helm chart at `deploy/helm/certctl/`.
|
||||
|
||||
## Prereqs
|
||||
|
||||
- Kubernetes cluster, v1.27+
|
||||
- `kubectl` configured and authenticated
|
||||
- `helm` v3.13+
|
||||
- Storage class for the PostgreSQL StatefulSet PVC
|
||||
- TLS cert source: either an operator-supplied `kubernetes.io/tls` Secret OR a cert-manager `ClusterIssuer` / `Issuer`. The chart refuses to render without one. See [`tls.md`](tls.md) for the four cert provisioning patterns.
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--namespace certctl \
|
||||
--create-namespace \
|
||||
--set server.apiKey=$(openssl rand -hex 32) \
|
||||
--set postgres.password=$(openssl rand -hex 32) \
|
||||
--set server.tls.existingSecret=certctl-server-tls
|
||||
```
|
||||
|
||||
`server.apiKey` and `postgres.password` should be high-entropy values. The example above generates them inline; production deployments use a secrets manager (Vault, External Secrets Operator, AWS Secrets Manager) instead.
|
||||
|
||||
## What you get
|
||||
|
||||
- **Server Deployment** with a configurable replica count (default 1; HA needs sticky sessions on the ACME server's nonce path)
|
||||
- **PostgreSQL StatefulSet** with PVC-backed persistence
|
||||
- **Agent DaemonSet** with one agent per node (configurable via `agent.daemonset.enabled=false` if you don't want the in-cluster agent)
|
||||
- Health probes (`/health` liveness + `/ready` readiness)
|
||||
- Security contexts: non-root, read-only root filesystem
|
||||
- Optional Ingress (off by default; opt in via `ingress.enabled=true`)
|
||||
|
||||
## Cert source patterns
|
||||
|
||||
### Pattern 1 — operator-supplied Secret (recommended for non-cert-manager shops)
|
||||
|
||||
```bash
|
||||
kubectl create secret tls certctl-server-tls \
|
||||
--cert=server.crt --key=server.key \
|
||||
--namespace certctl
|
||||
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--namespace certctl \
|
||||
--set server.tls.existingSecret=certctl-server-tls
|
||||
```
|
||||
|
||||
### Pattern 2 — cert-manager Certificate CR (recommended for cert-manager shops)
|
||||
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--namespace certctl \
|
||||
--set server.tls.certManager.enabled=true \
|
||||
--set server.tls.certManager.issuerRef.name=my-cluster-issuer \
|
||||
--set server.tls.certManager.issuerRef.kind=ClusterIssuer
|
||||
```
|
||||
|
||||
### Refuses to render without one of the above
|
||||
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ --namespace certctl
|
||||
# Error: server.tls.existingSecret OR server.tls.certManager.enabled must be set
|
||||
```
|
||||
|
||||
The render-time guard catches the missing config at `helm install` time, not at pod-crash-loop time.
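
The same rule can be mirrored in a wrapper script so that a pipeline fails before `helm` is invoked at all. The sketch below is a hypothetical preflight, not the chart's own template logic; it only restates the "one of the two must be set" condition and reuses the error text shown above.

```bash
# Hypothetical preflight mirroring the chart's guard: at least one TLS cert
# source must be configured. Arguments are the values the pipeline intends to
# pass for server.tls.existingSecret and server.tls.certManager.enabled.
tls_source_check() {
  local existing_secret="$1" cert_manager_enabled="$2"
  if [ -n "$existing_secret" ] || [ "$cert_manager_enabled" = "true" ]; then
    echo "ok"
  else
    echo "Error: server.tls.existingSecret OR server.tls.certManager.enabled must be set"
  fi
}
```

How the chart treats the case where both values are set is not covered here; the sketch simply accepts it.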
|
||||
|
||||
## Verify the install
|
||||
|
||||
```bash
|
||||
kubectl wait --for=condition=Ready --timeout=3m \
|
||||
-n certctl pod -l app.kubernetes.io/name=certctl-server
|
||||
|
||||
kubectl port-forward -n certctl svc/certctl-server 8443:8443 &
|
||||
|
||||
# Bundle the TLS root from the Secret to verify
|
||||
kubectl get secret -n certctl certctl-server-tls -o jsonpath='{.data.ca\.crt}' \
|
||||
| base64 -d > /tmp/certctl-ca.crt
|
||||
curl --cacert /tmp/certctl-ca.crt https://localhost:8443/health
|
||||
# {"status":"healthy"}
|
||||
```
|
||||
|
||||
If the Secret has no `ca.crt` key (operator-supplied Secrets often don't), use `tls.crt` as the bundle. For a self-signed cert the two files are identical; for a chained cert distribute the root CA bundle separately via ConfigMap.
|
||||
|
||||
## Upgrade
|
||||
|
||||
```bash
|
||||
helm upgrade certctl deploy/helm/certctl/ \
|
||||
--namespace certctl \
|
||||
--reuse-values
|
||||
```
|
||||
|
||||
Postgres state survives the upgrade (the PVC is retained). The server / agent images bump per the chart's `image.tag`. See [`docs/archive/upgrades/`](../archive/upgrades/) for version-specific upgrade guidance.
|
||||
|
||||
## Configuration reference
|
||||
|
||||
Every value is documented at `deploy/helm/certctl/values.yaml`. Common tweaks:
|
||||
|
||||
- `server.replicaCount` — replica count (default 1)
|
||||
- `server.resources.{requests,limits}` — pod resource bounds
|
||||
- `agent.daemonset.enabled` — toggle the in-cluster agent (default true)
|
||||
- `postgres.storageSize` — PVC size (default 10Gi)
|
||||
- `ingress.enabled` + `ingress.host` — opt into Ingress
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
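**Pod crash-loops with TLS error.** The cert and key in the Secret don't pair. For an RSA key, compare `openssl x509 -modulus -in server.crt -noout | openssl md5` against `openssl rsa -modulus -in server.key -noout | openssl md5`; the two digests must match. (`openssl md5` is used here because the standalone `md5` command exists on macOS but not on most Linux hosts.)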
|
||||
|
||||
**Agent DaemonSet pods can't reach the server.** Service DNS / NetworkPolicy issue. Confirm the agent's `CERTCTL_SERVER_URL` env points at the in-cluster service name (`https://certctl-server.certctl.svc.cluster.local:8443`).
|
||||
|
||||
**Postgres won't start.** PVC permissions. Check `kubectl describe pvc -n certctl certctl-postgres` and confirm the storage class supports `fsGroup`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`tls.md`](tls.md) — cert provisioning patterns + SIGHUP rotation
|
||||
- [`security.md`](security.md) — production security posture
|
||||
- [`runbooks/disaster-recovery.md`](runbooks/disaster-recovery.md) — Postgres restore + recovery procedures
|
||||
- [`docs/archive/upgrades/`](../archive/upgrades/) — version-specific upgrade procedures
|
||||
@@ -0,0 +1,209 @@
|
||||
# Legacy Clients (TLS 1.2) — Reverse-Proxy Runbook
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
**Audit reference:** Bundle F / M-023. CWE-326 (Inadequate encryption strength).
|
||||
|
||||
## What this is
|
||||
|
||||
certctl's control plane pins `tls.Config.MinVersion = tls.VersionTLS13`
|
||||
(`cmd/server/tls.go:131`). Some embedded EST (RFC 7030) and SCEP (RFC 8894)
|
||||
clients only speak TLS 1.0/1.1/1.2 — those clients cannot complete the
|
||||
handshake against certctl directly. This runbook documents the supported
|
||||
operator pattern: terminate the legacy TLS version at a front-door reverse
|
||||
proxy and pass the request through to certctl over TLS 1.3.
|
||||
|
||||
## Why TLS 1.3 minimum
|
||||
|
||||
certctl's audit posture and the M-001 PBKDF2 work factor both assume
|
||||
modern transport crypto. TLS 1.2 with the cipher suites still in the
|
||||
wild has known attack surface (BEAST, POODLE, ROBOT, raccoon — all
|
||||
CVE-categorized); allowing TLS 1.2 directly on the certctl listener
|
||||
would invalidate the guarantee that the server-side encryption chain
|
||||
is the strongest the ecosystem currently supports.
|
||||
|
||||
## When this runbook applies
|
||||
|
||||
You need this if **all three** are true:
|
||||
|
||||
1. You operate certctl with EST or SCEP enabled (`CERTCTL_EST_ENABLED=true`
|
||||
or `CERTCTL_SCEP_ENABLED=true`).
|
||||
2. Your enrolling clients are embedded devices (printers, network
|
||||
appliances, IoT boards, legacy MFPs, point-of-sale terminals) whose TLS
|
||||
stack pre-dates 2018 and only speaks TLS 1.2 or older.
|
||||
3. Replacing those clients is not feasible on a 6-month horizon.
|
||||
|
||||
If your enrolling clients are modern (any current Linux/Windows/macOS
|
||||
host, anything Go-based, anything Rust/Python/Node from 2019 onward),
|
||||
they speak TLS 1.3 natively and this runbook is unnecessary — point them
|
||||
straight at certctl on `:8443`.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
Client["legacy EST/SCEP client"]
|
||||
Proxy["nginx / HAProxy<br/>reverse proxy"]
|
||||
Server["certctl :8443"]
|
||||
Client -->|"TLS 1.2/1.3<br/>(allowed TLS 1.2)"| Proxy
|
||||
Proxy -->|"TLS 1.3<br/>(re-encrypts as TLS 1.3)"| Server
|
||||
```
|
||||
|
||||
The reverse proxy:
|
||||
|
||||
- Terminates the legacy-version TLS handshake on the public-facing port.
|
||||
- Forwards the request to certctl over TLS 1.3 on a private network.
|
||||
- (For EST mTLS) forwards the client certificate in an
  `X-SSL-Client-Cert` header. certctl currently ignores this header
  (see the certctl-side configuration section below).
|
||||
|
||||
## nginx config
|
||||
|
||||
```nginx
|
||||
upstream certctl_backend {
|
||||
# Private-network address; not reachable from outside the proxy host.
|
||||
server 10.0.0.10:8443;
|
||||
}
|
||||
|
||||
server {
|
||||
listen 443 ssl http2;
|
||||
server_name est.example.com;
|
||||
|
||||
# Public-facing legacy listener. ssl_protocols includes TLSv1.2 explicitly.
|
||||
# Keep ssl_ciphers conservative — only strong AEAD suites with forward
|
||||
# secrecy.
|
||||
ssl_certificate /etc/nginx/certs/est.example.com.fullchain.pem;
|
||||
ssl_certificate_key /etc/nginx/certs/est.example.com.key;
|
||||
ssl_protocols TLSv1.2 TLSv1.3;
|
||||
ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
|
||||
ssl_prefer_server_ciphers on;
|
||||
|
||||
# mTLS for EST: optional client cert, verified against the EST CA.
|
||||
ssl_client_certificate /etc/nginx/certs/est-clients-ca.pem;
|
||||
ssl_verify_client optional;
|
||||
|
||||
location ~ ^/\.well-known/(est|pki) {
|
||||
# Forward the client cert (if presented) to certctl over the
|
||||
# private hop. The current certctl implementation IGNORES the
|
||||
# X-SSL-Client-Cert header (header-agnostic by default — see
|
||||
# the certctl-side configuration section below). EST/SCEP
|
||||
# authentication still works correctly because both protocols
|
||||
# carry their own auth (CSR signature for EST, challengePassword
|
||||
# for SCEP) inside the request body.
|
||||
proxy_set_header X-SSL-Client-Cert $ssl_client_escaped_cert;
|
||||
proxy_set_header X-Forwarded-For $remote_addr;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
|
||||
# The proxy-to-certctl hop is itself TLS 1.3.
|
||||
proxy_pass https://certctl_backend;
|
||||
proxy_ssl_protocols TLSv1.3;
|
||||
proxy_ssl_verify on;
|
||||
proxy_ssl_trusted_certificate /etc/nginx/certs/certctl-internal-ca.pem;
|
||||
}
|
||||
|
||||
# SCEP endpoints — same pattern, no client-cert requirement
|
||||
# (SCEP authenticates via challengePassword inside the CSR).
|
||||
location ^~ /scep {
|
||||
proxy_set_header X-Forwarded-For $remote_addr;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
proxy_pass https://certctl_backend;
|
||||
proxy_ssl_protocols TLSv1.3;
|
||||
proxy_ssl_verify on;
|
||||
proxy_ssl_trusted_certificate /etc/nginx/certs/certctl-internal-ca.pem;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## HAProxy config (alternative)
|
||||
|
||||
```
|
||||
frontend est_legacy
|
||||
bind *:443 ssl crt /etc/haproxy/certs/est.example.com.pem alpn h2,http/1.1 \
|
||||
ssl-min-ver TLSv1.2 \
|
||||
ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384
|
||||
|
||||
acl is_est_path path_beg /.well-known/est
|
||||
acl is_pki_path path_beg /.well-known/pki
|
||||
acl is_scep_path path_beg /scep
|
||||
use_backend certctl_backend if is_est_path or is_pki_path or is_scep_path
|
||||
default_backend certctl_modern
|
||||
|
||||
backend certctl_backend
|
||||
server certctl 10.0.0.10:8443 ssl verify required \
|
||||
ca-file /etc/haproxy/certs/certctl-internal-ca.pem \
|
||||
ssl-min-ver TLSv1.3
|
||||
http-request set-header X-Forwarded-For %[src]
|
||||
http-request set-header X-Forwarded-Proto https
|
||||
```
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
The current implementation is **header-agnostic**: certctl ignores any
|
||||
`X-SSL-Client-Cert` / `X-Forwarded-For` headers from the proxy. EST
|
||||
authentication still happens via in-protocol CSR signature + profile
|
||||
policy (RFC 7030 §3.2.3); SCEP authentication still happens via the
|
||||
`challengePassword` attribute embedded in the CSR (RFC 8894 §3.2). Both
|
||||
mechanisms are inside the request body and survive the reverse-proxy
|
||||
hop without server-side header trust.
|
||||
|
||||
**Why this is the correct default:** trusting a proxy-supplied header
|
||||
for client identity opens a header-spoofing attack surface that requires
|
||||
careful design (CIDR allowlist of trusted proxies, fail-closed defaults,
|
||||
explicit operator opt-in). The Bundle F closure of M-023 ships the
|
||||
TLS-bridge guidance as documentation only; a future commit can extend
|
||||
certctl with proxy-header trust if and when an operator demonstrates a
|
||||
deployment shape that requires it. Until that lands, the runbook above
|
||||
is operationally complete: legacy EST and SCEP clients continue to
|
||||
authenticate via their in-protocol mechanisms, and the reverse proxy is
|
||||
purely a TLS-version bridge.
|
||||
|
||||
If your deployment requires proxy-supplied client identity (e.g., the
|
||||
proxy terminates mTLS and you want certctl to record the client-cert
|
||||
subject in the audit trail beyond what the CSR carries), open an issue
|
||||
and a future commit will add a header-trust contract behind two
|
||||
fail-closed env vars: a CIDR allowlist of trusted proxies, plus an
|
||||
explicit opt-in toggle. Both knobs would be required together; setting
|
||||
only one would fail loud at startup. Until that work ships, the
|
||||
header-agnostic default described above is the only supported
|
||||
configuration.
|
||||
|
||||
## TLS posture summary
|
||||
|
||||
The configuration above:
|
||||
|
||||
- Pins TLS 1.2 + TLS 1.3 only (no SSLv3, TLS 1.0, TLS 1.1).
|
||||
- Uses only AEAD cipher suites with forward secrecy (ECDHE-* with GCM or
|
||||
ChaCha20-Poly1305).
|
||||
- Re-encrypts to TLS 1.3 on the proxy-to-certctl hop so the certctl
|
||||
listener never speaks anything below 1.3.
|
||||
|
||||
That is the strongest posture currently achievable while still allowing
|
||||
the legacy clients to enroll. Reviewers looking for the attestation
|
||||
should be pointed at this section + the proxy's TLS config.
|
||||
|
||||
## What this runbook does NOT cover
|
||||
|
||||
- **Replacing the legacy clients.** That's the long-term fix; this
|
||||
runbook is the bridge while you're migrating.
|
||||
- **Network segmentation.** The reverse proxy assumes the proxy-to-certctl
|
||||
hop is on a network that an external attacker can't reach. If it's
|
||||
not, you need a deeper architecture review.
|
||||
- **Client-cert revocation.** EST mTLS revocation is the relying party's
|
||||
responsibility. certctl's EST handler accepts the cert; the proxy can
|
||||
enforce CRL/OCSP via `ssl_crl_path` (nginx) or `crl-file` (HAProxy).
|
||||
|
||||
## When TLS 1.2 itself sunsets
|
||||
|
||||
Major browsers and OS vendors will eventually deprecate TLS 1.2. When
|
||||
that happens, this runbook becomes obsolete; the only path forward
|
||||
will be to replace the legacy clients. Watch the IETF TLS working
|
||||
group, the major browser vendors' announcement channels, and your
|
||||
own embedded-device vendors for deprecation notices.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/operator/tls.md`](tls.md) — the certctl-internal TLS configuration (HTTPS-only control plane, MinVersion pin)
|
||||
- [`docs/operator/security.md`](security.md) — overall security posture
|
||||
- [`docs/operator/database-tls.md`](database-tls.md) — Postgres TLS opt-in (Bundle B / M-018)
|
||||
- [`docs/reference/protocols/scep-server.md`](../reference/protocols/scep-server.md) — SCEP RFC 8894 native server reference
|
||||
- [`docs/reference/protocols/est.md`](../reference/protocols/est.md) — EST RFC 7030 server reference
|
||||
@@ -0,0 +1,106 @@
|
||||
# Performance Baselines
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Operator-runnable benchmarks for spot-checking certctl performance against published baselines. Useful as a regression detector after upgrades or infra changes.
|
||||
|
||||
## Why these specific spots?
|
||||
|
||||
certctl's hot paths are dominated by three workloads:
|
||||
|
||||
1. **API request handling** — auth, rate-limit decision, route dispatch, DB read
|
||||
2. **Renewal scheduler** — periodic scan + dispatch
|
||||
3. **Certificate inventory queries** — large list returns with sparse fields
|
||||
|
||||
The baselines below cover those three.
|
||||
|
||||
## Baseline #1: API request handling (single endpoint)
|
||||
|
||||
Hit a hot read endpoint with a tight loop and compare against the baseline.
|
||||
|
||||
```bash
|
||||
SERVER=https://localhost:8443
|
||||
CACERT="--cacert ./deploy/test/certs/ca.crt"
|
||||
AUTH="Authorization: Bearer change-me-in-production"
|
||||
|
||||
# Warm the connection pool (5 requests, discard timing)
|
||||
for i in $(seq 1 5); do
|
||||
curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary > /dev/null
|
||||
done
|
||||
|
||||
# Measured run: 100 requests, capture mean latency
|
||||
time (for i in $(seq 1 100); do
|
||||
curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary > /dev/null
|
||||
done)
|
||||
```
|
||||
|
||||
**Baseline (M3 MacBook Pro, Docker Desktop):** real time under 5 seconds for 100 sequential requests, i.e. a mean under ~50ms per request.
|
||||
|
||||
If you're seeing > 100ms mean, something is wrong: PostgreSQL connection pool exhaustion, agent flooding the work-poll endpoint, or rate-limiter mis-tuned.
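
The arithmetic behind that comparison is small enough to script. This sketch assumes you take the `real` figure from the `time` output, express it in seconds, and pass it in by hand; the function name is hypothetical.

```bash
# Hypothetical helper: mean per-request latency in milliseconds, given the
# wall-clock total in seconds and the number of requests.
mean_latency_ms() {
  local total_seconds="$1" count="$2"
  [ "$count" -gt 0 ] || { echo "count must be positive" >&2; return 1; }
  awk -v t="$total_seconds" -v n="$count" 'BEGIN { printf "%.1f\n", t * 1000 / n }'
}
```

A 5-second run of 100 requests works out to 50.0 ms, the baseline figure; a 10-second run gives 100.0 ms, the point at which the text above says something is wrong.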
|
||||
|
||||
## Baseline #2: Inventory list with cursor pagination
|
||||
|
||||
```bash
|
||||
# Cursor-paginated full inventory walk
|
||||
NEXT=""
|
||||
PAGES=0
|
||||
START=$(date +%s)
|
||||
while true; do
|
||||
RESP=$(curl -s $CACERT -H "$AUTH" "$SERVER/api/v1/certificates?limit=100&cursor=$NEXT")
|
||||
NEXT=$(echo "$RESP" | jq -r '.next_cursor // empty')
|
||||
PAGES=$((PAGES + 1))
|
||||
[ -z "$NEXT" ] && break
|
||||
done
|
||||
END=$(date +%s)
|
||||
echo "Walked $PAGES pages in $((END - START))s"
|
||||
```
|
||||
|
||||
**Baseline:** for the demo dataset (15 certificates, 1 page), under 1 second total. For a 1000-cert inventory (10 pages of 100), under 3 seconds total = ~300ms per page.
|
||||
|
||||
If you're seeing > 1s per page on a 1000-cert inventory, the cursor index on `managed_certificates(created_at, id)` is missing or the query plan went wrong.
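
The page counts and per-page budget quoted above follow from simple integer arithmetic. The helpers below are a sketch with hypothetical names, using only the numbers from this section.

```bash
# Hypothetical helpers for the inventory-walk baseline.
# Pages needed to walk `total` certificates at `limit` per page (ceiling division).
inventory_pages() {
  local total="$1" limit="$2"
  echo $(( (total + limit - 1) / limit ))
}

# Per-page budget in milliseconds for a total budget spread over `pages` pages.
page_budget_ms() {
  local total_ms="$1" pages="$2"
  echo $(( total_ms / pages ))
}
```

`inventory_pages 1000 100` gives 10 pages, and `page_budget_ms 3000 10` gives the ~300 ms per page quoted above. An empty inventory computes to 0 pages, although the loop above still issues one request.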
|
||||
|
||||
## Baseline #3: Scheduler tick (renewal scan)
|
||||
|
||||
The renewal scheduler runs every hour by default. Force a tick and observe the time-to-completion in the logs:
|
||||
|
||||
```bash
|
||||
# Trigger an immediate renewal scan via the admin endpoint
|
||||
curl -s $CACERT -H "$AUTH" -X POST $SERVER/api/v1/admin/scheduler/run-now/renewal | jq .
|
||||
|
||||
# Tail the log and look for the matching `renewal scan complete` line
|
||||
docker compose logs -f certctl-server | grep 'renewal'
|
||||
```
|
||||
|
||||
**Baseline (15-cert demo dataset):** "renewal scan complete" within 100ms of the trigger.
|
||||
|
||||
For a 1000-cert inventory: under 5 seconds. The dominant cost is the per-cert profile + policy + alert-channel resolve plus the threshold-comparison math. If you're seeing > 10 seconds, profile resolution is likely doing N+1 queries.
|
||||
|
||||
## Baseline #4: Bulk revoke
|
||||
|
||||
```bash
|
||||
# Bulk-revoke all certs from a (test) issuer
|
||||
TIME=$(date +%s)
|
||||
curl -s $CACERT -H "$AUTH" -H "Content-Type: application/json" -X POST $SERVER/api/v1/certificates/bulk-revoke \
|
||||
-d '{"filter":{"issuer_id":"iss-test"},"reason":"superseded"}' | jq .
|
||||
echo "Bulk revoke: $(($(date +%s) - TIME))s"
|
||||
```
|
||||
|
||||
**Baseline:** linear in cert count. For 100 certs from one issuer: under 5 seconds. For 1000 certs: under 30 seconds (dominated by per-cert audit row + per-cert CRL refresh).
|
||||
|
||||
## When to re-baseline
|
||||
|
||||
After any of:
|
||||
|
||||
- Postgres major-version upgrade
|
||||
- Go major-version upgrade
|
||||
- Significant migration (add a column to `managed_certificates`, add an index)
|
||||
- Connection pool config change
|
||||
- Changing the renewal scheduler interval
|
||||
|
||||
Capture timing in your own loadtest-baselines log so future regressions surface against a real baseline rather than the operator's gut feeling.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/contributor/ci-pipeline.md`](../contributor/ci-pipeline.md) — CI guard for performance regression
|
||||
- [`docs/operator/security.md`](security.md) — rate limit tuning
|
||||
- [`docs/reference/architecture.md`](../reference/architecture.md) — request path through handler → service → repository
|
||||
@@ -1,5 +1,7 @@
|
||||
# Runbook: cloud-target deployment connectors (AWS ACM + Azure Key Vault)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This runbook covers the SDK-driven cloud target connectors that ship in
|
||||
certctl post-2026-05-03 (Rank 5 of the Infisical deep-research
|
||||
deliverable). It complements the operator-facing
|
||||
@@ -15,42 +17,39 @@ install certctl.
|
||||
|
||||
## End-to-end flow (cloud targets)
|
||||
|
||||
```
|
||||
cert renewed → renewal job created
|
||||
│
|
||||
▼
|
||||
agent picks up DeployCertificate work item
|
||||
│
|
||||
▼
|
||||
target.Connector.DeployCertificate(ctx, request)
|
||||
│
|
||||
┌──────────────────┴──────────────────┐
|
||||
│ │
|
||||
▼ ▼
|
||||
AWS ACM path Azure Key Vault path
|
||||
│ │
|
||||
▼ ▼
|
||||
1. (rotate-in-place only) 1. GetCertificate(name, "" /* latest */)
|
||||
DescribeCertificate(arn) — capture snapshot CER bytes
|
||||
2. GetCertificate(arn) — capture 2. Build PFX from cert+chain+key
|
||||
snapshot bytes for rollback (PKCS#12 via go-pkcs12)
|
||||
3. ImportCertificate(arn, new_bytes) 3. ImportCertificate(name, PFX, tags)
|
||||
— fresh ARN OR rotate-in-place — ALWAYS creates a new version
|
||||
4. AddTagsToCertificate(arn, 4. (Tags carried forward
|
||||
provenance) — ACM strips on automatically)
|
||||
re-import; we re-apply
|
||||
5. DescribeCertificate(arn) — verify 5. GetCertificate(name, "" /* latest */)
|
||||
serial matches expected — verify serial matches expected
|
||||
6. ON MISMATCH: rollback ←──── (same shape) ────→ 6. ON MISMATCH: rollback
|
||||
ImportCertificate(arn, ImportCertificate(name,
|
||||
snapshot_bytes) snapshot_PFX) — new version
|
||||
│
|
||||
▼
|
||||
7. Audit row + Prometheus counter
|
||||
certctl_deploy_attempts_total{target_type="AWSACM"|"AzureKeyVault",
|
||||
result="success"|"failure"}
|
||||
certctl_deploy_rollback_total{target_type=...,
|
||||
outcome="restored"|"also_failed"}
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Renew["cert renewed → renewal job created"]
|
||||
Pick["agent picks up DeployCertificate work item"]
|
||||
Dispatch["target.Connector.DeployCertificate(ctx, request)"]
|
||||
|
||||
Renew --> Pick --> Dispatch
|
||||
Dispatch --> AWS
|
||||
Dispatch --> AZ
|
||||
|
||||
subgraph AWS["AWS ACM path"]
|
||||
A1["1. rotate-in-place only:<br/>DescribeCertificate(arn)"]
|
||||
A2["2. GetCertificate(arn) —<br/>capture snapshot bytes for rollback"]
|
||||
A3["3. ImportCertificate(arn, new_bytes) —<br/>fresh ARN OR rotate-in-place"]
|
||||
A4["4. AddTagsToCertificate(arn, provenance) —<br/>ACM strips on re-import; we re-apply"]
|
||||
A5["5. DescribeCertificate(arn) —<br/>verify serial matches expected"]
|
||||
A6["6. ON MISMATCH: rollback<br/>ImportCertificate(arn, snapshot_bytes)"]
|
||||
A1 --> A2 --> A3 --> A4 --> A5 --> A6
|
||||
end
|
||||
|
||||
subgraph AZ["Azure Key Vault path"]
|
||||
Z1["1. GetCertificate(name, '' = latest) —<br/>capture snapshot CER bytes"]
|
||||
Z2["2. Build PFX from cert+chain+key<br/>(PKCS#12 via go-pkcs12)"]
|
||||
Z3["3. ImportCertificate(name, PFX, tags) —<br/>ALWAYS creates a new version"]
|
||||
Z4["4. Tags carried forward automatically"]
|
||||
Z5["5. GetCertificate(name, '' = latest) —<br/>verify serial matches expected"]
|
||||
Z6["6. ON MISMATCH: rollback<br/>ImportCertificate(name, snapshot_PFX) —<br/>new version"]
|
||||
Z1 --> Z2 --> Z3 --> Z4 --> Z5 --> Z6
|
||||
end
|
||||
|
||||
A6 --> Audit
|
||||
Z6 --> Audit
|
||||
Audit["7. Audit row + Prometheus counters<br/>certctl_deploy_attempts_total{target_type, result}<br/>certctl_deploy_rollback_total{target_type, outcome}"]
|
||||
```
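
Both paths share the same shape: snapshot, import, verify, roll back on mismatch. The Go sketch below illustrates that shape only; it is not certctl's connector code. `certStore`, `memStore`, and `deployWithRollback` are hypothetical names, and the in-memory store treats the stored bytes as the serial so that no X.509 parsing is needed. The returned outcome strings are borrowed from the `result` and `outcome` label values shown for the two counters.

```go
package main

import (
	"errors"
	"fmt"
)

// certStore is a hypothetical stand-in for a cloud target such as ACM or
// Key Vault; it is not an interface from the certctl codebase.
type certStore interface {
	Get() (serial string, data []byte, err error)
	Import(data []byte) error
}

// memStore is an in-memory certStore used only for illustration. It treats
// the stored bytes as the "serial" so the example needs no X.509 parsing.
type memStore struct {
	data        []byte
	imports     int
	failImportN int // when > 0, the Nth Import call fails
}

func (m *memStore) Get() (string, []byte, error) {
	return string(m.data), m.data, nil
}

func (m *memStore) Import(data []byte) error {
	m.imports++
	if m.imports == m.failImportN {
		return errors.New("import rejected")
	}
	m.data = data
	return nil
}

// deployWithRollback snapshots the current cert, imports the new one,
// verifies the serial, and re-imports the snapshot on mismatch.
// It returns "success", "restored", "also_failed", or "failure".
func deployWithRollback(s certStore, newData []byte, wantSerial string) (string, error) {
	_, snapshot, err := s.Get() // capture snapshot bytes for rollback
	if err != nil {
		return "failure", fmt.Errorf("snapshot: %w", err)
	}
	if err := s.Import(newData); err != nil {
		return "failure", fmt.Errorf("import: %w", err)
	}
	gotSerial, _, err := s.Get() // verify what the target now serves
	if err == nil && gotSerial == wantSerial {
		return "success", nil
	}
	verifyErr := fmt.Errorf("verify: got serial %q, want %q", gotSerial, wantSerial)
	if rbErr := s.Import(snapshot); rbErr != nil {
		return "also_failed", fmt.Errorf("%v; rollback: %w", verifyErr, rbErr)
	}
	return "restored", verifyErr
}

func main() {
	store := &memStore{data: []byte("serial-old")}
	outcome, err := deployWithRollback(store, []byte("serial-new"), "serial-expected")
	fmt.Println(outcome, err, string(store.data))
}
```

The sketch always takes a snapshot first. A fresh-ARN import on the ACM path has nothing to snapshot, so a real connector would skip the rollback branch there.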
|
||||
|
||||
---
|
||||
@@ -319,7 +318,7 @@ az monitor activity-log list \
|
||||
|
||||
## V3-Pro forward path
|
||||
|
||||
Tracked at `cowork/WORKSPACE-ROADMAP.md` under "Adapter hardening":
|
||||
Tracked under "Adapter hardening" on the project roadmap:
|
||||
|
||||
- **AWS CloudFront direct-attach** — UpdateDistribution after an ACM
|
||||
ImportCertificate so the CloudFront edge picks up the new cert
|
||||
@@ -1,5 +1,7 @@
|
||||
# Disaster recovery runbook
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Status (this document):** Production hardening II Phase 10
|
||||
> deliverable. Codifies the fail-safe behaviors that already exist in
|
||||
> the codebase and the operator procedures for recovering from
|
||||
@@ -7,10 +9,10 @@
|
||||
> if a procedure here doesn't work as documented, that's a bug in
|
||||
> docs (file an issue).
|
||||
|
||||
This runbook is the SOC 2 / PCI procurement-team deliverable: it tells
|
||||
auditors and on-call operators what to do when a piece of certctl's
|
||||
state corrupts, when a CA key needs rotation, or when Postgres needs
|
||||
a point-in-time restore. Read it once when you set up certctl; print
|
||||
This runbook is the on-call deliverable: it tells reviewers and
|
||||
on-call operators what to do when a piece of certctl's state
|
||||
corrupts, when a CA key needs rotation, or when Postgres needs a
|
||||
point-in-time restore. Read it once when you set up certctl; print
|
||||
the [DR checklist](#dr-checklist) and pin it near your on-call rotation.
|
||||
|
||||
## Contents
|
||||
@@ -55,7 +57,7 @@ without operator action. The fail-safes in the codebase:
|
||||
These fail-safes mean most of this runbook is "delete the corrupt
|
||||
row + wait for the next tick" rather than "restore from backup +
|
||||
manually re-issue." The runbook documents the full procedures
|
||||
anyway because compliance auditors need to see them written down.
|
||||
anyway because reviewers need to see them written down.
|
||||
|
||||
## CRL cache recovery
|
||||
|
||||
@@ -236,7 +238,7 @@ remains trusted by relying parties until its `notAfter` (typical
|
||||
openssl x509 -in new-cert -noout -issuer
|
||||
```
|
||||
|
||||
**Future:** when the HSM/PKCS#11 driver bundle (`cowork/hsm-pkcs11-
|
||||
**Future:** when the HSM/PKCS#11 driver bundle (planned;
|
||||
driver-prompt.md`) ships, this rotation procedure changes
|
||||
substantially — the HSM-backed key never moves, only the cert wrap
|
||||
rotates. The signer interface seam is the load-bearing prerequisite
|
||||
@@ -286,7 +288,7 @@ backups. Without them, a restored DB is unusable.
|
||||
## Trust-bundle reload semantics
|
||||
|
||||
This section codifies the fail-safe behavior that's already in code,
|
||||
for compliance auditors who need to see the procedure documented.
|
||||
for reviewers who need to see the procedure documented.
|
||||
|
||||
**Pattern:** every trust-bundle holder (`internal/trustanchor.Holder`,
|
||||
used by SCEP/Intune dispatcher + EST mTLS sibling route) implements
|
||||
@@ -340,9 +342,9 @@ Print this. Pin it near your on-call rotation.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`crl-ocsp.md`](crl-ocsp.md) — CRL/OCSP responder operator guide.
|
||||
- [`tls.md`](tls.md) — control-plane TLS bootstrap.
|
||||
- [`security.md`](security.md) — production-grade security posture.
|
||||
- [`scep-intune.md`](scep-intune.md) — SCEP/Intune trust-anchor
|
||||
- [`crl-ocsp.md`](../../reference/protocols/crl-ocsp.md) — CRL/OCSP responder operator guide.
|
||||
- [`tls.md`](../../operator/tls.md) — control-plane TLS bootstrap.
|
||||
- [`security.md`](../../operator/security.md) — production-grade security posture.
|
||||
- [`scep-intune.md`](../../reference/protocols/scep-intune.md) — SCEP/Intune trust-anchor
|
||||
rotation specifics.
|
||||
- [`est.md`](est.md) — EST mTLS trust-bundle rotation specifics.
|
||||
- [`est.md`](../../reference/protocols/est.md) — EST mTLS trust-bundle rotation specifics.
|
||||
@@ -1,5 +1,7 @@
|
||||
# Runbook: certificate-expiry alerts (multi-channel)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This runbook covers the per-policy multi-channel expiry-alert dispatch
|
||||
path that ships in certctl post-2026-05-03 (Rank 4 of the Infisical
|
||||
deep-research deliverable). It complements the operator-facing
|
||||
@@ -14,36 +16,37 @@ walkthrough of how to install certctl — that lives in the README.
|
||||
|
||||
## End-to-end flow
|
||||
|
||||
```
|
||||
daily ticker (renewalCheckLoop)
|
||||
│
|
||||
▼
|
||||
RenewalService.CheckExpiringCertificates
|
||||
│
|
||||
┌────────────────┴────────────────┐
|
||||
│ for cert in expiring (≤30 days):│
|
||||
│ 1. Resolve RenewalPolicy │
|
||||
│ 2. Compute daysUntil │
|
||||
│ 3. updateCertExpiryStatus │
|
||||
│ 4. sendThresholdAlerts ──────►│ per threshold:
|
||||
│ 5. Create renewal job (if │ a. resolve severity tier
|
||||
│ issuer registered + ARI │ via AlertSeverityMap
|
||||
│ allows) │ b. resolve channel set
|
||||
└──────────────────────────────────┘ via AlertChannels[tier]
|
||||
c. for each channel:
|
||||
i. dedup via
|
||||
notification_events
|
||||
(cert,threshold,channel)
|
||||
ii. SendThresholdAlertOnChannel
|
||||
→ notifierRegistry[channel]
|
||||
→ Send(recipient,subj,body)
|
||||
iii. record audit row
|
||||
(event_type=expiration_alert_sent,
|
||||
metadata.channel,
|
||||
metadata.severity_tier)
|
||||
iv. bump Prometheus counter
|
||||
certctl_expiry_alerts_total
|
||||
{channel,threshold,result}
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Tick["daily ticker (renewalCheckLoop)"]
|
||||
Check["RenewalService.CheckExpiringCertificates"]
|
||||
|
||||
Tick --> Check --> Loop
|
||||
|
||||
subgraph Loop["for cert in expiring (≤30 days)"]
|
||||
L1["1. Resolve RenewalPolicy"]
|
||||
L2["2. Compute daysUntil"]
|
||||
L3["3. updateCertExpiryStatus"]
|
||||
L4["4. sendThresholdAlerts"]
|
||||
L5["5. Create renewal job<br/>(if issuer registered +<br/>ARI allows)"]
|
||||
L1 --> L2 --> L3 --> L4 --> L5
|
||||
end
|
||||
|
||||
L4 --> Threshold
|
||||
|
||||
subgraph Threshold["per threshold"]
|
||||
T1["a. resolve severity tier<br/>via AlertSeverityMap"]
|
||||
T2["b. resolve channel set<br/>via AlertChannels[tier]"]
|
||||
T1 --> T2 --> Channel
|
||||
end
|
||||
|
||||
subgraph Channel["for each channel (fault-isolating)"]
|
||||
C1["i. dedup via notification_events<br/>(cert, threshold, channel)"]
|
||||
C2["ii. SendThresholdAlertOnChannel<br/>→ notifierRegistry[channel]<br/>→ Send(recipient, subj, body)"]
|
||||
C3["iii. record audit row<br/>event_type=expiration_alert_sent<br/>metadata.channel, metadata.severity_tier"]
|
||||
C4["iv. bump Prometheus counter<br/>certctl_expiry_alerts_total<br/>{channel, threshold, result}"]
|
||||
C1 --> C2 --> C3 --> C4
|
||||
end
|
||||
```
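
The per-channel stage of the flow (dedup, send, count, continue on error) can be sketched in Go as follows. This is an illustration under stated assumptions, not the certctl implementation: `dispatcher`, `sendFunc`, `okSender`, and `failingSender` are hypothetical names, the maps stand in for the `notification_events` table, the notifier registry, and the Prometheus counter, and the `success` / `failure` result values are assumed. The sketch also assumes that only successful sends are recorded for dedup, so a failed channel is retried on the next tick. The audit row is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// dedupKey mirrors the (cert, threshold, channel) tuple from the diagram.
type dedupKey struct {
	certID    string
	threshold int
	channel   string
}

// sendFunc is a hypothetical per-channel notifier signature.
type sendFunc func(recipient, subject, body string) error

// dispatcher uses in-memory maps as stand-ins for the notification_events
// table, the notifier registry, and the Prometheus counter.
type dispatcher struct {
	sent      map[dedupKey]bool
	notifiers map[string]sendFunc
	counts    map[string]int // keyed "channel/result"; result values are assumed
}

func newDispatcher(notifiers map[string]sendFunc) *dispatcher {
	return &dispatcher{
		sent:      map[dedupKey]bool{},
		notifiers: notifiers,
		counts:    map[string]int{},
	}
}

// dispatch sends one alert per channel. A failure on one channel is counted
// and skipped so the remaining channels still get their alert.
func (d *dispatcher) dispatch(certID string, threshold int, channels []string) {
	for _, ch := range channels {
		key := dedupKey{certID: certID, threshold: threshold, channel: ch}
		if d.sent[key] {
			continue // already alerted for this tuple
		}
		send, ok := d.notifiers[ch]
		if !ok {
			d.counts[ch+"/failure"]++
			continue
		}
		subject := fmt.Sprintf("certificate %s expires within %d days", certID, threshold)
		if err := send("ops@example.com", subject, "see dashboard"); err != nil {
			d.counts[ch+"/failure"]++
			continue // fault isolation: keep going with the next channel
		}
		d.sent[key] = true
		d.counts[ch+"/success"]++
	}
}

func okSender(_, _, _ string) error      { return nil }
func failingSender(_, _, _ string) error { return errors.New("webhook timeout") }

func main() {
	d := newDispatcher(map[string]sendFunc{"email": okSender, "slack": failingSender})
	d.dispatch("mc-api-prod", 14, []string{"email", "slack"})
	d.dispatch("mc-api-prod", 14, []string{"email", "slack"}) // email deduped, slack retried
	fmt.Println(d.counts)
}
```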
|
||||
|
||||
The dispatch loop's per-channel error handling is
|
||||
@@ -214,7 +217,7 @@ dedup on the `notification_events` table guards against that).
|
||||
|
||||
## V3-Pro forward path
|
||||
|
||||
Tracked at `cowork/WORKSPACE-ROADMAP.md` under "Adapter hardening":
|
||||
Tracked under "Adapter hardening" on the project roadmap:
|
||||
|
||||
- Per-owner / per-team / per-tenant channel routing (the matrix is
|
||||
per-policy today, not per-owner).
|
||||
@@ -1,5 +1,7 @@
|
||||
# certctl Security Posture & Operator Guidance
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This document collects the operator-facing security guidance that the source
|
||||
code's per-finding comment blocks reference. Each section names the audit
|
||||
finding it closes, the threat model, and the operator action required (if
|
||||
@@ -1,8 +1,10 @@
|
||||
# TLS on the Control Plane
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl's control plane is HTTPS-only as of v2.2. There is no plaintext `http://` listener, no `auto` mode, no dual-listener bridge, no TLS 1.2 escape hatch. The server refuses to start without a cert+key pair, the agent/CLI/MCP clients reject `http://` URLs at startup, and the Helm chart refuses to render without either an operator-supplied Secret or a cert-manager Certificate CR.
|
||||
|
||||
This doc covers four cert provisioning patterns, SIGHUP-based cert rotation, and the client-side CA-trust configuration agents and the CLI need to talk to the server. If you are upgrading from a pre-HTTPS release and want the step-by-step cutover procedure, read [`upgrade-to-tls.md`](upgrade-to-tls.md) first and come back here for reference.
|
||||
This doc covers four cert provisioning patterns, SIGHUP-based cert rotation, and the client-side CA-trust configuration agents and the CLI need to talk to the server. If you are upgrading from a pre-HTTPS release and want the step-by-step cutover procedure, read [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md) first and come back here for reference.
|
||||
|
||||
## What you get
|
||||
|
||||
@@ -154,7 +156,7 @@ Same three controls as CLI, env-var-driven only (no flags — MCP runs as a stdi
|
||||
- `CERTCTL_SERVER_CA_BUNDLE_PATH` optional CA bundle
|
||||
- `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY` optional skip
|
||||
|
||||
Claude Desktop / other MCP client configs should set all three in the tool's env block.
|
||||
MCP-client configs should set all three in the tool's env block.
|
||||
|
||||
## Troubleshooting: fail-loud preflight errors
|
||||
|
||||
@@ -173,7 +175,7 @@ Both files exist but `tls.LoadX509KeyPair` refused them. Typical causes: the pri
|
||||
The client did not trust the CA that signed the server cert. Either mount the CA bundle via `CERTCTL_SERVER_CA_BUNDLE_PATH`, add the CA to the system trust store on the client host, or (dev only) set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true`.
|
||||
|
||||
**Client side: `tls: first record does not look like a TLS handshake`**
|
||||
The client is speaking plaintext HTTP to an HTTPS server (or vice-versa). Check that `CERTCTL_SERVER_URL` starts with `https://`. If you are upgrading from a pre-v2.2 release and your agents are old, they will surface this error until you roll the DaemonSet — see [`upgrade-to-tls.md`](upgrade-to-tls.md).
|
||||
The client is speaking plaintext HTTP to an HTTPS server (or vice-versa). Check that `CERTCTL_SERVER_URL` starts with `https://`. If you are upgrading from a pre-v2.2 release and your agents are old, they will surface this error until you roll the DaemonSet — see [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md).
|
||||
|
||||
## InsecureSkipVerify justifications (Audit L-001)
|
||||
|
||||
@@ -208,8 +210,8 @@ ignores `_test.go`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`upgrade-to-tls.md`](upgrade-to-tls.md) — one-step cutover from pre-HTTPS releases
|
||||
- [`quickstart.md`](quickstart.md) — docker-compose walkthrough with HTTPS examples
|
||||
- [`test-env.md`](test-env.md) — integration test environment (also HTTPS-only)
|
||||
- [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md) — one-step cutover from pre-HTTPS releases
|
||||
- [`quickstart.md`](../getting-started/quickstart.md) — docker-compose walkthrough with HTTPS examples
|
||||
- [`test-env.md`](../contributor/test-environment.md) — integration test environment (also HTTPS-only)
|
||||
- [`security.md`](security.md) — overall security posture, OCSP Must-Staple guidance, encryption-at-rest spec
|
||||
- Milestone spec: `prompts/https-everywhere-milestone.md` (authoritative source for locked decisions)
|
||||
@@ -1,6 +1,8 @@
|
||||
# OpenAPI Specification Guide
|
||||
|
||||
certctl ships with a complete OpenAPI 3.1 specification at `api/openapi.yaml`. This spec documents all 78 API operations currently specified, every request/response schema, pagination conventions, authentication requirements, and error formats. It's the single source of truth for the documented REST API. (Note: The spec will be updated to include 7 additional certificate discovery endpoints from M18b.)
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl ships with a complete OpenAPI 3.1 specification at `api/openapi.yaml`. The spec documents every operation (re-derive count via `grep -cE '^\s+operationId:' api/openapi.yaml`), every request/response schema, pagination conventions, authentication requirements, and error formats. It's the single source of truth for the documented REST API.
|
||||
|
||||
This guide covers how to use the spec for API exploration, client SDK generation, and integration testing.
|
||||
|
||||
@@ -12,9 +14,8 @@ The spec lives at `api/openapi.yaml` in the repository root. It's versioned alon
|
||||
# View the spec
|
||||
cat api/openapi.yaml
|
||||
|
||||
# Count operations
|
||||
grep "operationId:" api/openapi.yaml | wc -l
|
||||
# 78 (includes health + ready, 7 discovery endpoints pending spec update)
|
||||
# Count operations (includes health + ready)
|
||||
grep -cE '^\s+operationId:' api/openapi.yaml
|
||||
```
|
||||
|
||||
## Viewing with Swagger UI
|
||||
@@ -151,7 +152,7 @@ npx @apidevtools/swagger-cli validate api/openapi.yaml
|
||||
Import the spec directly into Postman:
|
||||
|
||||
1. Open Postman → Import → File → select `api/openapi.yaml`
|
||||
2. Postman creates a collection with all 78 documented operations organized by tag
|
||||
2. Postman creates a collection with every documented operation organized by tag
|
||||
3. Set the `baseUrl` variable to `https://localhost:8443` (HTTPS-only as of v2.2)
|
||||
4. Add an `Authorization: Bearer your-api-key` header to the collection
|
||||
5. Import the demo stack CA bundle (`deploy/test/certs/ca.crt`) into Postman's Settings → Certificates → CA Certificates, or disable certificate verification for the `localhost` host (Settings → General → SSL certificate verification)
|
||||
@@ -191,6 +192,6 @@ This sends randomized valid requests to every endpoint and verifies the response
|
||||
## What's Next
|
||||
|
||||
- [MCP Server Guide](mcp.md) — AI-native access to the certctl API
|
||||
- [Quick Start](quickstart.md) — Get certctl running locally
|
||||
- [Connector Guide](connectors.md) — Build custom issuer and target connectors
|
||||
- [Quick Start](../getting-started/quickstart.md) — Get certctl running locally
|
||||
- [Connector Guide](connectors/index.md) — Build custom issuer and target connectors
|
||||
- [Architecture](architecture.md) — System design deep dive
|
||||
@@ -1,5 +1,7 @@
|
||||
# Architecture Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
@@ -61,7 +63,7 @@ flowchart TB
|
||||
API["REST API\n(Go net/http, :8443)"]
|
||||
SVC["Service Layer"]
|
||||
REPO["Repository Layer\n(database/sql + lib/pq)"]
|
||||
SCHED["Background Scheduler\n8 always-on + 4 optional loops"]
|
||||
SCHED["Background Scheduler\n9 always-on + 5 opt-in loops"]
|
||||
DASH["Web Dashboard\n(React SPA)"]
|
||||
end
|
||||
|
||||
@@ -493,11 +495,11 @@ Short-lived certificates (those with profile TTL < 1 hour) return "good" from OC
|
||||
|
||||
#### Bulk Revocation
|
||||
|
||||
For compliance events requiring fleet-wide revocation (key compromise, CA distrust, mass decommission), certctl supports bulk revocation by filter criteria. The `POST /api/v1/certificates/bulk-revoke` endpoint accepts filter parameters (profile_id, owner_id, agent_id, issuer_id) and creates individual revocation jobs for each matching certificate. Bulk revocation reuses the same 7-step single-cert flow for each certificate — no new issuer notification or audit mechanics. The operation is idempotent: revoking an already-revoked certificate is a no-op. Partial failures are tolerated — if one certificate fails to revoke (e.g., issuer unavailable), the operation continues for remaining certs and returns a summary. A single `bulk_revocation_initiated` audit event logs the operation with filter criteria, operator actor, and summary (total requested, succeeded, failed counts). Audit events for individual certificate revocations record the operator identity separately. The GUI bulk revoke button on the certificates list filters by visible selections and displays an affected-cert count modal before confirmation.
|
||||
For incident-response events requiring fleet-wide revocation (key compromise, CA distrust, mass decommission), certctl supports bulk revocation by filter criteria. The `POST /api/v1/certificates/bulk-revoke` endpoint accepts filter parameters (profile_id, owner_id, agent_id, issuer_id) and creates individual revocation jobs for each matching certificate. Bulk revocation reuses the same 7-step single-cert flow for each certificate — no new issuer notification or audit mechanics. The operation is idempotent: revoking an already-revoked certificate is a no-op. Partial failures are tolerated — if one certificate fails to revoke (e.g., issuer unavailable), the operation continues for remaining certs and returns a summary. A single `bulk_revocation_initiated` audit event logs the operation with filter criteria, operator actor, and summary (total requested, succeeded, failed counts). Audit events for individual certificate revocations record the operator identity separately. The GUI bulk revoke button on the certificates list filters by visible selections and displays an affected-cert count modal before confirmation.
|
||||
|
||||
### 4. Automatic Renewal
|
||||
|
||||
The control plane runs a scheduler with 8 always-on loops plus up to 4 optional loops (enabled by configuration). `internal/scheduler/scheduler.go:262-265` is the authoritative count.
|
||||
The control plane runs a scheduler with 9 always-on loops plus up to 5 opt-in loops (enabled by configuration). Re-derive the count via `grep -cE '^func \(s \*Scheduler\) [a-zA-Z]+Loop' internal/scheduler/scheduler.go`; the opt-in gating lives in `cmd/server/main.go` startup wiring (`cfg.NetworkScan.Enabled`, `digestService != nil`, `healthCheckService != nil`, `cloudDiscoveryService != nil`, `cfg.ACMEServer.Enabled && cfg.ACMEServer.GCInterval > 0`).
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
@@ -1042,7 +1044,7 @@ For deployments that need JWT/OIDC/mTLS, the standard pattern is to put an authe
|
||||
|
||||
### Concurrency Safety
|
||||
|
||||
The background scheduler uses `sync/atomic.Bool` idempotency guards on every loop (8 always-on plus up to 4 optional) — if a tick fires while the previous iteration is still running, it skips. A `sync.WaitGroup` tracks all in-flight goroutines. `WaitForCompletion(timeout)` blocks during shutdown until all work finishes or the timeout expires, preventing state corruption from mid-flight database operations during process exit.
|
||||
The background scheduler uses `sync/atomic.Bool` idempotency guards on every loop (9 always-on plus up to 5 opt-in) — if a tick fires while the previous iteration is still running, it skips. A `sync.WaitGroup` tracks all in-flight goroutines. `WaitForCompletion(timeout)` blocks during shutdown until all work finishes or the timeout expires, preventing state corruption from mid-flight database operations during process exit.
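
A minimal Go sketch of the guard-and-drain pattern described above, assuming one guard per loop. `guardedLoop`, `tick`, `waitForCompletion`, and `drainTimeout` are illustrative names rather than the scheduler's actual identifiers.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// drainTimeout is an illustrative shutdown budget, not a certctl default.
const drainTimeout = 2 * time.Second

// guardedLoop pairs an atomic.Bool "already running" flag with a WaitGroup
// that tracks the in-flight iteration.
type guardedLoop struct {
	running atomic.Bool
	wg      sync.WaitGroup
}

// tick starts work in a goroutine unless the previous iteration is still
// running, in which case it skips and returns false.
func (l *guardedLoop) tick(work func()) bool {
	if !l.running.CompareAndSwap(false, true) {
		return false
	}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()            // runs last
		defer l.running.Store(false) // runs first, so the flag clears before Wait returns
		work()
	}()
	return true
}

// waitForCompletion blocks until in-flight work finishes or the timeout
// expires. It reports whether the drain completed in time.
func (l *guardedLoop) waitForCompletion(timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		l.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	var l guardedLoop
	started := make(chan struct{})
	release := make(chan struct{})
	first := l.tick(func() { close(started); <-release })
	<-started
	second := l.tick(func() {}) // skipped: the first iteration is still running
	close(release)
	drained := l.waitForCompletion(drainTimeout)
	fmt.Println(first, second, drained)
}
```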
|
||||
|
||||
The job-processor tick fans the per-job work out across up to `CERTCTL_RENEWAL_CONCURRENCY` goroutines (default 25), gated by `golang.org/x/sync/semaphore.Weighted`. The cap is the operator's lever for "how many concurrent CA calls per scheduler tick" — operators with permissive upstream limits and large fleets (>10k certs) can bump to 100; operators with strict limits or async-CA-heavy fleets should stay at 25 or lower. Values ≤ 0 normalise to 1 (sequential). The Acquire is ctx-aware so a shutdown-driven ctx cancel interrupts the dispatch loop promptly; in-flight goroutines drain via Wait before the tick returns. Closes the #9 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (pre-fix the fan-out had no cap, so a 5,000-cert sweep tripped DigiCert / Entrust / Sectigo rate limits and the next tick re-fanned-out the same calls).
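
The bounded fan-out can be sketched as below. To keep the example standard-library only, a buffered channel stands in for `golang.org/x/sync/semaphore.Weighted`; the behaviour illustrated (a cap on in-flight goroutines, values ≤ 0 normalised to 1, a context-aware acquire, and a drain before returning) follows the description above. `fanOut`, `normalizeConcurrency`, and `observePeak` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// normalizeConcurrency maps values <= 0 to 1 (sequential processing).
func normalizeConcurrency(n int) int {
	if n <= 0 {
		return 1
	}
	return n
}

// fanOut runs process for each job with at most limit goroutines in flight.
// A buffered channel acts as the counting semaphore here. It returns the
// number of jobs dispatched before the context was cancelled.
func fanOut(ctx context.Context, jobs []string, limit int, process func(string)) int {
	slots := make(chan struct{}, normalizeConcurrency(limit))
	var wg sync.WaitGroup
	dispatched := 0
	for _, job := range jobs {
		if ctx.Err() != nil {
			break // shutdown requested: stop dispatching
		}
		acquired := false
		select {
		case slots <- struct{}{}: // acquire a slot
			acquired = true
		case <-ctx.Done(): // cancelled while waiting for a slot
		}
		if !acquired {
			break
		}
		dispatched++
		wg.Add(1)
		go func(j string) {
			defer wg.Done()
			defer func() { <-slots }() // release the slot
			process(j)
		}(job)
	}
	wg.Wait() // in-flight goroutines drain before the tick returns
	return dispatched
}

// observePeak runs n dummy jobs and reports the highest concurrency seen.
func observePeak(n, limit int) int64 {
	var inFlight, peak atomic.Int64
	fanOut(context.Background(), make([]string, n), limit, func(string) {
		cur := inFlight.Add(1)
		for {
			old := peak.Load()
			if cur <= old || peak.CompareAndSwap(old, cur) {
				break
			}
		}
		time.Sleep(time.Millisecond)
		inFlight.Add(-1)
	})
	return peak.Load()
}

func main() {
	fmt.Println("peak stays within limit 4:", observePeak(20, 4) <= 4)
	fmt.Println("limit 0 normalises to:", normalizeConcurrency(0))
}
```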
|
||||
|
||||
@@ -1094,11 +1096,11 @@ Health checks live outside the API prefix: `GET /health` and `GET /ready`.
|
||||
|
||||
## MCP Server
|
||||
|
||||
certctl includes an MCP (Model Context Protocol) server as a separate binary (`cmd/mcp-server/`) that enables AI assistants to interact with the certificate platform. The MCP server uses the official MCP Go SDK (`modelcontextprotocol/go-sdk`) with stdio transport for integration with Claude, Cursor, and other MCP-compatible tools.
|
||||
certctl includes an MCP (Model Context Protocol) server as a separate binary (`cmd/mcp-server/`) that enables AI assistants to interact with the certificate platform. The MCP server uses the official MCP Go SDK (`modelcontextprotocol/go-sdk`) with stdio transport for integration with any MCP-compatible AI client.
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
AI["AI Assistant\n(Claude, Cursor)"] -->|"stdio"| MCP["MCP Server\ncmd/mcp-server/"]
|
||||
AI["AI Assistant\n(any MCP client)"] -->|"stdio"| MCP["MCP Server\ncmd/mcp-server/"]
|
||||
MCP -->|"HTTP + Bearer token"| API["certctl REST API\n:8443"]
|
||||
|
||||
subgraph "MCP Tools"
|
||||
@@ -1248,7 +1250,7 @@ flowchart TB
|
||||
|
||||
1. **Pluggable sources** — Each cloud provider implements the `DiscoverySource` interface (Name, Type, Discover, ValidateConfig). Three built-in sources: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager
|
||||
2. **CloudDiscoveryService orchestrator** — Iterates registered sources, calls `Discover()` on each, feeds reports into `ProcessDiscoveryReport()`. Errors from one source don't prevent other sources from running
|
||||
3. **Scheduler integration** — opt-in cloud discovery scheduler loop (6h default; see `docs/architecture.md` 12-loop topology), runs immediately on startup, `atomic.Bool` idempotency guard
|
||||
3. **Scheduler integration** — opt-in cloud discovery scheduler loop (6h default; one of the 14 loops in the scheduler topology — see the Background Scheduler section above), runs immediately on startup, `atomic.Bool` idempotency guard
|
||||
4. **Sentinel agents** — Each source uses its own sentinel agent ID (`cloud-aws-sm`, `cloud-azure-kv`, `cloud-gcp-sm`) for dedup and triage filtering
|
||||
5. **Source path format** — `aws-sm://{region}/{secret}`, `azure-kv://{cert-name}/{version}`, `gcp-sm://{project}/{secret}`
|
||||
6. **No new schema** — Reuses existing `discovered_certificates` and `discovery_scans` tables. Sentinel agent IDs leverage existing `(fingerprint_sha256, agent_id, source_path)` dedup constraint
|
||||
@@ -1262,7 +1264,7 @@ flowchart TB
|
||||
- **Claims it** via `POST /discovered-certificates/{id}/claim` — links to existing managed cert or creates new enrollment
|
||||
- **Dismisses it** via `POST /discovered-certificates/{id}/dismiss` — removes from triage, marked as "Dismissed"
|
||||
9. **Status tracking** — `discovery_cert_claimed` and `discovery_cert_dismissed` events audit the operator's decision
|
||||
10. **Summary** — `GET /api/v1/discovery-summary` returns count of Unmanaged, Managed, and Dismissed certs (useful for compliance reporting)
|
||||
10. **Summary** — `GET /api/v1/discovery-summary` returns count of Unmanaged, Managed, and Dismissed certs (useful for inventory reporting)
|
||||
|
||||
This data flow is pull-based and non-blocking. Agents discover at their own pace; the server stores results for later review. There's no pressure to claim or dismiss; operators can leave certificates in "Unmanaged" status indefinitely.
|
||||
|
||||
@@ -1316,7 +1318,7 @@ For detailed test procedures, smoke tests, and the release sign-off checklist, s
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (see `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md`). Pre-audit, certctl had no benchmarks or load tests for any API path, so any throughput claim was hand-waved; the harness in `deploy/test/loadtest/` substantiates the API-tier capacity numbers with reproducible methodology.
|
||||
Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit. Pre-audit, certctl had no benchmarks or load tests for any API path, so any throughput claim was hand-waved; the harness in `deploy/test/loadtest/` substantiates the API-tier capacity numbers with reproducible methodology.
|
||||
|
||||
The harness drives a k6 client at sustained 50 req/s × 2 scenarios × 5 minutes against a docker-compose stack of postgres + tls-init + certctl-server. Two scenarios run in parallel: `POST /api/v1/certificates` (issuance-acceptance hot path: auth + JSON decode + validation + service `CreateCertificate` + `managed_certificates` insert) and `GET /api/v1/certificates?per_page=50` (most-trafficked read endpoint). Hard regression-guard thresholds: p99 < 5 s for issuance-acceptance, p99 < 2 s for list, error rate < 1% globally. k6 exits non-zero on any threshold breach so a future PR that pushes p99 above the bar fails `make loadtest`. Run via `make loadtest` from the repo root or via `.github/workflows/loadtest.yml` (`workflow_dispatch` + weekly cron — never per-push).
|
||||
|
||||
@@ -1326,11 +1328,10 @@ Captured baseline numbers are committed in `deploy/test/loadtest/README.md` once
|
||||
|
||||
## What's Next
|
||||
|
||||
- [Quick Start](quickstart.md) — Get certctl running locally
|
||||
- [Advanced Demo](demo-advanced.md) — Issue a certificate end-to-end
|
||||
- [Connector Guide](connectors.md) — Build custom connectors
|
||||
- [Compliance Mapping](compliance.md) — SOC 2, PCI-DSS 4.0, and NIST SP 800-57 alignment
|
||||
- [Quick Start](../getting-started/quickstart.md) — Get certctl running locally
|
||||
- [Advanced Demo](../getting-started/advanced-demo.md) — Issue a certificate end-to-end
|
||||
- [Connector Guide](connectors/index.md) — Build custom connectors
|
||||
- [MCP Server Guide](mcp.md) — AI-native access to the API
|
||||
- [OpenAPI Spec](openapi.md) — Full API reference and SDK generation
|
||||
- [Testing Guide](testing-guide.md) — Test procedures and release sign-off
|
||||
- [Test Environment](test-env.md) — Docker Compose test environment setup
|
||||
- [API Reference](api.md) — OpenAPI 3.1 spec and SDK generation
|
||||
- [QA Test Suite](../contributor/qa-test-suite.md) — Test procedures and release sign-off
|
||||
- [Test Environment](../contributor/test-environment.md) — Docker Compose test environment setup
|
||||
@@ -0,0 +1,156 @@
|
||||
# certctl CLI
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
`certctl-cli` is the command-line interface to certctl. It wraps the REST API as terminal commands so operators and CI/CD pipelines can drive certctl without writing curl invocations.
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
go install github.com/certctl-io/certctl/cmd/cli@latest
|
||||
```
|
||||
|
||||
The binary lands at `$GOBIN/cli` (or, if `GOBIN` is unset, at `$GOPATH/bin/cli`, which defaults to `$HOME/go/bin/cli`). Rename it to `certctl-cli` if you prefer.
|
||||
|
||||
## Configure
|
||||
|
||||
The CLI reads three environment variables:
|
||||
|
||||
```bash
|
||||
export CERTCTL_SERVER_URL=https://localhost:8443
|
||||
export CERTCTL_API_KEY=your-api-key
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt
|
||||
```
|
||||
|
||||
Or pass them per-invocation:
|
||||
|
||||
```bash
|
||||
certctl-cli --server https://localhost:8443 --api-key your-key --ca-bundle ca.crt certs list
|
||||
```
|
||||
|
||||
For local development against a self-signed bootstrap cert, `--insecure` skips TLS verification. **Never set this in production.**
|
||||
|
||||
## Command groups
|
||||
|
||||
The CLI is organized by resource:
|
||||
|
||||
```
|
||||
certctl-cli certs [list|get|renew|revoke]
|
||||
certctl-cli agents [list|get]
|
||||
certctl-cli jobs [list|get|cancel]
|
||||
certctl-cli import [bulk PEM import]
|
||||
certctl-cli est [enroll|reenroll]
|
||||
certctl-cli status [server health + summary stats]
|
||||
certctl-cli version [CLI + server version]
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### List + filter certificates
|
||||
|
||||
```bash
|
||||
# All certs
|
||||
certctl-cli certs list
|
||||
|
||||
# Filter by environment
|
||||
certctl-cli certs list --env production
|
||||
|
||||
# JSON output (default is table)
|
||||
certctl-cli certs list --format json
|
||||
|
||||
# Sort + paginate
|
||||
certctl-cli certs list --sort -expires_at --limit 50
|
||||
|
||||
# Time-range filter (RFC 3339)
|
||||
certctl-cli certs list --expires-before 2026-06-01T00:00:00Z
|
||||
|
||||
# Sparse fields — only return the columns you need
|
||||
certctl-cli certs list --fields id,common_name,expires_at,status
|
||||
```
|
||||
|
||||
### Trigger renewal
|
||||
|
||||
```bash
|
||||
certctl-cli certs renew mc-api-prod
|
||||
# Returns the job id; track with: certctl-cli jobs get <job-id>
|
||||
|
||||
# Recovery: clear a stuck in-flight renewal so a new one can start
|
||||
certctl-cli certs renew mc-api-prod --force
|
||||
```
|
||||
|
||||
`--force` clears the server-side `RenewalInProgress` block — used when a previous renewal job hung without releasing the status flag. `--force` does NOT override `Archived` or `Expired` (those are terminal states; archived = decommissioned, expired = issue a new cert instead of renewing a dead one).
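
The rule above can be written as a small decision function. This is a sketch of the documented semantics, not the server's code: `checkRenewalAllowed` is a hypothetical name, and `Active` in the usage example is an assumed non-terminal status.

```go
package main

import (
	"errors"
	"fmt"
)

// checkRenewalAllowed reports whether a renewal may start for a certificate
// in the given status. Only the RenewalInProgress block can be forced.
func checkRenewalAllowed(status string, force bool) error {
	switch status {
	case "Archived", "Expired":
		// Terminal states: --force never overrides these.
		return fmt.Errorf("status %s is terminal; --force does not override it", status)
	case "RenewalInProgress":
		if !force {
			return errors.New("renewal already in progress; pass --force to clear a stuck job")
		}
		return nil
	default:
		return nil
	}
}

func main() {
	for _, status := range []string{"Active", "RenewalInProgress", "Archived"} {
		fmt.Println(status, "force=true:", checkRenewalAllowed(status, true))
	}
}
```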
|
||||
|
||||
### Revoke
|
||||
|
||||
```bash
|
||||
# Single revoke — --reason is REQUIRED (no silent fallback to 'unspecified')
|
||||
certctl-cli certs revoke mc-api-prod --reason keyCompromise
|
||||
|
||||
# snake_case is accepted and normalised to camelCase before dispatch
|
||||
certctl-cli certs revoke mc-api-prod --reason key_compromise
|
||||
|
||||
# Bulk revoke by filter
|
||||
certctl-cli certs revoke --profile prof-deprecated --reason superseded
|
||||
certctl-cli certs revoke --team t-payments --reason cessationOfOperation
|
||||
certctl-cli certs revoke --issuer iss-old-vault --reason caCompromise
|
||||
```
|
||||
|
||||
`--reason` is mandatory: omitting it prints the canonical RFC 5280 §5.3.1 menu and exits non-zero. Compliance reporting (PCI-DSS §3.6, HIPAA §164.312) relies on the reason code being meaningful, so the CLI no longer falls back silently. Valid camelCase set: `unspecified`, `keyCompromise`, `caCompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `removeFromCRL`, `privilegeWithdrawn`, `aaCompromise`. snake_case variants (`key_compromise`, `cessation_of_operation`, etc.) are accepted and normalised.
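
A minimal sketch of the normalisation, assuming it is done by comparing the input against the canonical set with underscores and case ignored. `normalizeReason` and `validReasons` are hypothetical names. One consequence of this approach is that it also accepts other casings such as `KeyCompromise`, which may be more lenient than the real CLI.

```go
package main

import (
	"fmt"
	"strings"
)

// validReasons is the canonical camelCase set listed above.
var validReasons = []string{
	"unspecified", "keyCompromise", "caCompromise", "affiliationChanged",
	"superseded", "cessationOfOperation", "certificateHold", "removeFromCRL",
	"privilegeWithdrawn", "aaCompromise",
}

// normalizeReason maps a camelCase or snake_case reason to its canonical
// camelCase form. An empty or unknown reason is an error, never a silent
// fallback to "unspecified".
func normalizeReason(input string) (string, error) {
	menu := strings.Join(validReasons, ", ")
	if input == "" {
		return "", fmt.Errorf("--reason is required; valid values: %s", menu)
	}
	// Strip underscores and lowercase so key_compromise matches keyCompromise.
	key := strings.ToLower(strings.ReplaceAll(input, "_", ""))
	for _, reason := range validReasons {
		if strings.ToLower(reason) == key {
			return reason, nil
		}
	}
	return "", fmt.Errorf("unknown reason %q; valid values: %s", input, menu)
}

func main() {
	for _, in := range []string{"key_compromise", "removeFromCRL", "", "typo"} {
		reason, err := normalizeReason(in)
		fmt.Printf("%q -> %q, err=%v\n", in, reason, err)
	}
}
```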
|
||||
|
||||
### Bulk import
|
||||
|
||||
```bash
|
||||
# Import a directory of PEMs
|
||||
certctl-cli import /etc/letsencrypt/live/
|
||||
|
||||
# Import a single concatenated bundle
|
||||
certctl-cli import certs.pem
|
||||
```
|
||||
|
||||
Each cert lands in the inventory as `Unmanaged` (per the discovery model). Triage from the dashboard or via `certctl-cli certs claim <id>` once you've decided to actively manage it.
|
||||
|
||||
### EST enrollment
|
||||
|
||||
```bash
|
||||
# Enroll a new device cert via EST simpleenroll
|
||||
certctl-cli est enroll --csr device.csr --output device.crt
|
||||
|
||||
# Re-enroll (renew) an existing device cert
|
||||
certctl-cli est reenroll --csr device.csr --client-cert device.crt --client-key device.key
|
||||
```
|
||||
|
||||
### Server status
|
||||
|
||||
```bash
|
||||
certctl-cli status
|
||||
# Health: ok
|
||||
# Total certificates: 145
|
||||
# Expiring (30d): 12
|
||||
# Active jobs: 3
|
||||
# Pending renewals: 8
|
||||
```
|
||||
|
||||
## Output formats
|
||||
|
||||
- `--format table` (default) — human-readable terminal output
|
||||
- `--format json` — JSON for piping into `jq`, scripts, dashboards
|
||||
|
||||
The CLI is built with Go's standard library only — no external dependencies. The binary is small (~10MB) and statically linked.
|
||||
|
||||
## Wiring into CI/CD
|
||||
|
||||
Common pattern: a CI step that renews a certificate through certctl, waits for the renewal job to finish, and then reads back the new expiry to verify the result:
|
||||
|
||||
```bash
|
||||
certctl-cli certs renew mc-api-prod --wait
|
||||
certctl-cli jobs get $(certctl-cli certs renew mc-api-prod --json | jq -r '.job_id') --wait
|
||||
certctl-cli certs get mc-api-prod --json | jq -r '.expires_at'
|
||||
```
|
||||
|
||||
The `--wait` flag blocks until the job reaches a terminal state (Completed / Failed / Cancelled), which is what CI scripts actually need.
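
A CI step usually wants a pass/fail signal rather than a printed date. The sketch below turns the `expires_at` value from the example above into an exit code. It assumes the field is an RFC 3339 timestamp and that GNU `date` is available; the helper names `days_between` and `check_expiry` are hypothetical.

```bash
# Whole days between two RFC 3339 timestamps (GNU date assumed).
days_between() {
  local from_s to_s
  from_s=$(date -u -d "$1" +%s) || return 2
  to_s=$(date -u -d "$2" +%s) || return 2
  echo $(( (to_s - from_s) / 86400 ))
}

# Succeed only when the cert is valid for at least min_days from "now".
# The optional third argument overrides "now", which makes the check testable.
check_expiry() {
  local expires_at=$1 min_days=$2 now=${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  local left
  left=$(days_between "$now" "$expires_at") || return 2
  [ "$left" -ge "$min_days" ]
}

# Usage in a pipeline (commands taken from the example above):
#   expires=$(certctl-cli certs get mc-api-prod --json | jq -r '.expires_at')
#   check_expiry "$expires" 30 || exit 1
```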
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/reference/api.md`](api.md) — the OpenAPI 3.1 spec the CLI wraps
|
||||
- [`docs/reference/mcp.md`](mcp.md) — the MCP server that exposes the same surface to AI assistants
|
||||
- [`docs/contributor/qa-prerequisites.md`](../contributor/qa-prerequisites.md) — local environment setup before the CLI can talk to a server
|
||||
@@ -0,0 +1,98 @@
|
||||
# Configuration Reference
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Compact reference for `CERTCTL_*` environment variables consumed by
|
||||
`certctl-server` and `certctl-agent`. Most operators don't need to
|
||||
touch these — defaults are tuned for the common case. Reach for them
|
||||
when the system's behaviour needs tuning beyond what's exposed in the
|
||||
GUI / API.
|
||||
|
||||
This page enumerates the operator-tunable knobs that don't have a
|
||||
dedicated home elsewhere. Connector-specific env vars are documented
|
||||
on the per-connector pages under
|
||||
[`docs/reference/connectors/`](connectors/index.md). Protocol env
|
||||
vars (ACME server, EST, SCEP) are documented under
|
||||
[`docs/reference/protocols/`](protocols/). TLS env vars are
|
||||
documented in [`docs/operator/tls.md`](../operator/tls.md).
|
||||
|
||||
## Scheduler intervals
|
||||
|
||||
The scheduler runs several background loops; their intervals are tunable for
|
||||
performance / contention tuning.
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL` | `2m` | How often the agent-health loop scans for stale heartbeats and transitions agents to `Unhealthy` / `Offline`. |
|
||||
| `CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL` | `30s` | How often the job-processor loop dispatches `Pending` jobs to agents. |
|
||||
| `CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL` | `1m` | How often the notification-dispatcher loop fans out queued alerts to channels. |
|
||||
| `CERTCTL_SHORT_LIVED_EXPIRY_CHECK_INTERVAL` | `5m` | How often the short-lived-expiry loop watches certs whose TTL is less than 1h for imminent expiry. |
|
||||
|
||||
For the full scheduler topology (14 loops, 9 always-on + 5 opt-in)
|
||||
see [`architecture.md`](architecture.md) "Scheduler topology".
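
As an illustration, the overrides below shorten two of the loops for a control plane that dispatches many jobs. The values are hypothetical, chosen only to show the duration syntax used by the defaults in the table; they are not recommendations.

```bash
# Hypothetical overrides; set these in the certctl-server environment.
export CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL=10s         # default 30s
export CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL=30s  # default 1m
export CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL=2m     # default, shown for completeness
```

Shorter intervals mean more frequent database scans, so it is worth changing one value at a time and watching the effect.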
|
||||
|
||||
## Job lifecycle
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_JOB_AWAITING_CSR_TIMEOUT` | `24h` | How long a job stays in `AwaitingCSR` before the scheduler marks it `Failed` (the agent never picked it up). |
|
||||
|
||||
## Rate limiting
|
||||
|
||||
The control plane API is rate-limited by default; tune for
|
||||
high-volume environments (mass-rotation events, bulk imports).
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_RATE_LIMIT_ENABLED` | `true` | Master toggle. Disable only for trusted-network single-tenant deploys where the API is firewall-protected. |
|
||||
| `CERTCTL_RATE_LIMIT_PER_USER_RPS` | `0` (= use global default) | Per-user requests-per-second cap. Zero opts each user into the global default in `internal/api/middleware`. |
|
||||
| `CERTCTL_RATE_LIMIT_PER_USER_BURST` | `0` (= use global default) | Per-user token-bucket burst size. Same opt-in semantics. |
|
||||
|
||||
## Audit trail
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_AUDIT_FLUSH_TIMEOUT_SECONDS` | `30` | How long the audit-event flush worker waits for the buffered batch to drain before forcing a flush at shutdown. |
|
||||
|
||||
## Deploy verification
|
||||
|
||||
The deploy-hardening primitive wraps every cert deploy in
|
||||
atomic-write + post-verify + rollback. These env vars tune the
|
||||
post-deploy TLS verification phase.
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_VERIFY_DEPLOYMENT` | `true` | Master toggle for post-deploy TLS verify. Disable only for connectors / environments where the verify endpoint is not reachable from the agent. |
|
||||
| `CERTCTL_VERIFY_DELAY` | `2s` | How long to wait after the reload command completes before the first verify-handshake attempt (gives the daemon time to pick up new keys). |
|
||||
| `CERTCTL_VERIFY_TIMEOUT` | `10s` | Per-attempt TLS-handshake timeout. |
|
||||
| `CERTCTL_DEPLOY_BACKUP_RETENTION` | `3` | How many `.certctl-bak.<unix-nanos>.<ext>` rollback snapshots to keep per target after a successful deploy. `0` uses the default of 3; `-1` opts out of pruning entirely. |
|
||||
|
||||
For the full deploy contract see
|
||||
[`deployment-model.md`](deployment-model.md).
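
To see what a given `CERTCTL_DEPLOY_BACKUP_RETENTION` value would leave on disk, the sketch below lists the snapshots that fall outside the retention window. It is an illustration of the documented semantics, not the agent's code: it assumes snapshot names of the form `<target>.certctl-bak.<unix-nanos>.<ext>`, relies on GNU `head`, and only prints candidates without deleting anything. The function name `prune_candidates` is hypothetical.

```bash
# List rollback snapshots that a retention of $keep would prune, oldest first.
prune_candidates() {
  local dir=$1 target=$2 keep=$3
  [ "$keep" -eq -1 ] && return 0   # -1: pruning disabled
  [ "$keep" -eq 0 ] && keep=3      # 0: fall back to the default of 3
  local f
  for f in "$dir/$target".certctl-bak.*; do
    [ -e "$f" ] || continue
    # Emit "<nanos> <path>" so the list can be sorted numerically.
    local rest=${f##*.certctl-bak.}
    printf '%s %s\n' "${rest%%.*}" "$f"
  done | sort -n | head -n -"$keep" | cut -d' ' -f2-
}

# Usage (hypothetical path and file name):
#   prune_candidates /etc/nginx/tls server.crt 3
```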
|
||||
|
||||
## Database
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_DATABASE_MIGRATIONS_PATH` | `./migrations` | Filesystem path to the `*.up.sql` / `*.down.sql` migration set. Override only when running `certctl-server` from a non-standard layout. |
|
||||
|
||||
## Agent
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_AGENT_ID` | (none — required) | The agent's unique ID, issued by `POST /api/v1/agents/register` and bundled into the agent's registration response. Pass via this env var when the agent runs as a systemd unit / container without the `-agent-id` CLI flag. |
|
||||
|
||||
## SCEP profile binding (single-profile back-compat)
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_SCEP_PROFILE_ID` | (empty) | Optional certificate profile ID for the legacy single-profile SCEP path. The multi-profile path uses `CERTCTL_SCEP_PROFILES=<list>` + `CERTCTL_SCEP_PROFILE_<NAME>_PROFILE_ID` instead — see [`scep-server.md`](protocols/scep-server.md). |
|
||||
|
||||
## Related references
|
||||
|
||||
- [`architecture.md`](architecture.md) — scheduler topology, system design, security model
|
||||
- [`deployment-model.md`](deployment-model.md) — atomic write + verify + rollback contract
|
||||
- [`operator/security.md`](../operator/security.md) — full security posture (auth, rate limits, encryption at rest)
|
||||
- [`operator/tls.md`](../operator/tls.md) — control-plane TLS env vars
|
||||
- Per-connector pages under [`reference/connectors/`](connectors/index.md) for connector-specific config
|
||||
- Per-protocol pages under [`reference/protocols/`](protocols/) for ACME / SCEP / EST / CRL+OCSP / async-CA polling
|
||||
@@ -0,0 +1,235 @@
|
||||
# ACME Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the outbound ACME v2 issuer
|
||||
> connector (certctl as an ACME *client*). For the inbound ACME
|
||||
> server (certctl as an ACME *server*), see
|
||||
> [acme-server.md](../protocols/acme-server.md). For the
|
||||
> connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The ACME connector implements the full ACME v2 protocol (RFC 8555)
|
||||
using Go's `golang.org/x/crypto/acme` package. It supports three
|
||||
challenge methods and ARI (RFC 9773) for renewal-window negotiation.
|
||||
|
||||
Compatible CAs include Let's Encrypt, ZeroSSL, Sectigo, Buypass,
|
||||
Google Trust Services, SSL.com, and any other RFC 8555 ACME
|
||||
implementation. step-ca's ACME directory is also compatible if you
|
||||
prefer ACME over the native step-ca connector.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/acme/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the ACME connector when:
|
||||
|
||||
- You need public-trust certificates (Let's Encrypt, ZeroSSL,
|
||||
Sectigo via ACME, Google Trust Services, SSL.com).
|
||||
- You want certctl to drive renewal lifecycle on top of the ACME
|
||||
CA's free or paid issuance.
|
||||
- You want one tool that covers both internal PKI (Local, Vault,
|
||||
step-ca) and public-trust ACME issuance.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You need OV / EV certificates and your CA doesn't expose them
|
||||
via ACME — use the DigiCert or Sectigo SCM REST connectors.
|
||||
- You're standing up internal-only PKI and don't want to operate
|
||||
ACME challenge infrastructure — use Local CA or Vault PKI for a
|
||||
simpler synchronous path.
|
||||
|
||||
## Challenge methods
|
||||
|
||||
### HTTP-01 (default)
|
||||
|
||||
A built-in temporary HTTP server starts on demand during
|
||||
certificate issuance. The domain being validated must resolve to
|
||||
the machine running the connector, and the configured HTTP port
|
||||
must be reachable from the internet.
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme-staging-v02.api.letsencrypt.org/directory",
|
||||
"email": "admin@example.com",
|
||||
"http_port": 80
|
||||
}
|
||||
```
|
||||
|
||||
### DNS-01 (for wildcards)
|
||||
|
||||
Creates DNS TXT records via user-provided scripts. Required for
|
||||
wildcard certificates (`*.example.com`) and hosts that can't serve
|
||||
HTTP on port 80. The connector invokes external scripts to create
|
||||
and clean up `_acme-challenge` TXT records, making it compatible
|
||||
with any DNS provider (Cloudflare, Route53, Azure DNS, etc.).
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme-v02.api.letsencrypt.org/directory",
|
||||
"email": "admin@example.com",
|
||||
"challenge_type": "dns-01",
|
||||
"dns_present_script": "/etc/certctl/dns/create-record.sh",
|
||||
"dns_cleanup_script": "/etc/certctl/dns/delete-record.sh",
|
||||
"dns_propagation_wait": 30
|
||||
}
|
||||
```
|
||||
|
||||
DNS hook scripts receive these environment variables:
|
||||
|
||||
- `CERTCTL_DNS_DOMAIN` — domain being validated
|
||||
- `CERTCTL_DNS_FQDN` — full record name (`_acme-challenge.<domain>`
|
||||
for dns-01, `_validation-persist.<domain>` for dns-persist-01)
|
||||
- `CERTCTL_DNS_VALUE` — TXT record value
|
||||
- `CERTCTL_DNS_TOKEN` — ACME challenge token
|
||||
|
||||
The present script must create the TXT record and exit 0; the
|
||||
cleanup script removes it (dns-01 only).
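
A present hook can be quite small. The sketch below reads the variables listed above and renders the TXT record in zone-file syntax; the actual provider call is left as a placeholder because it differs per DNS provider. The function names and the TTL of 60 seconds are assumptions made for illustration.

```bash
# Render the TXT record the hook must publish, in zone-file syntax.
render_txt_record() {
  : "${CERTCTL_DNS_FQDN:?CERTCTL_DNS_FQDN is not set}"
  : "${CERTCTL_DNS_VALUE:?CERTCTL_DNS_VALUE is not set}"
  local ttl=${1:-60}
  # Strip a trailing dot if present, then add exactly one.
  printf '%s. %s IN TXT "%s"\n' "${CERTCTL_DNS_FQDN%.}" "$ttl" "$CERTCTL_DNS_VALUE"
}

present() {
  local record
  record=$(render_txt_record 60) || return 1
  # Placeholder: replace with your DNS provider's API or CLI call.
  # A non-zero exit tells the connector that the record was not created.
  echo "would publish: $record" >&2
}
```

A cleanup hook would follow the same shape, deleting the record named by `CERTCTL_DNS_FQDN` instead of creating it.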
|
||||
|
||||
### DNS-PERSIST-01 (standing record)
|
||||
|
||||
Creates a one-time persistent TXT record at
|
||||
`_validation-persist.<domain>` containing the CA's issuer domain
|
||||
and your ACME account URI. Once set, this record authorizes
|
||||
unlimited future certificate issuances without per-renewal DNS
|
||||
updates. Based on
|
||||
[draft-ietf-acme-dns-persist](https://datatracker.ietf.org/doc/draft-ietf-acme-dns-persist/)
|
||||
and CA/Browser Forum ballot SC-088v3.
|
||||
|
||||
If the CA doesn't offer dns-persist-01 yet, the connector falls
|
||||
back to dns-01 automatically.
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme-v02.api.letsencrypt.org/directory",
|
||||
"email": "admin@example.com",
|
||||
"challenge_type": "dns-persist-01",
|
||||
"dns_present_script": "/etc/certctl/dns/create-record.sh",
|
||||
"dns_persist_issuer_domain": "letsencrypt.org",
|
||||
"dns_propagation_wait": 30
|
||||
}
|
||||
```
|
||||
|
||||
The present script creates a TXT record at
|
||||
`_validation-persist.<domain>` with the value
|
||||
`letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/<your-id>`.
|
||||
This record is permanent — no cleanup script is needed.
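
The record value is a plain string, so it can be composed and checked by hand. The helper below is a hypothetical sketch that follows the format shown above; the account ID in the usage comment is a made-up placeholder.

```bash
# Compose the dns-persist-01 TXT value from the issuer domain and account URI.
persist_txt_value() {
  local issuer_domain=$1 account_uri=$2
  printf '%s; accounturi=%s\n' "$issuer_domain" "$account_uri"
}

# Usage (placeholder account ID), followed by a lookup of the published record:
#   persist_txt_value letsencrypt.org https://acme-v02.api.letsencrypt.org/acme/acct/12345
#   dig +short TXT _validation-persist.example.com
```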
|
||||
|
||||
## ACME Renewal Information (ARI, RFC 9773)
|
||||
|
||||
Instead of using fixed renewal thresholds (e.g. renew 30 days
|
||||
before expiry), certctl can ask the CA when it should renew.
|
||||
Enable with `CERTCTL_ACME_ARI_ENABLED=true`.
|
||||
|
||||
The ARI protocol lets the CA specify a `suggestedWindow` (start
|
||||
and end times) for when you should renew — useful for distributing
|
||||
load during maintenance windows or coordinating mass-revocation
|
||||
scenarios. Cert ID is computed as `base64url(SHA-256(DER cert))`.
|
||||
|
||||
If the CA doesn't support ARI (404 response), certctl
|
||||
automatically falls back to threshold-based renewal with no
|
||||
operator intervention required.
|
||||
|
||||
## External Account Binding (EAB)
|
||||
|
||||
ZeroSSL, Google Trust Services, and SSL.com require EAB for ACME
|
||||
account registration. For most CAs, get your EAB credentials from
|
||||
the CA's dashboard and provide them via `eab_kid` and `eab_hmac`.
|
||||
The HMAC key must be base64url-encoded (no padding). CAs that
|
||||
don't require EAB (Let's Encrypt, Buypass) ignore these fields.
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme.zerossl.com/v2/DV90",
|
||||
"email": "admin@example.com",
|
||||
"eab_kid": "your-zerossl-eab-kid",
|
||||
"eab_hmac": "your-zerossl-eab-hmac-base64url"
|
||||
}
|
||||
```
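
Some dashboards hand out the HMAC key in standard base64 rather than base64url. The sketch below converts one to the other and checks the result; both helper names are hypothetical, and whether a given CA needs the conversion is something to confirm against its own documentation.

```bash
# Convert standard base64 (stdin) to unpadded base64url (stdout).
to_base64url() {
  tr '+/' '-_' | tr -d '=\n'
}

# Succeed when the argument contains only base64url characters (no padding).
is_base64url() {
  [[ $1 =~ ^[A-Za-z0-9_-]+$ ]]
}

# Usage:
#   eab_hmac=$(printf '%s' "$key_from_dashboard" | to_base64url)
#   is_base64url "$eab_hmac" || echo "unexpected characters in EAB HMAC" >&2
```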
|
||||
|
||||
### ZeroSSL auto-EAB
|
||||
|
||||
When the directory URL points to ZeroSSL and no EAB credentials
|
||||
are provided, certctl automatically fetches them from ZeroSSL's
|
||||
public API (`api.zerossl.com/acme/eab-credentials-email`) using
|
||||
your configured email address. No dashboard visit required — just
|
||||
set the directory URL and email. Same approach used by Caddy and
|
||||
acme.sh.
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme.zerossl.com/v2/DV90",
|
||||
"email": "admin@example.com"
|
||||
}
|
||||
```
|
||||
|
||||
## Certificate profiles (Let's Encrypt, GA January 2026)
|
||||
|
||||
Let's Encrypt supports ACME certificate profile selection. Set
|
||||
`CERTCTL_ACME_PROFILE=shortlived` to request 6-day certificates —
|
||||
ideal for ephemeral workloads where short validity substitutes for
|
||||
revocation. The `tlsserver` profile produces standard TLS
|
||||
certificates. When the profile field is empty (default), the CA
|
||||
uses its default profile.
|
||||
|
||||
## Environment variables
|
||||
|
||||
- `CERTCTL_ACME_DIRECTORY_URL` — ACME directory URL
|
||||
- `CERTCTL_ACME_EMAIL` — Contact email for account registration
|
||||
- `CERTCTL_ACME_EAB_KID` — External Account Binding Key ID
|
||||
- `CERTCTL_ACME_EAB_HMAC` — External Account Binding HMAC key
|
||||
(base64url-encoded)
|
||||
- `CERTCTL_ACME_CHALLENGE_TYPE` — `http-01` (default), `dns-01`,
|
||||
or `dns-persist-01`
|
||||
- `CERTCTL_ACME_DNS_PRESENT_SCRIPT` — Path to DNS record creation
|
||||
script
|
||||
- `CERTCTL_ACME_DNS_CLEANUP_SCRIPT` — Path to DNS record cleanup
|
||||
script (dns-01 only)
|
||||
- `CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN` — CA issuer domain for
|
||||
persistent record (dns-persist-01 only)
|
||||
- `CERTCTL_ACME_PROFILE` — Certificate profile for the newOrder
|
||||
request
|
||||
|
||||
## Revocation by serial number (Top-10 fix #7)
|
||||
|
||||
RFC 8555 §7.6 requires the certificate DER bytes (not just the
|
||||
serial) on the revoke wire — but a CLM platform's job is to
|
||||
abstract over that limitation. Operators routinely have only the
|
||||
serial in hand: the original PEM was lost, the private key was
|
||||
rotated, the operator clicked "revoke" in the GUI based on a row
|
||||
in the certs list.
|
||||
|
||||
certctl's ACME
|
||||
`RevokeCertificate(ctx, RevocationRequest{Serial: ...})` looks the
|
||||
serial up in the local cert store
|
||||
(`certificate_versions.pem_chain`), decodes the leaf-cert PEM into
|
||||
DER, and calls the ACME revoke endpoint with
|
||||
`(accountKey, der, reasonCode)` — RFC 8555 §7.6 case 1,
|
||||
"revocation request signed with account key". This works because
|
||||
the same account key issued the cert, so authority is intrinsic.
|
||||
|
||||
The cert version must exist in the local store: this means the
|
||||
cert was issued through certctl, not imported. If
|
||||
`GetVersionBySerial` returns `sql.ErrNoRows`, the connector
|
||||
returns an actionable error pointing at the local-store
|
||||
requirement. Revoke-by-serial is therefore only available for
|
||||
ACME certs that certctl issued.
|
||||
|
||||
Reason codes follow RFC 5280 §5.3.1: nil reason maps to
|
||||
`unspecified` (0), and the connector accepts the canonical
|
||||
camelCase form (`keyCompromise`, `cACompromise`,
|
||||
`affiliationChanged`, `superseded`, `cessationOfOperation`,
|
||||
`certificateHold`, `removeFromCRL`, `privilegeWithdrawn`,
|
||||
`aACompromise`) plus underscore_lower and ALL_CAPS_UNDERSCORE
|
||||
variants. An unknown reason returns an error rather than silently
|
||||
demoting to `unspecified` — operators rely on the reason for
|
||||
audit reporting.
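
The mapping described above can be sketched as a small lookup. This is an illustration of the rule, not the connector's Go code: spelling variants are folded by lowercasing and dropping underscores, an empty reason maps to `unspecified`, and an unknown reason is an error. The numeric values are the CRLReason codes from RFC 5280 §5.3.1; the function name `reason_code` is hypothetical.

```bash
# Map an RFC 5280 revocation reason name to its numeric code.
reason_code() {
  local key
  # Fold camelCase, underscore_lower and ALL_CAPS_UNDERSCORE to one spelling.
  key=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]' | tr -d '_')
  case "$key" in
    ""|unspecified)       echo 0 ;;
    keycompromise)        echo 1 ;;
    cacompromise)         echo 2 ;;
    affiliationchanged)   echo 3 ;;
    superseded)           echo 4 ;;
    cessationofoperation) echo 5 ;;
    certificatehold)      echo 6 ;;
    removefromcrl)        echo 8 ;;  # 7 is unassigned in RFC 5280
    privilegewithdrawn)   echo 9 ;;
    aacompromise)         echo 10 ;;
    *) echo "unknown revocation reason: $1" >&2; return 1 ;;
  esac
}
```

Returning an error for unknown input, rather than defaulting to 0, mirrors the behaviour the paragraph above describes.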
|
||||
|
||||
## Related docs
|
||||
|
||||
- [ACME server](../protocols/acme-server.md) — certctl *as* an ACME server (the inverse direction)
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [migration/acme-from-cert-manager.md](../../migration/acme-from-cert-manager.md) — point cert-manager at certctl's ACME server
|
||||
- [migration/acme-from-traefik.md](../../migration/acme-from-traefik.md) — point Traefik at certctl's ACME server
|
||||
@@ -0,0 +1,112 @@
|
||||
# Active Directory Certificate Services (ADCS) Integration — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for integrating certctl with Microsoft
|
||||
> ADCS as the enterprise root. For the connector-development context
|
||||
> (interface contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
ADCS integration is **not** a separate connector. certctl integrates
|
||||
with ADCS via the **sub-CA mode** of the Local CA issuer: certctl
|
||||
operates as a subordinate CA whose signing certificate was issued by
|
||||
ADCS, so all certctl-issued certificates chain back to the enterprise
|
||||
ADCS root.
|
||||
|
||||
This is the canonical pattern for Windows-shop deployments where
|
||||
ADCS is already the root of trust and operators want certctl to
|
||||
handle automation (lifecycle, renewal, deployment, alerts) without
|
||||
ADCS having to support a non-Microsoft REST API surface.
|
||||
|
||||
## When to use this integration
|
||||
|
||||
Use ADCS sub-CA mode when:
|
||||
|
||||
- ADCS is your enterprise root and you don't want to introduce a
|
||||
parallel root of trust.
|
||||
- You want all certctl-issued certificates to validate against the
|
||||
ADCS chain that's already in your Windows trust stores, mobile
|
||||
device profiles, and load-balancer configurations.
|
||||
- You need certctl's automation surface (ACME, SCEP, EST, profile
|
||||
policy, scheduler, deployment connectors) but want ADCS to remain
|
||||
the signing authority for the root.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You want certctl to issue from its own root of trust — use the
|
||||
Local CA issuer in self-signed mode.
|
||||
- ADCS is being decommissioned or replaced — the migration path
|
||||
from ADCS to Vault PKI / step-ca / Local CA needs its own
|
||||
  rollout plan; that's not what this integration covers.
|
||||
|
||||
## How sub-CA mode works
|
||||
|
||||
The Local CA issuer loads a pre-signed CA certificate and key from
|
||||
disk:
|
||||
|
||||
- `CERTCTL_CA_CERT_PATH` — path to the certctl signing cert PEM
|
||||
(the one ADCS issued).
|
||||
- `CERTCTL_CA_KEY_PATH` — path to the matching private key PEM.
|
||||
|
||||
Every leaf certctl issues is signed with this key, and the chain
|
||||
returned to clients includes both the certctl signing cert and the
|
||||
ADCS root (so verifying clients see a complete chain to the
|
||||
enterprise root).
|
||||
|
||||
The signing certificate certctl uses is just a normal CA cert with
|
||||
`Basic Constraints: CA=true` and an appropriate path-length
|
||||
constraint. ADCS issues this certificate using its standard
|
||||
"Subordinate Certification Authority" template; the operator just
|
||||
takes the resulting cert + key and points certctl at them.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Provisioning the certctl sub-CA
|
||||
|
||||
1. Generate a new keypair for certctl on the host that will run it
|
||||
(or in the HSM / KMS the operator wants to delegate signing to,
|
||||
via the `internal/crypto/signer/` driver interface when alternate
|
||||
drivers are configured).
|
||||
2. Build a CSR with `Basic Constraints: CA=true`, the operator's
|
||||
chosen path-length constraint, and key usages including
|
||||
`keyCertSign` and `cRLSign`.
|
||||
3. Submit the CSR to ADCS using the Subordinate Certification
|
||||
Authority template (or a custom template that grants those key
|
||||
usages).
|
||||
4. Place the signed certctl-cert and the matching key at
|
||||
`CERTCTL_CA_CERT_PATH` / `CERTCTL_CA_KEY_PATH`.
|
||||
5. Restart certctl-server (or rebuild the issuer via the API).
|
||||
Subsequent issuance chains to the ADCS root.
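
Steps 1 and 2 can be sketched with OpenSSL when the key lives on disk. This is one possible approach under stated assumptions: OpenSSL 1.1.1 or later (for `-addext`), an RSA 3072-bit key, and an unencrypted key file protected only by file permissions. The function name, subject, and file names are hypothetical; HSM or KMS setups would use their own tooling instead.

```bash
# Generate a keypair and a CSR that requests CA basic constraints and key usages.
make_subca_csr() {
  local key=$1 csr=$2 subject=$3 pathlen=${4:-0}
  (
    umask 077  # keep the private key readable by the owner only
    openssl req -new -newkey rsa:3072 -nodes \
      -keyout "$key" -out "$csr" -subj "$subject" \
      -addext "basicConstraints=critical,CA:TRUE,pathlen:${pathlen}" \
      -addext "keyUsage=critical,keyCertSign,cRLSign"
  )
}

# Usage (hypothetical subject and paths):
#   make_subca_csr certctl-ca.key certctl-ca.csr "/CN=certctl Issuing CA" 0
#   openssl req -in certctl-ca.csr -noout -text   # inspect before submitting
```

On the ADCS side, the CSR is typically submitted with `certreq -submit -attrib "CertificateTemplate:SubCA" certctl-ca.csr certctl-ca.crt`; the template name `SubCA` is an assumption and may differ if your CA uses a custom template. ADCS decides which requested extensions it honours, so check the issued certificate rather than the CSR.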
|
||||
|
||||
### Rotating the sub-CA cert
|
||||
|
||||
When the certctl sub-CA cert is approaching expiry:
|
||||
|
||||
1. Generate a new keypair (re-keying is recommended at sub-CA
|
||||
rotation time).
|
||||
2. CSR + ADCS signing cycle as above.
|
||||
3. Stage the new cert and key at fresh on-disk paths and follow the
|
||||
[intermediate-CA hierarchy
|
||||
runbook](../intermediate-ca-hierarchy.md) for the cutover (rotate
|
||||
`CERTCTL_CA_CERT_PATH` / `CERTCTL_CA_KEY_PATH` to the new files
|
||||
when ready). The
|
||||
key concern is overlap: both the old and new sub-CA certs must
|
||||
chain to the ADCS root during the rollover so existing leaves
|
||||
keep validating.
|
||||
|
||||
### Revocation chain
|
||||
|
||||
CRL and OCSP for ADCS-rooted leaves are handled by certctl's CRL
|
||||
distribution point and OCSP responder
|
||||
([crl-ocsp.md](../protocols/crl-ocsp.md)). The ADCS root publishes
|
||||
its own CRL covering the certctl sub-CA cert; relying parties walk
|
||||
both CDP entries to determine the full revocation status.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Local CA issuer](index.md#built-in-local-ca) — the connector this integration uses
|
||||
- [Intermediate CA hierarchy](../intermediate-ca-hierarchy.md) — how certctl manages multi-level CA trees, including ADCS-rooted setups
|
||||
- [CRL and OCSP](../protocols/crl-ocsp.md) — how relying parties validate ADCS-rooted leaves
|
||||
- [Architecture](../architecture.md) — `internal/crypto/signer/` driver interface for HSM / KMS / cloud-KMS alternatives to file-on-disk for the certctl sub-CA private key
|
||||
@@ -1,6 +1,11 @@
|
||||
# Apache httpd Connector — Operator Deep-Dive
|
||||
|
||||
> Per Phase 14 of the deploy-hardening II master bundle.
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. For the
|
||||
> connector-development context (interface contract, registry, atomic
|
||||
> deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
@@ -73,7 +78,7 @@ per-file ownership is preserved per Bundle I Phase 5.
|
||||
`TestVendorEdge_Apache_ReloadVsRestart_PreservesConnections_E2E`
|
||||
|
||||
In-flight TLS sessions survive `apachectl graceful` worker
|
||||
swap. Documented in `docs/deployment-atomicity.md`.
|
||||
swap. Documented in `docs/reference/deployment-model.md`.
|
||||
|
||||
### SNI server_name binding
|
||||
|
||||
@@ -97,5 +102,5 @@ supplied ordering across rotation.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
||||
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
||||
- [Atomic deploy + post-verify + rollback](../deployment-model.md)
|
||||
- [Vendor compatibility matrix](../vendor-matrix.md)
|
||||
@@ -0,0 +1,165 @@
|
||||
# AWS ACM Private CA Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the AWS Certificate Manager
|
||||
> Private Certificate Authority (ACM PCA) issuer connector. For the
|
||||
> connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
AWS ACM Private CA is a managed private CA on AWS. The connector
|
||||
calls `IssueCertificate` (which is asynchronous at the ACM PCA API
|
||||
level), then runs the SDK's `NewCertificateIssuedWaiter` until the
|
||||
cert reaches `CERTIFICATE_ISSUED` state, then `GetCertificate` to
|
||||
retrieve the PEM. Default waiter timeout is 5 minutes; tune by
|
||||
editing `defaultWaiterTimeout` in
|
||||
`internal/connector/issuer/awsacmpca/awsacmpca.go`.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/awsacmpca/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the AWS ACM PCA connector when:
|
||||
|
||||
- Your workloads are AWS-native and you want the CA to live inside
|
||||
your AWS account (for blast-radius, IAM, and audit reasons).
|
||||
- You need ACM PCA's CRL distribution and OCSP responder to serve
|
||||
status to relying parties without certctl being in the OCSP path.
|
||||
- You want IAM-based access control (no API keys to rotate) for
|
||||
certctl's signing path.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're not on AWS — Google CAS or Azure Key Vault are the cloud-
|
||||
native equivalents on those platforms.
|
||||
- You need public-trust certificates — ACM PCA is private only.
|
||||
- You don't already pay for ACM PCA (it has a non-trivial monthly
|
||||
cost). Vault, step-ca, or the Local CA issuer are free
|
||||
self-hosted alternatives.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `CERTCTL_AWS_PCA_REGION` | Yes | — | AWS region (e.g. `us-east-1`) |
|
||||
| `CERTCTL_AWS_PCA_CA_ARN` | Yes | — | ARN of the ACM Private CA |
|
||||
| `CERTCTL_AWS_PCA_SIGNING_ALGORITHM` | No | `SHA256WITHRSA` | Signing algorithm |
|
||||
| `CERTCTL_AWS_PCA_VALIDITY_DAYS` | No | `365` | Certificate validity in days |
|
||||
| `CERTCTL_AWS_PCA_TEMPLATE_ARN` | No | — | Optional certificate template ARN |
|
||||
|
||||
Supported signing algorithms: `SHA256WITHRSA`, `SHA384WITHRSA`,
|
||||
`SHA512WITHRSA`, `SHA256WITHECDSA`, `SHA384WITHECDSA`,
|
||||
`SHA512WITHECDSA`.
|
||||
|
||||
## Authentication
|
||||
|
||||
Standard AWS credential chain via
|
||||
`aws-sdk-go-v2/config.LoadDefaultConfig()`. Resolves credentials in
|
||||
this order:
|
||||
|
||||
1. Environment variables (`AWS_ACCESS_KEY_ID`,
|
||||
`AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`).
|
||||
2. Shared config files (`~/.aws/config`, `~/.aws/credentials`,
|
||||
profile via `AWS_PROFILE`).
|
||||
3. IAM Roles for Service Accounts (IRSA) on EKS.
|
||||
4. ECS task roles.
|
||||
5. EC2 instance profiles.
|
||||
6. SSO.
|
||||
|
||||
certctl never stores AWS credentials directly — set them in the
|
||||
certctl process's environment or via the IAM role attached to the
|
||||
host.
|
||||
|
||||
## Minimal IAM policy
|
||||
|
||||
The IAM principal that certctl authenticates as needs the following
|
||||
actions against the CA's ARN:
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"acm-pca:IssueCertificate",
|
||||
"acm-pca:GetCertificate",
|
||||
"acm-pca:RevokeCertificate",
|
||||
"acm-pca:GetCertificateAuthorityCertificate"
|
||||
],
|
||||
"Resource": "arn:aws:acm-pca:us-east-1:123456789012:certificate-authority/12345678-1234-1234-1234-123456789012"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Replace the `Resource` ARN with your own CA ARN. If you use a
|
||||
`TemplateArn` (subordinate-CA template), the policy needs no
|
||||
additional permissions — `IssueCertificate` covers it.
|
||||
|
||||
## Worked example: add the issuer via API
|
||||
|
||||
```bash
|
||||
curl -k -X POST https://localhost:8443/api/v1/issuers \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"id": "iss-aws-prod",
|
||||
"name": "AWS ACM PCA (prod)",
|
||||
"type": "AWSACMPCA",
|
||||
"config": {
|
||||
"region": "us-east-1",
|
||||
"ca_arn": "arn:aws:acm-pca:us-east-1:123456789012:certificate-authority/12345678-1234-1234-1234-123456789012",
|
||||
"signing_algorithm": "SHA256WITHRSA",
|
||||
"validity_days": 90
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
The certctl server process must have AWS credentials available
|
||||
before the issuer is created (or before any subsequent issuance
|
||||
call). For a local dev run with shared-config creds:
|
||||
`export AWS_PROFILE=my-profile` before `docker compose up`. For an
|
||||
EKS deployment: attach an IRSA-bound IAM role to the certctl pod's
|
||||
service account.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### `AccessDeniedException: User ... is not authorized to perform: acm-pca:IssueCertificate`
|
||||
|
||||
The IAM principal certctl is using lacks the required actions.
|
||||
Apply the IAM policy above (scoped to your CA ARN) to the
|
||||
role/user. The principal can be inspected with
|
||||
`aws sts get-caller-identity` from the certctl host.
|
||||
|
||||
### `ResourceNotFoundException: Could not find Certificate Authority`
|
||||
|
||||
The `CAArn` doesn't match any CA in the configured region. Common
|
||||
causes: region mismatch (CA is in `us-west-2`, certctl region is
|
||||
set to `us-east-1`), CA was deleted, ARN typo. Verify with
|
||||
`aws acm-pca describe-certificate-authority --certificate-authority-arn <arn> --region <region>`.
|
||||
|
||||
### `acmpca waiter (waiting for issuance): exceeded max wait time`
|
||||
|
||||
The cert was submitted but didn't reach `CERTIFICATE_ISSUED` state
|
||||
within 5 minutes. Check the CA's CloudWatch metrics for backlog;
|
||||
check the CA's audit reports for any policy violations on the
|
||||
request. If the wait is consistently slow, edit
|
||||
`defaultWaiterTimeout` in
|
||||
`internal/connector/issuer/awsacmpca/awsacmpca.go` and rebuild.
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by AWS ACM PCA directly. certctl records
|
||||
revocations locally and notifies AWS via the `RevokeCertificate`
|
||||
API with RFC 5280 reason mapping (e.g. `keyCompromise` →
|
||||
`KEY_COMPROMISE`). AWS ACM PCA's CRL distribution point and OCSP
|
||||
responder serve the resulting status to verifying clients —
|
||||
certctl is **not** in the OCSP path for this connector.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — bounded-polling primitive (ACM PCA uses the SDK waiter, not certctl's polling, but the same operator concerns apply)
|
||||
- [Disaster recovery runbook](../../operator/runbooks/disaster-recovery.md) — what happens to ACM PCA-issued certs if the CA is deleted
|
||||
@@ -0,0 +1,208 @@
|
||||
# AWS Certificate Manager (ACM) Target Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the AWS Certificate Manager
|
||||
> (ACM) target connector. For the connector-development context
|
||||
> (interface contract, registry, atomic deploy primitive shared
|
||||
> across all targets), see the [connector index](index.md).
|
||||
>
|
||||
> **Note:** this is the **target** connector that deploys
|
||||
> certificates *into* ACM for ALB / CloudFront / API Gateway / App
|
||||
> Runner consumption. The **issuer** connector that pulls certs
|
||||
> *from* AWS ACM Private CA is documented separately at
|
||||
> [aws-acm-pca.md](aws-acm-pca.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The AWS ACM target connector deploys certificates into AWS
Certificate Manager, the public AWS service that ALB /
CloudFront / API Gateway / App Runner consume by ARN. It answers
the "we terminate TLS at AWS, how do we get certctl-issued certs
to ALB?" question for cloud-first deployments. Rank 5 of the
2026-05-03 Infisical deep-research deliverable.
|
||||
|
||||
Implementation lives at `internal/connector/target/awsacm/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the AWS ACM target connector when:
|
||||
|
||||
- TLS terminates at AWS-managed edges (ALB, CloudFront, API
|
||||
Gateway, App Runner) and those services consume certs by ACM
|
||||
ARN.
|
||||
- You want certctl to drive the rotation while Terraform /
|
||||
CloudFormation handles the ARN-to-resource attachment.
|
||||
- You need short-lived IAM credentials (IRSA, instance profiles)
|
||||
rather than long-lived access keys.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- The target is an EC2 instance running NGINX / HAProxy / Apache
|
||||
directly — those connectors are simpler than the ACM round-trip.
|
||||
- You're using ACM Private CA for internal trust — that's the
|
||||
[aws-acm-pca.md](aws-acm-pca.md) issuer, a different connector.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "region": "us-east-1",
  "certificate_arn": "arn:aws:acm:us-east-1:123456789012:certificate/abcdef01-2345-6789-abcd-ef0123456789",
  "tags": {"env": "production", "app": "api-gateway"}
}
```
|
||||
|
||||
| Field | Default | Description |
|---|---|---|
| `region` | (required) | AWS region for the ACM endpoint (e.g. `us-east-1`). CloudFront-attached certs MUST live in `us-east-1`; ALB / API Gateway use the same region as the load balancer. |
| `certificate_arn` | (none) | ARN of an existing ACM certificate to rotate in place. Empty on first deploy: the adapter creates a new ACM cert via `ImportCertificate` and the deployment record's Metadata captures the resulting ARN. Operators can also pre-create the ARN out-of-band (Terraform, CloudFormation) and pin it here. |
| `tags` | (none) | Tags applied to the ACM cert at first import + re-applied via `AddTagsToCertificate` on every subsequent import (ACM strips tags on re-import). The reserved keys `certctl-managed-by` and `certctl-certificate-id` are set automatically and cannot be overridden. |
|
||||
|
||||
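The field rules in the table can be sketched as a small decode-and-merge step. This is a minimal sketch under stated assumptions: the `Config` struct layout, `parseConfig`, and `effectiveTags` are hypothetical names, and "cannot be overridden" is modelled here as the reserved keys silently winning over operator-supplied values (the real connector might reject them instead).

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config mirrors the three documented fields. Only the JSON keys come
// from the table; the struct itself is an assumption of this sketch.
type Config struct {
	Region         string            `json:"region"`
	CertificateArn string            `json:"certificate_arn"`
	Tags           map[string]string `json:"tags"`
}

// parseConfig decodes the JSON and enforces the one required field.
func parseConfig(raw []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	if cfg.Region == "" {
		return Config{}, errors.New("region is required")
	}
	return cfg, nil
}

// effectiveTags merges operator tags with the reserved provenance tags.
// The reserved keys are written last so they always win.
func effectiveTags(cfg Config, certificateID string) map[string]string {
	out := make(map[string]string, len(cfg.Tags)+2)
	for k, v := range cfg.Tags {
		out[k] = v
	}
	out["certctl-managed-by"] = "certctl"
	out["certctl-certificate-id"] = certificateID
	return out
}

func main() {
	raw := []byte(`{"region":"us-east-1","tags":{"env":"production"}}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(effectiveTags(cfg, "mc-api-prod"))
}
```

An empty `certificate_arn` is deliberately valid here, since the table describes it as empty on first deploy.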
## IAM policy (minimum permissions)
|
||||
|
||||
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "acm:ImportCertificate",
      "acm:GetCertificate",
      "acm:DescribeCertificate",
      "acm:ListCertificates",
      "acm:AddTagsToCertificate"
    ],
    "Resource": "arn:aws:acm:*:*:certificate/*"
  }]
}
```
|
||||
|
||||
## Auth recipes
|
||||
|
||||
- **IRSA (IAM Roles for Service Accounts) — recommended for K8s
|
||||
deploys.** Annotate the agent's ServiceAccount with
|
||||
`eks.amazonaws.com/role-arn=arn:aws:iam::<account>:role/certctl-acm-deployer`.
|
||||
The role's trust policy allows the cluster's OIDC provider;
|
||||
permission policy is the JSON above. Short-lived STS
|
||||
credentials are auto-rotated by EKS — no long-lived access
|
||||
keys.
|
||||
- **EC2 instance profile — recommended for VM-based agents.**
|
||||
Attach an instance profile referencing the same role. SDK's
|
||||
`LoadDefaultConfig` picks credentials up via the IMDS metadata
|
||||
service.
|
||||
- **AWS SSO / `aws configure sso` — recommended for operator
|
||||
workstations.** SDK reads `~/.aws/config` for the SSO profile
|
||||
and refreshes tokens via the existing CLI session.
|
||||
- **Long-lived access keys are NOT supported in connector
|
||||
Config** — the credential chain is configured at the SDK
|
||||
level, not the connector level. This is a procurement-
|
||||
readability decision: a security reviewer reading the
|
||||
`deployment_targets` table should never find an access key.
|
||||
|
||||
## Atomic-rollback contract
|
||||
|
||||
Every `DeployCertificate` snapshots the existing cert via
|
||||
`DescribeCertificate` + `GetCertificate` BEFORE calling
|
||||
`ImportCertificate` with the new bytes. After import, the
|
||||
connector re-fetches the cert metadata and compares serial
|
||||
numbers.
|
||||
|
||||
On serial-mismatch (post-verify failure), the connector calls
|
||||
`ImportCertificate` again with the snapshotted bytes to restore
|
||||
the previous cert. The rollback path emits a `WARN`-level slog
|
||||
entry; the rollback's own success or failure is exposed via
|
||||
`certctl_deploy_rollback_total{target_type="AWSACM",outcome="restored"|"also_failed"}`
|
||||
per the deploy-hardening I Phase 10 metric exposer.
|
||||
|
||||
Mirrors the Bundle 5+ pre-deploy-snapshot pattern shipped for
|
||||
IIS / WinCertStore / JavaKeystore.
|
||||
|
||||
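The snapshot, import, verify, and restore sequence above can be sketched against an abstract store. This is an illustration under stated assumptions, not the connector's implementation: `certStore`, `deploy`, `fakeStore`, and `demo` are hypothetical names, the fake treats the stored bytes as the "serial", and an error returned by the import call itself is assumed to need no rollback.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// certStore abstracts the ACM interactions the contract relies on.
// It is a hypothetical interface for this sketch.
type certStore interface {
	Snapshot(ctx context.Context) ([]byte, error) // current cert, nil if none
	Import(ctx context.Context, pem []byte) error
	Serial(ctx context.Context) (string, error) // serial of the live cert
}

// deploy imports newPEM, verifies the live serial, and restores the
// snapshot on mismatch. The outcome is "" on success, otherwise
// "restored" or "also_failed", matching the metric label values.
func deploy(ctx context.Context, s certStore, newPEM []byte, wantSerial string) (string, error) {
	prev, err := s.Snapshot(ctx)
	if err != nil {
		return "", fmt.Errorf("snapshot: %w", err)
	}
	if err := s.Import(ctx, newPEM); err != nil {
		return "", fmt.Errorf("import: %w", err)
	}
	got, err := s.Serial(ctx)
	if err == nil && got == wantSerial {
		return "", nil
	}
	if err == nil {
		err = fmt.Errorf("serial mismatch: got %q, want %q", got, wantSerial)
	}
	verifyErr := fmt.Errorf("post-deploy verify: %w", err)
	if prev == nil {
		return "", errors.Join(verifyErr, errors.New("no snapshot to restore"))
	}
	if rbErr := s.Import(ctx, prev); rbErr != nil {
		return "also_failed", errors.Join(verifyErr, fmt.Errorf("rollback: %w", rbErr))
	}
	return "restored", verifyErr
}

// fakeStore is an in-memory stand-in whose "serial" is the stored bytes.
type fakeStore struct {
	live       []byte
	mangleNext bool // when set, the next Import stores the wrong bytes
}

func (f *fakeStore) Snapshot(context.Context) ([]byte, error) { return f.live, nil }

func (f *fakeStore) Import(_ context.Context, pem []byte) error {
	if f.mangleNext {
		f.mangleNext = false
		f.live = []byte("unexpected")
		return nil
	}
	f.live = pem
	return nil
}

func (f *fakeStore) Serial(context.Context) (string, error) { return string(f.live), nil }

// demo runs one deploy against the fake and reports outcome and live cert.
func demo(mangle bool) (outcome, live string, err error) {
	s := &fakeStore{live: []byte("serial-1"), mangleNext: mangle}
	outcome, err = deploy(context.Background(), s, []byte("serial-2"), "serial-2")
	return outcome, string(s.live), err
}

func main() {
	fmt.Println(demo(false))
	fmt.Println(demo(true))
}
```

Returning the verify error even when the restore succeeds keeps the failed renewal visible to the caller, while the outcome string carries the label for the rollback metric.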
## ALB attachment recipe
|
||||
|
||||
certctl creates / rotates the ACM cert; the operator (or
|
||||
Terraform / CloudFormation) attaches it to the ALB listener
|
||||
separately. For Terraform-driven deployments, look up the ARN by
|
||||
tag:
|
||||
|
||||
```hcl
data "aws_acm_certificate" "certctl_managed" {
  domain      = "api.example.com"
  most_recent = true

  # Filter by certctl provenance tags so an unrelated ACM cert with
  # the same SAN doesn't get picked up.
  tags = {
    "certctl-managed-by"     = "certctl"
    "certctl-certificate-id" = "mc-api-prod"
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.api.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = data.aws_acm_certificate.certctl_managed.arn
  # ...
}
```
|
||||
|
||||
The ARN updates in place across renewals (ACM `ImportCertificate`
|
||||
is upsert-style when given an ARN), so the ALB listener's
|
||||
`certificate_arn` reference doesn't change. CloudFront / API
|
||||
Gateway distributions can reference the same ARN via their
|
||||
respective Terraform resources.
|
||||
|
||||
## Threat model carve-outs
|
||||
|
||||
- **Cert key bytes never written to disk on the agent.**
|
||||
`DeployCertificate` reads `request.KeyPEM` from memory and
|
||||
passes it to the SDK's `ImportCertificate` call. No temp file.
|
||||
No swap-out window.
|
||||
- **Provenance tags are mandatory.** The reserved
|
||||
`certctl-managed-by=certctl` + `certctl-certificate-id=<mc-id>`
|
||||
pair is set automatically on every import. Operators
|
||||
identifying a stray ACM cert in their account can match
|
||||
against `certctl-managed-by` to confirm it was certctl-issued
|
||||
(or NOT — the absence of the tag means a manual import).
|
||||
- **No long-lived AWS credentials in `Config`.** `Config`
|
||||
carries region + ARN + operator tags only. AWS auth is the
|
||||
SDK credential chain (IRSA / instance profile / SSO).
|
||||
- **`ListCertificates` IAM permission is required for the V2
|
||||
ARN-discovery dance to work.** Operators who pin
|
||||
`Config.CertificateArn` after the first deploy can drop this
|
||||
permission; the V2 fallback emits a warning and reverts to
|
||||
"always create new ARN" if the operator forgets to update
|
||||
`certificate_arn` post-first-deploy.
|
||||
|
||||
## Procurement checklist crib
|
||||
|
||||
Paste into security review:
|
||||
|
||||
- certctl uses short-lived IAM-role credentials via IRSA /
|
||||
instance profile, not long-lived access keys.
|
||||
- The cert key is held only in agent memory during the import
|
||||
call; never written to disk.
|
||||
- Every imported ACM cert is tagged with
|
||||
`certctl-managed-by=certctl` +
|
||||
`certctl-certificate-id=<mc-id>` for forensic traceability.
|
||||
- Failed imports trigger automatic rollback to the snapshotted
|
||||
previous cert; both outcomes are surfaced via Prometheus.
|
||||
- The minimum IAM policy is 5 actions on
|
||||
`arn:aws:acm:*:*:certificate/*`; CloudTrail captures every
|
||||
API call for audit.
|
||||
|
||||
## ValidateOnly contract
|
||||
|
||||
ACM has no dry-run API for `ImportCertificate`; `ValidateOnly`
|
||||
returns `target.ErrValidateOnlyNotSupported` per the deploy-
|
||||
hardening I Phase 3 sentinel contract. Operators preview deploys
|
||||
via `ValidateConfig` + `aws acm describe-certificate
|
||||
--certificate-arn <arn>` against the current ARN.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [Azure Key Vault](azure-kv.md) — Azure equivalent target
|
||||
- [AWS ACM Private CA issuer](aws-acm-pca.md) — the *issuer* counterpart (same vendor, opposite direction)
|
||||
- [Cloud targets runbook](../../operator/runbooks/cloud-targets.md) — operator playbook covering both AWS ACM and Azure KV
|
||||
@@ -0,0 +1,195 @@
|
||||
# Azure Key Vault Target Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Azure Key Vault target
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, atomic deploy primitive shared across all
|
||||
> targets), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Azure Key Vault target connector deploys certificates into
|
||||
Azure Key Vault — the Azure-managed cert/secret store that
|
||||
Application Gateway / Front Door / App Service / Container Apps
|
||||
consume by KID URI. Rank 5 (Azure half) of the 2026-05-03
|
||||
Infisical deep-research deliverable.
|
||||
|
||||
Implementation lives at `internal/connector/target/azurekv/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Azure Key Vault target connector when:
|
||||
|
||||
- TLS terminates at Azure-managed edges (Application Gateway,
|
||||
Front Door, App Service, Container Apps) and those services
|
||||
consume certs by Key Vault KID URI.
|
||||
- You need short-lived Azure credentials (managed identity,
|
||||
workload identity) rather than long-lived service-principal
|
||||
secrets.
|
||||
- You need cross-region or cross-cloud-environment Key Vault
|
||||
endpoints (US-Gov `.vault.usgovcloudapi.net`, China
|
||||
`.vault.azure.cn`).
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- The target is an Azure VM running NGINX / IIS / HAProxy
|
||||
directly — those connectors are simpler.
|
||||
- The cert is for an internal Azure service that doesn't read
|
||||
from Key Vault (e.g. a custom .NET app reading PEM from disk).
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "vault_url": "https://my-vault.vault.azure.net",
  "certificate_name": "api-prod",
  "tags": {"env": "production", "app": "api-gateway"},
  "credential_mode": "managed_identity"
}
```
|
||||
|
||||
| Field | Default | Description |
|---|---|---|
| `vault_url` | (required) | Key Vault DNS endpoint (`https://<vault-name>.vault.azure.net`). For US-Gov: `.vault.usgovcloudapi.net`; for China: `.vault.azure.cn`. |
| `certificate_name` | (required) | Cert object name in the vault (1-127 chars, alphanumeric + hyphens). Versions are auto-generated per import. |
| `tags` | (none) | Tags applied at every import (Key Vault carries tags forward across versions, unlike ACM). Reserved keys `certctl-managed-by` + `certctl-certificate-id` are set automatically. |
| `credential_mode` | `default` | One of `default` / `managed_identity` / `client_secret` / `workload_identity`. See "Auth recipes" below. |
|
||||
|
||||
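The constraints in the table can be checked before any Azure call is made. The following is a minimal sketch, assuming a hypothetical `Config` struct and `validate` method; requiring an `https` scheme on `vault_url` is also an assumption inferred from the documented endpoint shape.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
)

// Config mirrors the documented fields. Only the JSON keys come from
// the table; the struct itself is an assumption of this sketch.
type Config struct {
	VaultURL        string            `json:"vault_url"`
	CertificateName string            `json:"certificate_name"`
	Tags            map[string]string `json:"tags"`
	CredentialMode  string            `json:"credential_mode"`
}

// certNameRE encodes "1-127 chars, alphanumeric + hyphens".
var certNameRE = regexp.MustCompile(`^[0-9A-Za-z-]{1,127}$`)

// validate checks the required fields and applies the documented
// default for credential_mode.
func (c *Config) validate() error {
	u, err := url.Parse(c.VaultURL)
	if err != nil || u.Scheme != "https" || u.Host == "" {
		return fmt.Errorf("vault_url %q must be an https URL", c.VaultURL)
	}
	if !certNameRE.MatchString(c.CertificateName) {
		return errors.New("certificate_name must be 1-127 alphanumeric or hyphen characters")
	}
	if c.CredentialMode == "" {
		c.CredentialMode = "default"
	}
	switch c.CredentialMode {
	case "default", "managed_identity", "client_secret", "workload_identity":
		return nil
	default:
		return fmt.Errorf("unknown credential_mode %q", c.CredentialMode)
	}
}

func main() {
	cfg := Config{VaultURL: "https://my-vault.vault.azure.net", CertificateName: "api-prod"}
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("credential_mode:", cfg.CredentialMode)
}
```

Checking the host suffix is left out on purpose, because sovereign-cloud endpoints such as `.vault.usgovcloudapi.net` and `.vault.azure.cn` must remain valid.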
## RBAC role (minimum permissions)
|
||||
|
||||
The off-the-shelf builtin role **Key Vault Certificates Officer**
|
||||
covers everything. For minimum-permission deploys, use a custom
|
||||
role with these data-plane operations on the vault scope
|
||||
(`/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>`):
|
||||
|
||||
```
Microsoft.KeyVault/vaults/certificates/import/action
Microsoft.KeyVault/vaults/certificates/read
Microsoft.KeyVault/vaults/certificates/listversions/read
```
|
||||
|
||||
## Auth recipes
|
||||
|
||||
- **AKS workload identity (`credential_mode: workload_identity`)
|
||||
— recommended for AKS deploys.** Annotate the agent's
|
||||
ServiceAccount with
|
||||
`azure.workload.identity/client-id=<app-id>`. The AKS
|
||||
cluster's OIDC issuer + the federated credential on the app
|
||||
registration handle token exchange; no long-lived secrets.
|
||||
- **Managed identity (`credential_mode: managed_identity`) —
|
||||
recommended for VM / App Service deploys.** Assign a
|
||||
system-assigned or user-assigned managed identity to the
|
||||
host; certctl-server / agent picks it up via IMDS. Pin
|
||||
`credential_mode` rather than letting `default` fall through
|
||||
to env vars (defends against accidental local-dev creds
|
||||
leaking into production).
|
||||
- **Service principal (`credential_mode: client_secret`).**
  Configure `AZURE_TENANT_ID` + `AZURE_CLIENT_ID` +
  `AZURE_CLIENT_SECRET` env vars on the agent. NOT recommended
  for production because of the long-lived client secret risk;
  if the secret leaks, rotate it on the app registration (Key
  Vault soft-delete recovery does not rotate a client secret).
|
||||
- **Default (`credential_mode: default` or unset).** SDK's
|
||||
`DefaultAzureCredential` walks env vars → managed identity →
|
||||
Azure CLI fallback. Useful for local-dev where the operator
|
||||
already has `az login` active.
|
||||
- **Long-lived secrets in connector Config NOT supported** —
|
||||
same procurement-readability rule as AWS ACM.
|
||||
|
||||
## Atomic-rollback contract + Azure-version semantics
|
||||
|
||||
Every `DeployCertificate` snapshots the existing latest version
|
||||
via `GetCertificate(name, "" /* latest */)` BEFORE calling
|
||||
`ImportCertificate`. After import, the connector re-fetches the
|
||||
latest version and compares serial numbers.
|
||||
|
||||
On serial-mismatch, the connector calls `ImportCertificate`
|
||||
again with the snapshotted CER bytes (re-PFX'd with the
|
||||
operator's key) — **as a NEW VERSION**. Key Vault doesn't
|
||||
support "version-restore" without soft-delete recovery (which we
|
||||
keep off the minimum-RBAC surface). The version history will
|
||||
show e.g. v1=initial, v2=failed-renewal, v3=rollback-of-v2;
|
||||
operators reading audit dashboards filter by tag.
|
||||
|
||||
### Soft-delete caveat
|
||||
|
||||
V2 doesn't manage Key Vault soft-delete recovery. If a previous
|
||||
version was soft-deleted out-of-band (e.g. operator ran
|
||||
`az keyvault certificate delete`), the rollback re-imports the
|
||||
snapshot bytes as a new version rather than restoring the
|
||||
soft-deleted version. Operators alerting on rollback frequency
|
||||
should also watch for soft-delete events.
|
||||
|
||||
## App Gateway / Front Door attachment recipe
|
||||
|
||||
```hcl
data "azurerm_key_vault_certificate" "certctl_managed" {
  name         = "api-prod"
  key_vault_id = azurerm_key_vault.main.id
}

resource "azurerm_application_gateway" "main" {
  # ...
  ssl_certificate {
    name = "certctl-managed"
    # Version-less ID, so the gateway tracks the latest version
    # across renewals instead of pinning the current one.
    key_vault_secret_id = data.azurerm_key_vault_certificate.certctl_managed.versionless_secret_id
  }
}
```
|
||||
|
||||
Application Gateway / Front Door reference the cert by KID URI.
certctl rotates the version under the same name, so a reference
to the version-less URI (`/certificates/<name>`) auto-resolves
to the latest version, while a versioned URI
(`/certificates/<name>/<version>`) stays pinned to that version.
**Use the version-less KID so renewals are tracked automatically.**
|
||||
|
||||
## Threat model carve-outs
|
||||
|
||||
- **Cert key bytes never written to disk on the agent.** PFX
|
||||
wrapping happens in memory (PKCS#12 via
|
||||
`software.sslmate.com/src/go-pkcs12`); the base64-encoded PFX
|
||||
is passed straight to the SDK's `ImportCertificate` call.
|
||||
- **Provenance tags are mandatory.** Same
|
||||
`certctl-managed-by=certctl` +
|
||||
`certctl-certificate-id=<mc-id>` shape as AWS ACM. Operators
|
||||
identifying a stray Key Vault cert match against
|
||||
`certctl-managed-by`.
|
||||
- **No long-lived Azure credentials in `Config`.** `Config`
|
||||
carries vault URL + cert name + operator tags + credential
|
||||
mode only. Auth is the Azure SDK credential chain.
|
||||
- **`credential_mode: managed_identity` is the recommended
|
||||
production posture.** Defends against accidental env-var
|
||||
creds leaking into deployments where the host already has a
|
||||
managed identity assigned.
|
||||
|
||||
## Procurement checklist crib
|
||||
|
||||
Paste into security review:
|
||||
|
||||
- certctl uses Azure managed identity (or workload identity for
|
||||
AKS), not long-lived service-principal secrets.
|
||||
- The cert key is held only in agent memory during the PFX wrap
|
||||
+ import call; never written to disk.
|
||||
- Every imported Key Vault cert is tagged with
|
||||
`certctl-managed-by=certctl` +
|
||||
`certctl-certificate-id=<mc-id>` for forensic traceability.
|
||||
- Failed imports trigger automatic rollback by re-importing the
|
||||
snapshotted previous version's bytes; both outcomes are
|
||||
surfaced via Prometheus.
|
||||
- The minimum RBAC role is 3 data-plane actions; Activity Log
|
||||
captures every API call for audit.
|
||||
|
||||
## ValidateOnly contract
|
||||
|
||||
Key Vault has no dry-run API; `ValidateOnly` returns
|
||||
`target.ErrValidateOnlyNotSupported`. Operators preview deploys
|
||||
via `ValidateConfig` + `az keyvault certificate show
|
||||
--vault-name <name> --name <cert>`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [AWS ACM target](aws-acm.md) — AWS equivalent target
|
||||
- [Cloud targets runbook](../../operator/runbooks/cloud-targets.md) — operator playbook covering both AWS ACM and Azure KV
|
||||
@@ -0,0 +1,100 @@
|
||||
# Caddy Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Caddy target connector. For
|
||||
> the connector-development context (interface contract, registry,
|
||||
> atomic deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Caddy connector supports two deployment modes:
|
||||
|
||||
- **API mode (recommended).** Posts the certificate directly to
|
||||
Caddy's admin API for zero-downtime hot reload.
|
||||
- **File mode (fallback).** Writes cert and key files to disk,
|
||||
relying on Caddy's built-in file watcher or a manual reload.
|
||||
|
||||
Implementation lives at `internal/connector/target/caddy/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Caddy connector when:
|
||||
|
||||
- Caddy fronts your services and you want certctl-managed certs
|
||||
rather than letting Caddy run its own ACME client.
|
||||
- You want zero-downtime hot reload via Caddy's admin API.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You'd rather Caddy keep running its own ACME client — point it
|
||||
at certctl's ACME server (see
|
||||
[migration/acme-from-caddy.md](../../migration/acme-from-caddy.md))
|
||||
for the cleanest pattern.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "mode": "api",
  "admin_api": "http://localhost:2019",
  "cert_dir": "/etc/caddy/certs",
  "cert_file": "site.crt",
  "key_file": "site.key"
}
```
|
||||
|
||||
When `mode` is `"api"`, the connector posts the certificate to
|
||||
the admin API endpoint. When `mode` is `"file"`, it writes files
|
||||
to `cert_dir` (same pattern as Traefik). The `admin_api` field is
|
||||
ignored in file mode.
|
||||
|
||||
## Mode trade-offs
|
||||
|
||||
### API mode
|
||||
|
||||
- Zero-downtime hot reload via `POST /load` or
|
||||
certificate-specific endpoints.
|
||||
- Requires Caddy's admin API to be enabled and reachable from the
|
||||
deployment agent.
|
||||
- Best fit for production deployments where Caddy is configured
|
||||
with an admin endpoint.
|
||||
|
||||
### File mode
|
||||
|
||||
- Writes cert and key files to `cert_dir`; Caddy picks them up
|
||||
via its file watcher or on next config reload.
|
||||
- Use when the admin API isn't available or when Caddy is
|
||||
configured to read certificates from disk.
|
||||
- Behaviorally equivalent to the [Traefik](traefik.md) connector.
|
||||
|
||||
## Deploy contract
|
||||
|
||||
API mode bypasses the Bundle I file-write deploy primitive and
|
||||
talks directly to the Caddy admin API. File mode follows the
|
||||
standard atomic-write + verify path (idempotency check → backup
|
||||
→ atomic write → optional reload → post-deploy TLS verify).
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Admin API exposure
|
||||
|
||||
Caddy's admin API is an unauthenticated control surface by
|
||||
default. In API mode, ensure the admin API is bound to a
|
||||
loopback or trusted network — exposing it to the public would
|
||||
let anyone reload Caddy's config. Run the agent on the same host
|
||||
as Caddy and use `http://localhost:2019` for the safest posture.
|
||||
|
||||
### Falling back to file mode
|
||||
|
||||
If the admin API is intermittently unreachable, switch the
|
||||
target's `mode` to `file` via `PUT /api/v1/targets/{id}`. The
|
||||
deploy still lands; reload behaviour is whatever the operator's
|
||||
Caddy config does with file changes.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [Traefik](traefik.md) — comparable file-provider target
|
||||
- [Migration: point Caddy at certctl's ACME](../../migration/acme-from-caddy.md) — alternative pattern when Caddy should keep its ACME client
|
||||
@@ -0,0 +1,106 @@
|
||||
# DigiCert CertCentral Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the DigiCert CertCentral issuer
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The DigiCert connector integrates with DigiCert's CertCentral REST
API for ordering and managing certificates from DigiCert's commercial
public CA. It supports Domain Validation (DV), Organization Validation
(OV), and Extended Validation (EV) certificates, with async order
processing for OV/EV.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/digicert/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the DigiCert connector when:
|
||||
|
||||
- You're already a DigiCert CertCentral customer and want certctl to
|
||||
drive issuance, renewal, and deployment from the same platform that
|
||||
manages your internal PKI.
|
||||
- You need OV or EV certificates that require DigiCert to validate
|
||||
organization details before issuance.
|
||||
- You want one tool that covers both internal CAs (Vault, Local,
|
||||
step-ca) and a public-trust commercial CA.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You only need DV certificates and Let's Encrypt / ZeroSSL is an
|
||||
acceptable issuer — use the ACME connector instead.
|
||||
- You need self-hosted PKI with no commercial CA dependency — use
|
||||
Vault PKI, step-ca, or the Local CA issuer.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_DIGICERT_API_KEY` | (none) | DigiCert API key (sent in `X-DC-DEVKEY` header) |
| `CERTCTL_DIGICERT_ORG_ID` | (none) | DigiCert organization ID |
| `CERTCTL_DIGICERT_PRODUCT_TYPE` | `ssl_basic` | Certificate product (e.g. `ssl_basic`, `ssl_plus`, `ssl_ev`) |
| `CERTCTL_DIGICERT_BASE_URL` | `https://www.digicert.com/services/v2` | DigiCert API base URL |
| `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus` |
|
||||
|
||||
## Authentication
|
||||
|
||||
API key passed via `X-DC-DEVKEY` header. The organization ID is sent
|
||||
in the request body (not the header). No mTLS or OAuth2 required.
|
||||
|
||||
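The split between header and body described above can be sketched with the standard library. This is a hedged illustration: `newOrderRequest` and `orderBody` are hypothetical names, and the `{"organization":{"id":…}}` body shape is an assumption about the CertCentral payload rather than something this document specifies. A real order would carry further fields (CSR, product details) that are omitted here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// orderBody carries only the organization ID. The nesting is an
// assumed payload shape for this sketch.
type orderBody struct {
	Organization struct {
		ID int `json:"id"`
	} `json:"organization"`
}

// newOrderRequest puts the API key in the X-DC-DEVKEY header and the
// organization ID in the JSON body.
func newOrderRequest(endpoint, apiKey string, orgID int) (*http.Request, error) {
	var body orderBody
	body.Organization.ID = orgID
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, fmt.Errorf("encode body: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("X-DC-DEVKEY", apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder key and organization ID; the request is built, not sent.
	req, err := newOrderRequest(
		"https://www.digicert.com/services/v2/order/certificate/create",
		"example-key", 123456)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)
}
```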
## Issuance model
|
||||
|
||||
- **DV certificates** — typically issue immediately; the
|
||||
`/order/certificate/create` API may return the PEM in the same
|
||||
response.
|
||||
- **OV / EV certificates** — require DigiCert-side validation
|
||||
(vetting org documents, checking domain ownership). The API
|
||||
returns 201 with an order ID; certctl's `GetOrderStatus` polls
|
||||
until the certificate is retrievable.
|
||||
|
||||
`GetOrderStatus` runs bounded internal polling (5s/15s/45s/2m/5m
|
||||
capped, ±20% jitter, default 10-minute deadline). For OV/EV orders
|
||||
where humans approve enrollments, bump
|
||||
`CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` to a value that comfortably
|
||||
covers the approval window — see
|
||||
[async-ca-polling.md](../protocols/async-ca-polling.md) for the
|
||||
schedule shape and tuning guidance.
|
||||
|
||||
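The schedule above (5s/15s/45s/2m/5m capped, ±20% jitter, 10-minute deadline) can be sketched as follows. This is an illustration of the shape only: `schedule`, `baseDelay`, `withJitter`, `planPolls`, and `planWithSeed` are hypothetical names, and stopping before a sleep that would overrun the deadline (instead of truncating it) is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// schedule holds the un-jittered delays; the last entry is the cap.
var schedule = []time.Duration{
	5 * time.Second, 15 * time.Second, 45 * time.Second,
	2 * time.Minute, 5 * time.Minute,
}

// baseDelay returns the delay for a zero-based attempt, capped at the
// final schedule entry.
func baseDelay(attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt >= len(schedule) {
		attempt = len(schedule) - 1
	}
	return schedule[attempt]
}

// withJitter scales d by a random factor in [0.8, 1.2).
func withJitter(d time.Duration, rng *rand.Rand) time.Duration {
	factor := 0.8 + 0.4*rng.Float64()
	return time.Duration(float64(d) * factor)
}

// planPolls lists the sleeps that fit inside maxWait.
func planPolls(maxWait time.Duration, rng *rand.Rand) []time.Duration {
	var out []time.Duration
	var elapsed time.Duration
	for attempt := 0; ; attempt++ {
		d := withJitter(baseDelay(attempt), rng)
		if elapsed+d > maxWait {
			return out
		}
		elapsed += d
		out = append(out, d)
	}
}

// planWithSeed is a convenience wrapper with a deterministic source.
func planWithSeed(maxWaitSeconds int, seed int64) []time.Duration {
	rng := rand.New(rand.NewSource(seed))
	return planPolls(time.Duration(maxWaitSeconds)*time.Second, rng)
}

func main() {
	for i, d := range planWithSeed(600, 1) {
		fmt.Printf("poll %d after %s\n", i+1, d.Round(time.Second))
	}
}
```

Under these assumptions a 600-second deadline always yields five polls: the first five sleeps sum to at most 582 seconds, and a sixth would push the total past 600. This is why raising `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` mostly adds further 5-minute polls at the tail.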
## Revocation
|
||||
|
||||
CRL and OCSP are managed by DigiCert. Clients should validate
|
||||
certificate status against DigiCert's infrastructure. certctl
|
||||
records the revocation locally (audit row + cert state) but does
|
||||
**not** call DigiCert's revoke endpoint — operators revoke through
|
||||
DigiCert's dashboard or the CertCentral REST API directly. This
|
||||
keeps the certctl revocation flow simple at the cost of one extra
|
||||
manual step on revocation.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### API key rotation
|
||||
|
||||
Rotate the API key in DigiCert's dashboard, then either restart
|
||||
certctl-server with the new value in `CERTCTL_DIGICERT_API_KEY` or
|
||||
hot-swap via `PUT /api/v1/issuers/{id}` so the registry's Rebuild
|
||||
path replaces the connector with the new key. No certificate
state is invalidated by the rotation; the new key simply
authenticates future API calls.
|
||||
|
||||
### Diagnosing slow OV/EV issuance
|
||||
|
||||
DigiCert's OV/EV vetting is a human process and can take hours to
|
||||
days. Bumping `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` lets a
|
||||
single tick wait through the full approval window, but the better
|
||||
operational pattern is to issue OV/EV certs well ahead of expiry
|
||||
so the bounded poll deadline is short. The renewal scheduler's
|
||||
"alert at T-30 days" default exists for exactly this reason.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — the bounded-polling primitive
|
||||
- [ACME server](../protocols/acme-server.md) — alternative issuer for DV-only workflows
|
||||
@@ -0,0 +1,115 @@
|
||||
# EJBCA (Keyfactor) Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the EJBCA issuer connector. For
|
||||
> the connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The EJBCA connector calls the EJBCA REST API for self-hosted
|
||||
open-source and Keyfactor enterprise CAs. It supports dual
|
||||
authentication: mTLS (default) or OAuth2 Bearer token, selectable
|
||||
via configuration.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/ejbca/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the EJBCA connector when:
|
||||
|
||||
- You already run EJBCA Community Edition or Keyfactor EJBCA
|
||||
Enterprise as your internal CA and want certctl to drive the
|
||||
lifecycle automation (renewal, deployment, alerts) on top.
|
||||
- You need EJBCA's certificate-profile and end-entity-profile
|
||||
policy enforcement — those policies stay in EJBCA and certctl
|
||||
passes the profile names through.
|
||||
- You need approval-pending workflows (humans approve enrollments)
|
||||
— EJBCA supports the 201-Accepted async path.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You want a simpler internal CA without EJBCA's operational weight
|
||||
— Vault PKI, step-ca, or the Local CA issuer are lighter.
|
||||
- You need a managed CA (no servers to run) — Google CAS or AWS
|
||||
ACM PCA on cloud, or DigiCert / Sectigo for commercial PKI.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|---|---|---|---|
| `CERTCTL_EJBCA_API_URL` | Yes | (none) | EJBCA REST API base URL |
| `CERTCTL_EJBCA_AUTH_MODE` | No | `mtls` | Auth mode: `mtls` or `oauth2` |
| `CERTCTL_EJBCA_CLIENT_CERT_PATH` | mTLS | (none) | Path to client certificate PEM (mTLS mode) |
| `CERTCTL_EJBCA_CLIENT_KEY_PATH` | mTLS | (none) | Path to client key PEM (mTLS mode) |
| `CERTCTL_EJBCA_TOKEN` | OAuth2 | (none) | Bearer token (oauth2 mode) |
| `CERTCTL_EJBCA_CA_NAME` | Yes | (none) | EJBCA CA name |
| `CERTCTL_EJBCA_CERT_PROFILE` | No | (none) | EJBCA certificate profile |
| `CERTCTL_EJBCA_EE_PROFILE` | No | (none) | EJBCA end-entity profile |
|
||||
|
||||
## Authentication
|
||||
|
||||
Configurable via `auth_mode`:
|
||||
|
||||
- **`mtls`** — client certificate and key are loaded for the TLS
|
||||
handshake. This is the default and the more common deployment
|
||||
mode for EJBCA.
|
||||
- **`oauth2`** — the token is sent as `Authorization: Bearer
|
||||
{token}`. Use when EJBCA is fronted by an OAuth2-aware reverse
|
||||
proxy or when integrating with Keyfactor's identity provider.
|
||||
|
||||
The mTLS keypair is cached on the connector after the first API
|
||||
call and reused for the lifetime of the process; rotation is
|
||||
picked up automatically via mtime polling on the cert file (see
|
||||
the mtls keypair caching note in the [connector
|
||||
index](index.md#built-in-ejbca-keyfactor)).
|
||||
|
||||
## Issuance model
|
||||
|
||||
`POST /v1/certificate/pkcs10enroll` takes a base64-encoded CSR and
returns the base64-encoded certificate PEM. EJBCA 9.3+ creates the
end entity and issues the cert in a single call. Approval-pending
enrollments return 201 with a tracking ID; certctl's
`GetOrderStatus` polls until the certificate is available.
|
||||
|
||||
## Revocation
|
||||
|
||||
EJBCA requires both issuer DN and serial number for revocation.
|
||||
The connector stores these as a composite `OrderID` in
|
||||
`issuer_dn::serial` format.
|
||||
|
||||
CRL and OCSP are managed by the EJBCA instance. certctl records
|
||||
revocations locally and notifies EJBCA via
|
||||
`PUT /v1/certificate/{issuer_dn}/{serial}/revoke`.
|
||||
|
||||
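The composite `OrderID` can be packed and unpacked with a pair of small helpers. This is a minimal sketch: `encodeOrderID` and `decodeOrderID` are hypothetical names, and splitting on the last `::` is an assumption chosen so that a separator inside the issuer DN would not be mistaken for the boundary.

```go
package main

import (
	"fmt"
	"strings"
)

const orderIDSep = "::"

// encodeOrderID joins the issuer DN and serial in issuer_dn::serial form.
func encodeOrderID(issuerDN, serial string) string {
	return issuerDN + orderIDSep + serial
}

// decodeOrderID splits on the last separator and rejects IDs with an
// empty issuer DN or an empty serial.
func decodeOrderID(id string) (issuerDN, serial string, err error) {
	i := strings.LastIndex(id, orderIDSep)
	if i <= 0 || i+len(orderIDSep) >= len(id) {
		return "", "", fmt.Errorf("malformed order ID %q: want issuer_dn::serial", id)
	}
	return id[:i], id[i+len(orderIDSep):], nil
}

func main() {
	id := encodeOrderID("CN=Issuing CA,O=Example", "1A2B3C")
	dn, serial, err := decodeOrderID(id)
	fmt.Println(dn, serial, err)
}
```

Both halves are needed at revocation time because the revoke endpoint addresses a certificate by issuer DN and serial together.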
## Operator playbook
|
||||
|
||||
### mTLS rotation without downtime
|
||||
|
||||
Replace the client certificate file in place, for example with
`mv -f new.crt /etc/certctl/ejbca/client.crt`. The file's mtime
changes, so no process restart is required: the next API call
re-parses the file and rebuilds the `*http.Transport`. `os.Stat`
errors during rotation surface as connector errors rather than
silently serving stale credentials.
|
||||
|
||||
### Switching from mTLS to OAuth2
|
||||
|
||||
Update the issuer config via `PUT /api/v1/issuers/{id}` with the
|
||||
new `auth_mode: oauth2` and `token`. The registry's Rebuild path
|
||||
replaces the connector without restart. Prior issuance state
|
||||
(serial numbers, cert state) is unaffected.
|
||||
|
||||
### Diagnosing approval-pending hangs
|
||||
|
||||
If `GetOrderStatus` consistently times out, the operator approval
|
||||
queue in EJBCA is the most common cause. The connector consumes
|
||||
the shared bounded-polling primitive — see
|
||||
[async-ca-polling.md](../protocols/async-ca-polling.md) for the
|
||||
schedule shape and tuning approach.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — bounded-polling primitive
|
||||
- [Approval workflow](../../operator/approval-workflow.md) — certctl-side two-person integrity (separate from EJBCA's approval queue, but addresses the same shape of risk on the certctl side)
|
||||
@@ -0,0 +1,96 @@
|
||||
# Entrust Certificate Services Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Entrust CA Gateway issuer
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Entrust connector calls the Entrust CA Gateway REST API with
|
||||
mutual TLS client-certificate authentication. It supports
|
||||
synchronous issuance (200 OK with PEM) and approval-pending flows
(201 with a tracking ID, followed by async polling).
|
||||
|
||||
Implementation lives at `internal/connector/issuer/entrust/` (the
|
||||
mTLS keypair cache is shared at
|
||||
`internal/connector/issuer/mtlscache/`).
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Entrust connector when:
|
||||
|
||||
- You're an Entrust Certificate Services customer using the CA
|
||||
Gateway as the integration surface.
|
||||
- You need approval-pending workflows where humans approve
|
||||
enrollments before issuance.
|
||||
- You want mTLS-authenticated issuance against a commercial CA
|
||||
with no API keys to rotate.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You only need DV / OV public-trust and your CA is reachable via
|
||||
ACME — use the [ACME connector](acme.md) for a simpler path.
|
||||
- You're not already an Entrust customer — DigiCert, Sectigo, and
|
||||
GlobalSign are comparable commercial alternatives, with
|
||||
different auth shapes.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|---|---|---|---|
| `CERTCTL_ENTRUST_API_URL` | Yes | — | Entrust CA Gateway base URL |
| `CERTCTL_ENTRUST_CLIENT_CERT_PATH` | Yes | — | Path to mTLS client certificate PEM |
| `CERTCTL_ENTRUST_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
| `CERTCTL_ENTRUST_CA_ID` | Yes | — | Certificate Authority ID (from `GET /certificate-authorities`) |
| `CERTCTL_ENTRUST_PROFILE_ID` | No | — | Optional enrollment profile ID |
| `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus` |
|
||||
|
||||
For approval-pending workflows where humans approve enrollments,
|
||||
bump `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` to `86400` (24h) so a
|
||||
single tick can wait through the approval window.
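
To show how a deadline such as `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` could bound a status-polling loop, here is a rough sketch. The names `pollUntil`, `errPending`, and `demoPoll` are hypothetical, and the doubling backoff is an assumption; the real schedule is described in the async CA polling doc.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// errPending is a hypothetical sentinel for "approval still outstanding".
var errPending = errors.New("order still pending")

// pollUntil calls check until it stops reporting errPending or maxWait
// elapses. The interval doubles after each attempt, capped at maxInterval.
func pollUntil(ctx context.Context, maxWait, first, maxInterval time.Duration,
	check func(context.Context) (string, error)) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()

	interval := first
	for {
		pem, err := check(ctx)
		if !errors.Is(err, errPending) {
			return pem, err // issued, or a hard failure
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err() // deadline hit while still pending
		case <-time.After(interval):
		}
		interval *= 2
		if interval > maxInterval {
			interval = maxInterval
		}
	}
}

// demoPoll drives pollUntil with a fake check that succeeds on the
// succeedOn-th call, using millisecond intervals so it finishes quickly.
func demoPoll(succeedOn, maxWaitMillis int) (string, int, error) {
	calls := 0
	pem, err := pollUntil(context.Background(),
		time.Duration(maxWaitMillis)*time.Millisecond, time.Millisecond, 4*time.Millisecond,
		func(context.Context) (string, error) {
			calls++
			if calls < succeedOn {
				return "", errPending
			}
			return "PEM", nil
		})
	return pem, calls, err
}
```

With a 600-second deadline the loop gives up long before a human approver is likely to act, which is why the longer value is suggested for approval-gated profiles.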
|
||||
|
||||
## Authentication
|
||||
|
||||
Mutual TLS — the client certificate and key are loaded via
|
||||
`tls.LoadX509KeyPair()` and attached to the HTTP transport. No API
|
||||
key or token required.
|
||||
|
||||
## Issuance model
|
||||
|
||||
Enrollment goes through
`POST /v1/certificate-authorities/{caId}/enrollments`. The call
returns 200 with the PEM immediately for auto-approved enrollments,
or 201 with a tracking ID for approval-pending orders;
`GetOrderStatus` then polls the enrollment endpoint.
|
||||
|
||||
## mTLS keypair caching (audit fix #10)
|
||||
|
||||
The parsed client certificate plus a precomputed `*http.Transport`
|
||||
are cached on the connector after the first API call. Steady-state
|
||||
calls reuse the cached transport — no per-call disk read or
|
||||
`tls.X509KeyPair` parse.
|
||||
|
||||
Rotation is picked up automatically via mtime polling: when the
|
||||
cert file's mtime advances beyond the last-loaded value, the next
|
||||
API call re-parses and rebuilds the transport.
|
||||
|
||||
Operator workflow: `mv -f new.crt /etc/certctl/entrust/client.crt`
|
||||
(mtime changes), no process restart required, takes effect on the
|
||||
next API call. `os.Stat` errors during rotation surface as
|
||||
connector errors rather than silently serving stale credentials.
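
The caching behaviour described in this section might look roughly like the sketch below. The `keypairCache` type and its fields are hypothetical; the shared implementation in `internal/connector/issuer/mtlscache/` may be structured differently.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"sync"
	"time"
)

// keypairCache caches a transport built from a client cert/key pair and
// rebuilds it when the cert file's mtime advances. Names are hypothetical.
type keypairCache struct {
	mu        sync.Mutex
	certPath  string
	keyPath   string
	loadedAt  time.Time // mtime of the cert file at the last load
	transport *http.Transport
}

// Transport returns the cached transport, reloading the keypair first if
// the cert file on disk is newer than the one that was loaded.
func (c *keypairCache) Transport() (*http.Transport, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	info, err := os.Stat(c.certPath)
	if err != nil {
		// Surface the error instead of serving the cached transport.
		return nil, fmt.Errorf("stat client cert: %w", err)
	}
	if c.transport != nil && !info.ModTime().After(c.loadedAt) {
		return c.transport, nil // steady state: no parse, no file read
	}

	pair, err := tls.LoadX509KeyPair(c.certPath, c.keyPath)
	if err != nil {
		return nil, fmt.Errorf("load client keypair: %w", err)
	}
	if c.transport != nil {
		c.transport.CloseIdleConnections() // drop connections tied to the old cert
	}
	c.transport = &http.Transport{TLSClientConfig: &tls.Config{
		Certificates: []tls.Certificate{pair},
	}}
	c.loadedAt = info.ModTime()
	return c.transport, nil
}
```

The steady-state path costs one `os.Stat` per call, which is the price of noticing a rotation without a restart.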
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by Entrust. certctl records revocations
|
||||
locally and notifies Entrust via
|
||||
`PUT /v1/certificate-authorities/{caId}/certificates/{serial}/revoke`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [GlobalSign Atlas HVCA](globalsign.md) — comparable mTLS-authenticated commercial CA
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — the bounded-polling primitive
|
||||
- [Approval workflow](../../operator/approval-workflow.md) — certctl-side two-person integrity (separate from Entrust's approval queue)
|
||||
@@ -0,0 +1,112 @@
|
||||
# Envoy Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Envoy target connector. For
|
||||
> the connector-development context (interface contract, registry,
|
||||
> atomic deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Envoy connector uses **file-based certificate delivery** — it
|
||||
writes certificate and key files to a directory that Envoy watches
|
||||
via its SDS (Secret Discovery Service) file-based configuration or
|
||||
static `filename` references in the bootstrap config. When files
|
||||
change, Envoy automatically picks up the new certificates without
|
||||
requiring a reload command.
|
||||
|
||||
Implementation lives at `internal/connector/target/envoy/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Envoy connector when:
|
||||
|
||||
- Envoy fronts your services (standalone, as part of a service
|
||||
mesh, or as an API gateway like Emissary or Gloo).
|
||||
- You want certctl to drive cert rotation and let Envoy's file
|
||||
SDS handle the rolling reload across worker threads.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're running an Envoy-based service mesh (Istio, Consul
|
||||
Connect) — those meshes have their own cert distribution
|
||||
pipelines, and integrating certctl at the mesh layer is a
|
||||
different design than this connector covers.
|
||||
- You're using Envoy's xDS/gRPC SDS path (not file-based SDS) —
|
||||
the gRPC SDS-server connector is on the V3-Pro roadmap.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "cert_dir": "/etc/envoy/certs",
  "cert_filename": "cert.pem",
  "key_filename": "key.pem",
  "chain_filename": "chain.pem",
  "sds_config": true
}
```
|
||||
|
||||
| Field | Type | Default | Description |
|---|---|---|---|
| `cert_dir` | string | (required) | Directory where Envoy watches for certificate files |
| `cert_filename` | string | `cert.pem` | Filename for the certificate (leaf + chain unless `chain_filename` is set) |
| `key_filename` | string | `key.pem` | Filename for the private key |
| `chain_filename` | string | (empty) | If set, chain is written to a separate file instead of appended to the cert |
| `sds_config` | bool | `false` | If true, writes an `sds.json` file for Envoy's file-based SDS provider |
|
||||
|
||||
## SDS mode (recommended for production)
|
||||
|
||||
When `sds_config` is `true`, the connector writes an SDS JSON
|
||||
file (`{cert_dir}/sds.json`) containing a `tls_certificate`
|
||||
resource that points to the cert and key file paths. Envoy's
|
||||
file-based SDS (`path_config_source`) watches this file for
|
||||
changes, providing automatic hot-reload of certificates without
|
||||
restarting worker threads.
|
||||
|
||||
This is the recommended approach for production Envoy deployments
|
||||
using dynamic TLS configuration.
|
||||
|
||||
## Static-bootstrap mode
|
||||
|
||||
When `sds_config` is `false` (the default), the connector simply
|
||||
writes cert and key files. Use this mode when Envoy's bootstrap
|
||||
config references the cert / key files directly via static
|
||||
`filename` fields in the TLS context.
|
||||
|
||||
In this mode Envoy still picks up file changes via its filesystem
|
||||
watcher, but the operator should verify the bootstrap config sets
|
||||
`watched_directory` (or equivalent) on each `tls_certificate`
|
||||
entry — without it, the cert is loaded once at startup and
|
||||
subsequent file changes are ignored.
|
||||
|
||||
## Deploy contract
|
||||
|
||||
The connector uses the standard atomic-write + post-deploy verify
flow (the file-based deploy primitive shared across all file-deploy
connectors). When SDS mode is on, the SDS JSON file is updated last
so Envoy sees the cert / key on disk before the SDS resource
pointer changes.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Hot-reload across worker threads
|
||||
|
||||
Envoy's file SDS path triggers a per-worker-thread reload as each
|
||||
worker re-reads the SDS file. In-flight TLS connections on each
|
||||
worker continue with the OLD cert until they close; new
|
||||
connections after the reload pick up the NEW cert.
|
||||
|
||||
### Service mesh interactions
|
||||
|
||||
If you're running Istio or Consul Connect, the mesh's own cert
|
||||
distribution pipeline (Citadel / SDS server) is the system of
|
||||
record for sidecar certs. Don't point this connector at sidecar
|
||||
cert paths — point it at standalone Envoy gateways or API edges
|
||||
that aren't sidecar-managed.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [NGINX](nginx.md) — explicit-reload-command counterpart
|
||||
- [Traefik](traefik.md) — file-watcher counterpart with simpler semantics
|
||||
@@ -1,6 +1,11 @@
|
||||
# F5 BIG-IP Connector — Operator Deep-Dive
|
||||
|
||||
> Per Phase 14 of the deploy-hardening II master bundle.
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. For the
|
||||
> connector-development context (interface contract, registry, atomic
|
||||
> deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
@@ -25,7 +30,7 @@ on-failure cleanup of orphaned crypto objects.
|
||||
against this in CI.
|
||||
2. **Customer-grade tier**: operator-supplied real F5 vagrant box.
|
||||
Documented setup recipe below. Manual smoke required for
|
||||
"verified" status in `docs/deployment-vendor-matrix.md`.
|
||||
"verified" status in `docs/reference/vendor-matrix.md`.
|
||||
|
||||
The mock implements a SUBSET of iControl REST. A real F5 may
|
||||
diverge on quirks the mock doesn't model. Customer-grade
|
||||
@@ -161,6 +166,6 @@ F5 iControl REST defaults to 100 req/s. Connector backs off on
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
||||
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
||||
- [Atomic deploy + post-verify + rollback](../deployment-model.md)
|
||||
- [Vendor compatibility matrix](../vendor-matrix.md)
|
||||
- F5 official iControl REST docs: <https://clouddocs.f5.com/api/icontrol-rest/>
|
||||
@@ -0,0 +1,122 @@
|
||||
# GlobalSign Atlas HVCA Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the GlobalSign Atlas High Volume
|
||||
> CA (HVCA) issuer connector. For the connector-development context
|
||||
> (interface contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The GlobalSign connector calls the Atlas HVCA REST API with **dual
authentication**: mTLS for the TLS handshake and API key/secret
headers for request authorization. Base URLs are region-aware
(EMEA, APAC, Americas).
|
||||
|
||||
Implementation lives at `internal/connector/issuer/globalsign/`
|
||||
(mTLS keypair cache shared at
|
||||
`internal/connector/issuer/mtlscache/`).
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the GlobalSign Atlas HVCA connector when:
|
||||
|
||||
- You're a GlobalSign Atlas customer issuing high volumes of
|
||||
publicly trusted certificates (the "HV" in HVCA).
|
||||
- You want region-pinned issuance for regulatory or latency
|
||||
reasons (EMEA / APAC / Americas regional endpoints).
|
||||
- You're prepared to manage both mTLS client certs AND
|
||||
API key/secret credentials in tandem.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You only need DV public-trust and your CA is reachable via ACME —
|
||||
the [ACME connector](acme.md) is simpler.
|
||||
- The dual-auth burden (mTLS + API key + API secret) is heavier
|
||||
than your environment needs — DigiCert (API key only) or Entrust
|
||||
(mTLS only) are simpler to operate.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|---|---|---|---|
| `CERTCTL_GLOBALSIGN_API_URL` | Yes | — | Atlas HVCA API URL (region-specific) |
| `CERTCTL_GLOBALSIGN_API_KEY` | Yes | — | API key for request authentication |
| `CERTCTL_GLOBALSIGN_API_SECRET` | Yes | — | API secret for request authentication |
| `CERTCTL_GLOBALSIGN_CLIENT_CERT_PATH` | Yes | — | Path to mTLS client certificate PEM |
| `CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
| `CERTCTL_GLOBALSIGN_SERVER_CA_PATH` | No | system trust store | PEM bundle used to verify the Atlas API server certificate. Set this for private/lab Atlas deployments whose server TLS chain is not in the host's default trust bundle. |
| `CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. GlobalSign tracks orders by serial number rather than order ID; the polling shape is identical. |
|
||||
|
||||
## Authentication
|
||||
|
||||
Authentication is dual: an mTLS client certificate for the TLS
handshake plus `X-API-Key` and `X-API-Secret` headers on every
request. Both must be valid or the request fails.
|
||||
|
||||
## TLS verification
|
||||
|
||||
The connector always verifies the server certificate. When
|
||||
`server_ca_path` is set, the PEM bundle at that path is used as
|
||||
the trust anchor; otherwise the host's system trust store is
|
||||
used. TLS 1.2 is the minimum protocol version.
|
||||
|
||||
## Issuance model
|
||||
|
||||
`POST /v2/certificates` returns a serial number. The certificate
PEM becomes available after validation completes, which typically
takes seconds for DV. `GetOrderStatus` polls the certificate
endpoint.
|
||||
|
||||
## mTLS keypair caching (audit fix #10)
|
||||
|
||||
The parsed client certificate plus a precomputed `*http.Transport`
|
||||
(with `ServerCAPath` pinning preserved when configured) are cached
|
||||
on the connector after the first API call. Steady-state calls
|
||||
reuse the cached transport — no per-call disk read or
|
||||
`tls.X509KeyPair` parse.
|
||||
|
||||
Rotation is picked up automatically via mtime polling: when the
|
||||
cert file's mtime advances beyond the last-loaded value, the next
|
||||
API call re-parses and rebuilds the transport.
|
||||
|
||||
Operator workflow: `mv -f new.crt /etc/certctl/globalsign/client.crt`
|
||||
(mtime changes), no process restart required, takes effect on the
|
||||
next API call. `os.Stat` errors during rotation surface as
|
||||
connector errors rather than silently serving stale credentials.
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by GlobalSign. certctl records
|
||||
revocations locally and notifies GlobalSign via
|
||||
`PUT /v2/certificates/{serial}/revoke`.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Rotating mTLS client material
|
||||
|
||||
Same flow as the [Entrust connector](entrust.md): place the new
|
||||
cert at the configured path, mtime changes, next API call picks
|
||||
up the new keypair. `ServerCAPath` pin (when configured) is
|
||||
preserved across the rebuild.
|
||||
|
||||
### Rotating API key / secret
|
||||
|
||||
Rotate in the Atlas dashboard, then either restart certctl-server
|
||||
or hot-swap via `PUT /api/v1/issuers/{id}`. The registry's
|
||||
Rebuild path replaces the connector with the new credentials. The
|
||||
mTLS transport cache stays warm across the swap (mTLS material
|
||||
hasn't changed) — only the per-request headers are new.
|
||||
|
||||
### Region selection
|
||||
|
||||
Atlas HVCA has region-specific base URLs. Use the URL that
|
||||
matches your account's contracted region; the connector does no
|
||||
region-routing on its own.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Entrust connector](entrust.md) — mTLS-only commercial alternative
|
||||
- [DigiCert connector](digicert.md) — API-key-only commercial alternative
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — the bounded-polling primitive
|
||||
@@ -0,0 +1,89 @@
|
||||
# Google CAS Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Google Cloud Certificate
|
||||
> Authority Service (CAS) issuer connector. For the
|
||||
> connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
Google Cloud Certificate Authority Service is a managed private CA
|
||||
on GCP. Issuance is synchronous via the CAS REST API with OAuth2
|
||||
service-account auth.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/googlecas/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Google CAS connector when:
|
||||
|
||||
- Your workloads are GCP-native and you want the CA to live inside
|
||||
your GCP project (for blast radius, IAM, and audit reasons).
|
||||
- You want IAM-bound service-account auth instead of API keys to
|
||||
rotate.
|
||||
- You need GCP-native CRL distribution and audit logging served by
|
||||
Google.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're not on GCP — AWS ACM Private CA or Azure Key Vault are
|
||||
the cloud-native equivalents on those platforms.
|
||||
- You need public-trust certificates — CAS is private only.
|
||||
- You don't already pay for CAS (it has a non-trivial monthly
|
||||
cost). Vault, step-ca, or the Local CA issuer are free
|
||||
self-hosted alternatives.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|---|---|---|---|
| `CERTCTL_GOOGLE_CAS_PROJECT` | Yes | — | GCP project ID |
| `CERTCTL_GOOGLE_CAS_LOCATION` | Yes | — | GCP region (e.g. `us-central1`) |
| `CERTCTL_GOOGLE_CAS_CA_POOL` | Yes | — | CA pool name |
| `CERTCTL_GOOGLE_CAS_CREDENTIALS` | Yes | — | Path to service account JSON |
| `CERTCTL_GOOGLE_CAS_TTL` | No | `8760h` | Default certificate TTL |
|
||||
|
||||
## Authentication
|
||||
|
||||
OAuth2 service account. The connector reads a service account
|
||||
JSON file, signs a JWT with the private key, and exchanges it for
|
||||
an access token at Google's token endpoint. Tokens are cached and
|
||||
refreshed automatically (5 min before expiry) so the connector
|
||||
doesn't pay token-mint latency on every request.
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by Google CAS directly. certctl records
|
||||
revocations locally and notifies Google CAS via the revoke
|
||||
endpoint. CAS's CRL distribution and audit logging serve the
|
||||
resulting status to verifying clients.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Service-account key rotation
|
||||
|
||||
1. Generate a new service-account key in the GCP IAM console.
2. Distribute the new JSON to the certctl host at the
   `CERTCTL_GOOGLE_CAS_CREDENTIALS` path (overwrite or use a new
   path).
3. Either restart certctl-server with the new env var or hot-swap
   via `PUT /api/v1/issuers/{id}` so the registry's Rebuild path
   replaces the connector.
4. Delete the old key in GCP IAM after the next successful
   issuance proves the new key works.
|
||||
|
||||
### Required IAM roles
|
||||
|
||||
The service account needs `roles/privateca.certificateRequester`
|
||||
(or a custom role with `privateca.certificates.create` and
|
||||
`privateca.certificates.get`) on the CA pool. Add
|
||||
`roles/privateca.certificateAuthorityUser` if the connector also
|
||||
needs to read the issuing CA cert chain.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [AWS ACM PCA](aws-acm-pca.md) — AWS equivalent
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — bounded-polling primitive (Google CAS is synchronous so doesn't consume it)
|
||||
@@ -0,0 +1,107 @@
|
||||
# HAProxy Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the HAProxy target connector.
|
||||
> For the connector-development context (interface contract,
|
||||
> registry, atomic deploy primitive shared across all targets), see
|
||||
> the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
HAProxy differs from NGINX and Apache in one important way: it
|
||||
expects all TLS material in a **single combined PEM file** —
|
||||
certificate, intermediate chain, and private key concatenated. The
|
||||
connector builds this combined file, writes it with 0600
|
||||
permissions (since it contains the private key), optionally
|
||||
validates the HAProxy configuration, and reloads.
|
||||
|
||||
Implementation lives at `internal/connector/target/haproxy/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the HAProxy connector when:
|
||||
|
||||
- HAProxy fronts your applications and you want certctl to
|
||||
rotate the cert + chain + key in place atomically without
|
||||
hand-rolling the combined-PEM build.
|
||||
- You want validate-before-reload behaviour to keep a bad config
|
||||
from taking down the load balancer mid-rotation.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're running HAProxy Enterprise's hot-cert-update API path —
|
||||
the connector currently uses the file-write-and-reload model;
|
||||
the API path is on the V3-Pro roadmap.
|
||||
- You're not running HAProxy directly but a managed load balancer
|
||||
(AWS ALB, Azure Application Gateway). Use the cloud-native
|
||||
target connector for that platform instead.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "pem_path": "/etc/haproxy/certs/site.pem",
  "reload_command": "systemctl reload haproxy",
  "validate_command": "haproxy -c -f /etc/haproxy/haproxy.cfg"
}
```
|
||||
|
||||
The combined PEM is built in this order: server certificate,
|
||||
intermediate / chain certificates, private key.
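
A minimal sketch of that concatenation, assuming the three inputs are already PEM-encoded byte slices. The function name `combinePEM` is hypothetical.

```go
package main

import "bytes"

// combinePEM concatenates PEM material in the order HAProxy expects:
// server certificate, then intermediates, then private key. Each part is
// trimmed and terminated with exactly one newline so blocks never run together.
func combinePEM(cert, chain, key []byte) []byte {
	var out bytes.Buffer
	for _, part := range [][]byte{cert, chain, key} {
		part = bytes.TrimSpace(part)
		if len(part) == 0 {
			continue // e.g. no intermediates supplied
		}
		out.Write(part)
		out.WriteByte('\n')
	}
	return out.Bytes()
}
```

Because the result contains the private key, it should be written with `0600` permissions, as the overview notes.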
|
||||
|
||||
The `validate_command` is optional — if omitted, the connector
|
||||
skips config validation and goes straight to reload. Keeping it
|
||||
on is the production-recommended posture.
|
||||
|
||||
## Deploy contract
|
||||
|
||||
Every cert deploy follows the Bundle I `deploy.Apply(ctx, plan)`
|
||||
flow:
|
||||
|
||||
1. **Idempotency check** — SHA-256 over the combined PEM bytes;
|
||||
skip if the destination already matches.
|
||||
2. **Pre-deploy backup** — copy existing PEM to
|
||||
`<pem_path>.certctl-bak.<unix-nanos>`.
|
||||
3. **Staged write** — temp-file + chown; the atomic rename is deferred to step 5.
|
||||
4. **PreCommit (validate)** — runs `haproxy -c -f
|
||||
/etc/haproxy/haproxy.cfg`. Failure aborts; no live cert
|
||||
touched.
|
||||
5. **Atomic rename** — temp → final.
|
||||
6. **PostCommit (reload)** — runs `systemctl reload haproxy` (or
|
||||
the operator's override).
|
||||
7. **Post-deploy TLS verify** — dials the configured endpoint
|
||||
when configured; pulls leaf cert SHA-256; compares against
|
||||
deployed bytes. Mismatch triggers automatic rollback.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Old cert served via session resumption
|
||||
|
||||
HAProxy keeps cached TLS sessions valid for the configured
`tune.ssl.lifetime` (default 300 seconds). certctl's post-deploy
verify performs a fresh handshake and therefore sees the NEW cert;
clients that resume a cached session stay on the OLD cert until
that session expires.
|
||||
|
||||
### Multi-frontend deployments
|
||||
|
||||
When HAProxy serves multiple frontends with different certs,
|
||||
configure **one target per frontend's cert** in the certctl
|
||||
control plane. Each gets its own `pem_path`. The reload command
|
||||
is shared (HAProxy reloads all frontends together), so the
|
||||
deploys can land in any order; the final reload picks them all up.
|
||||
|
||||
### `crt-list` directories
|
||||
|
||||
If your HAProxy config loads certificates from a directory
(`crt <dir>`) or through a `crt-list` file rather than a single
PEM, set `pem_path` to a file inside that directory (or to a path
the list names) and let HAProxy enumerate it on reload. The
connector treats `pem_path` as a single file regardless of
HAProxy's directory semantics.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [NGINX](nginx.md) — separate-file deploy contract counterpart
|
||||
- [Apache](apache.md) — separate-file deploy contract with `apachectl configtest`
|
||||
- [Migration: ACME from Caddy](../../migration/acme-from-caddy.md) — pattern for pointing edge proxies at certctl's ACME server (Caddy walkthrough; HAProxy ACME plumbing is similar)
|
||||
@@ -1,6 +1,11 @@
|
||||
# Microsoft IIS Connector — Operator Deep-Dive
|
||||
|
||||
> Per Phase 14 of the deploy-hardening II master bundle.
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. For the
|
||||
> connector-development context (interface contract, registry, atomic
|
||||
> deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
@@ -126,8 +131,8 @@ hostname.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
||||
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
||||
- [Atomic deploy + post-verify + rollback](../deployment-model.md)
|
||||
- [Vendor compatibility matrix](../vendor-matrix.md)
|
||||
|
||||
## Operator validation playbook (Windows host)
|
||||
|
||||
@@ -184,7 +189,7 @@ docker compose --profile deploy-e2e-windows `
|
||||
### Acceptance
|
||||
|
||||
Per Bundle II frozen decision 0.14, the IIS / WinCertStore cells in
|
||||
`docs/deployment-vendor-matrix.md` flip from "CI" / "pending" → "✓"
|
||||
`docs/reference/vendor-matrix.md` flip from "CI" / "pending" → "✓"
|
||||
only when ALL of the following are true:
|
||||
|
||||
- ≥1 happy-path e2e passes against the real Windows IIS sidecar
|
||||
@@ -192,4 +197,4 @@ only when ALL of the following are true:
|
||||
- This playbook's full procedure ran clean once on a real Windows host
|
||||
|
||||
Operator records the validation date + Windows Server version in
|
||||
`cowork/<bundle>/iis-validation-receipts.md` for audit trail.
|
||||
the project's per-bundle iis-validation receipts for audit trail.
|
||||
@@ -1,7 +1,53 @@
|
||||
# Connector Development Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> This is the canonical connector reference: interface contracts,
|
||||
> registry, deployment primitive, network scanner, cloud discovery.
|
||||
> Each built-in connector below has a sibling per-page that goes
|
||||
> deeper on operator-grade material (vendor edges, troubleshooting,
|
||||
> rotation playbooks, when-to-use vs alternatives). Use this index
|
||||
> to navigate; jump to the sibling pages for hands-on operator
|
||||
> material.
|
||||
|
||||
Connectors extend certctl to integrate with external systems for certificate issuance, deployment, and notifications. This guide covers the connector interfaces, built-in implementations, and how to build your own.
|
||||
|
||||
**Per-connector deep-dive pages** (siblings in this directory):
|
||||
|
||||
Issuer connectors:
|
||||
|
||||
- [ACME](acme.md) — RFC 8555 v2 client (Let's Encrypt, ZeroSSL, Sectigo, Buypass, GTS, SSL.com)
|
||||
- [ADCS integration](adcs.md) — Active Directory Certificate Services as enterprise root via Local CA sub-CA mode
|
||||
- [AWS ACM Private CA](aws-acm-pca.md) — managed private CA on AWS, IAM-authenticated
|
||||
- [DigiCert CertCentral](digicert.md) — commercial public CA (DV / OV / EV)
|
||||
- [EJBCA (Keyfactor)](ejbca.md) — self-hosted open-source / Keyfactor enterprise CA
|
||||
- [Entrust Certificate Services](entrust.md) — Entrust CA Gateway with mTLS auth
|
||||
- [GlobalSign Atlas HVCA](globalsign.md) — Atlas HVCA with dual mTLS + API key/secret auth
|
||||
- [Google CAS](google-cas.md) — managed private CA on GCP, OAuth2 service-account auth
|
||||
- [Local CA](local-ca.md) — Go `crypto/x509`-backed signer (self-signed, sub-CA, tree mode)
|
||||
- [OpenSSL / Custom CA](openssl.md) — script-based shell-out for arbitrary CLI-driven CAs
|
||||
- [Sectigo SCM](sectigo.md) — Sectigo Certificate Manager REST API
|
||||
- [step-ca (Smallstep)](step-ca.md) — JWK-provisioner authenticated synchronous internal CA
|
||||
- [Vault PKI](vault.md) — HashiCorp Vault PKI engine, synchronous issuance
|
||||
|
||||
Target connectors:
|
||||
|
||||
- [Apache](apache.md) — Apache httpd, separate-file deploy + `apachectl configtest`
|
||||
- [AWS Certificate Manager](aws-acm.md) — deploy into ACM for ALB / CloudFront / API Gateway
|
||||
- [Azure Key Vault](azure-kv.md) — deploy into Key Vault for App Gateway / Front Door / App Service
|
||||
- [Caddy](caddy.md) — admin-API hot reload or file-watcher fallback
|
||||
- [Envoy](envoy.md) — file SDS hot reload, optional `sds.json`
|
||||
- [F5 BIG-IP](f5.md) — proxy-agent pattern + transactional iControl REST
|
||||
- [HAProxy](haproxy.md) — combined-PEM deploy + `haproxy -c` validate
|
||||
- [IIS](iis.md) — Microsoft IIS, local PowerShell + WinRM modes
|
||||
- [Java Keystore](jks.md) — JKS / PKCS#12 via `keytool` with atomic snapshot rollback
|
||||
- [Kubernetes Secrets](k8s.md) — atomic update of `kubernetes.io/tls` Secrets
|
||||
- [NGINX](nginx.md) — separate-file deploy + `nginx -t` validate
|
||||
- [Postfix / Dovecot](postfix.md) — dual-mode mail-server TLS connector
|
||||
- [SSH (agentless)](ssh.md) — agentless deploy over SSH/SFTP for Linux/Unix targets
|
||||
- [Traefik](traefik.md) — file-provider zero-reload deploy
|
||||
- [Windows Certificate Store](wincertstore.md) — non-IIS Windows services (Exchange, RDP, SQL, ADFS)
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
@@ -62,8 +108,8 @@ Connectors extend certctl to integrate with external systems for certificate iss
|
||||
|
||||
Three types of connectors:
|
||||
|
||||
1. **Issuer Connector** — Obtains certificates from CAs. 9 built-in: Local CA (self-signed + sub-CA), ACME v2 (HTTP-01, DNS-01, DNS-PERSIST-01, ARI, EAB, profile selection), step-ca, OpenSSL/Custom CA, Vault PKI, DigiCert CertCentral, Sectigo SCM, Google CAS, AWS ACM Private CA
|
||||
2. **Target Connector** — Deploys certificates to infrastructure. 14 built-in: NGINX, Apache httpd, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS (local + WinRM), F5 BIG-IP (proxy agent), SSH (agentless), Windows Certificate Store, Java Keystore, Kubernetes Secrets
|
||||
1. **Issuer Connector** — Obtains certificates from CAs. 12 built-in: Local CA (self-signed + sub-CA + tree mode; ADCS sub-CA mode is documented separately), ACME v2 (HTTP-01, DNS-01, DNS-PERSIST-01, ARI, EAB, profile selection), step-ca, OpenSSL/Custom CA, Vault PKI, DigiCert CertCentral, Sectigo SCM, Google CAS, AWS ACM Private CA, Entrust Certificate Services, GlobalSign Atlas HVCA, EJBCA (Keyfactor)
|
||||
2. **Target Connector** — Deploys certificates to infrastructure. 15 built-in: NGINX, Apache httpd, HAProxy, Traefik, Caddy, Envoy, Postfix/Dovecot (dual-mode), IIS (local PowerShell + WinRM proxy), F5 BIG-IP (proxy agent), SSH (agentless), Windows Certificate Store, Java Keystore (JKS / PKCS#12), Kubernetes Secrets, AWS Certificate Manager, Azure Key Vault
|
||||
3. **Notifier Connector** — Sends alerts about certificate events (Email, Webhooks, Slack, Microsoft Teams, PagerDuty, OpsGenie implemented)
|
||||
|
||||
All connectors accept JSON configuration at initialization, support config validation, and are registered in the service layer. Issuer connectors run on the control plane; target connectors run on agents. For network appliances where agents can't be installed, a **proxy agent** in the same network zone handles deployment — the server never initiates outbound connections.
|
||||
@@ -156,6 +202,8 @@ The Local CA issuer signs certificates using Go's `crypto/x509` library. It supp
|
||||
|
||||
**Sub-CA mode:** Loads a CA certificate and private key from disk (`CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH`). The CA cert is signed by an upstream CA (e.g., ADCS), so all issued certificates chain to the enterprise root trust hierarchy. Clients that already trust the enterprise root automatically trust certctl-issued certs. Supports RSA, ECDSA, and PKCS#8 key formats. If the paths are not set, falls back to self-signed mode. The loaded certificate must have `IsCA=true` and `KeyUsageCertSign`.
|
||||
|
||||
**Tree mode (Rank 8 — multi-level CA hierarchy):** When `Issuer.HierarchyMode = "tree"` is set on the issuer row, the local connector reads the active CA hierarchy from the `intermediate_cas` table and assembles `IssuanceResult.ChainPEM` by walking the `parent_ca_id` ancestry from the issuing leaf CA up to the root. Tree mode is operator-managed via the admin-gated `/api/v1/issuers/{id}/intermediates` and `/api/v1/intermediates/{id}` endpoints (`POST` to create / sign children, `GET` to list / inspect, `POST .../retire` to two-phase retire). The signing path is shared with single-mode (cert is signed via `c.caCert` + `c.caSigner` from the on-disk issuing CA cert+key); only the chain bytes differ. RFC 5280 §3.2 (self-signed root validation), §4.2.1.9 (path-length tightening), and §4.2.1.10 (NameConstraints subset semantics) are enforced at the service layer fail-closed. The default is `single`, byte-identical to the pre-Rank-8 historical flow. See `docs/reference/intermediate-ca-hierarchy.md` for the operator runbook covering common 4-level boundary, 3-level policy, and 2-level internal-PKI patterns + the migration runbook for flipping a single-mode issuer to tree.
|
||||
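The ancestry walk described for tree mode can be pictured with a short Go sketch. Everything named here is hypothetical: the `intermediateCA` struct, its fields, and `buildChainPEM` are illustrative stand-ins rather than certctl's actual types, and including the root in the output is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// intermediateCA is a hypothetical, simplified view of one row in the
// intermediate_cas table; certctl's real schema and types may differ.
type intermediateCA struct {
	ID       string
	ParentID string // empty for the root
	CertPEM  string
}

// buildChainPEM walks parent links from the issuing CA up to the root and
// concatenates the PEM blocks in that order (issuing CA first, root last).
// It fails closed on a missing parent or a cycle.
func buildChainPEM(cas map[string]intermediateCA, issuingID string) (string, error) {
	var b strings.Builder
	seen := make(map[string]bool)
	for id := issuingID; id != ""; {
		if seen[id] {
			return "", errors.New("cycle in CA hierarchy at " + id)
		}
		seen[id] = true
		ca, ok := cas[id]
		if !ok {
			return "", errors.New("unknown CA " + id)
		}
		b.WriteString(ca.CertPEM)
		id = ca.ParentID
	}
	return b.String(), nil
}

func main() {
	// Placeholder strings stand in for real PEM blocks.
	cas := map[string]intermediateCA{
		"root":    {ID: "root", CertPEM: "ROOT\n"},
		"policy":  {ID: "policy", ParentID: "root", CertPEM: "POLICY\n"},
		"issuing": {ID: "issuing", ParentID: "policy", CertPEM: "ISSUING\n"},
	}
	chain, err := buildChainPEM(cas, "issuing")
	fmt.Printf("%q %v\n", chain, err)
}
```

Keying the walk on an ID map keeps the lookup independent of row order, and the `seen` set means a corrupted `parent_ca_id` loop produces an error instead of an endless walk.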
|
||||
**CRL and OCSP support (M15b):** The Local CA supports DER-encoded X.509 CRL generation served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`) with 24-hour validity. An embedded OCSP responder at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `Content-Type: application/ocsp-response`) returns signed OCSP responses for issued certificates (good/revoked/unknown status). Both endpoints are reachable by relying parties with no certctl API credentials, which is how standard TLS clients, browsers, and hardware appliances consume these resources. Certificates with profile TTL < 1 hour automatically skip CRL/OCSP — expiry is treated as sufficient revocation for short-lived credentials.
|
||||
|
||||
**Extended Key Usage (EKU) support (M27):** The Local CA respects EKU constraints from certificate profiles and adjusts key usage flags accordingly. For S/MIME certificates (emailProtection EKU), it uses `DigitalSignature | ContentCommitment` instead of the TLS default. For TLS certificates (serverAuth/clientAuth EKU), it uses `DigitalSignature | KeyEncipherment`. This enables support for multiple certificate types — TLS, S/MIME, code signing, timestamping — from a single CA.
|
||||
@@ -266,9 +314,9 @@ The connector is registered in the issuer registry under `iss-acme-staging` and
|
||||
|
||||
The cert version must exist in the local store: this means the cert was issued through certctl, not imported. If `GetVersionBySerial` returns `sql.ErrNoRows`, the connector returns an actionable error pointing at the local-store requirement. Revoke-by-serial is therefore only available for ACME certs that certctl issued.
|
||||
|
||||
Reason codes follow RFC 5280 §5.3.1: nil reason maps to `unspecified` (0), and the connector accepts the canonical camelCase form (`keyCompromise`, `cACompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `removeFromCRL`, `privilegeWithdrawn`, `aACompromise`) plus underscore_lower and ALL_CAPS_UNDERSCORE variants. An unknown reason returns an error rather than silently demoting to `unspecified` — operators rely on the reason for compliance reporting (PCI-DSS §3.6, HIPAA §164.312).
|
||||
Reason codes follow RFC 5280 §5.3.1: nil reason maps to `unspecified` (0), and the connector accepts the canonical camelCase form (`keyCompromise`, `cACompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `removeFromCRL`, `privilegeWithdrawn`, `aACompromise`) plus underscore_lower and ALL_CAPS_UNDERSCORE variants. An unknown reason returns an error rather than silently demoting to `unspecified` — operators rely on the reason for audit reporting.
|
||||
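One way to picture the reason-code handling is the Go sketch below. The numeric values are the CRLReason codes from RFC 5280 §5.3.1; the helper `parseReason` and its normalization strategy are hypothetical, not taken from the connector. Note that folding case and dropping underscores is slightly more permissive than the three spellings listed above, and accepting the literal string `unspecified` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// reasonCodes maps normalized reason names to RFC 5280 CRLReason values.
// Value 7 is unassigned in the RFC.
var reasonCodes = map[string]int{
	"unspecified":          0,
	"keycompromise":        1,
	"cacompromise":         2,
	"affiliationchanged":   3,
	"superseded":           4,
	"cessationofoperation": 5,
	"certificatehold":      6,
	"removefromcrl":        8,
	"privilegewithdrawn":   9,
	"aacompromise":         10,
}

// parseReason is a hypothetical helper: nil means unspecified (0); otherwise
// underscores are dropped and case is folded so camelCase, underscore_lower,
// and ALL_CAPS_UNDERSCORE spellings all resolve to the same key.
// Unknown reasons return an error instead of being demoted to unspecified.
func parseReason(reason *string) (int, error) {
	if reason == nil {
		return 0, nil
	}
	key := strings.ToLower(strings.ReplaceAll(*reason, "_", ""))
	code, ok := reasonCodes[key]
	if !ok {
		return 0, fmt.Errorf("unknown revocation reason %q", *reason)
	}
	return code, nil
}

func main() {
	for _, s := range []string{"keyCompromise", "key_compromise", "KEY_COMPROMISE", "bogus"} {
		s := s
		code, err := parseReason(&s)
		fmt.Println(s, code, err)
	}
}
```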
|
||||
Audit reference: `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` Top-10 fix #7.
|
||||
Audit reference: 2026-05-01 issuer coverage audit Top-10 fix #7.
|
||||
|
||||
Location: `internal/connector/issuer/acme/acme.go`, `internal/connector/issuer/acme/dns.go`
|
||||
|
||||
@@ -350,14 +398,14 @@ certctl's OpenSSL adapter `exec`s an operator-supplied script for every certific
|
||||
|
||||
**When you should NOT use the OpenSSL adapter:**
|
||||
|
||||
- Compliance environments (PCI-DSS Level 1, FedRAMP High, HIPAA-regulated PHI handling) where shell-out attack surfaces are formally disallowed by your security policy.
|
||||
- Regulated environments where shell-out attack surfaces are formally disallowed by your security policy.
|
||||
- Multi-tenant certctl-server deployments where tenant-A's script can affect tenant-B's certificates.
|
||||
- Environments without operator review of every script line — trust-on-first-use is the wrong posture for a shell-out.
|
||||
- For these cases, switch to a Go-native issuer adapter (Vault, DigiCert, Sectigo, ACME, AWSACMPCA, GoogleCAS, EJBCA, Entrust, GlobalSign, step-ca) or commission a custom Go-native adapter for your CA (the issuer connector interface in `internal/connector/issuer/interface.go` is small — `IssueCertificate` + `RevokeCertificate` + `GetCACertPEM` + a few stubs).
|
||||
|
||||
**V3-Pro forward path:**
|
||||
|
||||
The hardened OpenSSL adapter (chroot/container by default, env-var allow-list at the adapter layer, signed-script-binary verification, audit-log-on-every-invocation, per-call concurrency bound shared with the API surface) is V3-Pro work. Tracking: `cowork/WORKSPACE-ROADMAP.md` (search "OpenSSL hardened mode").
|
||||
The hardened OpenSSL adapter (chroot/container by default, env-var allow-list at the adapter layer, signed-script-binary verification, audit-log-on-every-invocation, per-call concurrency bound shared with the API surface) is V3-Pro work. Tracking: project roadmap, "OpenSSL hardened mode".
|
||||
|
||||
### Revocation Across Issuers
|
||||
|
||||
@@ -379,7 +427,7 @@ The `GetCACertPEM()` method returns the PEM-encoded CA certificate chain, used b
|
||||
- **step-ca**: Returns error — step-ca serves its own `/root` endpoint for CA distribution.
|
||||
- **OpenSSL/Custom CA**: Returns error — custom script-based CAs have no CA cert access through certctl.
|
||||
|
||||
Note: EST and SCEP are not connectors — they are protocol handlers (`internal/api/handler/est.go` and `internal/api/handler/scep.go`) that delegate certificate issuance to whichever issuer connector is configured via `CERTCTL_EST_ISSUER_ID` or `CERTCTL_SCEP_ISSUER_ID` (or the per-profile `CERTCTL_EST_PROFILE_<NAME>_ISSUER_ID` / `CERTCTL_SCEP_PROFILE_<NAME>_ISSUER_ID` form for multi-endpoint dispatch). Both share a common `internal/pkcs7` package for PKCS#7 response encoding. See the [Architecture Guide](architecture.md#est-server-rfc-7030) for the V2-baseline server and [`Architecture Guide::EST Production Deployment`](architecture.md#est-server-rfc-7030--production-deployment) for the post-2026-04-29 hardening master bundle.
|
||||
Note: EST and SCEP are not connectors — they are protocol handlers (`internal/api/handler/est.go` and `internal/api/handler/scep.go`) that delegate certificate issuance to whichever issuer connector is configured via `CERTCTL_EST_ISSUER_ID` or `CERTCTL_SCEP_ISSUER_ID` (or the per-profile `CERTCTL_EST_PROFILE_<NAME>_ISSUER_ID` / `CERTCTL_SCEP_PROFILE_<NAME>_ISSUER_ID` form for multi-endpoint dispatch). Both share a common `internal/pkcs7` package for PKCS#7 response encoding. See the [Architecture Guide](../architecture.md#est-server-rfc-7030) for the V2-baseline server and [`Architecture Guide::EST Production Deployment`](../architecture.md#est-server-rfc-7030--production-deployment) for the post-2026-04-29 hardening master bundle.
|
||||
|
||||
#### Multi-profile EST dispatch + production hardening
|
||||
|
||||
@@ -398,9 +446,9 @@ A single certctl deploy can publish multiple EST endpoints — one per fleet (la
|
||||
| `CERTCTL_EST_PROFILE_<NAME>_RATE_LIMIT_PER_PRINCIPAL_24H` | No | `0` (disabled) | Sliding-window cap on enrollments per `(CSR.Subject.CN, sourceIP)` pair in any rolling 24h window. Production deploys typically set `3`. |
| `CERTCTL_EST_PROFILE_<NAME>_SERVERKEYGEN_ENABLED` | No | `false` | Publish `POST /.well-known/est/<pathID>/serverkeygen` per RFC 7030 §4.4 (server generates the keypair, returns multipart/mixed with cert + CMS-EnvelopedData-wrapped private key). |
|
||||
|
||||
See [`docs/est.md`](est.md) for the full operator guide — multi-profile setup, WiFi/802.1X + FreeRADIUS recipe, IoT bootstrap recipe, troubleshooting matrix per typed audit-action code, and the threat-model carve-outs (server-keygen heap-residency window, source-IP limiter process-locality, mTLS cross-profile bleed defense).
|
||||
See [`est.md`](../protocols/est.md) for the full operator guide — multi-profile setup, WiFi/802.1X + FreeRADIUS recipe, IoT bootstrap recipe, troubleshooting matrix per typed audit-action code, and the threat-model carve-outs (server-keygen heap-residency window, source-IP limiter process-locality, mTLS cross-profile bleed defense).
|
||||
|
||||
**SCEP RA cert + key (post-2026-04-29):** the SCEP server's RFC 8894 path requires an RA cert/key pair (`CERTCTL_SCEP_RA_CERT_PATH` + `CERTCTL_SCEP_RA_KEY_PATH`, mode 0600) — clients encrypt their CSR to the RA cert's public key per RFC 8894 §3.2.2. Multi-profile deployments configure per-profile pairs via `CERTCTL_SCEP_PROFILES=corp,iot` + `CERTCTL_SCEP_PROFILE_<NAME>_RA_*_PATH`. See [`legacy-est-scep.md`](legacy-est-scep.md#scep-rfc-8894-native-implementation-post-2026-04-29) for the openssl recipe + ChromeOS Admin Console pointer + must-staple per-profile policy.
|
||||
**SCEP RA cert + key (post-2026-04-29):** the SCEP server's RFC 8894 path requires an RA cert/key pair (`CERTCTL_SCEP_RA_CERT_PATH` + `CERTCTL_SCEP_RA_KEY_PATH`, mode 0600); clients encrypt their CSR to the RA cert's public key per RFC 8894 §3.2.2. Multi-profile deployments configure per-profile pairs via `CERTCTL_SCEP_PROFILES=corp,iot` + `CERTCTL_SCEP_PROFILE_<NAME>_RA_*_PATH`. See [`scep-server.md`](../protocols/scep-server.md#scep-rfc-8894-native-implementation-post-2026-04-29) for the openssl recipe + ChromeOS Admin Console pointer + must-staple per-profile policy.
|
||||
|
||||
#### Multi-profile SCEP dispatch
|
||||
|
||||
@@ -415,7 +463,7 @@ A single certctl deploy can publish multiple SCEP endpoints — one per fleet, o
|
||||
| `CERTCTL_SCEP_PROFILE_<NAME>_RA_CERT_PATH` | Yes | — | RA cert PEM path (mode 0600 enforced). |
| `CERTCTL_SCEP_PROFILE_<NAME>_RA_KEY_PATH` | Yes | — | RA private key PEM path (mode 0600 enforced). |
|
||||
|
||||
See [`legacy-est-scep.md`](legacy-est-scep.md#scep-rfc-8894-native-implementation-post-2026-04-29) for the full per-profile env-var list and the mTLS / Intune extensions.
|
||||
See [`scep-server.md`](../protocols/scep-server.md#scep-rfc-8894-native-implementation-post-2026-04-29) for the full per-profile env-var list and the mTLS / Intune extensions.
|
||||
|
||||
#### SCEP mTLS sibling route (opt-in)
|
||||
|
||||
@@ -426,11 +474,11 @@ For deploys that already have a previously-issued certctl client cert and want a
|
||||
| `CERTCTL_SCEP_PROFILE_<NAME>_MTLS_ENABLED` | No | `false` | Set `true` to publish `/scep-mtls/<pathID>` alongside `/scep/<pathID>`. |
| `CERTCTL_SCEP_PROFILE_<NAME>_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH` | When MTLS enabled | — | PEM bundle of CAs that may sign client certs. Preflight refuses a missing/empty bundle. |
|
||||
|
||||
See [`legacy-est-scep.md`](legacy-est-scep.md#scep-mtls-sibling-route-phase-65) for the operator recipe + threat-model rationale.
|
||||
See [`scep-server.md`](../protocols/scep-server.md#scep-mtls-sibling-route-phase-65) for the operator recipe + threat-model rationale.
|
||||
|
||||
#### Microsoft Intune Certificate Connector dispatcher
|
||||
|
||||
When a profile has `CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true`, certctl validates the Microsoft Intune Certificate Connector's signed-challenge JWS natively as a drop-in NDES replacement (the Intune Connector documents itself as RFC 8894-compliant and works against any RFC 8894 SCEP server). The dispatcher walks parse → JWS signature verify (RS256 + ES256, alg=none rejected) → version dispatch → time bounds with ±tolerance → audience pin → CSR ↔ claim binding → replay cache → per-device rate limit → optional V3-Pro compliance hook. The trust anchor file is reloaded on `SIGHUP` (operator rotates the on-disk PEM, then `kill -HUP <certctl-pid>`); a parse failure during reload keeps the OLD pool so a half-rotation doesn't take Intune down.
|
||||
When a profile has `CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true`, certctl validates the Microsoft Intune Certificate Connector's signed-challenge JWS natively as a drop-in NDES replacement (the Intune Connector documents itself as RFC 8894-conformant and works against any RFC 8894 SCEP server). The dispatcher walks parse → JWS signature verify (RS256 + ES256, alg=none rejected) → version dispatch → time bounds with ±tolerance → audience pin → CSR ↔ claim binding → replay cache → per-device rate limit → optional V3-Pro device-state hook. The trust anchor file is reloaded on `SIGHUP` (operator rotates the on-disk PEM, then `kill -HUP <certctl-pid>`); a parse failure during reload keeps the OLD pool so a half-rotation doesn't take Intune down.
|
||||
|
||||
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
|
||||
@@ -441,11 +489,11 @@ When a profile has `CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true`, certctl va
|
||||
| `CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CLOCK_SKEW_TOLERANCE` | No | `60s` | ±tolerance on iat/exp checks. Raise on poorly-NTP-synced fleets, lower to enforce strict time. Refused at boot when ≥ `INTUNE_CHALLENGE_VALIDITY`. |
| `CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_PER_DEVICE_RATE_LIMIT_24H` | No | `3` | Max enrollments per `(claim.Subject, claim.Issuer)` in any rolling 24h window. Zero disables. |
|
||||
|
||||
See [`scep-intune.md`](scep-intune.md) for the full deployment guide — NDES + EJBCA migration playbook, Intune SCEP profile field mapping, trust-anchor extraction recipe, monitoring + Prometheus alert thresholds, and the Microsoft Learn citations operators paste into procurement-team requests.
|
||||
See [`scep-intune.md`](../protocols/scep-intune.md) for the full deployment guide — NDES + EJBCA migration playbook, Intune SCEP profile field mapping, trust-anchor extraction recipe, monitoring + Prometheus alert thresholds, and the Microsoft Learn citations operators paste into procurement-team requests.
|
||||
|
||||
#### SCEP probe in network scanner
|
||||
|
||||
The Network Scans GUI surface includes a one-click "Probe SCEP" form that runs a capability + posture check against any reachable SCEP server URL — `GetCACaps` + `GetCACert` (NEVER `PKCSReq`) so the probe is read-only and safe to run against production endpoints. Result fields surface advertised caps (POSTPKIOperation, SHA-256, SHA-512, AES, SCEPStandard, Renewal), CA cert subject + issuer + algorithm + days-to-expiry + chain length, and a probe duration. Results persist to `scep_probe_results` (migration `000021`) and the probe history is paginated under `GET /api/v1/network-scan/scep-probes`. Useful for pre-migration assessment ("what does the existing NDES advertise?") and compliance-posture audits.
|
||||
The Network Scans GUI surface includes a one-click "Probe SCEP" form that runs a capability + posture check against any reachable SCEP server URL — `GetCACaps` + `GetCACert` (NEVER `PKCSReq`) so the probe is read-only and safe to run against production endpoints. Result fields surface advertised caps (POSTPKIOperation, SHA-256, SHA-512, AES, SCEPStandard, Renewal), CA cert subject + issuer + algorithm + days-to-expiry + chain length, and a probe duration. Results persist to `scep_probe_results` (migration `000021`) and the probe history is paginated under `GET /api/v1/network-scan/scep-probes`. Useful for pre-migration assessment ("what does the existing NDES advertise?") and posture review.
|
||||
|
||||
| Endpoint | Auth | Description |
|----------|------|-------------|
|
||||
@@ -490,9 +538,9 @@ The DigiCert connector integrates with DigiCert's CertCentral REST API for order
|
||||
| `CERTCTL_DIGICERT_ORG_ID` | — | DigiCert organization ID |
| `CERTCTL_DIGICERT_PRODUCT_TYPE` | `ssl_basic` | Certificate product (e.g., `ssl_basic`, `ssl_plus`, `ssl_ev`) |
| `CERTCTL_DIGICERT_BASE_URL` | `https://www.digicert.com/services/v2` | DigiCert API base URL |
|
||||
| `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. See [docs/async-polling.md](async-polling.md). |
|
||||
| `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. See [docs/reference/protocols/async-ca-polling.md](../protocols/async-ca-polling.md). |
|
||||
|
||||
The connector submits certificate orders to DigiCert's `/order/certificate/create` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by DigiCert) and poll-based completion. `GetOrderStatus` runs bounded internal polling (5s/15s/45s/2m/5m capped, ±20% jitter, default 10-minute deadline) — see [async-polling.md](async-polling.md).
|
||||
The connector submits certificate orders to DigiCert's `/order/certificate/create` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by DigiCert) and poll-based completion. `GetOrderStatus` runs bounded internal polling (5s/15s/45s/2m/5m capped, ±20% jitter, default 10-minute deadline); see [async-ca-polling.md](../protocols/async-ca-polling.md).
|
||||
|
||||
**Authentication:** API key passed via `X-DC-DEVKEY` header, with organization ID in request body.
|
||||
|
||||
@@ -515,9 +563,9 @@ The Sectigo connector integrates with Sectigo Certificate Manager's REST API for
|
||||
| `CERTCTL_SECTIGO_CERT_TYPE` | — | Certificate type ID (integer, from `/ssl/v1/types`) |
| `CERTCTL_SECTIGO_TERM` | `365` | Certificate validity in days |
| `CERTCTL_SECTIGO_BASE_URL` | `https://cert-manager.com/api` | Sectigo API base URL |
|
||||
| `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. The `collectNotReady` sentinel (cert approved but not yet retrievable) rides the same backoff schedule. See [docs/async-polling.md](async-polling.md). |
|
||||
| `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. The `collectNotReady` sentinel (cert approved but not yet retrievable) rides the same backoff schedule. See [docs/reference/protocols/async-ca-polling.md](../protocols/async-ca-polling.md). |
|
||||
|
||||
The connector submits certificate enrollments to Sectigo's `/ssl/v1/enroll` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by Sectigo) and poll-based completion. `GetOrderStatus` runs bounded internal polling — see [async-polling.md](async-polling.md).
|
||||
The connector submits certificate enrollments to Sectigo's `/ssl/v1/enroll` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by Sectigo) and poll-based completion. `GetOrderStatus` runs bounded internal polling; see [async-ca-polling.md](../protocols/async-ca-polling.md).
|
||||
|
||||
**Authentication:** Three custom headers on every request — `customerUri`, `login`, and `password`.
|
||||
|
||||
@@ -622,7 +670,7 @@ Entrust CA Gateway REST API with mutual TLS (mTLS) client certificate authentica
|
||||
| `CERTCTL_ENTRUST_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
| `CERTCTL_ENTRUST_CA_ID` | Yes | — | Certificate Authority ID (from `GET /certificate-authorities`) |
| `CERTCTL_ENTRUST_PROFILE_ID` | No | — | Optional enrollment profile ID |
|
||||
| `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. Approval-pending workflows where humans approve enrollments should bump to `86400` (24h) so a single tick can wait through the approval window. See [docs/async-polling.md](async-polling.md). |
|
||||
| `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. Approval-pending workflows where humans approve enrollments should bump to `86400` (24h) so a single tick can wait through the approval window. See [docs/reference/protocols/async-ca-polling.md](../protocols/async-ca-polling.md). |
|
||||
|
||||
**Authentication:** Mutual TLS — the client certificate and key are loaded via `tls.LoadX509KeyPair()` and attached to the HTTP transport. No API key or token required.
|
||||
|
||||
@@ -646,7 +694,7 @@ GlobalSign Atlas High Volume CA REST API with dual authentication: mTLS for the
|
||||
| `CERTCTL_GLOBALSIGN_CLIENT_CERT_PATH` | Yes | — | Path to mTLS client certificate PEM |
| `CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
| `CERTCTL_GLOBALSIGN_SERVER_CA_PATH` | No | system trust store | PEM bundle used to verify the Atlas API server certificate. Set this for private/lab Atlas deployments whose server TLS chain is not in the host's default trust bundle. |
|
||||
| `CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. GlobalSign tracks orders by serial number rather than order ID; the polling shape is identical. See [docs/async-polling.md](async-polling.md). |
|
||||
| `CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. GlobalSign tracks orders by serial number rather than order ID; the polling shape is identical. See [docs/reference/protocols/async-ca-polling.md](../protocols/async-ca-polling.md). |
|
||||
|
||||
**Authentication:** Dual — mTLS client certificate for TLS handshake plus `X-API-Key` and `X-API-Secret` headers on every request.
|
||||
|
||||
@@ -794,16 +842,16 @@ issued, SCEP-issued certs).
|
||||
|
||||
See:
|
||||
|
||||
- [ACME Server Reference](./acme-server.md) — env-var reference,
|
||||
- [ACME Server Reference](../protocols/acme-server.md) — env-var reference,
|
||||
endpoints, auth-mode decision tree, RFC 8555 conformance statement,
|
||||
troubleshooting, FAQ.
|
||||
- [cert-manager Walkthrough](./acme-cert-manager-walkthrough.md) — kind
|
||||
- [cert-manager Walkthrough](../../migration/acme-from-cert-manager.md) — kind
|
||||
→ cert-manager → certctl-server → Certificate flow.
|
||||
- [Caddy Walkthrough](./acme-caddy-walkthrough.md) — Caddyfile `acme_ca`
|
||||
- [Caddy Walkthrough](../../migration/acme-from-caddy.md) — Caddyfile `acme_ca`
|
||||
+ trust configuration.
|
||||
- [Traefik Walkthrough](./acme-traefik-walkthrough.md) — `certificatesResolvers`
|
||||
- [Traefik Walkthrough](../../migration/acme-from-traefik.md) — `certificatesResolvers`
|
||||
+ `serversTransport.rootCAs`.
|
||||
- [Threat Model](./acme-server-threat-model.md) — JWS forgery
|
||||
- [Threat Model](../protocols/acme-server-threat-model.md) — JWS forgery
|
||||
resistance, nonce store integrity, HTTP-01 SSRF, DNS-01 cache
|
||||
posture, TLS-ALPN-01 chain-not-validated rationale, rate-limit
|
||||
tuning, audit trail.
|
||||
@@ -1277,7 +1325,7 @@ certctl's SSH connector dials each target with `HostKeyCallback: ssh.InsecureIgn
|
||||
**When you should NOT use the SSH connector:**
|
||||
|
||||
- Deploying to **unknown / dynamic / multi-tenant** hosts where the IP-to-hostname binding isn't operator-controlled.
|
||||
- Environments with strict **regulatory MITM-resistance** requirements (PCI-DSS Level 1, FedRAMP High, etc.) — the inline-comment "out of scope" framing doesn't satisfy compliance auditors who want documented host-key verification at the connector level.
|
||||
- Environments with strict **regulatory MITM-resistance** requirements where the inline-comment "out of scope" framing doesn't satisfy reviewers who want documented host-key verification at the connector level.
|
||||
- For these cases, switch to a different connector (Kubernetes Secrets, WinCertStore, F5 with iControl REST under operator-managed cert pinning) **OR** layer a custom `SSHClient` with full `known_hosts` validation per the mitigations above.
|
||||
|
||||
**V3-Pro forward path:**
|
||||
@@ -1498,7 +1546,7 @@ The ARN updates in place across renewals (ACM `ImportCertificate` is upsert-styl
|
||||
- The cert key is held only in agent memory during the import call; never written to disk.
- Every imported ACM cert is tagged with `certctl-managed-by=certctl` + `certctl-certificate-id=<mc-id>` for forensic traceability.
- Failed imports trigger automatic rollback to the snapshotted previous cert; both outcomes are surfaced via Prometheus.
|
||||
- The minimum IAM policy is 5 actions on `arn:aws:acm:*:*:certificate/*`; CloudTrail captures every API call for compliance audits.
|
||||
- The minimum IAM policy is 5 actions on `arn:aws:acm:*:*:certificate/*`; CloudTrail captures every API call for audit.
|
||||
|
||||
**ValidateOnly contract.** ACM has no dry-run API for `ImportCertificate`; `ValidateOnly` returns `target.ErrValidateOnlyNotSupported` per the deploy-hardening I Phase 3 sentinel contract. Operators preview deploys via `ValidateConfig` + `aws acm describe-certificate --certificate-arn <arn>` against the current ARN.
|
||||
|
||||
@@ -1580,7 +1628,7 @@ Application Gateway / Front Door reference the cert by KID URI; certctl rotates
|
||||
- The cert key is held only in agent memory during the PFX wrap + import call; never written to disk.
- Every imported Key Vault cert is tagged with `certctl-managed-by=certctl` + `certctl-certificate-id=<mc-id>` for forensic traceability.
- Failed imports trigger automatic rollback by re-importing the snapshotted previous version's bytes; both outcomes are surfaced via Prometheus.
|
||||
- The minimum RBAC role is 3 data-plane actions; Activity Log captures every API call for compliance audits.
|
||||
- The minimum RBAC role is 3 data-plane actions; Activity Log captures every API call for audit.
|
||||
|
||||
**ValidateOnly contract.** Key Vault has no dry-run API; `ValidateOnly` returns `target.ErrValidateOnlyNotSupported`. Operators preview deploys via `ValidateConfig` + `az keyvault certificate show --vault-name <name> --name <cert>`.
|
||||
|
||||
@@ -1663,7 +1711,7 @@ ORDER BY created_at DESC;
|
||||
|
||||
Each row corresponds to one fired alert. The `channel` metadata field tells you which notifier ran. Combined with the Prometheus `certctl_expiry_alerts_total{result="failure"}` counter, you have full forensic visibility on every dispatch attempt.
|
||||
|
||||
**V3-Pro forward path.** Per-owner / per-team channel routing (route the Production-CDN cert's alerts to its dedicated owner's PagerDuty service, the Internal-API cert's alerts to a different one), calendar-aware suppression (no T-30 informational alerts on weekends for non-on-call teams), and escalation chains (T-1 unanswered for 30m → escalate to manager) are tracked on `cowork/WORKSPACE-ROADMAP.md` under "Adapter hardening" → "Multi-channel expiry alerts: per-owner routing".
|
||||
**V3-Pro forward path.** Per-owner / per-team channel routing (route the Production-CDN cert's alerts to its dedicated owner's PagerDuty service, the Internal-API cert's alerts to a different one), calendar-aware suppression (no T-30 informational alerts on weekends for non-on-call teams), and escalation chains (T-1 unanswered for 30m → escalate to manager) are tracked on the project roadmap under "Adapter hardening" → "Multi-channel expiry alerts: per-owner routing".
|
||||
|
||||
### Email (SMTP) Notifier
|
||||
|
||||
@@ -1699,7 +1747,7 @@ The digest HTML template includes:
|
||||
- Expiring certificates table (color-coded by urgency: 7d, 14d, 30d)
|
||||
- Auto-refresh and responsive email layout
|
||||
|
||||
**Scheduler Integration:** The opt-in digest scheduler loop runs on a configurable interval (default 24 hours). It does NOT run on startup; it waits for the first scheduled tick. Operation timeout is 5 minutes. Each loop execution is guarded by `sync/atomic.Bool` idempotency. See `docs/architecture.md` for the full scheduler topology (12 loops, 8 always-on + 4 opt-in).
|
||||
**Scheduler Integration:** The opt-in digest scheduler loop runs on a configurable interval (default 24 hours). It does NOT run on startup; it waits for the first scheduled tick. Operation timeout is 5 minutes. Each loop execution is guarded by `sync/atomic.Bool` idempotency. See `docs/reference/architecture.md` for the full scheduler topology (12 loops, 8 always-on + 4 opt-in).
|
||||
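The `sync/atomic.Bool` guard mentioned above is commonly written as a compare-and-swap around the loop body, so a tick that fires while the previous run is still in flight is skipped. The Go sketch below shows that pattern under stated assumptions: `digestLoop` and `tick` are hypothetical names, and the real scheduler's structure is described only in the architecture guide.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// digestLoop is a hypothetical stand-in for the scheduler loop state.
type digestLoop struct {
	running atomic.Bool
	runs    int // only touched while the guard is held
}

// tick runs job unless a previous tick is still in flight, in which case it
// returns false without doing any work.
func (d *digestLoop) tick(job func()) bool {
	if !d.running.CompareAndSwap(false, true) {
		return false
	}
	defer d.running.Store(false)
	d.runs++
	job()
	return true
}

func main() {
	var d digestLoop
	nested := false
	ok := d.tick(func() {
		// A tick that fires while the job is running is skipped.
		nested = d.tick(func() {})
	})
	fmt.Println(ok, nested, d.runs)
}
```

`atomic.Bool` requires Go 1.19 or later; on older toolchains the same guard is usually written with `atomic.CompareAndSwapInt32`.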
|
||||
Configuration:
|
||||
|
||||
@@ -1727,7 +1775,7 @@ API Endpoints:
|
||||
>
|
||||
> Then pass `--cacert "$CA"` (or `-k` for one-off smoke tests, never in
|
||||
> production). The same pattern is documented in
|
||||
> [`quickstart.md`](quickstart.md). Pre-U-2 these examples used `http://`
|
||||
> [`quickstart.md`](../../getting-started/quickstart.md). Pre-U-2 these examples used `http://`
|
||||
> and silently failed against the HTTPS listener; post-U-2 they speak
|
||||
> HTTPS with the operator-managed CA bundle.
|
||||
|
||||
@@ -1979,7 +2027,7 @@ curl --cacert "$CA" -s -X DELETE https://localhost:8443/api/v1/network-scan-targ
|
||||
|
||||
### Scheduler Integration
|
||||
|
||||
When `CERTCTL_NETWORK_SCAN_ENABLED=true`, the server runs the opt-in network scanner scheduler loop alongside the always-on loops (renewal, jobs, job retry, job timeout, agent health, notifications, notification retry, short-lived expiry). It scans all enabled targets at the configured interval (default 6h). Each target tracks `last_scan_at`, `last_scan_duration_ms`, and `last_scan_certs_found` for monitoring scan health. See `docs/architecture.md` for the full 12-loop scheduler topology.
|
||||
When `CERTCTL_NETWORK_SCAN_ENABLED=true`, the server runs the opt-in network scanner scheduler loop alongside the always-on loops (renewal, jobs, job retry, job timeout, agent health, notifications, notification retry, short-lived expiry). It scans all enabled targets at the configured interval (default 6h). Each target tracks `last_scan_at`, `last_scan_duration_ms`, and `last_scan_certs_found` for monitoring scan health. See `docs/reference/architecture.md` for the full 12-loop scheduler topology.
|
||||
|
||||
### Use Cases
|
||||
|
||||
@@ -2037,7 +2085,7 @@ Source path format: `gcp-sm://{project}/{secret-name}`. Sentinel agent: `cloud-g
|
||||
|
||||
### Cloud Discovery Scheduler
|
||||
|
||||
All enabled cloud sources run on a shared opt-in cloud discovery scheduler loop (see `docs/architecture.md` for the full 12-loop scheduler topology). The interval is configurable:
|
||||
All enabled cloud sources run on a shared opt-in cloud discovery scheduler loop (see `docs/reference/architecture.md` for the full 12-loop scheduler topology). The interval is configurable:
|
||||
|
||||
| Variable | Description | Default |
|
||||
|---|---|---|
|
||||
@@ -2048,6 +2096,6 @@ The loop runs immediately on startup and then on each tick. Each source runs seq
|
||||
|
||||
## What's Next
|
||||
|
||||
- [Architecture Guide](architecture.md) — Understanding the full system design
|
||||
- [Quick Start](quickstart.md) — Get certctl running locally
|
||||
- [Advanced Demo](demo-advanced.md) — See the full certificate lifecycle in action
|
||||
- [Architecture Guide](../architecture.md) — Understanding the full system design
|
||||
- [Quick Start](../../getting-started/quickstart.md) — Get certctl running locally
|
||||
- [Advanced Demo](../../getting-started/advanced-demo.md) — See the full certificate lifecycle in action
|
||||
@@ -0,0 +1,189 @@
|
||||
# Java Keystore (JKS / PKCS#12) Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Java Keystore target
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, atomic deploy primitive shared across all
|
||||
> targets), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Java Keystore connector deploys certificates to JKS or
|
||||
PKCS#12 keystores via the `keytool` CLI. This enables TLS cert
|
||||
deployment for Tomcat, Jetty, Kafka, Elasticsearch, and any
|
||||
JVM-based service.
|
||||
|
||||
Flow: PEM → temp PKCS#12 → `keytool -importkeystore` into the
|
||||
target keystore. The flow is engineered for atomicity and
|
||||
rollback, not just convenience.
|
||||
|
||||
Implementation lives at `internal/connector/target/javakeystore/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Java Keystore connector when:
|
||||
|
||||
- The target is a JVM-based service (Tomcat, Jetty, Kafka,
|
||||
Elasticsearch, ZooKeeper) that reads TLS material from a
|
||||
keystore file.
|
||||
- You need PKCS#12 or JKS format support; the connector handles
|
||||
both.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- The JVM service has been re-fronted with a non-Java reverse
|
||||
proxy (NGINX, HAProxy) that handles TLS termination — deploy
|
||||
to the proxy instead.
|
||||
- The service uses PKCS#11 or a hardware token rather than a
|
||||
keystore file — that's outside this connector's scope.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"keystore_path": "/opt/tomcat/conf/keystore.p12",
|
||||
"keystore_password": "changeit",
|
||||
"keystore_type": "PKCS12",
|
||||
"alias": "server",
|
||||
"reload_command": "systemctl restart tomcat"
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|---|---|---|
|
||||
| `keystore_path` | (required) | Absolute path to the keystore file |
|
||||
| `keystore_password` | (required) | Keystore password |
|
||||
| `keystore_type` | `"PKCS12"` | `"PKCS12"` or `"JKS"` |
|
||||
| `alias` | `"server"` | Key entry alias in the keystore |
|
||||
| `reload_command` | — | Optional command to run after keystore update |
|
||||
| `create_keystore` | `true` | Create keystore if it doesn't exist |
|
||||
| `keytool_path` | `"keytool"` | Override keytool binary path |
|
||||
| `backup_retention` | `3` | Number of `.certctl-bak.<unix-nanos>.p12` snapshot files to keep after a successful deploy. `0` means use the default of 3; `-1` opts out of pruning entirely. |
|
||||
| `backup_dir` | `dirname(keystore_path)` | Override directory where rollback snapshots are written and pruned from. Defaults to the keystore's own directory so snapshots land on the same filesystem. |
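
The defaulting rules in the table can be sketched as follows. This is a minimal illustration in Python, not the connector's Go code; the function name `apply_defaults` and the dict-based config shape are assumptions made for the example.

```python
import os.path

# Defaults copied from the field table above. The dict layout is illustrative.
DEFAULTS = {
    "keystore_type": "PKCS12",
    "alias": "server",
    "create_keystore": True,
    "keytool_path": "keytool",
    "backup_retention": 3,
}


def apply_defaults(cfg: dict) -> dict:
    """Return a copy of cfg with the documented defaults filled in."""
    for required in ("keystore_path", "keystore_password"):
        if not cfg.get(required):
            raise ValueError(f"{required} is required")
    out = {**DEFAULTS, **cfg}
    if out["keystore_type"] not in ("PKCS12", "JKS"):
        raise ValueError("keystore_type must be PKCS12 or JKS")
    # 0 means "use the default of 3"; -1 opts out of pruning and is kept as-is.
    if out["backup_retention"] == 0:
        out["backup_retention"] = 3
    # Snapshots default to the keystore's own directory.
    out.setdefault("backup_dir", os.path.dirname(out["keystore_path"]))
    return out
```

Keeping `-1` distinct from `0` matters here: `0` is treated as "unset", while `-1` is an explicit opt-out.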
|
||||
|
||||
## Atomic-rollback contract (Bundle 8)
|
||||
|
||||
The deploy flow is **snapshot → delete → import → reload**.
|
||||
|
||||
Before the irreversible `keytool -delete` step (which removes the
|
||||
existing alias from the keystore), the connector runs `keytool
|
||||
-exportkeystore` to write a sibling `.certctl-bak.<unix-nanos>.p12`
|
||||
file containing the prior alias.
|
||||
|
||||
If the subsequent `keytool -importkeystore` fails for any reason,
|
||||
the rollback path runs `keytool -delete` (best-effort cleanup of
|
||||
any partial alias the failed import created) followed by
|
||||
`keytool -importkeystore` from the snapshot PFX, restoring the
|
||||
keystore to its pre-deploy state.
|
||||
|
||||
If both the import AND the rollback fail, the connector returns
|
||||
an operator-actionable wrapped error containing both error
|
||||
strings AND the snapshot path so the operator can manually
|
||||
`keytool -importkeystore` from the `.p12` file to recover.
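
The control flow described above can be sketched as a small state machine. This is a hedged Python illustration only: the five callables stand in for the keytool invocations and are hypothetical, and the behaviour when an import fails with no snapshot available is an assumption, since this page does not describe it.

```python
from typing import Callable, Optional


class DeployError(Exception):
    """Operator-facing error that carries every failure plus the snapshot path."""


def deploy(snapshot: Callable[[], Optional[str]],
           delete_alias: Callable[[], None],
           import_new: Callable[[], None],
           restore: Callable[[str], None],
           reload: Callable[[], None]) -> None:
    # snapshot() returns the backup path, or None when there is nothing to
    # back up (no keystore yet, or the alias is absent).
    backup = snapshot()
    delete_alias()  # irreversible step; pass a no-op when no alias exists
    try:
        import_new()
    except Exception as import_err:
        if backup is None:
            # Assumption: nothing to restore, so only the import error surfaces.
            raise DeployError(f"import failed, no snapshot to restore: {import_err}")
        try:
            restore(backup)  # best-effort delete, then re-import the snapshot
        except Exception as rollback_err:
            raise DeployError(
                f"import failed ({import_err}); rollback failed ({rollback_err}); "
                f"snapshot kept at {backup}")
        raise DeployError(f"import failed, rolled back from {backup}: {import_err}")
    reload()  # only reached after a successful import
```

The double-failure branch is the one that matters operationally: its message includes the snapshot path so recovery can be done by hand.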
|
||||
|
||||
Successful deploys prune older `.certctl-bak.*.p12` files beyond
|
||||
the configured `backup_retention` count; pruning sorts by file
|
||||
ModTime and removes the oldest entries first. Operators that wire
|
||||
their own archival/rotation logic can opt out via
|
||||
`backup_retention: -1`.
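
A minimal sketch of the pruning rule, assuming snapshot files match the glob `*.certctl-bak.*.p12` inside the backup directory (the exact file-name prefix is not stated on this page, so the pattern is an assumption). The function name is hypothetical.

```python
from pathlib import Path


def prune_snapshots(backup_dir: str, retention: int) -> list:
    """Delete the oldest snapshots beyond the retention count; return their names."""
    if retention == -1:  # operator opted out of pruning
        return []
    keep = 3 if retention == 0 else retention  # 0 falls back to the default of 3
    snaps = sorted(Path(backup_dir).glob("*.certctl-bak.*.p12"),
                   key=lambda p: p.stat().st_mtime,
                   reverse=True)  # newest first, ordered by modification time
    removed = []
    for old in snaps[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed
```

Sorting by modification time rather than by the nanosecond suffix mirrors the ModTime-based ordering described above.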
|
||||
|
||||
First-time deploys (no keystore file exists at the configured
|
||||
path) skip the snapshot phase entirely — there's nothing to roll
|
||||
back to. The same is true for "alias-not-present-in-existing-
|
||||
keystore" deploys: `keytool -exportkeystore` returns "alias does
|
||||
not exist" which the connector recognises as a normal first-
|
||||
time-on-existing-keystore signal, not an outage.
|
||||
|
||||
## Operator playbook: keytool argv password exposure
|
||||
|
||||
Java's `keytool` accepts the keystore password via the
|
||||
`-storepass` argv flag — there is no stdin or file-based password
|
||||
mode in OpenJDK keytool. While the keytool subprocess is running,
|
||||
the password is visible in `ps(1)` output to any user on the same
|
||||
host who can read `/proc/<pid>/cmdline`. **This is a standard
|
||||
keytool limitation, not a certctl-specific issue**, but operators
|
||||
in regulated environments should know about it.
|
||||
|
||||
### What this means in practice
|
||||
|
||||
- The password is visible for the duration of each keytool
|
||||
invocation (typically <1s on modern hardware; the connector
|
||||
runs 2-4 keytool calls per deploy: snapshot, optional
|
||||
pre-import delete, import, optional rollback).
|
||||
- A local user with shell access on the agent host who polls
|
||||
`ps -ef` aggressively can capture the password.
|
||||
- The exposure is local to the agent host; remote attackers
|
||||
without shell access cannot see it.
|
||||
- The same applies to the snapshot's transient `-deststorepass`
|
||||
(which mirrors the operator's keystore password by design —
|
||||
see "Why the snapshot reuses the keystore password" below).
|
||||
|
||||
### Mitigations
|
||||
|
||||
Layer one or more depending on threat model:
|
||||
|
||||
- **Restrict shell access to the agent host.** Only the certctl
|
||||
agent's service account should have a login shell. Other admins
|
||||
SSH to a bastion that doesn't host the agent.
|
||||
- **Use Linux user namespaces or AppArmor** to deny `ps`-
|
||||
visibility into the keytool subprocess for non-root users.
|
||||
systemd's `ProtectKernelTunables=yes` + `ProtectProc=invisible`
|
||||
(kernel 5.8+) hides `/proc/<pid>` from non-owner users.
|
||||
- **Run the certctl agent in a single-purpose container** so only
|
||||
the agent's processes are visible to anyone who execs into the
|
||||
container. The host's `ps` doesn't see container internals if
|
||||
proper PID-namespace isolation is configured.
|
||||
- **Rotate the keystore password post-deployment.** For
|
||||
high-security environments where the brief exposure is
|
||||
unacceptable, the rotation can itself be automated via a
|
||||
post-deploy hook running `keytool -storepasswd`. The certctl
|
||||
`reload_command` is the natural place for this; just be aware
|
||||
the new password must be propagated to whatever service reads
|
||||
the keystore (Tomcat's `server.xml`, Kafka's
|
||||
`kafka.properties`, etc.).
|
||||
- **For FIPS environments**, use the `BCFKS` (BouncyCastle FIPS)
|
||||
keystore type which supports stronger password-derivation. Same
|
||||
argv-exposure caveat applies; the keystore-format change
|
||||
doesn't affect how keytool receives the password.
|
||||
|
||||
For a fundamentally different password-handling model, switch to
|
||||
a non-Java target (e.g. PEM-on-disk via the SSH connector + a
|
||||
JCA-shim like `tomcat-native` reading PEMs directly) or a
|
||||
PKCS#11 keystore (where the password is supplied to the cryptoki
|
||||
library, not via argv).
|
||||
|
||||
### Why the snapshot reuses the keystore password
|
||||
|
||||
The snapshot's `keytool -exportkeystore` writes a PKCS#12 file
|
||||
under a `-deststorepass`. The connector reuses the operator's
|
||||
`keystore_password` for this rather than generating a separate
|
||||
transient password. Two reasons:
|
||||
|
||||
1. The operator already trusts the connector with this secret,
|
||||
so the surface area doesn't grow.
|
||||
2. The rollback's matching `keytool -importkeystore` needs to
|
||||
know the password too, and threading a second random
|
||||
password through the in-memory state machine adds complexity
|
||||
(and another argv-exposure window) for no security gain.
|
||||
|
||||
If you rotate the keystore password between deploys, the
|
||||
rollback may fail to read the snapshot — keep stale
|
||||
`.certctl-bak.*.p12` files on disk until the rotation completes,
|
||||
and clean them up manually if rotation invalidates them.
|
||||
|
||||
## Security baseline
|
||||
|
||||
- Reload commands validated against shell injection via
|
||||
`validation.ValidateShellCommand()`.
|
||||
- Alias validated against injection (alphanumeric, hyphens,
|
||||
underscores only).
|
||||
- Path traversal prevention on keystore path.
|
||||
- Transient PKCS#12 temp file cleaned up after import (even on
|
||||
error).
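
The alias rule in the list above (alphanumeric, hyphens, underscores only) can be expressed as a single anchored pattern. This Python sketch is illustrative; the function name is hypothetical and the real check lives in the Go code.

```python
import re

# Letters, digits, hyphen, underscore; at least one character.
_ALIAS_RE = re.compile(r"[A-Za-z0-9_-]+")


def valid_alias(alias: str) -> bool:
    # fullmatch rejects a trailing newline, which a "$"-anchored search would accept.
    return _ALIAS_RE.fullmatch(alias) is not None
```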
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [Windows Certificate Store](wincertstore.md) — comparable cert-store deploy on Windows
|
||||
- [SSH agentless](ssh.md) — alternative when the JVM target is reachable via SSH and you'd rather drop PEM files than maintain a keystore
|
||||
@@ -1,6 +1,11 @@
|
||||
# Kubernetes Secrets Connector — Operator Deep-Dive
|
||||
|
||||
> Per Phase 14 of the deploy-hardening II master bundle.
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. For the
|
||||
> connector-development context (interface contract, registry, atomic
|
||||
> deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
@@ -113,5 +118,5 @@ update, then re-apply if desired.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
||||
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
||||
- [Atomic deploy + post-verify + rollback](../deployment-model.md)
|
||||
- [Vendor compatibility matrix](../vendor-matrix.md)
|
||||
@@ -0,0 +1,169 @@
|
||||
# Local CA Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Local CA issuer. For the
|
||||
> connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Local CA issuer signs certificates using Go's `crypto/x509`
|
||||
library directly inside certctl-server. There is no external CA
|
||||
service involved — certctl owns the signing key and emits
|
||||
certificates synchronously.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/local/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Local CA when:
|
||||
|
||||
- You're standing up an internal-only PKI and don't want to operate
|
||||
a separate CA service (Vault, step-ca, EJBCA).
|
||||
- You want certctl to be the single point of administration:
|
||||
signing key, profile policy, CRL and OCSP responder, and
|
||||
lifecycle automation all live in one process.
|
||||
- You want sub-CA mode to chain into an enterprise root (ADCS,
|
||||
HSM-backed root, or another upstream CA) so existing trust
|
||||
stores validate certctl-issued leaves automatically.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You need a public-trust certificate — the Local CA is internal
|
||||
only. Use ACME or DigiCert / Sectigo for public trust.
|
||||
- You want signing material backed by an HSM or cloud KMS — that
|
||||
is on the roadmap (the `internal/crypto/signer/` driver
|
||||
abstraction exists; HSM, cloud KMS, and SSH-CA drivers don't
|
||||
yet ship). Until those drivers ship, sub-CA mode pointing at a
|
||||
hardware-protected root is the closest production posture.
|
||||
|
||||
## Modes
|
||||
|
||||
### Self-signed mode (default)
|
||||
|
||||
Creates a CA on first use (in memory), issues certificates with
|
||||
proper serial numbers, validity periods, SANs, and key usage
|
||||
extensions. Designed for development and demos — certificates are
|
||||
self-signed and not trusted by browsers without operator-side
|
||||
trust-store work.
|
||||
|
||||
### Sub-CA mode (production)
|
||||
|
||||
Loads a CA certificate and private key from disk
|
||||
(`CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH`). The CA cert was
|
||||
signed by an upstream CA (e.g. ADCS), so all issued certificates
|
||||
chain to the enterprise root trust hierarchy. Clients that
|
||||
already trust the enterprise root automatically trust
|
||||
certctl-issued certs.
|
||||
|
||||
Supports RSA, ECDSA, and PKCS#8 key formats. If the paths are not
|
||||
set, the connector falls back to self-signed mode. The loaded
|
||||
certificate must have `IsCA=true` and `KeyUsageCertSign`.
|
||||
|
||||
### Tree mode (Rank 8 — multi-level CA hierarchy)
|
||||
|
||||
When `Issuer.HierarchyMode = "tree"` is set on the issuer row, the
|
||||
connector reads the active CA hierarchy from the
|
||||
`intermediate_cas` table and assembles `IssuanceResult.ChainPEM`
|
||||
by walking the `parent_ca_id` ancestry from the issuing leaf CA up
|
||||
to the root.
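
The ancestry walk can be sketched as below. This is an illustration under stated assumptions: rows are modelled as dicts keyed by CA id, `cert_pem` is a hypothetical column name (only `parent_ca_id` comes from this page), and the root is included in the returned chain, which may differ from what the connector emits.

```python
def build_chain(cas: dict, issuing_ca_id: str) -> list:
    """Collect PEM certs from the issuing CA up to the root, in that order."""
    chain, seen = [], set()
    ca_id = issuing_ca_id
    while ca_id is not None:
        if ca_id in seen:  # guard against a corrupted parent pointer
            raise ValueError(f"cycle in CA hierarchy at {ca_id}")
        seen.add(ca_id)
        row = cas[ca_id]
        chain.append(row["cert_pem"])
        ca_id = row["parent_ca_id"]  # None marks the root
    return chain
```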
|
||||
|
||||
Tree mode is operator-managed via the admin-gated
|
||||
`/api/v1/issuers/{id}/intermediates` and
|
||||
`/api/v1/intermediates/{id}` endpoints (`POST` to create / sign
|
||||
children, `GET` to list / inspect, `POST .../retire` to two-phase
|
||||
retire). The signing path is shared with single-mode (cert is
|
||||
signed via `c.caCert` + `c.caSigner` from the on-disk issuing CA
|
||||
cert+key); only the chain bytes differ.
|
||||
|
||||
RFC 5280 §3.2 (self-signed root validation), §4.2.1.9 (path-length
|
||||
tightening), and §4.2.1.10 (NameConstraints subset semantics) are
|
||||
enforced at the service layer and fail closed. The default mode is
|
||||
`single`, byte-identical to the pre-Rank-8 historical flow.
|
||||
|
||||
See [intermediate-ca-hierarchy.md](../intermediate-ca-hierarchy.md)
|
||||
for the operator runbook covering 4-level boundary, 3-level policy,
|
||||
and 2-level internal-PKI patterns, and the migration runbook for
|
||||
flipping a single-mode issuer to tree.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"ca_common_name": "CertCtl Local CA",
|
||||
"validity_days": 90,
|
||||
"ca_cert_path": "/etc/certctl/ca/ca.pem",
|
||||
"ca_key_path": "/etc/certctl/ca/ca-key.pem"
|
||||
}
|
||||
```
|
||||
|
||||
## CRL and OCSP (M15b)
|
||||
|
||||
The Local CA serves DER-encoded X.509 CRLs unauthenticated at
|
||||
`GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615,
|
||||
`Content-Type: application/pkix-crl`) with 24-hour validity.
|
||||
|
||||
An embedded OCSP responder at
|
||||
`GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960,
|
||||
`Content-Type: application/ocsp-response`) returns signed OCSP
|
||||
responses for issued certificates (good / revoked / unknown
|
||||
status).
|
||||
|
||||
Both endpoints are reachable by relying parties with no certctl
|
||||
API credentials, which is how standard TLS clients, browsers, and
|
||||
hardware appliances consume these resources.
|
||||
|
||||
Certificates with profile TTL < 1 hour automatically skip
|
||||
CRL/OCSP — expiry is treated as sufficient revocation for
|
||||
short-lived credentials.
|
||||
|
||||
## Extended Key Usage support (M27)
|
||||
|
||||
The Local CA respects EKU constraints from certificate profiles
|
||||
and adjusts key usage flags accordingly:
|
||||
|
||||
- **S/MIME** (`emailProtection` EKU) →
|
||||
`DigitalSignature | ContentCommitment`.
|
||||
- **TLS** (`serverAuth` / `clientAuth` EKU) →
|
||||
`DigitalSignature | KeyEncipherment`.
|
||||
|
||||
This enables a single CA to issue TLS, S/MIME, code signing, and
|
||||
timestamping certificates from one issuer row.
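
The two mappings listed above can be sketched as a lookup. This Python illustration covers only the S/MIME and TLS cases, because this page does not state the flags used for code signing or timestamping; the precedence when a profile carries both kinds of EKU is also an assumption.

```python
from enum import Flag, auto


class KeyUsage(Flag):
    DIGITAL_SIGNATURE = auto()
    CONTENT_COMMITMENT = auto()
    KEY_ENCIPHERMENT = auto()


def key_usage_for(ekus: set) -> KeyUsage:
    """Map a profile's EKU names to key usage flags for the two documented cases."""
    if "emailProtection" in ekus:  # S/MIME (checked first by assumption)
        return KeyUsage.DIGITAL_SIGNATURE | KeyUsage.CONTENT_COMMITMENT
    if ekus & {"serverAuth", "clientAuth"}:  # TLS
        return KeyUsage.DIGITAL_SIGNATURE | KeyUsage.KEY_ENCIPHERMENT
    raise ValueError("mapping for these EKUs is not described on this page")
```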
|
||||
|
||||
## MaxTTL enforcement (M11c)
|
||||
|
||||
When a certificate profile defines a maximum TTL, the Local CA
|
||||
sets `NotAfter` so the lifetime is capped at `min(validity_days, maxTTL)`. This
|
||||
ensures certificates never exceed the profile's configured
|
||||
lifetime regardless of the issuer's `validity_days` setting.
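
As a worked example of the cap, the sketch below computes `NotAfter` from a start time. It is illustrative Python, and representing the profile's maximum TTL as a `timedelta` is an assumption about units.

```python
from datetime import datetime, timedelta
from typing import Optional


def not_after(not_before: datetime, validity_days: int,
              max_ttl: Optional[timedelta] = None) -> datetime:
    """Return NotAfter with the lifetime capped at min(validity_days, max_ttl)."""
    lifetime = timedelta(days=validity_days)
    if max_ttl is not None:  # the profile defines a maximum TTL
        lifetime = min(lifetime, max_ttl)
    return not_before + lifetime
```

With `validity_days=90` and a 30-day profile cap the certificate lives 30 days; with a 365-day cap the issuer's 90 days still applies.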
|
||||
|
||||
## L-014 file-on-disk threat-model carve-out
|
||||
|
||||
In file-driver mode (the default), the CA private key sits on the
|
||||
certctl-server filesystem as a PEM at `CERTCTL_CA_KEY_PATH`. This
|
||||
is a standard internal-PKI posture but means filesystem
|
||||
compromise of the certctl host equals signing-key compromise.
|
||||
Mitigations:
|
||||
|
||||
- **Filesystem permissions.** Mode 0600, owned by the certctl
|
||||
service user. The connector preflight refuses to load a key
|
||||
whose mode is wider than 0600.
|
||||
- **Sub-CA rotation.** Rotate the certctl sub-CA cert+key
|
||||
periodically (yearly is a sensible default) so a captured key
|
||||
has a bounded blast-radius window.
|
||||
- **Filesystem audit.** Add an `auditctl` watch on the key path;
|
||||
any read/write attempt outside certctl-server's process is
|
||||
logged.
|
||||
- **Move to alternate signer drivers when they ship.** The
|
||||
`internal/crypto/signer/` interface is the integration seam;
|
||||
HSM (PKCS#11), cloud KMS, and SSH-CA drivers will close the
|
||||
filesystem-residency leg without changing the rest of the
|
||||
signing path.
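
The permission preflight from the first mitigation can be approximated as follows. This is a POSIX-only Python sketch with a hypothetical function name; "wider than 0600" is interpreted here as "any permission bit set outside owner read/write".

```python
import os
import stat


def check_key_mode(path: str) -> None:
    """Raise if the key file grants anything beyond owner read/write."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & ~0o600:  # any bit outside rw------- counts as too wide
        raise PermissionError(
            f"{path} has mode {mode:04o}; expected 0600 or narrower")
```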
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [ADCS integration](adcs.md) — sub-CA mode rooted at ADCS
|
||||
- [Intermediate CA hierarchy](../intermediate-ca-hierarchy.md) — tree mode operator runbook
|
||||
- [CRL and OCSP](../protocols/crl-ocsp.md) — RFC 5280 / RFC 6960 endpoint reference
|
||||
@@ -1,7 +1,12 @@
|
||||
# NGINX Connector — Operator Deep-Dive
|
||||
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. Operator-
|
||||
> grade documentation for the NGINX target connector.
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. Operator-grade
|
||||
> documentation for the NGINX target connector. For the
|
||||
> connector-development context (interface contract, registry, atomic
|
||||
> deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
@@ -153,7 +158,7 @@ handshakes during a deploy succeed without 5xx errors.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
||||
- [Atomic deploy + post-verify + rollback](../deployment-model.md)
|
||||
— the Bundle I primitive every connector consumes.
|
||||
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
||||
- [Connectors reference](connectors.md)
|
||||
- [Vendor compatibility matrix](../vendor-matrix.md)
|
||||
- [Connectors reference](index.md)
|
||||
@@ -0,0 +1,156 @@
|
||||
# OpenSSL / Custom CA Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the script-based OpenSSL /
|
||||
> Custom CA issuer connector. For the connector-development context
|
||||
> (interface contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
Script-based issuer connector for organizations with existing CA
|
||||
tooling. Delegates certificate signing, revocation, and CRL
|
||||
generation to user-provided shell scripts. The connector `exec`s
|
||||
the script for every certificate lifecycle operation; the script
|
||||
runs as the certctl-server user with that user's full filesystem
|
||||
and network access.
|
||||
|
||||
This is the highest-flexibility, highest-trust connector in
|
||||
certctl. It exists to integrate with arbitrary CLI-driven CAs that
|
||||
don't have a Go SDK — at the cost of a wider attack surface than
|
||||
any other issuer.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/openssl/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the OpenSSL / Custom CA connector when:
|
||||
|
||||
- Your CA is a CLI tool (BoringSSL, custom OpenSSL wrapper,
|
||||
hardware-CA controller, internal CA with no published SDK) and
|
||||
no Go-native adapter exists.
|
||||
- You're prepared to operate the script with the same care as any
|
||||
privileged binary on the host (review every line, lock the path
|
||||
ownership and mode, audit invocations).
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- A Go-native adapter exists for your CA (Vault, DigiCert,
|
||||
Sectigo, ACME, AWS ACM PCA, Google CAS, EJBCA, Entrust,
|
||||
GlobalSign, step-ca). Use the native adapter — narrower attack
|
||||
surface, no shell-out exposure.
|
||||
- You're in a regulated environment where shell-out attack
|
||||
surfaces are formally disallowed by your security policy.
|
||||
- You're running multi-tenant certctl-server where tenant-A's
|
||||
script can affect tenant-B's certificates.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Variable | Required | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_OPENSSL_SIGN_SCRIPT` | Yes | Script that receives CSR on stdin and outputs signed PEM cert on stdout |
|
||||
| `CERTCTL_OPENSSL_REVOKE_SCRIPT` | No | Script to revoke a certificate (receives serial number as argument) |
|
||||
| `CERTCTL_OPENSSL_CRL_SCRIPT` | No | Script that outputs DER-encoded CRL on stdout |
|
||||
| `CERTCTL_OPENSSL_TIMEOUT_SECONDS` | No | Script execution timeout (default 30s) |
|
||||
|
||||
The sign script receives the CSR PEM on stdin and outputs the
|
||||
signed certificate PEM on stdout. The connector parses the
|
||||
certificate to extract serial number, validity dates, and chain
|
||||
information.
|
||||
|
||||
Before shell execution, serial numbers are validated as hex-only
|
||||
(`^[0-9a-fA-F]+$`) and revocation reason codes are validated
|
||||
against the RFC 5280 specification to prevent argv injection. Both
|
||||
checks live in `internal/validation/command.go`.
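
A rough Python equivalent of the two checks, for illustration. The reason-code set is the `CRLReason` enumeration from RFC 5280 section 5.3.1; whether certctl passes numeric codes or names to the script is not stated on this page, so the integer form is an assumption.

```python
import re

_SERIAL_RE = re.compile(r"[0-9a-fA-F]+")

# CRLReason values from RFC 5280 section 5.3.1; value 7 is not assigned.
_RFC5280_REASONS = {0, 1, 2, 3, 4, 5, 6, 8, 9, 10}


def valid_serial(serial: str) -> bool:
    # fullmatch is stricter than a "$" anchor, which tolerates a trailing newline.
    return _SERIAL_RE.fullmatch(serial) is not None


def valid_reason(code: int) -> bool:
    return code in _RFC5280_REASONS
```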
|
||||
|
||||
## Threat model
|
||||
|
||||
certctl's OpenSSL adapter is a deliberate trade between
|
||||
flexibility and attack surface. Top-10 fix #6 of the 2026-05-03
|
||||
issuer-coverage audit captured the threat model in detail; the
|
||||
short version is below.
|
||||
|
||||
### What the adapter accepts
|
||||
|
||||
- A trusted operator pointing at a trusted script that lives in a
|
||||
trusted filesystem location (`/usr/local/bin/`,
|
||||
`/opt/<vendor>/bin/`, etc.) with appropriate ownership
|
||||
(root-owned, mode 0755) and a clear audit trail
|
||||
(filesystem-monitored, version-controlled).
|
||||
- Env-var inheritance from the certctl-server process. Operators
|
||||
must NOT export sensitive credentials (Vault tokens, API keys
|
||||
for OTHER systems) into certctl-server's environment — or, if
|
||||
they must, must accept that those credentials are visible to the
|
||||
issuance script. The connector does not whitelist or strip env
|
||||
vars before fork.
|
||||
- The hex-only serial-number filter and the RFC 5280 reason-code
|
||||
allow-list as defenses against argv injection. They are NOT
|
||||
defenses against a malicious script.
|
||||
|
||||
### What the adapter does NOT accept
|
||||
|
||||
- A script path under operator-writable filesystem (`/tmp`,
|
||||
`/var/tmp`, `~`) where a non-root user can swap the binary
|
||||
mid-flight. **Symlink attack:** a non-root user with write
|
||||
access to the directory replaces the script with a symlink to
|
||||
`/etc/shadow` or `/root/.ssh/authorized_keys`; certctl-server
|
||||
reads (or in the worst case writes via a malicious script)
|
||||
those files.
|
||||
- Untrusted script content. The script can do anything the
|
||||
certctl-server user can — modify state outside `/etc/certctl/`,
|
||||
exfiltrate data, write SSH keys to enable persistence.
|
||||
Operators MUST review every script line before deploying.
|
||||
- A multi-tenant host where multiple operators deploy scripts
|
||||
under the same certctl-server. Process-level isolation isn't
|
||||
enforced; one operator's script can read another's working
|
||||
files (the temp CSR/cert files the connector writes to
|
||||
`os.TempDir()` are mode 0600 but are visible by name to anyone
|
||||
who can list the directory).
|
||||
|
||||
## Mitigations operators can layer on
|
||||
|
||||
- **Run certctl-server under a dedicated unprivileged user**
|
||||
(e.g. `certctl:certctl`). The systemd unit ships with
|
||||
`User=certctl` by default — keep it that way.
|
||||
- **Pin the script path to a root-owned mode-0755 binary**
|
||||
(`/usr/local/bin/issue-cert.sh`, root:root, 0755). Add a
|
||||
filesystem audit rule (`auditctl -w /usr/local/bin/issue-cert.sh
|
||||
-p wa -k certctl-script`) so any write attempt to the script is
|
||||
logged.
|
||||
- **Set a per-call timeout via `CERTCTL_OPENSSL_TIMEOUT_SECONDS`**
|
||||
(default 30s). The connector wires this through
|
||||
`exec.CommandContext` so a hung script is killed at the
|
||||
wall-clock budget. Production operators should set it to the
|
||||
upper bound of legitimate issuance time — anything longer is a
|
||||
runaway.
|
||||
- **Sanitise the certctl-server environment.** systemd's
|
||||
`Environment=` directive lets operators allow-list which env
|
||||
vars certctl-server (and therefore the script) sees.
|
||||
Default-deny is the safe posture; the connector itself does NOT
|
||||
scrub envs before fork.
|
||||
- **Use a chroot or container.** systemd's `RootDirectory=` or
|
||||
running certctl-server in a container limits the filesystem the
|
||||
script can touch.
|
||||
- **Audit the script's behaviour.** A wrapper script that logs
|
||||
every invocation's argv + env-snapshot + exit code to a
|
||||
separate audit log gives operators a forensic trail.
|
||||
- **Per-call concurrency bound.** The renewal scheduler's
|
||||
`CERTCTL_RENEWAL_CONCURRENCY` (Bundle L closure) bounds
|
||||
scheduled traffic; ad-hoc `POST /api/v1/certificates` traffic
|
||||
isn't bounded. For high-volume environments, layer a
|
||||
reverse-proxy rate limit (NGINX, HAProxy) in front of the API.
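
To rehearse a sign script under the same constraints before pointing certctl at it, an operator-side harness along these lines may help. It is a hypothetical Python sketch, not connector code: it feeds the CSR on stdin, enforces a wall-clock timeout, and passes only allow-listed environment variables (the connector itself does not scrub the environment).

```python
import os
import subprocess


def run_sign_script(argv: list, csr_pem: bytes, timeout_s: float = 30.0,
                    env_allow: tuple = ("PATH",)) -> bytes:
    """Run a sign script with CSR on stdin; return the PEM it writes to stdout."""
    # Default-deny environment: copy only the allow-listed variables.
    env = {k: os.environ[k] for k in env_allow if k in os.environ}
    proc = subprocess.run(
        argv,
        input=csr_pem,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
        timeout=timeout_s,  # child is killed and TimeoutExpired raised on overrun
        check=True,         # non-zero exit raises CalledProcessError
    )
    return proc.stdout
```

The 30-second default mirrors `CERTCTL_OPENSSL_TIMEOUT_SECONDS`; lowering it during rehearsal makes runaway scripts obvious early.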
|
||||
|
||||
## V3-Pro forward path
|
||||
|
||||
The hardened OpenSSL adapter (chroot/container by default,
|
||||
env-var allow-list at the adapter layer, signed-script-binary
|
||||
verification, audit-log-on-every-invocation, per-call concurrency
|
||||
bound shared with the API surface) is V3-Pro work. Tracking:
|
||||
the project roadmap (search "OpenSSL hardened mode").
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Local CA issuer](local-ca.md) — Go-native alternative when the CA can be run as a sub-CA under certctl
|
||||
- [Vault PKI](vault.md), [EJBCA](ejbca.md), [DigiCert](digicert.md) — Go-native alternatives for common CA stacks
|
||||
@@ -0,0 +1,175 @@
|
||||
# Postfix / Dovecot Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Postfix / Dovecot mail server
|
||||
> TLS connector. For the connector-development context (interface
|
||||
> contract, registry, atomic deploy primitive shared across all
|
||||
> targets), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
A dual-mode mail-server TLS connector. Writes certificate, key, and
|
||||
chain files to configured paths and reloads the mail service. The
|
||||
`mode` field selects between Postfix MTA and Dovecot IMAP/POP3,
|
||||
which determines default file paths and reload commands.
|
||||
|
||||
This connector pairs with certctl's S/MIME certificate support
|
||||
(email protection EKU, email SAN routing) for a complete email
|
||||
infrastructure story — TLS for transport encryption, S/MIME for
|
||||
end-to-end message signing and encryption.
|
||||
|
||||
Implementation lives at `internal/connector/target/postfix/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Postfix / Dovecot connector when:
|
||||
|
||||
- You operate a self-hosted mail server (Postfix as MTA, Dovecot
|
||||
as IMAPS/POP3S) and want certctl to rotate the TLS material in
|
||||
place.
|
||||
- You want validate-before-reload behaviour to keep a bad cert
|
||||
config from taking down mail.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're running a mail provider (Google Workspace, Microsoft 365)
|
||||
— the provider rotates certs internally.
|
||||
- Your MTA is something else (Exim, Sendmail) — these don't have
|
||||
built-in connectors yet; use a [generic file-based
|
||||
target](index.md#target-connector) by hand or commission a
|
||||
custom adapter.
|
||||
|
||||
## Configuration
|
||||
|
||||
### Postfix mode
|
||||
|
||||
```json
|
||||
{
|
||||
"mode": "postfix",
|
||||
"cert_path": "/etc/postfix/certs/cert.pem",
|
||||
"key_path": "/etc/postfix/certs/key.pem",
|
||||
"chain_path": "/etc/postfix/certs/chain.pem",
|
||||
"reload_command": "postfix reload",
|
||||
"validate_command": "postfix check"
|
||||
}
|
||||
```
|
||||
|
||||
### Dovecot mode
|
||||
|
||||
```json
|
||||
{
|
||||
"mode": "dovecot",
|
||||
"cert_path": "/etc/dovecot/certs/cert.pem",
|
||||
"key_path": "/etc/dovecot/certs/key.pem",
|
||||
"chain_path": "/etc/dovecot/certs/chain.pem",
|
||||
"reload_command": "doveadm reload",
|
||||
"validate_command": "doveconf -n"
|
||||
}
|
||||
```
|
||||
|
||||
### Field reference
|
||||
|
||||
| Field | Default (Postfix) | Default (Dovecot) | Description |
|---|---|---|---|
| `mode` | `postfix` | `dovecot` | Service mode — determines defaults |
| `cert_path` | `/etc/postfix/certs/cert.pem` | `/etc/dovecot/certs/cert.pem` | Path for certificate file |
| `key_path` | `/etc/postfix/certs/key.pem` | `/etc/dovecot/certs/key.pem` | Path for private key (0600 permissions) |
| `chain_path` | (empty) | (empty) | If set, chain written separately; otherwise appended to cert |
| `reload_command` | `postfix reload` | `doveadm reload` | Command to reload the mail service |
| `validate_command` | `postfix check` | `doveconf -n` | Optional config validation before reload |
|
||||
|
||||
All commands are validated against shell injection via
|
||||
`validation.ValidateShellCommand()`. File permissions: cert /
|
||||
chain 0644, key 0600.
|
||||
|
||||
## Choosing Mode=postfix vs Mode=dovecot
|
||||
|
||||
Both modes share the same Go connector code (atomic-write,
|
||||
PreCommit/PostCommit hooks, post-deploy verify, rollback), so the
|
||||
rollback contract is identical across modes. The mode flag just
|
||||
swaps the daemon-specific defaults.
|
||||
|
||||
`mode: postfix` is also the **default when `mode` is unset**.
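
The default-swapping described above can be sketched as follows. This is a minimal illustration, not the connector's code: the `Config` struct and the `withDefaults` function are hypothetical names, and only the default values are taken from the field reference table.

```go
package main

import "fmt"

// Config mirrors the JSON fields documented above. The type and the
// function below are illustrative names, not certctl's actual API.
type Config struct {
	Mode            string
	CertPath        string
	KeyPath         string
	ChainPath       string
	ReloadCommand   string
	ValidateCommand string
}

// withDefaults fills unset fields from the per-mode defaults in the
// field reference table. An empty Mode is treated as "postfix".
func withDefaults(c Config) Config {
	if c.Mode == "" {
		c.Mode = "postfix"
	}
	certDir, reload, validate := "/etc/postfix/certs", "postfix reload", "postfix check"
	if c.Mode == "dovecot" {
		certDir, reload, validate = "/etc/dovecot/certs", "doveadm reload", "doveconf -n"
	}
	if c.CertPath == "" {
		c.CertPath = certDir + "/cert.pem"
	}
	if c.KeyPath == "" {
		c.KeyPath = certDir + "/key.pem"
	}
	if c.ReloadCommand == "" {
		c.ReloadCommand = reload
	}
	if c.ValidateCommand == "" {
		c.ValidateCommand = validate
	}
	// ChainPath stays empty by default: the chain is then appended to the cert file.
	return c
}

func main() {
	fmt.Printf("%+v\n", withDefaults(Config{}))
	fmt.Printf("%+v\n", withDefaults(Config{Mode: "dovecot"}))
}
```

Explicitly set fields are left alone, so an operator can keep `mode: dovecot` and still override a single command.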
|
||||
|
||||
### Hosts running BOTH Postfix and Dovecot
|
||||
|
||||
The common mail-server pattern. Configure **two separate targets**
|
||||
in the certctl control plane, one per daemon. Each gets its own
|
||||
cert path, its own validate / reload command, and its own
|
||||
optional verify endpoint. The cert + key bytes can be identical
|
||||
across the two targets if your mail server uses the same TLS
|
||||
material for both daemons (which many do); certctl does not
|
||||
deduplicate the deploys, but the byte-equal cert hits the
|
||||
SHA-256 idempotency short-circuit on subsequent renewals when
|
||||
the target paths haven't changed.
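
As a rough illustration of the SHA-256 idempotency short-circuit, the comparison can be reduced to a digest check on the bytes already on disk versus the bytes about to be written. The `unchanged` helper is a hypothetical name; how the connector reads the existing file and where it performs the check are not shown in this document.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// unchanged reports whether the bytes already on disk and the bytes
// about to be deployed have the same SHA-256 digest.
func unchanged(onDisk, incoming []byte) bool {
	return sha256.Sum256(onDisk) == sha256.Sum256(incoming)
}

func main() {
	current := []byte("-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n")
	renewed := []byte("-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n")

	fmt.Println(unchanged(current, current)) // same bytes: the deploy can be skipped
	fmt.Println(unchanged(current, renewed)) // different bytes: the deploy proceeds
}
```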
|
||||
|
||||
### Sharing a single cert file across daemons via symlink
|
||||
|
||||
Pointing both targets at the same canonical path is the simplest
arrangement. Making one target's `cert_path` a symlink into the
other's needs more care: on POSIX systems a rename onto a symlink
replaces the link itself rather than writing through it, so confirm
on a test host that the link survives a deploy before relying on it.
Operators who want byte-deduplication should share a path this way
rather than asking certctl to coordinate it.
|
||||
|
||||
## Daemon-specific quirks
|
||||
|
||||
### Postfix STARTTLS (port 25)
|
||||
|
||||
Typically requires the cert to chain to a public root for
|
||||
receiving mail from arbitrary external MTAs that validate
|
||||
SMTP-side server certs. If you're deploying a self-signed cert
|
||||
from `iss-local`, configure the receiving Postfix accordingly
|
||||
(e.g. `smtpd_use_tls=yes` + `smtpd_tls_security_level=may` for
|
||||
opportunistic TLS so external senders that don't validate
|
||||
continue to deliver).
|
||||
|
||||
### Dovecot IMAPS (port 993)
|
||||
|
||||
Typically client-facing — the chain you ship matters more here
|
||||
because IMAPS clients (Thunderbird, Outlook) actively validate.
|
||||
Set `chain_path` if your certificate chain is supplied
|
||||
separately; when `chain_path` is unset, the connector appends the
|
||||
chain bytes to `cert_path`.
|
||||
|
||||
### No shared TLS session cache
|
||||
|
||||
Postfix and Dovecot do not share a TLS session cache by default.
|
||||
Both reload independently, so a cert renewal that updates both
|
||||
targets via certctl requires both reloads to succeed before
|
||||
clients re-handshake. The two targets are fully independent in
|
||||
the certctl scheduler — one reload failing rolls back that
|
||||
target only.
|
||||
|
||||
## Post-deploy verify
|
||||
|
||||
Operator-supplied via `post_deploy_verify` (`enabled` +
|
||||
`endpoint` + `timeout`) — the connector does NOT bake in a
|
||||
per-mode default port. Operators that opt in should set
|
||||
`endpoint` to their daemon's listener (e.g. `mail.example.com:25`
|
||||
for Postfix STARTTLS, `mail.example.com:993` for Dovecot IMAPS).
|
||||
|
||||
## Test pins
|
||||
|
||||
Bundle 11 (commit `88e8881`) added end-to-end tests for
|
||||
`Mode=dovecot`:
|
||||
|
||||
- `TestPostfix_Atomic_DovecotMode_HappyPath` — confirms
|
||||
`applyDefaults` populates the dovecot validate + reload
|
||||
commands AND the deploy threads them through to `runValidate`
|
||||
+ `runReload`.
|
||||
- `TestPostfix_Atomic_DovecotMode_VerifyFails_Rollback` —
|
||||
confirms the rollback path under `Mode=dovecot` restores
|
||||
pre-deploy cert + key bytes byte-exact.
|
||||
|
||||
The `Mode=postfix` branch has equivalent test coverage in the
|
||||
same file (see `TestPostfix_HappyPath`,
|
||||
`TestPostfix_VerifyMismatch_Rollback`,
|
||||
`TestPostfix_ReloadFails_Rollback`).
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [NGINX](nginx.md) — comparable file-based deploy with explicit reload
|
||||
- [Apache](apache.md) — comparable file-based deploy with `apachectl configtest`
|
||||
@@ -0,0 +1,98 @@
|
||||
# Sectigo SCM Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Sectigo Certificate Manager
|
||||
> (SCM) issuer connector. For the connector-development context
|
||||
> (interface contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Sectigo connector integrates with Sectigo Certificate Manager's
|
||||
REST API for ordering and managing DV, OV, and EV certificates.
|
||||
Like DigiCert, it uses an async order model: submit an enrollment,
|
||||
receive an `sslId`, then poll for completion.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/sectigo/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Sectigo SCM connector when:
|
||||
|
||||
- You're already a Sectigo Certificate Manager customer (formerly
|
||||
Comodo CA / SecureTrust SCM).
|
||||
- You need OV / EV certificates that Sectigo validates before
|
||||
issuance.
|
||||
- You want certctl to drive renewal lifecycle on top of Sectigo's
|
||||
commercial issuance.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're using Sectigo through their ACME endpoint — the
|
||||
[ACME connector](acme.md) is a simpler path.
|
||||
- You only need DV certificates and want a free public-trust CA —
|
||||
Let's Encrypt or ZeroSSL via the ACME connector.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_SECTIGO_CUSTOMER_URI` | — | Sectigo customer URI (organization identifier) |
| `CERTCTL_SECTIGO_LOGIN` | — | API account login |
| `CERTCTL_SECTIGO_PASSWORD` | — | API account password |
| `CERTCTL_SECTIGO_ORG_ID` | — | Organization ID (integer) |
| `CERTCTL_SECTIGO_CERT_TYPE` | — | Certificate type ID (integer, from `/ssl/v1/types`) |
| `CERTCTL_SECTIGO_TERM` | `365` | Certificate validity in days |
| `CERTCTL_SECTIGO_BASE_URL` | `https://cert-manager.com/api` | Sectigo API base URL |
| `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus` |
|
||||
|
||||
## Authentication
|
||||
|
||||
Three custom headers on every request: `customerUri`, `login`,
|
||||
and `password`. No mTLS or OAuth2.
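
A minimal sketch of attaching the three headers with Go's standard `net/http` package. The `newSectigoRequest` helper and the credential values are hypothetical; the URL is assembled from the default base URL and the `/ssl/v1/types` path mentioned in the configuration table.

```go
package main

import (
	"fmt"
	"net/http"
)

// newSectigoRequest builds a request carrying the three credential
// headers. It is an illustrative helper, not the connector's code.
// net/http canonicalises header names (customerUri goes out as
// Customeruri over HTTP/1.1); header names are case-insensitive, so
// this sketch assumes the API accepts that.
func newSectigoRequest(method, url, customerURI, login, password string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("customerUri", customerURI)
	req.Header.Set("login", login)
	req.Header.Set("password", password)
	return req, nil
}

func main() {
	req, err := newSectigoRequest(http.MethodGet,
		"https://cert-manager.com/api/ssl/v1/types",
		"example-org", "api-user", "api-password") // placeholder credentials
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("login"))
}
```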
|
||||
|
||||
## Issuance model
|
||||
|
||||
`POST /ssl/v1/enroll` returns an `sslId`. DV certificates may
|
||||
issue immediately; OV/EV certificates require Sectigo-side
|
||||
validation and poll-based completion.
|
||||
|
||||
`GetOrderStatus` runs bounded internal polling
|
||||
(5s/15s/45s/2m/5m capped, ±20% jitter, default 10-minute
|
||||
deadline). The `collectNotReady` sentinel (cert approved but not
|
||||
yet retrievable) rides the same backoff schedule. Bump
|
||||
`CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` for OV/EV workflows where
|
||||
human approval extends past 10 minutes — see
|
||||
[async-ca-polling.md](../protocols/async-ca-polling.md) for the
|
||||
schedule shape and tuning guidance.
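
The schedule described above (5s, 15s, 45s, 2m, then 5m repeated, with ±20% jitter and an overall deadline) can be sketched as below. The function names, the injected `sleep` parameter, and the exact deadline check are assumptions made for illustration; the linked async-ca-polling document is the authority on the real behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// schedule holds the base delays; attempts past the end reuse the last entry.
var schedule = []time.Duration{
	5 * time.Second, 15 * time.Second, 45 * time.Second, 2 * time.Minute, 5 * time.Minute,
}

var errDeadline = errors.New("poll deadline exceeded")

// pollDelay returns the wait before the next poll. r must be in [0, 1)
// and spreads the delay across roughly 80% to 120% of the base value.
func pollDelay(attempt int, r float64) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt >= len(schedule) {
		attempt = len(schedule) - 1
	}
	factor := 1 + 0.2*(2*r-1)
	return time.Duration(float64(schedule[attempt]) * factor)
}

// pollUntil calls check until it reports done, returns an error, or the
// next wait would push the total past maxWait.
func pollUntil(maxWait time.Duration, check func() (bool, error), sleep func(time.Duration)) error {
	var waited time.Duration
	for attempt := 0; ; attempt++ {
		done, err := check()
		if err != nil || done {
			return err
		}
		d := pollDelay(attempt, rand.Float64())
		if waited+d > maxWait {
			return errDeadline
		}
		sleep(d)
		waited += d
	}
}

// noSleep lets the demo run instantly; production code would pass time.Sleep.
func noSleep(time.Duration) {}

func main() {
	calls := 0
	err := pollUntil(10*time.Minute, func() (bool, error) {
		calls++
		return calls == 3, nil // pretend the order completes on the third poll
	}, noSleep)
	fmt.Println(calls, err)
}
```

With the base delays alone, a 10-minute deadline allows the five scheduled waits (about 8 minutes in total) before the next 5-minute wait would cross it, which is why long OV/EV approvals need a larger `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS`.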
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by Sectigo. certctl records revocations
|
||||
locally and notifies Sectigo via `/ssl/v1/revoke/{sslId}`. Unlike
|
||||
DigiCert (no auto-notify), Sectigo's revocation is part of the
|
||||
connector's revoke path.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Credential rotation
|
||||
|
||||
Rotate the API password in Sectigo's admin portal, then either
|
||||
restart certctl-server with the new value in
|
||||
`CERTCTL_SECTIGO_PASSWORD` or hot-swap via `PUT /api/v1/issuers/{id}`.
|
||||
The registry's Rebuild path replaces the connector with the new
|
||||
credentials. No certificate state is invalidated.
|
||||
|
||||
### Diagnosing slow OV/EV issuance
|
||||
|
||||
Sectigo's OV/EV vetting is human-driven and can take hours to
|
||||
days. The same operational pattern as DigiCert applies: issue OV/EV
certs well ahead of expiry so the bounded poll deadline can stay short.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — the bounded-polling primitive
|
||||
- [DigiCert connector](digicert.md) — comparable commercial CA alternative
|
||||
- [ACME connector](acme.md) — simpler path when Sectigo is reachable via ACME
|
||||
@@ -0,0 +1,193 @@
|
||||
# SSH (Agentless) Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the SSH agentless target
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, atomic deploy primitive shared across all
|
||||
> targets), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The SSH connector enables agentless certificate deployment to any
|
||||
Linux/Unix server via SSH/SFTP. Instead of installing the certctl
|
||||
agent binary on every target, a single "proxy agent" in the same
|
||||
network zone deploys certificates to remote servers over SSH.
|
||||
|
||||
This is ideal for environments where installing agents on every
|
||||
server is impractical — air-gapped servers, legacy fleets, or
|
||||
brownfield environments where agent installation requires change-
|
||||
control tickets per host.
|
||||
|
||||
Implementation lives at `internal/connector/target/ssh/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the SSH connector when:
|
||||
|
||||
- Installing the certctl agent on every target is impractical or
|
||||
politically expensive.
|
||||
- The agent-to-target network path is operator-controlled.
|
||||
- You're deploying to known, registered infrastructure where the
|
||||
operator implicitly trusts the host (you're already shipping it
|
||||
a TLS cert).
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're deploying across the public internet to dynamic /
|
||||
multi-tenant hosts. The connector accepts any host key
|
||||
(`InsecureIgnoreHostKey`); MITM resistance requires the
|
||||
mitigations below.
|
||||
- Your environment has strict regulatory MITM-resistance
  requirements. Host-key acceptance is marked out of scope only in
  an inline code comment, which doesn't satisfy reviewers who want
  documented host-key verification at the connector level.
|
||||
|
||||
## Configuration
|
||||
|
||||
### Key authentication (recommended)
|
||||
|
||||
```json
{
  "host": "web-server.internal",
  "port": 22,
  "user": "certctl",
  "auth_method": "key",
  "private_key_path": "/home/certctl/.ssh/id_ed25519",
  "cert_path": "/etc/ssl/certs/cert.pem",
  "key_path": "/etc/ssl/private/key.pem",
  "chain_path": "/etc/ssl/certs/chain.pem",
  "reload_command": "systemctl reload nginx",
  "timeout": 30
}
```
|
||||
|
||||
### Password authentication
|
||||
|
||||
```json
{
  "host": "legacy-server.internal",
  "user": "deploy",
  "auth_method": "password",
  "password": "s3cret",
  "cert_path": "/etc/ssl/cert.pem",
  "key_path": "/etc/ssl/key.pem",
  "reload_command": "systemctl reload apache2"
}
```
|
||||
|
||||
### Field reference
|
||||
|
||||
| Field | Default | Description |
|---|---|---|
| `host` | (required) | SSH hostname or IP address |
| `port` | 22 | SSH port |
| `user` | (required) | SSH username |
| `auth_method` | `"key"` | `"key"` or `"password"` |
| `private_key_path` | — | Path to SSH private key file (key auth) |
| `private_key` | — | Inline SSH private key PEM (alternative to path) |
| `password` | — | SSH password (password auth) |
| `passphrase` | — | Passphrase for encrypted private keys |
| `cert_path` | (required) | Remote path for certificate file |
| `key_path` | (required) | Remote path for private key file |
| `chain_path` | — | Remote path for chain file (if empty, chain appended to cert) |
| `cert_mode` | `"0644"` | File permissions for cert (octal) |
| `key_mode` | `"0600"` | File permissions for private key (octal) |
| `reload_command` | — | Command to execute after deployment |
| `timeout` | 30 | SSH connection timeout in seconds |
|
||||
|
||||
## Security baseline
|
||||
|
||||
- **Key-based authentication is recommended** over password
|
||||
authentication. Encrypted private keys are supported via
|
||||
`passphrase`.
|
||||
- **Reload commands are validated against shell injection** (same
|
||||
validation as Postfix/Dovecot connectors).
|
||||
- **Host field is regex-validated** to prevent shell metacharacters.
|
||||
- **Private keys are written with 0600 permissions** by default.
|
||||
- **Host key verification is intentionally skipped.** See the
|
||||
threat model below.
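
To make the host-field validation concrete, here is one way such an allow-list could look. The pattern is an assumption for illustration only; the connector's actual regular expression is not reproduced in this document, and this sketch rejects IPv6 literals, which a real implementation may well accept.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostPattern is a hypothetical allow-list: letters, digits, dots and
// hyphens only, starting with a letter or digit. Shell metacharacters
// such as ';', '$', '(' and spaces can never match.
var hostPattern = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9.-]*$`)

// validHost applies the pattern plus the 253-character DNS name limit.
func validHost(h string) bool {
	return len(h) <= 253 && hostPattern.MatchString(h)
}

func main() {
	for _, h := range []string{"web-server.internal", "10.0.0.5", "host;reboot", "$(id)"} {
		fmt.Printf("%-22q %v\n", h, validHost(h))
	}
}
```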
|
||||
|
||||
## Operator playbook: SSH host-key verification
|
||||
|
||||
certctl's SSH connector dials each target with
|
||||
`HostKeyCallback: ssh.InsecureIgnoreHostKey()`, meaning **the
|
||||
connector accepts any server host key without comparison against
|
||||
`known_hosts`**. This is a documented design choice, not an
|
||||
oversight.
|
||||
|
||||
### Why the connector accepts any host key
|
||||
|
||||
- certctl deploys to operator-configured target infrastructure.
|
||||
Each target is registered explicitly in the control plane with
|
||||
hostname + auth credentials + cert/key paths; the operator
|
||||
implicitly trusts the host they're deploying to (otherwise why
|
||||
give it a TLS cert).
|
||||
- Mirrors the same posture certctl applies to the network scanner
|
||||
(`InsecureSkipVerify` for cert-monitoring TLS handshakes) and
|
||||
the F5 connector (`Insecure` flag for self-signed BIG-IP
|
||||
management interfaces).
|
||||
- Avoids a heavyweight per-target `known_hosts` management layer
|
||||
that would shift complexity onto operators with no
|
||||
proportional security gain when the network model is
|
||||
"operator-configured infrastructure on operator-controlled
|
||||
network".
|
||||
|
||||
### Threat model the design accepts
|
||||
|
||||
- A passive eavesdropper on the agent-to-target link. SSH's
|
||||
transport encryption still applies — host-key acceptance
|
||||
affects MITM vulnerability, not on-the-wire confidentiality.
|
||||
- A MITM attacker on the agent-to-target link who can intercept
|
||||
the SSH TCP handshake AND has positioned themselves on a
|
||||
hostname the operator has registered as a deploy target.
|
||||
Layered authentication (per-target SSH keys with strong
|
||||
passphrases stored at the agent) limits the blast radius — the
|
||||
MITM gets one target's cert+key payload, not the agent's
|
||||
broader credentials.
|
||||
|
||||
### Threat model the design does NOT accept
|
||||
|
||||
- Deploying across the public internet to a host whose IP
|
||||
rotates (e.g. ephemeral cloud instances behind a load balancer
|
||||
that doesn't pin SSH host keys). In that scenario,
|
||||
`InsecureIgnoreHostKey` opens an MITM window during IP
|
||||
rotation — register a `known_hosts` file path or use SSH
|
||||
certificates (below) instead.
|
||||
- Multi-tenant networks where another tenant could plausibly
|
||||
impersonate the target host. certctl's design assumes
|
||||
operator-controlled network paths.
|
||||
|
||||
### Mitigations operators can layer on
|
||||
|
||||
- **`known_hosts` enforcement**: implement a custom `SSHClient`
|
||||
(the connector's `SSHClient` interface accepts injected clients
|
||||
via `NewWithClient`) whose `Connect` method builds an
|
||||
`ssh.ClientConfig` with `HostKeyCallback` set to
|
||||
`knownhosts.New("/path/to/known_hosts")` from
|
||||
`golang.org/x/crypto/ssh/knownhosts`.
|
||||
- **SSH certificate authentication**: use OpenSSH 5.4+ host
|
||||
certificates signed by an organizational CA. Configure the
|
||||
agent's `known_hosts` CA pinning via `@cert-authority` lines so
|
||||
any host presenting a certificate signed by the CA is trusted,
|
||||
regardless of IP rotation.
|
||||
- **Network segmentation**: run the certctl agent on the same
|
||||
private network segment as its targets; require VPN tunnels
|
||||
for cross-network deploys; use bastion hosts with their own
|
||||
host-key validation.
|
||||
- **Per-target SSH keys**: rotate the agent's SSH credentials
|
||||
per target so a successful MITM compromise is bounded to that
|
||||
one target's cert+key, not the agent's broader credential set.
|
||||
|
||||
### V3-Pro forward path
|
||||
|
||||
The operator-managed `known_hosts` integration (config field +
|
||||
`HostKeyCallback` plumbing + per-target root-of-trust enforcement)
|
||||
is documented as V3-Pro work. Tracking:
|
||||
`WORKSPACE-ROADMAP.md` (search for "SSH known_hosts").
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [F5 BIG-IP](f5.md) — comparable proxy-agent target where the agent doesn't run on the appliance itself
|
||||
- [Kubernetes Secrets](k8s.md) — agent-in-cluster alternative when the targets are workloads rather than VMs
|
||||
@@ -0,0 +1,99 @@
|
||||
# step-ca (Smallstep) Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the step-ca issuer connector.
|
||||
> For the connector-development context (interface contract,
|
||||
> registry, ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The step-ca connector integrates with Smallstep's step-ca private
|
||||
CA using its native `/sign` API with JWK provisioner
|
||||
authentication. Issuance is synchronous — submit a CSR plus a
|
||||
provisioner-signed token, get back a signed certificate in the
|
||||
same response.
|
||||
|
||||
This is simpler than ACME for internal PKI: no challenge solving,
|
||||
no domain validation, just CSR + auth token → signed certificate.
|
||||
For ACME-based step-ca usage, point the ACME connector at
|
||||
step-ca's ACME directory URL instead.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/stepca/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the step-ca connector when:
|
||||
|
||||
- You already run step-ca as your internal CA and want certctl to
|
||||
drive lifecycle automation on top.
|
||||
- You want synchronous issuance against an internal CA without
|
||||
ACME's challenge dance.
|
||||
- You want certctl to enforce profile / MaxTTL policy on step-ca-
|
||||
issued certs.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You want to use step-ca's ACME directory — that path goes
|
||||
through the [ACME connector](acme.md) instead, which gives you
|
||||
ACME features (ARI, EAB, profile selection) on top.
|
||||
- You don't already run step-ca and want a simpler internal CA —
|
||||
the [Local CA](local-ca.md) issuer is a one-process alternative.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "ca_url": "https://ca.internal:9000",
  "provisioner_name": "certctl",
  "provisioner_key_path": "/etc/certctl/stepca/provisioner.json",
  "provisioner_password": "...",
  "root_cert_path": "/etc/certctl/stepca/root_ca.crt",
  "validity_days": 90
}
```
|
||||
|
||||
Environment variables:
|
||||
|
||||
- `CERTCTL_STEPCA_URL` — step-ca server URL
|
||||
- `CERTCTL_STEPCA_PROVISIONER` — JWK provisioner name
|
||||
- `CERTCTL_STEPCA_KEY_PATH` — Path to provisioner private key
|
||||
(JWK JSON)
|
||||
- `CERTCTL_STEPCA_PASSWORD` — Provisioner key password
|
||||
|
||||
## Authentication: JWK provisioner
|
||||
|
||||
A JWK provisioner is created in step-ca with a passphrase-encrypted
|
||||
private key (JSON Web Key format). certctl signs short-lived
|
||||
proof-of-authorization tokens with the provisioner key for each
|
||||
issuance request. The provisioner password is needed to decrypt the
|
||||
JWK on disk; it is held in memory by certctl-server.
|
||||
|
||||
Rotation: rotate the JWK provisioner in step-ca, distribute the new
|
||||
JWK + password to certctl, then either restart certctl-server or
|
||||
hot-swap via `PUT /api/v1/issuers/{id}` so the registry's Rebuild
|
||||
path replaces the connector with the new provisioner config.
|
||||
|
||||
## MaxTTL enforcement (M11c)
|
||||
|
||||
When a certificate profile defines a maximum TTL, the step-ca
|
||||
connector caps the `NotAfter` field to ensure the issued
|
||||
certificate does not exceed the profile limit, regardless of the
|
||||
step-ca provisioner's own maximum.
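
The capping rule amounts to taking the earlier of the requested expiry and `NotBefore + MaxTTL`. A minimal sketch, assuming a hypothetical `capNotAfter` helper and treating a zero limit as "no profile limit" (how the connector represents an unset MaxTTL is not stated here):

```go
package main

import (
	"fmt"
	"time"
)

var (
	day   = 24 * time.Hour
	start = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC) // example NotBefore
)

// capNotAfter returns the earlier of the requested NotAfter and
// notBefore+maxTTL. A zero or negative maxTTL means no profile limit.
func capNotAfter(notBefore, requested time.Time, maxTTL time.Duration) time.Time {
	if maxTTL <= 0 {
		return requested
	}
	limit := notBefore.Add(maxTTL)
	if requested.After(limit) {
		return limit
	}
	return requested
}

func main() {
	requested := start.Add(90 * day) // validity_days: 90 from the config above
	capped := capNotAfter(start, requested, 30*day)
	fmt.Println("requested:", requested.Format(time.RFC3339))
	fmt.Println("capped:   ", capped.Format(time.RFC3339))
}
```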
|
||||
|
||||
## Revocation and CRL/OCSP
|
||||
|
||||
step-ca-issued certificates rely on step-ca's own CRL/OCSP
|
||||
infrastructure. certctl's local CRL/OCSP endpoints
|
||||
(`GET /.well-known/pki/crl/{issuer_id}` and
|
||||
`GET /.well-known/pki/ocsp/{issuer_id}/{serial}`, served
|
||||
unauthenticated per RFC 5280 §5 / RFC 6960 / RFC 8615) are
|
||||
populated from step-ca's revocation data if available, but clients
|
||||
should validate against step-ca's endpoints for the authoritative
|
||||
status.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [ACME connector](acme.md) — alternative path to step-ca via its ACME directory URL
|
||||
- [Local CA issuer](local-ca.md) — simpler internal-CA alternative when step-ca isn't already deployed
|
||||
@@ -0,0 +1,105 @@
|
||||
# Traefik Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Traefik target connector.
|
||||
> For the connector-development context (interface contract,
|
||||
> registry, atomic deploy primitive shared across all targets), see
|
||||
> the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Traefik connector uses Traefik's **file provider** — it writes
|
||||
certificate and key files to a watched directory, and Traefik
|
||||
automatically picks up the changes without any explicit reload
|
||||
command. This is the simplest deployment model in the catalog:
|
||||
write the files, Traefik does the rest.
|
||||
|
||||
Implementation lives at `internal/connector/target/traefik/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Traefik connector when:
|
||||
|
||||
- Traefik fronts your services with the file provider configured
|
||||
(`providers.file.directory` in Traefik's static config).
|
||||
- You want a no-reload deployment path — Traefik picks up file
|
||||
changes automatically.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're running Traefik with its built-in ACME client. Either
|
||||
point Traefik at certctl's ACME server (see
|
||||
[migration/acme-from-traefik.md](../../migration/acme-from-traefik.md))
|
||||
or let certctl-issued certs flow through this file-provider
|
||||
connector — but don't run both.
|
||||
- Traefik is not exposed (e.g. behind another reverse proxy that
|
||||
terminates TLS); the front-most TLS terminator is what wants
|
||||
the cert.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "cert_dir": "/etc/traefik/certs",
  "cert_file": "site.crt",
  "key_file": "site.key"
}
```
|
||||
|
||||
The `cert_dir` is the directory Traefik is configured to watch
|
||||
via its file provider. The connector writes `cert_file` and
|
||||
`key_file` into this directory with appropriate permissions
|
||||
(0644 for the cert, 0600 for the key). Traefik's file watcher
|
||||
detects the change and reloads the TLS configuration
|
||||
automatically.
|
||||
|
||||
## Deploy contract
|
||||
|
||||
Every cert deploy follows the Bundle I `deploy.Apply(ctx, plan)`
|
||||
flow:
|
||||
|
||||
1. Idempotency check on cert + key bytes.
|
||||
2. Pre-deploy backup of existing files.
|
||||
3. Atomic write of cert + key to temp paths.
|
||||
4. Atomic rename of temp paths to final cert / key paths.
|
||||
5. **No reload command** — Traefik's file watcher handles it.
|
||||
6. Post-deploy TLS verify when configured (dials the endpoint;
|
||||
pulls leaf cert SHA-256; compares).
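
Steps 3 and 4 above follow the usual write-to-temp-then-rename pattern. The sketch below shows that pattern with the standard library only; `atomicWrite` and `demo` are hypothetical names, and the real `deploy.Apply` also handles backup, idempotency, and verification, which are omitted here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temp file in the destination directory,
// then renames it over path. Keeping the temp file in the same
// directory keeps the rename on one filesystem, so readers see either
// the old file or the new one and never a partial write.
func atomicWrite(path string, data []byte, mode os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo replaces a cert file twice in a scratch directory and returns
// the final contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "traefik-certs")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	crt := filepath.Join(dir, "site.crt")
	if err := atomicWrite(crt, []byte("old cert\n"), 0o644); err != nil {
		return "", err
	}
	if err := atomicWrite(crt, []byte("new cert\n"), 0o644); err != nil {
		return "", err
	}
	b, err := os.ReadFile(crt)
	return string(b), err
}

func main() {
	got, err := demo()
	fmt.Printf("%q %v\n", got, err)
}
```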
|
||||
|
||||
The validate / reload / rollback semantics that NGINX and HAProxy
|
||||
depend on don't apply here — Traefik's file watcher is the
|
||||
"reload"; if Traefik fails to load the new file, that's a Traefik
|
||||
problem visible in Traefik's logs, and the previous cert remains
|
||||
served until Traefik retries.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### File watcher latency
|
||||
|
||||
Traefik's file watcher polls the directory; the cert may take a
|
||||
few seconds to be picked up after the atomic rename. Post-deploy
|
||||
verify with `PostDeployVerifyAttempts: 5` and a small backoff
|
||||
covers this comfortably.
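
A sketch of how a retried fingerprint check can absorb that pickup delay. All function names are hypothetical and the retry policy (fixed backoff, five attempts) is an assumption chosen to match the `PostDeployVerifyAttempts: 5` suggestion; the demo uses a simulated endpoint so it runs without a live Traefik.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"errors"
	"fmt"
	"net"
	"time"
)

// fingerprint returns the hex SHA-256 of a DER-encoded certificate.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// servedFingerprint dials addr over TLS and fingerprints the leaf it
// presents. InsecureSkipVerify is set because this check compares
// bytes against the cert just deployed; it does not evaluate trust.
func servedFingerprint(addr string, timeout time.Duration) (string, error) {
	d := &net.Dialer{Timeout: timeout}
	conn, err := tls.DialWithDialer(d, "tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return "", err
	}
	defer conn.Close()
	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", errors.New("no peer certificate presented")
	}
	return fingerprint(certs[0].Raw), nil
}

// verifyWithRetry calls fetch until it returns want or attempts run out.
func verifyWithRetry(want string, attempts int, backoff time.Duration, fetch func() (string, error)) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		got, err := fetch()
		if err == nil && got == want {
			return nil
		}
		if err == nil {
			err = fmt.Errorf("served %s, want %s", got, want)
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(backoff)
		}
	}
	return fmt.Errorf("verify failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	want := fingerprint([]byte("new leaf DER"))
	calls := 0
	// Simulated endpoint: serves the old cert twice before the watcher
	// catches up. Against a live listener, fetch would instead wrap
	// servedFingerprint("traefik.example.internal:443", 5*time.Second).
	fetch := func() (string, error) {
		calls++
		if calls < 3 {
			return fingerprint([]byte("old leaf DER")), nil
		}
		return want, nil
	}
	err := verifyWithRetry(want, 5, 10*time.Millisecond, fetch)
	fmt.Println(calls, err)
}
```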
|
||||
|
||||
### Multi-router deployments
|
||||
|
||||
Traefik routes traffic by hostname, and the file provider can
|
||||
expose multiple certs in the same directory. Configure one
|
||||
certctl target per cert (one `cert_file` + `key_file` pair per
|
||||
hostname); they all land in the same watched directory and
|
||||
Traefik picks them up.
|
||||
|
||||
### Mixing file provider with ACME
|
||||
|
||||
If Traefik is also running its own ACME client, both can write to
|
||||
the same `certificatesResolvers` config but with different
|
||||
storage backends. Best practice: don't mix. Pick one source of
|
||||
truth — either Traefik's ACME or certctl-supplied files — and
|
||||
delete the other config block from `traefik.yml`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [NGINX](nginx.md) — explicit-reload deploy contract counterpart
|
||||
- [Migration: point Traefik at certctl's ACME](../../migration/acme-from-traefik.md) — alternative pattern when Traefik should pull rather than have certctl push
|
||||
@@ -0,0 +1,128 @@
|
||||
# Vault PKI Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the HashiCorp Vault PKI issuer
|
||||
> connector. For the connector-development context (interface contract,
|
||||
> registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Vault PKI connector integrates with HashiCorp Vault's PKI secrets
|
||||
engine using its native `/sign` API with token-based authentication.
|
||||
The flow is purely synchronous — Vault returns the signed certificate
|
||||
in the same HTTP response that submits the CSR — so there is no
|
||||
challenge-solving or async polling on the certctl side.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/vault/`. The
|
||||
factory key is `Vault`; the registry binds it under whatever issuer
|
||||
ID the operator picks (e.g. `iss-vault`).
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Vault PKI connector when:
|
||||
|
||||
- Your organization already runs Vault as the system of record for
|
||||
internal certificates.
|
||||
- You want a synchronous, low-latency issuance path with no challenge
|
||||
flow (no DNS records, no HTTP-01).
|
||||
- You want certctl to manage the lifecycle (renewal scheduling,
|
||||
deployment, alerts) while Vault keeps the signing material.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- Public-trust certificates are required — Vault PKI is internal-only.
|
||||
Use ACME (Let's Encrypt, ZeroSSL, Sectigo) or DigiCert / Sectigo SCM
|
||||
for public-trust workloads.
|
||||
- The Vault PKI engine is not already deployed and you don't want to
|
||||
run Vault. The Local CA issuer is a simpler self-contained path for
|
||||
small internal CAs.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_VAULT_ADDR` | — | Vault server address (e.g. `https://vault.internal:8200`) |
| `CERTCTL_VAULT_TOKEN` | — | Vault auth token with permissions on the PKI mount |
| `CERTCTL_VAULT_MOUNT` | `pki` | PKI secrets engine mount path |
| `CERTCTL_VAULT_ROLE` | — | PKI role name for certificate signing |
| `CERTCTL_VAULT_TTL` | `8760h` | Certificate validity period (TTL) |
|
||||
|
||||
Vault issues certificates synchronously via the
|
||||
`/v1/{mount}/sign/{role}` API with `X-Vault-Token` header
|
||||
authentication. The issued certificate is parsed to extract serial
|
||||
number, validity dates, and chain information.
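
For orientation, a request against the sign endpoint could be assembled as below. The `newSignRequest` helper, the role name `web`, and the token value are placeholders; `csr` and `ttl` are parameters of Vault's PKI sign API, but which additional fields the connector sends is not covered in this document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// newSignRequest builds POST {addr}/v1/{mount}/sign/{role} with the CSR
// and TTL in a JSON body and the token in the X-Vault-Token header.
// Illustrative helper only; response parsing is omitted.
func newSignRequest(addr, mount, role, token, csrPEM, ttl string) (*http.Request, error) {
	endpoint := strings.TrimRight(addr, "/") + "/v1/" + mount + "/sign/" + role
	body, err := json.Marshal(map[string]string{"csr": csrPEM, "ttl": ttl})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Vault-Token", token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newSignRequest("https://vault.internal:8200", "pki", "web",
		"example-token", "-----BEGIN CERTIFICATE REQUEST-----\n...", "8760h")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```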
|
||||
|
||||
## Token TTL and automatic renewal
|
||||
|
||||
This was Top-10 fix #5 from the 2026-05-03 issuer-coverage audit.
|
||||
|
||||
certctl-server periodically calls `POST /v1/auth/token/renew-self` at
|
||||
half the token's TTL to keep the integration alive without manual
|
||||
rotation. The cadence is read from a one-shot `lookup-self` at
|
||||
startup and re-derived on every successful renewal — so a short
|
||||
bootstrap token that gets renewed up to a longer Max TTL shifts to
|
||||
the longer cadence automatically.
|
||||
|
||||
The renewal loop emits the
|
||||
`certctl_vault_token_renewals_total{result="success"|"failure"|"not_renewable"}`
|
||||
Prometheus counter so operators see expiry trouble in Grafana before
|
||||
issuance breaks.
|
||||
|
||||
When Vault returns `renewable: false` (configured Max TTL reached),
|
||||
the loop logs a WARN, increments `{result="not_renewable"}`, and
|
||||
exits. The operator must rotate the Vault token and either restart
|
||||
certctl-server or use the GUI / MCP issuer-update path to swap the
|
||||
token in place — the registry's Rebuild path re-Starts the lifecycle
|
||||
on the new connector.
|
||||
|
||||
Per-tick failures (e.g. transient 5xx, brief network blips) bump
|
||||
`{result="failure"}` and the loop keeps ticking. Only the explicit
|
||||
`renewable: false` case stops it.
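
The loop's three outcomes can be sketched as a small state machine. Everything here is illustrative: `renewLoop`, `tokenInfo`, the injected `renew` and `sleep` functions, and the `maxTicks` bound (present only so the sketch terminates) are assumptions, and the returned map merely stands in for the Prometheus counter labels.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// tokenInfo is the part of a lookup-self / renew-self response the
// loop cares about.
type tokenInfo struct {
	TTL       time.Duration
	Renewable bool
}

var errTransient = errors.New("vault returned 503")

// renewLoop sleeps for half the current TTL, renews, and re-derives the
// cadence from each successful response. It stops when Vault reports
// the token as not renewable, or after maxTicks iterations.
func renewLoop(initial tokenInfo, renew func() (tokenInfo, error), sleep func(time.Duration), maxTicks int) map[string]int {
	counts := map[string]int{}
	ttl := initial.TTL
	for i := 0; i < maxTicks; i++ {
		sleep(ttl / 2)
		info, err := renew()
		switch {
		case err != nil:
			counts["failure"]++ // transient: keep ticking on the old cadence
		case !info.Renewable:
			counts["not_renewable"]++
			return counts // operator must rotate the token
		default:
			counts["success"]++
			ttl = info.TTL // a longer TTL shifts the loop to a longer cadence
		}
	}
	return counts
}

// noSleep lets the demo run instantly; production code would pass time.Sleep.
func noSleep(time.Duration) {}

func main() {
	calls := 0
	renew := func() (tokenInfo, error) {
		calls++
		switch calls {
		case 1:
			return tokenInfo{TTL: time.Hour, Renewable: true}, nil
		case 2:
			return tokenInfo{}, errTransient
		default:
			return tokenInfo{Renewable: false}, nil
		}
	}
	fmt.Println(renewLoop(tokenInfo{TTL: 30 * time.Minute, Renewable: true}, renew, noSleep, 10))
}
```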
|
||||
|
||||
## MaxTTL enforcement (M11c)
|
||||
|
||||
When a certificate profile defines a maximum TTL, the Vault connector
|
||||
overrides the TTL string in the signing request to ensure the issued
|
||||
certificate does not exceed the profile limit. This is applied
|
||||
**before** Vault's own role-level max TTL — so the effective limit is
|
||||
the minimum of (profile.MaxTTL, role.MaxLeaseTTL).
|
||||
|
||||
## Revocation and CRL/OCSP
|
||||
|
||||
CRL and OCSP are managed by Vault itself. Clients should validate
|
||||
certificate status against Vault's own CRL/OCSP endpoints
|
||||
(`GET /v1/{mount}/crl` and Vault's OCSP responder). certctl does not
|
||||
generate local CRL/OCSP for Vault-issued certificates. Revocation is
|
||||
recorded locally (audit row + cert state) but Vault is the
|
||||
authoritative source for relying parties.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Token rotation without downtime
|
||||
|
||||
Two paths:
|
||||
|
||||
1. **Restart-driven.** Update `CERTCTL_VAULT_TOKEN` env var on the
|
||||
server, restart certctl-server. The renewal loop picks up the new
|
||||
token's lookup-self response and resumes ticking.
|
||||
2. **Hot-swap via API/GUI.** `PUT /api/v1/issuers/{id}` with the
|
||||
updated config; the registry's Rebuild path replaces the connector
|
||||
without restart. Use this when Vault's Max TTL has been reached
|
||||
and the existing token can no longer be renewed.
|
||||
|
||||
### Diagnosing renewal failures
|
||||
|
||||
Watch
|
||||
`certctl_vault_token_renewals_total{result="not_renewable"}` and
|
||||
`{result="failure"}`. Sustained failures with no `not_renewable`
|
||||
generally indicate Vault unreachability or token-policy drift; a
|
||||
spike in `not_renewable` is the canonical signal that a Max TTL
|
||||
boundary was hit and operator action is required.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Issuer hierarchy primitive](../intermediate-ca-hierarchy.md) — how Vault sits as a sub-CA under another issuer
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — the bounded-polling primitive used by other issuers; Vault is synchronous so does not consume it
|
||||
@@ -0,0 +1,118 @@
|
||||
# Windows Certificate Store Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Windows Certificate Store
|
||||
> target connector. For the connector-development context (interface
|
||||
> contract, registry, atomic deploy primitive shared across all
|
||||
> targets), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Windows Certificate Store connector imports certificates into
|
||||
the Windows cert store via PowerShell, **without managing IIS site
|
||||
bindings**. Use this for non-IIS Windows services that read
|
||||
certificates from the cert store: Exchange, RDP, SQL Server, ADFS,
|
||||
LSA-protected services, etc.
|
||||
|
||||
Same injectable `PowerShellExecutor` pattern as the IIS connector,
|
||||
with optional WinRM proxy mode for agentless deployment to remote
|
||||
Windows hosts.
|
||||
|
||||
Implementation lives at `internal/connector/target/wincertstore/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Windows Certificate Store connector when:
|
||||
|
||||
- The target is a Windows service that reads certs from the
|
||||
Windows cert store (Exchange transport TLS, RDP listener, SQL
|
||||
Server SSL endpoint, ADFS token-signing cert, etc.).
|
||||
- You don't want IIS-binding management (use the
|
||||
[IIS connector](iis.md) for that).
|
||||
- You're deploying via an in-host agent (`mode: local`) or via
|
||||
WinRM from a proxy agent (`mode: winrm`).
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- The target is IIS with site bindings — use the
|
||||
[IIS connector](iis.md) for binding management.
|
||||
- The target reads certs from a JKS / PKCS#12 keystore — use the
|
||||
[Java Keystore](jks.md) connector.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
{
  "store_name": "My",
  "store_location": "LocalMachine",
  "friendly_name": "Production API Cert",
  "remove_expired": true
}
```
|
||||
|
||||
| Field | Default | Description |
|---|---|---|
| `store_name` | `"My"` | Windows cert store name (My, Root, WebHosting, etc.) |
| `store_location` | `"LocalMachine"` | `"LocalMachine"` or `"CurrentUser"` |
| `friendly_name` | — | Optional friendly name for the imported certificate |
| `remove_expired` | `false` | Remove expired certs with same CN after import |
| `mode` | `"local"` | `"local"` (agent-local) or `"winrm"` (remote) |
| `winrm_host` | — | WinRM hostname (required for winrm mode) |
| `winrm_port` | 5985 | WinRM port (5985 HTTP, 5986 HTTPS) |
| `winrm_username` | — | WinRM username (required for winrm mode) |
| `winrm_password` | — | WinRM password (required for winrm mode) |
| `winrm_https` | `false` | Use HTTPS for WinRM |
| `winrm_insecure` | `false` | Skip TLS verification for WinRM |
| `exec_deadline` | `60s` | Per-PowerShell-subprocess cap that fires only when the caller's `ctx` has no deadline of its own. A caller-supplied deadline always wins; this is a safety net so a hung WinRM session or stuck `Cert:` provider call cannot block the deploy worker indefinitely. Operators on slow links can extend with e.g. `"exec_deadline": "5m"`. |
|
||||
|
||||
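The `exec_deadline` rule ("a caller-supplied deadline always wins") maps naturally onto Go's `context` package. The following is a sketch of that rule only, not the connector's implementation; `withExecDeadline` and `usesFallback` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// withExecDeadline applies the fallback timeout only when ctx carries no
// deadline of its own. A caller-supplied deadline is returned untouched,
// even when it is longer than the fallback.
func withExecDeadline(ctx context.Context, fallback time.Duration) (context.Context, context.CancelFunc) {
	if _, ok := ctx.Deadline(); ok {
		return ctx, func() {} // nothing to release: the caller owns the deadline
	}
	return context.WithTimeout(ctx, fallback)
}

// usesFallback reports whether the fallback was the deadline applied for a
// caller whose context has callerTimeout (zero means "no deadline").
func usesFallback(callerTimeout, fallback time.Duration) bool {
	ctx := context.Background()
	if callerTimeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, callerTimeout)
		defer cancel()
	}
	before, had := ctx.Deadline()
	derived, cancel := withExecDeadline(ctx, fallback)
	defer cancel()
	after, _ := derived.Deadline()
	return !(had && after.Equal(before))
}

func main() {
	fmt.Println(usesFallback(0, 60*time.Second))             // no caller deadline
	fmt.Println(usesFallback(5*time.Minute, 60*time.Second)) // caller deadline present
}
```

Checking `ctx.Deadline()` before wrapping is what keeps the cap a safety net: a plain `context.WithTimeout` would instead shorten any caller deadline that is longer than the fallback.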
## Deploy modes
|
||||
|
||||
### `mode: local`
|
||||
|
||||
Runs PowerShell in-process on the agent host. Requires the agent
|
||||
to be installed on the Windows target itself. Best fit for
|
||||
single-host services (a Windows server running Exchange or SQL
|
||||
Server alone).
|
||||
|
||||
### `mode: winrm`
|
||||
|
||||
Runs PowerShell remotely via WinRM from a proxy agent. Best fit
|
||||
for fleets where you don't want to install the certctl agent on
|
||||
every Windows host. Use HTTPS WinRM (port 5986) with
|
||||
`winrm_insecure: false` for production; HTTP WinRM (5985) is
|
||||
acceptable on operator-controlled networks.
|
||||
|
||||
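The defaults and the per-mode required fields from the configuration table can be checked before a deploy is attempted. The sketch below assumes the config arrives as JSON with the documented field names; the `Config` struct and `parseConfig` function are hypothetical and only cover a subset of the fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config mirrors a subset of the documented JSON fields. The struct itself
// is an illustration, not the connector's real type.
type Config struct {
	StoreName     string `json:"store_name"`
	StoreLocation string `json:"store_location"`
	Mode          string `json:"mode"`
	WinRMHost     string `json:"winrm_host"`
	WinRMPort     int    `json:"winrm_port"`
	WinRMUsername string `json:"winrm_username"`
	WinRMPassword string `json:"winrm_password"`
}

// parseConfig starts from the documented defaults, overlays the JSON, and
// rejects a winrm-mode config that lacks host or credentials.
func parseConfig(raw string) (Config, error) {
	cfg := Config{StoreName: "My", StoreLocation: "LocalMachine", Mode: "local", WinRMPort: 5985}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return cfg, err
	}
	switch cfg.StoreLocation {
	case "LocalMachine", "CurrentUser":
	default:
		return cfg, fmt.Errorf("store_location %q is not LocalMachine or CurrentUser", cfg.StoreLocation)
	}
	switch cfg.Mode {
	case "local":
	case "winrm":
		if cfg.WinRMHost == "" || cfg.WinRMUsername == "" || cfg.WinRMPassword == "" {
			return cfg, errors.New("winrm mode requires winrm_host, winrm_username, and winrm_password")
		}
	default:
		return cfg, fmt.Errorf("mode %q is not local or winrm", cfg.Mode)
	}
	return cfg, nil
}

func main() {
	_, err := parseConfig(`{"mode": "winrm", "winrm_host": "win01.example"}`)
	fmt.Println(err) // credentials are missing, so this config is rejected
}
```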
## Operator playbook
|
||||
|
||||
### Selecting the right store
|
||||
|
||||
- `My` — personal cert store under LocalMachine. Default for
|
||||
Exchange transport TLS, SQL Server, RDP, most service-account
|
||||
workloads.
|
||||
- `Root` — trusted root CA store. **Don't import leaves here.**
|
||||
This is for adding trust anchors only.
|
||||
- `WebHosting` — alternative store for IIS websites; the IIS
|
||||
connector typically uses `My` instead.
|
||||
|
||||
### Removing expired certs
|
||||
|
||||
`remove_expired: true` cleans up old cert versions with the same
|
||||
Subject CN after a successful import. Useful in long-running
|
||||
fleets where the cert store accumulates dozens of expired entries
|
||||
over years of rotations.
|
||||
|
||||
### Handling private-key permissions
|
||||
|
||||
Imported certs land with the Network Service account having read
|
||||
access by default. For services running as a different account
|
||||
(e.g. a domain user for SQL Server), the operator needs to grant
|
||||
that account read access to the private key after import — this
|
||||
isn't automated by the connector. Use the post-deploy
|
||||
`reload_command` to run a `Set-Acl` step if you need it.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [IIS connector](iis.md) — IIS site-binding management on top of the cert store
|
||||
- [Java Keystore](jks.md) — JVM-based service alternative
|
||||
@@ -1,5 +1,7 @@
|
||||
# Deployment Atomicity, Post-Deploy Verification, and Rollback
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> Deploy-hardening I master bundle (v2.X.0). Operator + integrator
|
||||
> reference for the atomic-write + post-deploy TLS verify +
|
||||
> rollback pipeline that closes the procurement-checklist gap with
|
||||
@@ -21,7 +23,7 @@ a single shared primitive:
|
||||
|---|---|---|
|
||||
| **Atomic deploy with rollback** | F5 only (transactional API) | 12 of 13 connectors via `deploy.Apply` (K8s pending Bundle 2 — see [Section 1.5](#15-audit-closure-status-2026-05-02-deployment-target-audit)) |
|
||||
| **Post-deploy TLS verification** | None | NGINX/Apache/HAProxy/Traefik/Caddy/Envoy/Postfix all do TLS handshake + SHA-256 fingerprint compare; fail → rollback |
|
||||
| **Vendor-specific deployment recipes** | Light docs | (Bundle II — `cowork/deploy-hardening-ii-prompt.md`) |
|
||||
| **Vendor-specific deployment recipes** | Light docs | (Bundle II — per the project's deploy-hardening II spec) |
|
||||
|
||||
This document describes the operator-visible surface. The Go-level
|
||||
contract lives at `internal/deploy/doc.go`.
|
||||
@@ -29,7 +31,7 @@ contract lives at `internal/deploy/doc.go`.
|
||||
## 1.5. Audit closure status (2026-05-02 deployment-target audit)
|
||||
|
||||
The 2026-05-02 deployment-target coverage audit
|
||||
(`cowork/deployment-target-audit-2026-05-02/RESULTS.md`) tightened the
|
||||
(the 2026-05-02 deployment-target audit) tightened the
|
||||
atomic + rollback contract on the connectors below. All bundles in the
|
||||
table are committed to `master` as of this section's last edit; commit
|
||||
hashes pin to the canonical landing commit for each piece of work.
|
||||
@@ -52,7 +54,7 @@ hashes pin to the canonical landing commit for each piece of work.
|
||||
real `k8s.io/client-go` implementation + `ResourceVersion` plumbing
|
||||
+ post-deploy SHA-256 verify + kubelet sync poll is the remaining
|
||||
V2 P0 blocker. Tracking prompt:
|
||||
`cowork/deployment-target-audit-2026-05-02/k8s-real-client-prompt.md`.
|
||||
the project's k8s-real-client spec.
|
||||
|
||||
Bundle 10 (per-connector loadtest harness, commit `6286cd4`) does not
|
||||
modify the per-connector contract table; it's a CI / observability
|
||||
@@ -132,7 +134,7 @@ Apply's algorithm:
|
||||
| ssh | (Connect probe) | (SCP upload + remote chmod) | `tls.Dial` to remote TLS port | Pre-deploy SCP backup of remote files |
|
||||
| wincertstore | (Get-ChildItem Cert:\) | (Import-PfxCertificate) | (admin probe) | Get-ChildItem snapshot for rollback |
|
||||
| javakeystore | (`keytool -list`) | (`keytool -importkeystore`) | (admin probe) | keytool snapshot; rollback via `keytool -delete` + re-import |
|
||||
| k8ssecret | (V2 blocker — see note below) | (V2 blocker — see note below) | (V2 blocker — see note below) | **V2 blocker — Bundle 2 of the 2026-05-02 deployment-target audit.** Production `realK8sClient` at `internal/connector/target/k8ssecret/k8ssecret.go:397-420` is a stub (every method returns `"real Kubernetes client not implemented — use NewWithClient for tests"`). The SHA-256 post-deploy verify and kubelet sync poll are designed but not yet implemented; production deploys to a real cluster fail with "not implemented" until Bundle 2 lands. Test mocks via `NewWithClient` work today. Tracking prompt: `cowork/deployment-target-audit-2026-05-02/k8s-real-client-prompt.md`. |
|
||||
| k8ssecret | (V2 blocker — see note below) | (V2 blocker — see note below) | (V2 blocker — see note below) | **V2 blocker — Bundle 2 of the 2026-05-02 deployment-target audit.** Production `realK8sClient` at `internal/connector/target/k8ssecret/k8ssecret.go:397-420` is a stub (every method returns `"real Kubernetes client not implemented — use NewWithClient for tests"`). The SHA-256 post-deploy verify and kubelet sync poll are designed but not yet implemented; production deploys to a real cluster fail with "not implemented" until Bundle 2 lands. Test mocks via `NewWithClient` work today. Tracking prompt: the project's k8s-real-client spec. |
|
||||
|
||||
> **Postfix vs Dovecot mode**: see "Choosing Mode=postfix vs Mode=dovecot" in
|
||||
> `docs/connectors.md` for the per-mode defaults (cert/key paths, validate +
|
||||
@@ -296,11 +298,11 @@ Out of scope for the V2-free deploy-hardening I bundle:
|
||||
- **Multi-region deployment coordination** — orchestration of N
|
||||
data-center deploys with operator approval gates per stage.
|
||||
- **Cert-pinning verification against mobile-app pin manifests**.
|
||||
- **SOC 2 evidence-report generator** — auto-export of the
|
||||
deploy audit trail in the format SOC 2 auditors expect.
|
||||
- **Audit-evidence report generator** — auto-export of the
|
||||
deploy audit trail in a reviewer-friendly format.
|
||||
- **Customer-paid validation matrices** — vendor-version certified
|
||||
quirks (e.g. "tested on F5 v15.1 + v17.0 + v17.5"). See
|
||||
`cowork/deploy-hardening-ii-prompt.md` for the per-vendor
|
||||
the project's deploy-hardening II spec for the per-vendor
|
||||
edge-case audit + integration test sidecars.
|
||||
|
||||
## 12. Per-connector quick reference
|
||||
@@ -0,0 +1,233 @@
|
||||
# Intermediate CA hierarchy — operator runbook
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Rank 8 of the 2026-05-03 deep-research deliverable. This page is the
|
||||
canonical reference for operators running certctl as a multi-level
|
||||
internal PKI.
|
||||
|
||||
The default `single`-mode flow (one operator-supplied sub-CA loaded
|
||||
from disk at boot) is unchanged and will keep working byte-for-byte
|
||||
forever. This page is for operators who need a real CA tree:
|
||||
|
||||
- Boundary-CA deployments where you want separation of policy and
|
||||
issuing authorities.
|
||||
- Policy-CA deployments (one root, one policy CA per business unit,
|
||||
one issuing CA per environment).
|
||||
- OT / industrial control networks where the air-gapped root signs
|
||||
online sub-CAs that go in and out of service on a rotation.
|
||||
|
||||
## Concepts
|
||||
|
||||
`Issuer.HierarchyMode` is a per-issuer column on the `issuers` table.
|
||||
Two values are valid (the database default is `"single"` — back-compat
|
||||
byte-identical for unmigrated rows):
|
||||
|
||||
- `single` — pre-Rank-8 historical flow. The local connector loads a
|
||||
pre-signed CA cert+key from disk via `local.Config.CACertPath` /
|
||||
`local.Config.CAKeyPath`. Existing operators upgrade with no
|
||||
behavior change.
|
||||
- `tree` — the issuer's CAs are managed via the `intermediate_cas`
|
||||
table. Chain assembly walks the `parent_ca_id` foreign key from the
|
||||
issuing leaf CA up to the root and attaches the assembled chain to
|
||||
every `IssuanceResult`.
|
||||
|
||||
Each row in `intermediate_cas` is one CA cert (root, policy, issuing).
|
||||
The lifecycle is `created` → `active` → `retiring` → `retired`. The
|
||||
state column is a closed enum and validates at the service layer; the
|
||||
postgres CHECK constraint enforces it at the database layer too.
|
||||
|
||||
A CA's private key bytes are NEVER persisted on the row. The
|
||||
`key_driver_id` column is a reference (filesystem path / KMS key ID /
|
||||
HSM slot) that the `signer.Driver` resolves at sign time. A SQL
|
||||
injection or a row-leak surface MUST NEVER expose key bytes; only the
|
||||
reference can leak.
|
||||
|
||||
## Lifecycle states
|
||||
|
||||
```mermaid
stateDiagram-v2
    [*] --> created : CreateRoot / CreateChild
    created --> active : registration completes
    active --> retiring : Retire(confirm=false)
    retiring --> retired : Retire(confirm=true)
    retired --> [*]

    note right of retiring
        Drain start. CA stops issuing
        NEW children; existing children
        keep issuing until they retire.
    end note

    note right of retired
        Terminal. Refused if active children
        remain (ErrCAStillHasActiveChildren
        → HTTP 409). OCSP keeps responding
        for already-issued leaves until expiry.
    end note
```
|
||||
|
||||
Drain-first semantics: a CA in `retiring` state cannot terminalize to
|
||||
`retired` while it still has active children. The service layer
|
||||
returns `ErrCAStillHasActiveChildren`; the API surfaces HTTP 409. Drain
|
||||
the children first.
|
||||
|
||||
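The two-phase retirement and its drain-first check can be sketched as a small transition function. This is an illustration under assumptions: `retire`, the `State` type, and `ErrInvalidTransition` are hypothetical, and the handling of calls the diagram does not cover (for example `confirm=true` on an `active` CA) is a guess.

```go
package main

import (
	"errors"
	"fmt"
)

// State is the closed lifecycle enum described above.
type State string

const (
	StateCreated  State = "created"
	StateActive   State = "active"
	StateRetiring State = "retiring"
	StateRetired  State = "retired"
)

var (
	// ErrCAStillHasActiveChildren is the error named in the doc (HTTP 409).
	ErrCAStillHasActiveChildren = errors.New("CA still has active children")
	// ErrInvalidTransition is a hypothetical error for uncovered cases.
	ErrInvalidTransition = errors.New("invalid lifecycle transition")
)

// retire applies one Retire call. confirm=false starts the drain;
// confirm=true finalizes it, but only once no active children remain.
func retire(current State, confirm bool, activeChildren int) (State, error) {
	switch {
	case current == StateActive && !confirm:
		return StateRetiring, nil
	case current == StateRetiring && confirm:
		if activeChildren > 0 {
			return current, ErrCAStillHasActiveChildren
		}
		return StateRetired, nil
	default:
		return current, ErrInvalidTransition
	}
}

func main() {
	s, _ := retire(StateActive, false, 2)
	fmt.Println(s)
	_, err := retire(s, true, 2)
	fmt.Println(err)
	s, _ = retire(s, true, 0)
	fmt.Println(s)
}
```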
## Common deployment patterns
|
||||
|
||||
### Pattern A — 4-level boundary CA
|
||||
|
||||
```mermaid
flowchart TD
    Root["Acme Root CA<br/>path_len=3<br/>offline air-gapped"]
    Policy["Acme Policy CA<br/>path_len=2<br/>boundary"]
    IssA["Acme Issuing A<br/>path_len=1<br/>prod workload leaves"]
    IssB["Acme Issuing B<br/>path_len=0<br/>ephemeral pod identity"]
    Root --> Policy --> IssA --> IssB
```
|
||||
|
||||
Operator workflow:
|
||||
|
||||
1. Mint the root cert+key on the offline workstation. Move the cert
|
||||
PEM (no key) to the online operator workstation.
|
||||
2. `POST /api/v1/issuers/{id}/intermediates` with the empty
|
||||
`parent_ca_id` and `root_cert_pem` + `key_driver_id` populated
|
||||
(the operator pre-positions the root key file at the path the
|
||||
`key_driver_id` points to). The service validates RFC 5280 §3.2
|
||||
self-signed semantics + cross-checks the operator-supplied key
|
||||
matches the cert (rejects mismatched bundles at registration time
|
||||
with `ErrCAKeyMismatch`).
|
||||
3. `POST /api/v1/issuers/{id}/intermediates` with `parent_ca_id`
|
||||
pointing at the root for the Policy CA. The service generates the
|
||||
child key via `signer.Driver.Generate`, signs the child cert via
|
||||
the parent's signer (loaded from the parent's `key_driver_id`),
|
||||
and persists the new row with the next `path_len` value (the parent's value minus 1 if unset). Repeat for each lower level.
|
||||
4. Set `Issuer.HierarchyMode = "tree"` on the issuer row + set the
|
||||
`treeIssuingCAID` connector field to point at the deepest CA
|
||||
(Acme Issuing B in the example above) — issued leaves chain via
|
||||
`AssembleChain` from B up to the root.
|
||||
|
||||
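The chain walk in step 4 follows `parent_ca_id` links from the issuing CA up to the root. A minimal sketch, assuming the rows are already loaded into a map keyed by CA ID (the `CA` struct and `assembleChain` function are hypothetical and are not the service's `AssembleChain`):

```go
package main

import (
	"errors"
	"fmt"
)

// CA is a hypothetical in-memory view of one intermediate_cas row.
type CA struct {
	Name     string
	ParentID string // empty for the root
}

// assembleChain walks parent links from the issuing CA up to the root and
// returns the CA names in leaf-to-root order. The seen set guards against
// a corrupted table whose parent links form a cycle.
func assembleChain(cas map[string]CA, issuingID string) ([]string, error) {
	if issuingID == "" {
		return nil, errors.New("issuing CA ID is empty")
	}
	var chain []string
	seen := map[string]bool{}
	for id := issuingID; id != ""; {
		if seen[id] {
			return nil, errors.New("cycle in parent_ca_id links")
		}
		seen[id] = true
		ca, ok := cas[id]
		if !ok {
			return nil, fmt.Errorf("unknown CA %q", id)
		}
		chain = append(chain, ca.Name)
		id = ca.ParentID
	}
	return chain, nil
}

func main() {
	cas := map[string]CA{
		"root":   {Name: "Acme Root CA"},
		"policy": {Name: "Acme Policy CA", ParentID: "root"},
		"iss-a":  {Name: "Acme Issuing A", ParentID: "policy"},
		"iss-b":  {Name: "Acme Issuing B", ParentID: "iss-a"},
	}
	chain, err := assembleChain(cas, "iss-b")
	fmt.Println(chain, err)
}
```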
### Pattern B — 3-level financial-services policy CA
|
||||
|
||||
```mermaid
flowchart TD
    Root["FinCo Root CA<br/>path_len=2"]
    Pol["FinCo Trading Policy CA<br/>path_len=1<br/>permitted DNS = trading.finco.example"]
    Iss["FinCo Trading Issuing CA<br/>path_len=0"]
    Root --> Pol --> Iss
```
|
||||
|
||||
Per business-unit name constraints: each policy CA carries a
|
||||
`PermittedDNSDomains` list scoped to the business unit (RFC 5280
|
||||
§4.2.1.10). The service enforces subset semantics — a child policy CA
|
||||
cannot widen the parent's permitted set, and cannot remove an
|
||||
excluded subtree. Operators submit `name_constraints` on the
|
||||
`POST /api/v1/issuers/{id}/intermediates` body.
|
||||
|
||||
### Pattern C — 2-level internal PKI
|
||||
|
||||
```mermaid
flowchart TD
    Root["Internal Root CA<br/>path_len=1"]
    Iss["Internal Issuing CA<br/>path_len=0<br/>issues leaves directly"]
    Root --> Iss
```
|
||||
|
||||
The simplest tree-mode deployment. Roughly equivalent to single mode
|
||||
in terms of operator overhead, but provides one extra layer of
|
||||
indirection so the root key can stay offline while only the issuing
|
||||
CA's key sits on the certctl host.
|
||||
|
||||
## RFC 5280 enforcement
|
||||
|
||||
All enforcement happens at the service layer. The local connector
|
||||
trusts the service's contract; the API layer translates errors to
|
||||
HTTP codes.
|
||||
|
||||
- §3.2 self-signed root validation: `cert.CheckSignatureFrom(cert)` +
|
||||
subject == issuer DN. Rejected with `ErrCANotSelfSigned` →
|
||||
HTTP 400.
|
||||
- §4.2.1.9 path-length tightening: child's `PathLenConstraint` must
|
||||
be strictly less than parent's. Default to `parent - 1` when unset.
|
||||
Rejected with `ErrPathLenExceeded` → HTTP 400.
|
||||
- §4.2.1.10 NameConstraints subset: child's `Permitted` set must be a
|
||||
subset of parent's; child's `Excluded` set must be a superset of
|
||||
parent's. Rejected with `ErrNameConstraintExceeded` → HTTP 400.
|
||||
- §4.1.2.5 validity capping: child's `notAfter` capped to parent's
|
||||
`notAfter` automatically (chain breaks at parent's expiry
|
||||
regardless).
|
||||
|
||||
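Three of the checks above can be sketched in a few lines of Go: path-length tightening, the permitted-set subset rule, and validity capping. This is an illustration under assumptions, not the service's code: the function names are hypothetical, DNS names are assumed to be lower-cased already, and an empty parent permitted set is treated as "unconstrained".

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// ErrPathLenExceeded is the error named in the doc for path-length failures.
var ErrPathLenExceeded = errors.New("path length exceeded")

// childPathLen returns the child's path length: parent-1 when unset,
// otherwise the requested value if it is strictly less than the parent's.
func childPathLen(parent int, requested *int) (int, error) {
	if parent <= 0 {
		return 0, ErrPathLenExceeded // a path_len=0 CA may not sign another CA
	}
	if requested == nil {
		return parent - 1, nil
	}
	if *requested < 0 || *requested >= parent {
		return 0, ErrPathLenExceeded
	}
	return *requested, nil
}

// within reports whether DNS subtree child lies inside subtree parent.
func within(child, parent string) bool {
	return child == parent || strings.HasSuffix(child, "."+parent)
}

// permittedIsSubset checks that the child cannot widen the parent's
// permitted DNS set.
func permittedIsSubset(parent, child []string) bool {
	if len(parent) == 0 {
		return true // parent is unconstrained (assumption)
	}
	if len(child) == 0 {
		return false // an unconstrained child would widen a constrained parent
	}
	for _, c := range child {
		ok := false
		for _, p := range parent {
			if within(c, p) {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	return true
}

// capNotAfter caps the child's notAfter at the parent's.
func capNotAfter(child, parent time.Time) time.Time {
	if child.After(parent) {
		return parent
	}
	return child
}

func main() {
	n, err := childPathLen(2, nil)
	fmt.Println(n, err)
	fmt.Println(permittedIsSubset([]string{"finco.example"}, []string{"trading.finco.example"}))
	parentEnd := time.Date(2030, 1, 1, 0, 0, 0, 0, time.UTC)
	fmt.Println(capNotAfter(parentEnd.AddDate(1, 0, 0), parentEnd).Equal(parentEnd))
}
```

The excluded-set rule is the mirror image (the child's excluded set must contain everything the parent excludes) and is left out of the sketch for brevity.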
## Migrating a single-mode issuer to tree mode
|
||||
|
||||
Pre-flight: the load-bearing pin
|
||||
`TestLocal_HierarchyMode_SingleVsTree_ByteIdentical` guarantees that
|
||||
a 1-level tree wired around the same on-disk root cert+key produces
|
||||
byte-identical issuance bundles to single mode. Migration is therefore
|
||||
a no-downtime operation if done carefully:
|
||||
|
||||
1. Register the existing single-mode CA cert as an `intermediate_cas`
|
||||
row via `CreateRoot` (with the existing on-disk key referenced as
|
||||
`key_driver_id`).
|
||||
2. Update the issuer row's `hierarchy_mode` to `"tree"` and set the
|
||||
connector's `treeIssuingCAID` (via `SetTreeIssuingCAID`) to the new row's ID. Restart the
|
||||
server (no new code path activates until the connector reads the
|
||||
updated mode at boot).
|
||||
3. Issue a test cert. The byte-equivalence pin guarantees the wire
|
||||
bytes match the pre-migration output for a 1-level tree.
|
||||
4. Build out the child CAs via `CreateChild` calls. Update
|
||||
`treeIssuingCAID` to the new leaf CA. Test, then ramp.
|
||||
|
||||
If the pin breaks during migration, abort: roll back the
|
||||
`hierarchy_mode` flip and investigate. The byte-equivalence pin is
|
||||
the canary — if it goes red, deeper bugs lurk.
|
||||
|
||||
## API reference
|
||||
|
||||
All endpoints under `/api/v1/issuers/{id}/intermediates` and
|
||||
`/api/v1/intermediates/{id}` are admin-gated. Non-admin Bearer callers
|
||||
get HTTP 403.
|
||||
|
||||
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/v1/issuers/{id}/intermediates` | Register root OR sign child (body discriminator) |
| GET | `/api/v1/issuers/{id}/intermediates` | List flat hierarchy for issuer |
| GET | `/api/v1/intermediates/{id}` | Single-row detail |
| POST | `/api/v1/intermediates/{id}/retire` | Two-phase retirement |
|
||||
|
||||
See `api/openapi.yaml` for full request/response schemas.
|
||||
|
||||
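The "body discriminator" on the POST endpoint can be pictured as follows. This is a sketch based only on the field names mentioned in this runbook (`parent_ca_id`, `root_cert_pem`, `key_driver_id`, `name_constraints`); the `createRequest` struct, the `classify` function, and the exact validation rules are assumptions, and `api/openapi.yaml` remains the authoritative schema.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// createRequest holds the body fields named in this runbook. The shape of
// name_constraints is not specified here, so it is kept as raw JSON.
type createRequest struct {
	ParentCAID      string          `json:"parent_ca_id"`
	RootCertPEM     string          `json:"root_cert_pem"`
	KeyDriverID     string          `json:"key_driver_id"`
	NameConstraints json.RawMessage `json:"name_constraints"`
}

// classify decides which operation a POST body asks for: a non-empty
// parent_ca_id means "sign a child"; otherwise a root registration needs
// both the root cert PEM and a key reference.
func classify(body string) (string, error) {
	var req createRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "", err
	}
	switch {
	case req.ParentCAID != "":
		return "create_child", nil
	case req.RootCertPEM != "" && req.KeyDriverID != "":
		return "create_root", nil
	default:
		return "", errors.New("need parent_ca_id, or root_cert_pem plus key_driver_id")
	}
}

func main() {
	kind, err := classify(`{"parent_ca_id": "ca-root"}`)
	fmt.Println(kind, err)
}
```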
## Observability
|
||||
|
||||
`IntermediateCAMetrics` ships counters dimensioned by `(issuer_id,
|
||||
kind)`:
|
||||
|
||||
- `create_root` — successful CreateRoot calls.
|
||||
- `create_child` — successful CreateChild calls.
|
||||
- `retire_retiring` — `active → retiring` transitions.
|
||||
- `retire_retired` — `retiring → retired` transitions.
|
||||
|
||||
The Prometheus exposer reads the snapshot via
|
||||
`SnapshotIntermediateCA()` from a single instance constructed in
|
||||
`cmd/server/main.go` (the snapshotter is the single source of truth
|
||||
between the service-side recording path and the metrics-side exposing
|
||||
path).
|
||||
|
||||
The audit table receives one row per CreateRoot / CreateChild /
|
||||
Retire transition, scoped to the actor extracted from the API
|
||||
request's auth context.
|
||||
|
||||
## Known limitations
|
||||
|
||||
The following are tracked in `WORKSPACE-ROADMAP.md` as Rank-8 follow-on
|
||||
work — none are required for the v2.1.0 acquisition gate:
|
||||
|
||||
- HSM-backed roots beyond `signer.FileDriver` (PKCS#11 / cloud KMS
|
||||
drivers).
|
||||
- Automated rotation: scheduled re-issuance of sub-CAs ahead of
|
||||
expiry with parallel-validity windows.
|
||||
- Intra-hierarchy CRL chaining: each non-leaf CA publishes a CRL
|
||||
covering its direct children's revocations.
|
||||
- NameConstraints policy templates: declarative templates an operator
|
||||
can pick from instead of hand-rolling the JSON.
|
||||
- D3 dendrogram visualization on the GUI page (today's render is a
|
||||
recursive `<ul>` nested list).
|
||||
@@ -1,12 +1,14 @@
|
||||
# MCP Server Guide
|
||||
|
||||
certctl ships with an MCP (Model Context Protocol) server that lets AI assistants manage your certificate infrastructure through natural language. Ask Claude to "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?" and the MCP server translates that into API calls against your certctl instance.
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This guide covers setup, configuration, and usage with Claude, Cursor, and other MCP-compatible tools.
|
||||
certctl ships with an MCP (Model Context Protocol) server that lets AI assistants manage your certificate infrastructure through natural language. Ask your MCP-compatible AI client to "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?" and the MCP server translates that into API calls against your certctl instance.
|
||||
|
||||
This guide covers setup, configuration, and usage with any MCP-compatible AI client.
|
||||
|
||||
## What Is MCP?
|
||||
|
||||
MCP is an open protocol that connects AI assistants to external tools and data sources. Instead of copying and pasting API responses into a chat window, MCP lets the AI call your tools directly. The certctl MCP server exposes all 78 API endpoints as MCP tools — the AI sees typed schemas describing what each tool does, what parameters it accepts, and what it returns.
|
||||
MCP is an open protocol that connects AI assistants to external tools and data sources. Instead of copying and pasting API responses into a chat window, MCP lets the AI call your tools directly. The certctl MCP server exposes the certctl API as MCP tools (re-derive count via `grep -cE 'mcp\.AddTool\(' internal/mcp/tools.go`) — the AI sees typed schemas describing what each tool does, what parameters it accepts, and what it returns.
|
||||
|
||||
The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via stdio transport. It's a stateless HTTP proxy: every MCP tool call becomes an HTTP request to the certctl REST API. No new state, no new database tables, no new attack surface beyond what the API already exposes.
|
||||
|
||||
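To make "every MCP tool call becomes an HTTP request" concrete, here is a minimal sketch of how such a proxy might build the outgoing request. It is not the code in `cmd/mcp-server/`: `newAPIRequest` is a hypothetical helper, the `Accept` header is an assumption, and the example path is one of the REST paths documented elsewhere in these docs.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newAPIRequest builds the REST call behind one tool invocation: the base
// URL and API key would come from CERTCTL_SERVER_URL and CERTCTL_API_KEY,
// and the key is sent as a Bearer token.
func newAPIRequest(serverURL, apiKey, method, path string) (*http.Request, error) {
	req, err := http.NewRequest(method, strings.TrimRight(serverURL, "/")+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	req, err := newAPIRequest("https://localhost:8443", "your-api-key-here", http.MethodGet, "/api/v1/intermediates/ca-123")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Because the proxy holds no state of its own, authorization decisions stay with the certctl server: a tool call can do no more than the API key it forwards allows.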
@@ -14,9 +16,9 @@ The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via st
|
||||
|
||||
You need:
|
||||
|
||||
1. A running certctl server (see [Quick Start](quickstart.md))
|
||||
1. A running certctl server (see [Quick Start](../getting-started/quickstart.md))
|
||||
2. The MCP server binary — either built from source or from a Docker image
|
||||
3. An MCP-compatible AI client (Claude Desktop, Cursor, VS Code with Copilot, etc.)
|
||||
3. An MCP-compatible AI client
|
||||
|
||||
## Building the MCP Server
|
||||
|
||||
@@ -41,9 +43,9 @@ If your certctl server has auth enabled (the default), you must provide the API
|
||||
|
||||
Since v2.2 the certctl control plane is HTTPS-only. If the server cert is self-signed or chained to an internal CA, set `CERTCTL_SERVER_CA_BUNDLE_PATH` so the MCP server can verify the TLS handshake. Never set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true` outside local development — it disables all certificate validation.
|
||||
|
||||
## Setting Up with Claude Desktop
|
||||
## Configuring Your MCP Client
|
||||
|
||||
Add this to your Claude Desktop MCP configuration file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
|
||||
Most MCP clients accept a JSON config block of this shape. Consult your client's documentation for the exact config-file location.
|
||||
|
||||
```json
|
||||
{
|
||||
@@ -60,66 +62,39 @@ Add this to your Claude Desktop MCP configuration file (`~/Library/Application S
|
||||
}
|
||||
```
|
||||
|
||||
Restart Claude Desktop. You should see "certctl" appear in the MCP tools list with 78 available tools.
|
||||
|
||||
## Setting Up with Cursor
|
||||
|
||||
In Cursor, go to Settings → MCP Servers and add:
|
||||
|
||||
```json
|
||||
{
|
||||
"certctl": {
|
||||
"command": "/path/to/certctl-mcp",
|
||||
"env": {
|
||||
"CERTCTL_SERVER_URL": "https://localhost:8443",
|
||||
"CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/certctl/deploy/test/certs/ca.crt",
|
||||
"CERTCTL_API_KEY": "your-api-key-here"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Setting Up with Claude Code
|
||||
|
||||
Add certctl as an MCP server in your project's `.mcp.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"mcpServers": {
|
||||
"certctl": {
|
||||
"command": "/path/to/certctl-mcp",
|
||||
"env": {
|
||||
"CERTCTL_SERVER_URL": "https://localhost:8443",
|
||||
"CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/certctl/deploy/test/certs/ca.crt",
|
||||
"CERTCTL_API_KEY": "your-api-key-here"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
After saving, restart your MCP client. You should see "certctl" appear in its tool list (the available-tools count varies by certctl version; the exact set is enumerated in `internal/mcp/tools.go`).
|
||||
|
||||
## Available Tools
|
||||
|
||||
The MCP server exposes the full REST API organized across 16 resource domains:
|
||||
The MCP server exposes the full REST API organized across 22 resource domains. Re-derive the live count via `grep -cE 'gomcp\.AddTool\(' internal/mcp/tools.go internal/mcp/tools_est.go` (the per-domain numbers below decay between releases — treat them as approximate at point of writing):
|
||||
|
||||
| Domain | Tools | Examples |
|
||||
|--------|-------|---------|
|
||||
| Certificates | 9 | List, get, create, update, archive, versions, renew, deploy, revoke |
|
||||
| CRL & OCSP | 3 | Get JSON CRL, get DER CRL by issuer, check OCSP status |
|
||||
| Certificates | 14 | List, get, create, update, archive, versions, renew, deploy, revoke, bulk-revoke / -renew / -reassign, claim/dismiss discovered |
|
||||
| CRL & OCSP | 2 | Get DER CRL by issuer, check OCSP status |
|
||||
| Issuers | 6 | List, get, create, update, delete, test connection |
|
||||
| Targets | 5 | List, get, create, update, delete |
|
||||
| Agents | 8 | List, get, register, heartbeat, CSR submit, certificate pickup, get work, report job status |
|
||||
| Jobs | 5 | List, get, cancel, approve, reject |
|
||||
| Agents | 9 | List, list retired, get, register, retire, heartbeat, get work, submit CSR, report job status |
|
||||
| Jobs | 5 | List, get, approve, reject, cancel |
|
||||
| Policies | 6 | List, get, create, update, delete, list violations |
|
||||
| Profiles | 5 | List, get, create, update, delete |
|
||||
| Teams | 5 | List, get, create, update, delete |
|
||||
| Owners | 5 | List, get, create, update, delete |
|
||||
| Agent Groups | 6 | List, get, create, update, delete, list members |
|
||||
| Audit | 2 | List events (with filters), get event by ID |
|
||||
| Notifications | 3 | List, get, mark as read |
|
||||
| Notifications | 4 | List, get, mark as read, requeue dead-letter |
|
||||
| Stats | 5 | Summary, certs by status, expiration timeline, job trends, issuance rate |
|
||||
| Metrics | 1 | System metrics (gauges, counters, uptime) |
|
||||
| Digest | 2 | Preview digest, send digest |
|
||||
| Health | 4 | Health check, readiness probe, auth info, auth check |
|
||||
| Approvals | 4 | List, get, approve, reject (issuance approval workflow) |
|
||||
| Health Checks | 8 | List, summary, get, create, update, delete, history, acknowledge |
|
||||
| Renewal Policies | 5 | List, get, create, update, delete |
|
||||
| Network Scan Targets | 6 | List, get, create, update, delete, trigger scan |
|
||||
| Discovery | 4 | List discovered certs, get, list scans, summary |
|
||||
| Intermediate CAs | 4 | List, create, get, retire (admin-gated) |
|
||||
| Verification | 3 | List cert deployments, verify job, get job verification |
|
||||
| EST | 6 | List/admin profiles, get cacerts, csrattrs, simpleenroll, simplereenroll |
|
||||
|
||||
Every tool has typed input parameters with `jsonschema` descriptions, so the AI knows exactly what arguments to provide and what each field means.
|
||||
|
||||
@@ -152,14 +127,14 @@ The AI calls `certctl_create_certificate` with the common name, team ID, and own
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
AI["AI Assistant\n(Claude, Cursor)"]
|
||||
AI["AI Assistant\n(any MCP client)"]
|
||||
MCP["certctl MCP\ncmd/mcp-server/"]
|
||||
SERVER["certctl Server\n:8443"]
|
||||
|
||||
AI <-->|"stdio"| MCP
|
||||
MCP -->|"HTTP + Bearer token"| SERVER
|
||||
|
||||
MCP ~~~ TOOLS["REST API via MCP · 16 domains\nTyped input structs"]
|
||||
MCP ~~~ TOOLS["REST API via MCP · 22 domains\nTyped input structs"]
|
||||
```
|
||||
|
||||
The MCP server is intentionally thin:
|
||||
+4
-2
@@ -1,5 +1,7 @@
|
||||
# ACME Server — Threat Model
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Security posture for the certctl ACME server endpoint
|
||||
(`/acme/profile/<id>/*`). Read this before opening a PR that changes
|
||||
the JWS verifier, the challenge validators, the rate limiter, or the
|
||||
@@ -135,7 +137,7 @@ multicast, IPv4-mapped-IPv6 to a reserved IPv4. See
|
||||
CodeQL alert #23 flags `client.Do(req)` in the SCEP-probe call site
|
||||
as `go/request-forgery` despite the dial-time guard; the analyzer
|
||||
can't trace through a custom `Transport.DialContext`. Operator-
|
||||
acknowledged false positive (CLAUDE.md task #10) — see the SCEP
|
||||
acknowledged false positive (tracked internally) — see the SCEP
|
||||
probe's same-shaped defense for the audit trail.
|
||||
|
||||
## DNS-01 cache poisoning posture
|
||||
@@ -268,7 +270,7 @@ Documented to set scope expectations for security reviewers:
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — operator-facing reference.
|
||||
- [`docs/tls.md`](./tls.md) — TLS posture, including the L-001
|
||||
- [`docs/tls.md`](../../operator/tls.md) — TLS posture, including the L-001
|
||||
table of `InsecureSkipVerify` justifications (TLS-ALPN-01 row).
|
||||
- [`internal/api/acme/jws.go`](../internal/api/acme/jws.go) — verifier
|
||||
source.
|
||||
@@ -1,5 +1,7 @@
|
||||
# certctl ACME Server (Built-in)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl ships an RFC 8555 + RFC 9773 ARI ACME server endpoint at
|
||||
`/acme/profile/<profile-id>/*`. Any RFC 8555 client (cert-manager 1.15+,
|
||||
Caddy, Traefik, win-acme, certbot, Posh-ACME) can integrate with certctl
|
||||
@@ -10,9 +12,9 @@ external PKI vendors today.
|
||||
> **Phase status (2026-05-03):** Phase 6 — full operator-facing
|
||||
> reference. The functional surface is complete (Phases 1a-5); this
|
||||
> doc is the canonical procurement-readability reference. New: client-
|
||||
> walkthrough docs for [cert-manager](./acme-cert-manager-walkthrough.md),
|
||||
> [Caddy](./acme-caddy-walkthrough.md), and
|
||||
> [Traefik](./acme-traefik-walkthrough.md); a dedicated
|
||||
> walkthrough docs for [cert-manager](../../migration/acme-from-cert-manager.md),
|
||||
> [Caddy](../../migration/acme-from-caddy.md), and
|
||||
> [Traefik](../../migration/acme-from-traefik.md); a dedicated
|
||||
> [threat model](./acme-server-threat-model.md); a section-by-section
|
||||
> RFC 8555 + RFC 9773 conformance statement; a 5-failure-mode
|
||||
> troubleshooting playbook; a tested-clients version pinning table.
|
||||
@@ -73,7 +75,7 @@ profile rows retain whatever value they were created with.
|
||||
|
||||
When certctl-server uses a self-signed TLS bootstrap cert
|
||||
(`deploy/test/certs/server.crt` is the demo default; see
|
||||
[`docs/tls.md`](./tls.md)), cert-manager 1.15+ will refuse to talk to
|
||||
[`docs/tls.md`](../../operator/tls.md)), cert-manager 1.15+ will refuse to talk to
|
||||
the directory URL unless the certctl root is trusted. The fix lives in
|
||||
`ClusterIssuer.spec.acme.caBundle`:
|
||||
|
||||
@@ -598,7 +600,7 @@ Yes. The endpoints are HTTPS over the certctl-server's listener (port
|
||||
Posh-ACME on a Mac all integrate against
|
||||
`https://<certctl-server>:8443/acme/profile/<profile-id>/directory`.
|
||||
The TLS-trust-bootstrap requirement applies the same way — see the
|
||||
[Caddy walkthrough](./acme-caddy-walkthrough.md) for the OS-trust-store
|
||||
[Caddy walkthrough](../../migration/acme-from-caddy.md) for the OS-trust-store
|
||||
recipe.
|
||||
|
||||
### How do I migrate manually-issued certs to ACME-issued ones?
|
||||
@@ -607,7 +609,7 @@ Not yet automatic. Operators migrating: keep the old `managed_certificates`
|
||||
rows; create new ones via the ACME flow; flip targets one by one. A
|
||||
dedicated bulk-migration tool is on the roadmap (post-2.1.0). Track
|
||||
via the master prompt's roadmap section in
|
||||
`cowork/acme-server-endpoint-prompt.md`.
|
||||
the project's acme-server-endpoint spec.
|
||||
|
||||
### What audit-log events fire on each ACME operation?
|
||||
|
||||
@@ -638,9 +640,9 @@ Read before writing a security review.
|
||||
|
||||
## See also
|
||||
|
||||
- [cert-manager integration walkthrough](./acme-cert-manager-walkthrough.md)
|
||||
- [Caddy integration walkthrough](./acme-caddy-walkthrough.md)
|
||||
- [Traefik integration walkthrough](./acme-traefik-walkthrough.md)
|
||||
- [cert-manager integration walkthrough](../../migration/acme-from-cert-manager.md)
|
||||
- [Caddy integration walkthrough](../../migration/acme-from-caddy.md)
|
||||
- [Traefik integration walkthrough](../../migration/acme-from-traefik.md)
|
||||
- [Threat model](./acme-server-threat-model.md)
|
||||
- [TLS trust bootstrap reference](./tls.md)
|
||||
- [Architecture (control-plane)](./architecture.md)
|
||||
- [TLS trust bootstrap reference](../../operator/tls.md)
|
||||
- [Architecture (control-plane)](../architecture.md)
|
||||
@@ -1,5 +1,7 @@
|
||||
# Async-CA Polling — Operator Reference
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Closes audit fix #5 from the 2026-05-01 issuer-coverage acquisition-readiness audit.
|
||||
|
||||
## What this is
|
||||
@@ -114,5 +116,5 @@ enrollments per tick.
|
||||
|
||||
## Audit blocker reference
|
||||
|
||||
cowork/issuer-coverage-audit-2026-05-01/RESULTS.md, Top-10 fix #5
|
||||
the 2026-05-01 issuer coverage audit, Top-10 fix #5
|
||||
(Part 1.5 finding #4: "No polling backoff for async CAs").
|
||||
@@ -1,5 +1,7 @@
|
||||
# CRL & OCSP — Revocation Status for Relying Parties
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This guide is the operator + relying-party reference for certctl's revocation
|
||||
status surfaces. It covers the wire format, endpoint URLs, configuration knobs,
|
||||
the OCSP responder cert lifecycle, and how to point common consumers
|
||||
@@ -1,5 +1,7 @@
|
||||
# EST (RFC 7030) — Operator Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Status (this document):** EST RFC 7030 hardening master bundle Phases
|
||||
> 1–11 shipped on `master`; this guide is the Phase-12 deliverable
|
||||
> against the bundle. Every behavior described here is exercised by the
|
||||
@@ -7,7 +9,7 @@
|
||||
> `internal/service/est*_test.go`, and (for the libest interop layer)
|
||||
> `deploy/test/est_e2e_test.go` under `//go:build integration`. The
|
||||
> bundle is **V2-free**; per-tenant CA isolation, Conditional-Access
|
||||
> compliance gating, and EST cert-bound usage analytics are documented
|
||||
> device-state gating, and EST cert-bound usage analytics are documented
|
||||
> as V3-Pro deferrals in [V3-Pro deferrals](#v3-pro-deferrals).
|
||||
|
||||
## Contents
|
||||
@@ -502,7 +504,7 @@ arbitrary).
|
||||
EST signs certs using whatever issuer connector the profile binds.
|
||||
The `internal/crypto/signer/` interface (post-2026-04-28) means a
|
||||
future HSM/PKCS#11 driver bundle (parking-lot at
|
||||
`cowork/hsm-pkcs11-driver-prompt.md`) plugs in transparently — the
|
||||
planned) plugs in transparently — the
|
||||
EST handler doesn't change. EST-issued certs benefit from HSM-backed
|
||||
signing automatically once the HSM bundle ships and the operator
|
||||
swaps the local issuer's `FileDriver` for a `PKCS11Driver`.
|
||||
@@ -708,10 +710,10 @@ These capabilities are deferred to V3-Pro (paid tier). They're not
|
||||
oversights — they're the natural follow-on bundles after v2.X.0 GA:
|
||||
|
||||
- **Conditional Access / device-posture gating.** The per-profile
|
||||
ESTService exposes a nil-default compliance-hook seam (mirrors the
|
||||
SCEP/Intune `ComplianceCheck` pattern). V3-Pro plugs in a
|
||||
ESTService exposes a nil-default device-state hook seam (mirrors
|
||||
the SCEP/Intune `DeviceStateCheck` pattern). V3-Pro plugs in a
|
||||
Microsoft Graph or other posture-check callback before issuance;
|
||||
non-compliant devices fail with a typed `est_compliance_failed`
|
||||
failing devices fail with a typed `est_device_state_failed`
|
||||
reason.
|
||||
- **Multi-tenant CA isolation.** V2 has one trust anchor pool per
|
||||
EST profile and one issuer binding. V3-Pro ships per-tenant root
|
||||
@@ -1,12 +1,14 @@
|
||||
# Microsoft Intune SCEP enrollment via certctl
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Status (this document):** Phase 11 of the SCEP RFC 8894 + Intune master
|
||||
> bundle. The behavior described here is shipped on `master` and exercised
|
||||
> end-to-end by `internal/api/handler/scep_intune_e2e_test.go`. The
|
||||
> bundle is V2-free (community edition) — Conditional-Access compliance
|
||||
> gating, native Microsoft Graph integration, and per-tenant trust
|
||||
> anchors are documented under [Limitations](#limitations) as V3-Pro
|
||||
> features.
|
||||
> bundle is V2-free (community edition) — Conditional-Access
|
||||
> device-state gating, native Microsoft Graph integration, and
|
||||
> per-tenant trust anchors are documented under
|
||||
> [Limitations](#limitations) as V3-Pro features.
|
||||
|
||||
## TL;DR
|
||||
|
||||
@@ -99,9 +101,10 @@ PKIMessage with the documented `pkiStatus`/`failInfo` codes (per RFC
|
||||
issuing many DIFFERENT valid challenges for the same device. Default
|
||||
3 enrollments per 24h covers legitimate first-cert + recovery +
|
||||
post-wipe.
|
||||
9. **Optional compliance check** — V3-Pro plug-in seam (nil-default
|
||||
no-op). When set, the gate calls Microsoft Graph's compliance API
|
||||
and short-circuits non-compliant devices with FAILURE+BadRequest.
|
||||
9. **Optional device-state check** — V3-Pro plug-in seam
|
||||
(nil-default no-op). When set, the gate calls Microsoft Graph's
|
||||
device-compliance API and short-circuits failing devices with
|
||||
FAILURE+BadRequest.
|
||||
|
||||
A request that passes all nine gates flows to
|
||||
`processEnrollment`, which builds the issuance request, calls the
|
||||
@@ -243,7 +246,7 @@ common root cause and the operator action.
|
||||
| `rate_limited` | A specific device hitting `429`-equivalent failures | The device exceeded `INTUNE_PER_DEVICE_RATE_LIMIT_24H` (default 3). If legitimate (post-wipe + recovery + first-cert all in 24h), bump the cap. If suspicious, this is the limiter doing its job — investigate the device. |
|
||||
| `unknown_version` | Sudden onset of failures across the entire fleet | Microsoft shipped a new Connector version with a `version` claim certctl doesn't understand. Open an issue on the certctl repo with the failing claim payload (anonymized); the parser dispatcher accepts new versions in ~30 LoC. |
|
||||
| `malformed` | Sporadic, low-volume | Malformed challenge bytes — almost always a network proxy mangling the request body, or the Connector logging itself out mid-handshake. Capture a packet trace; the Connector should re-emit on the next device retry. |
|
||||
| `compliance_failed` | V3-Pro only | The pluggable compliance check returned non-compliant. The audit-log details carries the reason string from Microsoft Graph. V2 deployments never see this counter tick. |
|
||||
| `device_state_failed` | V3-Pro only | The pluggable device-state check rejected the device. The audit-log details carries the reason string from Microsoft Graph. V2 deployments never see this counter tick. |
|
||||
|
||||
## Operational monitoring (SCEP Administration → Intune Monitoring tab)
|
||||
|
||||
@@ -325,10 +328,10 @@ V3-Pro:
|
||||
directly — the Connector already did that. V3-Pro could ship a
|
||||
Graph client that pulls device-compliance state in addition to
|
||||
the challenge claim.
|
||||
- **Conditional Access compliance gating.** The dispatcher exposes a
|
||||
nil-default `ComplianceCheck` hook. V3-Pro plugs in a Microsoft
|
||||
Graph compliance lookup before issuance; non-compliant devices
|
||||
fail with a typed `compliance_failed` failInfo.
|
||||
- **Conditional Access device-state gating.** The dispatcher exposes
|
||||
a nil-default `DeviceStateCheck` hook. V3-Pro plugs in a Microsoft
|
||||
Graph device-compliance lookup before issuance; failing devices
|
||||
exit with a typed `device_state_failed` failInfo.
|
||||
- **Per-tenant trust anchors.** V2 has one trust anchor pool per
|
||||
SCEP profile; V3-Pro could support per-AAD-tenant anchor scoping
|
||||
for MSPs running shared certctl deployments across customers.
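
The nil-default hook seam described in the gating bullet above can be sketched as follows. This is a minimal illustration, not certctl's code: the `DeviceStateCheck` signature, the `dispatcher` struct, the `gate` method, and the `denyAll` callback are all assumptions made for the example; only the hook name and the `device_state_failed` reason come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// DeviceStateCheck is an ASSUMED signature for the hook named in the doc;
// the real certctl type may take different arguments.
type DeviceStateCheck func(deviceID string) error

// errDeviceStateFailed stands in for the typed device_state_failed reason.
var errDeviceStateFailed = errors.New("device_state_failed")

// dispatcher is a hypothetical stand-in for the SCEP/Intune dispatcher.
type dispatcher struct {
	deviceState DeviceStateCheck // nil by default, so the gate is a no-op
}

// gate runs the optional device-state check before issuance.
func (d dispatcher) gate(deviceID string) error {
	if d.deviceState == nil {
		return nil // community (V2) behaviour: nothing plugged in
	}
	if err := d.deviceState(deviceID); err != nil {
		// Wrap the callback's reason so callers can match the typed error.
		return fmt.Errorf("%w: %v", errDeviceStateFailed, err)
	}
	return nil
}

// denyAll is a hypothetical callback that rejects every device.
func denyAll(deviceID string) error {
	return errors.New("device " + deviceID + " failed the posture lookup")
}

func main() {
	open := dispatcher{}
	fmt.Println("nil hook:", open.gate("device-1"))

	gated := dispatcher{deviceState: denyAll}
	fmt.Println("deny hook:", gated.gate("device-1"))
}
```

Leaving the field nil rather than installing a default callback means a deployment that never configures the hook cannot fail this gate.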
|
||||
@@ -371,10 +374,9 @@ the golden-file fixtures in `internal/scep/intune/testdata/`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`legacy-est-scep.md`](legacy-est-scep.md) — the per-profile SCEP
|
||||
setup guide + RFC 8894 reference + mTLS sibling route. Read this
|
||||
first if you're not already running certctl SCEP for non-Intune
|
||||
fleets.
|
||||
- [`scep-server.md`](scep-server.md) — the per-profile SCEP setup
|
||||
guide + RFC 8894 reference + mTLS sibling route. Read this first
|
||||
if you're not already running certctl SCEP for non-Intune fleets.
|
||||
- [`architecture.md`](architecture.md) — overall control-plane
|
||||
architecture; Security Model section calls out the Intune trust
|
||||
anchor as a sensitive operator-configured surface.
|
||||
@@ -1,222 +1,36 @@
|
||||
# Legacy EST / SCEP Clients — TLS 1.2 Reverse-Proxy Runbook
|
||||
# SCEP Server (RFC 8894) — Protocol Reference
|
||||
|
||||
**Audit reference:** Bundle F / M-023. PCI-DSS v4.0 Req 4 §2.2.5; CWE-326.
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl's control plane pins `tls.Config.MinVersion = tls.VersionTLS13`
|
||||
(`cmd/server/tls.go:131`). Some embedded EST (RFC 7030) and SCEP (RFC 8894)
|
||||
clients only speak TLS 1.0/1.1/1.2 — those clients cannot complete the
|
||||
handshake against certctl directly. This runbook documents the supported
|
||||
operator pattern: terminate the legacy TLS version at a front-door reverse
|
||||
proxy and pass the request through to certctl over TLS 1.3.
|
||||
## What this is
|
||||
|
||||
## Why TLS 1.3 minimum
|
||||
certctl ships a native RFC 8894 SCEP server. This reference covers the
|
||||
protocol surface: RA cert + key configuration, capability advertisement,
|
||||
supported messageTypes, multi-profile dispatch, must-staple policy, mTLS
|
||||
sibling routing, and Microsoft Intune dynamic-challenge dispatcher.
|
||||
|
||||
certctl's audit posture, the SOC 2 / PCI-DSS / NIST SP 800-57 compliance
|
||||
mappings, and the M-001 PBKDF2 work factor all assume modern transport
|
||||
crypto. TLS 1.2 with the cipher suites still in the wild has known
|
||||
attack surface (BEAST, POODLE, ROBOT, raccoon — all CVE-categorized);
|
||||
allowing TLS 1.2 directly on the certctl listener would invalidate the
|
||||
guarantee that the server-side encryption chain is the strongest the
|
||||
ecosystem currently supports.
|
||||
For Intune-specific deployment guidance (NDES replacement playbook,
|
||||
Intune SCEP profile field mapping, troubleshooting matrix specific to
|
||||
Intune deployments, Microsoft support statement), see
|
||||
[`scep-intune.md`](scep-intune.md). For the legacy-client TLS 1.2
|
||||
reverse-proxy runbook, see
|
||||
[`docs/operator/legacy-clients-tls-1.2.md`](../../operator/legacy-clients-tls-1.2.md).
|
||||
|
||||
## When this runbook applies
|
||||
## How it works
|
||||
|
||||
You need this if **all three** are true:
|
||||
|
||||
1. You operate certctl with EST or SCEP enabled (`CERTCTL_EST_ENABLED=true`
|
||||
or `CERTCTL_SCEP_ENABLED=true`).
|
||||
2. Your enrolling clients are embedded devices (printers, network
|
||||
appliances, IoT boards, legacy MFPs, point-of-sale terminals) whose TLS
|
||||
stack pre-dates 2018 and only speaks TLS 1.2 or older.
|
||||
3. Replacing those clients is not feasible on a 6-month horizon.
|
||||
|
||||
If your enrolling clients are modern (any current Linux/Windows/macOS
|
||||
host, anything Go-based, anything Rust/Python/Node from 2019 onward),
|
||||
they speak TLS 1.3 natively and this runbook is unnecessary — point them
|
||||
straight at certctl on `:8443`.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
Client["legacy EST/SCEP client"]
|
||||
Proxy["nginx / HAProxy<br/>reverse proxy"]
|
||||
Server["certctl :8443"]
|
||||
Client -->|"TLS 1.2/1.3<br/>(allowed TLS 1.2)"| Proxy
|
||||
Proxy -->|"TLS 1.3<br/>(re-encrypts as TLS 1.3)"| Server
|
||||
```
|
||||
|
||||
The reverse proxy:
|
||||
|
||||
- Terminates the legacy-version TLS handshake on the public-facing port.
|
||||
- Forwards the request to certctl over TLS 1.3 on a private network.
|
||||
- (For EST mTLS) forwards the client certificate via an
|
||||
`X-SSL-Client-Cert` header that certctl reads only when the connection
|
||||
arrives from a configured-trusted source IP.
|
||||
|
||||
## nginx config
|
||||
|
||||
```nginx
|
||||
upstream certctl_backend {
|
||||
# Private-network address; not reachable from outside the proxy host.
|
||||
server 10.0.0.10:8443;
|
||||
}
|
||||
|
||||
server {
|
||||
listen 443 ssl http2;
|
||||
server_name est.example.com;
|
||||
|
||||
# Public-facing legacy listener. ssl_protocols includes TLSv1.2 explicitly.
|
||||
# Keep ssl_ciphers conservative — only the strong AEAD suites that
|
||||
# PCI-DSS Req 4 §2.2.5 still allows under TLS 1.2.
|
||||
ssl_certificate /etc/nginx/certs/est.example.com.fullchain.pem;
|
||||
ssl_certificate_key /etc/nginx/certs/est.example.com.key;
|
||||
ssl_protocols TLSv1.2 TLSv1.3;
|
||||
ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
|
||||
ssl_prefer_server_ciphers on;
|
||||
|
||||
# mTLS for EST: optional client cert, verified against the EST CA.
|
||||
ssl_client_certificate /etc/nginx/certs/est-clients-ca.pem;
|
||||
ssl_verify_client optional;
|
||||
|
||||
location ~ ^/\.well-known/(est|pki) {
|
||||
# Forward the client cert (if presented) to certctl over the
|
||||
# private hop. The current certctl implementation IGNORES the
|
||||
# X-SSL-Client-Cert header (header-agnostic by default — see
|
||||
# the certctl-side configuration section below). EST/SCEP
|
||||
# authentication still works correctly because both protocols
|
||||
# carry their own auth (CSR signature for EST, challengePassword
|
||||
# for SCEP) inside the request body.
|
||||
proxy_set_header X-SSL-Client-Cert $ssl_client_escaped_cert;
|
||||
proxy_set_header X-Forwarded-For $remote_addr;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
|
||||
# The proxy-to-certctl hop is itself TLS 1.3.
|
||||
proxy_pass https://certctl_backend;
|
||||
proxy_ssl_protocols TLSv1.3;
|
||||
proxy_ssl_verify on;
|
||||
proxy_ssl_trusted_certificate /etc/nginx/certs/certctl-internal-ca.pem;
|
||||
}
|
||||
|
||||
# SCEP endpoints — same pattern, no client-cert requirement
|
||||
# (SCEP authenticates via challengePassword inside the CSR).
|
||||
location ^~ /scep {
|
||||
proxy_set_header X-Forwarded-For $remote_addr;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
proxy_pass https://certctl_backend;
|
||||
proxy_ssl_protocols TLSv1.3;
|
||||
proxy_ssl_verify on;
|
||||
proxy_ssl_trusted_certificate /etc/nginx/certs/certctl-internal-ca.pem;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## HAProxy config (alternative)
|
||||
|
||||
```
|
||||
frontend est_legacy
|
||||
bind *:443 ssl crt /etc/haproxy/certs/est.example.com.pem alpn h2,http/1.1 \
|
||||
ssl-min-ver TLSv1.2 \
|
||||
ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384
|
||||
|
||||
acl is_est_path path_beg /.well-known/est
|
||||
acl is_pki_path path_beg /.well-known/pki
|
||||
acl is_scep_path path_beg /scep
|
||||
use_backend certctl_backend if is_est_path or is_pki_path or is_scep_path
|
||||
default_backend certctl_modern
|
||||
|
||||
backend certctl_backend
|
||||
server certctl 10.0.0.10:8443 ssl verify required \
|
||||
ca-file /etc/haproxy/certs/certctl-internal-ca.pem \
|
||||
ssl-min-ver TLSv1.3
|
||||
http-request set-header X-Forwarded-For %[src]
|
||||
http-request set-header X-Forwarded-Proto https
|
||||
```
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
The current implementation is **header-agnostic**: certctl ignores any
|
||||
`X-SSL-Client-Cert` / `X-Forwarded-For` headers from the proxy. EST
|
||||
authentication still happens via in-protocol CSR signature + profile
|
||||
policy (RFC 7030 §3.2.3); SCEP authentication still happens via the
|
||||
`challengePassword` attribute embedded in the CSR (RFC 8894 §3.2). Both
|
||||
mechanisms are inside the request body and survive the reverse-proxy
|
||||
hop without server-side header trust.
|
||||
|
||||
**Why this is the correct default:** trusting a proxy-supplied header
|
||||
for client identity opens a header-spoofing attack surface that requires
|
||||
careful design (CIDR allowlist of trusted proxies, fail-closed defaults,
|
||||
explicit operator opt-in). The Bundle F closure of M-023 ships the
|
||||
TLS-bridge guidance as documentation only; a future commit can extend
|
||||
certctl with proxy-header trust if and when an operator demonstrates a
|
||||
deployment shape that requires it. Until that lands, the runbook above
|
||||
is operationally complete: legacy EST and SCEP clients continue to
|
||||
authenticate via their in-protocol mechanisms, and the reverse proxy is
|
||||
purely a TLS-version bridge.
|
||||
|
||||
If your deployment requires proxy-supplied client identity (e.g., the
|
||||
proxy terminates mTLS and you want certctl to record the client-cert
|
||||
subject in the audit trail beyond what the CSR carries), open an issue
|
||||
and a future commit will add a header-trust contract behind two
|
||||
fail-closed env vars: a CIDR allowlist of trusted proxies, plus an
|
||||
explicit opt-in toggle. Both knobs would be required together; setting
|
||||
only one would fail loud at startup. Until that work ships, the
|
||||
header-agnostic default described above is the only supported
|
||||
configuration.
|
||||
|
||||
## PCI-DSS Req 4 §2.2.5 attestation
|
||||
|
||||
PCI-DSS v4.0 §2.2.5 ("strong cryptography for authentication/transmission
|
||||
of cardholder data") considers TLS 1.2 with strong cipher suites
|
||||
acceptable for the foreseeable future, with the explicit caveat that NIST
|
||||
or the PCI Council may shorten the deprecation window if a TLS 1.2
|
||||
weakness is published. The configuration above:
|
||||
|
||||
- Pins TLS 1.2 + TLS 1.3 only (no SSLv3, TLS 1.0, TLS 1.1).
|
||||
- Uses only AEAD cipher suites with forward secrecy (ECDHE-* with GCM or
|
||||
ChaCha20-Poly1305).
|
||||
- Re-encrypts to TLS 1.3 on the proxy-to-certctl hop.
|
||||
|
||||
This is PCI-DSS Req 4 v4.0 compliant. Auditors looking for the
|
||||
attestation should be pointed at this section + the proxy's TLS config.
|
||||
|
||||
## What this runbook does NOT cover
|
||||
|
||||
- **Replacing the legacy clients.** That's the long-term fix; this
|
||||
runbook is the bridge while you're migrating.
|
||||
- **Network segmentation.** The reverse proxy assumes the proxy-to-certctl
|
||||
hop is on a network that an external attacker can't reach. If it's
|
||||
not, you need a deeper architecture review.
|
||||
- **Client-cert revocation.** EST mTLS revocation is the relying party's
|
||||
responsibility. certctl's EST handler accepts the cert; the proxy can
|
||||
enforce CRL/OCSP via `ssl_crl_path` (nginx) or `crl-file` (HAProxy).
|
||||
|
||||
## When TLS 1.2 itself sunsets
|
||||
|
||||
PCI-DSS, NIST, and major browsers will eventually deprecate TLS 1.2.
|
||||
When that happens, this runbook becomes obsolete; the only path forward
|
||||
will be to replace the legacy clients. Subscribe to RSS feeds at the
|
||||
following sources to catch the deprecation announcement before it
|
||||
becomes a compliance failure:
|
||||
|
||||
- https://www.pcisecuritystandards.org/news_events/
|
||||
- https://nvlpubs.nist.gov/nistpubs/SpecialPublications/ (SP 800-52 revisions)
|
||||
|
||||
## SCEP RFC 8894 native implementation (post-2026-04-29)
|
||||
|
||||
Prior to this bundle, certctl's SCEP server parsed `PKCS#7 SignedData` and
|
||||
treated the encapsulated content as a raw `PKCS#10 CSR` (the file-internal
|
||||
"MVP" comment at `internal/api/handler/scep.go:217` flagged this). That
|
||||
worked for lightweight MDM agents but failed against ChromeOS and most
|
||||
production MDM clients which expect full RFC 8894 wire format:
|
||||
`SignedData` wrapping an `EnvelopedData` encrypting the CSR to the RA
|
||||
cert's public key, with `signerInfo` POPO over the auth-attrs.
|
||||
Prior to the RFC 8894 native implementation, certctl's SCEP server parsed
|
||||
`PKCS#7 SignedData` and treated the encapsulated content as a raw
|
||||
`PKCS#10 CSR` (the file-internal "MVP" path). That worked for lightweight
|
||||
MDM agents but failed against ChromeOS and most production MDM clients
|
||||
which expect full RFC 8894 wire format: `SignedData` wrapping an
|
||||
`EnvelopedData` encrypting the CSR to the RA cert's public key, with
|
||||
`signerInfo` POPO over the auth-attrs.
|
||||
|
||||
The new RFC 8894 path runs FIRST; on any parse failure it falls through
|
||||
to the legacy MVP raw-CSR path so existing operators see no behavior
|
||||
change for their lightweight clients.
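
The try-first, fall-through ordering described above can be sketched as below. The parser names (`parseRFC8894`, `parseLegacyRawCSR`), the `extractCSR` wrapper, and the `enveloped:` marker are hypothetical stand-ins chosen so the control flow runs without real PKCS#7 parsing; they are not taken from certctl's source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseRFC8894 is a HYPOTHETICAL stand-in for the full parser (SignedData
// wrapping EnvelopedData). It accepts only inputs carrying an "enveloped:"
// marker so the fallback can be exercised.
func parseRFC8894(msg string) (string, error) {
	if !strings.HasPrefix(msg, "enveloped:") {
		return "", errors.New("not an RFC 8894 PKIMessage")
	}
	return strings.TrimPrefix(msg, "enveloped:"), nil
}

// parseLegacyRawCSR stands in for the MVP path, which treats the
// encapsulated content as a raw CSR.
func parseLegacyRawCSR(msg string) (string, error) {
	if msg == "" {
		return "", errors.New("empty message")
	}
	return msg, nil
}

// extractCSR tries the RFC 8894 path first and falls through to the
// legacy path on any parse failure. It also reports which path succeeded.
func extractCSR(msg string) (csr, path string, err error) {
	if csr, err = parseRFC8894(msg); err == nil {
		return csr, "rfc8894", nil
	}
	if csr, err = parseLegacyRawCSR(msg); err != nil {
		return "", "", err
	}
	return csr, "mvp", nil
}

func main() {
	for _, msg := range []string{"enveloped:csr-bytes", "csr-bytes"} {
		csr, path, err := extractCSR(msg)
		fmt.Println(csr, path, err)
	}
}
```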
|
||||
|
||||
### Required: RA cert + key
|
||||
## Required: RA cert + key
|
||||
|
||||
The RFC 8894 path requires a Registration Authority cert + key pair.
|
||||
Clients encrypt their CSR to the RA cert's public key (RFC 8894 §3.2.2);
|
||||
@@ -255,7 +69,7 @@ validates: file existence, key file mode 0600, cert/key match, cert
|
||||
non-expired, RSA-or-ECDSA public-key algorithm. Failures `os.Exit(1)`
|
||||
with a structured log line identifying the offending profile.
|
||||
|
||||
### Capability advertisement (`GetCACaps`)
|
||||
## Capability advertisement (`GetCACaps`)
|
||||
|
||||
```
|
||||
POSTPKIOperation
|
||||
@@ -272,7 +86,7 @@ ChromeOS specifically looks for `POSTPKIOperation` (non-base64 POST),
|
||||
Older Cisco IOS clients also accept `SHA-256` and `SHA-512` per RFC 8894
|
||||
§3.5.2.
|
||||
|
||||
### Supported messageTypes
|
||||
## Supported messageTypes
|
||||
|
||||
| Type | RFC 8894 § | Behavior |
|
||||
| --- | --- | --- |
|
||||
@@ -281,7 +95,7 @@ Older Cisco IOS clients also accept `SHA-256` and `SHA-512` per RFC 8894
|
||||
| `GetCertInitial` (20) | §3.3.3 | Polling for pending requests. v1 returns `FAILURE+badCertID` because deferred-issuance isn't supported (every PKCSReq either succeeds or fails synchronously). |
|
||||
| `CertRep` (3) | §3.3.2 | Server response — never inbound. |
|
||||
|
||||
### MVP backward-compatibility path
|
||||
## MVP backward-compatibility path
|
||||
|
||||
Lightweight clients that send a stripped `SignedData` containing a raw
|
||||
CSR (no `EnvelopedData` wrapper, no `signerInfo` POPO) keep working: the
|
||||
@@ -291,14 +105,13 @@ the CSR's `challengePassword` attribute the same way as the RFC 8894
|
||||
path. Operators with existing lightweight-client deploys see zero
|
||||
behavior change.
|
||||
|
||||
### Multi-profile dispatch (`/scep/<pathID>`)
|
||||
## Multi-profile dispatch (`/scep/<pathID>`)
|
||||
|
||||
Real enterprise deploys run multiple SCEP endpoints from one certctl
|
||||
instance — corp-laptop CA, IoT CA, server CA — each with its own
|
||||
issuer + RA pair + challenge password. Configure via the indexed env-var
|
||||
form documented in [`features.md`](features.md): set
|
||||
`CERTCTL_SCEP_PROFILES=corp,iot,server` (a comma-separated list of
|
||||
profile names), then for each name supply the per-profile env-vars
|
||||
form: set `CERTCTL_SCEP_PROFILES=corp,iot,server` (a comma-separated list
|
||||
of profile names), then for each name supply the per-profile env-vars
|
||||
prefixed with `CERTCTL_SCEP_PROFILE_<NAME>_` followed by the suffix
|
||||
keys `_ISSUER_ID`, `_PROFILE_ID`, `_CHALLENGE_PASSWORD`, `_RA_CERT_PATH`,
|
||||
`_RA_KEY_PATH`. The `<NAME>` token resolves to the upper-cased profile
|
||||
@@ -310,7 +123,7 @@ The router exposes `/scep/corp`, `/scep/iot`, `/scep/server`. The legacy
|
||||
`CERTCTL_SCEP_PROFILES` is unset). Per-profile preflight validates each
|
||||
RA pair independently; failures log the offending PathID.
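
As a rough aid for the indexed env-var form above, the sketch below expands a `CERTCTL_SCEP_PROFILES` value into the variable names and routes an operator would expect. It assumes the prefix and suffix join with a single underscore and that `<NAME>` is simply the upper-cased profile name; any further normalization certctl applies is not shown here. The helper `profileEnvVars` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Suffix keys as listed in the doc (leading underscore dropped because the
// prefix below already ends with one; this joining is an ASSUMPTION).
var suffixes = []string{
	"ISSUER_ID", "PROFILE_ID", "CHALLENGE_PASSWORD", "RA_CERT_PATH", "RA_KEY_PATH",
}

// profileEnvVars returns the per-profile variable names for one profile.
func profileEnvVars(name string) []string {
	prefix := "CERTCTL_SCEP_PROFILE_" + strings.ToUpper(name) + "_"
	vars := make([]string, 0, len(suffixes))
	for _, s := range suffixes {
		vars = append(vars, prefix+s)
	}
	return vars
}

func main() {
	profiles := "corp,iot,server" // value of CERTCTL_SCEP_PROFILES
	for _, name := range strings.Split(profiles, ",") {
		fmt.Println("route: /scep/" + name)
		for _, v := range profileEnvVars(name) {
			fmt.Println("  " + v)
		}
	}
}
```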
|
||||
|
||||
### ChromeOS Admin Console pointer
|
||||
## ChromeOS Admin Console pointer
|
||||
|
||||
In Google Admin Console → Devices → Networks → Certificates, register
|
||||
certctl's `/scep[/<pathID>]` URL as the SCEP server. Enter the challenge
|
||||
@@ -319,7 +132,7 @@ password from `CERTCTL_SCEP_CHALLENGE_PASSWORD` (or per-profile
|
||||
`GetCACert` first to retrieve the RA cert, then enrolls via
|
||||
PKIOperation.
|
||||
|
||||
### RA cert rotation
|
||||
## RA cert rotation
|
||||
|
||||
The RA cert is loaded once at startup and persisted in the handler's
|
||||
struct field; rotation requires a server restart (mirrors the
|
||||
@@ -328,7 +141,7 @@ recommended cadence is annual rotation with a 30-day overlap during
|
||||
which both old + new RA certs are listed in `GetCACert`'s response (set
|
||||
the cert chain accordingly in your sub-CA hierarchy).
|
||||
|
||||
### Must-staple per-profile policy (RFC 7633)
|
||||
## Must-staple per-profile policy (RFC 7633)
|
||||
|
||||
When a `CertificateProfile` has `MustStaple = true`, the local issuer
|
||||
adds the `id-pe-tlsfeature` extension (OID `1.3.6.1.5.5.7.1.24`,
|
||||
@@ -347,7 +160,7 @@ Recommended for: Intune-deployed device certs (modern TLS clients);
|
||||
SCEP profiles serving general / legacy clients (ChromeOS, IoT) should
|
||||
stay `false` until the TLS path is verified.
|
||||
|
||||
### mTLS sibling route (Phase 6.5, opt-in)
|
||||
## mTLS sibling route (Phase 6.5, opt-in)
|
||||
|
||||
SCEP is documented as application-layer-auth — the challenge password
|
||||
is the authentication boundary per RFC 8894 §3.2. But enterprise
|
||||
@@ -408,7 +221,7 @@ challenge+mTLS:
|
||||
|
||||
1. Generate a bootstrap CA + issue a bootstrap cert per device (out
|
||||
of band — typically manufacturing-time, MDM-pushed, or a separate
|
||||
PKI flow).
|
||||
PKI flow).
|
||||
2. Distribute the trust bundle to certctl as the
|
||||
`_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH`.
|
||||
3. Set `_MTLS_ENABLED=true` for the profile, restart certctl.
|
||||
@@ -421,7 +234,7 @@ challenge+mTLS:
|
||||
the password requirement doesn't go away — the password is still
|
||||
the application-layer auth boundary).
|
||||
|
||||
### Microsoft Intune dynamic-challenge dispatcher (Phase 8, opt-in)
|
||||
## Microsoft Intune dynamic-challenge dispatcher (Phase 8, opt-in)
|
||||
|
||||
When SCEP sits behind the Microsoft Intune Certificate Connector, devices
|
||||
present an Intune-issued signed challenge (a JWT-like blob over a JSON
|
||||
@@ -488,7 +301,7 @@ the dispatcher routes Intune-shaped challenges (length > 200 + exactly
|
||||
two dots) to the validator and falls through to the static compare
|
||||
otherwise.
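
The shape test quoted above (length > 200 and exactly two dots) is small enough to sketch directly. The function name `looksLikeIntuneChallenge` is hypothetical, and the sketch covers only the routing decision, not signature validation.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeIntuneChallenge applies the routing heuristic from the doc:
// longer than 200 bytes and exactly two dots. A true result only selects
// the Intune validator; it does not mean the challenge is valid.
func looksLikeIntuneChallenge(challenge string) bool {
	return len(challenge) > 200 && strings.Count(challenge, ".") == 2
}

func main() {
	static := "s3cret-shared-password"
	intune := strings.Repeat("a", 100) + "." + strings.Repeat("b", 100) + "." + strings.Repeat("c", 50)

	for _, c := range []string{static, intune} {
		if looksLikeIntuneChallenge(c) {
			fmt.Println("route to Intune validator")
		} else {
			fmt.Println("fall through to static compare")
		}
	}
}
```

Under this heuristic, a static password that is both longer than 200 bytes and contains exactly two dots would be routed to the validator instead of the static compare.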
|
||||
|
||||
### Operational notes
|
||||
## Operational notes
|
||||
|
||||
- **Audit:** every enrollment emits an `audit_event` row with action
|
||||
`scep_pkcsreq` (initial) or `scep_renewalreq` (renewal); operators
|
||||
@@ -498,8 +311,10 @@ otherwise.
|
||||
bodies at `CERTCTL_MAX_BODY_SIZE` (default 1MB); SCEP PKIMessages are
|
||||
typically <50KB so the default cap is generous.
|
||||
- **HTTPS-only:** the SCEP endpoint inherits the TLS-1.3-pinned control
|
||||
plane; there is no plaintext fallback.
|
||||
- **For Microsoft Intune deployments, see [`scep-intune.md`](scep-intune.md)** —
|
||||
plane; there is no plaintext fallback. Legacy clients that only speak
|
||||
TLS 1.2 use the reverse-proxy bridge documented at
|
||||
[`docs/operator/legacy-clients-tls-1.2.md`](../../operator/legacy-clients-tls-1.2.md).
|
||||
- **For Microsoft Intune deployments,** see [`scep-intune.md`](scep-intune.md) —
|
||||
architecture, NDES-replacement migration playbook, Intune SCEP profile
|
||||
field mapping, trust-anchor extraction recipe, troubleshooting matrix,
|
||||
operational monitoring, V3-Pro deferrals, and the Microsoft support
|
||||
@@ -508,12 +323,12 @@ otherwise.
|
||||
mTLS sibling-route status, challenge-password-set indicator, and
|
||||
the full SCEP audit log filter), the admin GUI page lives at `/scep`
|
||||
with three tabs: **Profiles** (default), **Intune Monitoring**,
|
||||
**Recent Activity**. See `scep-intune.md::Operational monitoring`
|
||||
for the Intune-specific tab inside it.
|
||||
**Recent Activity**. See the operational-monitoring section in
|
||||
[`scep-intune.md`](scep-intune.md) for the Intune-specific tab.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`tls.md`](tls.md) — the certctl-internal TLS configuration (HTTPS-only
|
||||
control plane, MinVersion pin)
|
||||
- [`security.md`](security.md) — overall security posture
|
||||
- [`database-tls.md`](database-tls.md) — Postgres TLS opt-in (Bundle B / M-018)
|
||||
- [`scep-intune.md`](scep-intune.md) — Microsoft Intune deployment guide
|
||||
- [`est.md`](est.md) — EST RFC 7030 server reference
|
||||
- [`docs/operator/legacy-clients-tls-1.2.md`](../../operator/legacy-clients-tls-1.2.md) — TLS 1.2 reverse-proxy runbook for legacy SCEP clients
|
||||
- [`docs/reference/architecture.md`](../architecture.md) — system design including SCEP server placement
|
||||
Some files were not shown because too many files have changed in this diff.