mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:21:30 +00:00
docs(acme-server): operator-facing reference + threat model + cert-manager walkthrough (Phase 6/7)
Doc-only commit closing the ACME-server work series. After this commit,
an outside reviewer (procurement engineer / Venafi diligence engineer /
Infisical-comparison-shopper) can read the docs cold, understand the
ACME server's surface, follow the cert-manager walkthrough, and reach
a deployment decision without escalating to certctl maintainers.
What ships:
- docs/acme-server.md final pass: Auth-mode decision tree (when to
use trust_authenticated vs challenge), RFC 8555 + RFC 9773
conformance statement (section-by-section table of implemented
plus procurement-honest 'not implemented' rows for EAB / multi-
level wildcards / RFC 8738 / cross-CA proxying), Troubleshooting
(5 failure modes — badNonce / unknownAuthority / HTTP-01
connection refused / DNS-01 NXDOMAIN / rejectedIdentifier with
canonical fix for each), Version pinning + tested clients table
(cert-manager 1.15.0, lego v4, kind v0.20+, Caddy 2.7.x, Traefik
3.0+), FAQ (5 entries — why two auth modes, vs cert-manager-
against-LE, can-I-use-from-outside-K8s, migration story, audit-
log catalog), See-also cross-link block.
- docs/acme-cert-manager-walkthrough.md: kind → cert-manager →
certctl → Certificate flow, with YAML blocks byte-equal to
deploy/test/acme-integration/{clusterissuer-trust-authenticated,
certificate-test}.yaml to prevent doc/test drift.
- docs/acme-caddy-walkthrough.md: Caddyfile acme_ca + tls.cas
options (OS trust store + Caddy pki.ca block).
- docs/acme-traefik-walkthrough.md: certificatesResolvers.<name>.acme
.caServer + serversTransport.rootCAs configuration.
- docs/acme-server-threat-model.md: Threat surface map + JWS forgery
resistance (alg-confusion / HS256 substitution / replayed nonce /
URL spoofing / multi-sig / kid-vs-jwk / kid round-trip mismatch),
Nonce store integrity rationale, HTTP-01 SSRF defense-in-depth
(pre-dial check + per-dial check + per-redirect check + body cap +
bounded redirects), DNS-01 cache-poisoning posture (default Google
Public DNS + operator-owns-private-resolver-posture), TLS-ALPN-01
chain-not-validated rationale (RFC 8737 §3 explicit), Rate-limit
tuning, Audit trail catalog, Out-of-scope threats list.
- docs/connectors.md: TOC renumbered 3→4 etc. to make room for new
top-level 'ACME Server (Built-in)' section between Issuer Connector
and Target Connector — distinguishes the consumer-side ACME
(existing) from the new server-side ACME via env-var-prefix
call-out (CERTCTL_ACME_* vs CERTCTL_ACME_SERVER_*).
DoD verification:
- All 5 docs files exist with the structure prescribed by the
Phase 6 prompt.
- Every CERTCTL_ACME_SERVER_* env var in docs/acme-server.md maps
to an actual lookup in internal/config/config.go (verified by
'grep -oE | sort -u | diff' returning empty).
- Every YAML snippet in docs/acme-cert-manager-walkthrough.md is
byte-equal to the corresponding file in deploy/test/acme-integration/
(verified with 'diff' against awk-extracted YAML blocks).
- docs/connectors.md has the cross-link subsection with all 4 new
docs referenced.
- cowork/CLAUDE.md Architecture Decisions has the new ACME-server
bullet documenting per-profile URL family + per-profile
acme_auth_mode + Phase 4-5-6 progression.
- cowork/WORKSPACE-CHANGELOG.md has the ACME-Server-6 entry plus
the ACME-Server rollup spanning Phases 1a-6.
- cowork/infisical-deep-research-results.md Rank 1 marked SHIPPED.
- 'gofmt -l .' clean (no Go changes); 'go vet ./...' clean.
Acquisition-readiness: every one of the 12 acquisition-grade criteria
from cowork/acme-server-endpoint-prompt.md is verified by the test
suite (Phases 1a-5) plus this doc walkthrough (Phase 6). The full
RFC 8555 + RFC 9773 surface is live; the operator can deploy
end-to-end by reading one walkthrough doc and one env-var table.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-6 (docs)'
+ ACME-Server rollup of all 6 phases.
@@ -0,0 +1,172 @@
# Caddy Integration Walkthrough

End-to-end recipe for issuing certs from a certctl-server deployment
through Caddy 2.7+. Target audience: operator running Caddy on a VM
or container who wants Caddy to ACME-issue from certctl instead of
Let's Encrypt.

## Prereqs

- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
  and at least one profile whose `acme_auth_mode` is set. Profile
  setup is identical to the cert-manager walkthrough — see
  [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
  Step 2.
- Caddy 2.7.x or later. `caddy version` should show 2.7.0+.
- Network reachability: Caddy → certctl-server's HTTPS listener (port
  8443 by default).
- The certctl bootstrap CA, in PEM form, captured for the trust
  configuration below. Capture exactly the same way as the cert-manager
  walkthrough Step 3 — use `cat deploy/test/certs/ca.crt`.

## Step 1 — Configure Caddy

Caddy's ACME endpoint is configured globally via the `acme_ca` global
option, or per-site via the `ca` subdirective of the `tls` directive
(the ACME issuer's `ca` field in JSON config). Either form points at
the directory URL:

```
{
    email ops@example.com
}

example.com {
    tls {
        ca https://certctl.example.com:8443/acme/profile/prof-test/directory
    }
    reverse_proxy localhost:8080
}
```

Notes:

- `ca` (or the global `acme_ca`) must point at the directory URL
  (ending in `/directory`), not just the base. Caddy uses the directory
  document to discover the new-account / new-order URLs, exactly the
  same way cert-manager does.
- The `ca` subdirective configures Caddy's default ACME issuer, so no
  separate `issuer acme` line is needed. Caddy can also be configured
  with `issuer zerossl` or `issuer internal`; for certctl integration,
  the ACME issuer is the correct one.
- Caddy auto-discovers `tls-alpn-01` first when port 443 is bound to
  Caddy, then falls back to HTTP-01. For `trust_authenticated` mode
  profiles, both work without solver round-trips.

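
The directory discovery mentioned in the notes above can be sketched in Go. This is a minimal illustration, not certctl or Caddy code: the field names `newNonce`, `newAccount`, and `newOrder` come from RFC 8555 §7.1.1, while the `Directory` type, the `parseDirectory` helper, and the sample endpoint paths are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Directory holds the RFC 8555 §7.1.1 fields a client needs to start an order.
// The type itself is hypothetical; only the JSON field names come from the RFC.
type Directory struct {
	NewNonce   string `json:"newNonce"`
	NewAccount string `json:"newAccount"`
	NewOrder   string `json:"newOrder"`
}

// parseDirectory decodes a directory document and checks that the three
// endpoints used for account registration and ordering are present.
func parseDirectory(body []byte) (Directory, error) {
	var d Directory
	if err := json.Unmarshal(body, &d); err != nil {
		return Directory{}, fmt.Errorf("decode directory: %w", err)
	}
	if d.NewNonce == "" || d.NewAccount == "" || d.NewOrder == "" {
		return Directory{}, errors.New("directory is missing a required endpoint")
	}
	return d, nil
}

func main() {
	// Hypothetical response body; the endpoint paths are illustrative only.
	sample := []byte(`{
	  "newNonce":   "https://certctl.example.com:8443/acme/profile/prof-test/new-nonce",
	  "newAccount": "https://certctl.example.com:8443/acme/profile/prof-test/new-account",
	  "newOrder":   "https://certctl.example.com:8443/acme/profile/prof-test/new-order"
	}`)
	d, err := parseDirectory(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(d.NewOrder)
}
```

A client pointed at the base URL instead of the directory URL would fail at the decode step, which is why the `/directory` suffix matters.
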
## Step 2 — Trust the certctl bootstrap CA

Caddy validates the certctl-server's TLS chain before any ACME call,
the same way cert-manager does. Two options for trust:

### Option A — OS trust store (preferred for VMs)

```
sudo cp deploy/test/certs/ca.crt /usr/local/share/ca-certificates/certctl-bootstrap.crt
sudo update-ca-certificates
sudo systemctl restart caddy
```

Caddy honors the system trust store via the Go runtime's
`crypto/x509` defaults. After `update-ca-certificates`, Caddy's HTTPS
client trusts certctl's self-signed root and the directory call
succeeds.

### Option B — Caddy `ca_root` (for containerized deployments)

```
example.com {
    tls {
        ca https://certctl.example.com:8443/acme/profile/prof-test/directory
        ca_root /etc/caddy/certctl-bootstrap.crt
    }
    reverse_proxy localhost:8080
}
```

The `ca_root` subdirective names a PEM file holding the root that Caddy
trusts when talking to the ACME endpoint configured by `ca`. Because it
sits in the site block, that trust is scoped to ACME calls for this
site only. This is the right pattern for multi-tenant Caddy deployments
where some sites trust certctl and others don't. (The global
`acme_ca_root` option applies the same trust to every site.)

## Step 3 — Reload Caddy

```
caddy validate --config /etc/caddy/Caddyfile
sudo systemctl reload caddy
```

Caddy reloads atomically; in-flight requests complete on the old
config while new requests use the new ACME issuer. On the next
`example.com` request, Caddy hits certctl's directory URL, registers
an account, submits a new-order, and finalizes — typically completing
in under 5 seconds for `trust_authenticated` mode.

## Step 4 — Verify

Locate the issued cert in Caddy's certificate cache
(`$XDG_DATA_HOME/caddy/certificates/` by default):

```
find ~/.local/share/caddy/certificates -name 'example.com.crt'
```

Inspect it:

```
openssl x509 -in ~/.local/share/caddy/certificates/certctl.example.com-*/example.com/example.com.crt -noout -subject -issuer -dates
# subject= CN=example.com
# issuer= CN=certctl test internal CA
```

(Path layout is Caddy-version-dependent; check `caddy environ` for the
canonical data dir.)

On the certctl side, the operator's audit log captures the issuance
event:

```
psql -c "SELECT actor, action, resource_id FROM audit_events
         WHERE actor LIKE 'acme:%' ORDER BY created_at DESC LIMIT 5;"
```

## Common failure modes

- **Caddy logs `tls: failed to verify certificate: x509: certificate
  signed by unknown authority`** → certctl bootstrap CA is not in
  Caddy's trust path. Re-do Step 2; verify with
  `curl --cacert /etc/caddy/certctl-bootstrap.crt https://certctl.example.com:8443/acme/profile/prof-test/directory`.
- **Caddy logs `urn:ietf:params:acme:error:rateLimited`** → certctl
  per-account orders/hour limit hit (default 100/hr). Tune via
  `CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR` if you have
  legitimately high throughput.
- **Caddy logs `urn:ietf:params:acme:error:rejectedIdentifier`** →
  the SAN list includes an identifier the certctl profile policy
  rejects. Cross-reference [`docs/acme-server.md` § Troubleshooting](./acme-server.md#certificate-readyfalse-with-rejectedidentifier).
- **`badNonce` in Caddy logs** → clock skew or multi-replica certctl
  without sticky sessions; same fix as the cert-manager walkthrough.

## Cleanup

```
# Remove the certctl-specific block from your Caddyfile, then reload.
# (Caddy must still be running for the reload to succeed.)
sudo systemctl reload caddy
# Optional: delete cached certs from the certctl directory namespace.
rm -rf ~/.local/share/caddy/certificates/certctl.example.com-*
```

## See also

- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
  K8s-native equivalent.
- [Caddy upstream ACME docs](https://caddyserver.com/docs/automatic-https#acme-issuer)
  — verify behavior pinned here against Caddy 2.7.x semantics.
@@ -0,0 +1,254 @@
# cert-manager Integration Walkthrough

End-to-end recipe for issuing certs from a certctl-server deployment
through cert-manager 1.15+. Target audience: Kubernetes operator who
has never deployed certctl before and wants a working
`Certificate` → `Secret` flow on their cluster in under 30 minutes.

The Phase 5 integration test (`make acme-cert-manager-test`) automates
exactly the recipe below. The YAML snippets in this doc are byte-equal
to the files under `deploy/test/acme-integration/` — re-running the
test from a fresh clone produces the same results documented here.

## Prereqs

- A Kubernetes cluster (kind / k3d / EKS / GKE / AKS / on-prem). For
  local trial, `kind v0.20+` works exactly the way the Phase 5 test
  uses it. The kind config lives at
  [`deploy/test/acme-integration/kind-config.yaml`](../deploy/test/acme-integration/kind-config.yaml).
- `kubectl` v1.27+, `helm` v3.13+.
- `cert-manager` v1.15.0 installed in the `cert-manager` namespace.
  If absent, run:

  ```
  bash deploy/test/acme-integration/cert-manager-install.sh
  ```

  which is the same idempotent installer the integration test uses.
- A certctl Helm chart published to a registry your cluster can pull
  from. The Phase 5 test uses an `image.tag=test` placeholder; production
  deployments use the actual image tag for your release line.

## Step 1 — Deploy certctl-server

```
helm install certctl-test deploy/helm/certctl/ \
  --set acmeServer.enabled=true \
  --set acmeServer.defaultProfileId=prof-test \
  --set image.tag=test
kubectl wait --for=condition=Available --timeout=3m deployment/certctl-test
```

`acmeServer.enabled=true` flips the `CERTCTL_ACME_SERVER_ENABLED`
env var which gates the ACME route registration.
`acmeServer.defaultProfileId` sets `CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID`
so the `/acme/*` shorthand path mirrors the per-profile path family.

## Step 2 — Create the certctl profile

The ACME server requires a `certificate_profiles` row to bind issuance
to. Create one via the certctl API or GUI; for the simplest case set
`acme_auth_mode='trust_authenticated'`:

```
curl -X POST https://certctl-test.default.svc.cluster.local:8443/api/profiles \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $CERTCTL_API_KEY" \
  -d '{
    "id": "prof-test",
    "name": "ACME test profile",
    "issuer_id": "iss-internal-ca",
    "max_ttl_seconds": 7776000,
    "acme_auth_mode": "trust_authenticated"
  }'
```

Auth-mode tradeoffs are covered in
[`docs/acme-server.md` § Auth-mode decision tree](./acme-server.md#auth-mode-decision-tree).
For first-time deployments, `trust_authenticated` is the right default.

## Step 3 — Capture the certctl bootstrap CA

cert-manager validates the certctl-server's TLS chain before sending
any account / order / finalize JWS. With certctl's self-signed
bootstrap cert (the demo default at `deploy/test/certs/server.crt`),
cert-manager rejects the directory URL with
`x509: certificate signed by unknown authority` unless you feed the
bootstrap CA in.

```
cat deploy/test/certs/ca.crt | base64 -w0
```

Capture the output for Step 4. This is **the** single biggest
first-time-deploy footgun on the cert-manager integration path. The reference
recipe lives in
[`docs/acme-server.md` § TLS trust bootstrap](./acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).

## Step 4 — Apply the ClusterIssuer

```yaml
# Phase 5 — sample ClusterIssuer for the certctl trust_authenticated
# auth mode (RFC 8555 §6 + certctl auth_mode=trust_authenticated, where
# the JWS-authenticated ACME account is trusted to issue any identifier
# the profile policy permits — no per-identifier ownership challenges).
#
# Use this as the starting template for any internal-PKI rollout.
# Replace the caBundle placeholder with the base64-encoded PEM of the
# certctl-server's self-signed bootstrap root, then `kubectl apply`.
#
# Generate the caBundle via:
#   cat deploy/test/certs/ca.crt | base64 -w0
# (See certctl/docs/acme-server.md "TLS trust bootstrap" section for the
# end-to-end walkthrough — this is the single biggest first-time-deploy
# footgun on cert-manager, captured as audit fix #9.)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: certctl-test-trust
spec:
  acme:
    email: test@example.com
    # Replace 'certctl-test' with your release name + adjust the
    # profile path segment. Default profile path:
    #   https://<service>.<namespace>.svc.cluster.local:8443/acme/profile/<profile-id>/directory
    server: https://certctl-test.default.svc.cluster.local:8443/acme/profile/prof-test/directory
    # caBundle: Audit fix #9. cert-manager validates the ACME server's
    # TLS chain before submitting any account/order/finalize. With a
    # self-signed bootstrap root, the ClusterIssuer MUST carry the root
    # explicitly via this field.
    caBundle: |
      LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCi4uLgotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    privateKeySecretRef:
      name: certctl-test-trust-account-key
    solvers:
      # In trust_authenticated mode the solver is unused at the
      # validation step but cert-manager still requires at least one
      # solver in the spec. http01-via-ingress-nginx is the cheapest
      # placeholder shape that round-trips correctly through cert-
      # manager's validation webhooks.
      - http01:
          ingress:
            class: nginx
```

This block is byte-equal to
[`deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml`](../deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml),
which is the full reference YAML. Replace the `caBundle` placeholder in
that file with the base64 string from Step 3, then apply it:

```
kubectl apply -f deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml
kubectl wait --for=condition=Ready --timeout=2m clusterissuer/certctl-test-trust
```

The solver block is a placeholder under `trust_authenticated` mode —
cert-manager 1.15 still requires at least one solver in the spec, but
certctl auto-resolves authzs without a solver round-trip. The
http01-ingress-nginx shape validates against cert-manager's webhook
without needing an actual ingress controller deployed.

For `challenge` mode profiles, swap to
[`deploy/test/acme-integration/clusterissuer-challenge.yaml`](../deploy/test/acme-integration/clusterissuer-challenge.yaml)
— same shape, but the solver is now load-bearing and you need
ingress-nginx (or your chosen ingress class) actually deployed for
HTTP-01 to work.

## Step 5 — Apply the Certificate

```yaml
# Phase 5 — Certificate resource the integration test applies and
# waits for. The certctl-test-trust ClusterIssuer (trust_authenticated
# mode) issues the cert without any solver round-trip; the resulting
# Secret 'test-com-tls' is asserted to carry tls.crt + tls.key.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-com
  namespace: default
spec:
  secretName: test-com-tls
  commonName: test.example.com
  dnsNames:
    - test.example.com
    - www.test.example.com
  issuerRef:
    name: certctl-test-trust
    kind: ClusterIssuer
  duration: 720h # 30d
  renewBefore: 240h # 10d
```

This block is byte-equal to
[`deploy/test/acme-integration/certificate-test.yaml`](../deploy/test/acme-integration/certificate-test.yaml).

```
kubectl apply -f deploy/test/acme-integration/certificate-test.yaml
kubectl wait --for=condition=Ready --timeout=3m certificate/test-com
```

cert-manager creates an `Order`, the ACME flow runs against certctl,
and the resulting Secret is populated.

## Step 6 — Verify

```
kubectl get certificate test-com -o wide
# NAME       READY   SECRET         ISSUER               STATUS                                          AGE
# test-com   True    test-com-tls   certctl-test-trust   Certificate is up to date and has not expired   42s

kubectl get secret test-com-tls -o yaml | yq '.data."tls.crt"' | base64 -d | openssl x509 -noout -subject -issuer -dates
# subject= CN=test.example.com
# issuer= CN=certctl test internal CA
# notBefore=... notAfter=...
```

Both the cert-manager `Certificate` resource and the underlying Secret
are populated. The actor on the certctl side is `acme:<account-id>`,
which you can correlate via the `audit_events` table:

```
psql -c "SELECT created_at, action, resource_type, resource_id
         FROM audit_events
         WHERE actor LIKE 'acme:%'
         ORDER BY created_at DESC LIMIT 10;"
```

## Common failure modes

These are operator-side; full troubleshooting reference is in
[`docs/acme-server.md` § Troubleshooting](./acme-server.md#troubleshooting).

- `400 Bad Request: badNonce` → clock skew between certctl-server and
  cert-manager, or a multi-replica certctl fleet without sticky
  sessions.
- `x509: certificate signed by unknown authority` → missing or stale
  `caBundle`. Re-run Step 3, paste the fresh value.
- `connection refused` from the HTTP-01 validator → ingress controller
  not deployed, OR your network blocks port 80 inbound to the solver
  Ingress.
- `Ready=False` with `rejectedIdentifier` → CSR has a SAN your profile
  policy doesn't permit. Decode the `subproblems` array of the RFC
  7807 problem doc.

## Cleanup

```
kubectl delete -f deploy/test/acme-integration/certificate-test.yaml
kubectl delete -f deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml
helm uninstall certctl-test
# Optional: delete the certctl profile via API.
```

## See also

- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
- [`docs/acme-server-threat-model.md`](./acme-server-threat-model.md) —
  security posture.
- [`docs/acme-caddy-walkthrough.md`](./acme-caddy-walkthrough.md) —
  Caddy-side recipe.
- [`docs/acme-traefik-walkthrough.md`](./acme-traefik-walkthrough.md) —
  Traefik-side recipe.
- [`deploy/test/acme-integration/`](../deploy/test/acme-integration/) —
  Phase 5 integration test (the same recipe, automated).
@@ -0,0 +1,278 @@
# ACME Server — Threat Model

Security posture for the certctl ACME server endpoint
(`/acme/profile/<id>/*`). Read this before opening a PR that changes
the JWS verifier, the challenge validators, the rate limiter, or the
GC sweeper.

The threat model lives in this dedicated doc (rather than `docs/acme-server.md`)
because security-review reviewers want a single concentrated reference.
Production deployments under audit should treat this doc as the
canonical answer to "how does certctl resist X?"

## Threat surface map

The ACME server has four ingress surfaces:

1. **JWS-authenticated POST endpoints** — new-account, new-order,
   finalize, key-change, revoke-cert, account update, order POST-as-GET.
   Authenticated by an ECDSA / RSA / EdDSA signature over the request.
2. **Unauthenticated GET endpoints** — directory, new-nonce, ARI
   (renewal-info). Read-only; no authn.
3. **Outbound challenge validators** — HTTP-01, DNS-01, TLS-ALPN-01.
   The certctl-server initiates outbound calls to operator-provided
   identifiers (the SAN list of the requested cert).
4. **Scheduler-driven GC sweeper** — internal-only; no inbound surface.

Threat actors:

- **External Internet attacker** — no certctl credentials; can hit
  unauthenticated endpoints + observe TLS metadata.
- **Authenticated ACME account holder (low-trust)** — has a valid
  account on a profile but should be bounded by profile policy +
  rate limits.
- **On-path attacker** between certctl-server and a challenge target
  (HTTP-01 / DNS-01 / TLS-ALPN-01).
- **Compromised cert holder** — has the private key of a previously-
  issued cert and wants to revoke/exfiltrate.
- **Malicious operator with profile-write access** — can change a
  profile's `acme_auth_mode` or policy, but is the trusted boundary
  per certctl's threat model. Out of scope here; covered by certctl's
  RBAC + audit log.

## JWS forgery resistance

The verifier (`internal/api/acme/jws.go`) accepts only the closed
allow-list `{RS256, ES256, EdDSA}`. The allow-list is passed to
`jose.ParseSigned` so go-jose rejects every other algorithm at parse
time, before any signature work.

Specific attacks blocked:

- **Algorithm confusion (`alg: none`)** — the classic
  unauthenticated fallback (the `none` algorithm defined in RFC 7518
  §3.6). Not in allow-list; rejected at parse.
- **HS256 substitution (alg-confusion via symmetric)** — symmetric
  algs aren't in the allow-list; rejected at parse.
- **Replayed nonce** — every JWS carries a nonce consumed via
  `acme_nonces.UPDATE … WHERE used = FALSE` (a single statement;
  Postgres row-locking serializes the writes). A second consume of
  the same nonce sees `RowsAffected=0` and the verifier returns
  `badNonce`.
- **URL spoofing** — the protected-header `url` field MUST match the
  request URL exactly (RFC 8555 §6.4); a JWS signed for one URL
  cannot be replayed against another.
- **Multi-signature JWS** — RFC 8555 §6.2 forbids; the verifier
  rejects `len(jws.Signatures) != 1` explicitly.
- **kid-vs-jwk confusion** — exactly one MUST be present per RFC 8555
  §6.2; both-present and neither-present are rejected.
- **kid round-trip mismatch** — the verifier's `AccountKID` closure
  computes the canonical kid URL for the resolved account-id and
  compares to the inbound `kid`; cross-profile replay is rejected
  because the canonical URL differs.

The doubly-signed key-rollover JWS (RFC 8555 §7.3.5, Phase 4) gets
its own dedicated verifier in `internal/api/acme/keychange.go`.
Inner-only invariants enforced: MUST use `jwk` not `kid`, payload
`account` MUST equal outer `kid`, payload `oldKey` MUST
canonicalize-equal the registered key (RFC 7638 thumbprint, constant-time
compare), inner `url` MUST equal outer `url`.

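
The structural checks in the list above (single signature, closed algorithm allow-list, exactly one of `kid` / `jwk`, `url` equality) can be sketched with the standard library alone. This is a simplified illustration and not the code in `internal/api/acme/jws.go`: the `protectedHeader` type and `checkProtectedHeader` function are hypothetical, and signature verification, nonce consumption, and the kid round-trip comparison are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// protectedHeader is a hypothetical, already-decoded view of a JWS
// protected header. A real verifier would get these from its JOSE library.
type protectedHeader struct {
	Alg    string
	Kid    string
	HasJWK bool
	URL    string
	Nonce  string
}

// allowedAlgs mirrors the closed allow-list named in the text.
var allowedAlgs = map[string]bool{"RS256": true, "ES256": true, "EdDSA": true}

// checkProtectedHeader applies the structural rules before any signature work.
func checkProtectedHeader(h protectedHeader, requestURL string, signatures int) error {
	switch {
	case signatures != 1:
		return errors.New("JWS must carry exactly one signature")
	case !allowedAlgs[h.Alg]:
		return fmt.Errorf("algorithm %q is not in the allow-list", h.Alg)
	case (h.Kid != "") == h.HasJWK:
		// Both present or both absent: RFC 8555 §6.2 requires exactly one.
		return errors.New("exactly one of kid and jwk must be present")
	case h.URL != requestURL:
		return errors.New("protected url does not match the request URL")
	case h.Nonce == "":
		return errors.New("missing nonce")
	}
	return nil
}

func main() {
	const reqURL = "https://certctl.example.com:8443/acme/profile/prof-test/new-order"
	good := protectedHeader{Alg: "ES256", Kid: "acct-url", URL: reqURL, Nonce: "n1"}
	fmt.Println(checkProtectedHeader(good, reqURL, 1))

	forged := good
	forged.Alg = "none"
	fmt.Println(checkProtectedHeader(forged, reqURL, 1))
}
```

Ordering the cheap structural checks first means a malformed request is turned away before any key lookup or cryptographic work.
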
## Nonce store integrity

Nonces are persisted in PostgreSQL (`acme_nonces` table; migration
000025) with a TTL set by `CERTCTL_ACME_SERVER_NONCE_TTL` (default
5 min). The Phase 5 GC sweeper deletes used / expired rows every 1
minute by default.

Why DB-backed and not in-memory:

- **Survives restart and is shared across replicas** — nonces outlive
  a process restart, and a multi-replica certctl-server fleet behind
  a load balancer can issue a nonce on replica A and consume it on
  replica B. In-memory state would force sticky sessions globally,
  which the operator can't guarantee in all topologies.
- **Atomic consume** — a single `UPDATE ... WHERE used = FALSE`
  statement is the consume primitive; Postgres row-locking guarantees
  exactly one of two concurrent consumes wins.
- **Expiry-bounded** — even if the GC sweeper were disabled, the
  nonce TTL is enforced at consume time
  (`AND expires_at > NOW()` in the UPDATE).

A nonce-store-side compromise would let an attacker forge nonces.
Mitigation: the nonce table is in the same Postgres instance certctl
already trusts; a DB compromise is broader than ACME-specific.

## HTTP-01 SSRF resistance

The HTTP-01 validator (Phase 3, `internal/api/acme/validators.go`)
fetches `http://<identifier>/.well-known/acme-challenge/<token>`
where the identifier is operator/client-controlled. Without
mitigation, this is a textbook SSRF surface — internal services on
RFC1918 / link-local / cloud-metadata addresses would be reachable.

Mitigations (defense in depth):

1. **Pre-dial check** — `validation.ValidateSafeURL` rejects URLs
   whose host parses as a literal reserved IP. Cheap early bail.
2. **Per-dial check** — `validation.SafeHTTPDialContext` is installed
   on the `http.Transport`. Every dial re-resolves DNS, rejects
   reserved IPs, and **pins the resolved IP**
   (`net.JoinHostPort(ips[0], port)`) so a racing DNS rebinding cannot
   substitute a different IP between resolve and connect.
3. **Per-redirect check** — Go's HTTP client re-dials on 3xx; the
   `DialContext` runs again, applying the same SSRF guards.
4. **Body cap** — the validator's `io.LimitReader` caps response
   bodies at 16 KiB. A misbehaving target cannot DoS the validator
   pool with a multi-GB response.
5. **Bounded redirects** — the validator caps redirects at 10 (Go
   default). A redirect-loop target is bounded.

Reserved IP set: loopback (127.0.0.0/8 + ::1), link-local
(169.254.0.0/16 + fe80::/10), all RFC1918 (10/8, 172.16/12, 192.168/16),
cloud-metadata literals (169.254.169.254 explicitly), broadcast,
multicast, IPv4-mapped-IPv6 to a reserved IPv4. See
`internal/validation/ssrf.go::isReservedIPForDial` for the full set.

CodeQL alert #23 flags `client.Do(req)` in the SCEP-probe call site
as `go/request-forgery` despite the dial-time guard; the analyzer
can't trace through a custom `Transport.DialContext`.
Operator-acknowledged false positive (CLAUDE.md task #10) — see the SCEP
probe's same-shaped defense for the audit trail.

## DNS-01 cache poisoning posture

The DNS-01 validator queries
`_acme-challenge.<domain>` against a single resolver configured by
`CERTCTL_ACME_SERVER_DNS01_RESOLVER` (default `8.8.8.8:53`).

Threat: an operator running a private resolver (typical in air-gapped
deployments) inherits that resolver's cache-poisoning posture. A
poisoned resolver could attest a TXT record the legitimate domain
owner never published, allowing an attacker who controls the
resolver to forge ACME challenges.

Mitigation:

- Default `8.8.8.8:53` is Google Public DNS — DNSSEC-validating,
  operationally hardened, well-monitored.
- Operators choosing a private resolver own the cache-poisoning
  posture. The doc explicitly flags this in
  `docs/acme-server.md` § Configuration.
- DNSSEC-validation is **not** enforced by the validator itself —
  the validator trusts the resolver's answer. Operators wanting
  strict DNSSEC validation should use a DNSSEC-validating resolver
  (e.g. `1.1.1.1` or a self-hosted Unbound).

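
For context on what the resolver's answer is compared against: RFC 8555 §8.4 defines the expected TXT value as the unpadded base64url encoding of the SHA-256 digest of the key authorization. The sketch below shows only that comparison. The function names are hypothetical, the sample key authorization is a made-up placeholder, and the DNS lookup itself (the part that depends on the resolver's integrity) is intentionally omitted.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// dns01TXTValue computes the TXT record content a DNS-01 validator expects
// for a given key authorization (RFC 8555 §8.4).
func dns01TXTValue(keyAuthorization string) string {
	sum := sha256.Sum256([]byte(keyAuthorization))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// txtMatches reports whether any TXT record returned by the resolver equals
// the expected value. Whatever the resolver returns is trusted as-is, which
// is exactly why the resolver's cache-poisoning posture matters.
func txtMatches(records []string, keyAuthorization string) bool {
	want := []byte(dns01TXTValue(keyAuthorization))
	for _, r := range records {
		if subtle.ConstantTimeCompare([]byte(r), want) == 1 {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder key authorization: "<token>.<account key thumbprint>".
	keyAuth := "example-token.example-thumbprint"
	want := dns01TXTValue(keyAuth)
	fmt.Println(len(want), txtMatches([]string{"unrelated", want}, keyAuth))
}
```
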
## TLS-ALPN-01 challenge interception
|
||||
|
||||
RFC 8737 §3 explicitly says the validator MUST NOT verify the
|
||||
challenge target's certificate chain — the proof lives in the
|
||||
embedded `id-pe-acmeIdentifier` extension (OID 1.3.6.1.5.5.7.1.31)
|
||||
of the cert presented during the TLS handshake, not in the chain
|
||||
itself.
|
||||
|
||||
Implementation: `internal/api/acme/validators.go::TLSALPN01Validator`
|
||||
sets `tls.Config.InsecureSkipVerify = true` with a dedicated
|
||||
`//nolint:gosec` annotation citing RFC 8737 §3 and the L-001
|
||||
documentation row in `docs/tls.md`.
|
||||
|
||||
What this means for on-path attackers:
|
||||
|
||||
- An on-path attacker between certctl-server and the challenge target
|
||||
CAN intercept the TLS handshake and present a forged cert. The
|
||||
proof is the embedded extension byte-equality, which the attacker
|
||||
cannot generate without the account key — so interception alone
|
||||
doesn't grant cert issuance.
|
||||
- An attacker who has the account key already controls the account
|
||||
per RFC 8555; the TLS-ALPN-01 validator's interception window adds
|
||||
no incremental capability.
|
||||
|
||||
The integrity property TLS-ALPN-01 actually provides: the challenge
|
||||
target proves possession of the account-key-derived key authorization
|
||||
on a TLS connection bound to the requested identifier (port 443 of
|
||||
the SAN). Operators wanting CA/Browser-Forum-style WebPKI strictness
|
||||
should run a dedicated public-trust CA, not certctl.
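
As an illustration of the byte-equality check, the following Go sketch builds the expected `id-pe-acmeIdentifier` value and compares it with the extensions of a presented certificate. It is a sketch under assumptions, not the code in `validators.go`: `expectedACMEIdentifier` and `hasValidACMEIdentifier` are hypothetical names, and the key authorization is a placeholder. The encoding (a DER OCTET STRING holding the SHA-256 digest, in a critical extension) follows RFC 8737 §3.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"crypto/x509/pkix"
	"encoding/asn1"
	"fmt"
)

// OID 1.3.6.1.5.5.7.1.31 (id-pe-acmeIdentifier).
var oidACMEIdentifier = asn1.ObjectIdentifier{1, 3, 6, 1, 5, 5, 7, 1, 31}

// expectedACMEIdentifier returns the DER OCTET STRING wrapping
// SHA-256(keyAuthorization): 0x04 0x20 followed by the 32 digest bytes.
func expectedACMEIdentifier(keyAuthorization string) ([]byte, error) {
	digest := sha256.Sum256([]byte(keyAuthorization))
	return asn1.Marshal(digest[:])
}

// hasValidACMEIdentifier reports whether exts contains a critical
// acmeIdentifier extension whose value matches the expected bytes.
func hasValidACMEIdentifier(exts []pkix.Extension, keyAuthorization string) bool {
	want, err := expectedACMEIdentifier(keyAuthorization)
	if err != nil {
		return false
	}
	for _, ext := range exts {
		if ext.Id.Equal(oidACMEIdentifier) {
			return ext.Critical && bytes.Equal(ext.Value, want)
		}
	}
	return false
}

func main() {
	keyAuth := "example-token.example-thumbprint" // placeholder value
	value, err := expectedACMEIdentifier(keyAuth)
	if err != nil {
		panic(err)
	}
	exts := []pkix.Extension{{Id: oidACMEIdentifier, Critical: true, Value: value}}
	fmt.Println("matching extension accepted:", hasValidACMEIdentifier(exts, keyAuth))
	fmt.Println("wrong key authorization accepted:", hasValidACMEIdentifier(exts, "other"))
}
```

In a real validator the extensions would come from the leaf certificate of the handshake (`x509.Certificate.Extensions`), after negotiating the `acme-tls/1` ALPN protocol.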
|
||||
|
||||
## Rate-limit tuning
|
||||
|
||||
Rate limiting (Phase 5) uses in-memory token buckets with
per-(action, key) isolation. Defaults:
|
||||
|
||||
- `RATE_LIMIT_ORDERS_PER_HOUR=100` per account.
|
||||
- `RATE_LIMIT_CONCURRENT_ORDERS=5` per account (pending/ready/processing).
|
||||
- `RATE_LIMIT_KEY_CHANGE_PER_HOUR=5` per account.
|
||||
- `RATE_LIMIT_CHALLENGE_RESPONDS_PER_HOUR=60` per challenge-id.
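
To make the per-(action, key) isolation concrete, here is a minimal token-bucket sketch in Go. It is an assumption-laden illustration rather than certctl's limiter: the `Limiter` type, its fields, and the action strings are hypothetical, and a single hourly capacity is shared by all actions for brevity.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucketKey isolates buckets per (action, key) pair, e.g.
// ("new-order", account ID) or ("challenge-respond", challenge ID).
type bucketKey struct{ action, key string }

type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter is an in-memory token bucket set. State is per process, so each
// replica of a multi-replica deployment would hold its own buckets.
type Limiter struct {
	mu       sync.Mutex
	capacity float64
	perSec   float64
	buckets  map[bucketKey]*bucket
	now      func() time.Time
}

// NewLimiter allows perHour events per (action, key). A nil clock falls
// back to time.Now.
func NewLimiter(perHour int, now func() time.Time) *Limiter {
	if now == nil {
		now = time.Now
	}
	return &Limiter{
		capacity: float64(perHour),
		perSec:   float64(perHour) / 3600,
		buckets:  make(map[bucketKey]*bucket),
		now:      now,
	}
}

// Allow refills the bucket for elapsed time, then spends one token if any
// is available.
func (l *Limiter) Allow(action, key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	t := l.now()
	k := bucketKey{action, key}
	b, ok := l.buckets[k]
	if !ok {
		b = &bucket{tokens: l.capacity, last: t}
		l.buckets[k] = b
	}
	b.tokens += t.Sub(b.last).Seconds() * l.perSec
	if b.tokens > l.capacity {
		b.tokens = l.capacity
	}
	b.last = t
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	l := NewLimiter(2, nil)
	fmt.Println(l.Allow("new-order", "acct-1")) // first token
	fmt.Println(l.Allow("new-order", "acct-1")) // second token
	fmt.Println(l.Allow("new-order", "acct-1")) // bucket drained
	fmt.Println(l.Allow("new-order", "acct-2")) // separate bucket
}
```

A client that is refused would receive a `rateLimited` problem document; how the remaining wait time is reported is outside this sketch.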
|
||||
|
||||
Tuning:
|
||||
|
||||
- **Too loose** → enables abuse vectors. A compromised account could
|
||||
burn DB-row throughput; a runaway client could fill the validator
|
||||
pool.
|
||||
- **Too tight** → legitimate flake-out. cert-manager's exponential
|
||||
backoff after a `rateLimited` problem is conservative; a 1-hour
|
||||
cooldown is a long time for an operator hitting an unexpected limit.
|
||||
|
||||
The defaults deliberately err on the loose side: 100 orders/hour per
account is generous for any plausible fleet. A 50k-cert deployment
renewing at the 1/3-validity mark consumes ~12 orders/year/cert, which
is ≈ 600k orders/year ≈ 70 orders/hour in total, and that total is
normally spread across several accounts. Tighter limits are appropriate
for deployments with many low-trust accounts.
|
||||
|
||||
The buckets are in-memory + per-replica. A 3-replica certctl-server
|
||||
fleet effectively has 3× the configured per-account throughput
|
||||
because each replica's bucket fills independently. For deployments
|
||||
where this matters operationally, the right answer is a shared rate-
|
||||
limit store (Redis / Postgres-backed); not blocking for current
|
||||
threat model where same-account requests typically pin to the same
|
||||
replica via session affinity.
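
The sizing arithmetic in this section can be reproduced with a short Go sketch. The helper names are hypothetical; the inputs (50k certs, ~12 orders per cert per year, a limit of 100/hour, 3 replicas) are the figures quoted above.

```go
package main

import "fmt"

const hoursPerYear = 365 * 24 // 8760

// fleetOrdersPerHour converts a yearly renewal volume into an average
// hourly order rate for the whole fleet.
func fleetOrdersPerHour(certs, ordersPerCertPerYear int) float64 {
	return float64(certs*ordersPerCertPerYear) / hoursPerYear
}

// effectivePerAccountLimit is the worst-case throughput when every replica
// keeps its own in-memory bucket and requests are not pinned to one replica.
func effectivePerAccountLimit(configuredPerHour, replicas int) int {
	return configuredPerHour * replicas
}

func main() {
	fmt.Printf("fleet average: %.1f orders/hour\n", fleetOrdersPerHour(50000, 12))
	fmt.Printf("effective limit with 3 replicas: %d orders/hour\n",
		effectivePerAccountLimit(100, 3))
}
```

The first figure is an average; renewals that cluster in time would produce short bursts above it, which is one reason to keep headroom in the configured limit.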
|
||||
|
||||
## Audit trail
|
||||
|
||||
Every ACME state mutation writes a row to `audit_events`. Actor strings
|
||||
distinguish the auth path:
|
||||
|
||||
- `acme:<account-id>` — kid-path requests (the requesting account
|
||||
signed the JWS).
|
||||
- `acme-cert-key:<serial>` — jwk-path revoke (the cert's own private
|
||||
key signed the JWS).
|
||||
- `acme-system:gc` — scheduler-driven sweeps (no client request).
|
||||
|
||||
Operators querying by actor prefix can reconstruct the full history
|
||||
of any ACME-issued cert. See
|
||||
`docs/acme-server.md` § FAQ "What audit-log events fire" for the
|
||||
event-name catalog.
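
When post-processing exported audit rows, the three actor prefixes can be told apart with a small helper. This Go sketch is illustrative only: `classifyActor` and its return labels are hypothetical, and only the prefixes come from the list above. The longer prefixes are tested first so the intent stays obvious to a reader.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyActor maps an audit_events actor string to the auth path that
// produced it. The returned labels are illustrative, not certctl constants.
func classifyActor(actor string) string {
	switch {
	case strings.HasPrefix(actor, "acme-cert-key:"):
		return "jwk-path revoke"
	case strings.HasPrefix(actor, "acme-system:"):
		return "scheduler"
	case strings.HasPrefix(actor, "acme:"):
		return "kid-path account"
	default:
		return "non-ACME"
	}
}

func main() {
	for _, a := range []string{"acme:acct-123", "acme-cert-key:0a1b2c", "acme-system:gc", "user:alice"} {
		fmt.Printf("%-22s %s\n", a, classifyActor(a))
	}
}
```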
|
||||
|
||||
## Out-of-scope threats
|
||||
|
||||
Documented to set scope expectations for security reviewers:
|
||||
|
||||
- **DDoS at the TLS layer** — the certctl-server's TLS listener +
|
||||
upstream load balancer / WAF handle this. The ACME-specific rate
|
||||
limits don't substitute for upstream DDoS protection.
|
||||
- **cert-manager-side compromise** — if cert-manager is compromised,
|
||||
it has both the account key and the private keys of every issued
|
||||
cert. Out of certctl's trust boundary; operators run cert-manager
|
||||
with the same care they'd run any other secret-bearing operator.
|
||||
- **Compromised certctl-server filesystem** — the bootstrap CA key
|
||||
lives at `deploy/test/certs/ca.key` (or the operator-managed
|
||||
equivalent). A filesystem compromise is broader than ACME-specific
|
||||
and is covered by certctl's HSM / signer-driver architecture (see
|
||||
`docs/architecture.md` "Signer abstraction").
|
||||
- **Postgres compromise** — the nonce table, account JWKs, and
|
||||
audit log all live in the same Postgres instance. A DB compromise
|
||||
is broader than ACME-specific and is the operator's responsibility
|
||||
to mitigate via standard DB-hardening practices.
|
||||
- **Supply-chain attacks against go-jose / lib/pq** — handled by
|
||||
Dependabot + the `make verify` security gate; not ACME-specific.
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — operator-facing reference.
|
||||
- [`docs/tls.md`](./tls.md) — TLS posture, including the L-001
|
||||
table of `InsecureSkipVerify` justifications (TLS-ALPN-01 row).
|
||||
- [`internal/api/acme/jws.go`](../internal/api/acme/jws.go) — verifier
|
||||
source.
|
||||
- [`internal/api/acme/validators.go`](../internal/api/acme/validators.go)
|
||||
— challenge validator pool.
|
||||
- [`internal/validation/ssrf.go`](../internal/validation/ssrf.go) —
|
||||
SSRF-defense primitives.
|
||||
+268
-10
@@ -7,15 +7,16 @@ as an ACME issuer with no certctl-side modification — closing the
|
||||
"deploy a certctl agent on every K8s node" friction that costs deals to
|
||||
external PKI vendors today.
|
||||
|
||||
> **Phase status (2026-05-03):** Phase 5 — production hardening +
|
||||
> cert-manager integration test. Per-account rate limits applied at
|
||||
> 3 entry points (orders/hour, key-change/hour, challenge-respond/hour)
|
||||
> + a per-account concurrent-orders cap; a 1-minute scheduler loop
|
||||
> sweeps expired nonces / authzs / orders. A kind-driven cert-manager
|
||||
> integration test (gated by `KIND_AVAILABLE`) verifies the full
|
||||
> happy-path against a real cert-manager 1.15+ deployment. RFC
|
||||
> conformance is verified via lego against the same stack. Track
|
||||
> shipped phases via `git log --grep='acme-server:'`.
|
||||
> **Phase status (2026-05-03):** Phase 6 — full operator-facing
|
||||
> reference. The functional surface is complete (Phases 1a-5); this
|
||||
> doc is the canonical procurement-readability reference. New: client-
|
||||
> walkthrough docs for [cert-manager](./acme-cert-manager-walkthrough.md),
|
||||
> [Caddy](./acme-caddy-walkthrough.md), and
|
||||
> [Traefik](./acme-traefik-walkthrough.md); a dedicated
|
||||
> [threat model](./acme-server-threat-model.md); a section-by-section
|
||||
> RFC 8555 + RFC 9773 conformance statement; a 5-failure-mode
|
||||
> troubleshooting playbook; a tested-clients version pinning table.
|
||||
> Track shipped phases via `git log --grep='acme-server:'`.
|
||||
|
||||
## Configuration
|
||||
|
||||
@@ -105,6 +106,41 @@ the `caBundle` requirement is flagged here in Phase 1a's docs because
|
||||
operators hit it the moment they try to point a real ACME client at
|
||||
certctl.
|
||||
|
||||
## Auth-mode decision tree
|
||||
|
||||
Use `trust_authenticated` when:
|
||||
|
||||
- The certctl deployment serves **internal-only PKI** (intranet certs,
|
||||
service-mesh certs, IoT bootstrap). Identifiers in your CSRs are
|
||||
controlled by your infrastructure, not by the public Internet.
|
||||
- You don't have HTTP/DNS reachability **from certctl-server back to
|
||||
the ACME client's solver** (e.g., the client lives in an isolated
|
||||
network segment certctl-server can't reach).
|
||||
- You want the simplest cert-manager integration: cert-manager submits
|
||||
a CSR, certctl issues; no out-of-band ownership proof.
|
||||
- You're issuing under your own root CA whose trust is operator-managed
|
||||
(NOT WebPKI). Public CAs cannot use this mode — RFC 8555 §8 ownership
|
||||
proof is non-negotiable for public-trust roots.
|
||||
|
||||
Use `challenge` when:
|
||||
|
||||
- The deployment is **public-trust-style PKI** — even if your root is
|
||||
privately operated, you want CA/Browser Forum-style ownership-proof
|
||||
semantics so a stolen account key can't be used to issue for arbitrary
|
||||
identifiers.
|
||||
- You have HTTP-01 / DNS-01 / TLS-ALPN-01 reachability from the
|
||||
certctl-server to the ACME client's solver. (HTTP-01 needs port 80
|
||||
ingress to the client; DNS-01 needs DNS recursion; TLS-ALPN-01 needs
|
||||
port 443 ingress.)
|
||||
- You want defense-in-depth: an account-key compromise alone gains the
  attacker nothing unless they also compromise the solver-side
  infrastructure.
|
||||
|
||||
A single certctl-server can run both modes simultaneously — the auth
|
||||
mode is a per-profile column on `certificate_profiles.acme_auth_mode`,
|
||||
read at request time. Operators flip a profile's mode via SQL or the
|
||||
profile API, and the next order picks up the new mode without restart.
|
||||
|
||||
## Endpoints
|
||||
|
||||
Routes registered in `internal/api/router/router.go::RegisterHandlers`:
|
||||
@@ -143,6 +179,49 @@ After Phase 4, the full RFC 8555 + RFC 9773 surface is live. RFC 8739
|
||||
(short-lived certs) and EAB enforcement remain follow-up work; cert-
|
||||
manager + boulder-tested clients work today against the surface above.
|
||||
|
||||
## RFC 8555 + RFC 9773 conformance statement
|
||||
|
||||
Honest disclosure of what's implemented, where, and what's not. Procurement
|
||||
engineers running gap analyses against cert-manager + Let's Encrypt's
|
||||
conformance posture should read this section before anything else.
|
||||
|
||||
### Implemented
|
||||
|
||||
| Section | Surface | Phase | First commit |
|---------|---------|-------|--------------|
| RFC 8555 §6.2 | JWS auth + RS256/ES256/EdDSA allow-list | 1b | `27bd660` |
| RFC 8555 §6.3 | POST-as-GET | 1b | `27bd660` |
| RFC 8555 §6.4 | URL-header binding to request URL | 1b | `27bd660` |
| RFC 8555 §6.5 | Replay-Nonce + DB-backed nonce store | 1a | `e146b00` |
| RFC 8555 §6.7 | RFC 7807 problem documents | 1a | `e146b00` |
| RFC 8555 §7.1 | Directory | 1a | `e146b00` |
| RFC 8555 §7.2 | new-nonce HEAD + GET | 1a | `e146b00` |
| RFC 8555 §7.3 | new-account + idempotent re-registration | 1b | `27bd660` |
| RFC 8555 §7.3.2 + §7.3.6 | account update + deactivation | 1b | `27bd660` |
| RFC 8555 §7.3.5 | doubly-signed key rollover | 4 | `0299e4a` |
| RFC 8555 §7.4 | new-order + finalize + cert download | 2 | `4ee486e` |
| RFC 8555 §7.5 | authz POST-as-GET | 2 | `4ee486e` |
| RFC 8555 §7.5.1 | challenge response | 3 | `7e22204` |
| RFC 8555 §7.6 | revoke-cert (kid + jwk paths) | 4 | `0299e4a` |
| RFC 8555 §8.3 | HTTP-01 challenge validator | 3 | `7e22204` |
| RFC 8555 §8.4 | DNS-01 challenge validator | 3 | `7e22204` |
| RFC 8737 | TLS-ALPN-01 challenge validator | 3 | `7e22204` |
| RFC 9773 | ACME Renewal Information (ARI) | 4 | `0299e4a` |
|
||||
|
||||
### Not implemented (procurement-honest)
|
||||
|
||||
| Spec area | Status | Notes |
|-----------|--------|-------|
| RFC 8555 §7.3.4: External Account Binding (EAB) | **Not implemented.** | Advertised in directory `meta.externalAccountRequired` but enforcement is a follow-up. Operators relying on EAB for account-creation gating should layer an upstream WAF. |
| RFC 8555 §8.4 + §7.4: Wildcard with `*.` prefix > 1 level | **Not implemented.** | Single-level wildcards (e.g. `*.example.com`) work end-to-end. Multi-level wildcards (`*.*.example.com`) are RFC-spec-ambiguous and rejected at the identifier-validation layer. |
| RFC 8739: Short-lived certs | **Not implemented.** | Operators wanting <7-day validity tune the bound issuer's TTL directly via `CertificateProfile.MaxTTLSeconds`; the ACME wire shape doesn't expose a separate notion. |
| Cross-CA proxying | **Not implemented.** | Each profile binds to one issuer. Multi-CA federation (one ACME account → multi-CA selection per identifier) is roadmap. |
| RFC 8555 §6.7: `accountDoesNotExist` problem with hint URL | Partial. | Sentinel returns `accountDoesNotExist`; the optional hint URL embedding the `kid` is not emitted. cert-manager doesn't consume it. |
|
||||
|
||||
If a procurement-side gap analysis turns up something not in either
|
||||
table above, the answer is "we don't know yet" — operator-side issues
|
||||
welcome.
|
||||
|
||||
## Finalize routing through `CertificateService.Create` (Phase 2 architecture)
|
||||
|
||||
The finalize path mirrors how every other certctl issuance surface
|
||||
@@ -214,7 +293,7 @@ at `internal/service/certificate.go:131`).
|
||||
| 3 | live | HTTP-01 + DNS-01 + TLS-ALPN-01 challenge validation (challenge mode end-to-end) |
|
||||
| 4 | live | key rollover (RFC 8555 §7.3.5) + revoke-cert (§7.6) + ARI (RFC 9773) |
|
||||
| 5 | live | rate limits + GC sweeper + kind-driven cert-manager integration test + lego conformance harness + k6 ACME-flow scenario |
|
||||
| 6 | not yet | full operator-facing reference + walkthroughs + threat model |
|
||||
| 6 | live | full operator-facing reference + walkthroughs (cert-manager / Caddy / Traefik) + threat model + RFC-8555 conformance statement + troubleshooting + version pinning |
|
||||
|
||||
Track shipped phases via `git log --grep='acme-server:' --oneline`.
|
||||
|
||||
@@ -386,3 +465,182 @@ surface (directory + new-nonce + ARI) at 100 VUs × 5m. JWS-signed
|
||||
flows are out of scope for k6 (no JWS support); they're covered by
|
||||
the lego conformance harness above. Baseline numbers + thresholds in
|
||||
`deploy/test/loadtest/README.md`.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
The five failure modes operators hit most often + the canonical fix
|
||||
for each.
|
||||
|
||||
### `cert-manager logs: 400 Bad Request: badNonce`
|
||||
|
||||
**Cause:** Either a nonce was replayed (a buggy client retries the
|
||||
same JWS), the cert-manager + certctl-server clocks differ by more
|
||||
than `CERTCTL_ACME_SERVER_NONCE_TTL` (default 5 min), or the
|
||||
nonce-store row was reaped between issuance and use.
|
||||
|
||||
**Fix:** First check NTP on both sides. If clocks are healthy,
|
||||
lengthen `CERTCTL_ACME_SERVER_NONCE_TTL` to 10m or 15m. If the
|
||||
problem persists, check for a multi-replica certctl-server fleet
|
||||
without sticky session affinity — the nonce DB row lives on one
|
||||
replica; if the JWS POST hits a different replica before replication
|
||||
catches up, you observe spurious `badNonce`. Solution: pin client
|
||||
sessions to a single replica via load-balancer cookie / `kid`-hash
|
||||
routing, OR shorten replication lag if your DB is the bottleneck.
|
||||
|
||||
### `cert-manager logs: x509: certificate signed by unknown authority`
|
||||
|
||||
**Cause:** cert-manager refuses to talk to the directory URL because
|
||||
its TLS chain doesn't terminate at a root in cert-manager's trust
|
||||
store. certctl-server's bootstrap cert (Phase 1a, `deploy/test/certs/server.crt`)
|
||||
is self-signed.
|
||||
|
||||
**Fix:** Add the `caBundle` field to your `ClusterIssuer.spec.acme` —
|
||||
see the [TLS trust bootstrap](#tls-trust-bootstrap-read-this-before-configuring-cert-manager)
|
||||
section above for the 3-step recipe. This is **the** single biggest
|
||||
first-time-deploy footgun on the cert-manager integration path.
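
Conceptually, supplying a CA bundle tells the ACME client to trust one extra root when it calls the directory URL. The Go sketch below shows that idea with the standard library only. It is not cert-manager's implementation, and `newACMEClient` is a hypothetical helper name.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newACMEClient returns an HTTP client that verifies server certificates
// against the supplied PEM bundle only, instead of the system trust store.
func newACMEClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in PEM bundle")
	}
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
		},
	}, nil
}

func main() {
	// Placeholder input; a real caller would read the bootstrap CA PEM from disk
	// and then GET the profile's directory URL with the returned client.
	if _, err := newACMEClient([]byte("not a PEM bundle")); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

A request that fails with `certificate signed by unknown authority` through such a client means the bundle does not contain the CA that signed the server certificate, the same condition cert-manager reports.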
|
||||
|
||||
### HTTP-01 validator returns `connection refused`
|
||||
|
||||
**Cause:** The HTTP-01 solver's Ingress / Service is not reachable
|
||||
from certctl-server's network. Common subcases: (a) the cert-manager
|
||||
http-solver pod is on a private network certctl-server can't reach;
|
||||
(b) a firewall blocks port 80 inbound to the solver's address; (c)
|
||||
the Ingress class annotation doesn't match an installed ingress
|
||||
controller; (d) your DNS still points at an old IP.
|
||||
|
||||
**Fix:** From the certctl-server pod, `curl -v
|
||||
http://<identifier>/.well-known/acme-challenge/<token>` and read the
|
||||
network error. If the curl fails the same way, the network path is
|
||||
the issue. If curl works but the validator fails, check the validator
|
||||
log lines — the SSRF guard rejects reserved IPs (RFC1918, link-local,
|
||||
cloud-metadata 169.254.169.254). Public-trust style profiles that
|
||||
need to reach RFC1918 solvers must be moved to `trust_authenticated`
|
||||
mode OR the solver must be exposed on a routable address.
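
The kind of address screening an SSRF guard performs can be sketched with Go's `net/netip`. This is a simplified illustration, not the logic in `internal/validation/ssrf.go`; `isReservedTarget` is a hypothetical name and the set of rejected ranges is an assumption based on the categories listed above.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isReservedTarget reports whether a resolved solver address falls in a
// range a validator would typically refuse to dial. Unparseable input is
// treated as reserved so the check fails closed.
func isReservedTarget(ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return true
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	return addr.IsPrivate() || // RFC 1918 and IPv6 unique-local
		addr.IsLoopback() ||
		addr.IsLinkLocalUnicast() || // includes 169.254.169.254
		addr.IsLinkLocalMulticast() ||
		addr.IsMulticast() ||
		addr.IsUnspecified()
}

func main() {
	for _, ip := range []string{"10.0.0.1", "169.254.169.254", "::ffff:192.168.1.5", "8.8.8.8"} {
		fmt.Printf("%-20s reserved=%v\n", ip, isReservedTarget(ip))
	}
}
```

Running the same check by hand against the address your identifier resolves to is a quick way to tell a network-path failure from a guard rejection.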
|
||||
|
||||
### DNS-01 validator returns `NXDOMAIN`
|
||||
|
||||
**Cause:** DNS provider hasn't propagated the `_acme-challenge.<domain>`
|
||||
TXT record yet. Most providers have a 30s-2m propagation lag. cert-manager
|
||||
retries by default, but Phase-5 rate limits (default 60/hour per
|
||||
challenge-id) can truncate the retry budget.
|
||||
|
||||
**Fix:** Verify TXT propagation with `dig +short TXT _acme-challenge.<domain>
|
||||
@<your-resolver>`. If the answer is empty, the issue is upstream. If
|
||||
it's populated but certctl reports NXDOMAIN, check
|
||||
`CERTCTL_ACME_SERVER_DNS01_RESOLVER` (default `8.8.8.8:53`) is
|
||||
reachable from certctl-server's network egress. Operators on isolated
|
||||
networks need a private resolver; configure accordingly + own the
|
||||
cache-poisoning posture (see [threat
|
||||
model](./acme-server-threat-model.md)).
|
||||
|
||||
### Certificate Ready=False with `rejectedIdentifier`
|
||||
|
||||
**Cause:** The CSR includes an identifier (CommonName or SAN) that the
|
||||
bound certificate profile's policy rejects. certctl runs syntactic +
|
||||
profile-policy validation **before** order creation; the order never
|
||||
reaches the database.
|
||||
|
||||
**Fix:** The reject reason is in the `subproblems` array of the RFC
|
||||
8555 §6.7 problem document. Decode the JSON, look at `subproblems[].detail`,
|
||||
and adjust either the CSR or the profile policy. Common causes:
|
||||
SAN-not-in-`AllowedIdentifierWildcards`, EKU-not-in-`AllowedEKUs`,
|
||||
TTL-exceeds-`MaxTTLSeconds`. Validation logic lives in
|
||||
`internal/api/acme/identifier.go::ValidateIdentifiers` +
|
||||
`internal/domain/profile.go` — read those if the profile-policy rule
|
||||
isn't obvious.
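
Pulling the reject reasons out of a problem document takes only a few lines. The Go sketch below assumes the RFC 8555 §6.7.1 subproblem shape (`type`, `detail`, `identifier`); the struct and function names are hypothetical and the sample body is invented for illustration, not captured from certctl.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type identifier struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

type subproblem struct {
	Type       string      `json:"type"`
	Detail     string      `json:"detail"`
	Identifier *identifier `json:"identifier,omitempty"`
}

type problem struct {
	Type        string       `json:"type"`
	Detail      string       `json:"detail"`
	Subproblems []subproblem `json:"subproblems"`
}

// rejectReasons returns one "identifier: detail" line per subproblem.
func rejectReasons(body []byte) ([]string, error) {
	var p problem
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, err
	}
	reasons := make([]string, 0, len(p.Subproblems))
	for _, sp := range p.Subproblems {
		if sp.Identifier != nil {
			reasons = append(reasons, sp.Identifier.Value+": "+sp.Detail)
		} else {
			reasons = append(reasons, sp.Detail)
		}
	}
	return reasons, nil
}

func main() {
	// Invented sample body for illustration.
	body := []byte(`{
	  "type": "urn:ietf:params:acme:error:rejectedIdentifier",
	  "detail": "one or more identifiers were rejected",
	  "subproblems": [{
	    "type": "urn:ietf:params:acme:error:rejectedIdentifier",
	    "detail": "not permitted by profile policy",
	    "identifier": {"type": "dns", "value": "bad.example.com"}
	  }]
	}`)
	reasons, err := rejectReasons(body)
	if err != nil {
		panic(err)
	}
	for _, r := range reasons {
		fmt.Println(r)
	}
}
```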
|
||||
|
||||
## Version pinning + tested clients
|
||||
|
||||
certctl's ACME server is tested against the following client versions.
|
||||
Other versions probably work; these are the ones the integration suite
|
||||
exercises end-to-end.
|
||||
|
||||
| Client | Tested version | Where it's pinned |
|--------|----------------|-------------------|
| cert-manager | 1.15.0 | `deploy/test/acme-integration/cert-manager-install.sh::CERT_MANAGER_VERSION` |
| lego (RFC 8555 conformance harness) | v4.x latest | `deploy/test/acme-integration/conformance-lego.sh` (operator installs via `go install github.com/go-acme/lego/v4/cmd/lego@latest`) |
| kind (cluster bootstrap) | v0.20+ | `deploy/test/acme-integration/kind-config.yaml` schema requirement |
| Caddy | 2.7.x | Phase 6 walkthrough (`docs/acme-caddy-walkthrough.md`) |
| Traefik | 3.0+ | Phase 6 walkthrough (`docs/acme-traefik-walkthrough.md`) |
|
||||
|
||||
Operators reporting issues with untested-version clients should include
|
||||
the client version + the precise wire-level error (curl-captured request
|
||||
+ response body) so we can pin a regression test if applicable.
|
||||
|
||||
## FAQ
|
||||
|
||||
### Why two auth modes? Isn't `challenge` strictly more secure?
|
||||
|
||||
`challenge` is strictly more secure for **public-trust** PKI — RFC 8555
|
||||
§8 ownership proof is the entire point of cert-manager + Let's Encrypt.
|
||||
For **internal PKI**, the threat model is different: the network itself
|
||||
is the security boundary (mTLS service mesh, firewalled VPC, identifier-
|
||||
namespace controlled by the operator). Forcing every internal cert to
|
||||
go through a solver round-trip adds operational toil with no security
|
||||
gain. `trust_authenticated` is the certctl-specific mode that
|
||||
acknowledges this — the ACME account is the proof, not the solver.
|
||||
|
||||
### How does this differ from `cert-manager → Let's Encrypt with certctl as a separate step`?
|
||||
|
||||
Two integrations vs one. With certctl as the ACME endpoint, cert-manager
|
||||
does its native flow (Certificate → Order → CSR → Secret) and certctl
|
||||
mints the cert directly, recording it under its own
|
||||
`managed_certificates` table with full audit + renewal-policy + bulk-
|
||||
revocation surface. With Let's Encrypt as the ACME endpoint, you have
|
||||
to run a separate cert-manager-uploads-to-certctl webhook OR maintain
|
||||
two parallel cert tracks. The native-ACME-server path is operationally
|
||||
simpler.
|
||||
|
||||
### Can I use ACME endpoints from outside the K8s cluster?
|
||||
|
||||
Yes. The endpoints are HTTPS over the certctl-server's listener (port
|
||||
8443 by default). Caddy on a VM, win-acme on a Windows server, or
|
||||
Posh-ACME on a Mac all integrate against
|
||||
`https://<certctl-server>:8443/acme/profile/<profile-id>/directory`.
|
||||
The TLS-trust-bootstrap requirement applies the same way — see the
|
||||
[Caddy walkthrough](./acme-caddy-walkthrough.md) for the OS-trust-store
|
||||
recipe.
|
||||
|
||||
### How do I migrate manually-issued certs to ACME-issued ones?
|
||||
|
||||
Not yet automatic. Operators migrating: keep the old `managed_certificates`
|
||||
rows; create new ones via the ACME flow; flip targets one by one. A
|
||||
dedicated bulk-migration tool is on the roadmap (post-2.1.0). Track
|
||||
via the master prompt's roadmap section in
|
||||
`cowork/acme-server-endpoint-prompt.md`.
|
||||
|
||||
### What audit-log events fire on each ACME operation?
|
||||
|
||||
Every state mutation writes an `audit_events` row. Actor strings:
|
||||
`acme:<account-id>` for kid-path requests; `acme-cert-key:<serial>`
|
||||
for jwk-path revoke; `acme-system:gc` for scheduler-driven sweeps.
|
||||
Event-name catalog:
|
||||
|
||||
| Event name | Fired by | Resource type |
|------------|----------|---------------|
| `acme_account_created` | new-account | `acme_account` |
| `acme_account_contact_updated` | account update | `acme_account` |
| `acme_account_deactivated` | account deactivate | `acme_account` |
| `acme_account_key_rolled` | key-change | `acme_account` |
| `acme_order_created` | new-order | `acme_order` |
| `acme_order_finalized` | finalize | `acme_order` |
| `acme_challenge_processing` | challenge-respond (dispatch) | `acme_challenge` |
| `acme_challenge_completed` | validator callback | `acme_challenge` |
| `certificate_revoked` | revoke-cert (routes through `RevocationSvc`) | `certificate` |
|
||||
|
||||
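Querying by actor prefix (`actor LIKE 'acme:%'`) returns the kid-path
history. Add `acme-cert-key:%` and `acme-system:%` to the filter to
include jwk-path revocations and scheduler sweeps when reconstructing
the full history of an ACME-issued cert.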
|
||||
|
||||
### Is there a threat model document?
|
||||
|
||||
Yes — [`docs/acme-server-threat-model.md`](./acme-server-threat-model.md).
|
||||
Read before writing a security review.
|
||||
|
||||
## See also
|
||||
|
||||
- [cert-manager integration walkthrough](./acme-cert-manager-walkthrough.md)
|
||||
- [Caddy integration walkthrough](./acme-caddy-walkthrough.md)
|
||||
- [Traefik integration walkthrough](./acme-traefik-walkthrough.md)
|
||||
- [Threat model](./acme-server-threat-model.md)
|
||||
- [TLS trust bootstrap reference](./tls.md)
|
||||
- [Architecture (control-plane)](./architecture.md)
|
||||
|
||||
@@ -0,0 +1,198 @@
|
||||
# Traefik Integration Walkthrough
|
||||
|
||||
End-to-end recipe for issuing certs from a certctl-server deployment
|
||||
through Traefik 3.0+. Target audience: operator running Traefik (in
|
||||
Kubernetes or on a VM) who wants to use certctl as their ACME source
|
||||
of truth instead of Let's Encrypt.
|
||||
|
||||
## Prereqs
|
||||
|
||||
- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
|
||||
and at least one profile whose `acme_auth_mode` is set. Profile
|
||||
setup is identical to the cert-manager walkthrough — see
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
|
||||
Step 2.
|
||||
- Traefik 3.0+ (the v2 API surface for ACME is also supported but the
|
||||
`serversTransport.rootCAs` reference below is v3-shaped).
|
||||
- The certctl bootstrap CA, in PEM form, captured the same way as the
|
||||
cert-manager walkthrough Step 3.
|
||||
|
||||
## Step 1 — Configure Traefik static config
|
||||
|
||||
Traefik's ACME issuer is a `certificatesResolver` in the static config
|
||||
(file or CLI flags or env vars). The relevant fields:
|
||||
|
||||
```yaml
# /etc/traefik/traefik.yml (or wherever your static config lives)

certificatesResolvers:
  certctl:
    acme:
      caServer: https://certctl.example.com:8443/acme/profile/prof-test/directory
      email: ops@example.com
      storage: /etc/traefik/acme-certctl.json
      httpChallenge:
        entryPoint: web
      # OR for trust_authenticated mode profiles:
      # tlsChallenge: {}

# certctl uses a self-signed bootstrap cert; Traefik needs the CA
# explicitly via serversTransport.rootCAs to call the directory URL.
serversTransports:
  default:
    rootCAs:
      - /etc/traefik/certctl-bootstrap.crt

# Apply the serversTransport globally so every outbound HTTPS call
# (including ACME directory + finalize) trusts the certctl CA.
api:
  insecure: false

entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
```
|
||||
|
||||
Notes:
|
||||
|
||||
- `caServer` must point at the directory URL (ending in `/directory`).
|
||||
- `httpChallenge.entryPoint: web` requires Traefik's `web` entryPoint
|
||||
(port 80) to be reachable from certctl-server's HTTP-01 validator.
|
||||
For `trust_authenticated` mode profiles, this is a no-op formality —
|
||||
certctl auto-resolves authzs, so the solver round-trip never happens.
|
||||
- `tlsChallenge: {}` is the alternative that uses TLS-ALPN-01 (RFC 8737)
  via Traefik's `websecure` (port 443) entryPoint. Either solver works
  under `challenge` mode; for `trust_authenticated` mode, `tlsChallenge`
  is the recommended choice.
|
||||
|
||||
## Step 2 — Trust the certctl bootstrap CA
|
||||
|
||||
Two options:
|
||||
|
||||
### Option A — `serversTransport.rootCAs` (preferred)
|
||||
|
||||
```
|
||||
sudo cp deploy/test/certs/ca.crt /etc/traefik/certctl-bootstrap.crt
|
||||
sudo systemctl reload traefik
|
||||
```
|
||||
|
||||
`serversTransports.default.rootCAs` (shown in Step 1 above) tells
|
||||
Traefik's outbound HTTPS client to trust the supplied PEM in addition
|
||||
to the system trust store. This is the right pattern for containerized
|
||||
Traefik where you don't want to install OS-level trust roots.
|
||||
|
||||
### Option B — OS trust store
|
||||
|
||||
For Traefik running directly on a VM, `update-ca-certificates`-style
|
||||
installation works the same way as the Caddy walkthrough Option A.
|
||||
The `serversTransport.rootCAs` field is unnecessary in that case.
|
||||
|
||||
## Step 3 — Reference the resolver from a router
|
||||
|
||||
Per-router (dynamic config):
|
||||
|
||||
```yaml
# /etc/traefik/dynamic/example-com.yml

http:
  routers:
    example-com:
      rule: "Host(`example.com`)"
      entryPoints: [websecure]
      tls:
        certResolver: certctl
      service: example-com-backend
  services:
    example-com-backend:
      loadBalancer:
        servers:
          - url: "http://localhost:8080"
```
|
||||
|
||||
Or, in Kubernetes via `IngressRoute` (Traefik CRD):
|
||||
|
||||
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-com
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`example.com`)
      kind: Rule
      services:
        - name: example-com-backend
          port: 8080
  tls:
    certResolver: certctl
```
|
||||
|
||||
## Step 4 — Reload Traefik
|
||||
|
||||
```
|
||||
sudo systemctl reload traefik
|
||||
# OR kubectl rollout restart deployment/traefik (if you changed the static config via ConfigMap).
|
||||
```
|
||||
|
||||
On the first request to `example.com`, Traefik hits certctl's directory
|
||||
URL, registers an account, submits a new-order, and finalizes. The cert
|
||||
is persisted to `/etc/traefik/acme-certctl.json` (or its in-cluster
|
||||
PVC equivalent).
|
||||
|
||||
## Step 5 — Verify
|
||||
|
||||
```
curl -kvI https://example.com 2>&1 | grep -E 'subject|issuer'
# subject: CN=example.com
# issuer: CN=certctl test internal CA
```
|
||||
|
||||
The cert is signed by certctl's bound issuer (per the `prof-test`
|
||||
profile's `issuer_id`).
|
||||
|
||||
On the certctl side, the audit log captures the issuance:
|
||||
|
||||
```
psql -c "SELECT actor, action, resource_id FROM audit_events
         WHERE actor LIKE 'acme:%' ORDER BY created_at DESC LIMIT 5;"
```
|
||||
|
||||
## Common failure modes
|
||||
|
||||
- **Traefik logs `unable to obtain ACME certificate ... x509: certificate
|
||||
signed by unknown authority`** → `serversTransport.rootCAs` is not
|
||||
pointing at the certctl bootstrap CA, OR the file was rotated and
|
||||
Traefik hasn't reloaded. Verify with
|
||||
`curl --cacert /etc/traefik/certctl-bootstrap.crt
|
||||
https://certctl.example.com:8443/acme/profile/prof-test/directory`.
|
||||
- **Traefik logs `urn:ietf:params:acme:error:rateLimited`** → tune
|
||||
`CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR` on the certctl
|
||||
side, OR reduce Traefik's parallel-cert-acquisition concurrency.
|
||||
- **`acme: error: 400 :: POST :: ... :: badNonce`** → clock skew or
|
||||
multi-replica certctl without sticky sessions; same fix as the
|
||||
cert-manager walkthrough.
|
||||
- **Storage file `acme-certctl.json` shows persistent failures** —
|
||||
Traefik retains failed-acquisition state. After fixing the
|
||||
underlying cause, delete the storage file and reload:
|
||||
`rm /etc/traefik/acme-certctl.json && systemctl reload traefik`.
|
||||
|
||||
## Cleanup
|
||||
|
||||
```
|
||||
# Remove the certResolver from any router / IngressRoute consuming it.
|
||||
sudo systemctl reload traefik
|
||||
# Delete the persisted ACME storage:
|
||||
sudo rm /etc/traefik/acme-certctl.json
|
||||
# Or in K8s: drop the resolver from the static-config ConfigMap.
|
||||
```
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
|
||||
cert-manager equivalent.
|
||||
- [Traefik upstream ACME docs](https://doc.traefik.io/traefik/https/acme/#caserver) —
|
||||
verify behavior pinned here against Traefik 3.0+ semantics.
|
||||
+65
-14
@@ -19,7 +19,8 @@ Connectors extend certctl to integrate with external systems for certificate iss
|
||||
- [Revocation Across Issuers](#revocation-across-issuers)
|
||||
- [EST Integration (GetCACertPEM)](#est-integration-getcacertpem)
|
||||
- [Building a Custom Issuer](#building-a-custom-issuer)
|
||||
3. [Target Connector](#target-connector)
|
||||
3. [ACME Server (Built-in)](#acme-server-built-in)
|
||||
4. [Target Connector](#target-connector)
|
||||
- [Interface](#interface-1)
|
||||
- [Built-in: NGINX](#built-in-nginx)
|
||||
- [Built-in: Apache httpd](#built-in-apache-httpd)
|
||||
@@ -34,28 +35,28 @@ Connectors extend certctl to integrate with external systems for certificate iss
- [Windows Certificate Store](#windows-certificate-store)
- [Java Keystore (JKS / PKCS#12)](#java-keystore-jks--pkcs12)
- [Kubernetes Secrets](#kubernetes-secrets)
4. [Notifier Connector](#notifier-connector)
5. [Notifier Connector](#notifier-connector)
- [Interface](#interface-2)
5. [Registering a Connector](#registering-a-connector)
6. [Registering a Connector](#registering-a-connector)
- [IssuerConnectorAdapter](#issuerconnectoradapter)
- [Notifier Registration](#notifier-registration)
6. [Testing Connectors](#testing-connectors)
7. [Testing Connectors](#testing-connectors)
- [Unit Tests](#unit-tests)
- [Integration Tests](#integration-tests)
7. [Best Practices](#best-practices)
8. [Agent Discovery Scanner](#agent-discovery-scanner)
8. [Best Practices](#best-practices)
9. [Agent Discovery Scanner](#agent-discovery-scanner)
- [Configuration](#configuration)
- [How It Works](#how-it-works)
- [API Endpoints](#api-endpoints)
- [Use Cases](#use-cases)
9. [Network Certificate Scanner (M21)](#network-certificate-scanner-m21)
- [Configuration](#configuration-1)
- [Creating Scan Targets](#creating-scan-targets)
- [How It Works](#how-it-works-1)
- [API Endpoints](#api-endpoints-1)
- [Scheduler Integration](#scheduler-integration)
- [Use Cases](#use-cases-1)
10. [What's Next](#whats-next)
10. [Network Certificate Scanner (M21)](#network-certificate-scanner-m21)
- [Configuration](#configuration-1)
- [Creating Scan Targets](#creating-scan-targets)
- [How It Works](#how-it-works-1)
- [API Endpoints](#api-endpoints-1)
- [Scheduler Integration](#scheduler-integration)
- [Use Cases](#use-cases-1)
11. [What's Next](#whats-next)

## Overview

@@ -712,6 +713,56 @@ func (v *VaultIssuer) IssueCertificate(ctx context.Context, req issuer.IssuanceR
// ... implement RenewCertificate, RevokeCertificate, GetOrderStatus
```

## ACME Server (Built-in)

certctl ships a built-in ACME **server** (RFC 8555, plus RFC 9773 ARI)
at `/acme/profile/<profile-id>/*`. Any RFC 8555 client
(cert-manager 1.15+, Caddy, Traefik, win-acme, certbot, Posh-ACME)
integrates with certctl as an ACME issuer with no certctl-side
modification, which removes the "deploy a certctl agent on every K8s
node" friction that costs deals to external PKI vendors.

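
As a rough illustration of what an RFC 8555 client does first against a per-profile endpoint, the sketch below builds a directory URL and decodes the directory object. The field names come from RFC 8555 §7.1.1 (plus `renewalInfo` from RFC 9773). The trailing `directory` path segment, the host name, and the profile ID `prof-web` are assumptions made for the example; consult the ACME Server Reference for the real paths.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// Directory mirrors the RFC 8555 §7.1.1 directory object.
// RenewalInfo is the RFC 9773 (ARI) extension field.
type Directory struct {
	NewNonce    string `json:"newNonce"`
	NewAccount  string `json:"newAccount"`
	NewOrder    string `json:"newOrder"`
	RevokeCert  string `json:"revokeCert"`
	KeyChange   string `json:"keyChange"`
	RenewalInfo string `json:"renewalInfo,omitempty"`
}

// directoryURL builds a per-profile directory URL under the documented
// /acme/profile/<profile-id>/ prefix. ASSUMPTION: the final "directory"
// segment is illustrative and not confirmed by this document.
func directoryURL(base, profileID string) string {
	return strings.TrimRight(base, "/") +
		"/acme/profile/" + url.PathEscape(profileID) + "/directory"
}

// parseDirectory decodes a directory response body.
func parseDirectory(body []byte) (Directory, error) {
	var d Directory
	err := json.Unmarshal(body, &d)
	return d, err
}

func main() {
	// Hypothetical host and profile ID.
	fmt.Println(directoryURL("https://certctl.example.internal/", "prof-web"))

	// Hand-written sample body; a real server returns its own URLs.
	sample := []byte(`{"newNonce":"https://x/n","newAccount":"https://x/a","newOrder":"https://x/o"}`)
	d, err := parseDirectory(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("newOrder endpoint:", d.NewOrder)
}
```

Existing clients such as cert-manager or Caddy perform this step themselves; the sketch is only meant to show why a single directory URL per profile is all a client needs to be configured with.
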

This is **distinct** from the [ACME consumer
connector](#built-in-acme-v2-lets-encrypt-sectigo-zerossl) above. The
consumer side is `certctl → external CA over ACME`; the server side
is `external client → certctl over ACME`. Operators deploying both
should namespace env vars carefully: the consumer uses `CERTCTL_ACME_*`
(`DIRECTORY_URL`, `EMAIL`, `CHALLENGE_TYPE`); the server uses
`CERTCTL_ACME_SERVER_*` (`ENABLED`, `DEFAULT_PROFILE_ID`, `NONCE_TTL`,
…).

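
Because the server prefix `CERTCTL_ACME_SERVER_` itself begins with the consumer prefix `CERTCTL_ACME_`, any tooling that sorts variables by prefix has to test the longer prefix first. A minimal sketch of that check follows; the `classify` helper is hypothetical and is not part of certctl.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	serverPrefix   = "CERTCTL_ACME_SERVER_" // ACME server (external client → certctl)
	consumerPrefix = "CERTCTL_ACME_"        // ACME consumer (certctl → external CA)
)

// classify reports which side of certctl an environment variable
// configures. The server prefix is checked first because it is a
// strict extension of the consumer prefix.
func classify(name string) string {
	switch {
	case strings.HasPrefix(name, serverPrefix):
		return "server"
	case strings.HasPrefix(name, consumerPrefix):
		return "consumer"
	default:
		return "other"
	}
}

func main() {
	names := []string{
		"CERTCTL_ACME_DIRECTORY_URL",
		"CERTCTL_ACME_SERVER_ENABLED",
		"CERTCTL_ACME_SERVER_NONCE_TTL",
		"PATH",
	}
	for _, n := range names {
		fmt.Printf("%-32s %s\n", n, classify(n))
	}
}
```

Swapping the two `case` arms would label every server variable as a consumer variable, which is the mix-up the namespacing advice above is meant to prevent.
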

Each profile selects one of two auth modes
(`certificate_profiles.acme_auth_mode`):

- **`trust_authenticated`** (default for internal PKI). The
  JWS-authenticated ACME account is trusted to issue for any identifier
  the profile policy permits; no out-of-band ownership proof is
  required. This is the most common certctl use case: internal-PKI
  fleets where the network itself is the trust boundary.
- **`challenge`**. Full HTTP-01, DNS-01, and TLS-ALPN-01 validation per
  RFC 8555 §8 and RFC 8737. Required for public-trust-style PKI where a
  compromised account key must not by itself grant issuance authority.


ACME issuance routes through `service.CertificateService.Create`, so
policy, audit, metrics, bulk-revocation, and cloud-discovery all apply
uniformly to ACME-issued certs (just as they do to API-issued,
agent-issued, EST-issued, and SCEP-issued certs).


See:

- [ACME Server Reference](./acme-server.md): env-var reference,
  endpoints, auth-mode decision tree, RFC 8555 conformance statement,
  troubleshooting, FAQ.
- [cert-manager Walkthrough](./acme-cert-manager-walkthrough.md): kind
  → cert-manager → certctl-server → Certificate flow.
- [Caddy Walkthrough](./acme-caddy-walkthrough.md): Caddyfile `acme_ca`
  and trust configuration.
- [Traefik Walkthrough](./acme-traefik-walkthrough.md): `certificatesResolvers`
  and `serversTransport.rootCAs`.
- [Threat Model](./acme-server-threat-model.md): JWS forgery
  resistance, nonce store integrity, HTTP-01 SSRF, DNS-01 cache
  posture, TLS-ALPN-01 chain-not-validated rationale, rate-limit
  tuning, audit trail.

## Target Connector

Target connectors deploy certificates to infrastructure systems. They run on agents, not on the control plane.