Files
certctl/docs/acme-traefik-walkthrough.md
T
shankar0123 39f065dda4 docs(acme-server): operator-facing reference + threat model + cert-manager walkthrough (Phase 6/7)
Doc-only commit closing the ACME-server work series. After this commit,
an outside reviewer (procurement engineer / Venafi diligence engineer /
Infisical-comparison-shopper) can read the docs cold, understand the
ACME server's surface, follow the cert-manager walkthrough, and reach
a deployment decision without escalating to certctl maintainers.

What ships:
  - docs/acme-server.md final pass: Auth-mode decision tree (when to
    use trust_authenticated vs challenge), RFC 8555 + RFC 9773
    conformance statement (section-by-section table of implemented
    plus procurement-honest 'not implemented' rows for EAB / multi-
    level wildcards / RFC 8738 / cross-CA proxying), Troubleshooting
    (5 failure modes — badNonce / unknownAuthority / HTTP-01
    connection refused / DNS-01 NXDOMAIN / rejectedIdentifier with
    canonical fix for each), Version pinning + tested clients table
    (cert-manager 1.15.0, lego v4, kind v0.20+, Caddy 2.7.x, Traefik
    3.0+), FAQ (5 entries — why two auth modes, vs cert-manager-
    against-LE, can-I-use-from-outside-K8s, migration story, audit-
    log catalog), See-also cross-link block.
  - docs/acme-cert-manager-walkthrough.md: kind → cert-manager →
    certctl → Certificate flow, with YAML blocks byte-equal to
    deploy/test/acme-integration/{clusterissuer-trust-authenticated,
    certificate-test}.yaml to prevent doc/test drift.
  - docs/acme-caddy-walkthrough.md: Caddyfile acme_ca + tls.cas
    options (OS trust store + Caddy pki.ca block).
  - docs/acme-traefik-walkthrough.md: certificatesResolvers.<name>.acme
    .caServer + serversTransport.rootCAs configuration.
  - docs/acme-server-threat-model.md: Threat surface map + JWS forgery
    resistance (alg-confusion / HS256 substitution / replayed nonce /
    URL spoofing / multi-sig / kid-vs-jwk / kid round-trip mismatch),
    Nonce store integrity rationale, HTTP-01 SSRF defense-in-depth
    (pre-dial check + per-dial check + per-redirect check + body cap +
    bounded redirects), DNS-01 cache-poisoning posture (default Google
    Public DNS + operator-owns-private-resolver-posture), TLS-ALPN-01
    chain-not-validated rationale (RFC 8737 §3 explicit), Rate-limit
    tuning, Audit trail catalog, Out-of-scope threats list.
  - docs/connectors.md: TOC renumbered 3→4 etc. to make room for new
    top-level 'ACME Server (Built-in)' section between Issuer Connector
    and Target Connector — distinguishes the consumer-side ACME
    (existing) from the new server-side ACME via env-var-prefix
    call-out (CERTCTL_ACME_* vs CERTCTL_ACME_SERVER_*).

DoD verification:
  - All 5 docs files exist with the structure prescribed by the
    Phase 6 prompt.
  - Every CERTCTL_ACME_SERVER_* env var in docs/acme-server.md maps
    to an actual lookup in internal/config/config.go (verified by
    'grep -oE | sort -u | diff' returning empty).
  - Every YAML snippet in docs/acme-cert-manager-walkthrough.md is
    byte-equal to the corresponding file in deploy/test/acme-integration/
    (verified with 'diff' against awk-extracted YAML blocks).
  - docs/connectors.md has the cross-link subsection with all 4 new
    docs referenced.
  - cowork/CLAUDE.md Architecture Decisions has the new ACME-server
    bullet documenting per-profile URL family + per-profile
    acme_auth_mode + Phase 4-5-6 progression.
  - cowork/WORKSPACE-CHANGELOG.md has the ACME-Server-6 entry plus
    the ACME-Server rollup spanning Phases 1a-6.
  - cowork/infisical-deep-research-results.md Rank 1 marked SHIPPED.
  - 'gofmt -l .' clean (no Go changes); 'go vet ./...' clean.

Acquisition-readiness: every one of the 12 acquisition-grade criteria
from cowork/acme-server-endpoint-prompt.md is verified by the test
suite (Phases 1a-5) plus this doc walkthrough (Phase 6). The full
RFC 8555 + RFC 9773 surface is live; the operator can deploy
end-to-end by reading one walkthrough doc and one env-var table.

Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-6 (docs)'
+ ACME-Server rollup of all 6 phases.
2026-05-03 19:58:15 +00:00

199 lines
6.3 KiB
Markdown

# Traefik Integration Walkthrough
End-to-end recipe for issuing certs from a certctl-server deployment
through Traefik 3.0+. Target audience: operator running Traefik (in
Kubernetes or on a VM) who wants to use certctl as their ACME source
of truth instead of Let's Encrypt.
## Prereqs
- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
and at least one profile whose `acme_auth_mode` is set. Profile
setup is identical to the cert-manager walkthrough — see
[`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
Step 2.
- Traefik 3.0+ (the v2 API surface for ACME is also supported but the
`serversTransport.rootCAs` reference below is v3-shaped).
- The certctl bootstrap CA, in PEM form, captured the same way as the
cert-manager walkthrough Step 3.
## Step 1 — Configure Traefik static config
Traefik's ACME issuer is a `certificatesResolver` in the static config
(file or CLI flags or env vars). The relevant fields:
```yaml
# /etc/traefik/traefik.yml (or wherever your static config lives)
certificatesResolvers:
certctl:
acme:
caServer: https://certctl.example.com:8443/acme/profile/prof-test/directory
email: ops@example.com
storage: /etc/traefik/acme-certctl.json
httpChallenge:
entryPoint: web
# OR for trust_authenticated mode profiles:
# tlsChallenge: {}
# certctl uses a self-signed bootstrap cert; Traefik needs the CA
# explicitly via serversTransport.rootCAs to call the directory URL.
serversTransports:
default:
rootCAs:
- /etc/traefik/certctl-bootstrap.crt
# Apply the serversTransport globally so every outbound HTTPS call —
# including ACME directory + finalize — trusts the certctl CA.
api:
insecure: false
entryPoints:
web:
address: ":80"
websecure:
address: ":443"
```
Notes:
- `caServer` must point at the directory URL (ending in `/directory`).
- `httpChallenge.entryPoint: web` requires Traefik's `web` entryPoint
(port 80) to be reachable from certctl-server's HTTP-01 validator.
For `trust_authenticated` mode profiles, this is a no-op formality —
certctl auto-resolves authzs, so the solver round-trip never happens.
- `tlsChallenge: {}` is the alternative that uses TLS-ALPN-01 (RFC 8737)
via Traefik's `websecure` (port 443) entryPoint. Either works under
`challenge` mode; only the default-of-`tlsChallenge` is recommended
for `trust_authenticated` mode.
## Step 2 — Trust the certctl bootstrap CA
Two options:
### Option A — `serversTransport.rootCAs` (preferred)
```
sudo cp deploy/test/certs/ca.crt /etc/traefik/certctl-bootstrap.crt
sudo systemctl reload traefik
```
`serversTransports.default.rootCAs` (shown in Step 1 above) tells
Traefik's outbound HTTPS client to trust the supplied PEM in addition
to the system trust store. This is the right pattern for containerized
Traefik where you don't want to install OS-level trust roots.
### Option B — OS trust store
For Traefik running directly on a VM, `update-ca-certificates`-style
installation works the same way as the Caddy walkthrough Option A.
The `serversTransport.rootCAs` field is unnecessary in that case.
## Step 3 — Reference the resolver from a router
Per-router (dynamic config):
```yaml
# /etc/traefik/dynamic/example-com.yml
http:
routers:
example-com:
rule: "Host(`example.com`)"
entryPoints: [websecure]
tls:
certResolver: certctl
service: example-com-backend
services:
example-com-backend:
loadBalancer:
servers:
- url: "http://localhost:8080"
```
Or, in Kubernetes via `IngressRoute` (Traefik CRD):
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: example-com
spec:
entryPoints: [websecure]
routes:
- match: Host(`example.com`)
kind: Rule
services:
- name: example-com-backend
port: 8080
tls:
certResolver: certctl
```
## Step 4 — Reload Traefik
```
sudo systemctl reload traefik
# OR kubectl rollout restart deployment/traefik (if you changed the static config via ConfigMap).
```
On the first request to `example.com`, Traefik hits certctl's directory
URL, registers an account, submits a new-order, and finalizes. The cert
is persisted to `/etc/traefik/acme-certctl.json` (or its in-cluster
PVC equivalent).
## Step 5 — Verify
```
curl -kvI https://example.com 2>&1 | grep -E 'subject|issuer'
# subject: CN=example.com
# issuer: CN=certctl test internal CA
```
The cert is signed by certctl's bound issuer (per the `prof-test`
profile's `issuer_id`).
On the certctl side, the audit log captures the issuance:
```
psql -c "SELECT actor, action, resource_id FROM audit_events
WHERE actor LIKE 'acme:%' ORDER BY created_at DESC LIMIT 5;"
```
## Common failure modes
- **Traefik logs `unable to obtain ACME certificate ... x509: certificate
signed by unknown authority`** → `serversTransport.rootCAs` is not
pointing at the certctl bootstrap CA, OR the file was rotated and
Traefik hasn't reloaded. Verify with
`curl --cacert /etc/traefik/certctl-bootstrap.crt
https://certctl.example.com:8443/acme/profile/prof-test/directory`.
- **Traefik logs `urn:ietf:params:acme:error:rateLimited`** → tune
`CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR` on the certctl
side, OR reduce Traefik's parallel-cert-acquisition concurrency.
- **`acme: error: 400 :: POST :: ... :: badNonce`** → clock skew or
multi-replica certctl without sticky sessions; same fix as the
cert-manager walkthrough.
- **Storage file `acme-certctl.json` shows persistent failures** —
Traefik retains failed-acquisition state. After fixing the
underlying cause, delete the storage file and reload:
`rm /etc/traefik/acme-certctl.json && systemctl reload traefik`.
## Cleanup
```
# Remove the certResolver from any router / IngressRoute consuming it.
sudo systemctl reload traefik
# Delete the persisted ACME storage:
sudo rm /etc/traefik/acme-certctl.json
# Or in K8s: drop the resolver from the static-config ConfigMap.
```
## See also
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
cert-manager equivalent.
- [Traefik upstream ACME docs](https://doc.traefik.io/traefik/https/acme/#caserver) —
verify behavior pinned here against Traefik 3.0+ semantics.