mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:21:30 +00:00
39f065dda4
Doc-only commit closing the ACME-server work series. After this commit,
an outside reviewer (procurement engineer / Venafi diligence engineer /
Infisical-comparison-shopper) can read the docs cold, understand the
ACME server's surface, follow the cert-manager walkthrough, and reach
a deployment decision without escalating to certctl maintainers.
What ships:
- docs/acme-server.md final pass: Auth-mode decision tree (when to
use trust_authenticated vs challenge), RFC 8555 + RFC 9773
conformance statement (section-by-section table of implemented
plus procurement-honest 'not implemented' rows for EAB / multi-
level wildcards / RFC 8738 / cross-CA proxying), Troubleshooting
(5 failure modes — badNonce / unknownAuthority / HTTP-01
connection refused / DNS-01 NXDOMAIN / rejectedIdentifier with
canonical fix for each), Version pinning + tested clients table
(cert-manager 1.15.0, lego v4, kind v0.20+, Caddy 2.7.x, Traefik
3.0+), FAQ (5 entries — why two auth modes, vs cert-manager-
against-LE, can-I-use-from-outside-K8s, migration story, audit-
log catalog), See-also cross-link block.
- docs/acme-cert-manager-walkthrough.md: kind → cert-manager →
certctl → Certificate flow, with YAML blocks byte-equal to
deploy/test/acme-integration/{clusterissuer-trust-authenticated,
certificate-test}.yaml to prevent doc/test drift.
- docs/acme-caddy-walkthrough.md: Caddyfile acme_ca + tls.cas
options (OS trust store + Caddy pki.ca block).
- docs/acme-traefik-walkthrough.md: certificatesResolvers.<name>.acme
.caServer + serversTransport.rootCAs configuration.
- docs/acme-server-threat-model.md: Threat surface map + JWS forgery
resistance (alg-confusion / HS256 substitution / replayed nonce /
URL spoofing / multi-sig / kid-vs-jwk / kid round-trip mismatch),
Nonce store integrity rationale, HTTP-01 SSRF defense-in-depth
(pre-dial check + per-dial check + per-redirect check + body cap +
bounded redirects), DNS-01 cache-poisoning posture (default Google
Public DNS + operator-owns-private-resolver-posture), TLS-ALPN-01
chain-not-validated rationale (RFC 8737 §3 explicit), Rate-limit
tuning, Audit trail catalog, Out-of-scope threats list.
- docs/connectors.md: TOC renumbered 3→4 etc. to make room for new
top-level 'ACME Server (Built-in)' section between Issuer Connector
and Target Connector — distinguishes the consumer-side ACME
(existing) from the new server-side ACME via env-var-prefix
call-out (CERTCTL_ACME_* vs CERTCTL_ACME_SERVER_*).
DoD verification:
- All 5 docs files exist with the structure prescribed by the
Phase 6 prompt.
- Every CERTCTL_ACME_SERVER_* env var in docs/acme-server.md maps
to an actual lookup in internal/config/config.go (verified by
'grep -oE | sort -u | diff' returning empty).
- Every YAML snippet in docs/acme-cert-manager-walkthrough.md is
byte-equal to the corresponding file in deploy/test/acme-integration/
(verified with 'diff' against awk-extracted YAML blocks).
- docs/connectors.md has the cross-link subsection with all 4 new
docs referenced.
- cowork/CLAUDE.md Architecture Decisions has the new ACME-server
bullet documenting per-profile URL family + per-profile
acme_auth_mode + Phase 4-5-6 progression.
- cowork/WORKSPACE-CHANGELOG.md has the ACME-Server-6 entry plus
the ACME-Server rollup spanning Phases 1a-6.
- cowork/infisical-deep-research-results.md Rank 1 marked SHIPPED.
- 'gofmt -l .' clean (no Go changes); 'go vet ./...' clean.
Acquisition-readiness: every one of the 12 acquisition-grade criteria
from cowork/acme-server-endpoint-prompt.md is verified by the test
suite (Phases 1a-5) plus this doc walkthrough (Phase 6). The full
RFC 8555 + RFC 9773 surface is live; the operator can deploy
end-to-end by reading one walkthrough doc and one env-var table.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-6 (docs)'
+ ACME-Server rollup of all 6 phases.
199 lines
6.3 KiB
Markdown
# Traefik Integration Walkthrough

End-to-end recipe for issuing certs from a certctl-server deployment
through Traefik 3.0+. Target audience: an operator running Traefik (in
Kubernetes or on a VM) who wants to use certctl as their ACME source
of truth instead of Let's Encrypt.

## Prereqs

- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
  and at least one profile whose `acme_auth_mode` is set. Profile
  setup is identical to the cert-manager walkthrough; see
  [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
  Step 2.
- Traefik 3.0+ (the v2 API surface for ACME is also supported but the
  `serversTransport.rootCAs` reference below is v3-shaped).
- The certctl bootstrap CA, in PEM form, captured the same way as the
  cert-manager walkthrough Step 3.

## Step 1 — Configure Traefik static config

Traefik's ACME issuer is a `certificatesResolver` in the static config
(file, CLI flags, or env vars). The relevant fields:

```yaml
# /etc/traefik/traefik.yml (or wherever your static config lives)

certificatesResolvers:
  certctl:
    acme:
      caServer: https://certctl.example.com:8443/acme/profile/prof-test/directory
      email: ops@example.com
      storage: /etc/traefik/acme-certctl.json
      httpChallenge:
        entryPoint: web
      # OR for trust_authenticated mode profiles:
      # tlsChallenge: {}

# certctl uses a self-signed bootstrap cert; Traefik needs the CA
# explicitly via serversTransport.rootCAs to call the directory URL.
serversTransports:
  default:
    rootCAs:
      - /etc/traefik/certctl-bootstrap.crt

# Apply the serversTransport globally so every outbound HTTPS call
# (including ACME directory + finalize) trusts the certctl CA.
api:
  insecure: false

entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
```

Notes:

- `caServer` must point at the directory URL (ending in `/directory`).
- `httpChallenge.entryPoint: web` requires Traefik's `web` entryPoint
  (port 80) to be reachable from certctl-server's HTTP-01 validator.
  For `trust_authenticated` mode profiles, this is a no-op formality:
  certctl auto-resolves authorizations, so the solver round-trip never
  happens.
- `tlsChallenge: {}` is the alternative that uses TLS-ALPN-01 (RFC 8737)
  via Traefik's `websecure` (port 443) entryPoint. Either solver works
  under `challenge` mode; for `trust_authenticated` mode, `tlsChallenge`
  is the recommended default.

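
The first note is easy to check before Traefik ever starts. The sketch below is a pair of hypothetical shell helpers (`directory_url` and `is_directory_url` are illustrative names, not part of certctl or Traefik). It assumes the per-profile URL shape used in this walkthrough, `/acme/profile/<id>/directory`, and that the endpoint is served over HTTPS.

```shell
#!/bin/sh
# Hypothetical helpers; names and URL layout are assumptions taken from
# the example caServer value in this walkthrough.

# Build the per-profile directory URL from a base URL and a profile ID.
directory_url() {
  base=${1%/}   # drop one trailing slash, if present
  printf '%s/acme/profile/%s/directory\n' "$base" "$2"
}

# Succeed only if the URL is https:// and ends in /directory.
is_directory_url() {
  case $1 in
    https://*/directory) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   url=$(directory_url https://certctl.example.com:8443 prof-test)
#   is_directory_url "$url" && echo "caServer looks right: $url"
```

A check like this only validates the shape of the value; it does not contact certctl.
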
## Step 2 — Trust the certctl bootstrap CA

Two options:

### Option A — `serversTransport.rootCAs` (preferred)

```
sudo cp deploy/test/certs/ca.crt /etc/traefik/certctl-bootstrap.crt
sudo systemctl reload traefik
```

`serversTransports.default.rootCAs` (shown in Step 1 above) tells
Traefik's outbound HTTPS client to trust the supplied PEM in addition
to the system trust store. This is the right pattern for containerized
Traefik where you don't want to install OS-level trust roots.

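
Before reloading, it can help to confirm that the copied file really is a PEM certificate and not, say, an empty file or a DER blob. The following is a minimal sketch; `looks_like_pem_cert` is a hypothetical helper name, and the check looks only at the PEM markers without validating the certificate itself.

```shell
#!/bin/sh
# Hypothetical helper: shape check only, no cryptographic validation.
# Succeeds if the file is readable and contains PEM certificate markers.
looks_like_pem_cert() {
  [ -r "$1" ] || return 1
  grep -q -e '-----BEGIN CERTIFICATE-----' "$1" &&
    grep -q -e '-----END CERTIFICATE-----' "$1"
}

# Example:
#   looks_like_pem_cert /etc/traefik/certctl-bootstrap.crt ||
#     echo "certctl-bootstrap.crt is missing or not PEM" >&2
```
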
### Option B — OS trust store

For Traefik running directly on a VM, `update-ca-certificates`-style
installation works the same way as the Caddy walkthrough Option A.
The `serversTransport.rootCAs` field is unnecessary in that case.

## Step 3 — Reference the resolver from a router

Per-router (dynamic config):

```yaml
# /etc/traefik/dynamic/example-com.yml

http:
  routers:
    example-com:
      rule: "Host(`example.com`)"
      entryPoints: [websecure]
      tls:
        certResolver: certctl
      service: example-com-backend
  services:
    example-com-backend:
      loadBalancer:
        servers:
          - url: "http://localhost:8080"
```

Or, in Kubernetes via `IngressRoute` (Traefik CRD):

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: example-com
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`example.com`)
      kind: Rule
      services:
        - name: example-com-backend
          port: 8080
  tls:
    certResolver: certctl
```

## Step 4 — Reload Traefik

```
sudo systemctl reload traefik
# OR kubectl rollout restart deployment/traefik (if you changed the static config via ConfigMap).
```

Once Traefik loads the `example-com` router, it hits certctl's directory
URL, registers an account, submits a new-order, and finalizes. The cert
is persisted to `/etc/traefik/acme-certctl.json` (or its in-cluster
PVC equivalent).

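
Traefik is strict about a pre-existing ACME storage file: it expects mode 600 and refuses files with more open permissions. The sketch below checks that before a reload. `file_mode` and `storage_mode_ok` are hypothetical helper names, and the `stat` fallback assumes either GNU or BSD `stat` is installed.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Traefik or certctl.

# Print the octal permission bits of a file
# (GNU stat first, BSD stat as a fallback).
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Succeed only if the storage file exists and has mode 600.
storage_mode_ok() {
  [ -f "$1" ] && [ "$(file_mode "$1")" = "600" ]
}

# Example:
#   storage_mode_ok /etc/traefik/acme-certctl.json ||
#     sudo chmod 600 /etc/traefik/acme-certctl.json
```

If the file does not exist yet, no action is needed; Traefik creates it on first use.
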
## Step 5 — Verify

```
curl -kvI https://example.com 2>&1 | grep -E 'subject|issuer'
# subject: CN=example.com
# issuer: CN=certctl test internal CA
```

The cert is signed by certctl's bound issuer (per the `prof-test`
profile's `issuer_id`).

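
For scripted checks, the same `curl -v` output can be turned into a pass/fail result. This is a minimal sketch: `issuer_matches` is a hypothetical helper, and the expected issuer string is the one shown in the sample output above, which will differ for other issuers.

```shell
#!/bin/sh
# Hypothetical helper: reads `curl -v` output on stdin and succeeds if an
# "issuer:" line contains the expected string (fixed-string match).
issuer_matches() {
  grep 'issuer:' | grep -F -q -e "$1"
}

# Example:
#   curl -kvI https://example.com 2>&1 |
#     issuer_matches 'CN=certctl test internal CA' && echo "issued by certctl"
```
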
On the certctl side, the audit log captures the issuance:

```
psql -c "SELECT actor, action, resource_id FROM audit_events
         WHERE actor LIKE 'acme:%' ORDER BY created_at DESC LIMIT 5;"
```

## Common failure modes

- **Traefik logs `unable to obtain ACME certificate ... x509: certificate
  signed by unknown authority`** → `serversTransport.rootCAs` is not
  pointing at the certctl bootstrap CA, OR the file was rotated and
  Traefik hasn't reloaded. Verify with
  `curl --cacert /etc/traefik/certctl-bootstrap.crt https://certctl.example.com:8443/acme/profile/prof-test/directory`.
- **Traefik logs `urn:ietf:params:acme:error:rateLimited`** → tune
  `CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR` on the certctl
  side, OR reduce Traefik's parallel-cert-acquisition concurrency.
- **`acme: error: 400 :: POST :: ... :: badNonce`** → clock skew or
  multi-replica certctl without sticky sessions; same fix as the
  cert-manager walkthrough.
- **Storage file `acme-certctl.json` shows persistent failures** →
  Traefik retains failed-acquisition state. After fixing the
  underlying cause, delete the storage file and reload:
  `rm /etc/traefik/acme-certctl.json && systemctl reload traefik`.

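
When triaging a noisy log, the first three signatures above can be matched mechanically. The sketch below is a hypothetical classifier (`classify_acme_failure` and its labels are illustrative, not Traefik output); it relies only on the error substrings quoted in the list.

```shell
#!/bin/sh
# Hypothetical helper: map one log line to a failure mode from the list
# above. Prints a short label, or "other" for anything unrecognized.
classify_acme_failure() {
  case $1 in
    *"certificate signed by unknown authority"*) echo "untrusted-ca" ;;
    *"urn:ietf:params:acme:error:rateLimited"*)  echo "rate-limited" ;;
    *badNonce*)                                  echo "bad-nonce" ;;
    *)                                           echo "other" ;;
  esac
}

# Example (assumes Traefik runs as a systemd unit named "traefik"):
#   journalctl -u traefik --no-pager |
#     while IFS= read -r line; do classify_acme_failure "$line"; done |
#     sort | uniq -c
```
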
## Cleanup

```
# Remove the certResolver from any router / IngressRoute consuming it.
sudo systemctl reload traefik
# Delete the persisted ACME storage:
sudo rm /etc/traefik/acme-certctl.json
# Or in K8s: drop the resolver from the static-config ConfigMap.
```

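
To find the routers that still consume the resolver before removing it, a recursive search over the dynamic-config directory is usually enough. This is a sketch under assumptions: `files_using_resolver` is a hypothetical helper, the resolver name is assumed to contain no regex metacharacters, and the directory layout (`/etc/traefik/dynamic`) is the one used in Step 3.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that reference the
# given certResolver. Exits non-zero when nothing matches.
# Usage: files_using_resolver <config-dir> <resolver-name>
files_using_resolver() {
  grep -rlE "certResolver:[[:space:]]*[\"']?$2[\"']?[[:space:]]*\$" "$1" 2>/dev/null
}

# Example:
#   files_using_resolver /etc/traefik/dynamic certctl ||
#     echo "no router references the certctl resolver"
```
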
## See also

- [`docs/acme-server.md`](./acme-server.md): canonical reference.
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md):
  cert-manager equivalent.
- [Traefik upstream ACME docs](https://doc.traefik.io/traefik/https/acme/#caserver):
  verify behavior pinned here against Traefik 3.0+ semantics.