mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:01:30 +00:00
39f065dda4
Doc-only commit closing the ACME-server work series. After this commit,
an outside reviewer (procurement engineer / Venafi diligence engineer /
Infisical-comparison-shopper) can read the docs cold, understand the
ACME server's surface, follow the cert-manager walkthrough, and reach
a deployment decision without escalating to certctl maintainers.

What ships:

- docs/acme-server.md final pass: Auth-mode decision tree (when to
  use trust_authenticated vs challenge), RFC 8555 + RFC 9773
  conformance statement (section-by-section table of implemented
  plus procurement-honest 'not implemented' rows for EAB /
  multi-level wildcards / RFC 8738 / cross-CA proxying), Troubleshooting
  (5 failure modes — badNonce / unknownAuthority / HTTP-01
  connection refused / DNS-01 NXDOMAIN / rejectedIdentifier with
  canonical fix for each), Version pinning + tested clients table
  (cert-manager 1.15.0, lego v4, kind v0.20+, Caddy 2.7.x, Traefik
  3.0+), FAQ (5 entries — why two auth modes,
  vs cert-manager-against-LE, can-I-use-from-outside-K8s, migration
  story, audit-log catalog), See-also cross-link block.
- docs/acme-cert-manager-walkthrough.md: kind → cert-manager →
  certctl → Certificate flow, with YAML blocks byte-equal to
  deploy/test/acme-integration/{clusterissuer-trust-authenticated,
  certificate-test}.yaml to prevent doc/test drift.
- docs/acme-caddy-walkthrough.md: Caddyfile acme_ca + tls.cas
  options (OS trust store + Caddy pki.ca block).
- docs/acme-traefik-walkthrough.md:
  certificatesResolvers.<name>.acme.caServer +
  serversTransport.rootCAs configuration.
- docs/acme-server-threat-model.md: Threat surface map + JWS forgery
  resistance (alg-confusion / HS256 substitution / replayed nonce /
  URL spoofing / multi-sig / kid-vs-jwk / kid round-trip mismatch),
  Nonce store integrity rationale, HTTP-01 SSRF defense-in-depth
  (pre-dial check + per-dial check + per-redirect check + body cap +
  bounded redirects), DNS-01 cache-poisoning posture (default Google
  Public DNS + operator-owns-private-resolver-posture), TLS-ALPN-01
  chain-not-validated rationale (RFC 8737 §3 explicit), Rate-limit
  tuning, Audit trail catalog, Out-of-scope threats list.
- docs/connectors.md: TOC renumbered 3→4 etc. to make room for new
  top-level 'ACME Server (Built-in)' section between Issuer Connector
  and Target Connector — distinguishes the consumer-side ACME
  (existing) from the new server-side ACME via env-var-prefix
  call-out (CERTCTL_ACME_* vs CERTCTL_ACME_SERVER_*).

DoD verification:

- All 5 docs files exist with the structure prescribed by the
  Phase 6 prompt.
- Every CERTCTL_ACME_SERVER_* env var in docs/acme-server.md maps
  to an actual lookup in internal/config/config.go (verified by
  'grep -oE | sort -u | diff' returning empty).
- Every YAML snippet in docs/acme-cert-manager-walkthrough.md is
  byte-equal to the corresponding file in deploy/test/acme-integration/
  (verified with 'diff' against awk-extracted YAML blocks).
- docs/connectors.md has the cross-link subsection with all 4 new
  docs referenced.
- cowork/CLAUDE.md Architecture Decisions has the new ACME-server
  bullet documenting per-profile URL family + per-profile
  acme_auth_mode + Phase 4-5-6 progression.
- cowork/WORKSPACE-CHANGELOG.md has the ACME-Server-6 entry plus
  the ACME-Server rollup spanning Phases 1a-6.
- cowork/infisical-deep-research-results.md Rank 1 marked SHIPPED.
- 'gofmt -l .' clean (no Go changes); 'go vet ./...' clean.

Acquisition-readiness: every one of the 12 acquisition-grade criteria
from cowork/acme-server-endpoint-prompt.md is verified by the test
suite (Phases 1a-5) plus this doc walkthrough (Phase 6). The full
RFC 8555 + RFC 9773 surface is live; the operator can deploy
end-to-end by reading one walkthrough doc and one env-var table.

Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-6 (docs)'
+ ACME-Server rollup of all 6 phases.

255 lines
9.9 KiB
Markdown
# cert-manager Integration Walkthrough

End-to-end recipe for issuing certs from a certctl-server deployment
through cert-manager 1.15+. Target audience: a Kubernetes operator who
has never deployed certctl before and wants a working
`Certificate` → `Secret` flow on their cluster in under 30 minutes.

The Phase 5 integration test (`make acme-cert-manager-test`) automates
exactly the recipe below. The YAML snippets in this doc are byte-equal
to the files under `deploy/test/acme-integration/`, so re-running the
test from a fresh clone produces the same results documented here.

## Prereqs

- A Kubernetes cluster (kind / k3d / EKS / GKE / AKS / on-prem). For
  local trial, `kind v0.20+` works exactly the way the Phase 5 test
  uses it. The kind config lives at
  [`deploy/test/acme-integration/kind-config.yaml`](../deploy/test/acme-integration/kind-config.yaml).
- `kubectl` v1.27+, `helm` v3.13+.
- `cert-manager` v1.15.0 installed in the `cert-manager` namespace.
  If absent, run:

  ```
  bash deploy/test/acme-integration/cert-manager-install.sh
  ```

  which is the same idempotent installer the integration test uses.
- A certctl Helm chart published to a registry your cluster can pull
  from. The Phase 5 test uses an `image.tag=test` placeholder; production
  deployments use the actual image tag for your release line.

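The version floors above can be checked before starting. The following is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils does); `version_ge` is a hypothetical helper name and not part of certctl.

```bash
#!/usr/bin/env bash
# version_ge HAVE WANT: succeeds when version HAVE is >= version WANT.
# A leading "v" (as in "v1.15.0") is ignored on both sides.
version_ge() {
  local have="${1#v}" want="${2#v}"
  # sort -V orders version strings component by component; if the smaller
  # of the two is the required floor, the installed version satisfies it.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```

It could be fed from the tools themselves, for example `version_ge "$(helm version --template '{{.Version}}')" v3.13.0 || echo "helm is older than v3.13"`.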
## Step 1 — Deploy certctl-server

```
helm install certctl-test deploy/helm/certctl/ \
  --set acmeServer.enabled=true \
  --set acmeServer.defaultProfileId=prof-test \
  --set image.tag=test
kubectl wait --for=condition=Available --timeout=3m deployment/certctl-test
```

`acmeServer.enabled=true` sets the `CERTCTL_ACME_SERVER_ENABLED`
env var, which gates the ACME route registration.
`acmeServer.defaultProfileId` sets `CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID`
so the `/acme/*` shorthand path mirrors the per-profile path family
(`/acme/profile/<profile-id>/*`).

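The per-profile directory URL that ACME clients point at follows the pattern `https://<service>.<namespace>.svc.cluster.local:8443/acme/profile/<profile-id>/directory`, the same one the ClusterIssuer in Step 4 uses. Below is a small sketch that assembles it from the release name, namespace, and profile ID. `acme_directory_url` is an illustrative helper, and it assumes the Service is named after the Helm release and listens on port 8443 as in this walkthrough.

```bash
#!/usr/bin/env bash
# acme_directory_url SERVICE NAMESPACE PROFILE_ID
# Prints the in-cluster ACME directory URL for one certctl profile.
acme_directory_url() {
  local service="$1" namespace="$2" profile="$3"
  printf 'https://%s.%s.svc.cluster.local:8443/acme/profile/%s/directory\n' \
    "$service" "$namespace" "$profile"
}

acme_directory_url certctl-test default prof-test
# https://certctl-test.default.svc.cluster.local:8443/acme/profile/prof-test/directory
```

Fetching that URL from a pod inside the cluster (with the bootstrap CA trusted) is a quick way to confirm the ACME routes are registered before involving cert-manager.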
## Step 2 — Create the certctl profile

The ACME server requires a `certificate_profiles` row to bind issuance
to. Create one via the certctl API or GUI; for the simplest case set
`acme_auth_mode='trust_authenticated'`. The profile `id` should match
the `acmeServer.defaultProfileId` passed in Step 1 (`prof-test` here):

```
curl -X POST https://certctl-test.default.svc.cluster.local:8443/api/profiles \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $CERTCTL_API_KEY" \
  -d '{
    "id": "prof-test",
    "name": "ACME test profile",
    "issuer_id": "iss-internal-ca",
    "max_ttl_seconds": 7776000,
    "acme_auth_mode": "trust_authenticated"
  }'
```

The service DNS name above resolves only from inside the cluster. With
the self-signed bootstrap cert, `curl` also has to trust the bootstrap
CA (for example via `--cacert deploy/test/certs/ca.crt`; see Step 3).

Auth-mode tradeoffs are covered in
[`docs/acme-server.md` § Auth-mode decision tree](./acme-server.md#auth-mode-decision-tree).
For first-time deployments, `trust_authenticated` is the right default.

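The `max_ttl_seconds` value is easier to audit when it is derived rather than typed by hand: 7776000 seconds is 90 days. A minimal sketch of that arithmetic follows; `days_to_seconds` is a hypothetical helper, not a certctl command.

```bash
#!/usr/bin/env bash
# days_to_seconds DAYS: converts a whole number of days to seconds.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 90   # prints 7776000, the max_ttl_seconds used above
```

The Certificate in Step 5 requests `duration: 720h` (30 days, or 2592000 seconds), which fits under this 90-day ceiling, assuming the profile cap is applied to the requested lifetime.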
## Step 3 — Capture the certctl bootstrap CA

cert-manager validates the certctl-server's TLS chain before sending
any account / order / finalize JWS. With certctl's self-signed
bootstrap cert (the demo default at `deploy/test/certs/server.crt`),
cert-manager rejects the directory URL with
`x509: certificate signed by unknown authority` unless you supply the
bootstrap CA.

```
cat deploy/test/certs/ca.crt | base64 -w0
```

Capture the output for Step 4. The `-w0` flag (GNU coreutils) disables
line wrapping; where `base64` does not accept it, piping the output
through `tr -d '\n'` gives the same single-line result.

This is **the** single biggest first-time-deploy footgun on the
cert-manager integration path. The reference recipe lives in
[`docs/acme-server.md` § TLS trust bootstrap](./acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).

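A stale or mangled `caBundle` is hard to spot by eye, so it can help to encode and sanity-check the value in one place. The sketch below is illustrative only: `encode_ca_bundle` and `looks_like_pem_bundle` are hypothetical helper names, and the decode step assumes a `base64` that accepts `-d` (GNU coreutils does; some BSD/macOS versions spell it `-D`).

```bash
#!/usr/bin/env bash
# encode_ca_bundle FILE: single-line base64 of FILE without relying on -w0.
encode_ca_bundle() {
  base64 < "$1" | tr -d '\n'
}

# looks_like_pem_bundle STRING: succeeds when STRING base64-decodes to text
# whose first line is a PEM certificate header.
looks_like_pem_bundle() {
  local first_line
  first_line="$(printf '%s' "$1" | base64 -d 2>/dev/null | head -n1)"
  [ "$first_line" = "-----BEGIN CERTIFICATE-----" ]
}
```

For example, `looks_like_pem_bundle "$(encode_ca_bundle deploy/test/certs/ca.crt)"` should succeed. This only checks the PEM header: it does not prove the bundle is the CA that signed the server certificate.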
## Step 4 — Apply the ClusterIssuer

```yaml
# Phase 5 — sample ClusterIssuer for the certctl trust_authenticated
# auth mode (RFC 8555 §6 + certctl auth_mode=trust_authenticated, where
# the JWS-authenticated ACME account is trusted to issue any identifier
# the profile policy permits — no per-identifier ownership challenges).
#
# Use this as the starting template for any internal-PKI rollout.
# Replace the caBundle placeholder with the base64-encoded PEM of the
# certctl-server's self-signed bootstrap root, then `kubectl apply`.
#
# Generate the caBundle via:
# cat deploy/test/certs/ca.crt | base64 -w0
# (See certctl/docs/acme-server.md "TLS trust bootstrap" section for the
# end-to-end walkthrough — this is the single biggest first-time-deploy
# footgun on cert-manager, captured as audit fix #9.)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: certctl-test-trust
spec:
  acme:
    email: test@example.com
    # Replace 'certctl-test' with your release name + adjust the
    # profile path segment. Default profile path:
    # https://<service>.<namespace>.svc.cluster.local:8443/acme/profile/<profile-id>/directory
    server: https://certctl-test.default.svc.cluster.local:8443/acme/profile/prof-test/directory
    # caBundle: Audit fix #9. cert-manager validates the ACME server's
    # TLS chain before submitting any account/order/finalize. With a
    # self-signed bootstrap root, the ClusterIssuer MUST carry the root
    # explicitly via this field.
    caBundle: |
      LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCi4uLgotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    privateKeySecretRef:
      name: certctl-test-trust-account-key
    solvers:
      # In trust_authenticated mode the solver is unused at the
      # validation step but cert-manager still requires at least one
      # solver in the spec. http01-via-ingress-nginx is the cheapest
      # placeholder shape that round-trips correctly through cert-
      # manager's validation webhooks.
      - http01:
          ingress:
            class: nginx
```

This block is byte-equal to
[`deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml`](../deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml).
Replace the `caBundle` placeholder in that file with the base64 string
from Step 3, then apply it:

```
kubectl apply -f deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml
kubectl wait --for=condition=Ready --timeout=2m clusterissuer/certctl-test-trust
```

The solver block is a placeholder under `trust_authenticated` mode:
cert-manager 1.15 still requires at least one solver in the spec, but
certctl auto-resolves authorizations without a solver round-trip. The
http01-ingress-nginx shape validates against cert-manager's webhook
without needing an actual ingress controller deployed.

For `challenge` mode profiles, swap to
[`deploy/test/acme-integration/clusterissuer-challenge.yaml`](../deploy/test/acme-integration/clusterissuer-challenge.yaml).
The shape is the same, but the solver is now load-bearing and you need
ingress-nginx (or your chosen ingress class) actually deployed for
HTTP-01 to work.

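Because the checked-in YAML carries a placeholder `caBundle`, one way to substitute the real value without editing the tracked file is to render a copy on the fly. The sketch below assumes the placeholder sits on the single line directly after `caBundle: |`, as in the ClusterIssuer above; `render_ca_bundle` is a hypothetical helper, not something shipped with certctl.

```bash
#!/usr/bin/env bash
# render_ca_bundle TEMPLATE BUNDLE
# Prints TEMPLATE with the line following "caBundle: |" replaced by BUNDLE,
# keeping that line's original indentation.
render_ca_bundle() {
  local template="$1" bundle="$2"
  awk -v bundle="$bundle" '
    # Swap everything after the indentation on the line that follows the key.
    replace_next { sub(/[^ \t].*$/, bundle); replace_next = 0 }
    # Match only the real key, not the "# caBundle:" comment lines.
    /^[ \t]*caBundle:[ \t]*\|[ \t]*$/ { replace_next = 1 }
    { print }
  ' "$template"
}
```

Calling it with the template path and `"$(cat deploy/test/certs/ca.crt | base64 -w0)"`, then piping the result to `kubectl apply -f -`, leaves the file on disk byte-equal to the test fixture.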
## Step 5 — Apply the Certificate

```yaml
# Phase 5 — Certificate resource the integration test applies and
# waits for. The certctl-test-trust ClusterIssuer (trust_authenticated
# mode) issues the cert without any solver round-trip; the resulting
# Secret 'test-com-tls' is asserted to carry tls.crt + tls.key.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-com
  namespace: default
spec:
  secretName: test-com-tls
  commonName: test.example.com
  dnsNames:
    - test.example.com
    - www.test.example.com
  issuerRef:
    name: certctl-test-trust
    kind: ClusterIssuer
  duration: 720h # 30d
  renewBefore: 240h # 10d
```

This block is byte-equal to
[`deploy/test/acme-integration/certificate-test.yaml`](../deploy/test/acme-integration/certificate-test.yaml).

```
kubectl apply -f deploy/test/acme-integration/certificate-test.yaml
kubectl wait --for=condition=Ready --timeout=3m certificate/test-com
```

cert-manager creates an `Order`, the ACME flow runs against certctl,
and the resulting Secret is populated.

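cert-manager schedules renewal at the certificate's `notAfter` minus `renewBefore`. With `duration: 720h` and `renewBefore: 240h`, that works out to 480 hours (20 days) after issuance, assuming the issued certificate's lifetime matches the requested duration. A small worked sketch follows; `renewal_offset_hours` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# renewal_offset_hours DURATION_H RENEW_BEFORE_H
# Prints how many hours after issuance renewal would begin.
renewal_offset_hours() {
  echo $(( $1 - $2 ))
}

hours=$(renewal_offset_hours 720 240)
echo "renew after ${hours}h ($(( hours / 24 ))d)"   # renew after 480h (20d)
```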
## Step 6 — Verify

```
kubectl get certificate test-com -o wide
# NAME       READY   SECRET         ISSUER               STATUS                                          AGE
# test-com   True    test-com-tls   certctl-test-trust   Certificate is up to date and has not expired   42s

kubectl get secret test-com-tls -o yaml | yq '.data."tls.crt"' | base64 -d | openssl x509 -noout -subject -issuer -dates
# subject= CN=test.example.com
# issuer= CN=certctl test internal CA
# notBefore=... notAfter=...
```

Both the cert-manager `Certificate` resource and the underlying Secret
are populated. The actor on the certctl side is `acme:<account-id>`,
which you can correlate via the `audit_events` table:

```
psql -c "SELECT created_at, action, resource_type, resource_id
         FROM audit_events
         WHERE actor LIKE 'acme:%'
         ORDER BY created_at DESC LIMIT 10;"
```

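The `-subject` output shows only the common name, while both `dnsNames` from Step 5 should also appear as SANs. The sketch below checks the text printed by `openssl x509 -noout -ext subjectAltName` (the `-ext` option is available in OpenSSL 1.1.1 and later); `has_san` is an illustrative helper, not part of certctl.

```bash
#!/usr/bin/env bash
# has_san NAME: reads `openssl x509 -noout -ext subjectAltName` output on
# stdin and succeeds when NAME is listed as a DNS SAN (exact match).
has_san() {
  tr ',' '\n' | sed 's/^[[:space:]]*//' | grep -qxF "DNS:$1"
}
```

A possible invocation is `kubectl get secret test-com-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -ext subjectAltName | has_san www.test.example.com`, which also avoids the `yq` dependency.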
## Common failure modes

These are operator-side failures; the full troubleshooting reference is in
[`docs/acme-server.md` § Troubleshooting](./acme-server.md#troubleshooting).

- `400 Bad Request: badNonce` → clock skew between certctl-server and
  cert-manager, or a multi-replica certctl fleet without sticky
  sessions.
- `x509: certificate signed by unknown authority` → missing or stale
  `caBundle`. Re-run Step 3 and paste the fresh value.
- `connection refused` from the HTTP-01 validator → ingress controller
  not deployed, or your network blocks port 80 inbound to the solver
  Ingress.
- `Ready=False` with `rejectedIdentifier` → the CSR has a SAN your
  profile policy doesn't permit. Inspect the `subproblems` array of the
  RFC 7807 problem document to see which identifier was rejected.

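The mapping above is regular enough to script for first-pass triage. The sketch below simply pattern-matches an error message against the four cases listed; `diagnose_acme_error` is a hypothetical helper and the hint strings are paraphrases of this section, not certctl output.

```bash
#!/usr/bin/env bash
# diagnose_acme_error MESSAGE: prints a hint for known failure signatures.
# Returns non-zero when the message matches none of them.
diagnose_acme_error() {
  case "$1" in
    *badNonce*)
      echo "check clock skew; multi-replica certctl needs sticky sessions" ;;
    *"certificate signed by unknown authority"*)
      echo "caBundle missing or stale; repeat Step 3" ;;
    *"connection refused"*)
      echo "ingress controller missing or port 80 blocked" ;;
    *rejectedIdentifier*)
      echo "SAN not permitted by profile policy; inspect subproblems" ;;
    *)
      echo "unrecognized; see docs/acme-server.md Troubleshooting"
      return 1 ;;
  esac
}
```

The message text would typically come from `kubectl describe` on the Certificate, CertificateRequest, or Order that is stuck.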
## Cleanup

```
kubectl delete -f deploy/test/acme-integration/certificate-test.yaml
kubectl delete -f deploy/test/acme-integration/clusterissuer-trust-authenticated.yaml
helm uninstall certctl-test
# Optional: delete the certctl profile via API.
```

## See also

- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
- [`docs/acme-server-threat-model.md`](./acme-server-threat-model.md) —
  security posture.
- [`docs/acme-caddy-walkthrough.md`](./acme-caddy-walkthrough.md) —
  Caddy-side recipe.
- [`docs/acme-traefik-walkthrough.md`](./acme-traefik-walkthrough.md) —
  Traefik-side recipe.
- [`deploy/test/acme-integration/`](../deploy/test/acme-integration/) —
  Phase 5 integration test (the same recipe, automated).