certctl ACME Server (Built-in)
Last reviewed: 2026-05-05
certctl ships an RFC 8555 + RFC 9773 ARI ACME server endpoint at
/acme/profile/<profile-id>/*. Any RFC 8555 client (cert-manager 1.15+,
Caddy, Traefik, win-acme, certbot, Posh-ACME) can integrate with certctl
as an ACME issuer with no certctl-side modification — closing the
"deploy a certctl agent on every K8s node" friction that costs deals to
external PKI vendors today.
Phase status (2026-05-03): Phase 6, the full operator-facing reference. The functional surface is complete (Phases 1a-5); this doc is the canonical procurement-readability reference. New: client walkthrough docs for cert-manager, Caddy, and Traefik; a dedicated threat model; a section-by-section RFC 8555 + RFC 9773 conformance statement; a 5-failure-mode troubleshooting playbook; a tested-clients version pinning table. Track shipped phases via
git log --grep='acme-server:'.
Configuration
All ACME-server config uses the CERTCTL_ACME_SERVER_* env-var prefix
(distinct from CERTCTL_ACME_* which configures the consumer-side
issuer connector). The struct definition lives in
internal/config/config.go::ACMEServerConfig.
| Env var | Default | Phase | Description |
|---|---|---|---|
| CERTCTL_ACME_SERVER_ENABLED | false | 1a | Master enable flag. Phase 1a's handler is constructed unconditionally so the registry shape stays stable; routes are registered in internal/api/router/router.go::RegisterHandlers regardless. Operators flip this on after configuring per-profile auth_mode. |
| CERTCTL_ACME_SERVER_DEFAULT_AUTH_MODE | trust_authenticated | 1a | Default value for certificate_profiles.acme_auth_mode on newly-created profiles. Existing profiles retain their stored value. Per-profile column is the source of truth at request time. |
| CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID | "" | 1a | When set, /acme/* shorthand mirrors /acme/profile/<DefaultProfileID>/* for single-profile deployments. When empty, requests to the shorthand return RFC 7807 + RFC 8555 §6.7 userActionRequired. |
| CERTCTL_ACME_SERVER_NONCE_TTL | 5m | 1a | How long an issued ACME nonce remains valid before the JWS verifier (Phase 1b) returns urn:ietf:params:acme:error:badNonce per RFC 8555 §6.5.1. Tune up if cert-manager + certctl clocks frequently skew. |
| CERTCTL_ACME_SERVER_TOS_URL | "" | 1a | Optional meta.termsOfService URL in the directory document. |
| CERTCTL_ACME_SERVER_WEBSITE | "" | 1a | Optional meta.website URL in the directory document. |
| CERTCTL_ACME_SERVER_CAA_IDENTITIES | (empty) | 1a | Comma-separated meta.caaIdentities list. |
| CERTCTL_ACME_SERVER_EAB_REQUIRED | false | 1a | meta.externalAccountRequired advertisement. EAB enforcement is a follow-up; Phase 1a only advertises. |
| CERTCTL_ACME_SERVER_ORDER_TTL | 24h | 2 | Reserved field, parsed in Phase 1a so operators can set it ahead of Phase 2's order endpoints. |
| CERTCTL_ACME_SERVER_AUTHZ_TTL | 24h | 2 | Reserved. |
| CERTCTL_ACME_SERVER_HTTP01_CONCURRENCY | 10 | 3 | Reserved. |
| CERTCTL_ACME_SERVER_DNS01_RESOLVER | 8.8.8.8:53 | 3 | Reserved. |
| CERTCTL_ACME_SERVER_DNS01_CONCURRENCY | 10 | 3 | Reserved. |
| CERTCTL_ACME_SERVER_TLSALPN01_CONCURRENCY | 10 | 3 | Reserved. |
| CERTCTL_ACME_SERVER_ARI_ENABLED | true | 4 | Toggles the RFC 9773 ARI surface: both the renewalInfo URL in the directory document and the GET /renewal-info/<cert-id> handler. Set to false to drop ARI from the directory; ACME clients fall back to static renewal scheduling. |
| CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL | 6h | 4 | Server-policy Retry-After value the ARI handler emits on a 200 response. RFC 9773 §4.2 leaves this server-policy. Tighten to 1h for short-lived certs; loosen to 24h for standard 90-day certs. |
| CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR | 100 | 5 | Per-account orders/hour cap. 0 disables. Hits return RFC 7807 + RFC 8555 §6.7 urn:ietf:params:acme:error:rateLimited with Retry-After. In-memory token-bucket; restart wipes the counter (eventual-consistency caps are acceptable). |
| CERTCTL_ACME_SERVER_RATE_LIMIT_CONCURRENT_ORDERS | 5 | 5 | Per-account cap on simultaneously-active orders (status in pending/ready/processing). 0 disables. Same RFC 7807 + RFC 8555 §6.7 problem shape as the per-hour cap. |
| CERTCTL_ACME_SERVER_RATE_LIMIT_KEY_CHANGE_PER_HOUR | 5 | 5 | Per-account key-rollover cap. 0 disables. Default 5/hour: rollovers should be rare; a flood is an attack signal. |
| CERTCTL_ACME_SERVER_RATE_LIMIT_CHALLENGE_RESPONDS_PER_HOUR | 60 | 5 | Per-challenge-id respond cap. 0 disables. Defends against retry storms from a misbehaving client. Keyed by challenge-id (not account-id) so a flood against one challenge doesn't drain the account's whole budget. |
| CERTCTL_ACME_SERVER_GC_INTERVAL | 1m | 5 | Tick interval for the ACME GC scheduler loop. On each tick: (1) DELETE used / expired nonces; (2) UPDATE pending authzs whose expires_at < NOW() to expired; (3) UPDATE pending/ready/processing orders whose expires_at < NOW() to invalid. Each sweep is a single SQL statement; the loop is idempotent + bounded by a 1m per-sweep timeout. 0 disables the loop. |
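To make the defaults-plus-override behaviour concrete, here is a minimal Go sketch of reading a few of these variables. It is not the code in internal/config/config.go: the `serverConfig` struct, the `loadServerConfig` function, and the choice of four fields are assumptions made for illustration. Only the variable names and default values come from the table above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// serverConfig is a hypothetical subset of the real ACMEServerConfig struct.
type serverConfig struct {
	Enabled         bool
	NonceTTL        time.Duration
	ARIPollInterval time.Duration
	OrdersPerHour   int
}

const prefix = "CERTCTL_ACME_SERVER_"

// loadServerConfig starts from the documented defaults and overrides each
// field when lookup (os.LookupEnv in production) reports a value.
func loadServerConfig(lookup func(string) (string, bool)) (serverConfig, error) {
	cfg := serverConfig{NonceTTL: 5 * time.Minute, ARIPollInterval: 6 * time.Hour, OrdersPerHour: 100}
	var err error
	if v, ok := lookup(prefix + "ENABLED"); ok {
		if cfg.Enabled, err = strconv.ParseBool(v); err != nil {
			return cfg, fmt.Errorf("ENABLED: %w", err)
		}
	}
	if v, ok := lookup(prefix + "NONCE_TTL"); ok {
		if cfg.NonceTTL, err = time.ParseDuration(v); err != nil {
			return cfg, fmt.Errorf("NONCE_TTL: %w", err)
		}
	}
	if v, ok := lookup(prefix + "ARI_POLL_INTERVAL"); ok {
		if cfg.ARIPollInterval, err = time.ParseDuration(v); err != nil {
			return cfg, fmt.Errorf("ARI_POLL_INTERVAL: %w", err)
		}
	}
	if v, ok := lookup(prefix + "RATE_LIMIT_ORDERS_PER_HOUR"); ok {
		if cfg.OrdersPerHour, err = strconv.Atoi(v); err != nil {
			return cfg, fmt.Errorf("RATE_LIMIT_ORDERS_PER_HOUR: %w", err)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := loadServerConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Duration values use Go's `time.ParseDuration` syntax in this sketch (5m, 6h, 24h), which matches the shape of the defaults listed in the table.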
Per-profile auth mode
Two modes per certificate_profiles.acme_auth_mode:
- trust_authenticated (default for internal PKI). The JWS-authenticated ACME account is trusted to issue certs for any identifier the profile policy allows; there is no per-identifier ownership proof. The most common certctl use case.
- challenge. Full HTTP-01 + DNS-01 + TLS-ALPN-01 validation per RFC 8555 §8. Required when certctl is exposing public-trust-style PKI.
A single certctl-server can serve both modes simultaneously — the mode is read from the bound profile's column at request time, not cached at server start. Operators can flip a profile's mode via SQL and the next order picks up the new mode without restart.
The CERTCTL_ACME_SERVER_DEFAULT_AUTH_MODE env var sets the default
value for newly-created profiles (e.g. via the certctl API). Existing
profile rows retain whatever value they were created with.
TLS trust bootstrap (read this before configuring cert-manager)
When certctl-server uses a self-signed TLS bootstrap cert
(deploy/test/certs/server.crt is the demo default; see
docs/tls.md), cert-manager 1.15+ will refuse to talk to
the directory URL unless the certctl root is trusted. The fix lives in
ClusterIssuer.spec.acme.caBundle:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: certctl-test
    spec:
      acme:
        server: https://certctl.example.com:8443/acme/profile/prof-corp/directory
        email: ops@example.com
        # caBundle is the base64-encoded PEM of certctl's self-signed root.
        # The comment sits outside the block scalar so it is not parsed as data.
        caBundle: |
          LS0tLS1CRUdJTi...
        privateKeySecretRef:
          name: certctl-test-account-key
        solvers:
          - http01:
              ingress:
                class: nginx

The caBundle value is the base64-encoded PEM of the root that signed
your certctl-server's TLS certificate. Extract it from your operator
bootstrap (e.g. cat deploy/test/certs/ca.crt | base64 -w0).
This is the single biggest first-time-deploy footgun on the cert-manager
integration path. The full cert-manager walkthrough shipped in Phase 6;
the caBundle requirement is also flagged here, as it has been since
Phase 1a's docs, because operators hit it the moment they try to point
a real ACME client at certctl.
Auth-mode decision tree
Use trust_authenticated when:
- The certctl deployment serves internal-only PKI (intranet certs, service-mesh certs, IoT bootstrap). Identifiers in your CSRs are controlled by your infrastructure, not by the public Internet.
- You don't have HTTP/DNS reachability from certctl-server back to the ACME client's solver (e.g., the client lives in an isolated network segment certctl-server can't reach).
- You want the simplest cert-manager integration: cert-manager submits a CSR, certctl issues; no out-of-band ownership proof.
- You're issuing under your own root CA whose trust is operator-managed (NOT WebPKI). Public CAs cannot use this mode — RFC 8555 §8 ownership proof is non-negotiable for public-trust roots.
Use challenge when:
- The deployment is public-trust-style PKI — even if your root is privately operated, you want CA/Browser Forum-style ownership-proof semantics so a stolen account key can't be used to issue for arbitrary identifiers.
- You have HTTP-01 / DNS-01 / TLS-ALPN-01 reachability from the certctl-server to the ACME client's solver. (HTTP-01 needs port 80 ingress to the client; DNS-01 needs DNS recursion; TLS-ALPN-01 needs port 443 ingress.)
- You want defense-in-depth: an account-key compromise costs the attacker nothing without also compromising the solver-side infrastructure.
A single certctl-server can run both modes simultaneously — the auth
mode is a per-profile column on certificate_profiles.acme_auth_mode,
read at request time. Operators flip a profile's mode via SQL or the
profile API, and the next order picks up the new mode without restart.
Endpoints
Routes registered in internal/api/router/router.go::RegisterHandlers:
| Method | Path | RFC ref | Auth | Description |
|---|---|---|---|---|
| GET | /acme/profile/{id}/directory | RFC 8555 §7.1.1 | unauth | Per-profile directory document. |
| HEAD | /acme/profile/{id}/new-nonce | RFC 8555 §7.2 | unauth | Returns 200 + Replay-Nonce header. |
| GET | /acme/profile/{id}/new-nonce | RFC 8555 §7.2 | unauth | Returns 204 + Replay-Nonce header. |
| POST | /acme/profile/{id}/new-account | RFC 8555 §7.3 | JWS jwk | Register a new account; idempotent re-registration of an existing JWK returns the existing row. |
| POST | /acme/profile/{id}/account/{acc_id} | RFC 8555 §7.3.2 + §7.3.6 | JWS kid | Update contact list, deactivate, or POST-as-GET (RFC 8555 §6.3) to fetch the account. |
| POST | /acme/profile/{id}/new-order | RFC 8555 §7.4 | JWS kid | Submit an order; identifier validation runs before order creation. |
| POST | /acme/profile/{id}/order/{ord_id} | RFC 8555 §7.4 | JWS kid | POST-as-GET fetch of an order's current state. |
| POST | /acme/profile/{id}/order/{ord_id}/finalize | RFC 8555 §7.4 | JWS kid | Submit the CSR + finalize. Issues + persists managed cert row + version. |
| POST | /acme/profile/{id}/authz/{authz_id} | RFC 8555 §7.5 | JWS kid | POST-as-GET fetch of an authorization. |
| POST | /acme/profile/{id}/challenge/{chall_id} | RFC 8555 §7.5.1 | JWS kid | Submit a challenge for validation. Dispatches to a bounded-concurrency worker pool; clients poll authz for the eventual result. |
| POST | /acme/profile/{id}/cert/{cert_id} | RFC 8555 §7.4.2 | JWS kid | POST-as-GET cert chain download (PEM). |
| POST | /acme/profile/{id}/key-change | RFC 8555 §7.3.5 | JWS kid (outer) + jwk (inner) | Doubly-signed account-key rollover. |
| POST | /acme/profile/{id}/revoke-cert | RFC 8555 §7.6 | JWS kid OR jwk | Revoke a cert via the issuing account's key OR the cert's own private key. Routes through the certctl revocation pipeline. |
| GET | /acme/profile/{id}/renewal-info/{cert_id} | RFC 9773 | unauth | Fetch the suggested renewal window for a cert (cert-id is base64url(AKI).base64url(serial) per RFC 9773 §4.1). Response carries Retry-After. |
| GET | /acme/directory | RFC 8555 §7.1.1 | unauth | Shorthand path; mirrors per-profile when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID is set. |
| HEAD | /acme/new-nonce | RFC 8555 §7.2 | unauth | Shorthand. |
| GET | /acme/new-nonce | RFC 8555 §7.2 | unauth | Shorthand. |
| POST | /acme/new-account | RFC 8555 §7.3 | JWS jwk | Shorthand. |
| POST | /acme/account/{acc_id} | RFC 8555 §7.3.2 + §7.3.6 | JWS kid | Shorthand. |
| POST | /acme/new-order | RFC 8555 §7.4 | JWS kid | Shorthand. |
| POST | /acme/order/{ord_id} | RFC 8555 §7.4 | JWS kid | Shorthand. |
| POST | /acme/order/{ord_id}/finalize | RFC 8555 §7.4 | JWS kid | Shorthand. |
| POST | /acme/authz/{authz_id} | RFC 8555 §7.5 | JWS kid | Shorthand. |
| POST | /acme/cert/{cert_id} | RFC 8555 §7.4.2 | JWS kid | Shorthand. |
| POST | /acme/key-change | RFC 8555 §7.3.5 | JWS kid (outer) + jwk (inner) | Shorthand. |
| POST | /acme/revoke-cert | RFC 8555 §7.6 | JWS kid OR jwk | Shorthand. |
| GET | /acme/renewal-info/{cert_id} | RFC 9773 | unauth | Shorthand. |
After Phase 4, the full RFC 8555 + RFC 9773 surface is live. RFC 8739 (short-lived certs) and EAB enforcement remain follow-up work; cert-manager + boulder-tested clients work today against the surface above.
RFC 8555 + RFC 9773 conformance statement
Honest disclosure of what's implemented, where, and what's not. Procurement engineers running gap analyses against cert-manager + Let's Encrypt's conformance posture should read this section before anything else.
Implemented
| Section | Surface | Phase | First commit |
|---|---|---|---|
| RFC 8555 §6.2 | JWS auth + RS256/ES256/EdDSA allow-list | 1b | 27bd660 |
| RFC 8555 §6.3 | POST-as-GET | 1b | 27bd660 |
| RFC 8555 §6.4 | URL-header binding to request URL | 1b | 27bd660 |
| RFC 8555 §6.5 | Replay-Nonce + DB-backed nonce store | 1a | e146b00 |
| RFC 8555 §6.7 | RFC 7807 problem documents | 1a | e146b00 |
| RFC 8555 §7.1 | Directory | 1a | e146b00 |
| RFC 8555 §7.2 | new-nonce HEAD + GET | 1a | e146b00 |
| RFC 8555 §7.3 | new-account + idempotent re-registration | 1b | 27bd660 |
| RFC 8555 §7.3.2 + §7.3.6 | account update + deactivation | 1b | 27bd660 |
| RFC 8555 §7.3.5 | doubly-signed key rollover | 4 | 0299e4a |
| RFC 8555 §7.4 | new-order + finalize + cert download | 2 | 4ee486e |
| RFC 8555 §7.5 | authz POST-as-GET | 2 | 4ee486e |
| RFC 8555 §7.5.1 | challenge response | 3 | 7e22204 |
| RFC 8555 §7.6 | revoke-cert (kid + jwk paths) | 4 | 0299e4a |
| RFC 8555 §8.3 | HTTP-01 challenge validator | 3 | 7e22204 |
| RFC 8555 §8.4 | DNS-01 challenge validator | 3 | 7e22204 |
| RFC 8737 | TLS-ALPN-01 challenge validator | 3 | 7e22204 |
| RFC 9773 | ACME Renewal Information (ARI) | 4 | 0299e4a |
Not implemented (procurement-honest)
| Spec area | Status | Notes |
|---|---|---|
| RFC 8555 §7.3.4: External Account Binding (EAB) | Not implemented. | Advertised in directory meta.externalAccountRequired but enforcement is a follow-up. Operators relying on EAB for account-creation gating should layer an upstream WAF. |
| RFC 8555 §8.4 + §7.4: Wildcard with *. prefix > 1 level | Not implemented. | Single-level wildcards (e.g. *.example.com) work end-to-end. Multi-level wildcards (*.*.example.com) are RFC-spec-ambiguous and rejected at the identifier-validation layer. |
| RFC 8739: Short-lived certs | Not implemented. | Operators wanting <7-day validity tune the bound issuer's TTL directly via CertificateProfile.MaxTTLSeconds; the ACME wire shape doesn't expose a separate notion. |
| Cross-CA proxying | Not implemented. | Each profile binds to one issuer. Multi-CA federation (one ACME account → multi-CA selection per identifier) is roadmap. |
| RFC 8555 §6.7: accountDoesNotExist problem with hint URL | Partial. | Sentinel returns accountDoesNotExist; the optional hint URL embedding the kid is not emitted. cert-manager doesn't consume it. |
If a procurement-side gap analysis turns up something not in either table above, the answer is "we don't know yet" — operator-side issues welcome.
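The wildcard rule in the table above (single-level accepted, multi-level rejected) can be sketched in a few lines of Go. This is an illustration only, not the logic in internal/api/acme/identifier.go; `checkWildcard` and `errBadWildcard` are hypothetical names, and the real validator runs further syntactic and profile-policy checks that are omitted here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errBadWildcard = errors.New("wildcard is only allowed as the single leftmost label")

// checkWildcard accepts "example.com" and "*.example.com" and rejects any
// name that still contains "*" after one leading "*." has been removed.
func checkWildcard(name string) error {
	rest := strings.TrimPrefix(name, "*.")
	if rest == "" || strings.Contains(rest, "*") {
		return errBadWildcard
	}
	return nil
}

func main() {
	for _, n := range []string{"example.com", "*.example.com", "*.*.example.com"} {
		fmt.Printf("%-18s -> %v\n", n, checkWildcard(n))
	}
}
```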
Finalize routing through CertificateService.Create (Phase 2 architecture)
The finalize path mirrors how every other certctl issuance surface (EST, SCEP, agent, REST API) routes through the canonical pipeline:
1. JWS-verify the request (internal/api/acme/jws.go).
2. Validate the CSR's DNS-name set equals the order's identifier set exactly (case-folded). Mismatches return RFC 8555 urn:ietf:params:acme:error:badCSR.
3. Update the order row to status=processing (s.tx.WithinTx + auditService.RecordEventWithTx, atomic with the audit row).
4. Issue the cert via the bound profile's IssuerConnector adapter (same IssueCertificate(ctx, commonName, sans, csrPEM, ekus, maxTTLSeconds, mustStaple) call EST/SCEP/agent take).
5. Insert the managed_certificates row via service.CertificateService.Create(ctx, *ManagedCertificate, actor). Source is stamped domain.CertificateSourceACME so operators can bulk-revoke ACME-issued certs by filtering on Source=ACME.
6. Insert the certificate_versions row + transition the order to status=valid with certificate_id set (one final WithinTx covering both writes + the audit row).
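The CSR-versus-order check in the finalize path is a case-folded set comparison. A minimal Go sketch under stated assumptions: `sameIdentifierSet` and `toSet` are hypothetical helper names, and the inputs are assumed to be the CSR's DNS SANs and the order's dns identifiers already extracted as strings.

```go
package main

import (
	"fmt"
	"strings"
)

// toSet lower-cases each name and collapses duplicates.
func toSet(names []string) map[string]struct{} {
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[strings.ToLower(n)] = struct{}{}
	}
	return set
}

// sameIdentifierSet reports whether the CSR names and the order identifiers
// are exactly the same set, ignoring case and ordering.
func sameIdentifierSet(csrDNSNames, orderIdentifiers []string) bool {
	a, b := toSet(csrDNSNames), toSet(orderIdentifiers)
	if len(a) != len(b) {
		return false
	}
	for name := range a {
		if _, ok := b[name]; !ok {
			return false
		}
	}
	return true
}

func main() {
	order := []string{"api.example.com", "www.example.com"}
	fmt.Println(sameIdentifierSet([]string{"WWW.example.com", "api.example.com"}, order)) // true
	fmt.Println(sameIdentifierSet([]string{"api.example.com"}, order))                    // false
}
```

A false result here corresponds to the badCSR problem described in the finalize steps.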
This means RenewalPolicy, CertificateProfile, per-issuer-type Prometheus metrics, audit rows, and revocation-pipeline integration all apply uniformly to ACME-issued certs via the same code path that already serves EST/SCEP/agent/REST issuance.
The atomicity boundary: there is a brief window between step 5 (cert
exists) and step 6 (order shows valid) where the order row still says
processing. Phase 5's GC scheduler reconciles. The actor string on
audit rows is acme:<account-id>.
JWS verification (Phase 1b)
Every JWS-authenticated POST runs through the verifier at
internal/api/acme/jws.go::VerifyJWS. The verifier enforces:
- The JWS parses as a flattened single-signature object (multi-sig is rejected per RFC 8555 §6.2).
- The signature algorithm is in the closed allow-list {RS256, ES256, EdDSA} per RFC 8555 §6.2; none, HS256, and every other alg are refused at parse time.
- The protected header carries exactly one of kid (registered account) or jwk (new-account flow); endpoints declare which they require.
- The protected header url matches the inbound request URL exactly.
- The protected header nonce is consumed against the acme_nonces store; missing / replayed / expired nonces return urn:ietf:params:acme:error:badNonce per RFC 8555 §6.5.1.
- On the kid path: the kid URL round-trips against the canonical per-profile shape, the referenced account exists, and its status is valid. Deactivated / revoked accounts cannot authenticate.
- The signature verifies against the resolved key (registered account's stored JWK on the kid path; embedded jwk on the jwk path).
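The header-level checks in the list above (algorithm allow-list, exactly one of kid or jwk, url binding, nonce presence) can be sketched with the standard library alone. This is not VerifyJWS: `checkProtectedHeader`, `protectedHeader`, and `encodeHeader` are hypothetical names, the real verifier is built on go-jose v4, and signature verification, nonce consumption, and account lookup are deliberately left out.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

type protectedHeader struct {
	Alg   string          `json:"alg"`
	Kid   string          `json:"kid"`
	JWK   json.RawMessage `json:"jwk"`
	Nonce string          `json:"nonce"`
	URL   string          `json:"url"`
}

var allowedAlgs = map[string]bool{"RS256": true, "ES256": true, "EdDSA": true}

// checkProtectedHeader decodes the base64url protected header and applies
// the structural checks; the error text names the ACME error type to return.
func checkProtectedHeader(protectedB64, requestURL string) error {
	raw, err := base64.RawURLEncoding.DecodeString(protectedB64)
	if err != nil {
		return fmt.Errorf("malformed: %w", err)
	}
	var h protectedHeader
	if err := json.Unmarshal(raw, &h); err != nil {
		return fmt.Errorf("malformed: %w", err)
	}
	switch {
	case !allowedAlgs[h.Alg]:
		return errors.New("badSignatureAlgorithm: alg not in allow-list")
	case (h.Kid != "") == (len(h.JWK) > 0):
		return errors.New("malformed: need exactly one of kid or jwk")
	case h.URL != requestURL:
		return errors.New("unauthorized: url header does not match request URL")
	case h.Nonce == "":
		return errors.New("badNonce: missing nonce")
	}
	return nil
}

// encodeHeader is a test helper that base64url-encodes a JSON header.
func encodeHeader(jsonText string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(jsonText))
}

func main() {
	h := encodeHeader(`{"alg":"HS256","kid":"https://ca.example/acct/1","nonce":"n1","url":"https://ca.example/new-order"}`)
	fmt.Println(checkProtectedHeader(h, "https://ca.example/new-order"))
}
```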
Every state-mutating account operation (create, contact update,
deactivate) writes its acme_accounts row and an audit_events row
inside one repository.Transactor.WithinTx call — the canonical
certctl atomicity contract (matches service.CertificateService.Create
at internal/service/certificate.go:131).
Phases (cross-reference)
| Phase | Status | Surface |
|---|---|---|
| 1a | live | directory + new-nonce + per-profile routing |
| 1b | live | new-account + account/{id} + JWS verifier (RFC 7515 + go-jose v4) |
| 2 | live | orders + authzs + finalize + cert download (trust_authenticated mode end-to-end) |
| 3 | live | HTTP-01 + DNS-01 + TLS-ALPN-01 challenge validation (challenge mode end-to-end) |
| 4 | live | key rollover (RFC 8555 §7.3.5) + revoke-cert (§7.6) + ARI (RFC 9773) |
| 5 | live | rate limits + GC sweeper + kind-driven cert-manager integration test + lego conformance harness + k6 ACME-flow scenario |
| 6 | live | full operator-facing reference + walkthroughs (cert-manager / Caddy / Traefik) + threat model + RFC-8555 conformance statement + troubleshooting + version pinning |
Track shipped phases via git log --grep='acme-server:' --oneline.
Operational notes (Phase 1a)
- Schema: migrations/000025_acme_server.up.sql adds 5 ACME tables + the certificate_profiles.acme_auth_mode column. Phase 1a actively uses only acme_nonces. The full schema ships now so the migration is stable and Phases 1b-4 don't need additional CREATE TABLE migrations.
- Replay protection: nonces are persisted in acme_nonces (NOT in-memory). They survive server restart, which is required for the RFC 8555 §6.5 replay defense to hold against a multi-replica certctl-server fleet behind a load balancer.
- Metrics: the service layer exposes per-op atomic counters via service.ACMEService.Metrics().Snapshot():
  - certctl_acme_directory_total
  - certctl_acme_directory_failures_total
  - certctl_acme_new_nonce_total
  - certctl_acme_new_nonce_failures_total

  Phase 1b will extend with new_account counters; Phase 2 with order / finalize / cert; Phase 3 with per-challenge-type counters.
- Audit: Phase 1a is read-mostly (directory + nonce). Phase 1b's account-creation path will route through the canonical s.tx.WithinTx(...) + auditService.RecordEventWithTx(...) pattern so every account state mutation is paired with an audit_events row.
Phase 4 — key rollover, revocation, ARI
How do I rotate my ACME account key?
RFC 8555 §7.3.5 defines a doubly-signed JWS for the rollover. The OUTER
JWS is signed by the OLD account key (kid path); its payload IS the
INNER JWS, which is signed by the NEW account key (jwk path).
cert-manager and lego do this for you transparently: lego renew --key-rotate
or the cert-manager Issuer.spec.acme.privateKeySecretRef rollover.
Server-side validation:
- Outer JWS verifies against the registered account's current key.
- Inner JWS verifies against the embedded NEW jwk (proves possession).
- Inner payload account matches outer kid.
- Inner payload oldKey thumbprint-equals the registered key.
- Inner protected url equals outer protected url.
- New JWK thumbprint not already registered against the same profile.
- SELECT … FOR UPDATE on the account row serializes concurrent rollovers; the loser sees the winner's new thumbprint and is told to retry (409).
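Two of the checks above compare JWK thumbprints. As a rough illustration, the Go sketch below computes an RFC 7638 SHA-256 thumbprint for an EC key and applies the oldKey and uniqueness rules. The names `ecThumbprint`, `sameThumbprint`, and `rolloverAllowed` are hypothetical, the in-memory `inUse` map stands in for the per-profile database lookup, and RSA or OKP keys (which use different required members) are not covered.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"errors"
	"fmt"
)

// ecThumbprint hashes the RFC 7638 canonical form of an EC JWK: required
// members only, in lexicographic order, with no whitespace.
func ecThumbprint(crv, x, y string) string {
	canonical := fmt.Sprintf(`{"crv":"%s","kty":"EC","x":"%s","y":"%s"}`, crv, x, y)
	sum := sha256.Sum256([]byte(canonical))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// sameThumbprint compares two thumbprints in constant time.
func sameThumbprint(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}

// rolloverAllowed applies two of the documented checks: oldKey must match the
// registered key, and the new key must not already be registered.
func rolloverAllowed(registered, oldKey, newKey string, inUse map[string]bool) error {
	if !sameThumbprint(registered, oldKey) {
		return errors.New("oldKey does not match the registered account key")
	}
	if inUse[newKey] {
		return errors.New("new key is already registered on this profile")
	}
	return nil
}

func main() {
	// The x and y values are placeholders, not real curve coordinates.
	oldTP := ecThumbprint("P-256", "old-x", "old-y")
	newTP := ecThumbprint("P-256", "new-x", "new-y")
	fmt.Println(rolloverAllowed(oldTP, oldTP, newTP, map[string]bool{oldTP: true}))
}
```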
How do I revoke an ACME-issued cert?
Two auth paths per RFC 8555 §7.6:
- kid path: sign with your account key. The server checks the account "owns" the cert via acme_orders.certificate_id lookup.
- jwk path: sign with the cert's own private key. The server extracts the cert's public key, computes the JWK, and asserts it matches the embedded jwk thumbprint.
Either path routes through service.RevocationSvc.RevokeCertificateWithActor
— the same pipeline the GUI revoke button, bulk-revocation, and the
ACME-consumer issuer use. So the cert-row update + revocation row + audit
row are all atomic in one WithinTx, the issuer is best-effort
notified, and the OCSP response cache is invalidated.
Reason codes follow RFC 5280 §5.3.1; codes 8 (removeFromCRL) and 10
(aACompromise) are not in certctl's domain.ValidRevocationReasons
set so they clamp to unspecified.
What is ARI?
RFC 9773 ACME Renewal Information. Clients GET
/acme/profile/<id>/renewal-info/<cert-id> (unauthenticated) and
receive a JSON document with suggestedWindow.start and .end —
the server's recommendation for when to renew. The response also
carries Retry-After (RFC 9773 §4.2) hinting at the next-poll cadence.
Cert-id format is base64url(authorityKeyIdentifier).base64url(serial)
per RFC 9773 §4.1.
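For clients or scripts that need to build the renewal-info URL by hand, the cert-id can be derived as in the Go sketch below. The helper names `ariCertID` and `derIntegerContent` are hypothetical. The sketch assumes the reading of RFC 9773 §4.1 in which the serial is encoded as DER INTEGER content bytes, so a leading 0x00 is added when the high bit is set; the sample AKI and serial are the example values from the RFC.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// derIntegerContent converts a big-endian serial magnitude (for example
// cert.SerialNumber.Bytes()) into DER INTEGER content bytes.
func derIntegerContent(magnitude []byte) []byte {
	if len(magnitude) == 0 {
		return []byte{0x00}
	}
	if magnitude[0]&0x80 != 0 {
		// A set high bit would read as negative, so DER prepends a zero byte.
		return append([]byte{0x00}, magnitude...)
	}
	return magnitude
}

// ariCertID joins the unpadded base64url AKI and serial with a dot.
func ariCertID(aki, serialMagnitude []byte) string {
	enc := base64.RawURLEncoding
	return enc.EncodeToString(aki) + "." + enc.EncodeToString(derIntegerContent(serialMagnitude))
}

func main() {
	aki := []byte{0x69, 0x88, 0x5B, 0x6B, 0x87, 0x46, 0x40, 0x41, 0xE1, 0xB3,
		0x7B, 0x84, 0x7B, 0xA0, 0xAE, 0x2C, 0xDE, 0x01, 0xC8, 0xD4}
	serial := []byte{0x87, 0x65, 0x43, 0x21}
	fmt.Println(ariCertID(aki, serial)) // aYhba4dGQEHhs3uEe6CuLN4ByNQ.AIdlQyE
}
```

With a parsed certificate, `cert.AuthorityKeyId` and `cert.SerialNumber.Bytes()` from crypto/x509 would be the two inputs.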
Window math:
- Cert with a bound renewal policy: window starts at notAfter - RenewalWindowDays, ends at notAfter - RenewalWindowDays/2. So a 30-day window cert with notAfter 2026-06-30 emits start=2026-05-31, end=2026-06-15. Boulder-shape default that lets cert-manager schedule inside our renewal window.
- No policy: window is the last 33% of validity.
- Past expiry: window is "now" → "now + 24h" (renew immediately).
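The three window rules above reduce to a few lines of date arithmetic. The Go sketch below is an illustration, not the server's handler: `suggestedWindow` and the `date` helper are hypothetical names, a `renewalWindowDays` of 0 is used to mean "no policy bound", and ending the no-policy window at notAfter is an assumption about what "the last 33% of validity" means.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// suggestedWindow returns the ARI window for a certificate.
func suggestedWindow(notBefore, notAfter, now time.Time, renewalWindowDays int) (start, end time.Time) {
	switch {
	case !now.Before(notAfter): // already expired: renew immediately
		return now, now.Add(day)
	case renewalWindowDays > 0: // bound renewal policy
		w := time.Duration(renewalWindowDays) * day
		return notAfter.Add(-w), notAfter.Add(-w / 2)
	default: // no policy: last 33% of the validity period
		validity := notAfter.Sub(notBefore)
		return notAfter.Add(-(validity / 100 * 33)), notAfter
	}
}

// date parses a YYYY-MM-DD string as midnight UTC.
func date(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	start, end := suggestedWindow(date("2026-04-01"), date("2026-06-30"), date("2026-05-01"), 30)
	fmt.Println(start.Format("2006-01-02"), end.Format("2006-01-02")) // 2026-05-31 2026-06-15
}
```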
Disable ARI globally with CERTCTL_ACME_SERVER_ARI_ENABLED=false. The
URL drops out of the directory; the route is still registered but
returns 404 — clients fall back to static renewal scheduling.
Phase 5 — operational guidance
Rate limiting
Production deployments serving multiple ACME profiles or fleets should keep the default rate limits in place. The four caps:
- RATE_LIMIT_ORDERS_PER_HOUR (100): per-account new-order cap. A cert-manager Certificate that auto-renews at the 1/3 mark of its validity (90-day cert → ~30-day renewal) consumes ~12 orders/year per managed Certificate. 100/hour is generous for any plausible fleet.
- RATE_LIMIT_CONCURRENT_ORDERS (5): per-account cap on pending/ready/processing orders. Stops a runaway client from starving DB-row throughput. Tune up only if you observe legitimate bursts.
- RATE_LIMIT_KEY_CHANGE_PER_HOUR (5): rollovers are rare; a flood is an attack signal. Tune down to 1/hour if your operator procedure mandates manual rollovers only.
- RATE_LIMIT_CHALLENGE_RESPONDS_PER_HOUR (60): per-challenge cap, defends against retry storms.
Hits return RFC 8555 §6.7 rateLimited Problem with a Retry-After
header. cert-manager 1.15+ honors the header; lego too. Older clients
may not — that's the client's problem, not certctl's.
The buckets are in-memory + per-replica. A 3-replica certctl-server fleet behind a load balancer effectively has 3× the configured throughput (each replica's bucket fills independently). For deployments where this matters operationally, the right answer is a shared rate-limit store; that is a follow-up, not blocking for the current threat model where same-account requests typically pin to the same replica via session affinity.
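A per-key, in-memory token bucket of the kind described in this section can be sketched as follows. This is not certctl's limiter: the `Limiter` type and its `Allow` method are hypothetical, the bucket capacity is assumed to equal the hourly cap, and the returned wait is only a candidate for the Retry-After header.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter keeps one token bucket per key (account-id or challenge-id).
type Limiter struct {
	mu      sync.Mutex
	perHour float64
	buckets map[string]*bucket
}

func NewLimiter(perHour int) *Limiter {
	return &Limiter{perHour: float64(perHour), buckets: make(map[string]*bucket)}
}

// Allow takes one token for key. When the bucket is empty it returns false
// plus the time until one token has refilled.
func (l *Limiter) Allow(key string) (bool, time.Duration) {
	if l.perHour <= 0 {
		return true, 0 // a cap of 0 disables the limit
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.perHour, last: now}
		l.buckets[key] = b
	}
	rate := l.perHour / float64(time.Hour) // tokens per nanosecond
	b.tokens += float64(now.Sub(b.last)) * rate
	if b.tokens > l.perHour {
		b.tokens = l.perHour
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	return false, time.Duration((1 - b.tokens) / rate)
}

func main() {
	l := NewLimiter(2)
	for i := 0; i < 3; i++ {
		ok, wait := l.Allow("acct-1")
		fmt.Println(ok, wait.Round(time.Minute))
	}
}
```

Because the map lives in process memory, each replica running this code would hold its own buckets, which is the 3× effect described above.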
GC sweeper
The scheduler runs the GC sweep every GC_INTERVAL (default 1m). Each
sweep is three independent SQL statements:
- DELETE FROM acme_nonces WHERE used = TRUE OR expires_at < NOW().
- UPDATE acme_authorizations SET status='expired' WHERE status='pending' AND expires_at < NOW().
- UPDATE acme_orders SET status='invalid', error=... WHERE status IN ('pending','ready','processing') AND expires_at < NOW().
Each statement is bounded by a 1-minute per-sweep timeout. A failing
sweep is logged + retried on the next tick; a tick that overruns its
budget is skipped (the existing-tick atomic-Bool guard prevents
overlap). Counts are exposed via certctl_acme_gc_* Prometheus
metrics.
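The overlap guard and per-sweep timeout described above follow a common Go pattern, sketched here. The `Sweeper` type, its fields, and `demoOverlap` are hypothetical names; the `sweep` callback stands in for the three SQL statements, and metrics are omitted.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

type Sweeper struct {
	busy    atomic.Bool
	timeout time.Duration
	sweep   func(ctx context.Context) error
}

// Tick runs one sweep unless a previous sweep is still in flight, in which
// case the tick is skipped and ran is false.
func (s *Sweeper) Tick(parent context.Context) (ran bool, err error) {
	if !s.busy.CompareAndSwap(false, true) {
		return false, nil
	}
	defer s.busy.Store(false)
	ctx, cancel := context.WithTimeout(parent, s.timeout)
	defer cancel()
	return true, s.sweep(ctx)
}

// Run fires Tick on every interval until ctx is cancelled. An interval of 0
// disables the loop. Failures are logged and retried on the next tick.
func (s *Sweeper) Run(ctx context.Context, interval time.Duration) {
	if interval <= 0 {
		return
	}
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			go func() {
				if _, err := s.Tick(ctx); err != nil {
					fmt.Println("gc sweep failed, retrying on next tick:", err)
				}
			}()
		}
	}
}

// demoOverlap simulates a tick arriving while a sweep is still running.
func demoOverlap() (outerRan, innerRan bool) {
	s := &Sweeper{timeout: time.Minute}
	s.sweep = func(ctx context.Context) error {
		innerRan, _ = s.Tick(ctx) // the guard should refuse this one
		return nil
	}
	outerRan, _ = s.Tick(context.Background())
	return outerRan, innerRan
}

func main() {
	outer, inner := demoOverlap()
	fmt.Println("outer ran:", outer, "overlapping tick ran:", inner)
}
```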
cert-manager integration test
make acme-cert-manager-test brings up a kind cluster, installs
cert-manager 1.15.0, helm-deploys certctl-server with
acmeServer.enabled=true, and verifies a Certificate resource issues
end-to-end. Skipped in CI by default (kind is too heavy for per-PR);
operators run locally on workstation. See
deploy/test/acme-integration/ for the YAML + Go test harness.
lego RFC conformance harness
make acme-rfc-conformance-test drives lego v4 against a hermetic
certctl-server stack, exercising register → new-order → finalize.
Operators run this when shipping behavior changes to the ACME surface
to confirm a real third-party client still works.
k6 ACME flows scenario
deploy/test/loadtest/k6/acme_flow.js exercises the unauthenticated
surface (directory + new-nonce + ARI) at 100 VUs × 5m. JWS-signed
flows are out of scope for k6 (no JWS support); they're covered by
the lego conformance harness above. Baseline numbers + thresholds in
deploy/test/loadtest/README.md.
Troubleshooting
The five failure modes operators hit most often + the canonical fix for each.
cert-manager logs: 400 Bad Request: badNonce
Cause: Either a nonce was replayed (a buggy client retries the
same JWS), the cert-manager + certctl-server clocks differ by more
than CERTCTL_ACME_SERVER_NONCE_TTL (default 5 min), or the
nonce-store row was reaped between issuance and use.
Fix: First check NTP on both sides. If clocks are healthy,
lengthen CERTCTL_ACME_SERVER_NONCE_TTL to 10m or 15m. If the
problem persists, check for a multi-replica certctl-server fleet
without sticky session affinity — the nonce DB row lives on one
replica; if the JWS POST hits a different replica before replication
catches up, you observe spurious badNonce. Solution: pin client
sessions to a single replica via load-balancer cookie / kid-hash
routing, OR shorten replication lag if your DB is the bottleneck.
cert-manager logs: x509: certificate signed by unknown authority
Cause: cert-manager refuses to talk to the directory URL because
its TLS chain doesn't terminate at a root in cert-manager's trust
store. certctl-server's bootstrap cert (Phase 1a, deploy/test/certs/server.crt)
is self-signed.
Fix: Add the caBundle field to your ClusterIssuer.spec.acme —
see the TLS trust bootstrap
section above for the 3-step recipe. This is the single biggest
first-time-deploy footgun on the cert-manager integration path.
HTTP-01 validator returns connection refused
Cause: The HTTP-01 solver's Ingress / Service is not reachable from certctl-server's network. Common subcases: (a) the cert-manager http-solver pod is on a private network certctl-server can't reach; (b) a firewall blocks port 80 inbound to the solver's address; (c) the Ingress class annotation doesn't match an installed ingress controller; (d) your DNS still points at an old IP.
Fix: From the certctl-server pod, curl -v http://<identifier>/.well-known/acme-challenge/<token> and read the
network error. If the curl fails the same way, the network path is
the issue. If curl works but the validator fails, check the validator
log lines — the SSRF guard rejects reserved IPs (RFC1918, link-local,
cloud-metadata 169.254.169.254). Public-trust style profiles that
need to reach RFC1918 solvers must be moved to trust_authenticated
mode OR the solver must be exposed on a routable address.
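To check quickly whether a solver address would fall into the reserved ranges named above, a small Go sketch using net/netip is shown below. `isBlockedTarget` is a hypothetical helper and the exact range list in certctl's SSRF guard may differ; this version covers RFC 1918 and IPv6 unique-local space, loopback, link-local (which includes 169.254.169.254), and the unspecified address.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isBlockedTarget reports whether ip falls in a range that a validator
// should refuse to dial. Unparsable input is treated as blocked.
func isBlockedTarget(ip string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return true, err
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	blocked := addr.IsPrivate() ||
		addr.IsLoopback() ||
		addr.IsLinkLocalUnicast() ||
		addr.IsLinkLocalMulticast() ||
		addr.IsUnspecified()
	return blocked, nil
}

func main() {
	for _, ip := range []string{"10.1.2.3", "169.254.169.254", "8.8.8.8"} {
		blocked, _ := isBlockedTarget(ip)
		fmt.Printf("%-16s blocked=%v\n", ip, blocked)
	}
}
```

A production guard would apply this check to the address actually dialed, after DNS resolution, not only to the hostname in the order.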
DNS-01 validator returns NXDOMAIN
Cause: DNS provider hasn't propagated the _acme-challenge.<domain>
TXT record yet. Most providers have a 30s-2m propagation lag. cert-manager
retries by default, but Phase-5 rate limits (default 60/hour per
challenge-id) can truncate the retry budget.
Fix: Verify TXT propagation with dig +short TXT _acme-challenge.<domain> @<your-resolver>. If the answer is empty, the issue is upstream. If
it's populated but certctl reports NXDOMAIN, check
CERTCTL_ACME_SERVER_DNS01_RESOLVER (default 8.8.8.8:53) is
reachable from certctl-server's network egress. Operators on isolated
networks need a private resolver; configure accordingly + own the
cache-poisoning posture (see threat
model).
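The dig check above has a Go equivalent, which can be handy inside a container that lacks dig. The sketch below computes the TXT value RFC 8555 §8.4 expects and queries one specific resolver. `dns01TXTValue` and `lookupChallengeTXT` are hypothetical helpers, the token and thumbprint in main are placeholders, and this is not certctl's validator.

```go
package main

import (
	"context"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"net"
	"time"
)

// dns01TXTValue returns base64url(SHA-256(token + "." + thumbprint)), the
// record content RFC 8555 §8.4 defines for a DNS-01 challenge.
func dns01TXTValue(token, accountThumbprint string) string {
	sum := sha256.Sum256([]byte(token + "." + accountThumbprint))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// lookupChallengeTXT resolves _acme-challenge.<domain> through resolverAddr
// (host:port) instead of the system resolver.
func lookupChallengeTXT(ctx context.Context, resolverAddr, domain string) ([]string, error) {
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, resolverAddr)
		},
	}
	return r.LookupTXT(ctx, "_acme-challenge."+domain)
}

func main() {
	want := dns01TXTValue("example-token", "example-thumbprint")
	got, err := lookupChallengeTXT(context.Background(), "8.8.8.8:53", "example.com")
	fmt.Println("expected:", want)
	fmt.Println("found:", got, "err:", err)
}
```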
Certificate Ready=False with rejectedIdentifier
Cause: The CSR includes an identifier (CommonName or SAN) that the bound certificate profile's policy rejects. certctl runs syntactic + profile-policy validation before order creation; the order never reaches the database.
Fix: The reject reason is in the subproblems array of the RFC
8555 §6.7 problem document. Decode the JSON, look at subproblems[].detail,
and adjust either the CSR or the profile policy. Common causes:
SAN-not-in-AllowedIdentifierWildcards, EKU-not-in-AllowedEKUs,
TTL-exceeds-MaxTTLSeconds. Validation logic lives in
internal/api/acme/identifier.go::ValidateIdentifiers +
internal/domain/profile.go — read those if the profile-policy rule
isn't obvious.
Version pinning + tested clients
certctl's ACME server is tested against the following client versions. Other versions probably work; these are the ones the integration suite exercises end-to-end.
| Client | Tested version | Where it's pinned |
|---|---|---|
| cert-manager | 1.15.0 | deploy/test/acme-integration/cert-manager-install.sh::CERT_MANAGER_VERSION |
| lego (RFC 8555 conformance harness) | v4.x latest | deploy/test/acme-integration/conformance-lego.sh (operator installs via go install github.com/go-acme/lego/v4/cmd/lego@latest) |
| kind (cluster bootstrap) | v0.20+ | deploy/test/acme-integration/kind-config.yaml schema requirement |
| Caddy | 2.7.x | Phase 6 walkthrough (docs/acme-caddy-walkthrough.md) |
| Traefik | 3.0+ | Phase 6 walkthrough (docs/acme-traefik-walkthrough.md) |
Operators reporting issues with untested-version clients should include the client version + the precise wire-level error (curl-captured request
+ response body) so we can pin a regression test if applicable.
FAQ
Why two auth modes? Isn't challenge strictly more secure?
challenge is strictly more secure for public-trust PKI — RFC 8555
§8 ownership proof is the entire point of cert-manager + Let's Encrypt.
For internal PKI, the threat model is different: the network itself
is the security boundary (mTLS service mesh, firewalled VPC,
identifier-namespace controlled by the operator). Forcing every internal cert to
go through a solver round-trip adds operational toil with no security
gain. trust_authenticated is the certctl-specific mode that
acknowledges this — the ACME account is the proof, not the solver.
How does this differ from cert-manager → Let's Encrypt with certctl as a separate step?
Two integrations vs one. With certctl as the ACME endpoint, cert-manager
does its native flow (Certificate → Order → CSR → Secret) and certctl
mints the cert directly, recording it under its own
managed_certificates table with full audit + renewal-policy +
bulk-revocation surface. With Let's Encrypt as the ACME endpoint, you have
to run a separate cert-manager-uploads-to-certctl webhook OR maintain
two parallel cert tracks. The native-ACME-server path is operationally
simpler.
Can I use ACME endpoints from outside the K8s cluster?
Yes. The endpoints are HTTPS over the certctl-server's listener (port
8443 by default). Caddy on a VM, win-acme on a Windows server, or
Posh-ACME on a Mac all integrate against
https://<certctl-server>:8443/acme/profile/<profile-id>/directory.
The TLS-trust-bootstrap requirement applies the same way — see the
Caddy walkthrough for the OS-trust-store
recipe.
How do I migrate manually-issued certs to ACME-issued ones?
Not yet automatic. Operators migrating: keep the old managed_certificates
rows; create new ones via the ACME flow; flip targets one by one. A
dedicated bulk-migration tool is on the roadmap (post-2.1.0). Track
via the master prompt's roadmap section in
the project's acme-server-endpoint spec.
What audit-log events fire on each ACME operation?
Every state mutation writes an audit_events row. Actor strings:
acme:<account-id> for kid-path requests; acme-cert-key:<serial>
for jwk-path revoke; acme-system:gc for scheduler-driven sweeps.
Event-name catalog:
| Event name | Fired by | Resource type |
|---|---|---|
| acme_account_created | new-account | acme_account |
| acme_account_contact_updated | account update | acme_account |
| acme_account_deactivated | account deactivate | acme_account |
| acme_account_key_rolled | key-change | acme_account |
| acme_order_created | new-order | acme_order |
| acme_order_finalized | finalize | acme_order |
| acme_challenge_processing | challenge-respond (dispatch) | acme_challenge |
| acme_challenge_completed | validator callback | acme_challenge |
| certificate_revoked | revoke-cert (routes through RevocationSvc) | certificate |
Querying by actor prefix (actor LIKE 'acme:%') reconstructs the full
history of any ACME-issued cert.
Is there a threat model document?
Yes — docs/acme-server-threat-model.md.
Read before writing a security review.