certctl/docs/acme-server.md
shankar0123 ec88a61274 acme-server: foundation — directory + new-nonce + per-profile routing (Phase 1a/7)
First slice of the RFC 8555 ACME server endpoint (master plan at
cowork/acme-server-endpoint-prompt.md, per-phase prompts at
cowork/acme-server-prompts/). This commit lands the smallest viable
end-to-end deployable slice: an ACME client running

  curl -sk https://certctl/acme/profile/<id>/directory
  curl -sk -I https://certctl/acme/profile/<id>/new-nonce

successfully fetches the directory document and a Replay-Nonce.
Account creation, JWS verification, orders, challenges, and
revocation are all out of scope for this phase and arrive in Phases
1b–4.

Closes the Rank 1 LHF from the 2026-05-03 Infisical deep-research
(cowork/infisical-deep-research-results.md). Pre-fix, certctl was an
ACME consumer only — no /acme/directory endpoint, no JWS verifier,
no challenge validators. K8s customers running cert-manager could
not point at certctl as an ACME issuer; they had to deploy a certctl
agent on every node.

What ships:
  - internal/api/acme/{directory,nonce,errors}.go (+ tests).
  - internal/api/handler/acme.go + acme_handler_test.go.
  - internal/repository/postgres/acme.go (nonce ops only — Phase 1b
    extends with account CRUD; Phases 2-4 extend with order / authz /
    challenge CRUD).
  - internal/service/acme.go (BuildDirectory + IssueNonce stubs;
    Phase 1b adds VerifyJWS / NewAccount / etc.).
  - migrations/000025_acme_server.{up,down}.sql ships the full 5-table
    ACME schema (acme_accounts / acme_orders / acme_authorizations /
    acme_challenges / acme_nonces) PLUS the per-profile
    certificate_profiles.acme_auth_mode column. Phase 1a actively
    uses only acme_nonces; remaining tables are empty until Phases
    1b-4 plug in.
  - internal/config/config.go: ACMEServerConfig struct + ACMEServer
    field on Config. Env vars use CERTCTL_ACME_SERVER_* prefix to
    avoid colliding with the existing consumer-side ACMEConfig at
    config.go:1746 (CERTCTL_ACME_DIRECTORY_URL / PROFILE /
    CHALLENGE_TYPE etc.). Phase 1a wires Enabled +
    DefaultAuthMode + DefaultProfileID + NonceTTL + DirectoryMeta;
    Order/Authz TTLs + per-challenge-type concurrency caps + DNS01
    resolver are reserved fields parsed in 1a so operators can set
    them ahead of Phases 2/3.
  - cmd/server/main.go: wire ACMEHandler into the HandlerRegistry
    literal alongside the existing certificate / EST / SCEP / etc.
    handlers.
  - internal/api/router/router.go: HandlerRegistry.ACME field + 6
    Register calls (3 per-profile + 3 shorthand).
  - internal/api/router/openapi_parity_test.go: 6 new entries in
    SpecParityExceptions. ACME is a wire-protocol surface (JWS-signed
    JSON over HTTPS per RFC 7515) whose semantics are dictated by
    RFC 8555 + RFC 9773 rather than by an OpenAPI document, same
    precedent as SCEP/EST. The canonical reference is
    docs/acme-server.md.
  - docs/acme-server.md: Phase-1a-shaped reference. Configuration
    table for every CERTCTL_ACME_SERVER_* env var. Per-profile
    auth-mode decision tree skeleton. TLS trust bootstrap section
    flagging cert-manager's ClusterIssuer.spec.acme.caBundle
    requirement (the single biggest first-time-deploy footgun;
    the full cert-manager walkthrough lands in Phase 6 but the
    requirement is documented up front).

Architecture decisions baked in:
  - URL family is /acme/profile/<id>/* (per-profile, canonical) with
    /acme/* shorthand active when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID
    is set. Path matches existing per-profile precedent in EST + SCEP.
  - Auth mode is per-profile (acme_auth_mode column on
    certificate_profiles), NOT server-wide. One certctl-server can
    serve trust_authenticated for an internal-PKI profile and
    challenge for a public-trust-style profile simultaneously. The
    column is read at request time, not cached at server start, so
    a mode change an operator makes via SQL takes effect on the
    next order without restart.
  - Nonces are DB-backed (acme_nonces table). Survive server restart.
    The RFC 8555 §6.5 replay defense requires the store to outlast
    the client's nonce caching window; an in-memory-only nonce
    store would lose every in-flight order on restart.
  - Per-op atomic counters on service.ACMEService.Metrics() —
    certctl_acme_directory_total, certctl_acme_directory_failures_total,
    certctl_acme_new_nonce_total, certctl_acme_new_nonce_failures_total.
    Naming follows certctl frozen decision 0.10 cardinality discipline.
    Phase 1b will extend with new_account counters; Phase 2 with
    order / finalize / cert; Phase 3 with per-challenge-type counters.

Audit fixes #11 + #12 (cowork/acme-server-prompts/audit-additions.md)
applied:
  - #11: CERTCTL_ACME_SERVER_* prefix avoids the consumer-side
    CERTCTL_ACME_* namespace collision.
  - #12: prior-attempt WIP from two failed Phase-1 dispatches was
    discarded at phase start; this commit starts from a clean tree.

Tests:
  - 14 unit tests in internal/api/acme/ (directory, nonce, errors).
  - 7 handler-level tests via httptest.NewServer + mockACMEService
    (mirrors the mockSCEPService pattern at scep_handler_test.go).
  - 7 service-layer tests with mocked repo + injected profileLookup.
  - All pass under -race -count=1 -short.

Deferred to Phase 1b:
  - JWS verification (go-jose v4 — see master-prompt §8a for the API
    surface and audit doc for the speculation pitfalls).
  - new-account / account/<id> endpoints + AccountService.
  - Nonce *consumption* path (issue path is in this commit; consume
    is only invoked by JWS-verified POSTs which Phase 1b adds).

Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-1a".
Per-phase implementation plan: cowork/acme-server-prompts/.
Master plan + audit fixes: cowork/acme-server-endpoint-prompt.md +
cowork/acme-server-prompt-audit.md +
cowork/acme-server-prompts/audit-additions.md.
2026-05-03 12:55:40 +00:00


certctl ACME Server (Built-in)

certctl ships an RFC 8555 + RFC 9773 ARI ACME server endpoint at /acme/profile/<profile-id>/*. Any RFC 8555 client (cert-manager 1.15+, Caddy, Traefik, win-acme, certbot, Posh-ACME) can integrate with certctl as an ACME issuer with no certctl-side modification. This removes the "deploy a certctl agent on every K8s node" friction that currently loses deals to external PKI vendors.

Phase status (2026-05-03): Phase 1a (foundation — directory + new-nonce + per-profile routing). The directory document is live and ACME clients can fetch nonces. Account creation, JWS verification, orders, challenges, key rollover, revocation, and ARI all land in subsequent phases. Track shipped phases via git log --grep='acme-server:'.

Configuration

All ACME-server config uses the CERTCTL_ACME_SERVER_* env-var prefix (distinct from CERTCTL_ACME_* which configures the consumer-side issuer connector). The struct definition lives in internal/config/config.go::ACMEServerConfig.

| Env var | Default | Phase | Description |
|---|---|---|---|
| CERTCTL_ACME_SERVER_ENABLED | false | 1a | Master enable flag. Phase 1a's handler is constructed unconditionally so the registry shape stays stable; routes are registered in internal/api/router/router.go::RegisterHandlers regardless. Operators flip this on after configuring per-profile auth_mode. |
| CERTCTL_ACME_SERVER_DEFAULT_AUTH_MODE | trust_authenticated | 1a | Default value for certificate_profiles.acme_auth_mode on newly-created profiles. Existing profiles retain their stored value. Per-profile column is the source of truth at request time. |
| CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID | "" | 1a | When set, /acme/* shorthand mirrors /acme/profile/<DefaultProfileID>/* for single-profile deployments. When empty, requests to the shorthand return RFC 7807 + RFC 8555 §6.7 userActionRequired. |
| CERTCTL_ACME_SERVER_NONCE_TTL | 5m | 1a | How long an issued ACME nonce remains valid before the JWS verifier (Phase 1b) returns urn:ietf:params:acme:error:badNonce per RFC 8555 §6.5.1. Tune up if cert-manager + certctl clocks frequently skew. |
| CERTCTL_ACME_SERVER_TOS_URL | "" | 1a | Optional meta.termsOfService URL in the directory document. |
| CERTCTL_ACME_SERVER_WEBSITE | "" | 1a | Optional meta.website URL in the directory document. |
| CERTCTL_ACME_SERVER_CAA_IDENTITIES | (empty) | 1a | Comma-separated meta.caaIdentities list. |
| CERTCTL_ACME_SERVER_EAB_REQUIRED | false | 1a | meta.externalAccountRequired advertisement. EAB enforcement is a follow-up; Phase 1a only advertises. |
| CERTCTL_ACME_SERVER_ORDER_TTL | 24h | 2 | Reserved field, parsed in Phase 1a so operators can set it ahead of Phase 2's order endpoints. |
| CERTCTL_ACME_SERVER_AUTHZ_TTL | 24h | 2 | Reserved. |
| CERTCTL_ACME_SERVER_HTTP01_CONCURRENCY | 10 | 3 | Reserved. |
| CERTCTL_ACME_SERVER_DNS01_RESOLVER | 8.8.8.8:53 | 3 | Reserved. |
| CERTCTL_ACME_SERVER_DNS01_CONCURRENCY | 10 | 3 | Reserved. |
| CERTCTL_ACME_SERVER_TLSALPN01_CONCURRENCY | 10 | 3 | Reserved. |
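
As a rough illustration of how the Phase 1a variables in the table could be parsed, here is a minimal Go sketch. It is not the code in internal/config/config.go: the acmeServerConfig struct, its field subset, and the loadACMEServerConfig helper are hypothetical names chosen for this example. Only the variable names and defaults come from the table.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// acmeServerConfig is a hypothetical subset of the documented settings;
// the real struct is ACMEServerConfig in internal/config/config.go.
type acmeServerConfig struct {
	Enabled          bool
	DefaultAuthMode  string
	DefaultProfileID string
	NonceTTL         time.Duration
	CAAIdentities    []string
}

const envPrefix = "CERTCTL_ACME_SERVER_"

// loadACMEServerConfig reads the variables through getenv (os.Getenv in a
// server, a map lookup in tests) and applies the documented defaults when
// a variable is unset.
func loadACMEServerConfig(getenv func(string) string) (acmeServerConfig, error) {
	cfg := acmeServerConfig{
		DefaultAuthMode: "trust_authenticated",
		NonceTTL:        5 * time.Minute,
	}
	if v := getenv(envPrefix + "ENABLED"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("%sENABLED: %w", envPrefix, err)
		}
		cfg.Enabled = b
	}
	if v := getenv(envPrefix + "DEFAULT_AUTH_MODE"); v != "" {
		if v != "trust_authenticated" && v != "challenge" {
			return cfg, fmt.Errorf("%sDEFAULT_AUTH_MODE: unknown mode %q", envPrefix, v)
		}
		cfg.DefaultAuthMode = v
	}
	cfg.DefaultProfileID = getenv(envPrefix + "DEFAULT_PROFILE_ID")
	if v := getenv(envPrefix + "NONCE_TTL"); v != "" {
		d, err := time.ParseDuration(v)
		if err != nil {
			return cfg, fmt.Errorf("%sNONCE_TTL: %w", envPrefix, err)
		}
		cfg.NonceTTL = d
	}
	// Comma-separated list; surrounding whitespace and empty items are dropped.
	if v := getenv(envPrefix + "CAA_IDENTITIES"); v != "" {
		for _, id := range strings.Split(v, ",") {
			if id = strings.TrimSpace(id); id != "" {
				cfg.CAAIdentities = append(cfg.CAAIdentities, id)
			}
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := loadACMEServerConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing getenv as a parameter keeps the parser testable without mutating the process environment.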

Per-profile auth mode

Two modes per certificate_profiles.acme_auth_mode:

  • trust_authenticated (default for internal PKI). The JWS-authenticated ACME account is trusted to issue certs for any identifier the profile policy allows; there is no per-identifier ownership proof. The most common certctl use case.
  • challenge. Full HTTP-01 + DNS-01 + TLS-ALPN-01 validation per RFC 8555 §8. Required when certctl is exposing public-trust-style PKI.

A single certctl-server can serve both modes simultaneously — the mode is read from the bound profile's column at request time, not cached at server start. Operators can flip a profile's mode via SQL and the next order picks up the new mode without restart.

The CERTCTL_ACME_SERVER_DEFAULT_AUTH_MODE env var sets the default value for newly-created profiles (e.g. via the certctl API). Existing profile rows retain whatever value they were created with.
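
The "read at request time, not cached" behavior can be sketched as follows. This is an illustration only: the authMode type, the profileLookup hook, and needsChallenges are hypothetical names, and an in-memory map stands in for the certificate_profiles table.

```go
package main

import (
	"errors"
	"fmt"
)

type authMode string

const (
	modeTrustAuthenticated authMode = "trust_authenticated"
	modeChallenge          authMode = "challenge"
)

// profileLookup returns the current acme_auth_mode for a profile. In a real
// server it would query certificate_profiles; here it is a hypothetical hook.
type profileLookup func(profileID string) (authMode, error)

// needsChallenges performs the lookup on every call, so nothing is cached
// between orders and a changed column applies to the next order.
func needsChallenges(lookup profileLookup, profileID string) (bool, error) {
	mode, err := lookup(profileID)
	if err != nil {
		return false, err
	}
	switch mode {
	case modeTrustAuthenticated:
		return false, nil
	case modeChallenge:
		return true, nil
	default:
		return false, fmt.Errorf("profile %q: unknown acme_auth_mode %q", profileID, mode)
	}
}

func main() {
	// The map stands in for the certificate_profiles table.
	profiles := map[string]authMode{"prof-corp": modeTrustAuthenticated}
	lookup := func(id string) (authMode, error) {
		m, ok := profiles[id]
		if !ok {
			return "", errors.New("profile not found")
		}
		return m, nil
	}

	before, _ := needsChallenges(lookup, "prof-corp")
	profiles["prof-corp"] = modeChallenge // simulates an operator's SQL UPDATE
	after, _ := needsChallenges(lookup, "prof-corp")
	fmt.Println(before, after) // prints "false true"
}
```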

TLS trust bootstrap (read this before configuring cert-manager)

When certctl-server uses a self-signed TLS bootstrap cert (deploy/test/certs/server.crt is the demo default; see docs/tls.md), cert-manager 1.15+ will refuse to talk to the directory URL unless the certctl root is trusted. The fix lives in ClusterIssuer.spec.acme.caBundle:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: certctl-test
spec:
  acme:
    server: https://certctl.example.com:8443/acme/profile/prof-corp/directory
    email: ops@example.com
    # caBundle: base64-encoded PEM of certctl's self-signed root
    caBundle: LS0tLS1CRUdJTi...
    privateKeySecretRef:
      name: certctl-test-account-key
    solvers:
      - http01:
          ingress:
            class: nginx

The caBundle value is the base64-encoded PEM of the root that signed your certctl-server's TLS certificate. Extract it from your operator bootstrap (e.g. cat deploy/test/certs/ca.crt | base64 -w0).
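
On systems where base64 -w0 is unavailable, the same value can be produced with a few lines of Go. The sketch below is an assumption-laden helper, not part of certctl: caBundleValue is a hypothetical name, and the check that the input holds a CERTIFICATE block is an extra guard added for this example.

```go
package main

import (
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// caBundleValue checks that pemBytes starts with a PEM CERTIFICATE block and
// returns the whole input as a single-line standard-base64 string, which is
// the form the caBundle field expects.
func caBundleValue(pemBytes []byte) (string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return "", errors.New("input is not a PEM-encoded certificate")
	}
	return base64.StdEncoding.EncodeToString(pemBytes), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: cabundle <path-to-ca.crt>")
		os.Exit(2)
	}
	pemBytes, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	value, err := caBundleValue(pemBytes)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(value)
}
```

Any PEM certificate encodes to a string beginning with LS0tLS1CRUdJTi, which is a quick sanity check on the value pasted into the ClusterIssuer.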

This is the single biggest first-time-deploy footgun on the cert-manager integration path. The full cert-manager walkthrough lands in Phase 6; the caBundle requirement is flagged here in Phase 1a's docs because operators hit it the moment they try to point a real ACME client at certctl.

Endpoints (Phase 1a)

Routes registered in internal/api/router/router.go::RegisterHandlers:

| Method | Path | RFC ref | Auth | Description |
|---|---|---|---|---|
| GET | /acme/profile/{id}/directory | RFC 8555 §7.1.1 | unauth | Per-profile directory document. |
| HEAD | /acme/profile/{id}/new-nonce | RFC 8555 §7.2 | unauth | Returns 200 + Replay-Nonce header. |
| GET | /acme/profile/{id}/new-nonce | RFC 8555 §7.2 | unauth | Returns 204 + Replay-Nonce header. |
| GET | /acme/directory | RFC 8555 §7.1.1 | unauth | Shorthand path; mirrors per-profile when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID is set. |
| HEAD | /acme/new-nonce | RFC 8555 §7.2 | unauth | Shorthand. |
| GET | /acme/new-nonce | RFC 8555 §7.2 | unauth | Shorthand. |

The remaining RFC 8555 endpoints (new-account, account/{id}, new-order, order/{id}, order/{id}/finalize, authz/{id}, challenge/{id}, cert/{id}, key-change, revoke-cert, renewal-info) are advertised in the directory document but not yet served — clients hitting them get a 404 until subsequent phases land. The directory document includes their URLs because RFC 8555 doesn't permit a partial directory.
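
The two live endpoints can be exercised from Go as well as from curl. The sketch below shows a client that fetches the directory and then a nonce via HEAD. So that it runs without a certctl instance, it talks to a local stub built with net/http/httptest. The stub, its fixed nonce value, and the prof-corp profile ID are illustrative assumptions, not certctl's handler.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchDirectory GETs an ACME directory and decodes it into a map of field
// name to raw JSON (resource URLs plus the optional "meta" object).
func fetchDirectory(c *http.Client, url string) (map[string]json.RawMessage, error) {
	resp, err := c.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("directory: unexpected status %s", resp.Status)
	}
	var dir map[string]json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&dir); err != nil {
		return nil, err
	}
	return dir, nil
}

// fetchNonce sends HEAD to a new-nonce URL and returns the Replay-Nonce header.
func fetchNonce(c *http.Client, url string) (string, error) {
	resp, err := c.Head(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("new-nonce: unexpected status %s", resp.Status)
	}
	nonce := resp.Header.Get("Replay-Nonce")
	if nonce == "" {
		return "", errors.New("response carried no Replay-Nonce header")
	}
	return nonce, nil
}

// newStubServer imitates the two per-profile Phase 1a routes. HEAD falls
// through to the implicit 200; GET answers 204, matching the table above.
func newStubServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/acme/profile/prof-corp/directory", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprintf(w, `{"newNonce":"http://%s/acme/profile/prof-corp/new-nonce"}`, r.Host)
	})
	mux.HandleFunc("/acme/profile/prof-corp/new-nonce", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Replay-Nonce", "c3R1Yi1ub25jZQ") // fixed value, stub only
		w.Header().Set("Cache-Control", "no-store")
		if r.Method == http.MethodGet {
			w.WriteHeader(http.StatusNoContent)
		}
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	dir, err := fetchDirectory(srv.Client(), srv.URL+"/acme/profile/prof-corp/directory")
	if err != nil {
		panic(err)
	}
	var newNonceURL string
	if err := json.Unmarshal(dir["newNonce"], &newNonceURL); err != nil {
		panic(err)
	}
	nonce, err := fetchNonce(srv.Client(), newNonceURL)
	if err != nil {
		panic(err)
	}
	fmt.Println("Replay-Nonce:", nonce)
}
```

Against a real deployment, the stub would be replaced by the certctl base URL and an http.Client whose TLS configuration trusts the certctl root.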

Phases (cross-reference)

| Phase | Status | Surface |
|---|---|---|
| 1a | live | directory + new-nonce + per-profile routing |
| 1b | not yet | new-account + JWS verifier (RFC 7515) |
| 2 | not yet | orders + authzs + finalize + cert download (trust_authenticated mode end-to-end) |
| 3 | not yet | HTTP-01 + DNS-01 + TLS-ALPN-01 challenge validation |
| 4 | not yet | key rollover + revocation + ARI (RFC 9773) |
| 5 | not yet | cert-manager integration test + production hardening |
| 6 | not yet | full operator-facing reference + walkthroughs + threat model |

Track shipped phases via git log --grep='acme-server:' --oneline.

Operational notes (Phase 1a)

  • Schema: migrations/000025_acme_server.up.sql adds 5 ACME tables plus the certificate_profiles.acme_auth_mode column. Phase 1a actively uses only acme_nonces. The full schema ships now so the migration is stable and Phases 1b-4 don't need additional CREATE TABLE migrations.
  • Replay protection: nonces are persisted in acme_nonces (NOT in-memory). They survive server restart, and because every replica reads the same table, the RFC 8555 §6.5 replay defense also holds for a multi-replica certctl-server fleet behind a load balancer.

  • Metrics: the service layer exposes per-op atomic counters via service.ACMEService.Metrics().Snapshot():

    • certctl_acme_directory_total
    • certctl_acme_directory_failures_total
    • certctl_acme_new_nonce_total
    • certctl_acme_new_nonce_failures_total

    Phase 1b will extend with new_account counters; Phase 2 with order / finalize / cert; Phase 3 with per-challenge-type counters.

  • Audit: Phase 1a is read-mostly (directory + nonce). Phase 1b's account-creation path will route through the canonical s.tx.WithinTx(...) + auditService.RecordEventWithTx(...) pattern so every account state mutation is paired with an audit_events row.
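
To make the replay-protection contract from the notes above concrete, here is a minimal Go sketch of nonce issue and single-use consumption with a TTL. It is deliberately an in-memory stand-in and therefore not how certctl stores nonces (those live in acme_nonces). The nonceStore and fakeClock types are hypothetical, and the consume path is an assumption about a contract that Phase 1b has yet to ship.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"sync"
	"time"
)

// nonceStore is a hypothetical in-memory stand-in for the acme_nonces table,
// used only to illustrate the issue/consume contract.
type nonceStore struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time
	expires map[string]time.Time
}

func newNonceStore(ttl time.Duration, now func() time.Time) *nonceStore {
	return &nonceStore{ttl: ttl, now: now, expires: make(map[string]time.Time)}
}

// issue returns a fresh random nonce in unpadded base64url, the encoding
// RFC 8555 requires for Replay-Nonce values, and records its expiry.
func (s *nonceStore) issue() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	n := base64.RawURLEncoding.EncodeToString(buf)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.expires[n] = s.now().Add(s.ttl)
	return n, nil
}

// consume reports whether n was issued here, has not been used, and is still
// inside its TTL. A known nonce is deleted on first sight, so a replay fails.
func (s *nonceStore) consume(n string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.expires[n]
	if !ok {
		return false
	}
	delete(s.expires, n)
	return s.now().Before(exp)
}

// fakeClock lets the example move time forward without sleeping.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock           { return &fakeClock{t: time.Unix(0, 0)} }
func (c *fakeClock) now() time.Time      { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := newFakeClock()
	store := newNonceStore(5*time.Minute, clk.now)

	n, err := store.issue()
	if err != nil {
		panic(err)
	}
	fmt.Println(store.consume(n)) // true: first use inside the TTL
	fmt.Println(store.consume(n)) // false: replay of a used nonce

	late, _ := store.issue()
	clk.advance(6 * time.Minute)
	fmt.Println(store.consume(late)) // false: older than the TTL
}
```

A database-backed version would keep the same two operations, with the delete-and-check step done atomically in one statement so that concurrent replicas cannot both accept the same nonce.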