certctl/internal/service/acme.go
shankar0123 e146b00f0e acme-server: foundation — directory + new-nonce + per-profile routing (Phase 1a/7)
First slice of the RFC 8555 ACME server endpoint (master plan at
cowork/acme-server-endpoint-prompt.md, per-phase prompts at
cowork/acme-server-prompts/). This commit lands the smallest viable
end-to-end deployable slice: an ACME client running

  curl -sk https://certctl/acme/profile/<id>/directory
  curl -sk -I https://certctl/acme/profile/<id>/new-nonce

successfully fetches the directory document and a Replay-Nonce.
Account creation, JWS verification, orders, challenges, and
revocation are all out of scope for this phase and arrive in Phases
1b–4.
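
For readers who prefer Go to curl, the same two requests can be sketched
as below. This is an illustrative client only: the httptest stub stands
in for a running certctl server, and the profile id "demo", the stub
nonce value, and all helper names are hypothetical. The directory field
names come from RFC 8555 §7.1.1.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// directory holds the RFC 8555 §7.1.1 fields a client reads first.
type directory struct {
	NewNonce   string `json:"newNonce"`
	NewAccount string `json:"newAccount"`
	NewOrder   string `json:"newOrder"`
}

// fetchDirectory performs the GET that the first curl command issues.
func fetchDirectory(c *http.Client, url string) (directory, error) {
	var d directory
	resp, err := c.Get(url)
	if err != nil {
		return d, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return d, fmt.Errorf("directory: unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&d)
	return d, err
}

// fetchNonce performs the HEAD that the second curl command issues and
// returns the value of the Replay-Nonce response header.
func fetchNonce(c *http.Client, url string) (string, error) {
	resp, err := c.Head(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	nonce := resp.Header.Get("Replay-Nonce")
	if nonce == "" {
		return "", fmt.Errorf("new-nonce: response carried no Replay-Nonce header")
	}
	return nonce, nil
}

// newStubServer is a hypothetical stand-in for certctl that serves one
// profile ("demo") with a fixed nonce. It is not the real handler.
func newStubServer() *httptest.Server {
	var srv *httptest.Server
	mux := http.NewServeMux()
	mux.HandleFunc("/acme/profile/demo/directory", func(w http.ResponseWriter, r *http.Request) {
		base := srv.URL + "/acme/profile/demo"
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(directory{
			NewNonce:   base + "/new-nonce",
			NewAccount: base + "/new-account",
			NewOrder:   base + "/new-order",
		})
	})
	mux.HandleFunc("/acme/profile/demo/new-nonce", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Replay-Nonce", "stub-nonce")
		w.Header().Set("Cache-Control", "no-store")
		w.WriteHeader(http.StatusOK)
	})
	srv = httptest.NewServer(mux)
	return srv
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	dir, err := fetchDirectory(srv.Client(), srv.URL+"/acme/profile/demo/directory")
	if err != nil {
		panic(err)
	}
	nonce, err := fetchNonce(srv.Client(), dir.NewNonce)
	if err != nil {
		panic(err)
	}
	fmt.Println("newNonce URL:", dir.NewNonce)
	fmt.Println("Replay-Nonce:", nonce)
}
```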

Closes the Rank 1 LHF from the 2026-05-03 Infisical deep-research
(cowork/infisical-deep-research-results.md). Pre-fix, certctl was an
ACME consumer only — no /acme/directory endpoint, no JWS verifier,
no challenge validators. K8s customers running cert-manager could
not point at certctl as an ACME issuer; they had to deploy a certctl
agent on every node.

What ships:
  - internal/api/acme/{directory,nonce,errors}.go (+ tests).
  - internal/api/handler/acme.go + acme_handler_test.go.
  - internal/repository/postgres/acme.go (nonce ops only — Phase 1b
    extends with account CRUD; Phases 2-4 extend with order / authz /
    challenge CRUD).
  - internal/service/acme.go (BuildDirectory + IssueNonce stubs;
    Phase 1b adds VerifyJWS / NewAccount / etc.).
  - migrations/000025_acme_server.{up,down}.sql ships the full 5-table
    ACME schema (acme_accounts / acme_orders / acme_authorizations /
    acme_challenges / acme_nonces) PLUS the per-profile
    certificate_profiles.acme_auth_mode column. Phase 1a actively
    uses only acme_nonces; remaining tables are empty until Phases
    1b-4 plug in.
  - internal/config/config.go: ACMEServerConfig struct + ACMEServer
    field on Config. Env vars use CERTCTL_ACME_SERVER_* prefix to
    avoid colliding with the existing consumer-side ACMEConfig at
    config.go:1746 (CERTCTL_ACME_DIRECTORY_URL / PROFILE /
    CHALLENGE_TYPE etc.). Phase 1a wires Enabled +
    DefaultAuthMode + DefaultProfileID + NonceTTL + DirectoryMeta;
    Order/Authz TTLs + per-challenge-type concurrency caps + DNS01
    resolver are reserved fields parsed in 1a so operators can set
    them ahead of Phases 2/3.
  - cmd/server/main.go: wire ACMEHandler into the HandlerRegistry
    literal alongside the existing certificate / EST / SCEP / etc.
    handlers.
  - internal/api/router/router.go: HandlerRegistry.ACME field + 6
    Register calls (3 per-profile + 3 shorthand).
  - internal/api/router/openapi_parity_test.go: 6 new entries in
    SpecParityExceptions. ACME is a wire-protocol surface (JWS-signed
    JSON over HTTPS per RFC 7515) whose semantics are dictated by
    RFC 8555 + RFC 9773 rather than by an OpenAPI document, same
    precedent as SCEP/EST. The canonical reference is
    docs/acme-server.md.
  - docs/acme-server.md: Phase-1a-shaped reference. Configuration
    table for every CERTCTL_ACME_SERVER_* env var. Per-profile
    auth-mode decision tree skeleton. TLS trust bootstrap section
    flagging cert-manager's ClusterIssuer.spec.acme.caBundle
    requirement (the single biggest first-time-deploy footgun;
    the full cert-manager walkthrough lands in Phase 6 but the
    requirement is documented up front).

Architecture decisions baked in:
  - URL family is /acme/profile/<id>/* (per-profile, canonical) with
    /acme/* shorthand active when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID
    is set. Path matches existing per-profile precedent in EST + SCEP.
  - Auth mode is per-profile (acme_auth_mode column on
    certificate_profiles), NOT server-wide. One certctl-server can
    serve trust_authenticated for an internal-PKI profile and
    challenge for a public-trust-style profile simultaneously. The
    column is read at request time, not cached at server start, so
    when an operator flips a profile's mode via SQL the change takes
    effect on the next order without a restart.
  - Nonces are DB-backed (acme_nonces table) and survive a server
    restart. The RFC 8555 §6.5 replay defense requires the store to
    outlast the client's nonce caching window; an in-memory-only
    store would invalidate every outstanding nonce on restart, so
    each in-flight order would hit badNonce and have to retry.
  - Per-op atomic counters on service.ACMEService.Metrics() —
    certctl_acme_directory_total, certctl_acme_directory_failures_total,
    certctl_acme_new_nonce_total, certctl_acme_new_nonce_failures_total.
    Naming follows certctl frozen decision 0.10 cardinality discipline.
    Phase 1b will extend with new_account counters; Phase 2 with
    order / finalize / cert; Phase 3 with per-challenge-type counters.
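
As an illustration of why a flat name-to-count snapshot is convenient
for an exposer, the sketch below renders such a map in the Prometheus
text exposition format. It is a minimal stand-in, not the certctl
MetricsHandler: the counters type and renderCounters helper are
hypothetical, and only two of the four counter names listed above are
modeled.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync/atomic"
)

// counters is a hypothetical two-counter table using the same
// atomic.Uint64 approach as the per-op counters described above.
type counters struct {
	directoryTotal    atomic.Uint64
	directoryFailures atomic.Uint64
}

// snapshot returns the current values keyed by full metric name.
func (c *counters) snapshot() map[string]uint64 {
	return map[string]uint64{
		"certctl_acme_directory_total":          c.directoryTotal.Load(),
		"certctl_acme_directory_failures_total": c.directoryFailures.Load(),
	}
}

// renderCounters emits one "# TYPE" line and one sample line per
// counter. Names are sorted so the output is stable across calls.
func renderCounters(snap map[string]uint64) string {
	names := make([]string, 0, len(snap))
	for name := range snap {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "# TYPE %s counter\n%s %d\n", name, name, snap[name])
	}
	return b.String()
}

func main() {
	var c counters
	c.directoryTotal.Add(2)
	c.directoryFailures.Add(1)
	fmt.Print(renderCounters(c.snapshot()))
}
```

Because the map keys are already complete metric names with no labels,
the renderer needs no per-op branching, and adding counters in later
phases would not change it.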

Audit fixes #11 + #12 (cowork/acme-server-prompts/audit-additions.md)
applied:
  - #11: CERTCTL_ACME_SERVER_* prefix avoids the consumer-side
    CERTCTL_ACME_* namespace collision.
  - #12: prior-attempt WIP from two failed Phase-1 dispatches was
    discarded at phase start; this commit starts from a clean tree.

Tests:
  - 14 unit tests in internal/api/acme/ (directory, nonce, errors).
  - 7 handler-level tests via httptest.NewServer + mockACMEService
    (mirrors the mockSCEPService pattern at scep_handler_test.go).
  - 7 service-layer tests with mocked repo + injected profileLookup.
  - All pass under -race -count=1 -short.

Deferred to Phase 1b:
  - JWS verification (go-jose v4 — see master-prompt §8a for the API
    surface and audit doc for the speculation pitfalls).
  - new-account / account/<id> endpoints + AccountService.
  - Nonce *consumption* path (issue path is in this commit; consume
    is only invoked by JWS-verified POSTs which Phase 1b adds).

Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-1a".
Per-phase implementation plan: cowork/acme-server-prompts/.
Master plan + audit fixes: cowork/acme-server-endpoint-prompt.md +
cowork/acme-server-prompt-audit.md +
cowork/acme-server-prompts/audit-additions.md.
2026-05-03 12:55:40 +00:00


// Copyright (c) certctl
// SPDX-License-Identifier: BSL-1.1

package service

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"

	"github.com/shankar0123/certctl/internal/api/acme"
	"github.com/shankar0123/certctl/internal/config"
	"github.com/shankar0123/certctl/internal/domain"
	"github.com/shankar0123/certctl/internal/repository"
)

// ACMERepo is the persistence-layer surface ACMEService consumes for
// nonce + (later phases) account / order / authz / challenge state.
// Phase 1a wires only the nonce path; the interface is tightened in
// Phase 1b along with the AccountService.
//
// Defining the interface in the service package (rather than
// internal/repository/interfaces.go) keeps the cross-phase blast
// radius small: when Phase 1b adds CreateAccountWithTx /
// GetAccountByThumbprint / etc., only this file's interface and the
// concrete postgres ACMERepository move together. Mock implementations
// in tests satisfy this interface without depending on the postgres
// package.
type ACMERepo interface {
	IssueNonce(ctx context.Context, nonce string, ttl time.Duration) error
	ConsumeNonce(ctx context.Context, nonce string) error
}

// profileLookup is the minimum surface ACMEService needs to resolve a
// per-profile request. Defined as an interface (rather than taking a
// concrete *postgres.ProfileRepository) so tests can inject an in-memory
// fake without spinning up Postgres.
type profileLookup interface {
	Get(ctx context.Context, id string) (*domain.CertificateProfile, error)
}

// ACMEService orchestrates the ACME server's RFC 8555 surface. Phase 1a
// implements:
//
//   - BuildDirectory: returns the per-profile directory document.
//   - IssueNonce: returns a Replay-Nonce, persisted with TTL.
//
// Phase 1b will extend with VerifyJWS, NewAccount, LookupAccount,
// UpdateAccount, DeactivateAccount.
//
// The struct deliberately holds raw config rather than per-field
// extracted values — the directory builder uses 4 of the 11 fields
// and reading them lazily keeps the constructor signature tight.
type ACMEService struct {
	repo     ACMERepo
	profiles profileLookup
	cfg      config.ACMEServerConfig
	metrics  *ACMEMetrics
}

// NewACMEService constructs an ACMEService. The constructor matches
// certctl's per-service convention: required dependencies in the
// argument list (repo, profile lookup, config), optional wiring via
// post-construction setters (metrics is wired now to keep the
// Phase-1a-only footprint clean; Phase 1b adds SetTransactor +
// SetAuditService for the JWS-authenticated POST path).
func NewACMEService(repo ACMERepo, profiles profileLookup, cfg config.ACMEServerConfig) *ACMEService {
	return &ACMEService{
		repo:     repo,
		profiles: profiles,
		cfg:      cfg,
		metrics:  NewACMEMetrics(),
	}
}

// Metrics returns the per-op counter snapshotter. cmd/server/main.go
// passes this into MetricsHandler so the Prometheus exposer picks up
// the per-op signals.
func (s *ACMEService) Metrics() *ACMEMetrics { return s.metrics }
// ErrACMEUserActionRequired is returned by BuildDirectory when the
// caller hits the /acme/* shorthand path without
// CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID being set. Handler maps to
// RFC 7807 + RFC 8555 §6.7 userActionRequired.
var ErrACMEUserActionRequired = errors.New("acme: default profile not configured; use /acme/profile/<id>/*")
// ErrACMEProfileNotFound is returned when the profile in the request
// path doesn't exist. Handler maps to HTTP 404 (NOT 500 — the
// distinction is operator-meaningful: 404 says "fix your URL," 500
// says "something is wrong server-side").
var ErrACMEProfileNotFound = errors.New("acme: profile not found")
// BuildDirectory constructs the per-profile directory document.
//
// profileID resolution:
//   - non-empty: look up that profile; ErrACMEProfileNotFound on miss.
//   - empty + cfg.DefaultProfileID set: substitute the default.
//   - empty + cfg.DefaultProfileID unset: ErrACMEUserActionRequired.
//
// baseURL is the per-profile base path the directory's URL fields are
// constructed against. The handler computes baseURL from the inbound
// request (scheme + host + /acme/profile/<id>) and passes it in;
// keeping the URL composition in the handler avoids embedding HTTP
// concerns in the service layer.
//
// On success the metrics counter for the directory op increments;
// failures bump the failure variant of the same counter.
func (s *ACMEService) BuildDirectory(ctx context.Context, profileID, baseURL string) (*acme.Directory, error) {
	profileID, err := s.resolveProfile(ctx, profileID)
	if err != nil {
		s.metrics.bump(&s.metrics.DirectoryFailureTotal)
		return nil, err
	}
	dir := acme.BuildDirectory(
		baseURL,
		s.cfg.DirectoryMeta.TermsOfService,
		s.cfg.DirectoryMeta.Website,
		s.cfg.DirectoryMeta.CAAIdentities,
		s.cfg.DirectoryMeta.ExternalAccountRequired,
		// Phase 1a: ARI is non-functional. The Phase 4 commit flips this
		// to true once the renewal-info handler ships.
		false,
	)
	_ = profileID // Phase 1b will use the resolved profile to read
	// acme_auth_mode + record per-profile metrics. Phase 1a
	// only needs the existence check above.
	s.metrics.bump(&s.metrics.DirectoryTotal)
	return dir, nil
}

// IssueNonce generates a fresh ACME nonce, persists it with the
// configured TTL, and returns the encoded string for the
// Replay-Nonce header.
//
// RFC 8555 §6.5: every successful response to a POST request must
// carry a Replay-Nonce, and clients obtain their first one from
// new-nonce (§7.2). Phase 1a wires this via the directory + new-nonce
// handlers; Phase 1b extends with new-account + account/<id> POST
// responses (the JWS-authenticated paths).
func (s *ACMEService) IssueNonce(ctx context.Context) (string, error) {
	nonce, err := acme.GenerateNonce()
	if err != nil {
		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
		return "", fmt.Errorf("acme: generate nonce: %w", err)
	}
	if err := s.repo.IssueNonce(ctx, nonce, s.cfg.NonceTTL); err != nil {
		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
		return "", fmt.Errorf("acme: persist nonce: %w", err)
	}
	s.metrics.bump(&s.metrics.NewNonceTotal)
	return nonce, nil
}

// resolveProfile applies the default-profile fallback and confirms the
// profile exists. Returns the resolved (canonical) profileID on
// success. Centralizing the resolution here keeps every Phase
// 1a/1b/2/3/4 endpoint's "which profile is this request bound to"
// logic uniform.
func (s *ACMEService) resolveProfile(ctx context.Context, profileID string) (string, error) {
	if profileID == "" {
		if s.cfg.DefaultProfileID == "" {
			return "", ErrACMEUserActionRequired
		}
		profileID = s.cfg.DefaultProfileID
	}
	_, err := s.profiles.Get(ctx, profileID)
	if err != nil {
		if errors.Is(err, repository.ErrNotFound) {
			return "", ErrACMEProfileNotFound
		}
		return "", fmt.Errorf("acme: lookup profile: %w", err)
	}
	return profileID, nil
}

// ACMEMetrics is the per-op counter table for the ACME server. Mirrors
// the IssuanceMetrics / DeployCounters pattern (atomic.Uint64 + a
// Snapshot method that emits stable tuples). Phase 1a tracks just
// directory + new-nonce; subsequent phases add new-account / new-order
// / etc.
type ACMEMetrics struct {
	DirectoryTotal        atomic.Uint64
	DirectoryFailureTotal atomic.Uint64
	NewNonceTotal         atomic.Uint64
	NewNonceFailureTotal  atomic.Uint64
}

// NewACMEMetrics returns a zeroed counter table. Concurrent callers
// can bump counters without external synchronization (atomic.Uint64
// is the synchronization primitive).
func NewACMEMetrics() *ACMEMetrics { return &ACMEMetrics{} }
// bump increments a single atomic counter. Centralized so the call
// sites in BuildDirectory + IssueNonce are uniform.
func (m *ACMEMetrics) bump(c *atomic.Uint64) { c.Add(1) }
// Snapshot emits the current counter values as a map (op → count).
// Naming is certctl_acme_<op>_total per frozen decision 0.10
// (cardinality discipline) so the Prometheus exposer can lift them
// directly without per-op stringly-typed branching.
func (m *ACMEMetrics) Snapshot() map[string]uint64 {
	return map[string]uint64{
		"certctl_acme_directory_total":          m.DirectoryTotal.Load(),
		"certctl_acme_directory_failures_total": m.DirectoryFailureTotal.Load(),
		"certctl_acme_new_nonce_total":          m.NewNonceTotal.Load(),
		"certctl_acme_new_nonce_failures_total": m.NewNonceFailureTotal.Load(),
	}
}