mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-10 11:58:52 +00:00
e146b00f0e
First slice of the RFC 8555 ACME server endpoint (master plan at cowork/acme-server-endpoint-prompt.md, per-phase prompts at cowork/acme-server-prompts/).

This commit lands the smallest viable end-to-end deployable slice: an ACME client running

    curl -sk https://certctl/acme/profile/<id>/directory
    curl -sk -I https://certctl/acme/profile/<id>/new-nonce

successfully fetches the directory document and a Replay-Nonce. Account creation, JWS verification, orders, challenges, and revocation are all out of scope for this phase and arrive in Phases 1b–4.

Closes the Rank 1 LHF from the 2026-05-03 Infisical deep-research (cowork/infisical-deep-research-results.md). Pre-fix, certctl was an ACME consumer only: no /acme/directory endpoint, no JWS verifier, no challenge validators. K8s customers running cert-manager could not point at certctl as an ACME issuer; they had to deploy a certctl agent on every node.

What ships:

- internal/api/acme/{directory,nonce,errors}.go (+ tests).
- internal/api/handler/acme.go + acme_handler_test.go.
- internal/repository/postgres/acme.go (nonce ops only; Phase 1b extends with account CRUD; Phases 2-4 extend with order / authz / challenge CRUD).
- internal/service/acme.go (BuildDirectory + IssueNonce stubs; Phase 1b adds VerifyJWS / NewAccount / etc.).
- migrations/000025_acme_server.{up,down}.sql ships the full 5-table ACME schema (acme_accounts / acme_orders / acme_authorizations / acme_challenges / acme_nonces) PLUS the per-profile certificate_profiles.acme_auth_mode column. Phase 1a actively uses only acme_nonces; remaining tables are empty until Phases 1b-4 plug in.
- internal/config/config.go: ACMEServerConfig struct + ACMEServer field on Config. Env vars use CERTCTL_ACME_SERVER_* prefix to avoid colliding with the existing consumer-side ACMEConfig at config.go:1746 (CERTCTL_ACME_DIRECTORY_URL / PROFILE / CHALLENGE_TYPE etc.). Phase 1a wires Enabled + DefaultAuthMode + DefaultProfileID + NonceTTL + DirectoryMeta; Order/Authz TTLs + per-challenge-type concurrency caps + DNS01 resolver are reserved fields parsed in 1a so operators can set them ahead of Phases 2/3.
- cmd/server/main.go: wire ACMEHandler into the HandlerRegistry literal alongside the existing certificate / EST / SCEP / etc. handlers.
- internal/api/router/router.go: HandlerRegistry.ACME field + 6 Register calls (3 per-profile + 3 shorthand).
- internal/api/router/openapi_parity_test.go: 6 new entries in SpecParityExceptions. ACME is a wire-protocol surface (JWS-signed JSON over HTTPS per RFC 7515) whose semantics are dictated by RFC 8555 + RFC 9773 rather than by an OpenAPI document, same precedent as SCEP/EST. The canonical reference is docs/acme-server.md.
- docs/acme-server.md: Phase-1a-shaped reference. Configuration table for every CERTCTL_ACME_SERVER_* env var. Per-profile auth-mode decision tree skeleton. TLS trust bootstrap section flagging cert-manager's ClusterIssuer.spec.acme.caBundle requirement (the single biggest first-time-deploy footgun; the full cert-manager walkthrough lands in Phase 6 but the requirement is documented up front).

Architecture decisions baked in:

- URL family is /acme/profile/<id>/* (per-profile, canonical) with /acme/* shorthand active when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID is set. Path matches existing per-profile precedent in EST + SCEP.
- Auth mode is per-profile (acme_auth_mode column on certificate_profiles), NOT server-wide. One certctl-server can serve trust_authenticated for an internal-PKI profile and challenge for a public-trust-style profile simultaneously. The column is read at request time, not cached at server start; operators flipping a profile's mode via SQL take effect on the next order without restart.
- Nonces are DB-backed (acme_nonces table). Survive server restart. The RFC 8555 §6.5 replay defense requires the store to outlast the client's nonce caching window; an in-memory-only nonce store would lose every in-flight order on restart.
- Per-op atomic counters on service.ACMEService.Metrics(): certctl_acme_directory_total, certctl_acme_directory_failures_total, certctl_acme_new_nonce_total, certctl_acme_new_nonce_failures_total. Naming follows certctl frozen decision 0.10 cardinality discipline. Phase 1b will extend with new_account counters; Phase 2 with order / finalize / cert; Phase 3 with per-challenge-type counters.

Audit fixes #11 + #12 (cowork/acme-server-prompts/audit-additions.md) applied:

- #11: CERTCTL_ACME_SERVER_* prefix avoids the consumer-side CERTCTL_ACME_* namespace collision.
- #12: prior-attempt WIP from two failed Phase-1 dispatches was discarded at phase start; this commit starts from a clean tree.

Tests:

- 14 unit tests in internal/api/acme/ (directory, nonce, errors).
- 7 handler-level tests via httptest.NewServer + mockACMEService (mirrors the mockSCEPService pattern at scep_handler_test.go).
- 7 service-layer tests with mocked repo + injected profileLookup.
- All pass under -race -count=1 -short.

Deferred to Phase 1b:

- JWS verification (go-jose v4; see master-prompt §8a for the API surface and audit doc for the speculation pitfalls).
- new-account / account/<id> endpoints + AccountService.
- Nonce *consumption* path (issue path is in this commit; consume is only invoked by JWS-verified POSTs which Phase 1b adds).

Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-1a". Per-phase implementation plan: cowork/acme-server-prompts/. Master plan + audit fixes: cowork/acme-server-endpoint-prompt.md + cowork/acme-server-prompt-audit.md + cowork/acme-server-prompts/audit-additions.md.
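For orientation, here is a minimal sketch of the kind of directory document the first curl command above would fetch. The JSON field names come from RFC 8555 §7.1.1. The new-nonce path is from this commit and new-account is named as a Phase 1b endpoint; the new-order, revoke-cert, and key-change paths, the profile ID "example", and all type and function names are assumptions for illustration. This is not certctl's actual acme.BuildDirectory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// directoryMeta mirrors the optional "meta" object of RFC 8555 §7.1.1.
type directoryMeta struct {
	TermsOfService          string   `json:"termsOfService,omitempty"`
	Website                 string   `json:"website,omitempty"`
	CAAIdentities           []string `json:"caaIdentities,omitempty"`
	ExternalAccountRequired bool     `json:"externalAccountRequired,omitempty"`
}

// directory holds the endpoint URLs an ACME client discovers first.
type directory struct {
	NewNonce   string         `json:"newNonce"`
	NewAccount string         `json:"newAccount"`
	NewOrder   string         `json:"newOrder"`
	RevokeCert string         `json:"revokeCert"`
	KeyChange  string         `json:"keyChange"`
	Meta       *directoryMeta `json:"meta,omitempty"`
}

// buildDirectory joins each endpoint name onto the per-profile base URL.
// The caller (the HTTP handler, in the commit's design) supplies baseURL.
func buildDirectory(baseURL string, meta *directoryMeta) directory {
	base := strings.TrimRight(baseURL, "/")
	return directory{
		NewNonce:   base + "/new-nonce",
		NewAccount: base + "/new-account",
		NewOrder:   base + "/new-order",   // assumed path
		RevokeCert: base + "/revoke-cert", // assumed path
		KeyChange:  base + "/key-change",  // assumed path
		Meta:       meta,
	}
}

func main() {
	dir := buildDirectory(
		"https://certctl/acme/profile/example", // "example" is a hypothetical profile ID
		&directoryMeta{Website: "https://example.com"},
	)
	out, err := json.MarshalIndent(dir, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Passing the base URL in, rather than deriving it inside the builder, matches the commit's stated choice of keeping URL composition in the handler.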
210 lines
8.2 KiB
Go
// Copyright (c) certctl
// SPDX-License-Identifier: BSL-1.1

package service

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"

	"github.com/shankar0123/certctl/internal/api/acme"
	"github.com/shankar0123/certctl/internal/config"
	"github.com/shankar0123/certctl/internal/domain"
	"github.com/shankar0123/certctl/internal/repository"
)

// ACMERepo is the persistence-layer surface ACMEService consumes for
// nonce + (later phases) account / order / authz / challenge state.
// Phase 1a wires only the nonce path; the interface is tightened in
// Phase 1b along with the AccountService.
//
// Defining the interface in the service package (rather than
// internal/repository/interfaces.go) keeps the cross-phase blast
// radius small: when Phase 1b adds CreateAccountWithTx /
// GetAccountByThumbprint / etc., only this file's interface and the
// concrete postgres ACMERepository move together. Mock implementations
// in tests satisfy this interface without depending on the postgres
// package.
type ACMERepo interface {
	IssueNonce(ctx context.Context, nonce string, ttl time.Duration) error
	ConsumeNonce(ctx context.Context, nonce string) error
}

// profileLookup is the minimum surface ACMEService needs to resolve a
// per-profile request. Defined as an interface (rather than taking a
// concrete *postgres.ProfileRepository) so tests can inject an in-memory
// fake without spinning up Postgres.
type profileLookup interface {
	Get(ctx context.Context, id string) (*domain.CertificateProfile, error)
}

// ACMEService orchestrates the ACME server's RFC 8555 surface. Phase 1a
// implements:
//
//   - BuildDirectory: returns the per-profile directory document.
//   - IssueNonce: returns a Replay-Nonce, persisted with TTL.
//
// Phase 1b will extend with VerifyJWS, NewAccount, LookupAccount,
// UpdateAccount, DeactivateAccount.
//
// The struct deliberately holds raw config rather than per-field
// extracted values; the directory builder uses 4 of the 11 fields
// and reading them lazily keeps the constructor signature tight.
type ACMEService struct {
	repo     ACMERepo
	profiles profileLookup
	cfg      config.ACMEServerConfig
	metrics  *ACMEMetrics
}

// NewACMEService constructs an ACMEService. The constructor matches
// certctl's per-service convention: required dependencies in the
// argument list (repo, profile lookup, config), optional wiring via
// post-construction setters (metrics is wired now to keep the
// Phase-1a-only footprint clean; Phase 1b adds SetTransactor +
// SetAuditService for the JWS-authenticated POST path).
func NewACMEService(repo ACMERepo, profiles profileLookup, cfg config.ACMEServerConfig) *ACMEService {
	return &ACMEService{
		repo:     repo,
		profiles: profiles,
		cfg:      cfg,
		metrics:  NewACMEMetrics(),
	}
}

// Metrics returns the per-op counter snapshotter. cmd/server/main.go
// passes this into MetricsHandler so the Prometheus exposer picks up
// the per-op signals.
func (s *ACMEService) Metrics() *ACMEMetrics { return s.metrics }

// ErrACMEUserActionRequired is returned by BuildDirectory when the
// caller hits the /acme/* shorthand path without
// CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID being set. Handler maps to
// RFC 7807 + RFC 8555 §6.7 userActionRequired.
var ErrACMEUserActionRequired = errors.New("acme: default profile not configured; use /acme/profile/<id>/*")

// ErrACMEProfileNotFound is returned when the profile in the request
// path doesn't exist. Handler maps to HTTP 404 (NOT 500; the
// distinction is operator-meaningful: 404 says "fix your URL," 500
// says "something is wrong server-side").
var ErrACMEProfileNotFound = errors.New("acme: profile not found")

// BuildDirectory constructs the per-profile directory document.
//
// profileID resolution:
//   - non-empty: look up that profile; ErrACMEProfileNotFound on miss.
//   - empty + cfg.DefaultProfileID set: substitute the default.
//   - empty + cfg.DefaultProfileID unset: ErrACMEUserActionRequired.
//
// baseURL is the per-profile base path the directory's URL fields are
// constructed against. The handler computes baseURL from the inbound
// request (scheme + host + /acme/profile/<id>) and passes it in;
// keeping the URL composition in the handler avoids embedding HTTP
// concerns in the service layer.
//
// On success the metrics counter for the directory op increments;
// failures bump the failure variant of the same counter.
func (s *ACMEService) BuildDirectory(ctx context.Context, profileID, baseURL string) (*acme.Directory, error) {
	profileID, err := s.resolveProfile(ctx, profileID)
	if err != nil {
		s.metrics.bump(&s.metrics.DirectoryFailureTotal)
		return nil, err
	}
	dir := acme.BuildDirectory(
		baseURL,
		s.cfg.DirectoryMeta.TermsOfService,
		s.cfg.DirectoryMeta.Website,
		s.cfg.DirectoryMeta.CAAIdentities,
		s.cfg.DirectoryMeta.ExternalAccountRequired,
		// Phase 1a: ARI is non-functional. The Phase 4 commit flips this
		// to true once the renewal-info handler ships.
		false,
	)
	_ = profileID // Phase 1b will use the resolved profile to read
	// acme_auth_mode + record per-profile metrics. Phase 1a
	// only needs the existence check above.
	s.metrics.bump(&s.metrics.DirectoryTotal)
	return dir, nil
}

// IssueNonce generates a fresh ACME nonce, persists it with the
// configured TTL, and returns the encoded string for the
// Replay-Nonce header.
//
// RFC 8555 §6.5: every successful ACME response carries a
// Replay-Nonce. Phase 1a wires this via the directory + new-nonce
// handlers; Phase 1b extends with new-account + account/<id> POST
// responses (the JWS-authenticated paths).
func (s *ACMEService) IssueNonce(ctx context.Context) (string, error) {
	nonce, err := acme.GenerateNonce()
	if err != nil {
		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
		return "", fmt.Errorf("acme: generate nonce: %w", err)
	}
	if err := s.repo.IssueNonce(ctx, nonce, s.cfg.NonceTTL); err != nil {
		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
		return "", fmt.Errorf("acme: persist nonce: %w", err)
	}
	s.metrics.bump(&s.metrics.NewNonceTotal)
	return nonce, nil
}

// resolveProfile applies the default-profile fallback and confirms the
// profile exists. Returns the resolved (canonical) profileID on
// success. Centralizing the resolution here keeps every Phase
// 1a/1b/2/3/4 endpoint's "which profile is this request bound to"
// logic uniform.
func (s *ACMEService) resolveProfile(ctx context.Context, profileID string) (string, error) {
	if profileID == "" {
		if s.cfg.DefaultProfileID == "" {
			return "", ErrACMEUserActionRequired
		}
		profileID = s.cfg.DefaultProfileID
	}
	_, err := s.profiles.Get(ctx, profileID)
	if err != nil {
		if errors.Is(err, repository.ErrNotFound) {
			return "", ErrACMEProfileNotFound
		}
		return "", fmt.Errorf("acme: lookup profile: %w", err)
	}
	return profileID, nil
}

// ACMEMetrics is the per-op counter table for the ACME server. Mirrors
// the IssuanceMetrics / DeployCounters pattern (atomic.Uint64 + a
// Snapshot method that emits stable tuples). Phase 1a tracks just
// directory + new-nonce; subsequent phases add new-account / new-order
// / etc.
type ACMEMetrics struct {
	DirectoryTotal        atomic.Uint64
	DirectoryFailureTotal atomic.Uint64
	NewNonceTotal         atomic.Uint64
	NewNonceFailureTotal  atomic.Uint64
}

// NewACMEMetrics returns a zeroed counter table. Concurrent callers
// can bump counters without external synchronization (atomic.Uint64
// is the synchronization primitive).
func NewACMEMetrics() *ACMEMetrics { return &ACMEMetrics{} }

// bump increments a single atomic counter. Centralized so the call
// sites in BuildDirectory + IssueNonce are uniform.
func (m *ACMEMetrics) bump(c *atomic.Uint64) { c.Add(1) }

// Snapshot emits the current counter values as a map (metric name →
// count). Naming is certctl_acme_<op>_total for successes and
// certctl_acme_<op>_failures_total for failures, per frozen decision
// 0.10 (cardinality discipline), so the Prometheus exposer can lift
// them directly without per-op stringly-typed branching.
func (m *ACMEMetrics) Snapshot() map[string]uint64 {
	return map[string]uint64{
		"certctl_acme_directory_total":          m.DirectoryTotal.Load(),
		"certctl_acme_directory_failures_total": m.DirectoryFailureTotal.Load(),
		"certctl_acme_new_nonce_total":          m.NewNonceTotal.Load(),
		"certctl_acme_new_nonce_failures_total": m.NewNonceFailureTotal.Load(),
	}
}
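The ACMERepo comment notes that test mocks can satisfy the interface without depending on the postgres package. As a rough illustration of what such a fake might look like, the sketch below implements the same two method signatures in memory, with single-use consumption and TTL expiry. The type name memNonceRepo, the sentinel errNonceInvalid, the injectable clock, and the replayDemo helper are all hypothetical; the real repository's error values and expiry semantics are not shown in this file.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errNonceInvalid is a hypothetical sentinel for unknown, expired,
// or already-consumed nonces.
var errNonceInvalid = errors.New("nonce unknown, expired, or already used")

// memNonceRepo is an in-memory stand-in with the same method set as
// the two-method ACMERepo interface.
type memNonceRepo struct {
	mu      sync.Mutex
	expires map[string]time.Time
	now     func() time.Time // injectable clock so tests can advance time
}

func newMemNonceRepo(now func() time.Time) *memNonceRepo {
	return &memNonceRepo{expires: make(map[string]time.Time), now: now}
}

// IssueNonce records the nonce together with its expiry time.
func (r *memNonceRepo) IssueNonce(_ context.Context, nonce string, ttl time.Duration) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.expires[nonce]; dup {
		return fmt.Errorf("nonce %q already issued", nonce)
	}
	r.expires[nonce] = r.now().Add(ttl)
	return nil
}

// ConsumeNonce removes the nonce and succeeds only if it was known
// and had not yet expired. Deleting unconditionally makes it single use.
func (r *memNonceRepo) ConsumeNonce(_ context.Context, nonce string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	exp, ok := r.expires[nonce]
	delete(r.expires, nonce)
	if !ok || !r.now().Before(exp) {
		return errNonceInvalid
	}
	return nil
}

// replayDemo returns the results of a first use, a replay, and an
// expired use, in that order.
func replayDemo() (first, replay, expired error) {
	clock := time.Unix(0, 0)
	repo := newMemNonceRepo(func() time.Time { return clock })
	ctx := context.Background()

	_ = repo.IssueNonce(ctx, "n1", time.Minute)
	first = repo.ConsumeNonce(ctx, "n1")
	replay = repo.ConsumeNonce(ctx, "n1")

	_ = repo.IssueNonce(ctx, "n2", time.Minute)
	clock = clock.Add(2 * time.Minute) // move past the TTL
	expired = repo.ConsumeNonce(ctx, "n2")
	return first, replay, expired
}

func main() {
	first, replay, expired := replayDemo()
	fmt.Println("first use:", first)
	fmt.Println("replay:", replay)
	fmt.Println("expired:", expired)
}
```

A fake like this is suitable only for unit tests: as the commit message points out, an in-memory store loses every outstanding nonce on restart, which is why the shipped implementation is DB-backed.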