refactor(service/acme): extract orders concern to sibling file (Phase 9, 9b — deferred half of Sprint 9)

Phase 9 ARCH-M2 closure Sprint 9b — the orders cut Sprint 9
explicitly deferred. Closes the bigger half of the
internal/service/acme.go split via the Option B sibling-file pattern
(operator's post-Sprint-8 choice — package stays `service`, no
import-path churn for ~70 call sites).

Why Sprint 9b is a separate commit from Sprint 9
================================================
Sprint 9 shipped four cuts whose source ranges were each a single
contiguous region in acme.go (nonces, authz, challenges, gc — line
ranges 423-444 / 999-1018 / 1326-1561 / 1914-1965 at audit time).
Sprint 9b crosses a different shape:
  1. Non-contiguous source: orders block A (lines 795-1223 pre-cut)
     + helpers block B (1237-1283 pre-cut), with
     firstAvailableIssuer at 1227-1235 staying behind because it's
     called from Phase 4 RevokeCert + RenewalInfo too.
  2. Per-helper move-vs-stay decision: each helper in the
     post-FinalizeOrder cluster needed an explicit call-graph audit
     to decide whether it moves with orders or stays with the
     surviving cross-concern surface in acme.go.

Same shape as the Sprint 8 / Sprint 8b split (mechanical vs harder-
shape on separate commits) — the Phase 9 prompt's "do not bundle"
rule enforcing itself.

What moved
==========

New `internal/service/acme_orders.go` (540 LOC)
-----------------------------------------------
Contains the entire Phase 2 orders concern:
  - The `// --- Phase 2 — orders + authz + finalize + cert download`
    banner (moves with its contents, not left as a phantom in
    acme.go pointing at code that's no longer there).
  - The four public order methods: CreateOrder, LookupOrder,
    FinalizeOrder, LookupCertificate.
  - The FinalizeOrderResult shape (consumed only by FinalizeOrder
    callers).
  - accountOwnsACMECert (only callsite: LookupCertificate).
  - The three orders-internal ID helpers: randIDSuffix +
    base32encode (random ACME entity IDs) + identifierStrings
    (audit details).

Per-helper move-vs-stay analysis
================================
Grep against the post-Sprint-9 tree pinned every helper's call sites
before the cut decision:

  randIDSuffix:           callers in CreateOrder (4x) + FinalizeOrder
                          (1x) — all moving. MOVE.
  base32encode:           only caller is randIDSuffix. MOVE.
  identifierStrings:      only caller is CreateOrder. MOVE.
  accountOwnsACMECert:    only caller is LookupCertificate. MOVE.

  firstAvailableIssuer:   three call sites — FinalizeOrder (moving),
                          RevokeCert (staying, Phase 4), RenewalInfo
                          (staying, Phase 4). STAY in acme.go.
                          Doc-comment updated to flag cross-concern
                          status + explain why it's not moved.
  mapACMERevocationReason: only caller is RevokeCert. STAY (already
                          sits in the Phase 4 region of acme.go and
                          belongs with its sole caller).
  jwksThumbprintsEqualSvc: only caller is RotateAccountKey. STAY
                          (Phase 4 helper; never had an orders
                          relationship).

Side effect: import cleanup
===========================
With randIDSuffix moved, acme.go no longer references crypto/rand.
The `cryptorand "crypto/rand"` aliased import is removed.
Per-symbol audit confirmed every other import (context, crypto/x509,
errors, fmt, strings, sync/atomic, time, jose, internal/api/acme,
internal/config, internal/domain, internal/repository) is still
consumed by surviving code in acme.go.

Net effect
==========
acme.go: 1634 → 1158 LOC pre-doc-update; 1162 LOC post the four-line
firstAvailableIssuer doc-comment refresh (-472 net, -28.9% from the
post-Sprint-9 size). Original audit-time size was 1965 LOC; cumulative
Sprint-9 + Sprint-9b reduction: 1965 → 1162 = -803 LOC (-40.9%).
The biggest single backend hotspot from the audit is now smaller
than mcp/tools.go.

Behavior preservation contract
==============================
1. gofmt -l clean across acme.go + acme_orders.go.
2. go vet ./internal/service/... — no findings.
3. staticcheck ./internal/service/... ./cmd/server/...
   ./internal/api/handler/... ./internal/scheduler/...
   ./internal/mcp/... — no findings.
4. go test -short -count=1 ./internal/service/... — green
   (including the orderTrackingRepo + TestCreateOrder_* +
   TestFinalizeOrder_* + TestLookupCertificate_* surface that
   pins the moved code's behavior).
5. Broader-importer suite green:
   go test -short -count=1 ./cmd/server/... ./internal/api/handler/...
                          ./internal/scheduler/...
6. Per-symbol import audit on both files (no unused imports left,
   no missing imports introduced).

Same-package resolution means every call inside FinalizeOrder /
RevokeCert / RenewalInfo to firstAvailableIssuer crosses a file
boundary but stays within `package service` — zero overhead at
compile time, zero change to the public method-set on
service.ACMEService.

What remains for Phase 9
========================
Three sibling-file splits queued for Sprints 10-12:
  - Sprint 10: internal/mcp/tools.go (1867 LOC) grouped by tool
    domain (certificate / agent / job / discovery / admin).
  - Sprint 11: internal/api/handler/auth_session_oidc.go (1577 LOC)
    split per handler verb.
  - Sprint 12: cmd/agent/main.go (1489 LOC) mirroring the cmd/server
    pattern from Sprint 8.

Refs: ARCH-M2 (god-files), Phase 9 audit. Sprint 9b is the named
follow-on to Sprint 9; after this commit, the service-layer cut from
the audit's hotspot list is fully closed.
This commit is contained in:
shankar0123
2026-05-14 10:06:06 +00:00
parent b503d27b4f
commit b1fa4970be
2 changed files with 546 additions and 478 deletions
+6 -478
View File
@@ -5,7 +5,6 @@ package service
import (
"context"
cryptorand "crypto/rand"
"crypto/x509"
"errors"
"fmt"
@@ -792,437 +791,13 @@ func (s *ACMEService) DeactivateAccount(ctx context.Context, accountID string) (
return acct, nil
}
// --- Phase 2 — orders + authz + finalize + cert download ---------------
// CreateOrder validates a new-order request against the bound profile
// and persists the order + per-identifier authz + per-authz challenge
// rows in one WithinTx. Returns the created order on success.
//
// Auth-mode dispatch:
// - trust_authenticated (default): order goes immediately to status=ready,
// each authz immediately to status=valid (no challenge validation
// required); a single placeholder http-01 challenge per authz is
// persisted with status=valid for RFC 8555 compliance (the spec
// requires challenges on every authz).
// - challenge: order stays at status=pending, authzs at status=pending,
// challenges at status=pending, until Phase 3's validators run.
func (s *ACMEService) CreateOrder(
ctx context.Context,
accountID, profileID string,
identifiers []domain.ACMEIdentifier,
notBefore, notAfter *time.Time,
) (*domain.ACMEOrder, error) {
if s.tx == nil || s.auditService == nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, fmt.Errorf("acme: new-order requires SetTransactor + SetAuditService")
}
// Phase 5 — per-account orders/hour cap. Hits return rateLimited
// (RFC 8555 §6.7) before any DB work. Counter is in-memory; restart
// wipes (eventual-consistency caps are acceptable).
if s.rateLimiter != nil && s.cfg.RateLimitOrdersPerHour > 0 {
if !s.rateLimiter.Allow(acme.ActionNewOrder, accountID, s.cfg.RateLimitOrdersPerHour) {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, ErrACMERateLimited
}
}
// Phase 5 — concurrent-orders cap. We count
// pending/ready/processing orders for this account; if at-or-over
// the cap, reject. This is a DB read (no FOR UPDATE), so two
// requests racing under the threshold can both succeed and push
// the account one over — accepted as eventual-consistency.
if s.cfg.RateLimitConcurrentOrders > 0 {
count, cerr := s.repo.CountActiveOrdersByAccount(ctx, accountID)
if cerr == nil && count >= s.cfg.RateLimitConcurrentOrders {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, ErrACMEConcurrentOrdersExceeded
}
}
resolvedProfileID, err := s.resolveProfile(ctx, profileID)
if err != nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, err
}
profile, err := s.profiles.Get(ctx, resolvedProfileID)
if err != nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, fmt.Errorf("acme: lookup profile: %w", err)
}
authMode := profile.ACMEAuthMode
if authMode == "" {
authMode = string(s.cfg.DefaultAuthMode)
}
if authMode == "" {
authMode = "trust_authenticated"
}
if authMode != "trust_authenticated" && authMode != "challenge" {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, fmt.Errorf("%w: %q", ErrACMEUnsupportedAuthMode, authMode)
}
now := time.Now().UTC()
orderTTL := s.cfg.OrderTTL
if orderTTL <= 0 {
orderTTL = 24 * time.Hour
}
authzTTL := s.cfg.AuthzTTL
if authzTTL <= 0 {
authzTTL = 24 * time.Hour
}
// In trust_authenticated mode, the order goes straight to `ready`
// (RFC 8555 §7.1.6: ready means all authzs valid, awaiting CSR).
// In challenge mode, the order stays `pending` until challenges
// validate.
orderStatus := domain.ACMEOrderStatusPending
authzStatus := domain.ACMEAuthzStatusPending
challengeStatus := domain.ACMEChallengeStatusPending
if authMode == "trust_authenticated" {
orderStatus = domain.ACMEOrderStatusReady
authzStatus = domain.ACMEAuthzStatusValid
challengeStatus = domain.ACMEChallengeStatusValid
}
order := &domain.ACMEOrder{
OrderID: "acme-ord-" + randIDSuffix(),
AccountID: accountID,
Identifiers: identifiers,
Status: orderStatus,
ExpiresAt: now.Add(orderTTL),
NotBefore: notBefore,
NotAfter: notAfter,
CreatedAt: now,
UpdatedAt: now,
}
auditDetails := map[string]interface{}{
"account_id": accountID,
"profile_id": resolvedProfileID,
"auth_mode": authMode,
"identifier_n": len(identifiers),
"identifiers": identifierStrings(identifiers),
}
err = s.tx.WithinTx(ctx, func(q repository.Querier) error {
if err := s.repo.CreateOrderWithTx(ctx, q, order); err != nil {
return fmt.Errorf("acme: create order: %w", err)
}
// Per-identifier authz + 1 placeholder challenge per authz.
for _, id := range identifiers {
authz := &domain.ACMEAuthorization{
AuthzID: "acme-authz-" + randIDSuffix(),
OrderID: order.OrderID,
Identifier: id,
Status: authzStatus,
ExpiresAt: now.Add(authzTTL),
Wildcard: strings.HasPrefix(id.Value, "*."),
CreatedAt: now,
UpdatedAt: now,
}
if err := s.repo.CreateAuthzWithTx(ctx, q, authz); err != nil {
return fmt.Errorf("acme: create authz: %w", err)
}
// RFC 8555 §8: every authz needs at least one challenge
// row. Phase 2 emits a single http-01 placeholder; Phase 3
// will fan out to all 3 challenge types under challenge mode.
ch := &domain.ACMEChallenge{
ChallengeID: "acme-chall-" + randIDSuffix(),
AuthzID: authz.AuthzID,
Type: domain.ACMEChallengeTypeHTTP01,
Status: challengeStatus,
Token: randIDSuffix(),
CreatedAt: now,
}
if challengeStatus == domain.ACMEChallengeStatusValid {
validatedAt := now
ch.ValidatedAt = &validatedAt
}
if err := s.repo.CreateChallengeWithTx(ctx, q, ch); err != nil {
return fmt.Errorf("acme: create challenge: %w", err)
}
}
return s.auditService.RecordEventWithTx(
ctx, q,
fmt.Sprintf("acme:%s", accountID),
domain.ActorTypeUser,
"acme_order_created",
"acme_order",
order.OrderID,
auditDetails,
)
})
if err != nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, err
}
s.metrics.bump(&s.metrics.NewOrderTotal)
return order, nil
}
// LookupOrder returns an order by ID, asserting the requesting
// account owns it. ErrACMEOrderUnauthorized when account_id mismatches.
func (s *ACMEService) LookupOrder(ctx context.Context, orderID, accountID string) (*domain.ACMEOrder, error) {
order, err := s.repo.GetOrderByID(ctx, orderID)
if err != nil {
if errors.Is(err, repository.ErrNotFound) {
return nil, ErrACMEOrderNotFound
}
return nil, fmt.Errorf("acme: lookup order: %w", err)
}
if order.AccountID != accountID {
return nil, ErrACMEOrderUnauthorized
}
return order, nil
}
// FinalizeOrderResult bundles the post-finalize state the handler
// needs: the updated order + the cert ID for the cert-download URL.
type FinalizeOrderResult struct {
Order *domain.ACMEOrder
CertID string
}
// FinalizeOrder consumes a CSR, asserts it matches the order's
// identifiers, issues via the IssuerRegistry's per-profile connector,
// persists the managed_certificates row + version + audit, and
// transitions the order to status=valid with certificate_id set.
//
// Atomicity boundary (documented in the master prompt):
// - Step A (this function's own WithinTx): order status pending →
// processing + audit row.
// - Step B (CertificateService.Create): managed_certificates row +
// audit row in its own WithinTx.
// - Step C (this function's own WithinTx): certificate_versions row
// - order status processing → valid + certificate_id + csr_pem +
// audit row.
//
// The window between Step B and Step C can leave a managed_certificates
// row whose order is still in `processing`. Phase 5's GC scheduler
// reconciles. Documented in the project's ACME-server design notes + the
// service file's design notes.
func (s *ACMEService) FinalizeOrder(
ctx context.Context,
accountID, orderID, profileID string,
csr *x509.CertificateRequest,
csrPEM string,
) (*FinalizeOrderResult, error) {
if s.certService == nil || s.certRepo == nil || s.issuerRegistry == nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, ErrACMEFinalizeUnconfigured
}
if s.tx == nil || s.auditService == nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: finalize requires SetTransactor + SetAuditService")
}
order, err := s.LookupOrder(ctx, orderID, accountID)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
if order.Status != domain.ACMEOrderStatusReady && order.Status != domain.ACMEOrderStatusProcessing {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("%w: status=%s", ErrACMEOrderNotReady, order.Status)
}
// Idempotent re-finalize (RFC 8555 §7.4): if the order is already
// valid, return the existing result.
if order.Status == domain.ACMEOrderStatusValid && order.CertificateID != "" {
s.metrics.bump(&s.metrics.FinalizeOrderTotal)
return &FinalizeOrderResult{Order: order, CertID: order.CertificateID}, nil
}
// Validate CSR matches order identifiers.
if p := acme.CSRMatchesIdentifiers(csr, order.Identifiers); p != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
// Persist the failure on the order for client visibility.
order.Status = domain.ACMEOrderStatusInvalid
order.Error = &domain.ACMEProblem{Type: p.Type, Detail: p.Detail, Status: p.Status}
_ = s.tx.WithinTx(ctx, func(q repository.Querier) error {
return s.repo.UpdateOrderWithTx(ctx, q, order)
})
return nil, fmt.Errorf("acme: csr mismatch: %s", p.Detail)
}
resolvedProfileID, err := s.resolveProfile(ctx, profileID)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
profile, err := s.profiles.Get(ctx, resolvedProfileID)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: lookup profile: %w", err)
}
// Step A: mark order processing.
order.Status = domain.ACMEOrderStatusProcessing
if err := s.tx.WithinTx(ctx, func(q repository.Querier) error {
if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
return err
}
return s.auditService.RecordEventWithTx(ctx, q,
fmt.Sprintf("acme:%s", accountID), domain.ActorTypeUser,
"acme_order_processing", "acme_order", order.OrderID,
map[string]interface{}{"profile_id": resolvedProfileID})
}); err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
// Step B: issue the cert via the per-issuer connector + persist
// the managed_certificates row.
commonName := csr.Subject.CommonName
if commonName == "" && len(order.Identifiers) > 0 {
commonName = order.Identifiers[0].Value
}
sans := make([]string, 0, len(order.Identifiers))
for _, id := range order.Identifiers {
if id.Type == "dns" {
sans = append(sans, id.Value)
}
}
// Resolve the bound issuer. Profile carries no IssuerID column
// (issuer is per-issuance per certctl architecture), so we'd
// normally get it from the order context. For Phase 2 we use the
// configured default issuer-id for the first registered connector.
// Operators with multiple profiles + multiple issuers will refine
// this in a follow-up.
issuerID, conn, ok := s.firstAvailableIssuer()
if !ok {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: no issuer available in registry")
}
maxTTL := profile.MaxTTLSeconds
mustStaple := profile.MustStaple
ekus := profile.AllowedEKUs
if len(ekus) == 0 {
ekus = domain.DefaultEKUs()
}
issuance, err := conn.IssueCertificate(ctx, commonName, sans, csrPEM, ekus, maxTTL, mustStaple)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
// Persist the failure on the order.
order.Status = domain.ACMEOrderStatusInvalid
order.Error = &domain.ACMEProblem{
Type: "urn:ietf:params:acme:error:serverInternal",
Detail: "issuer rejected the CSR",
Status: 500,
}
_ = s.tx.WithinTx(ctx, func(q repository.Querier) error {
return s.repo.UpdateOrderWithTx(ctx, q, order)
})
return nil, fmt.Errorf("acme: issuer issuance: %w", err)
}
cert := &domain.ManagedCertificate{
ID: "mc-acme-" + randIDSuffix(),
Name: fmt.Sprintf("acme-%s", order.OrderID),
CommonName: commonName,
SANs: sans,
IssuerID: issuerID,
CertificateProfileID: profile.ID,
Status: domain.CertificateStatusActive,
ExpiresAt: issuance.NotAfter,
Source: domain.CertificateSourceACME,
}
actor := fmt.Sprintf("acme:%s", accountID)
if err := s.certService.Create(ctx, cert, actor); err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: cert insert: %w", err)
}
// Step C: persist the certificate version + transition order to
// valid in one WithinTx.
version := &domain.CertificateVersion{
CertificateID: cert.ID,
SerialNumber: issuance.Serial,
NotBefore: issuance.NotBefore,
NotAfter: issuance.NotAfter,
PEMChain: issuance.CertPEM + issuance.ChainPEM,
CSRPEM: csrPEM,
}
order.Status = domain.ACMEOrderStatusValid
order.CSRPEM = csrPEM
order.CertificateID = cert.ID
order.Error = nil
if err := s.tx.WithinTx(ctx, func(q repository.Querier) error {
if err := s.certRepo.CreateVersionWithTx(ctx, q, version); err != nil {
return err
}
if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
return err
}
return s.auditService.RecordEventWithTx(ctx, q, actor, domain.ActorTypeUser,
"acme_order_finalized", "acme_order", order.OrderID,
map[string]interface{}{
"profile_id": resolvedProfileID,
"certificate_id": cert.ID,
"serial": issuance.Serial,
})
}); err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
s.metrics.bump(&s.metrics.FinalizeOrderTotal)
return &FinalizeOrderResult{Order: order, CertID: cert.ID}, nil
}
// LookupCertificate returns the PEM chain for a managed-certificate
// ID. Asserts the requesting account owns the cert via the order
// linkage. Phase 2: the caller (Cert handler) provides the cert ID
// from the URL path; we look up the cert + the latest version + the
// order that produced it, and confirm order.AccountID == accountID.
func (s *ACMEService) LookupCertificate(ctx context.Context, certID, accountID string) (string, error) {
if s.certRepo == nil {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMEFinalizeUnconfigured
}
cert, err := s.certRepo.Get(ctx, certID)
if err != nil {
if errors.Is(err, repository.ErrNotFound) {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMECertificateNotFound
}
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", fmt.Errorf("acme: get cert: %w", err)
}
if cert.Source != domain.CertificateSourceACME {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMECertificateNotFound
}
// Confirm an order owned by this account references this cert.
if !s.accountOwnsACMECert(ctx, accountID, certID) {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMEOrderUnauthorized
}
version, err := s.certRepo.GetLatestVersion(ctx, certID)
if err != nil {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", fmt.Errorf("acme: latest version: %w", err)
}
s.metrics.bump(&s.metrics.CertDownloadTotal)
return version.PEMChain, nil
}
// accountOwnsACMECert returns true when the given account has an
// order linking to certID. Implemented by linear scan via the
// existing repo; Phase 5's GC will add an index if the table grows.
func (s *ACMEService) accountOwnsACMECert(ctx context.Context, accountID, certID string) bool {
// Phase 2 minimal-viable path: use order.GetByCertificateID via a
// dedicated repo method would be ideal, but we don't have it.
// Instead, accept the cert if its CertificateService.Create was
// performed in the FinalizeOrder path (which always pairs with
// this account). We trust the cert.Source = ACME + the URL path
// scoping (operator can't construct an ACME cert without going
// through finalize) for Phase 2; Phase 4's revocation path will
// add a stricter ownership check via a new repo method.
_ = ctx
_ = accountID
_ = certID
return true
}
// firstAvailableIssuer returns the (id, connector) pair for the first
// registered issuer. Phase 2 uses this as the bound issuer; the
// registered issuer. Cross-concern helper: called from Phase 2
// FinalizeOrder (acme_orders.go, post-Sprint-9b) AND Phase 4
// RevokeCert + RenewalInfo (below in this file). Kept in acme.go so
// it's adjacent to two of its three callers and reachable from
// acme_orders.go via Go's same-package scope without dragging the
// helper into a third "shared helpers" sibling. The
// per-profile-issuer mapping arrives in a follow-up.
func (s *ACMEService) firstAvailableIssuer() (string, IssuerConnector, bool) {
if s.issuerRegistry == nil {
@@ -1234,53 +809,6 @@ func (s *ACMEService) firstAvailableIssuer() (string, IssuerConnector, bool) {
return "", nil, false
}
// randIDSuffix returns a short base32-encoded random suffix used for
// new ACME entity IDs (orders, authzs, challenges). Distinct from
// the account-id derivation (which uses the JWK thumbprint for RFC
// 8555 §7.3.1 idempotency).
func randIDSuffix() string {
var b [10]byte
if _, err := cryptorand.Read(b[:]); err != nil {
// ed25519/rand source failure is fatal; surface as a panic
// rather than continue with weak IDs.
panic(fmt.Sprintf("acme: rand source failure: %v", err))
}
return base32encode(b[:])
}
// base32encode emits the lowercase Crockford-style base32 alphabet
// without padding. Used by randIDSuffix; alphabet matches the
// per-id-prefix human-readable convention (acme-acc-, acme-ord-,
// etc.) — see the project's "TEXT primary keys with human-readable
// prefixes" architecture decision.
func base32encode(b []byte) string {
const alpha = "0123456789abcdefghjkmnpqrstvwxyz"
out := make([]byte, 0, len(b)*8/5+1)
var buf uint64
bits := uint(0)
for _, c := range b {
buf = (buf << 8) | uint64(c)
bits += 8
for bits >= 5 {
bits -= 5
out = append(out, alpha[(buf>>bits)&0x1f])
}
}
if bits > 0 {
out = append(out, alpha[(buf<<(5-bits))&0x1f])
}
return string(out)
}
// identifierStrings extracts the value list for audit details.
func identifierStrings(ids []domain.ACMEIdentifier) []string {
out := make([]string, 0, len(ids))
for _, id := range ids {
out = append(out, id.Value)
}
return out
}
// --- Phase 4 — key rollover + revocation + ARI -------------------------
// RotateAccountKey is the service-layer entry point for RFC 8555
+540
View File
@@ -0,0 +1,540 @@
// Copyright 2026 certctl LLC. All rights reserved.
// SPDX-License-Identifier: BUSL-1.1
package service
import (
"context"
cryptorand "crypto/rand"
"crypto/x509"
"errors"
"fmt"
"strings"
"time"
"github.com/certctl-io/certctl/internal/api/acme"
"github.com/certctl-io/certctl/internal/domain"
"github.com/certctl-io/certctl/internal/repository"
)
// Phase 9 ARCH-M2 closure Sprint 9b (2026-05-14): the deferred half of
// Sprint 9. Extracted from internal/service/acme.go via the Option B
// sibling-file pattern. Package stays `service`; every external caller
// of `service.ACMEService.{CreateOrder,LookupOrder,FinalizeOrder,
// LookupCertificate}` resolves the same way — pure mechanical
// relocation.
//
// What lives here
// ===============
// The Phase 2 orders concern in full: the order CRUD methods
// (CreateOrder + LookupOrder + FinalizeOrder + LookupCertificate),
// the FinalizeOrderResult shape, the order-scoped accountOwnsACMECert
// ownership check, and the three orders-internal ID helpers
// (randIDSuffix + base32encode for generating acme-ord-* / acme-authz-* /
// acme-chall-* / mc-acme-* prefixes per the project's TEXT-primary-keys
// architecture decision; identifierStrings for audit detail rendering).
//
// What stays in acme.go (cross-concern by helper-call analysis)
// =============================================================
// firstAvailableIssuer remains in acme.go. Three call sites consume
// it: FinalizeOrder (here in acme_orders.go) AND Phase 4 RevokeCert
// + RenewalInfo (both in acme.go). Moving it here would leave Phase 4
// reaching across a sibling-file boundary for a single helper; leaving
// it in acme.go keeps it adjacent to its other two callers while still
// staying reachable from this file via Go's same-package scope. The
// alternative (a third "shared helpers" sibling) costs an extra file
// for one helper — not worth the indirection.
//
// mapACMERevocationReason stays in acme.go too. It's used exclusively
// by Phase 4 RevokeCert. Despite sitting in the orders helper cluster
// in audit notes (because of its alphabetical-adjacency to the other
// helpers in the audit-time grep), the actual call graph puts it
// firmly on the Phase 4 side.
//
// Sprint 9 vs Sprint 9b
// =====================
// Sprint 9 (commit b503d27b) shipped nonces + authz + challenges + gc
// — four files, 432 LOC moved, all single-contiguous-region cuts.
// Sprint 9b crosses the harder boundary the original sprint deferred:
// a ~476-LOC two-block cut (orders block A + helpers block B with
// firstAvailableIssuer's 14 lines between them staying behind) plus
// the per-helper move-vs-stay decision documented above. Splitting
// 9 from 9b keeps the four contiguous cuts on one commit and the
// non-contiguous cut on its own, mirroring the Sprint 8 / Sprint 8b
// pattern (mechanical vs harder-shape, separate review windows).
// --- Phase 2 — orders + authz + finalize + cert download ---------------
// CreateOrder validates a new-order request against the bound profile
// and persists the order + per-identifier authz + per-authz challenge
// rows in one WithinTx. Returns the created order on success.
//
// Auth-mode dispatch:
// - trust_authenticated (default): order goes immediately to status=ready,
// each authz immediately to status=valid (no challenge validation
// required); a single placeholder http-01 challenge per authz is
// persisted with status=valid for RFC 8555 compliance (the spec
// requires challenges on every authz).
// - challenge: order stays at status=pending, authzs at status=pending,
// challenges at status=pending, until Phase 3's validators run.
func (s *ACMEService) CreateOrder(
ctx context.Context,
accountID, profileID string,
identifiers []domain.ACMEIdentifier,
notBefore, notAfter *time.Time,
) (*domain.ACMEOrder, error) {
if s.tx == nil || s.auditService == nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, fmt.Errorf("acme: new-order requires SetTransactor + SetAuditService")
}
// Phase 5 — per-account orders/hour cap. Hits return rateLimited
// (RFC 8555 §6.7) before any DB work. Counter is in-memory; restart
// wipes (eventual-consistency caps are acceptable).
if s.rateLimiter != nil && s.cfg.RateLimitOrdersPerHour > 0 {
if !s.rateLimiter.Allow(acme.ActionNewOrder, accountID, s.cfg.RateLimitOrdersPerHour) {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, ErrACMERateLimited
}
}
// Phase 5 — concurrent-orders cap. We count
// pending/ready/processing orders for this account; if at-or-over
// the cap, reject. This is a DB read (no FOR UPDATE), so two
// requests racing under the threshold can both succeed and push
// the account one over — accepted as eventual-consistency.
if s.cfg.RateLimitConcurrentOrders > 0 {
count, cerr := s.repo.CountActiveOrdersByAccount(ctx, accountID)
if cerr == nil && count >= s.cfg.RateLimitConcurrentOrders {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, ErrACMEConcurrentOrdersExceeded
}
}
resolvedProfileID, err := s.resolveProfile(ctx, profileID)
if err != nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, err
}
profile, err := s.profiles.Get(ctx, resolvedProfileID)
if err != nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, fmt.Errorf("acme: lookup profile: %w", err)
}
authMode := profile.ACMEAuthMode
if authMode == "" {
authMode = string(s.cfg.DefaultAuthMode)
}
if authMode == "" {
authMode = "trust_authenticated"
}
if authMode != "trust_authenticated" && authMode != "challenge" {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, fmt.Errorf("%w: %q", ErrACMEUnsupportedAuthMode, authMode)
}
now := time.Now().UTC()
orderTTL := s.cfg.OrderTTL
if orderTTL <= 0 {
orderTTL = 24 * time.Hour
}
authzTTL := s.cfg.AuthzTTL
if authzTTL <= 0 {
authzTTL = 24 * time.Hour
}
// In trust_authenticated mode, the order goes straight to `ready`
// (RFC 8555 §7.1.6: ready means all authzs valid, awaiting CSR).
// In challenge mode, the order stays `pending` until challenges
// validate.
orderStatus := domain.ACMEOrderStatusPending
authzStatus := domain.ACMEAuthzStatusPending
challengeStatus := domain.ACMEChallengeStatusPending
if authMode == "trust_authenticated" {
orderStatus = domain.ACMEOrderStatusReady
authzStatus = domain.ACMEAuthzStatusValid
challengeStatus = domain.ACMEChallengeStatusValid
}
order := &domain.ACMEOrder{
OrderID: "acme-ord-" + randIDSuffix(),
AccountID: accountID,
Identifiers: identifiers,
Status: orderStatus,
ExpiresAt: now.Add(orderTTL),
NotBefore: notBefore,
NotAfter: notAfter,
CreatedAt: now,
UpdatedAt: now,
}
auditDetails := map[string]interface{}{
"account_id": accountID,
"profile_id": resolvedProfileID,
"auth_mode": authMode,
"identifier_n": len(identifiers),
"identifiers": identifierStrings(identifiers),
}
err = s.tx.WithinTx(ctx, func(q repository.Querier) error {
if err := s.repo.CreateOrderWithTx(ctx, q, order); err != nil {
return fmt.Errorf("acme: create order: %w", err)
}
// Per-identifier authz + 1 placeholder challenge per authz.
for _, id := range identifiers {
authz := &domain.ACMEAuthorization{
AuthzID: "acme-authz-" + randIDSuffix(),
OrderID: order.OrderID,
Identifier: id,
Status: authzStatus,
ExpiresAt: now.Add(authzTTL),
Wildcard: strings.HasPrefix(id.Value, "*."),
CreatedAt: now,
UpdatedAt: now,
}
if err := s.repo.CreateAuthzWithTx(ctx, q, authz); err != nil {
return fmt.Errorf("acme: create authz: %w", err)
}
// RFC 8555 §8: every authz needs at least one challenge
// row. Phase 2 emits a single http-01 placeholder; Phase 3
// will fan out to all 3 challenge types under challenge mode.
ch := &domain.ACMEChallenge{
ChallengeID: "acme-chall-" + randIDSuffix(),
AuthzID: authz.AuthzID,
Type: domain.ACMEChallengeTypeHTTP01,
Status: challengeStatus,
Token: randIDSuffix(),
CreatedAt: now,
}
if challengeStatus == domain.ACMEChallengeStatusValid {
validatedAt := now
ch.ValidatedAt = &validatedAt
}
if err := s.repo.CreateChallengeWithTx(ctx, q, ch); err != nil {
return fmt.Errorf("acme: create challenge: %w", err)
}
}
return s.auditService.RecordEventWithTx(
ctx, q,
fmt.Sprintf("acme:%s", accountID),
domain.ActorTypeUser,
"acme_order_created",
"acme_order",
order.OrderID,
auditDetails,
)
})
if err != nil {
s.metrics.bump(&s.metrics.NewOrderFailureTotal)
return nil, err
}
s.metrics.bump(&s.metrics.NewOrderTotal)
return order, nil
}
// LookupOrder returns an order by ID, asserting the requesting
// account owns it. ErrACMEOrderUnauthorized when account_id mismatches.
func (s *ACMEService) LookupOrder(ctx context.Context, orderID, accountID string) (*domain.ACMEOrder, error) {
order, err := s.repo.GetOrderByID(ctx, orderID)
if err != nil {
if errors.Is(err, repository.ErrNotFound) {
return nil, ErrACMEOrderNotFound
}
return nil, fmt.Errorf("acme: lookup order: %w", err)
}
if order.AccountID != accountID {
return nil, ErrACMEOrderUnauthorized
}
return order, nil
}
// FinalizeOrderResult bundles the post-finalize state the handler
// needs: the updated order + the cert ID for the cert-download URL.
type FinalizeOrderResult struct {
Order *domain.ACMEOrder
CertID string
}
// FinalizeOrder consumes a CSR, asserts it matches the order's
// identifiers, issues via the IssuerRegistry's per-profile connector,
// persists the managed_certificates row + version + audit, and
// transitions the order to status=valid with certificate_id set.
//
// Atomicity boundary (documented in the master prompt):
// - Step A (this function's own WithinTx): order status pending →
// processing + audit row.
// - Step B (CertificateService.Create): managed_certificates row +
// audit row in its own WithinTx.
// - Step C (this function's own WithinTx): certificate_versions row
// - order status processing → valid + certificate_id + csr_pem +
// audit row.
//
// The window between Step B and Step C can leave a managed_certificates
// row whose order is still in `processing`. Phase 5's GC scheduler
// reconciles. Documented in the project's ACME-server design notes + the
// service file's design notes.
func (s *ACMEService) FinalizeOrder(
ctx context.Context,
accountID, orderID, profileID string,
csr *x509.CertificateRequest,
csrPEM string,
) (*FinalizeOrderResult, error) {
if s.certService == nil || s.certRepo == nil || s.issuerRegistry == nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, ErrACMEFinalizeUnconfigured
}
if s.tx == nil || s.auditService == nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: finalize requires SetTransactor + SetAuditService")
}
order, err := s.LookupOrder(ctx, orderID, accountID)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
if order.Status != domain.ACMEOrderStatusReady && order.Status != domain.ACMEOrderStatusProcessing {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("%w: status=%s", ErrACMEOrderNotReady, order.Status)
}
// Idempotent re-finalize (RFC 8555 §7.4): if the order is already
// valid, return the existing result.
if order.Status == domain.ACMEOrderStatusValid && order.CertificateID != "" {
s.metrics.bump(&s.metrics.FinalizeOrderTotal)
return &FinalizeOrderResult{Order: order, CertID: order.CertificateID}, nil
}
// Validate CSR matches order identifiers.
if p := acme.CSRMatchesIdentifiers(csr, order.Identifiers); p != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
// Persist the failure on the order for client visibility.
order.Status = domain.ACMEOrderStatusInvalid
order.Error = &domain.ACMEProblem{Type: p.Type, Detail: p.Detail, Status: p.Status}
_ = s.tx.WithinTx(ctx, func(q repository.Querier) error {
return s.repo.UpdateOrderWithTx(ctx, q, order)
})
return nil, fmt.Errorf("acme: csr mismatch: %s", p.Detail)
}
resolvedProfileID, err := s.resolveProfile(ctx, profileID)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
profile, err := s.profiles.Get(ctx, resolvedProfileID)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: lookup profile: %w", err)
}
// Step A: mark order processing.
order.Status = domain.ACMEOrderStatusProcessing
if err := s.tx.WithinTx(ctx, func(q repository.Querier) error {
if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
return err
}
return s.auditService.RecordEventWithTx(ctx, q,
fmt.Sprintf("acme:%s", accountID), domain.ActorTypeUser,
"acme_order_processing", "acme_order", order.OrderID,
map[string]interface{}{"profile_id": resolvedProfileID})
}); err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
// Step B: issue the cert via the per-issuer connector + persist
// the managed_certificates row.
commonName := csr.Subject.CommonName
if commonName == "" && len(order.Identifiers) > 0 {
commonName = order.Identifiers[0].Value
}
sans := make([]string, 0, len(order.Identifiers))
for _, id := range order.Identifiers {
if id.Type == "dns" {
sans = append(sans, id.Value)
}
}
// Resolve the bound issuer. Profile carries no IssuerID column
// (issuer is per-issuance per certctl architecture), so we'd
// normally get it from the order context. For Phase 2 we use the
// configured default issuer-id for the first registered connector.
// Operators with multiple profiles + multiple issuers will refine
// this in a follow-up.
issuerID, conn, ok := s.firstAvailableIssuer()
if !ok {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: no issuer available in registry")
}
maxTTL := profile.MaxTTLSeconds
mustStaple := profile.MustStaple
ekus := profile.AllowedEKUs
if len(ekus) == 0 {
ekus = domain.DefaultEKUs()
}
issuance, err := conn.IssueCertificate(ctx, commonName, sans, csrPEM, ekus, maxTTL, mustStaple)
if err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
// Persist the failure on the order.
order.Status = domain.ACMEOrderStatusInvalid
order.Error = &domain.ACMEProblem{
Type: "urn:ietf:params:acme:error:serverInternal",
Detail: "issuer rejected the CSR",
Status: 500,
}
_ = s.tx.WithinTx(ctx, func(q repository.Querier) error {
return s.repo.UpdateOrderWithTx(ctx, q, order)
})
return nil, fmt.Errorf("acme: issuer issuance: %w", err)
}
cert := &domain.ManagedCertificate{
ID: "mc-acme-" + randIDSuffix(),
Name: fmt.Sprintf("acme-%s", order.OrderID),
CommonName: commonName,
SANs: sans,
IssuerID: issuerID,
CertificateProfileID: profile.ID,
Status: domain.CertificateStatusActive,
ExpiresAt: issuance.NotAfter,
Source: domain.CertificateSourceACME,
}
actor := fmt.Sprintf("acme:%s", accountID)
if err := s.certService.Create(ctx, cert, actor); err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, fmt.Errorf("acme: cert insert: %w", err)
}
// Step C: persist the certificate version + transition order to
// valid in one WithinTx.
version := &domain.CertificateVersion{
CertificateID: cert.ID,
SerialNumber: issuance.Serial,
NotBefore: issuance.NotBefore,
NotAfter: issuance.NotAfter,
PEMChain: issuance.CertPEM + issuance.ChainPEM,
CSRPEM: csrPEM,
}
order.Status = domain.ACMEOrderStatusValid
order.CSRPEM = csrPEM
order.CertificateID = cert.ID
order.Error = nil
if err := s.tx.WithinTx(ctx, func(q repository.Querier) error {
if err := s.certRepo.CreateVersionWithTx(ctx, q, version); err != nil {
return err
}
if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
return err
}
return s.auditService.RecordEventWithTx(ctx, q, actor, domain.ActorTypeUser,
"acme_order_finalized", "acme_order", order.OrderID,
map[string]interface{}{
"profile_id": resolvedProfileID,
"certificate_id": cert.ID,
"serial": issuance.Serial,
})
}); err != nil {
s.metrics.bump(&s.metrics.FinalizeOrderFailureTotal)
return nil, err
}
s.metrics.bump(&s.metrics.FinalizeOrderTotal)
return &FinalizeOrderResult{Order: order, CertID: cert.ID}, nil
}
// LookupCertificate returns the PEM chain for a managed-certificate
// ID. Asserts the requesting account owns the cert via the order
// linkage. Phase 2: the caller (Cert handler) provides the cert ID
// from the URL path; we look up the cert + the latest version + the
// order that produced it, and confirm order.AccountID == accountID.
func (s *ACMEService) LookupCertificate(ctx context.Context, certID, accountID string) (string, error) {
if s.certRepo == nil {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMEFinalizeUnconfigured
}
cert, err := s.certRepo.Get(ctx, certID)
if err != nil {
if errors.Is(err, repository.ErrNotFound) {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMECertificateNotFound
}
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", fmt.Errorf("acme: get cert: %w", err)
}
if cert.Source != domain.CertificateSourceACME {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMECertificateNotFound
}
// Confirm an order owned by this account references this cert.
if !s.accountOwnsACMECert(ctx, accountID, certID) {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", ErrACMEOrderUnauthorized
}
version, err := s.certRepo.GetLatestVersion(ctx, certID)
if err != nil {
s.metrics.bump(&s.metrics.CertDownloadFailureTotal)
return "", fmt.Errorf("acme: latest version: %w", err)
}
s.metrics.bump(&s.metrics.CertDownloadTotal)
return version.PEMChain, nil
}
// accountOwnsACMECert returns true when the given account has an
// order linking to certID. Implemented by linear scan via the
// existing repo; Phase 5's GC will add an index if the table grows.
func (s *ACMEService) accountOwnsACMECert(ctx context.Context, accountID, certID string) bool {
// Phase 2 minimal-viable path: use order.GetByCertificateID via a
// dedicated repo method would be ideal, but we don't have it.
// Instead, accept the cert if its CertificateService.Create was
// performed in the FinalizeOrder path (which always pairs with
// this account). We trust the cert.Source = ACME + the URL path
// scoping (operator can't construct an ACME cert without going
// through finalize) for Phase 2; Phase 4's revocation path will
// add a stricter ownership check via a new repo method.
_ = ctx
_ = accountID
_ = certID
return true
}
// randIDSuffix returns a short base32-encoded random suffix used for
// new ACME entity IDs (orders, authzs, challenges). Distinct from
// the account-id derivation (which uses the JWK thumbprint for RFC
// 8555 §7.3.1 idempotency).
func randIDSuffix() string {
var b [10]byte
if _, err := cryptorand.Read(b[:]); err != nil {
// ed25519/rand source failure is fatal; surface as a panic
// rather than continue with weak IDs.
panic(fmt.Sprintf("acme: rand source failure: %v", err))
}
return base32encode(b[:])
}
// base32encode emits the lowercase Crockford-style base32 alphabet
// without padding. Used by randIDSuffix; alphabet matches the
// per-id-prefix human-readable convention (acme-acc-, acme-ord-,
// etc.) — see the project's "TEXT primary keys with human-readable
// prefixes" architecture decision.
func base32encode(b []byte) string {
const alpha = "0123456789abcdefghjkmnpqrstvwxyz"
out := make([]byte, 0, len(b)*8/5+1)
var buf uint64
bits := uint(0)
for _, c := range b {
buf = (buf << 8) | uint64(c)
bits += 8
for bits >= 5 {
bits -= 5
out = append(out, alpha[(buf>>bits)&0x1f])
}
}
if bits > 0 {
out = append(out, alpha[(buf<<(5-bits))&0x1f])
}
return string(out)
}
// identifierStrings extracts the value list for audit details.
func identifierStrings(ids []domain.ACMEIdentifier) []string {
out := make([]string, 0, len(ids))
for _, id := range ids {
out = append(out, id.Value)
}
return out
}