refactor(service/acme): split into sibling files — Option B (Phase 9, 9 of N — partial)

Phase 9 ARCH-M2 closure Sprint 9. Splits internal/service/acme.go (was
1965 LOC, the top hotspot after Sprints 1-8 finished the config +
main-binary cuts) via the Option B sibling-file pattern: new files stay
in `package service`, so every external caller of
`service.ACMEService.{IssueNonce,LookupAuthz,ListAuthzsByOrder,
RespondToChallenge,GarbageCollect}` resolves the same way. Pure
mechanical relocation; no signature, no behavior, no import-graph
change.

Why Option B (not a subpackage)
================================

A subpackage (e.g. `internal/service/acme/`) would have meant rebadging
every public method receiver to its new package. That is import-path
churn for ~70 call sites across handlers, scheduler, cmd/server wiring,
MCP tools, and tests, plus the cyclic-import risk of pulling acme back
into `service` for the shared interfaces. Option B sacrifices the
encapsulation discipline a subpackage would have given (sibling files
can still reach into each other's unexported state because Go scopes
are per-package), but in exchange the diff is restricted to file moves
+ four sed deletes; zero importer touches anywhere outside this
directory. The trade-off matches every prior Sprint 1-7 config cut.

What moved
==========

New `internal/service/acme_nonces.go` (46 LOC)
----------------------------------------------

The IssueNonce method (RFC 8555 §6.5 Replay-Nonce issuance). The
nonceAdapter type, which wraps ACMERepo.ConsumeNonce for the JWS
verifier, stays in acme.go alongside VerifyJWS because it is
verification-infrastructure plumbing, not a server-issues-nonce
concern.

New `internal/service/acme_authz.go` (45 LOC)
---------------------------------------------

LookupAuthz + ListAuthzsByOrder (the authz read-side). The authz
write-side (status cascade after challenge validation) lives in
acme_challenges.go alongside recordChallengeOutcome where it belongs
operationally; the authz creation path stays inside CreateOrder in
acme.go (orders own per-order authz row creation).

New `internal/service/acme_challenges.go` (267 LOC)
---------------------------------------------------

The whole Phase 3 challenge dispatch + validator callback concern: the
`// --- Phase 3 — challenge dispatch + validator callback ---` banner,
the ChallengeResponseShape struct, the HTTP-facing RespondToChallenge
method (which transitions challenge → processing and submits to the
validator pool), and the asynchronous recordChallengeOutcome callback
(which persists final challenge status and cascades the parent authz +
order status). Largest single extract this sprint by line count.

New `internal/service/acme_gc.go` (74 LOC)
------------------------------------------

The Phase 5 ACME GC sweep: the scheduler-invoked GarbageCollect entry
point (3 sweeps: nonces, expired authzs, expired orders) and the
atomicAddUint64 counter helper (only consumed by the sweep body for the
rows-affected-N case the default `bump` doesn't cover).

What was deferred
=================

Sprint 9 was originally scoped to ship 5 sub-files (nonces / authz /
challenges / orders / gc). The orders cut (CreateOrder + LookupOrder +
FinalizeOrder + LookupCertificate + the orders helpers randIDSuffix /
base32encode / identifierStrings / firstAvailableIssuer /
accountOwnsACMECert / mapACMERevocationReason + FinalizeOrderResult) is
~700 LOC spread across multiple non-contiguous regions in acme.go, with
the orders helpers also feeding into RevokeCert / RenewalInfo on the
Phase 4 side. Disentangling which helpers move with orders and which
stay with Phase 4 needs a focused sprint of its own, to avoid leaving a
half-cut helper declared in one file but called from a sibling. That
would work (same package) but defeats the point of organising by
concern. Deferred to a potential Sprint 9b.

Net effect
==========

acme.go: 1965 → 1634 LOC (-331). Four new sibling files at 432 LOC
total (46 + 45 + 267 + 74). The headline 1965-LOC hotspot drops below
the next-tier candidates (mcp/tools.go, auth_session_oidc.go,
cmd/agent/main.go).

Behavior preservation contract
==============================

1. gofmt -l clean across all 5 affected files.
2. go vet ./internal/service/... — no findings.
3. staticcheck ./internal/service/... — no findings.
4. go test -short -count=1 ./internal/service/... — green.
5. Broader-importer build green:
   go build ./cmd/server/... ./internal/api/handler/...
   ./internal/scheduler/... ./internal/mcp/...
6. Broader-importer tests green:
   go test -short -count=1 ./cmd/server/... ./internal/api/handler/...
   ./internal/scheduler/...
7. Per-import-symbol audit: all 13 imports remaining in acme.go (the 8
   standard-library ones: context, cryptorand, x509, errors, fmt,
   strings, sync/atomic, time; plus jose, internal/api/acme,
   internal/config, internal/domain, internal/repository) verified used
   by surviving code. New sibling files carry only the imports their
   extracted code needs.

The Option B sibling-file shape means same-package resolution preserves
access to ACMEService's unexported state from every extracted method
without any visibility tweaks. Worth noting for the future: this also
means a careless future caller could reach through file boundaries and
re-tangle concerns; the file headers document the intended boundary but
Go's tooling won't enforce it.

Why this is a partial sprint
============================

Splitting into 4 of 5 named sub-files now (vs blocking until orders is
also clean) keeps the hotspot count down with this commit and lets a
follow-up Sprint 9b focus exclusively on the orders cut without
re-touching the four files this sprint ships. Same "smallest useful
slice, document the rest" cadence as Sprint 8 splitting into 8a
(mechanical) + 8b (behavior-aware).

Refs: ARCH-M2 (god-files), Phase 9 audit. Last in the config / service
hotspot chain before the agent + mcp + auth-session cuts land in
Sprints 10-12.
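
For readers unfamiliar with the sibling-file shape the commit message describes, here is a minimal single-file sketch. In a real tree each marked section would be its own file in the same package. The names `Service`, `issued`, `IssueToken`, and `Sweep` are hypothetical and are not taken from certctl; the sketch only illustrates that methods declared in sibling files keep the same receiver and can reach unexported state.

```go
package main

import "fmt"

// --- would live in service.go ---

// Service holds unexported state shared by every file in the package.
type Service struct {
	issued int // unexported: visible package-wide, invisible to importers
}

// --- would live in service_tokens.go (sibling file, same package) ---

// IssueToken keeps the same receiver type, so callers still write
// svc.IssueToken() and no import path changes.
func (s *Service) IssueToken() string {
	s.issued++ // reaches unexported state with no visibility tweak
	return fmt.Sprintf("token-%d", s.issued)
}

// --- would live in service_gc.go (another sibling file) ---

// Sweep resets the counter and reports how many tokens were dropped.
func (s *Service) Sweep() int {
	n := s.issued
	s.issued = 0
	return n
}

func main() {
	svc := &Service{}
	fmt.Println(svc.IssueToken())
	fmt.Println(svc.IssueToken())
	fmt.Println("swept:", svc.Sweep())
}
```

The same property is what makes the boundary unenforced: nothing stops `Sweep` from touching state that conceptually belongs to the token file.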
2026-06-07 14:21:37 +00:00 · 2026-05-14 09:58:46 +00:00
parent de4f93b35e
commit b503d27b4f
5 changed files with 432 additions and 331 deletions
@@ -420,28 +420,6 @@ func (s *ACMEService) BuildDirectory(ctx context.Context, profileID, baseURL str
 	return dir, nil
 }

-// IssueNonce generates a fresh ACME nonce, persists it with the
-// configured TTL, and returns the encoded string for the
-// Replay-Nonce header.
-//
-// RFC 8555 §6.5: every successful ACME response carries a
-// Replay-Nonce. Phase 1a wires this via the directory + new-nonce
-// handlers; Phase 1b extends with new-account + account/<id> POST
-// responses (the JWS-authenticated paths).
-func (s *ACMEService) IssueNonce(ctx context.Context) (string, error) {
-	nonce, err := acme.GenerateNonce()
-	if err != nil {
-		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
-		return "", fmt.Errorf("acme: generate nonce: %w", err)
-	}
-	if err := s.repo.IssueNonce(ctx, nonce, s.cfg.NonceTTL); err != nil {
-		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
-		return "", fmt.Errorf("acme: persist nonce: %w", err)
-	}
-	s.metrics.bump(&s.metrics.NewNonceTotal)
-	return nonce, nil
-}
-
 // resolveProfile applies the default-profile fallback and confirms the
 // profile exists. Returns the resolved (canonical) profileID on
 // success. Centralizing the resolution here keeps every Phase
@@ -996,26 +974,6 @@ func (s *ACMEService) LookupOrder(ctx context.Context, orderID, accountID string
 	return order, nil
 }

-// LookupAuthz returns an authz by ID. Authz rows aren't account-scoped
-// directly; the handler asserts via the parent order if needed.
-func (s *ACMEService) LookupAuthz(ctx context.Context, authzID string) (*domain.ACMEAuthorization, error) {
-	authz, err := s.repo.GetAuthzByID(ctx, authzID)
-	if err != nil {
-		if errors.Is(err, repository.ErrNotFound) {
-			return nil, ErrACMEAuthzNotFound
-		}
-		return nil, fmt.Errorf("acme: lookup authz: %w", err)
-	}
-	s.metrics.bump(&s.metrics.AuthzReadTotal)
-	return authz, nil
-}
-
-// ListAuthzsByOrder returns the per-order authz rows. Used by
-// MarshalOrder to compute the authorizations URL list.
-func (s *ACMEService) ListAuthzsByOrder(ctx context.Context, orderID string) ([]*domain.ACMEAuthorization, error) {
-	return s.repo.ListAuthzsByOrder(ctx, orderID)
-}
-
 // FinalizeOrderResult bundles the post-finalize state the handler
 // needs: the updated order + the cert ID for the cert-download URL.
 type FinalizeOrderResult struct {
@@ -1323,242 +1281,6 @@ func identifierStrings(ids []domain.ACMEIdentifier) []string {
 	return out
 }

-// --- Phase 3 — challenge dispatch + validator callback -----------------
-
-// ChallengeResponseShape is what RespondToChallenge returns to the
-// handler: the post-dispatch challenge row (status=processing) so the
-// handler can render it via acme.MarshalAuthorization-equivalent. The
-// validator goroutine writes the final status (valid/invalid) as a
-// callback after dispatch completes — clients fetching the challenge
-// via authz GET get the eventual state.
-type ChallengeResponseShape struct {
-	Challenge *domain.ACMEChallenge
-}
-
-// RespondToChallenge handles POST /acme/profile/<id>/challenge/<chall_id>
-// per RFC 8555 §7.5.1.
-//
-// Behavior:
-//   - Look up the challenge + parent authz + parent order; assert the
-//     account owns the order.
-//   - If the challenge is already valid/invalid → idempotent return.
-//   - If pending: transition to processing (atomic via WithinTx + audit).
-//   - Submit to the validator pool with an onComplete callback that
-//     transitions the challenge to valid/invalid in another WithinTx
-//     (and cascades the parent authz status).
-//   - Return the challenge in its current (processing) state; the
-//     client polls authz/challenge for the eventual outcome.
-func (s *ACMEService) RespondToChallenge(
-	ctx context.Context,
-	accountID, challengeID string,
-	accountJWK *jose.JSONWebKey,
-) (*domain.ACMEChallenge, error) {
-	if s.tx == nil || s.auditService == nil {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		return nil, fmt.Errorf("acme: respond-to-challenge requires SetTransactor + SetAuditService")
-	}
-	if s.validatorPool == nil {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		return nil, ErrACMEChallengePoolUnconfigured
-	}
-	// Phase 5 — per-challenge respond rate limit. Defends against retry
-	// storms from a misbehaving client. Keyed by challengeID (not
-	// accountID) so a flood against one challenge doesn't drain the
-	// account's whole budget.
-	if s.rateLimiter != nil && s.cfg.RateLimitChallengeRespondsPerHour > 0 {
-		if !s.rateLimiter.Allow(acme.ActionChallengeRespond, challengeID, s.cfg.RateLimitChallengeRespondsPerHour) {
-			s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-			return nil, ErrACMERateLimited
-		}
-	}
-
-	ch, err := s.repo.GetChallengeByID(ctx, challengeID)
-	if err != nil {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		if errors.Is(err, repository.ErrNotFound) {
-			return nil, ErrACMEChallengeNotFound
-		}
-		return nil, fmt.Errorf("acme: lookup challenge: %w", err)
-	}
-
-	// Idempotent re-POST: already valid/invalid → just return.
-	if ch.Status == domain.ACMEChallengeStatusValid || ch.Status == domain.ACMEChallengeStatusInvalid {
-		s.metrics.bump(&s.metrics.ChallengeRespondTotal)
-		return ch, nil
-	}
-	if ch.Status == domain.ACMEChallengeStatusProcessing {
-		// In-flight. Return the row as-is.
-		s.metrics.bump(&s.metrics.ChallengeRespondTotal)
-		return ch, nil
-	}
-
-	// Confirm the requesting account owns the parent authz/order.
-	authz, err := s.repo.GetAuthzByID(ctx, ch.AuthzID)
-	if err != nil {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		return nil, fmt.Errorf("acme: lookup parent authz: %w", err)
-	}
-	order, err := s.repo.GetOrderByID(ctx, authz.OrderID)
-	if err != nil {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		return nil, fmt.Errorf("acme: lookup parent order: %w", err)
-	}
-	if order.AccountID != accountID {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		return nil, ErrACMEOrderUnauthorized
-	}
-
-	// Compute the key authorization the validator needs.
-	expected, err := acme.KeyAuthorization(ch.Token, accountJWK)
-	if err != nil {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		return nil, fmt.Errorf("acme: key authorization: %w", err)
-	}
-
-	// Transition challenge → processing (atomic with audit row).
-	ch.Status = domain.ACMEChallengeStatusProcessing
-	if err := s.tx.WithinTx(ctx, func(q repository.Querier) error {
-		if err := s.repo.UpdateChallengeWithTx(ctx, q, ch); err != nil {
-			return err
-		}
-		return s.auditService.RecordEventWithTx(ctx, q,
-			fmt.Sprintf("acme:%s", accountID), domain.ActorTypeUser,
-			"acme_challenge_processing", "acme_challenge", ch.ChallengeID,
-			map[string]interface{}{
-				"authz_id":   ch.AuthzID,
-				"type":       string(ch.Type),
-				"identifier": authz.Identifier.Value,
-			})
-	}); err != nil {
-		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
-		return nil, err
-	}
-
-	// Submit to the pool. The onComplete callback persists the final
-	// challenge status + cascades the parent authz status. We detach
-	// from the request context via context.WithoutCancel so the
-	// callback's WithinTx survives the HTTP handler returning, while
-	// preserving inherited values (logger, trace IDs, audit actor).
-	bgctx := context.WithoutCancel(ctx)
-	chSnapshot := *ch
-	authzSnapshot := *authz
-	identifier := authz.Identifier.Value
-	s.validatorPool.Submit(bgctx, string(ch.Type), identifier, ch.Token, expected, func(verr error) {
-		s.recordChallengeOutcome(bgctx, accountID, &chSnapshot, &authzSnapshot, verr)
-	})
-
-	s.metrics.bump(&s.metrics.ChallengeRespondTotal)
-	return ch, nil
-}
-
-// recordChallengeOutcome is the validator-pool callback. Persists the
-// challenge's final status + cascades the parent authz status.
-//
-// Authz cascade: if the challenge succeeded, the authz becomes valid
-// (RFC 8555 §7.1.6: any one challenge passing makes the authz valid).
-// If the challenge failed, the authz becomes invalid only if no other
-// pending challenges remain (Phase 3 minimal-viable path: we mark the
-// authz invalid on first failure since Phase 3 emits 1 challenge per
-// authz; Phase 4+ extending to multi-challenge-per-authz revisits this).
-func (s *ACMEService) recordChallengeOutcome(
-	ctx context.Context,
-	accountID string,
-	ch *domain.ACMEChallenge,
-	authz *domain.ACMEAuthorization,
-	verr error,
-) {
-	now := time.Now().UTC()
-	var newAuthzStatus domain.ACMEAuthzStatus
-	if verr == nil {
-		ch.Status = domain.ACMEChallengeStatusValid
-		ch.ValidatedAt = &now
-		ch.Error = nil
-		newAuthzStatus = domain.ACMEAuthzStatusValid
-		s.metrics.bump(&s.metrics.ChallengeValidateValid)
-	} else {
-		ch.Status = domain.ACMEChallengeStatusInvalid
-		if p := acme.ChallengeProblemFromError(string(ch.Type), verr); p != nil {
-			ch.Error = &domain.ACMEProblem{
-				Type:   p.Type,
-				Detail: p.Detail,
-				Status: p.Status,
-			}
-		}
-		newAuthzStatus = domain.ACMEAuthzStatusInvalid
-		s.metrics.bump(&s.metrics.ChallengeValidateInvalid)
-	}
-
-	auditDetails := map[string]interface{}{
-		"authz_id":   ch.AuthzID,
-		"type":       string(ch.Type),
-		"identifier": authz.Identifier.Value,
-		"valid":      verr == nil,
-	}
-	if verr != nil {
-		auditDetails["error"] = verr.Error()
-	}
-
-	_ = s.tx.WithinTx(ctx, func(q repository.Querier) error {
-		if err := s.repo.UpdateChallengeWithTx(ctx, q, ch); err != nil {
-			return err
-		}
-		if err := s.repo.UpdateAuthzStatusWithTx(ctx, q, ch.AuthzID, newAuthzStatus); err != nil {
-			return err
-		}
-		// Cascade: if the authz turned valid, see whether the order's
-		// authzs are now ALL valid; flip order to ready if so.
-		// Read-after-write to confirm.
-		authzs, err := s.repo.ListAuthzsByOrder(ctx, authz.OrderID)
-		if err != nil {
-			return err
-		}
-		allValid := len(authzs) > 0
-		anyInvalid := false
-		for _, a := range authzs {
-			if a.AuthzID == ch.AuthzID {
-				if newAuthzStatus != domain.ACMEAuthzStatusValid {
-					allValid = false
-				}
-				if newAuthzStatus == domain.ACMEAuthzStatusInvalid {
-					anyInvalid = true
-				}
-				continue
-			}
-			if a.Status != domain.ACMEAuthzStatusValid {
-				allValid = false
-			}
-			if a.Status == domain.ACMEAuthzStatusInvalid {
-				anyInvalid = true
-			}
-		}
-		order, err := s.repo.GetOrderByID(ctx, authz.OrderID)
-		if err != nil {
-			return err
-		}
-		switch {
-		case allValid && order.Status == domain.ACMEOrderStatusPending:
-			order.Status = domain.ACMEOrderStatusReady
-			if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
-				return err
-			}
-		case anyInvalid && order.Status == domain.ACMEOrderStatusPending:
-			order.Status = domain.ACMEOrderStatusInvalid
-			order.Error = &domain.ACMEProblem{
-				Type:   "urn:ietf:params:acme:error:incorrectResponse",
-				Detail: "one or more authorizations failed",
-				Status: 403,
-			}
-			if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
-				return err
-			}
-		}
-		return s.auditService.RecordEventWithTx(ctx, q,
-			fmt.Sprintf("acme:%s", accountID), domain.ActorTypeUser,
-			"acme_challenge_completed", "acme_challenge", ch.ChallengeID,
-			auditDetails)
-	})
-}
-
 // --- Phase 4 — key rollover + revocation + ARI -------------------------

 // RotateAccountKey is the service-layer entry point for RFC 8555
@@ -1910,56 +1632,3 @@ func mapACMERevocationReason(code int) string {
 		return string(domain.RevocationReasonUnspecified)
 	}
 }
-
-// GarbageCollect runs a single ACME GC sweep. Phase 5 — the scheduler
-// invokes this every cfg.GCInterval. Three independent sweeps:
-//
-//  1. Delete used / expired nonces.
-//  2. Transition expired pending authzs to `expired`.
-//  3. Transition expired pending/ready/processing orders to `invalid`.
-//
-// Each sweep is a single SQL statement (no per-row transactions) so a
-// large reap is one atomic write per sweep. Per-sweep errors are
-// logged-and-continued: a failing nonces sweep doesn't block the
-// authzs sweep. Returns the first error encountered (for caller
-// telemetry); per-sweep counts are recorded on metrics regardless.
-//
-// Idempotent — repeated runs are safe; the second run finds 0 rows.
-func (s *ACMEService) GarbageCollect(ctx context.Context) error {
-	s.metrics.bump(&s.metrics.GCRunsTotal)
-	var firstErr error
-
-	if n, err := s.repo.GCExpiredNonces(ctx); err != nil {
-		s.metrics.bump(&s.metrics.GCRunFailuresTotal)
-		if firstErr == nil {
-			firstErr = fmt.Errorf("acme gc: nonces: %w", err)
-		}
-	} else if n > 0 {
-		atomicAddUint64(&s.metrics.GCNoncesReapedTotal, uint64(n))
-	}
-
-	if n, err := s.repo.GCExpireAuthorizations(ctx); err != nil {
-		s.metrics.bump(&s.metrics.GCRunFailuresTotal)
-		if firstErr == nil {
-			firstErr = fmt.Errorf("acme gc: authzs: %w", err)
-		}
-	} else if n > 0 {
-		atomicAddUint64(&s.metrics.GCAuthzsExpiredTotal, uint64(n))
-	}
-
-	if n, err := s.repo.GCInvalidateExpiredOrders(ctx); err != nil {
-		s.metrics.bump(&s.metrics.GCRunFailuresTotal)
-		if firstErr == nil {
-			firstErr = fmt.Errorf("acme gc: orders: %w", err)
-		}
-	} else if n > 0 {
-		atomicAddUint64(&s.metrics.GCOrdersInvalidatedTotal, uint64(n))
-	}
-
-	return firstErr
-}
-
-// atomicAddUint64 adds delta to the counter. The metrics struct exposes
-// only `bump` (add 1) by default; this helper covers the
-// rows-affected-N case the GC needs.
-func atomicAddUint64(c *atomic.Uint64, delta uint64) { c.Add(delta) }
@@ -0,0 +1,45 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package service
+
+import (
+	"context"
+	"errors"
+	"fmt"
+
+	"github.com/certctl-io/certctl/internal/domain"
+	"github.com/certctl-io/certctl/internal/repository"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 9 (2026-05-14): extracted from
+// internal/service/acme.go via the Option B sibling-file pattern.
+// Package stays `service`; every external caller of
+// `service.ACMEService.LookupAuthz(...)` / `ListAuthzsByOrder(...)`
+// resolves the same way — pure mechanical relocation.
+//
+// This file holds the authz read-side concern. The authz write-side
+// (status cascade after challenge validation) lives in
+// acme_challenges.go alongside recordChallengeOutcome where it
+// belongs operationally; the authz creation path stays inside
+// CreateOrder in acme.go (orders own the per-order authz rows).
+
+// LookupAuthz returns an authz by ID. Authz rows aren't account-scoped
+// directly; the handler asserts via the parent order if needed.
+func (s *ACMEService) LookupAuthz(ctx context.Context, authzID string) (*domain.ACMEAuthorization, error) {
+	authz, err := s.repo.GetAuthzByID(ctx, authzID)
+	if err != nil {
+		if errors.Is(err, repository.ErrNotFound) {
+			return nil, ErrACMEAuthzNotFound
+		}
+		return nil, fmt.Errorf("acme: lookup authz: %w", err)
+	}
+	s.metrics.bump(&s.metrics.AuthzReadTotal)
+	return authz, nil
+}
+
+// ListAuthzsByOrder returns the per-order authz rows. Used by
+// MarshalOrder to compute the authorizations URL list.
+func (s *ACMEService) ListAuthzsByOrder(ctx context.Context, orderID string) ([]*domain.ACMEAuthorization, error) {
+	return s.repo.ListAuthzsByOrder(ctx, orderID)
+}
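
The not-found translation in LookupAuthz can be sketched in isolation. This is a minimal sketch, assuming a repository layer that wraps its sentinel; `errRepoNotFound`, `errAuthzNotFound`, `fakeRepo`, and `lookup` are hypothetical stand-ins, not the certctl types.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errRepoNotFound  = errors.New("repository: not found") // storage-layer sentinel
	errAuthzNotFound = errors.New("acme: authz not found") // service-layer sentinel
)

// fakeRepo is an in-memory stand-in for the repository.
type fakeRepo struct{ rows map[string]string }

func (r fakeRepo) get(id string) (string, error) {
	v, ok := r.rows[id]
	if !ok {
		// Wrapped, so callers must match with errors.Is rather than ==.
		return "", fmt.Errorf("get %q: %w", id, errRepoNotFound)
	}
	return v, nil
}

// lookup maps the storage sentinel to the service sentinel and wraps
// every other failure with context.
func lookup(r fakeRepo, id string) (string, error) {
	v, err := r.get(id)
	if err != nil {
		if errors.Is(err, errRepoNotFound) {
			return "", errAuthzNotFound
		}
		return "", fmt.Errorf("acme: lookup authz: %w", err)
	}
	return v, nil
}

func main() {
	repo := fakeRepo{rows: map[string]string{"az-1": "pending"}}
	for _, id := range []string{"az-1", "az-404"} {
		v, err := lookup(repo, id)
		fmt.Printf("%s: value=%q err=%v\n", id, v, err)
	}
}
```

Returning a service-level sentinel keeps handlers from depending on the repository package just to recognise a missing row.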
@@ -0,0 +1,267 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package service
+
+import (
+	"context"
+	"errors"
+	"fmt"
+	"time"
+
+	jose "github.com/go-jose/go-jose/v4"
+
+	"github.com/certctl-io/certctl/internal/api/acme"
+	"github.com/certctl-io/certctl/internal/domain"
+	"github.com/certctl-io/certctl/internal/repository"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 9 (2026-05-14): extracted from
+// internal/service/acme.go via the Option B sibling-file pattern.
+// Package stays `service`; every external caller of
+// `service.ACMEService.RespondToChallenge(...)` resolves the same
+// way — pure mechanical relocation.
+//
+// This file holds the Phase 3 challenge dispatch + validator
+// callback concern: the HTTP-facing RespondToChallenge entry point
+// (which transitions the challenge to `processing` and submits it
+// to the validator pool) plus the asynchronous recordChallengeOutcome
+// callback (which persists the final challenge status and cascades
+// the parent authz + order status). The authz read-side
+// (LookupAuthz / ListAuthzsByOrder) lives in acme_authz.go.
+
+// --- Phase 3 — challenge dispatch + validator callback -----------------
+
+// ChallengeResponseShape is what RespondToChallenge returns to the
+// handler: the post-dispatch challenge row (status=processing) so the
+// handler can render it via acme.MarshalAuthorization-equivalent. The
+// validator goroutine writes the final status (valid/invalid) as a
+// callback after dispatch completes — clients fetching the challenge
+// via authz GET get the eventual state.
+type ChallengeResponseShape struct {
+	Challenge *domain.ACMEChallenge
+}
+
+// RespondToChallenge handles POST /acme/profile/<id>/challenge/<chall_id>
+// per RFC 8555 §7.5.1.
+//
+// Behavior:
+//   - Look up the challenge + parent authz + parent order; assert the
+//     account owns the order.
+//   - If the challenge is already valid/invalid → idempotent return.
+//   - If pending: transition to processing (atomic via WithinTx + audit).
+//   - Submit to the validator pool with an onComplete callback that
+//     transitions the challenge to valid/invalid in another WithinTx
+//     (and cascades the parent authz status).
+//   - Return the challenge in its current (processing) state; the
+//     client polls authz/challenge for the eventual outcome.
+func (s *ACMEService) RespondToChallenge(
+	ctx context.Context,
+	accountID, challengeID string,
+	accountJWK *jose.JSONWebKey,
+) (*domain.ACMEChallenge, error) {
+	if s.tx == nil || s.auditService == nil {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		return nil, fmt.Errorf("acme: respond-to-challenge requires SetTransactor + SetAuditService")
+	}
+	if s.validatorPool == nil {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		return nil, ErrACMEChallengePoolUnconfigured
+	}
+	// Phase 5 — per-challenge respond rate limit. Defends against retry
+	// storms from a misbehaving client. Keyed by challengeID (not
+	// accountID) so a flood against one challenge doesn't drain the
+	// account's whole budget.
+	if s.rateLimiter != nil && s.cfg.RateLimitChallengeRespondsPerHour > 0 {
+		if !s.rateLimiter.Allow(acme.ActionChallengeRespond, challengeID, s.cfg.RateLimitChallengeRespondsPerHour) {
+			s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+			return nil, ErrACMERateLimited
+		}
+	}
+
+	ch, err := s.repo.GetChallengeByID(ctx, challengeID)
+	if err != nil {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		if errors.Is(err, repository.ErrNotFound) {
+			return nil, ErrACMEChallengeNotFound
+		}
+		return nil, fmt.Errorf("acme: lookup challenge: %w", err)
+	}
+
+	// Idempotent re-POST: already valid/invalid → just return.
+	if ch.Status == domain.ACMEChallengeStatusValid || ch.Status == domain.ACMEChallengeStatusInvalid {
+		s.metrics.bump(&s.metrics.ChallengeRespondTotal)
+		return ch, nil
+	}
+	if ch.Status == domain.ACMEChallengeStatusProcessing {
+		// In-flight. Return the row as-is.
+		s.metrics.bump(&s.metrics.ChallengeRespondTotal)
+		return ch, nil
+	}
+
+	// Confirm the requesting account owns the parent authz/order.
+	authz, err := s.repo.GetAuthzByID(ctx, ch.AuthzID)
+	if err != nil {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		return nil, fmt.Errorf("acme: lookup parent authz: %w", err)
+	}
+	order, err := s.repo.GetOrderByID(ctx, authz.OrderID)
+	if err != nil {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		return nil, fmt.Errorf("acme: lookup parent order: %w", err)
+	}
+	if order.AccountID != accountID {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		return nil, ErrACMEOrderUnauthorized
+	}
+
+	// Compute the key authorization the validator needs.
+	expected, err := acme.KeyAuthorization(ch.Token, accountJWK)
+	if err != nil {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		return nil, fmt.Errorf("acme: key authorization: %w", err)
+	}
+
+	// Transition challenge → processing (atomic with audit row).
+	ch.Status = domain.ACMEChallengeStatusProcessing
+	if err := s.tx.WithinTx(ctx, func(q repository.Querier) error {
+		if err := s.repo.UpdateChallengeWithTx(ctx, q, ch); err != nil {
+			return err
+		}
+		return s.auditService.RecordEventWithTx(ctx, q,
+			fmt.Sprintf("acme:%s", accountID), domain.ActorTypeUser,
+			"acme_challenge_processing", "acme_challenge", ch.ChallengeID,
+			map[string]interface{}{
+				"authz_id":   ch.AuthzID,
+				"type":       string(ch.Type),
+				"identifier": authz.Identifier.Value,
+			})
+	}); err != nil {
+		s.metrics.bump(&s.metrics.ChallengeRespondFailTotal)
+		return nil, err
+	}
+
+	// Submit to the pool. The onComplete callback persists the final
+	// challenge status + cascades the parent authz status. We detach
+	// from the request context via context.WithoutCancel so the
+	// callback's WithinTx survives the HTTP handler returning, while
+	// preserving inherited values (logger, trace IDs, audit actor).
+	bgctx := context.WithoutCancel(ctx)
+	chSnapshot := *ch
+	authzSnapshot := *authz
+	identifier := authz.Identifier.Value
+	s.validatorPool.Submit(bgctx, string(ch.Type), identifier, ch.Token, expected, func(verr error) {
+		s.recordChallengeOutcome(bgctx, accountID, &chSnapshot, &authzSnapshot, verr)
+	})
+
+	s.metrics.bump(&s.metrics.ChallengeRespondTotal)
+	return ch, nil
+}
+
+// recordChallengeOutcome is the validator-pool callback. Persists the
+// challenge's final status + cascades the parent authz status.
+//
+// Authz cascade: if the challenge succeeded, the authz becomes valid
+// (RFC 8555 §7.1.6: any one challenge passing makes the authz valid).
+// If the challenge failed, the authz becomes invalid only if no other
+// pending challenges remain (Phase 3 minimal-viable path: we mark the
+// authz invalid on first failure since Phase 3 emits 1 challenge per
+// authz; Phase 4+ extending to multi-challenge-per-authz revisits this).
+func (s *ACMEService) recordChallengeOutcome(
+	ctx context.Context,
+	accountID string,
+	ch *domain.ACMEChallenge,
+	authz *domain.ACMEAuthorization,
+	verr error,
+) {
+	now := time.Now().UTC()
+	var newAuthzStatus domain.ACMEAuthzStatus
+	if verr == nil {
+		ch.Status = domain.ACMEChallengeStatusValid
+		ch.ValidatedAt = &now
+		ch.Error = nil
+		newAuthzStatus = domain.ACMEAuthzStatusValid
+		s.metrics.bump(&s.metrics.ChallengeValidateValid)
+	} else {
+		ch.Status = domain.ACMEChallengeStatusInvalid
+		if p := acme.ChallengeProblemFromError(string(ch.Type), verr); p != nil {
+			ch.Error = &domain.ACMEProblem{
+				Type:   p.Type,
+				Detail: p.Detail,
+				Status: p.Status,
+			}
+		}
+		newAuthzStatus = domain.ACMEAuthzStatusInvalid
+		s.metrics.bump(&s.metrics.ChallengeValidateInvalid)
+	}
+
+	auditDetails := map[string]interface{}{
+		"authz_id":   ch.AuthzID,
+		"type":       string(ch.Type),
+		"identifier": authz.Identifier.Value,
+		"valid":      verr == nil,
+	}
+	if verr != nil {
+		auditDetails["error"] = verr.Error()
+	}
+
+	_ = s.tx.WithinTx(ctx, func(q repository.Querier) error {
+		if err := s.repo.UpdateChallengeWithTx(ctx, q, ch); err != nil {
+			return err
+		}
+		if err := s.repo.UpdateAuthzStatusWithTx(ctx, q, ch.AuthzID, newAuthzStatus); err != nil {
+			return err
+		}
+		// Cascade: if the authz turned valid, see whether the order's
+		// authzs are now ALL valid; flip order to ready if so.
+		// Read-after-write to confirm.
+		authzs, err := s.repo.ListAuthzsByOrder(ctx, authz.OrderID)
+		if err != nil {
+			return err
+		}
+		allValid := len(authzs) > 0
+		anyInvalid := false
+		for _, a := range authzs {
+			if a.AuthzID == ch.AuthzID {
+				if newAuthzStatus != domain.ACMEAuthzStatusValid {
+					allValid = false
+				}
+				if newAuthzStatus == domain.ACMEAuthzStatusInvalid {
+					anyInvalid = true
+				}
+				continue
+			}
+			if a.Status != domain.ACMEAuthzStatusValid {
+				allValid = false
+			}
+			if a.Status == domain.ACMEAuthzStatusInvalid {
+				anyInvalid = true
+			}
+		}
+		order, err := s.repo.GetOrderByID(ctx, authz.OrderID)
+		if err != nil {
+			return err
+		}
+		switch {
+		case allValid && order.Status == domain.ACMEOrderStatusPending:
+			order.Status = domain.ACMEOrderStatusReady
+			if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
+				return err
+			}
+		case anyInvalid && order.Status == domain.ACMEOrderStatusPending:
+			order.Status = domain.ACMEOrderStatusInvalid
+			order.Error = &domain.ACMEProblem{
+				Type:   "urn:ietf:params:acme:error:incorrectResponse",
+				Detail: "one or more authorizations failed",
+				Status: 403,
+			}
+			if err := s.repo.UpdateOrderWithTx(ctx, q, order); err != nil {
+				return err
+			}
+		}
+		return s.auditService.RecordEventWithTx(ctx, q,
+			fmt.Sprintf("acme:%s", accountID), domain.ActorTypeUser,
+			"acme_challenge_completed", "acme_challenge", ch.ChallengeID,
+			auditDetails)
+	})
+}
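
The order-status cascade inside recordChallengeOutcome can be read as a pure function of the order status and its authz statuses. The following is a sketch under that assumption; `status` and `nextOrderStatus` are hypothetical names, and the slice is assumed to already reflect the just-updated authz (the real code substitutes the new status for that row while iterating).

```go
package main

import "fmt"

type status string

const (
	pending status = "pending"
	ready   status = "ready"
	valid   status = "valid"
	invalid status = "invalid"
)

// nextOrderStatus mirrors the cascade rule: a pending order becomes
// ready when every authz is valid, invalid when any authz is invalid,
// and is otherwise unchanged. Non-pending orders are never touched.
func nextOrderStatus(order status, authzs []status) status {
	if order != pending {
		return order
	}
	allValid := len(authzs) > 0 // an order with no authzs never becomes ready
	anyInvalid := false
	for _, a := range authzs {
		if a != valid {
			allValid = false
		}
		if a == invalid {
			anyInvalid = true
		}
	}
	switch {
	case allValid:
		return ready
	case anyInvalid:
		return invalid
	}
	return order
}

func main() {
	fmt.Println(nextOrderStatus(pending, []status{valid, valid}))
	fmt.Println(nextOrderStatus(pending, []status{valid, pending}))
	fmt.Println(nextOrderStatus(pending, []status{valid, invalid}))
}
```

Expressed this way, the rule can be table-tested without a transactor or repository.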
@@ -0,0 +1,74 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package service
+
+import (
+	"context"
+	"fmt"
+	"sync/atomic"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 9 (2026-05-14): extracted from
+// internal/service/acme.go via the Option B sibling-file pattern.
+// Package stays `service`; every external caller of
+// `service.ACMEService.GarbageCollect(...)` resolves the same way —
+// pure mechanical relocation.
+//
+// This file holds the Phase 5 ACME GC sweep concern: the scheduler-
+// invoked GarbageCollect entry point plus the atomicAddUint64
+// counter helper (only consumed inside the sweep body for the
+// rows-affected-N case the default `bump` doesn't cover).
+
+// GarbageCollect runs a single ACME GC sweep. Phase 5 — the scheduler
+// invokes this every cfg.GCInterval. Three independent sweeps:
+//
+//  1. Delete used / expired nonces.
+//  2. Transition expired pending authzs to `expired`.
+//  3. Transition expired pending/ready/processing orders to `invalid`.
+//
+// Each sweep is a single SQL statement (no per-row transactions) so a
+// large reap is one atomic write per sweep. Per-sweep errors are
+// logged-and-continued: a failing nonces sweep doesn't block the
+// authzs sweep. Returns the first error encountered (for caller
+// telemetry); per-sweep counts are recorded on metrics regardless.
+//
+// Idempotent — repeated runs are safe; the second run finds 0 rows.
+func (s *ACMEService) GarbageCollect(ctx context.Context) error {
+	s.metrics.bump(&s.metrics.GCRunsTotal)
+	var firstErr error
+
+	if n, err := s.repo.GCExpiredNonces(ctx); err != nil {
+		s.metrics.bump(&s.metrics.GCRunFailuresTotal)
+		if firstErr == nil {
+			firstErr = fmt.Errorf("acme gc: nonces: %w", err)
+		}
+	} else if n > 0 {
+		atomicAddUint64(&s.metrics.GCNoncesReapedTotal, uint64(n))
+	}
+
+	if n, err := s.repo.GCExpireAuthorizations(ctx); err != nil {
+		s.metrics.bump(&s.metrics.GCRunFailuresTotal)
+		if firstErr == nil {
+			firstErr = fmt.Errorf("acme gc: authzs: %w", err)
+		}
+	} else if n > 0 {
+		atomicAddUint64(&s.metrics.GCAuthzsExpiredTotal, uint64(n))
+	}
+
+	if n, err := s.repo.GCInvalidateExpiredOrders(ctx); err != nil {
+		s.metrics.bump(&s.metrics.GCRunFailuresTotal)
+		if firstErr == nil {
+			firstErr = fmt.Errorf("acme gc: orders: %w", err)
+		}
+	} else if n > 0 {
+		atomicAddUint64(&s.metrics.GCOrdersInvalidatedTotal, uint64(n))
+	}
+
+	return firstErr
+}
+
+// atomicAddUint64 adds delta to the counter. The metrics struct exposes
+// only `bump` (add 1) by default; this helper covers the
+// rows-affected-N case the GC needs.
+func atomicAddUint64(c *atomic.Uint64, delta uint64) { c.Add(delta) }
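
The continue-on-error shape of GarbageCollect can be sketched as a loop over independent sweeps. This is a hedged illustration, not the certctl code: `sweep`, `runSweeps`, and `demo` are hypothetical names, and the sweep bodies are stubs standing in for single-statement SQL reaps.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// sweep pairs a reap function with the counter its row count feeds.
type sweep struct {
	name    string
	run     func() (int64, error)
	counter *atomic.Uint64
}

// runSweeps executes every sweep even when an earlier one fails and
// returns the first error encountered.
func runSweeps(sweeps []sweep, failures *atomic.Uint64) error {
	var firstErr error
	for _, sw := range sweeps {
		n, err := sw.run()
		if err != nil {
			failures.Add(1)
			if firstErr == nil {
				firstErr = fmt.Errorf("gc: %s: %w", sw.name, err)
			}
			continue // a failed sweep must not block the next one
		}
		if n > 0 {
			sw.counter.Add(uint64(n)) // add rows-affected, not just 1
		}
	}
	return firstErr
}

// demo runs one failing and one succeeding sweep and returns the counters.
func demo() (nonces, authzs, failures uint64, err error) {
	var nonceCount, authzCount, failCount atomic.Uint64
	err = runSweeps([]sweep{
		{name: "nonces", run: func() (int64, error) { return 0, errors.New("db down") }, counter: &nonceCount},
		{name: "authzs", run: func() (int64, error) { return 3, nil }, counter: &authzCount},
	}, &failCount)
	return nonceCount.Load(), authzCount.Load(), failCount.Load(), err
}

func main() {
	n, a, f, err := demo()
	fmt.Println("nonces:", n, "authzs:", a, "failures:", f, "err:", err)
}
```

The authz sweep still records its three rows even though the nonce sweep failed first, which is the behavior the GarbageCollect doc comment describes.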
@@ -0,0 +1,46 @@
+// Copyright 2026 certctl LLC. All rights reserved.
+// SPDX-License-Identifier: BUSL-1.1
+
+package service
+
+import (
+	"context"
+	"fmt"
+
+	"github.com/certctl-io/certctl/internal/api/acme"
+)
+
+// Phase 9 ARCH-M2 closure Sprint 9 (2026-05-14): extracted from
+// internal/service/acme.go via the Option B sibling-file pattern
+// (operator's choice post-Sprint-8). Package stays `service`; every
+// external caller of `service.ACMEService.IssueNonce(...)` resolves
+// the same way — pure mechanical relocation.
+//
+// This file holds the SERVER-issues-nonce concern: the IssueNonce
+// method that generates + persists a fresh ACME nonce for the
+// Replay-Nonce header per RFC 8555 §6.5. The nonceAdapter type
+// (which wraps ACMERepo.ConsumeNonce for the JWS verifier) stays
+// in acme.go alongside VerifyJWS — it's a verification-infrastructure
+// helper, not a server-side nonce concern.
+
+// IssueNonce generates a fresh ACME nonce, persists it with the
+// configured TTL, and returns the encoded string for the
+// Replay-Nonce header.
+//
+// RFC 8555 §6.5: every successful ACME response carries a
+// Replay-Nonce. Phase 1a wires this via the directory + new-nonce
+// handlers; Phase 1b extends with new-account + account/<id> POST
+// responses (the JWS-authenticated paths).
+func (s *ACMEService) IssueNonce(ctx context.Context) (string, error) {
+	nonce, err := acme.GenerateNonce()
+	if err != nil {
+		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
+		return "", fmt.Errorf("acme: generate nonce: %w", err)
+	}
+	if err := s.repo.IssueNonce(ctx, nonce, s.cfg.NonceTTL); err != nil {
+		s.metrics.bump(&s.metrics.NewNonceFailureTotal)
+		return "", fmt.Errorf("acme: persist nonce: %w", err)
+	}
+	s.metrics.bump(&s.metrics.NewNonceTotal)
+	return nonce, nil
+}
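
The generate-then-persist flow of IssueNonce, together with the single-use check it enables, can be sketched with an in-memory store. This is an assumption-laden illustration: the diff does not show how `acme.GenerateNonce` or the repository are implemented, so `generateNonce`, `nonceStore`, `newNonceStore`, `defaultTTL`, and the 16-byte size are all hypothetical. The base64url encoding follows the Replay-Nonce format that RFC 8555 requires.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"time"
)

const defaultTTL = time.Minute // hypothetical TTL for the sketch

// nonceStore is an in-memory stand-in for the nonce repository.
type nonceStore struct {
	ttl     time.Duration
	expires map[string]time.Time
}

func newNonceStore(ttl time.Duration) *nonceStore {
	return &nonceStore{ttl: ttl, expires: make(map[string]time.Time)}
}

// generateNonce returns 16 random bytes, base64url-encoded without padding.
func generateNonce() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generate nonce: %w", err)
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// issue generates a nonce and records its expiry before returning it.
func (s *nonceStore) issue() (string, error) {
	nonce, err := generateNonce()
	if err != nil {
		return "", err
	}
	s.expires[nonce] = time.Now().Add(s.ttl)
	return nonce, nil
}

// consume reports whether the nonce was known and unexpired, and deletes
// it either way so it cannot be replayed.
func (s *nonceStore) consume(nonce string) bool {
	exp, ok := s.expires[nonce]
	delete(s.expires, nonce)
	return ok && time.Now().Before(exp)
}

func main() {
	store := newNonceStore(defaultTTL)
	nonce, err := store.issue()
	if err != nil {
		fmt.Println("issue failed:", err)
		return
	}
	fmt.Println("first use accepted:", store.consume(nonce))
	fmt.Println("replay accepted:", store.consume(nonce))
}
```

Persisting before returning matters: a nonce handed to a client but never stored would be rejected on its first legitimate use.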