repo,service: introduce WithinTx and atomic audit rows for issue/renew/revoke

Closes the #3 acquisition-readiness blocker from the 2026-05-01 issuer
coverage audit (Part 1.5 finding #1: audit row not transactional with
issuance). AuditRepository.Create previously ran on the package-level
*sql.DB while the certificate insert / version insert / revocation
insert ran on independent connections, so when the audit INSERT failed
after a successful operation INSERT, the audit row was silently lost
while the operation stayed committed. SOX §404 (IT general controls),
PCI-DSS §10 (audit logging), HIPAA §164.312(b) (audit controls), and
CA/B Forum Baseline Requirements §5.4.1 (audit log records) all
presume audit-with-operation atomicity.

Design — Option A (Querier abstraction). The chosen pattern: a shared
repository.Querier interface (subset of *sql.DB and *sql.Tx) plus a
postgres.WithinTx helper that begins a tx, runs fn, commits on nil
error, rolls back on error or panic, and returns the wrapped result.
Repository methods that participate in a service-layer transaction
expose a *WithTx variant taking repository.Querier; the bare methods
remain for stand-alone use. A repository.Transactor abstracts the
"begin tx, run fn, commit/rollback" lifecycle so service-layer code
runs multi-write operations atomically without holding *sql.DB
directly. Option B (UnitOfWork) was considered but adds boilerplate
without behavioral benefit for the current scope. Option C
(context-carried tx) was explicitly rejected — it hides the
transactional boundary from the type system, reproducing the class
of bug we're fixing.
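
Illustrative sketch of the Option A shape described above (not the
committed code). The Querier method set, the WithinTx signature, and the
"commit tx" wrap text are assumptions; only the "begin tx" wrap is named
in this commit. The in-memory driver and the run helper are hypothetical
and exist only so the sketch executes without Postgres.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"strings"
)

// Querier: the subset of *sql.DB and *sql.Tx that repository writes use.
type Querier interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
	QueryContext(ctx context.Context, query string, args ...any) (*sql.Rows, error)
	QueryRowContext(ctx context.Context, query string, args ...any) *sql.Row
}

// Compile-time guards: both handle types satisfy Querier.
var _, _ Querier = (*sql.DB)(nil), (*sql.Tx)(nil)

// WithinTx begins a tx, runs fn, commits on nil error, and rolls back on
// error or panic (the panic is re-raised after the rollback).
func WithinTx(ctx context.Context, db *sql.DB, fn func(q Querier) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin tx: %w", err)
	}
	defer func() {
		if p := recover(); p != nil {
			_ = tx.Rollback()
			panic(p)
		}
	}()
	if err := fn(tx); err != nil {
		_ = tx.Rollback() // fn's error stays the primary failure
		return err
	}
	if err := tx.Commit(); err != nil {
		return fmt.Errorf("commit tx: %w", err)
	}
	return nil
}

// Hypothetical in-memory driver so the sketch runs without Postgres.
var events []string // what the fake driver observed, in order

type (
	fakeDriver struct{}
	fakeConn   struct{}
	fakeTx     struct{}
)

func (fakeDriver) Open(string) (driver.Conn, error)  { return fakeConn{}, nil }
func (fakeConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("unsupported") }
func (fakeConn) Close() error                        { return nil }
func (fakeConn) Begin() (driver.Tx, error)           { return fakeTx{}, nil }
func (fakeConn) ExecContext(context.Context, string, []driver.NamedValue) (driver.Result, error) {
	events = append(events, "exec")
	return driver.RowsAffected(1), nil
}
func (fakeTx) Commit() error   { events = append(events, "commit"); return nil }
func (fakeTx) Rollback() error { events = append(events, "rollback"); return nil }

func init() { sql.Register("fake", fakeDriver{}) }

var errAudit = errors.New("audit insert failed")

// run performs a cert insert plus an audit step inside one WithinTx call
// and reports the driver events and the returned error.
func run(auditFails bool) (string, error) {
	events = nil
	db, err := sql.Open("fake", "")
	if err != nil {
		return "", err
	}
	defer db.Close()
	ctx := context.Background()
	err = WithinTx(ctx, db, func(q Querier) error {
		if _, err := q.ExecContext(ctx, "INSERT cert"); err != nil {
			return err
		}
		if auditFails {
			return errAudit
		}
		_, err := q.ExecContext(ctx, "INSERT audit")
		return err
	})
	return strings.Join(events, ","), err
}

func main() {
	fmt.Println(run(false))
	fmt.Println(run(true))
}
```

Because fn receives a Querier rather than pulling a tx out of the
context, every write that joins the transaction has to be handed the tx
explicitly, which is the type-level visibility Option C would lose.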

This commit:
- Adds internal/repository/querier.go with the Querier interface
  (compile-time guards that *sql.DB and *sql.Tx satisfy it) and the
  Transactor interface for service-layer use.
- Adds internal/repository/postgres/tx.go with the WithinTx helper
  (begin/fn/commit/rollback with panic recovery) and a transactor
  type that satisfies repository.Transactor.
- Adds CreateWithTx variants on AuditRepository, CertificateRepository
  (Create + Update + CreateVersion), and RevocationRepository.
  Existing bare methods now delegate to the *WithTx variant using
  the package-level *sql.DB so existing call sites are
  behavior-preserving.
- Updates repository/interfaces.go: AuditRepository, CertificateRepository,
  and RevocationRepository declare the new *WithTx methods. Adds an
  atomicity contract doc-comment on AuditRepository pointing at
  WithinTx + the audit blocker.
- Adds AuditService.RecordEventWithTx, mirroring RecordEvent but
  routing through CreateWithTx so the audit row is part of the
  caller's transaction. Same redaction + marshalling contract.
- Refactors three audit-emitting service paths to use Transactor.WithinTx
  when SetTransactor was wired, with a legacy fallback for backward
  compat:
    * CertificateService.Create — cert insert + audit row in one tx.
    * RevocationSvc.RevokeCertificateWithActor — cert status update +
      revocation row + audit row in one tx. OCSP cache invalidation
      remains best-effort (out of scope for this change).
    * RenewalService CompleteServerRenewal — cert version insert +
      cert update + audit row in one tx. Job status update stays
      outside the audit-atomicity scope (job state lives outside
      the operator-facing audit trail).
- Adds SetTransactor on CertificateService, RevocationSvc, and
  RenewalService. cmd/server/main.go wires a single Transactor
  instance shared across all three so all audit-emitting paths run
  their writes in transactions backed by the same *sql.DB handle.
- Updates the mock implementations in 5 test files to satisfy the new
  interface methods: mockCertRepo (testutil_test.go),
  mockCertRepoWithGetError (shortlived_test.go), fakeRevocationRepo
  (crl_cache_test.go), intuneE2EAuditRepo (scep_intune_e2e_test.go),
  and the integration-test mocks (lifecycle_test.go:
  mockCertificateRepository, mockAuditRepository,
  mockRevocationRepository). All *WithTx mocks ignore the Querier and
  delegate to the bare method (mocks have no DB; in-memory state is
  shared regardless of "tx").
- Adds a service-layer test mockTransactor with BeginTxErr and
  CommitErr knobs so the atomic-audit tests can assert error
  propagation through the transactional boundary.
- Adds internal/repository/postgres/tx_test.go: unit tests asserting
  that WithinTx surfaces the "begin tx" error wrap when BeginTx fails
  and that Transactor.WithinTx delegates correctly. Real-Postgres
  rollback semantics are covered by the testcontainers tests in the
  postgres package. Sandbox disk pressure prevented adding a sqlmock
  dep for the in-fn / commit-failure unit test, so those scenarios
  are exercised through atomic_audit_test.go using the
  mockTransactor's CommitErr / BeginTxErr fields.
- Adds internal/service/atomic_audit_test.go:
    * TestCertificateService_Create_AtomicWithTx — asserts audit
      insert failure inside the tx surfaces as the operation's error
      (closes the blocker contract).
    * TestCertificateService_Create_LegacyPathLogs — pins the
      backward-compat behavior when SetTransactor isn't wired:
      audit failure is logged-not-failed, matching pre-fix.
    * TestCertificateService_Create_TransactorBeginFailure — BeginTx
      error path: operation fails, no cert insert, no audit insert.
    * TestCertificateService_Create_TransactorCommitFailure —
      Commit error after successful in-fn writes surfaces as the
      operation's error. Real Postgres can fail Commit on
      serialization conflicts; the service must report this.
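
Illustrative sketch of the service-layer wiring listed above (not the
committed code): a Transactor, a mock with BeginTxErr / CommitErr knobs,
and a Create path that uses the transactor when SetTransactor was called
and otherwise falls back to the legacy log-and-continue behavior. The
store type, the scenario helper, the reduced Querier, and all method
bodies are hypothetical stand-ins. As with the mocks in this commit, the
in-memory state does not roll back.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
)

// Querier is reduced to an opaque marker in this sketch; in the commit it
// is the shared subset of *sql.DB and *sql.Tx.
type Querier interface{}

// Transactor abstracts "begin tx, run fn, commit/rollback".
type Transactor interface {
	WithinTx(ctx context.Context, fn func(q Querier) error) error
}

// mockTransactor can be told to fail at begin or at commit.
type mockTransactor struct {
	BeginTxErr error
	CommitErr  error
}

func (m *mockTransactor) WithinTx(ctx context.Context, fn func(q Querier) error) error {
	if m.BeginTxErr != nil {
		return fmt.Errorf("begin tx: %w", m.BeginTxErr)
	}
	if err := fn(nil); err != nil {
		return err
	}
	if m.CommitErr != nil {
		return fmt.Errorf("commit tx: %w", m.CommitErr)
	}
	return nil
}

// store is a hypothetical in-memory stand-in for the cert and audit repos.
type store struct {
	certs, audits int
	auditErr      error
}

func (s *store) insertCert(Querier) error { s.certs++; return nil }
func (s *store) insertAudit(Querier) error {
	if s.auditErr != nil {
		return s.auditErr
	}
	s.audits++
	return nil
}

type certService struct {
	st *store
	tx Transactor // nil until SetTransactor is called
}

func (s *certService) SetTransactor(t Transactor) { s.tx = t }

// Create writes the cert and its audit row. With a transactor, an audit
// failure fails the operation; without one, it is only logged.
func (s *certService) Create(ctx context.Context) error {
	if s.tx == nil { // legacy fallback
		if err := s.st.insertCert(nil); err != nil {
			return err
		}
		if err := s.st.insertAudit(nil); err != nil {
			log.Printf("audit write failed: %v", err)
		}
		return nil
	}
	return s.tx.WithinTx(ctx, func(q Querier) error {
		if err := s.st.insertCert(q); err != nil {
			return err
		}
		return s.st.insertAudit(q)
	})
}

var errBoom = errors.New("boom")

// scenario runs Create once and reports the in-memory row counts.
func scenario(tx Transactor, auditErr error) (certs, audits int, err error) {
	svc := &certService{st: &store{auditErr: auditErr}}
	if tx != nil {
		svc.SetTransactor(tx)
	}
	err = svc.Create(context.Background())
	return svc.st.certs, svc.st.audits, err
}

func main() {
	fmt.Println(scenario(&mockTransactor{}, errBoom))
	fmt.Println(scenario(nil, errBoom))
}
```

The four scenarios in atomic_audit_test.go map onto scenario calls with
an audit error plus a transactor, an audit error without one, a
BeginTxErr, and a CommitErr.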

Out of scope (separate follow-up commits, same shape):
- Issuer CRUD audit atomicity.
- Target CRUD audit atomicity.
- Agent retire (already transactional via RetireAgentWithCascade;
  verified, not changed).
- Renewal-policy CRUD audit atomicity.
- Owner/team/agent-group CRUD audit atomicity.
- Discovery / health-check audit atomicity.

Verified locally:
- gofmt -l . clean
- go vet ./... clean
- staticcheck ./... clean
- golangci-lint run --timeout 5m ./... → 0 issues
- go test -short -count=1 ./internal/service/ green
- go test -short -count=1 ./internal/api/handler/ green
- go test -short -count=1 ./internal/integration/ green
- go test -short -count=1 ./internal/repository/postgres/ green
- go build ./... success

Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix #3 (Part 3, narrative section).
shankar0123
2026-05-02 00:29:09 +00:00
parent 3669556e57
commit b0efdbe2f8
18 changed files with 907 additions and 63 deletions
+50 -1
@@ -24,6 +24,13 @@ var (
)
// CertificateRepository defines operations for managing certificates.
//
// The *WithTx variants on Create / Update / CreateVersion exist so
// service-layer code can run those writes in a single transaction with
// the audit row insert (postgres.WithinTx). Use the bare methods for
// stand-alone operations that do not need transactional semantics; the
// concrete postgres implementation has the bare methods delegate to
// the *WithTx variant using the package-level *sql.DB.
type CertificateRepository interface {
// List returns a paginated list of certificates matching the filter criteria.
List(ctx context.Context, filter *CertificateFilter) ([]*domain.ManagedCertificate, int, error)
@@ -31,14 +38,28 @@ type CertificateRepository interface {
Get(ctx context.Context, id string) (*domain.ManagedCertificate, error)
// Create stores a new certificate.
Create(ctx context.Context, cert *domain.ManagedCertificate) error
// CreateWithTx stores a new certificate using the supplied Querier
// (typically *sql.Tx from postgres.WithinTx). Closes the audit-
// atomicity blocker for the issuance path.
CreateWithTx(ctx context.Context, q Querier, cert *domain.ManagedCertificate) error
// Update modifies an existing certificate.
Update(ctx context.Context, cert *domain.ManagedCertificate) error
// UpdateWithTx modifies an existing certificate using the supplied
// Querier. Closes the audit-atomicity blocker for the revocation
// path (cert status update must be atomic with the revocation row +
// audit row insert).
UpdateWithTx(ctx context.Context, q Querier, cert *domain.ManagedCertificate) error
// Archive marks a certificate as archived.
Archive(ctx context.Context, id string) error
// ListVersions returns all versions of a certificate.
ListVersions(ctx context.Context, certID string) ([]*domain.CertificateVersion, error)
// CreateVersion stores a new certificate version.
CreateVersion(ctx context.Context, version *domain.CertificateVersion) error
// CreateVersionWithTx stores a new certificate version using the
// supplied Querier. Closes the audit-atomicity blocker for the
// renewal path (version row must be atomic with the audit row
// insert).
CreateVersionWithTx(ctx context.Context, q Querier, version *domain.CertificateVersion) error
// GetExpiringCertificates returns certificates expiring before the given time.
GetExpiringCertificates(ctx context.Context, before time.Time) ([]*domain.ManagedCertificate, error)
// GetLatestVersion returns the most recent certificate version for a certificate.
@@ -58,6 +79,12 @@ type RevocationRepository interface {
// (issuer_id, serial_number) per RFC 5280 §5.2.3, so duplicate serials
// across different issuers are permitted.
Create(ctx context.Context, revocation *domain.CertificateRevocation) error
// CreateWithTx records a revocation using the supplied Querier
// (typically *sql.Tx from postgres.WithinTx). Closes the audit-
// atomicity blocker for the revocation path: the
// certificate_revocations row must be atomic with the
// managed_certificates status update + audit row insert.
CreateWithTx(ctx context.Context, q Querier, revocation *domain.CertificateRevocation) error
// GetByIssuerAndSerial retrieves a revocation by the (issuer_id, serial_number)
// pair. Callers (OCSP, CRL generation) always know the issuer because
// protocol endpoints carry it in the request path; RFC 5280 §5.2.3 guarantees
@@ -426,9 +453,31 @@ type PolicyRepository interface {
}
// AuditRepository defines operations for recording and retrieving audit logs.
//
// Atomicity contract (closes the #3 acquisition-readiness blocker from the
// 2026-05-01 issuer coverage audit, Part 1.5 finding #1): callers that
// emit an audit row as part of a logical operation (issuance, renewal,
// revocation) MUST use CreateWithTx and pass the same *sql.Tx that wraps
// the operation's other writes. The bare Create method exists only for
// stand-alone admin operations that do not have a paired state change
// (manual audit entry, system events that are themselves the only
// state change). Callers using the bare method MUST NOT rely on its
// behavior for compliance-relevant audit trails — those go through
// CreateWithTx + WithinTx.
//
// SOX §404 over IT general controls, PCI-DSS §10 audit logging, HIPAA
// §164.312(b) audit controls, and CA/B Forum Baseline Requirements
// §5.4.1 audit log records all presume audit-with-operation atomicity.
type AuditRepository interface {
// Create stores a new audit event using the repository's package-
// level *sql.DB. Use CreateWithTx when the audit event must be
// atomic with another database operation in a service-layer
// transaction.
Create(ctx context.Context, event *domain.AuditEvent) error
// CreateWithTx stores a new audit event using the supplied Querier.
// Pass *sql.Tx (typically from postgres.WithinTx) to participate in
// a caller's transaction. Closes the audit-atomicity blocker.
CreateWithTx(ctx context.Context, q Querier, event *domain.AuditEvent) error
// List returns audit events matching the filter criteria.
List(ctx context.Context, filter *AuditFilter) ([]*domain.AuditEvent, error)
}
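
A minimal sketch of how a concrete repository could honor the contract documented above, with the bare Create delegating to CreateWithTx on the repository's own handle. This is not the committed postgres implementation: the reduced Querier, the AuditEvent fields, the audit_events table and column names, the recordingQuerier fake, and the demo helper are all assumptions made for illustration.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// Querier is reduced to the one method this sketch needs.
type Querier interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// Both real handle types still satisfy the reduced interface.
var _, _ Querier = (*sql.DB)(nil), (*sql.Tx)(nil)

// AuditEvent is a hypothetical stand-in for domain.AuditEvent.
type AuditEvent struct{ ID, Action string }

// auditRepo holds the stand-alone handle; in production this would be
// the package-level *sql.DB.
type auditRepo struct{ db Querier }

// Create is the stand-alone path: it delegates to CreateWithTx using the
// repository's own handle, so both paths share one INSERT.
func (r *auditRepo) Create(ctx context.Context, e *AuditEvent) error {
	return r.CreateWithTx(ctx, r.db, e)
}

// CreateWithTx runs the INSERT on whatever Querier the caller supplies,
// typically the *sql.Tx that wraps the paired state change.
func (r *auditRepo) CreateWithTx(ctx context.Context, q Querier, e *AuditEvent) error {
	_, err := q.ExecContext(ctx,
		`INSERT INTO audit_events (id, action) VALUES ($1, $2)`, e.ID, e.Action)
	if err != nil {
		return fmt.Errorf("insert audit event: %w", err)
	}
	return nil
}

// recordingQuerier is a hypothetical fake that counts statements.
type recordingQuerier struct{ n int }

func (r *recordingQuerier) ExecContext(context.Context, string, ...any) (sql.Result, error) {
	r.n++
	return driver.RowsAffected(1), nil
}

// demo issues one bare Create and one CreateWithTx and reports which
// handle saw each statement.
func demo() (dbCount, txCount int, err error) {
	db, tx := &recordingQuerier{}, &recordingQuerier{}
	repo := &auditRepo{db: db}
	ctx := context.Background()
	if err = repo.Create(ctx, &AuditEvent{ID: "a1", Action: "manual_entry"}); err != nil {
		return db.n, tx.n, err
	}
	err = repo.CreateWithTx(ctx, tx, &AuditEvent{ID: "a2", Action: "certificate_issued"})
	return db.n, tx.n, err
}

func main() {
	fmt.Println(demo())
}
```

Keeping the SQL in CreateWithTx alone means the stand-alone and transactional paths cannot drift apart.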