mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:31:33 +00:00
b0efdbe2f8
Closes the #3 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (Part 1.5 finding #1: audit row not transactional with issuance). AuditRepository.Create previously ran on the package-level *sql.DB while the certificate insert / version insert / revocation insert ran on independent connections — a failed audit INSERT after a successful operation INSERT was silently lost. SOX §404 over IT general controls, PCI-DSS §10 audit logging, HIPAA §164.312(b) audit controls, and CA/B Forum Baseline Requirements §5.4.1 audit log records all presume audit-with-operation atomicity. Design — Option A (Querier abstraction). The chosen pattern: a shared repository.Querier interface (subset of *sql.DB and *sql.Tx) plus a postgres.WithinTx helper that begins a tx, runs fn, commits on nil error, rolls back on error or panic, and returns the wrapped result. Repository methods that participate in a service-layer transaction expose a *WithTx variant taking repository.Querier; the bare methods remain for stand-alone use. A repository.Transactor abstracts the "begin tx, run fn, commit/rollback" lifecycle so service-layer code runs multi-write operations atomically without holding *sql.DB directly. Option B (UnitOfWork) was considered but adds boilerplate without behavioral benefit for the current scope. Option C (context-carried tx) was explicitly rejected — it hides the transactional boundary from the type system, reproducing the class of bug we're fixing. This commit: - Adds internal/repository/querier.go with the Querier interface (compile-time guards that *sql.DB and *sql.Tx satisfy it) and the Transactor interface for service-layer use. - Adds internal/repository/postgres/tx.go with the WithinTx helper (begin/fn/commit/rollback with panic recovery) and a transactor type that satisfies repository.Transactor. - Adds CreateWithTx variants on AuditRepository, CertificateRepository (Create + Update + CreateVersion), and RevocationRepository. Existing bare methods now delegate to the *WithTx variant using the package-level *sql.DB so existing call sites are behavior-preserving. - Updates repository/interfaces.go: AuditRepository, CertificateRepository, and RevocationRepository declare the new *WithTx methods. Adds an atomicity contract doc-comment on AuditRepository pointing at WithinTx + the audit blocker. - Adds AuditService.RecordEventWithTx, mirroring RecordEvent but routing through CreateWithTx so the audit row is part of the caller's transaction. Same redaction + marshalling contract. - Refactors three audit-emitting service paths to use Transactor.WithinTx when SetTransactor was wired, with a legacy fallback for backward compat: * CertificateService.Create — cert insert + audit row in one tx. * RevocationSvc.RevokeCertificateWithActor — cert status update + revocation row + audit row in one tx. The OCSP cache invalidate remains best-effort (out of scope per the prompt). * RenewalService CompleteServerRenewal — cert version insert + cert update + audit row in one tx. Job status update stays outside the audit-atomicity scope (job state lives outside the operator-facing audit trail). - Adds SetTransactor on CertificateService, RevocationSvc, and RenewalService. cmd/server/main.go wires a single Transactor instance shared across all three so all audit-emitting paths run their writes in transactions backed by the same *sql.DB handle. - Updates 5 mock implementations to satisfy the new interface methods: mockCertRepo (testutil_test.go), mockCertRepoWithGetError (shortlived_test.go), fakeRevocationRepo (crl_cache_test.go), intuneE2EAuditRepo (scep_intune_e2e_test.go), and the integration- test mocks (lifecycle_test.go: mockCertificateRepository, mockAuditRepository, mockRevocationRepository). All *WithTx mocks ignore the Querier and delegate to the bare method (mocks have no DB; in-memory state is shared regardless of "tx"). - Adds a service-layer test mockTransactor with BeginTxErr and CommitErr knobs so the atomic-audit tests can assert error propagation through the transactional boundary. - Adds internal/repository/postgres/tx_test.go: unit-level test that WithinTx surfaces "begin tx" wrap when BeginTx fails, and that Transactor.WithinTx delegates correctly. Real-Postgres rollback semantics are covered by the testcontainers tests in the postgres package — sandbox disk pressure prevented adding a sqlmock dep for the in-fn / commit-failure unit test, so those scenarios are exercised through atomic_audit_test.go using the mockTransactor's CommitErr / BeginTxErr fields. - Adds internal/service/atomic_audit_test.go: * TestCertificateService_Create_AtomicWithTx — asserts audit insert failure inside the tx surfaces as the operation's error (closes the blocker contract). * TestCertificateService_Create_LegacyPathLogs — pins the backward-compat behavior when SetTransactor isn't wired: audit failure is logged-not-failed, matching pre-fix. * TestCertificateService_Create_TransactorBeginFailure — BeginTx error path: operation fails, no cert insert, no audit insert. * TestCertificateService_Create_TransactorCommitFailure — Commit error after successful in-fn writes surfaces as the operation's error. Real Postgres can fail Commit on serialization conflicts; the service must report this. Out of scope (separate follow-up commits, same shape): - Issuer CRUD audit atomicity. - Target CRUD audit atomicity. - Agent retire (already transactional via RetireAgentWithCascade; verified, not changed). - Renewal-policy CRUD audit atomicity. - Owner/team/agent-group CRUD audit atomicity. - Discovery / health-check audit atomicity. Verified locally: - gofmt -l . clean - go vet ./... clean - staticcheck ./... clean - golangci-lint run --timeout 5m ./... → 0 issues - go test -short -count=1 ./internal/service/ green - go test -short -count=1 ./internal/api/handler/ green - go test -short -count=1 ./internal/integration/ green - go test -short -count=1 ./internal/repository/postgres/ green - go build ./... success Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md Top-10 fix #3 (Part 3, narrative section).
326 lines
10 KiB
Go
326 lines
10 KiB
Go
package service_test
|
|
|
|
import (
|
|
"context"
|
|
"io"
|
|
"log/slog"
|
|
"sync"
|
|
"sync/atomic"
|
|
"testing"
|
|
"time"
|
|
|
|
"github.com/shankar0123/certctl/internal/connector/issuer"
|
|
localissuer "github.com/shankar0123/certctl/internal/connector/issuer/local"
|
|
"github.com/shankar0123/certctl/internal/domain"
|
|
"github.com/shankar0123/certctl/internal/repository"
|
|
"github.com/shankar0123/certctl/internal/service"
|
|
)
|
|
|
|
// fakeCRLCacheRepo is an in-memory repository for CRLCacheService
|
|
// tests. The Postgres impl is covered by the testcontainers tests in
|
|
// internal/repository/postgres/crl_cache_test.go (CI only — needs Docker).
|
|
type fakeCRLCacheRepo struct {
|
|
mu sync.Mutex
|
|
rows map[string]*domain.CRLCacheEntry
|
|
events []*domain.CRLGenerationEvent
|
|
getCount int
|
|
putCount int
|
|
}
|
|
|
|
func newFakeCRLCacheRepo() *fakeCRLCacheRepo {
|
|
return &fakeCRLCacheRepo{rows: map[string]*domain.CRLCacheEntry{}}
|
|
}
|
|
|
|
func (r *fakeCRLCacheRepo) Get(_ context.Context, issuerID string) (*domain.CRLCacheEntry, error) {
|
|
r.mu.Lock()
|
|
defer r.mu.Unlock()
|
|
r.getCount++
|
|
if entry, ok := r.rows[issuerID]; ok {
|
|
copy := *entry
|
|
return ©, nil
|
|
}
|
|
return nil, nil
|
|
}
|
|
|
|
func (r *fakeCRLCacheRepo) Put(_ context.Context, entry *domain.CRLCacheEntry) error {
|
|
r.mu.Lock()
|
|
defer r.mu.Unlock()
|
|
r.putCount++
|
|
copy := *entry
|
|
r.rows[entry.IssuerID] = ©
|
|
return nil
|
|
}
|
|
|
|
func (r *fakeCRLCacheRepo) NextCRLNumber(_ context.Context, issuerID string) (int64, error) {
|
|
r.mu.Lock()
|
|
defer r.mu.Unlock()
|
|
if entry, ok := r.rows[issuerID]; ok {
|
|
return entry.CRLNumber + 1, nil
|
|
}
|
|
return 1, nil
|
|
}
|
|
|
|
func (r *fakeCRLCacheRepo) RecordGenerationEvent(_ context.Context, evt *domain.CRLGenerationEvent) error {
|
|
r.mu.Lock()
|
|
defer r.mu.Unlock()
|
|
copy := *evt
|
|
r.events = append(r.events, ©)
|
|
return nil
|
|
}
|
|
|
|
func (r *fakeCRLCacheRepo) ListGenerationEvents(_ context.Context, issuerID string, limit int) ([]*domain.CRLGenerationEvent, error) {
|
|
r.mu.Lock()
|
|
defer r.mu.Unlock()
|
|
var out []*domain.CRLGenerationEvent
|
|
for _, evt := range r.events {
|
|
if evt.IssuerID == issuerID {
|
|
copy := *evt
|
|
out = append(out, ©)
|
|
}
|
|
}
|
|
return out, nil
|
|
}
|
|
|
|
// fakeRevocationRepo is the minimal shape CAOperationsSvc needs:
|
|
// returning revocations by issuer. The cache service walks
|
|
// CAOperationsSvc.GenerateDERCRL, which calls into this.
|
|
type fakeRevocationRepo struct{}
|
|
|
|
func (fakeRevocationRepo) Create(context.Context, *domain.CertificateRevocation) error {
|
|
return nil
|
|
}
|
|
func (fakeRevocationRepo) CreateWithTx(context.Context, repository.Querier, *domain.CertificateRevocation) error {
|
|
return nil
|
|
}
|
|
func (fakeRevocationRepo) GetByIssuerAndSerial(context.Context, string, string) (*domain.CertificateRevocation, error) {
|
|
return nil, nil
|
|
}
|
|
func (fakeRevocationRepo) ListAll(context.Context) ([]*domain.CertificateRevocation, error) {
|
|
return nil, nil
|
|
}
|
|
func (fakeRevocationRepo) ListByIssuer(_ context.Context, issuerID string) ([]*domain.CertificateRevocation, error) {
|
|
// Empty list = no revoked certs; the issuer connector still
|
|
// produces a valid empty CRL (RFC 5280 allows zero entries).
|
|
return nil, nil
|
|
}
|
|
func (fakeRevocationRepo) ListByCertificate(context.Context, string) ([]*domain.CertificateRevocation, error) {
|
|
return nil, nil
|
|
}
|
|
func (fakeRevocationRepo) MarkIssuerNotified(context.Context, string) error { return nil }
|
|
|
|
// helper: spin up a CAOperationsSvc + IssuerRegistry wired with a real
|
|
// local issuer connector. The local issuer's GenerateCRL produces a
|
|
// real DER-encoded CRL that the cache service can parse + persist.
|
|
func newCacheServiceFixture(t *testing.T) (svc *service.CRLCacheService, repo *fakeCRLCacheRepo, registry *service.IssuerRegistry) {
|
|
t.Helper()
|
|
|
|
logger := slog.New(slog.NewTextHandler(io.Discard, nil))
|
|
repo = newFakeCRLCacheRepo()
|
|
|
|
// Real local issuer — produces a real CRL on GenerateCRL.
|
|
localConn := localissuer.New(&localissuer.Config{
|
|
CACommonName: "Test Cache CA",
|
|
ValidityDays: 30,
|
|
}, logger)
|
|
|
|
registry = service.NewIssuerRegistry(logger)
|
|
registry.Set("iss-cache-test", service.NewIssuerConnectorAdapter(localConn))
|
|
|
|
caSvc := service.NewCAOperationsSvc(fakeRevocationRepo{}, nil, nil)
|
|
caSvc.SetIssuerRegistry(registry)
|
|
|
|
svc = service.NewCRLCacheService(repo, caSvc, registry, logger)
|
|
return
|
|
}
|
|
|
|
// ---------------------------------------------------------------------------
|
|
// Get: cache hit, miss, staleness
|
|
// ---------------------------------------------------------------------------
|
|
|
|
func TestCRLCacheService_Get_MissTriggersGeneration(t *testing.T) {
|
|
svc, repo, _ := newCacheServiceFixture(t)
|
|
ctx := context.Background()
|
|
|
|
der, thisUpdate, err := svc.Get(ctx, "iss-cache-test")
|
|
if err != nil {
|
|
t.Fatalf("Get: %v", err)
|
|
}
|
|
if len(der) == 0 {
|
|
t.Fatal("Get returned empty DER")
|
|
}
|
|
if thisUpdate.IsZero() {
|
|
t.Fatal("ThisUpdate is zero")
|
|
}
|
|
if repo.putCount != 1 {
|
|
t.Errorf("putCount = %d, want 1 (miss should trigger one generation)", repo.putCount)
|
|
}
|
|
}
|
|
|
|
func TestCRLCacheService_Get_HitSkipsGeneration(t *testing.T) {
|
|
svc, repo, _ := newCacheServiceFixture(t)
|
|
ctx := context.Background()
|
|
|
|
// Prime the cache.
|
|
if _, _, err := svc.Get(ctx, "iss-cache-test"); err != nil {
|
|
t.Fatalf("prime: %v", err)
|
|
}
|
|
if repo.putCount != 1 {
|
|
t.Fatalf("prime: putCount = %d, want 1", repo.putCount)
|
|
}
|
|
|
|
// Second Get should be a cache hit.
|
|
if _, _, err := svc.Get(ctx, "iss-cache-test"); err != nil {
|
|
t.Fatalf("hit: %v", err)
|
|
}
|
|
if repo.putCount != 1 {
|
|
t.Errorf("putCount = %d, want 1 (hit should not regenerate)", repo.putCount)
|
|
}
|
|
}
|
|
|
|
func TestCRLCacheService_Get_StalenessTriggersRegeneration(t *testing.T) {
|
|
svc, repo, _ := newCacheServiceFixture(t)
|
|
ctx := context.Background()
|
|
|
|
// Prime the cache with a row whose next_update is in the past.
|
|
stale := &domain.CRLCacheEntry{
|
|
IssuerID: "iss-cache-test",
|
|
CRLDER: []byte("stale-der"),
|
|
CRLNumber: 1,
|
|
ThisUpdate: time.Now().Add(-48 * time.Hour),
|
|
NextUpdate: time.Now().Add(-24 * time.Hour), // expired
|
|
GeneratedAt: time.Now().Add(-48 * time.Hour),
|
|
}
|
|
if err := repo.Put(ctx, stale); err != nil {
|
|
t.Fatalf("seed stale: %v", err)
|
|
}
|
|
repo.putCount = 0
|
|
|
|
// Get should detect staleness and regenerate.
|
|
der, _, err := svc.Get(ctx, "iss-cache-test")
|
|
if err != nil {
|
|
t.Fatalf("Get on stale: %v", err)
|
|
}
|
|
if string(der) == "stale-der" {
|
|
t.Error("Get returned stale DER instead of regenerating")
|
|
}
|
|
if repo.putCount != 1 {
|
|
t.Errorf("putCount = %d, want 1 (staleness should trigger one regen)", repo.putCount)
|
|
}
|
|
}
|
|
|
|
// ---------------------------------------------------------------------------
|
|
// RegenerateAll
|
|
// ---------------------------------------------------------------------------
|
|
|
|
func TestCRLCacheService_RegenerateAll_PopulatesAllIssuers(t *testing.T) {
|
|
svc, repo, _ := newCacheServiceFixture(t)
|
|
ctx := context.Background()
|
|
|
|
svc.RegenerateAll(ctx)
|
|
|
|
row, _ := repo.Get(ctx, "iss-cache-test")
|
|
if row == nil {
|
|
t.Fatal("RegenerateAll did not populate iss-cache-test")
|
|
}
|
|
if row.RevokedCount != 0 {
|
|
t.Errorf("RevokedCount = %d, want 0 (fakeRevocationRepo is empty)", row.RevokedCount)
|
|
}
|
|
events, _ := repo.ListGenerationEvents(ctx, "iss-cache-test", 10)
|
|
if len(events) != 1 {
|
|
t.Fatalf("expected 1 generation event, got %d", len(events))
|
|
}
|
|
if !events[0].Succeeded {
|
|
t.Error("event.Succeeded should be true on happy path")
|
|
}
|
|
}
|
|
|
|
func TestCRLCacheService_RegenerateAll_RespectsCancelledContext(t *testing.T) {
|
|
svc, _, _ := newCacheServiceFixture(t)
|
|
ctx, cancel := context.WithCancel(context.Background())
|
|
cancel()
|
|
|
|
// Should return without panicking. The single-issuer fixture means
|
|
// there's nothing to iterate after the cancel check, so this is
|
|
// mostly a smoke test for the ctx.Done() branch.
|
|
svc.RegenerateAll(ctx)
|
|
}
|
|
|
|
// ---------------------------------------------------------------------------
|
|
// Singleflight: concurrent miss requests for the same issuer collapse
|
|
// ---------------------------------------------------------------------------
|
|
|
|
func TestCRLCacheService_Get_SingleflightCollapsesConcurrentMisses(t *testing.T) {
|
|
svc, repo, _ := newCacheServiceFixture(t)
|
|
ctx := context.Background()
|
|
|
|
// Fire 20 concurrent Get calls for the same uncached issuer. The
|
|
// in-tree singleflight gate should collapse them to a single
|
|
// underlying generation (putCount == 1).
|
|
var wg sync.WaitGroup
|
|
var errCount atomic.Int32
|
|
for i := 0; i < 20; i++ {
|
|
wg.Add(1)
|
|
go func() {
|
|
defer wg.Done()
|
|
if _, _, err := svc.Get(ctx, "iss-cache-test"); err != nil {
|
|
errCount.Add(1)
|
|
t.Errorf("concurrent Get: %v", err)
|
|
}
|
|
}()
|
|
}
|
|
wg.Wait()
|
|
|
|
if errCount.Load() != 0 {
|
|
t.Fatalf("%d errors across concurrent Gets", errCount.Load())
|
|
}
|
|
if repo.putCount != 1 {
|
|
t.Errorf("singleflight failed: putCount = %d, want 1 (20 concurrent misses must collapse)", repo.putCount)
|
|
}
|
|
}
|
|
|
|
// ---------------------------------------------------------------------------
|
|
// Error paths
|
|
// ---------------------------------------------------------------------------
|
|
|
|
func TestCRLCacheService_Get_NoIssuerInRegistry_RecordsFailureEvent(t *testing.T) {
|
|
svc, repo, _ := newCacheServiceFixture(t)
|
|
ctx := context.Background()
|
|
|
|
// Issuer ID that doesn't exist in the registry → CAOperationsSvc
|
|
// returns an error → cache service records a failure event +
|
|
// surfaces the error to the caller.
|
|
_, _, err := svc.Get(ctx, "iss-does-not-exist")
|
|
if err == nil {
|
|
t.Fatal("Get for unknown issuer should error")
|
|
}
|
|
events, _ := repo.ListGenerationEvents(ctx, "iss-does-not-exist", 10)
|
|
if len(events) != 1 {
|
|
t.Fatalf("expected 1 failure event, got %d", len(events))
|
|
}
|
|
if events[0].Succeeded {
|
|
t.Error("failure event should have Succeeded=false")
|
|
}
|
|
if events[0].Error == "" {
|
|
t.Error("failure event should carry an error message")
|
|
}
|
|
}
|
|
|
|
func TestCRLCacheService_Get_NoCacheRepo_Errors(t *testing.T) {
|
|
logger := slog.New(slog.NewTextHandler(io.Discard, nil))
|
|
svc := service.NewCRLCacheService(nil, nil, nil, logger)
|
|
_, _, err := svc.Get(context.Background(), "any")
|
|
if err == nil {
|
|
t.Fatal("Get with nil cacheRepo should error")
|
|
}
|
|
}
|
|
|
|
// pin via interface satisfaction (compile-time check that fakeRevocationRepo
|
|
// matches what CAOperationsSvc actually calls — guards against shape drift
|
|
// in the repository.RevocationRepository interface).
|
|
var _ interface {
|
|
ListByIssuer(ctx context.Context, issuerID string) ([]*domain.CertificateRevocation, error)
|
|
} = fakeRevocationRepo{}
|
|
|
|
// _ silence the unused import warning when issuer adapter machinery moves.
|
|
var _ = issuer.IssuanceRequest{}
|