mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 17:41:29 +00:00
b0efdbe2f8
Closes the #3 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (Part 1.5 finding #1: audit row not transactional with issuance).

AuditRepository.Create previously ran on the package-level *sql.DB while the certificate insert / version insert / revocation insert ran on independent connections, so a failed audit INSERT after a successful operation INSERT was silently lost. SOX §404 over IT general controls, PCI-DSS §10 audit logging, HIPAA §164.312(b) audit controls, and CA/B Forum Baseline Requirements §5.4.1 audit log records all presume audit-with-operation atomicity.

Design: Option A (Querier abstraction).

The chosen pattern: a shared repository.Querier interface (subset of *sql.DB and *sql.Tx) plus a postgres.WithinTx helper that begins a tx, runs fn, commits on nil error, rolls back on error or panic, and returns the wrapped result. Repository methods that participate in a service-layer transaction expose a *WithTx variant taking repository.Querier; the bare methods remain for stand-alone use. A repository.Transactor abstracts the "begin tx, run fn, commit/rollback" lifecycle so service-layer code runs multi-write operations atomically without holding *sql.DB directly.

Option B (UnitOfWork) was considered but adds boilerplate without behavioral benefit for the current scope. Option C (context-carried tx) was explicitly rejected: it hides the transactional boundary from the type system, reproducing the class of bug we're fixing.

This commit:

- Adds internal/repository/querier.go with the Querier interface (compile-time guards that *sql.DB and *sql.Tx satisfy it) and the Transactor interface for service-layer use.
- Adds internal/repository/postgres/tx.go with the WithinTx helper (begin/fn/commit/rollback with panic recovery) and a transactor type that satisfies repository.Transactor.
- Adds CreateWithTx variants on AuditRepository, CertificateRepository (Create + Update + CreateVersion), and RevocationRepository. Existing bare methods now delegate to the *WithTx variant using the package-level *sql.DB so existing call sites are behavior-preserving.
- Updates repository/interfaces.go: AuditRepository, CertificateRepository, and RevocationRepository declare the new *WithTx methods. Adds an atomicity contract doc-comment on AuditRepository pointing at WithinTx + the audit blocker.
- Adds AuditService.RecordEventWithTx, mirroring RecordEvent but routing through CreateWithTx so the audit row is part of the caller's transaction. Same redaction + marshalling contract.
- Refactors three audit-emitting service paths to use Transactor.WithinTx when SetTransactor was wired, with a legacy fallback for backward compat:
  * CertificateService.Create: cert insert + audit row in one tx.
  * RevocationSvc.RevokeCertificateWithActor: cert status update + revocation row + audit row in one tx. The OCSP cache invalidate remains best-effort (out of scope per the prompt).
  * RenewalService CompleteServerRenewal: cert version insert + cert update + audit row in one tx. Job status update stays outside the audit-atomicity scope (job state lives outside the operator-facing audit trail).
- Adds SetTransactor on CertificateService, RevocationSvc, and RenewalService. cmd/server/main.go wires a single Transactor instance shared across all three so all audit-emitting paths run their writes in transactions backed by the same *sql.DB handle.
- Updates 5 mock implementations to satisfy the new interface methods: mockCertRepo (testutil_test.go), mockCertRepoWithGetError (shortlived_test.go), fakeRevocationRepo (crl_cache_test.go), intuneE2EAuditRepo (scep_intune_e2e_test.go), and the integration-test mocks (lifecycle_test.go: mockCertificateRepository, mockAuditRepository, mockRevocationRepository). All *WithTx mocks ignore the Querier and delegate to the bare method (mocks have no DB; in-memory state is shared regardless of "tx").
- Adds a service-layer test mockTransactor with BeginTxErr and CommitErr knobs so the atomic-audit tests can assert error propagation through the transactional boundary.
- Adds internal/repository/postgres/tx_test.go: unit-level test that WithinTx surfaces "begin tx" wrap when BeginTx fails, and that Transactor.WithinTx delegates correctly. Real-Postgres rollback semantics are covered by the testcontainers tests in the postgres package. Sandbox disk pressure prevented adding a sqlmock dep for the in-fn / commit-failure unit test, so those scenarios are exercised through atomic_audit_test.go using the mockTransactor's CommitErr / BeginTxErr fields.
- Adds internal/service/atomic_audit_test.go:
  * TestCertificateService_Create_AtomicWithTx: asserts audit insert failure inside the tx surfaces as the operation's error (closes the blocker contract).
  * TestCertificateService_Create_LegacyPathLogs: pins the backward-compat behavior when SetTransactor isn't wired: audit failure is logged-not-failed, matching pre-fix.
  * TestCertificateService_Create_TransactorBeginFailure: BeginTx error path: operation fails, no cert insert, no audit insert.
  * TestCertificateService_Create_TransactorCommitFailure: Commit error after successful in-fn writes surfaces as the operation's error. Real Postgres can fail Commit on serialization conflicts; the service must report this.

Out of scope (separate follow-up commits, same shape):

- Issuer CRUD audit atomicity.
- Target CRUD audit atomicity.
- Agent retire (already transactional via RetireAgentWithCascade; verified, not changed).
- Renewal-policy CRUD audit atomicity.
- Owner/team/agent-group CRUD audit atomicity.
- Discovery / health-check audit atomicity.

Verified locally:

- gofmt -l . clean
- go vet ./... clean
- staticcheck ./... clean
- golangci-lint run --timeout 5m ./... → 0 issues
- go test -short -count=1 ./internal/service/ green
- go test -short -count=1 ./internal/api/handler/ green
- go test -short -count=1 ./internal/integration/ green
- go test -short -count=1 ./internal/repository/postgres/ green
- go build ./... success

Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md Top-10 fix #3 (Part 3, narrative section).
257 lines
9.3 KiB
Go
package service

import (
	"context"
	"fmt"
	"log/slog"
	"time"

	"github.com/shankar0123/certctl/internal/domain"
	"github.com/shankar0123/certctl/internal/repository"
)

// RevocationSvc provides revocation-related business logic.
// It handles certificate revocation, revocation notifications, and issuer coordination.
type RevocationSvc struct {
	certRepo        repository.CertificateRepository
	revocationRepo  repository.RevocationRepository
	auditService    *AuditService
	notificationSvc *NotificationService
	issuerRegistry  *IssuerRegistry
	// tx: when set, wraps the cert status update + revocation row
	// insert + audit row in a single transaction. Closes the #3
	// audit-readiness blocker for the revocation path. Optional via
	// SetTransactor; nil means legacy non-transactional behavior
	// (cert.Update committed independently from revocation row +
	// audit, with revocation insert + audit logged-but-not-failed).
	tx repository.Transactor
	// ocspCacheInvalidator: production hardening II Phase 2
	// load-bearing security wire. After a successful revocation, the
	// service MUST invalidate the OCSP response cache for this
	// (issuer, serial) so the next OCSP fetch returns the revoked
	// status (not the stale "good" cached blob).
	ocspCacheInvalidator OCSPCacheInvalidator
}

// SetTransactor wires a Transactor for atomic revocation (cert update
// + revocation row + audit row in a single transaction). Closes the
// #3 audit-readiness blocker for the revocation path. Optional;
// nil reverts to the legacy non-transactional behavior.
func (s *RevocationSvc) SetTransactor(tx repository.Transactor) {
	s.tx = tx
}

// OCSPCacheInvalidator is the minimum surface RevocationSvc needs
// from the OCSP cache. The cache service implements this interface;
// the indirection keeps RevocationSvc from depending on the cache
// type and lets tests inject a fake invalidator.
type OCSPCacheInvalidator interface {
	InvalidateOnRevoke(ctx context.Context, issuerID, serialHex string) error
}

// SetOCSPCacheInvalidator wires the OCSP cache for
// invalidate-on-revoke. Production hardening II Phase 2.
func (s *RevocationSvc) SetOCSPCacheInvalidator(c OCSPCacheInvalidator) {
	s.ocspCacheInvalidator = c
}

// NewRevocationSvc creates a new revocation service.
func NewRevocationSvc(
	certRepo repository.CertificateRepository,
	revocationRepo repository.RevocationRepository,
	auditService *AuditService,
) *RevocationSvc {
	return &RevocationSvc{
		certRepo:       certRepo,
		revocationRepo: revocationRepo,
		auditService:   auditService,
	}
}

// SetNotificationService sets the notification service for revocation alerts.
func (s *RevocationSvc) SetNotificationService(svc *NotificationService) {
	s.notificationSvc = svc
}

// SetIssuerRegistry sets the issuer registry for issuer-level revocation.
func (s *RevocationSvc) SetIssuerRegistry(registry *IssuerRegistry) {
	s.issuerRegistry = registry
}

// RevokeCertificateWithActor performs revocation with actor tracking.
// Steps:
//  1. Validate the certificate exists and is revocable
//  2. Get the latest certificate version (for serial number)
//  3. Update certificate status to Revoked
//  4. Record revocation in certificate_revocations table
//  5. Notify the issuer connector (best-effort)
//  6. Record audit event
//  7. Send revocation notification
func (s *RevocationSvc) RevokeCertificateWithActor(ctx context.Context, certID string, reason string, actor string) error {
	// 1. Validate certificate exists and is revocable
	cert, err := s.certRepo.Get(ctx, certID)
	if err != nil {
		return fmt.Errorf("failed to fetch certificate: %w", err)
	}

	if cert.Status == domain.CertificateStatusRevoked {
		return fmt.Errorf("certificate is already revoked")
	}
	if cert.Status == domain.CertificateStatusArchived {
		return fmt.Errorf("cannot revoke archived certificate")
	}

	// Validate reason code
	if reason == "" {
		reason = string(domain.RevocationReasonUnspecified)
	}
	if !domain.IsValidRevocationReason(reason) {
		return fmt.Errorf("invalid revocation reason: %s", reason)
	}

	// 2. Get latest certificate version for serial number
	version, err := s.certRepo.GetLatestVersion(ctx, certID)
	if err != nil {
		return fmt.Errorf("failed to get certificate version: %w", err)
	}

	// 3. + 4. + audit: cert status update + revocation row + audit row.
	// Atomic path (when SetTransactor was wired) keeps these three
	// writes consistent: a failure in any one rolls back the others.
	// Closes the #3 audit-readiness blocker for the revocation path.
	now := time.Now()
	cert.Status = domain.CertificateStatusRevoked
	cert.RevokedAt = &now
	cert.RevocationReason = reason
	cert.UpdatedAt = now

	auditDetails := map[string]interface{}{
		"common_name": cert.CommonName,
		"serial":      version.SerialNumber,
		"reason":      reason,
	}

	if s.tx != nil {
		// Atomic three-write path.
		if err := s.tx.WithinTx(ctx, func(q repository.Querier) error {
			if err := s.certRepo.UpdateWithTx(ctx, q, cert); err != nil {
				return fmt.Errorf("failed to update certificate status: %w", err)
			}
			if s.revocationRepo != nil {
				revocation := &domain.CertificateRevocation{
					ID:            generateID("rev"),
					CertificateID: certID,
					SerialNumber:  version.SerialNumber,
					Reason:        reason,
					RevokedBy:     actor,
					RevokedAt:     now,
					IssuerID:      cert.IssuerID,
					CreatedAt:     now,
				}
				if err := s.revocationRepo.CreateWithTx(ctx, q, revocation); err != nil {
					return fmt.Errorf("failed to record revocation: %w", err)
				}
			}
			if err := s.auditService.RecordEventWithTx(ctx, q, actor, domain.ActorTypeUser,
				"certificate_revoked", "certificate", certID, auditDetails); err != nil {
				return fmt.Errorf("failed to record audit event: %w", err)
			}
			return nil
		}); err != nil {
			return err
		}
	} else {
		// Legacy non-transactional path. Pre-fix behavior preserved
		// for backward compat with callers that haven't wired
		// SetTransactor.
		if err := s.certRepo.Update(ctx, cert); err != nil {
			return fmt.Errorf("failed to update certificate status: %w", err)
		}
		if s.revocationRepo != nil {
			revocation := &domain.CertificateRevocation{
				ID:            generateID("rev"),
				CertificateID: certID,
				SerialNumber:  version.SerialNumber,
				Reason:        reason,
				RevokedBy:     actor,
				RevokedAt:     now,
				IssuerID:      cert.IssuerID,
				CreatedAt:     now,
			}
			if err := s.revocationRepo.Create(ctx, revocation); err != nil {
				slog.Error("failed to record revocation for CRL", "error", err, "certificate_id", certID)
				// Don't fail the overall revocation: the cert status is already updated
			}
		}
	}

	// 5. Notify the issuer connector (best-effort)
	if s.issuerRegistry != nil {
		if issuerConn, ok := s.issuerRegistry.Get(cert.IssuerID); ok {
			if err := issuerConn.RevokeCertificate(ctx, version.SerialNumber, reason); err != nil {
				slog.Error("failed to notify issuer of revocation",
					"error", err,
					"issuer_id", cert.IssuerID,
					"serial", version.SerialNumber)
				// Best-effort: don't fail the overall revocation
			} else if s.revocationRepo != nil {
				// Mark issuer as notified
				revocations, _ := s.revocationRepo.ListByCertificate(ctx, certID)
				for _, rev := range revocations {
					if rev.SerialNumber == version.SerialNumber {
						_ = s.revocationRepo.MarkIssuerNotified(ctx, rev.ID)
					}
				}
			}
		}
	}

	// 5.5. Invalidate the OCSP response cache for this (issuer, serial)
	// so the next OCSP fetch returns the revoked status (not the stale
	// "good" cached blob). Production hardening II Phase 2 LOAD-BEARING
	// security wire: without this, a revoked cert keeps returning
	// "good" until the next ocspCacheRefreshLoop tick.
	//
	// Failure is logged and swallowed: the revocation row is committed,
	// the CRL will reflect the revocation on the next regen, and the
	// admin can manually nuke the cache row if necessary. Failing the
	// caller's revoke on cache-failure would leave the operator's
	// intent unachieved (cert appears not-revoked); failing-soft +
	// logging is the right tradeoff.
	if s.ocspCacheInvalidator != nil {
		if err := s.ocspCacheInvalidator.InvalidateOnRevoke(ctx, cert.IssuerID, version.SerialNumber); err != nil {
			slog.Warn("failed to invalidate OCSP response cache after revocation (revocation still committed)",
				"error", err,
				"issuer_id", cert.IssuerID,
				"serial", version.SerialNumber,
				"certificate_id", certID)
		}
	}

	// 6. Record audit event (legacy non-transactional path only; the
	// atomic path already recorded the audit inside the tx above).
	if s.tx == nil {
		if err := s.auditService.RecordEvent(ctx, actor, domain.ActorTypeUser,
			"certificate_revoked", "certificate", certID, auditDetails); err != nil {
			slog.Error("failed to record audit event", "error", err)
		}
	}

	// 7. Send revocation notification
	if s.notificationSvc != nil {
		if err := s.notificationSvc.SendRevocationNotification(ctx, cert, reason); err != nil {
			slog.Error("failed to send revocation notification", "error", err, "certificate_id", certID)
		}
	}

	return nil
}

// GetRevokedCertificates returns all revoked certificate records (for CRL generation).
func (s *RevocationSvc) GetRevokedCertificates(ctx context.Context) ([]*domain.CertificateRevocation, error) {
	if s.revocationRepo == nil {
		return nil, fmt.Errorf("revocation repository not configured")
	}
	return s.revocationRepo.ListAll(ctx)
}