mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-08 19:18:55 +00:00
da00ee0ca5d57e9f84cfbc2fb4f09596c24ed67e
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
b0efdbe2f8 |
repo,service: introduce WithinTx and atomic audit rows for issue/renew/revoke
Closes the #3 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (Part 1.5 finding #1: audit row not transactional with issuance). AuditRepository.Create previously ran on the package-level *sql.DB while the certificate insert / version insert / revocation insert ran on independent connections — a failed audit INSERT after a successful operation INSERT was silently lost. SOX §404 over IT general controls, PCI-DSS §10 audit logging, HIPAA §164.312(b) audit controls, and CA/B Forum Baseline Requirements §5.4.1 audit log records all presume audit-with-operation atomicity. Design — Option A (Querier abstraction). The chosen pattern: a shared repository.Querier interface (subset of *sql.DB and *sql.Tx) plus a postgres.WithinTx helper that begins a tx, runs fn, commits on nil error, rolls back on error or panic, and returns the wrapped result. Repository methods that participate in a service-layer transaction expose a *WithTx variant taking repository.Querier; the bare methods remain for stand-alone use. A repository.Transactor abstracts the "begin tx, run fn, commit/rollback" lifecycle so service-layer code runs multi-write operations atomically without holding *sql.DB directly. Option B (UnitOfWork) was considered but adds boilerplate without behavioral benefit for the current scope. Option C (context-carried tx) was explicitly rejected — it hides the transactional boundary from the type system, reproducing the class of bug we're fixing. This commit: - Adds internal/repository/querier.go with the Querier interface (compile-time guards that *sql.DB and *sql.Tx satisfy it) and the Transactor interface for service-layer use. - Adds internal/repository/postgres/tx.go with the WithinTx helper (begin/fn/commit/rollback with panic recovery) and a transactor type that satisfies repository.Transactor. - Adds CreateWithTx variants on AuditRepository, CertificateRepository (Create + Update + CreateVersion), and RevocationRepository. Existing bare methods now delegate to the *WithTx variant using the package-level *sql.DB so existing call sites are behavior-preserving. - Updates repository/interfaces.go: AuditRepository, CertificateRepository, and RevocationRepository declare the new *WithTx methods. Adds an atomicity contract doc-comment on AuditRepository pointing at WithinTx + the audit blocker. - Adds AuditService.RecordEventWithTx, mirroring RecordEvent but routing through CreateWithTx so the audit row is part of the caller's transaction. Same redaction + marshalling contract. - Refactors three audit-emitting service paths to use Transactor.WithinTx when SetTransactor was wired, with a legacy fallback for backward compat: * CertificateService.Create — cert insert + audit row in one tx. * RevocationSvc.RevokeCertificateWithActor — cert status update + revocation row + audit row in one tx. The OCSP cache invalidate remains best-effort (out of scope per the prompt). * RenewalService CompleteServerRenewal — cert version insert + cert update + audit row in one tx. Job status update stays outside the audit-atomicity scope (job state lives outside the operator-facing audit trail). - Adds SetTransactor on CertificateService, RevocationSvc, and RenewalService. cmd/server/main.go wires a single Transactor instance shared across all three so all audit-emitting paths run their writes in transactions backed by the same *sql.DB handle. - Updates 5 mock implementations to satisfy the new interface methods: mockCertRepo (testutil_test.go), mockCertRepoWithGetError (shortlived_test.go), fakeRevocationRepo (crl_cache_test.go), intuneE2EAuditRepo (scep_intune_e2e_test.go), and the integration- test mocks (lifecycle_test.go: mockCertificateRepository, mockAuditRepository, mockRevocationRepository). All *WithTx mocks ignore the Querier and delegate to the bare method (mocks have no DB; in-memory state is shared regardless of "tx"). - Adds a service-layer test mockTransactor with BeginTxErr and CommitErr knobs so the atomic-audit tests can assert error propagation through the transactional boundary. - Adds internal/repository/postgres/tx_test.go: unit-level test that WithinTx surfaces "begin tx" wrap when BeginTx fails, and that Transactor.WithinTx delegates correctly. Real-Postgres rollback semantics are covered by the testcontainers tests in the postgres package — sandbox disk pressure prevented adding a sqlmock dep for the in-fn / commit-failure unit test, so those scenarios are exercised through atomic_audit_test.go using the mockTransactor's CommitErr / BeginTxErr fields. - Adds internal/service/atomic_audit_test.go: * TestCertificateService_Create_AtomicWithTx — asserts audit insert failure inside the tx surfaces as the operation's error (closes the blocker contract). * TestCertificateService_Create_LegacyPathLogs — pins the backward-compat behavior when SetTransactor isn't wired: audit failure is logged-not-failed, matching pre-fix. * TestCertificateService_Create_TransactorBeginFailure — BeginTx error path: operation fails, no cert insert, no audit insert. * TestCertificateService_Create_TransactorCommitFailure — Commit error after successful in-fn writes surfaces as the operation's error. Real Postgres can fail Commit on serialization conflicts; the service must report this. Out of scope (separate follow-up commits, same shape): - Issuer CRUD audit atomicity. - Target CRUD audit atomicity. - Agent retire (already transactional via RetireAgentWithCascade; verified, not changed). - Renewal-policy CRUD audit atomicity. - Owner/team/agent-group CRUD audit atomicity. - Discovery / health-check audit atomicity. Verified locally: - gofmt -l . clean - go vet ./... clean - staticcheck ./... clean - golangci-lint run --timeout 5m ./... → 0 issues - go test -short -count=1 ./internal/service/ green - go test -short -count=1 ./internal/api/handler/ green - go test -short -count=1 ./internal/integration/ green - go test -short -count=1 ./internal/repository/postgres/ green - go build ./... success Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md Top-10 fix #3 (Part 3, narrative section). |
||
|
|
7cb453a336 |
chore(fmt): repo-wide gofmt -w sweep — close drift surfaced by ci-pipeline-cleanup Phase 4
Mechanical reformat. The new 'gofmt drift' CI step (added in
ci-pipeline-cleanup Phase 4, commit
|
||
|
|
0e29c416b1 |
refactor(handler,repo): replace strings.Contains error dispatch with typed sentinels (S-2)
Closes one 2026-04-24 audit finding (P2):
- cat-s6-efc7f6f6bd50: 30 strings.Contains(err.Error(), ...) sites
in internal/api/handler/ — brittle to repository-layer message
changes, untyped against the actual failure mode.
Approach (Option B from prompt design notes):
- New typed sentinels in internal/repository/errors.go:
ErrNotFound, ErrForeignKeyConstraint
IsForeignKeyError(err) helper (the only place substring
matching at the lib/pq boundary is allowed; isolates the
DB-driver string knowledge to one function).
- New typed sentinel in internal/domain/errors.go:
ErrValidation (reserved for future per-entity validation
wrappers; not yet used by all handlers).
- 49 sites in internal/repository/postgres/*.go updated to wrap
sql.ErrNoRows-derived errors via fmt.Errorf("...: %w",
repository.ErrNotFound).
- 18 not-found handler sites + 2 FK-constraint handler sites
refactored to errors.Is(err, repository.ErrNotFound) /
repository.IsForeignKeyError(err).
- 23 inline `fmt.Errorf("X not found")` test fixtures across
handler tests rewrapped to wrap repository.ErrNotFound.
- test_utils.go::ErrMockNotFound rewrapped to wrap
repository.ErrNotFound; renewal_policy.go closure docblock
updated to reflect the new convention.
- integration test mockJobRepository.Get wraps repository.ErrNotFound.
CI regression guardrail:
- .github/workflows/ci.yml::"Forbidden strings.Contains(err.Error())
regression guard (S-2)" greps for the three patterns ("not found",
"violates foreign key", "RESTRICT") under internal/api/handler/
and fails the build on regression.
Verification:
- go build ./... — clean
- go vet ./... — clean
- go test ./... -short -count=1 — all packages pass (handler +
repository + service + integration)
- golangci-lint v2.11.4 run ./... — 0 issues
- S-2 guardrail dry-run on post-fix tree → empty (good)
- All sibling guardrails (S-1, G-3, D-1+D-2, B-1, L-1, H-1, C-1, F-1, P-1) pass
Audit findings closed:
- cat-s6-efc7f6f6bd50 (P2)
Deferred follow-ups:
- 6 domain-specific substring patterns still inline in handlers
("cannot approve", "cannot reject", "cannot be parsed",
"no certificates found", "challenge password", "invalid"/
"required" validation chains in profiles + agent_groups). Each
needs its own typed sentinel, scoped per service. Documented
by the S-2 CI guardrail's allowlist for closure-comments only.
- Per-entity not-found sentinels (Option A — ErrCertificateNotFound,
ErrAgentNotFound, etc.) deferred. Generic ErrNotFound covers the
current dispatch needs; per-entity precision would let handlers
return entity-aware error bodies without a domain.Type field,
but not blocking.
|
||
|
|
4e5522a999 |
F-001/F-002/F-003: CRL prefix-scan, digest error sanitization, ctx-aware sleeps
F-001 (P3): GenerateDERCRL scoped to issuer via composite index
- Add RevocationRepository.ListByIssuer leveraging migration 000012's
idx_certificate_revocations_issuer_serial composite index as a
prefix-scan target. Previously CAOperationsSvc.GenerateDERCRL called
ListAll() and filtered by IssuerID in Go — O(total revocations)
regardless of how many revocations belonged to the target issuer.
- Rewrite GenerateDERCRL to call ListByIssuer(ctx, issuerID) so PostgreSQL
drives a prefix scan of the composite index. Drops the in-memory filter.
- New regression test in ca_operations_test.go asserts the CRL hot path
invokes ListByIssuer exactly once and never ListAll, and that the
issuerID is threaded through correctly.
F-002 (P3): digest.go admin-auth endpoints no longer leak internal errors
- PreviewDigest (GET /api/v1/digest/preview) and SendDigest
(POST /api/v1/digest/send) previously wrote err.Error() into the HTTP
response body on 500s. Replace with slog.Error server-side logging plus
a generic "internal error" response body, matching the house pattern
in certificates.go and export.go.
F-003 (P4): three blocking time.Sleep sites now honor ctx cancellation
- internal/connector/issuer/acme/acme.go:672 (DNS-01 propagation wait)
now runs under a select{case <-ctx.Done(): CleanUp + return ctx.Err();
case <-time.After(d):} so graceful shutdown doesn't get stuck behind
the propagation delay.
- internal/connector/issuer/acme/acme.go:786 (dns-persist-01 propagation
wait) same pattern, returns ctx.Err() on cancel.
- cmd/agent/main.go:272 (polling backoff inside the heartbeat loop) now
wraps the sleep in select{case <-ctx.Done(): continue; case <-time.After(backoff):}
so the outer <-ctx.Done() case on the parent loop fires cleanly.
Verification: build, vet, and race-enabled short tests green across all
55+ packages. govulncheck reports zero vulnerabilities in the code path.
No migration needed — F-001 reuses the existing 000012 composite index.
No frontend changes.
|
||
|
|
387fb555ac |
security: scope revocation unique index to (issuer_id, serial_number) (fixes H-1)
RFC 5280 §5.2.3 defines certificate serial number uniqueness per issuing CA,
not globally. The prior unique index on `certificate_revocations.serial_number`
enforced a stricter invariant than the spec: with 12 issuer connectors (Local
CA, ACME, Vault, step-ca, OpenSSL, DigiCert, Sectigo, Google CAS, AWS ACM PCA,
Entrust, GlobalSign, EJBCA), two distinct certificates legitimately issued by
different CAs can share a serial number. Recording a revocation for the second
collision silently dropped via `ON CONFLICT DO NOTHING`, leaving the second
cert persistently absent from OCSP/CRL responses.
Changes:
- Migration 000012 drops `idx_certificate_revocations_serial` and creates
`idx_certificate_revocations_issuer_serial` UNIQUE ON (issuer_id,
serial_number). Adds a non-unique `idx_certificate_revocations_serial_lookup`
to preserve the serial-only fast path for OCSP/CRL probes that already know
the issuer scope.
- `CertificateRevocationRepository.Create` targets the new composite key in
`ON CONFLICT` — same-issuer idempotency preserved, cross-issuer collisions
now recorded as distinct rows.
- `GetBySerial(serial)` renamed `GetByIssuerAndSerial(issuerID, serial)` on
the interface and Postgres impl. All callers (OCSP responder, CRL
generator, short-lived-cert exemption check) already have `issuerID` in
scope because the protocol paths carry it (`/api/v1/ocsp/{issuer_id}/{serial}`,
`/api/v1/crl/{issuer_id}`).
- Repository integration test added: `TestRevocationRepository_CrossIssuerSerialCollision`
asserts that serial `CAFEBABE01` can be stored under two issuers
simultaneously, that lookups return the correct row per (issuer, serial),
and that same-issuer idempotency still works (re-inserting (issuer, serial)
does not error and does not duplicate).
- Existing tests and service/integration mocks updated for the rename.
Wire-format invariants preserved: CRL DER bytes, OCSP response bytes, and
AES-256-GCM config encryption are unaffected — this change touches only
revocation-record uniqueness scope.
CWE-664.
|
||
|
|
5d98e373e3 |
feat: M15a — certificate revocation API, CRL endpoint, and revocation notifications
Implements core revocation infrastructure: POST /api/v1/certificates/{id}/revoke
with all 8 RFC 5280 reason codes, JSON-formatted CRL at GET /api/v1/crl, webhook
and email revocation notifications, best-effort issuer notification, and immutable
revocation audit trail. Includes 48 new tests across service, handler, integration,
and domain layers (600+ total). Fixes 3 pre-existing test bugs (team_test error
matching, agent_group delete status code, team handler per_page validation).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|