Mirror of https://github.com/shankar0123/certctl.git (synced 2026-06-07 12:21:31 +00:00)
Commit 02438ad9e1
Closes twelve findings from the architecture diligence audit's Phase 3
bundle in one PR. The changes are CI workflow updates plus small
doc/code-drift fixes across the production Go tree and migration headers.
CI workflow changes
====================
TEST-H1 — Race detection on ./... -short
.github/workflows/ci.yml:106 was a 9-package explicit list. Audit
finding TEST-H1 flagged that 25+ packages (internal/auth/*,
internal/repository/*, internal/mcp, internal/scep, internal/pkcs7,
internal/api/router, internal/api/acme, internal/cli, internal/cms,
internal/config, internal/deploy, internal/integration,
internal/ratelimit, internal/secret, internal/trustanchor, all of
cmd/) silently dropped off race coverage.
Post-fix: 'go test -race -short ./... -count=1 -timeout 600s'.
76 testing.Short() guards already cover testcontainers + live-DB
integration suites, so -short keeps the long-running tests out.
TEST-H2 — Cross-platform build matrix
New 'cross-platform-build' job in ci.yml. Matrix:
ubuntu-latest + windows-latest + macos-latest, fail-fast: false.
Builds cmd/server + cmd/agent + cmd/cli + cmd/mcp-server on each.
Catches Windows-specific regressions (path separators, file
permissions, exec.Command semantics) that the pre-Phase-3 Ubuntu-only
CI missed, to the extent they surface at build time.
TEST-L1 — actions/setup-go cache: true (explicit)
setup-go v5 defaults cache: true; making it explicit so a future
setup-go upgrade can't silently flip it. Re-runs hit the Go module
+ build cache instead of recompiling cold.
TEST-M1 — Mutation-testing floor at 55%
security-deep-scan.yml::go-mutesting step rewritten. Removed
continue-on-error and the per-package '|| true'. A new post-loop check
extracts every 'The mutation score is X.YZ' line and fails the
step if any package drops below 0.55. Floor rationale: 0.55 is a starting
ratio that catches major regressions without rejecting the steady state
the audit judged acceptable; raise it quarterly.
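The post-loop check lives in a shell step in security-deep-scan.yml, which is not reproduced here. The Go sketch below re-expresses the same logic under stated assumptions: the combined go-mutesting output is available as one string, each package reports a line beginning 'The mutation score is X.YZ', and `belowFloor` is a hypothetical name, not part of certctl.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// scoreRe captures the numeric score from go-mutesting's summary line,
// e.g. "The mutation score is 0.750000 (3 passed, ...)".
var scoreRe = regexp.MustCompile(`The mutation score is ([0-9]+(?:\.[0-9]+)?)`)

// belowFloor returns every reported score strictly lower than floor.
// Hypothetical helper; the real check is a shell loop in the workflow.
func belowFloor(output string, floor float64) ([]float64, error) {
	var failing []float64
	for _, m := range scoreRe.FindAllStringSubmatch(output, -1) {
		score, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parse score %q: %w", m[1], err)
		}
		if score < floor {
			failing = append(failing, score)
		}
	}
	return failing, nil
}

func main() {
	out := "The mutation score is 0.61\nThe mutation score is 0.42\n"
	failing, err := belowFloor(out, 0.55)
	if err != nil {
		panic(err)
	}
	// A CI step would exit non-zero here instead of printing.
	fmt.Println("scores below floor:", failing)
}
```

A stricter variant would also fail when no score line is found at all, so a crashed mutation run cannot pass the gate silently.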
TEST-M2 — 3 advisory deep-scan gates promoted to blocking
Removed continue-on-error: true from:
- gosec (filtered to G201/G202/G304/G108 high-signal rules:
SQL-injection + path-traversal + pprof-exposed)
- osv-scanner (multi-ecosystem CVE; complements govulncheck
which is already blocking in ci.yml)
- trivy image scan (--severity HIGH,CRITICAL --exit-code 1)
continue-on-error count: 15 → 11.
ZAP / schemathesis / nuclei / testssl stay advisory because their
false-positive rates on https://localhost:8443-targeted DAST runs
are high.
TEST-M3 — Playwright harness stub
web/package.json adds '@playwright/test' devDep + 'e2e' / 'e2e:install'
npm scripts. web/playwright.config.ts ships a single chromium project
with a webServer block pointing at 'npm run dev'.
web/src/__tests__/e2e/smoke.spec.ts proves the harness is wired up. The full 15-flow
suite ships in frontend-design-audit Phase 8 (TEST-H1 in THAT audit);
this is the wiring + a single smoke test as the regression floor.
New Makefile target: 'make e2e-test'.
Doc/code drift fixes
====================
TEST-M4 + ARCH-L2 — Skip inventory artifact + CI guard
scripts/skip-inventory.sh walks every t.Skip site under cmd/ +
internal/ + deploy/test/ and emits docs/testing/skip-inventory.md
grouped by package with file:line:expression triples. Current
inventory: 142 t.Skip sites, 76 testing.Short() guards.
scripts/ci-guards/skip-inventory-drift.sh regenerates and fails on
diff (excluding the 'Last reviewed' timestamp line which drifts
daily). The Markdown is the canonical acquisition-diligence artifact
for 'what tests are being skipped and why.'
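skip-inventory-drift.sh is a shell script that is not shown here. As a sketch of its comparison rule only, the Go below treats two inventory documents as equal when they differ solely in the timestamp line. Assumptions: the timestamp sits on its own line containing "Last reviewed", and the sample inventory rows are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// stripReviewStamp drops every line containing "Last reviewed" so the
// daily-changing timestamp never counts as drift.
func stripReviewStamp(doc string) string {
	var kept []string
	for _, line := range strings.Split(doc, "\n") {
		if !strings.Contains(line, "Last reviewed") {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

// hasDrift reports whether the committed inventory differs from a freshly
// regenerated one once the timestamp line is ignored.
func hasDrift(committed, regenerated string) bool {
	return stripReviewStamp(committed) != stripReviewStamp(regenerated)
}

func main() {
	// Hypothetical inventory excerpts; only the timestamp differs.
	committed := "Last reviewed: 2026-06-01\n## internal/service\n- est_test.go:42: testing.Short()\n"
	fresh := "Last reviewed: 2026-06-07\n## internal/service\n- est_test.go:42: testing.Short()\n"
	fmt.Println(hasDrift(committed, fresh)) // false
}
```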
ARCH-H3 — MCP catalogue floor reconciliation
Audit framing was '121 vs floor 150 — doc/code drift.' Live count
via the test's actual regex over all 5 tool files (tools.go +
tools_audit_fix.go + tools_auth.go + tools_auth_bundle2.go +
tools_est.go): 155 unique 'Name: "certctl_*"' declarations.
Pre-Phase-3 audit measured tools.go in isolation (121) and missed
the other 4 files (+34 unique names). The test at
internal/ciparity/surface_parity_test.go::TestSurfaceParity_MCP
passes today (155 ≥ 150). Added a clarifying comment near
mcpBaselineFloor explaining the measurement scope so future
reviewers don't repeat the audit's framing error.
STATUS: stale — no code drift, just a measurement scoping error in
the audit.
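The scoping error is easy to reproduce: counting one file undercounts, while summing per-file counts would double-count names declared in more than one file. The measure that matches "155 unique" is the size of the union. The sketch below assumes a regex of the form `Name:\s*"(certctl_...)"`; the real pattern in surface_parity_test.go may differ, and the tool names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// toolNameRe approximates the parity test's declaration pattern (assumption).
var toolNameRe = regexp.MustCompile(`Name:\s*"(certctl_[a-z0-9_]+)"`)

// uniqueToolNames returns the set of certctl_* tool names declared across
// all supplied sources; a name declared in several files is counted once.
func uniqueToolNames(sources ...string) map[string]struct{} {
	names := make(map[string]struct{})
	for _, src := range sources {
		for _, m := range toolNameRe.FindAllStringSubmatch(src, -1) {
			names[m[1]] = struct{}{}
		}
	}
	return names
}

func main() {
	toolsGo := `{Name: "certctl_list_certs"}, {Name: "certctl_get_cert"}`
	toolsEST := `{Name: "certctl_est_enroll"}, {Name: "certctl_get_cert"}`
	fmt.Println(len(uniqueToolNames(toolsGo)))           // 2: one file in isolation
	fmt.Println(len(uniqueToolNames(toolsGo, toolsEST))) // 3: union, duplicate counted once
}
```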
ARCH-L1 — panic() rationale comments
5 panic sites in production Go (excluding _test.go):
- internal/repository/postgres/tx.go:84
- internal/service/issuer.go:861 (mustJSON)
- internal/service/est.go:728 (mustParseTime)
- internal/service/acme.go:1288 (rand source failure — already documented)
- internal/pkcs7/certrep.go:270 (OID marshal — already documented)
Added ARCH-L1 rationale comments to the 3 sites that didn't have
them. All 5 are defensible impossible-path / rethrow / hardcoded-
constant guards.
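As an illustration of the "impossible-path" category, here is a minimal sketch of a mustJSON-style helper with a rationale comment. The signature is an assumption (the real helper at internal/service/issuer.go:861 is not shown), and the rationale wording is illustrative rather than the committed comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mustJSON marshals v and panics on failure.
//
// Rationale (illustrative): callers pass only maps of strings, bools, and
// finite numbers, for which json.Marshal cannot fail. A panic here signals
// a programming error (e.g. a channel or func placed in v), not a runtime
// condition a caller could handle.
func mustJSON(v any) []byte {
	b, err := json.Marshal(v)
	if err != nil {
		panic(fmt.Sprintf("mustJSON: impossible marshal failure: %v", err))
	}
	return b
}

func main() {
	fmt.Println(string(mustJSON(map[string]string{"kind": "est"}))) // {"kind":"est"}
}
```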
ARCH-L3 — Migration IF-NOT-EXISTS carve-outs
4 migrations skip the literal 'IF NOT EXISTS' token but ARE
idempotent via different Postgres patterns:
- 000014_policy_violation_severity_check.up.sql: ALTER TABLE
ADD CONSTRAINT CHECK doesn't accept IF NOT EXISTS; idempotency
via DROP CONSTRAINT IF EXISTS preamble.
- 000018_audit_events_worm.up.sql: CREATE OR REPLACE FUNCTION
+ DROP TRIGGER IF EXISTS + CREATE TRIGGER + DO $$ pg_roles
existence check. CREATE TRIGGER doesn't take IF NOT EXISTS.
- 000030_rbac_admin_perms.up.sql: INSERT ... ON CONFLICT DO NOTHING.
- 000039_audit_crit1_perms.up.sql: same INSERT + ON CONFLICT pattern.
Added ARCH-L3 header comments to each explaining the carve-out so
reviewers don't flag the missing literal token.
STATUS: largely stale — migrations are already idempotent.
ARCH-L4 — TODO/FIXME → see #<descriptor>
5 TODOs rewritten to the allowed 'see #<descriptor>' pattern:
- internal/repository/postgres/auth.go:220 → see #bundle-2-scope-fk
- internal/connector/discovery/gcpsm/gcpsm.go:547 → see #gcpsm-pagination
- internal/service/audit.go:244 → see #audit-pagination-count
- internal/service/job.go:295, 299 → see #validation-job-impl
New CI guard scripts/ci-guards/no-todo-in-prod.sh grep-fails any
new TODO/FIXME in cmd/ + internal/ (excluding _test.go); allows
'see #N' / 'see #<descriptor>' patterns.
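no-todo-in-prod.sh is a grep-based shell guard. The Go sketch below mirrors its stated intent under assumptions: any whole-word TODO or FIXME in a non-test file is a hit, and a comment rewritten to "see #<descriptor>" passes because it contains neither token. If the real script instead whitelists lines that pair a TODO with "see #N", this sketch would need that exception. `findTODOs` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// todoRe matches TODO or FIXME as whole words (RE2 supports \b).
var todoRe = regexp.MustCompile(`\b(TODO|FIXME)\b`)

// findTODOs returns "path:line: text" for every TODO/FIXME line in a
// production Go file; _test.go files are exempt.
func findTODOs(path, src string) []string {
	if strings.HasSuffix(path, "_test.go") {
		return nil
	}
	var hits []string
	for i, line := range strings.Split(src, "\n") {
		if todoRe.MatchString(line) {
			hits = append(hits, fmt.Sprintf("%s:%d: %s", path, i+1, strings.TrimSpace(line)))
		}
	}
	return hits
}

func main() {
	src := "// see #audit-pagination-count\n// TODO: add CountAuditEvents\n"
	for _, h := range findTODOs("internal/service/audit.go", src) {
		fmt.Println(h) // a CI guard would exit non-zero when any hit is found
	}
}
```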
Sandbox limitation
==================
The 6.1 GB certctl working tree fills the sandbox volume; the go1.25.10
toolchain download fails with 'no space left on device' (the sandbox has
1.25.9; go.mod requires 1.25.10). Local 'go test' / 'go build' were NOT
run for this commit. The operator must run 'make verify' on their
workstation before pushing, per the CLAUDE.md operating rules.
smoke.spec.ts was NOT executed in the sandbox (no chromium installed).
On first wire-up, the operator runs 'cd web && npm install &&
npx playwright install --with-deps chromium && npm run e2e'.
All CI guards (no-todo-in-prod, skip-inventory-drift, G-3
env-docs-drift, doc-rot-detector, and every existing guard) verified
clean by running each individually.
Closes: cowork/certctl-architecture-diligence-audit.html#fix-TEST-H1,
cowork/certctl-architecture-diligence-audit.html#fix-TEST-H2,
cowork/certctl-architecture-diligence-audit.html#fix-TEST-M1,
cowork/certctl-architecture-diligence-audit.html#fix-TEST-M2,
cowork/certctl-architecture-diligence-audit.html#fix-TEST-M3,
cowork/certctl-architecture-diligence-audit.html#fix-TEST-M4,
cowork/certctl-architecture-diligence-audit.html#fix-TEST-L1,
cowork/certctl-architecture-diligence-audit.html#fix-ARCH-H3,
cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L1,
cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L2,
cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L3,
cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L4
312 lines
11 KiB
Go
package service

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/certctl-io/certctl/internal/domain"
	"github.com/certctl-io/certctl/internal/repository"
)

// AuditService provides business logic for recording and retrieving audit events.
type AuditService struct {
	auditRepo repository.AuditRepository
}

// NewAuditService creates a new audit service.
func NewAuditService(auditRepo repository.AuditRepository) *AuditService {
	return &AuditService{
		auditRepo: auditRepo,
	}
}

// RecordEvent records an audit event with actor, action, and resource information.
//
// Bundle-6 / Audit H-008 + M-022 / CWE-532: every details map flows through
// RedactDetailsForAudit BEFORE marshaling. The redactor scrubs credential
// keys (api_key, password, token, *_pem, eab_secret, ...) and PII keys
// (email, phone, ssn, name, address, ip_address, ...) and surfaces a
// `redacted_keys` array so operators can audit the redactor itself during
// a compliance review. See internal/service/audit_redact.go.
func (s *AuditService) RecordEvent(ctx context.Context, actor string, actorType domain.ActorType, action string, resourceType string, resourceID string, details map[string]interface{}) error {
	return s.RecordEventWithCategory(ctx, actor, actorType, action, "", resourceType, resourceID, details)
}

// RecordEventWithCategory is the Bundle 1 Phase 8 categorized variant
// of RecordEvent. eventCategory is one of
// domain.EventCategoryCertLifecycle, domain.EventCategoryAuth,
// domain.EventCategoryConfig — empty defaults to cert_lifecycle in
// the persistence layer + DB CHECK constraint.
//
// Existing 90+ call sites that don't yet pass a category route
// through the legacy RecordEvent and inherit the cert_lifecycle
// default; new callers (auth handlers, bootstrap, config-mutation
// handlers) call this method directly with their explicit category.
// Both paths share the same redaction + marshaling contract.
func (s *AuditService) RecordEventWithCategory(ctx context.Context, actor string, actorType domain.ActorType, action, eventCategory, resourceType, resourceID string, details map[string]interface{}) error {
	redacted := RedactDetailsForAudit(details)
	detailsJSON, err := json.Marshal(redacted)
	if err != nil {
		detailsJSON = []byte("{}")
	}

	event := &domain.AuditEvent{
		ID:            generateID("audit"),
		Timestamp:     time.Now(),
		Actor:         actor,
		ActorType:     actorType,
		Action:        action,
		ResourceType:  resourceType,
		ResourceID:    resourceID,
		Details:       json.RawMessage(detailsJSON),
		EventCategory: eventCategory,
	}

	if err := s.auditRepo.Create(ctx, event); err != nil {
		return fmt.Errorf("failed to record audit event: %w", err)
	}

	return nil
}

// RecordEventWithTx records an audit event using the supplied repository.Querier.
//
// Pass *sql.Tx (typically obtained from postgres.WithinTx) to participate in
// a caller's transaction so the audit row is atomic with the operation that
// triggered it. Closes the #3 acquisition-readiness blocker from the
// 2026-05-01 issuer coverage audit (audit row not transactional with the
// operation it audits).
//
// Same redaction + marshalling contract as RecordEvent; only the database
// handle changes.
func (s *AuditService) RecordEventWithTx(ctx context.Context, q repository.Querier, actor string, actorType domain.ActorType, action string, resourceType string, resourceID string, details map[string]interface{}) error {
	redacted := RedactDetailsForAudit(details)
	detailsJSON, err := json.Marshal(redacted)
	if err != nil {
		detailsJSON = []byte("{}")
	}

	event := &domain.AuditEvent{
		ID:           generateID("audit"),
		Timestamp:    time.Now(),
		Actor:        actor,
		ActorType:    actorType,
		Action:       action,
		ResourceType: resourceType,
		ResourceID:   resourceID,
		Details:      json.RawMessage(detailsJSON),
	}

	if err := s.auditRepo.CreateWithTx(ctx, q, event); err != nil {
		return fmt.Errorf("failed to record audit event: %w", err)
	}

	return nil
}

// RecordEventWithCategoryWithTx records a categorized audit event using
// the supplied repository.Querier so the row is committed in the same
// transaction as the underlying action. Mirrors RecordEventWithCategory
// but takes the Querier (typically *sql.Tx from postgres.WithinTx).
//
// Audit 2026-05-10 HIGH-6 closure — closes the gap where Bundle-1+2
// auth-mutation paths emitted the audit row via a separate, non-
// transactional connection. A DB hiccup or connection reset between
// the action and the audit-row INSERT used to leave the action
// committed with no audit trail (CWE-778). With this method, the
// audit row participates in the action's transaction: rollback on
// any failure removes both the action row AND any audit row that the
// caller wrote inside the tx.
func (s *AuditService) RecordEventWithCategoryWithTx(ctx context.Context, q repository.Querier, actor string, actorType domain.ActorType, action, eventCategory, resourceType, resourceID string, details map[string]interface{}) error {
	redacted := RedactDetailsForAudit(details)
	detailsJSON, err := json.Marshal(redacted)
	if err != nil {
		detailsJSON = []byte("{}")
	}

	event := &domain.AuditEvent{
		ID:            generateID("audit"),
		Timestamp:     time.Now(),
		Actor:         actor,
		ActorType:     actorType,
		Action:        action,
		ResourceType:  resourceType,
		ResourceID:    resourceID,
		Details:       json.RawMessage(detailsJSON),
		EventCategory: eventCategory,
	}

	if err := s.auditRepo.CreateWithTx(ctx, q, event); err != nil {
		return fmt.Errorf("failed to record audit event: %w", err)
	}

	return nil
}

// List returns audit events matching filter criteria.
func (s *AuditService) List(ctx context.Context, filter *repository.AuditFilter) ([]*domain.AuditEvent, error) {
	events, err := s.auditRepo.List(ctx, filter)
	if err != nil {
		return nil, fmt.Errorf("failed to list audit events: %w", err)
	}
	return events, nil
}

// ListByResource returns up to 1000 audit events for a specific resource.
func (s *AuditService) ListByResource(ctx context.Context, resourceType string, resourceID string) ([]*domain.AuditEvent, error) {
	filter := &repository.AuditFilter{
		ResourceType: resourceType,
		ResourceID:   resourceID,
		PerPage:      1000, // reasonable default for single resource
	}

	events, err := s.auditRepo.List(ctx, filter)
	if err != nil {
		return nil, fmt.Errorf("failed to list audit events: %w", err)
	}
	return events, nil
}

// ListByActor returns up to 1000 audit events for a specific actor.
func (s *AuditService) ListByActor(ctx context.Context, actor string) ([]*domain.AuditEvent, error) {
	filter := &repository.AuditFilter{
		Actor:   actor,
		PerPage: 1000,
	}

	events, err := s.auditRepo.List(ctx, filter)
	if err != nil {
		return nil, fmt.Errorf("failed to list audit events: %w", err)
	}
	return events, nil
}

// ListByAction returns audit events with the given action inside the
// from/to window. The repository query is capped at 1000 rows and the
// action match is applied afterwards in memory, so matching events that
// fall outside that first 1000-row window are not returned.
func (s *AuditService) ListByAction(ctx context.Context, action string, from, to time.Time) ([]*domain.AuditEvent, error) {
	filter := &repository.AuditFilter{
		From:    from,
		To:      to,
		PerPage: 1000,
	}

	events, err := s.auditRepo.List(ctx, filter)
	if err != nil {
		return nil, fmt.Errorf("failed to list audit events: %w", err)
	}

	// Filter by action on client side (repository may not filter by action directly)
	var filtered []*domain.AuditEvent
	for _, e := range events {
		if e != nil && e.Action == action {
			filtered = append(filtered, e)
		}
	}

	return filtered, nil
}

// ListAuditEvents returns paginated audit events (handler interface method).
func (s *AuditService) ListAuditEvents(ctx context.Context, page, perPage int) ([]domain.AuditEvent, int64, error) {
	return s.ListAuditEventsByCategory(ctx, "", page, perPage)
}

// ListAuditEventsByCategory is the Bundle 1 Phase 8 categorized variant.
// Empty eventCategory disables the filter.
func (s *AuditService) ListAuditEventsByCategory(ctx context.Context, eventCategory string, page, perPage int) ([]domain.AuditEvent, int64, error) {
	if page < 1 {
		page = 1
	}
	if perPage < 1 {
		perPage = 50
	}

	filter := &repository.AuditFilter{
		EventCategory: eventCategory,
		Page:          page,
		PerPage:       perPage,
	}

	events, err := s.auditRepo.List(ctx, filter)
	if err != nil {
		return nil, 0, fmt.Errorf("failed to list audit events: %w", err)
	}

	// Convert pointers to values for the handler interface
	var result []domain.AuditEvent
	for _, e := range events {
		if e != nil {
			result = append(result, *e)
		}
	}

	// see #audit-pagination-count — the repository currently returns
	// the full filtered slice and we surface len(result) as total. This
	// works for the audit page's current shape (server-side filter +
	// client-side pagination over a bounded window) but is wrong when the
	// frontend ports to server-side cursoring (Phase 9 P-H2). At that
	// point the repository must add a CountAuditEvents(filter) method and
	// this line becomes total, _ := s.auditRepo.CountAuditEvents(ctx, filter).
	total := int64(len(result))

	return result, total, nil
}

// ExportEventsByFilter returns audit events matching a date-range +
// optional category filter without pagination — the export handler
// uses this to stream NDJSON for compliance evidence collection.
//
// Audit 2026-05-10 HIGH-11 closure: pre-fix, the `audit.export`
// permission was seeded into r-admin and r-auditor (migration 000031)
// but no endpoint enforced it — misleading capability advertisement.
// This method is the service-layer building block for the new
// GET /api/v1/audit/export endpoint.
//
// Bounded callers: the handler enforces a max 90-day range + max-rows
// cap before invoking this; the service-layer method itself is
// permissive so future callers (compliance-job runner, MCP tool) can
// reuse the helper without duplicating the bound enforcement.
func (s *AuditService) ExportEventsByFilter(ctx context.Context, from, to time.Time, eventCategory string, maxRows int) ([]domain.AuditEvent, error) {
	if maxRows <= 0 {
		maxRows = 50000
	}
	filter := &repository.AuditFilter{
		EventCategory: eventCategory,
		From:          from,
		To:            to,
		Page:          1,
		PerPage:       maxRows,
	}
	events, err := s.auditRepo.List(ctx, filter)
	if err != nil {
		return nil, fmt.Errorf("failed to list audit events for export: %w", err)
	}
	out := make([]domain.AuditEvent, 0, len(events))
	for _, e := range events {
		if e != nil {
			out = append(out, *e)
		}
	}
	return out, nil
}

// GetAuditEvent returns a single audit event (handler interface method).
// The lookup matches id against the filter's ResourceID field and returns
// the first matching event.
func (s *AuditService) GetAuditEvent(ctx context.Context, id string) (*domain.AuditEvent, error) {
	filter := &repository.AuditFilter{
		ResourceID: id,
		PerPage:    1,
	}

	events, err := s.auditRepo.List(ctx, filter)
	if err != nil {
		return nil, fmt.Errorf("failed to get audit event: %w", err)
	}

	if len(events) == 0 {
		return nil, fmt.Errorf("audit event not found")
	}

	return events[0], nil
}
