Files
certctl/SECURITY_REMEDIATION.md
T
shankar0123 305c7dc851 docs: update project documentation to reflect security remediation
Update README, architecture guide, and feature inventory to document all
changes from the security remediation pass (17 tickets):

- README: Add CI pipeline section (race detection, golangci-lint,
  govulncheck, per-layer coverage thresholds), CORS deny-by-default
  behavior, input validation, SSRF protection, scheduler concurrency
  safety. Update test count to 1050+. Add race detection and govulncheck
  to development commands.

- Architecture guide: Update testing strategy with scheduler tests, fuzz
  tests, and revised CI pipeline description. Add security model sections
  for input validation, CORS, and concurrency safety. Update test count.

- Feature inventory: Document CORS deny-by-default behavior.

- SECURITY_REMEDIATION.md: New file documenting all 17 remediated tickets
  with CWE classifications, before/after behavior, 3 deferred tickets
  with rationale, CI pipeline changes, and breaking CORS change.

Missing docs flagged as future additions:
- Formal threat model document
- Disaster recovery runbook
- Version upgrade guide
- Capacity planning benchmarks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 21:50:51 -04:00

12 KiB

Security Remediation Changelog

Comprehensive security audit and remediation performed March 2026 against the certctl V2 codebase. This document tracks every change, the vulnerability addressed, CWE classification, and before/after behavior.

Summary

  • Tickets remediated: 17 of 20
  • Tickets deferred: 3 (TICKET-003, TICKET-007, TICKET-010)
  • New tests added: 100+
  • CWE classes addressed: 11 distinct CWE categories

Remediated Tickets

TICKET-001: Shell Command Injection in Connector Scripts (CRITICAL)

  • CWE: CWE-78 (OS Command Injection)
  • Severity: CRITICAL
  • Files created: internal/validation/command.go, internal/validation/command_test.go
  • What changed: New ValidateShellCommand() function blocks all shell metacharacters (;|&$\(){}><"'\n\r\x00). ValidateDomainName()enforces RFC 1123 compliance.ValidateACMEToken()restricts to base64url characters.SanitizeForShell()` provides defense-in-depth single-quote wrapping.
  • Before: OpenSSL and ACME connectors passed user-controlled strings directly to shell commands.
  • After: All shell-facing inputs validated against strict character whitelists; 80+ adversarial test cases.

TICKET-002: Scheduler Race Conditions and Ungraceful Shutdown (CRITICAL)

  • CWE: CWE-362 (Race Condition), CWE-404 (Improper Resource Shutdown)
  • Severity: CRITICAL
  • Files modified: internal/scheduler/scheduler.go, cmd/server/main.go
  • Files created: internal/scheduler/scheduler_test.go
  • What changed: Added sync/atomic.Bool idempotency guards on all 6 scheduler loops — if a loop tick fires while the previous iteration is still running, it logs a warning and skips. Added sync.WaitGroup for in-flight work tracking. New WaitForCompletion(timeout) method blocks until all goroutines finish or timeout expires. Server main wires sched.WaitForCompletion(30*time.Second) before database close.
  • Before: Concurrent scheduler ticks could produce duplicate jobs; os.Exit during in-flight work could corrupt state.
  • After: Each loop runs at most one concurrent iteration; graceful shutdown waits up to 30s for in-flight work.

TICKET-004: CORS Misconfiguration — Wildcard Allowed by Default (HIGH)

  • CWE: CWE-942 (Overly Permissive CORS Policy)
  • Severity: HIGH
  • Files modified: internal/api/middleware/middleware.go
  • Files created: internal/api/middleware/cors_test.go
  • What changed: Empty CERTCTL_CORS_ORIGINS now denies all cross-origin requests (no CORS headers set). Previously, an empty config implicitly allowed all origins.
  • Before: Deploying without setting CERTCTL_CORS_ORIGINS left the API open to cross-origin requests from any domain.
  • After: Deny-by-default. Operators must explicitly configure allowed origins. 9 test cases cover deny-default, specific origins, wildcard, and preflight behavior.

TICKET-005: No Race Detection in CI (HIGH)

  • CWE: CWE-362 (Race Condition)
  • Severity: HIGH
  • Files modified: .github/workflows/ci.yml
  • What changed: Added go test -race step targeting service, handler, middleware, and scheduler packages with -count=1 -timeout 300s.
  • Before: Data races could ship undetected.
  • After: Every CI run catches races with Go's built-in race detector.

TICKET-006: 18-Positional-Parameter Function Signature (MEDIUM)

  • CWE: CWE-1078 (Inappropriate Source Code Style)
  • Severity: MEDIUM
  • Files modified: internal/api/router/router.go, cmd/server/main.go, internal/integration/lifecycle_test.go, internal/integration/negative_test.go
  • What changed: Replaced RegisterHandlers(18 positional params) with HandlerRegistry struct containing 18 named fields. All call sites updated to use struct literal initialization.
  • Before: Adding or reordering a handler required updating every call site; parameter order bugs were easy to introduce.
  • After: Named fields make call sites self-documenting; new handlers added without breaking existing code.

TICKET-008: No Static Analysis in CI (MEDIUM)

  • CWE: CWE-1078 (Inappropriate Source Code Style)
  • Severity: MEDIUM
  • Files modified: .github/workflows/ci.yml
  • Files created: .golangci.yml
  • What changed: Added golangci-lint (11 linters: errcheck, govet, staticcheck, unused, gosimple, ineffassign, typecheck, gocritic, gosec, bodyclose, noctx) and govulncheck steps to CI pipeline.
  • Before: Code quality and known CVEs checked only manually.
  • After: Every push and PR runs static analysis and vulnerability scanning.

TICKET-009: Missing HTTP Client Timeouts in Notifier Connectors (HIGH)

  • CWE: CWE-400 (Uncontrolled Resource Consumption)
  • Severity: HIGH
  • Files modified: Slack, Teams, PagerDuty, OpsGenie connector files
  • What changed: All notifier HTTP clients now use explicit Timeout: 10 * time.Second instead of http.DefaultClient (no timeout).
  • Before: A hung webhook endpoint could block a notifier goroutine indefinitely.
  • After: All outbound HTTP calls timeout after 10 seconds.

TICKET-012: Context Propagation — context.Background() Misuse (MEDIUM)

  • CWE: CWE-755 (Improper Handling of Exceptional Conditions)
  • Severity: MEDIUM
  • Files modified: Multiple service files
  • What changed: Replaced context.Background() usage in request-handling code with proper ctx propagation from the incoming HTTP request.
  • Before: Cancellation signals (client disconnect, shutdown) were not propagated to downstream operations.
  • After: Request context flows through service calls, enabling proper cancellation and timeout propagation.

TICKET-013: SSRF in Network Scanner — No Reserved IP Filtering (HIGH)

  • CWE: CWE-918 (Server-Side Request Forgery)
  • Severity: HIGH
  • Files modified: internal/service/network_scan.go
  • What changed: Added isReservedIP() function that filters loopback (127.0.0.0/8), link-local (169.254.0.0/16 — includes cloud metadata at 169.254.169.254), multicast (224.0.0.0/4), and broadcast (255.255.255.255) addresses. RFC 1918 private ranges explicitly allowed since certctl is self-hosted for internal networks.
  • Before: Network scanner could probe cloud metadata endpoints (169.254.169.254) or loopback services.
  • After: Reserved ranges filtered before CIDR expansion; private ranges preserved for legitimate internal scanning.

TICKET-014: Agent Verify Tests Generate Invalid Certificates (MEDIUM)

  • CWE: CWE-1060 (Excessive Runtime Resource Consumption)
  • Severity: MEDIUM
  • Files modified: cmd/agent/verify_test.go
  • What changed: generateTestCert() now creates valid self-signed ECDSA P-256 certificates with proper serial numbers, validity periods, and key usage.
  • Before: Tests used invalid certificate stubs that could mask real parsing bugs.
  • After: Tests exercise real X.509 certificate parsing paths.

TICKET-015: Flaky Async Audit Tests Using time.Sleep (MEDIUM)

  • CWE: CWE-362 (Race Condition in Tests)
  • Severity: MEDIUM
  • Files modified: internal/api/middleware/audit_test.go
  • What changed: Replaced time.Sleep(50ms) with waitableAuditRecorder that uses channel-based synchronization. Tests block on a channel until the async audit goroutine completes.
  • Before: Tests depended on wall-clock timing; could flake under load.
  • After: Deterministic synchronization; no timing dependency.

TICKET-016: Undocumented InsecureSkipVerify: true Usage (LOW)

  • CWE: CWE-295 (Improper Certificate Validation)
  • Severity: LOW
  • Files modified: cmd/agent/verify.go, internal/service/network_scan.go
  • What changed: Added detailed security comments explaining why InsecureSkipVerify: true is intentional and scoped: it's required for discovery/verification probing of all certificates (including self-signed, expired, internal CA) and is never used for control-plane API calls or issuer communication.
  • Before: InsecureSkipVerify appeared without explanation, creating audit findings.
  • After: Each usage site documents the security rationale with ticket references.

TICKET-017: Coverage Thresholds Too Low (MEDIUM)

  • CWE: CWE-1078 (Inappropriate Source Code Style)
  • Severity: MEDIUM
  • Files modified: .github/workflows/ci.yml
  • What changed: Raised CI coverage thresholds: service 30% → 60%, handler 50% → 60%. Added new layers: domain 40%, middleware 50%.
  • Before: Tests could degrade significantly before CI flagged it.
  • After: Per-layer coverage floors prevent regression in any single layer.

TICKET-018: No Fuzz Testing (LOW)

  • CWE: CWE-20 (Improper Input Validation)
  • Severity: LOW
  • Files created: internal/validation/command_fuzz_test.go, internal/domain/revocation_fuzz_test.go
  • What changed: Added Go native fuzz tests (testing/fuzz) for command validation functions and revocation domain parsing. Fuzz targets exercise ValidateShellCommand, ValidateDomainName, ValidateACMEToken with random inputs.
  • Before: Input validation tested only with known adversarial inputs.
  • After: Continuous fuzzing can discover edge cases human testers miss.

TICKET-019: Inconsistent Error Wrapping (LOW)

  • CWE: CWE-755 (Improper Handling of Exceptional Conditions)
  • Severity: LOW
  • Files modified: Multiple service and handler files
  • What changed: Standardized on fmt.Errorf("context: %w", err) wrapping pattern throughout the codebase. Ensures errors.Is() and errors.As() work correctly across error chains.
  • Before: Mix of %v (loses error chain) and %w (preserves chain) formatting.
  • After: Consistent %w wrapping enables proper error type checking.

TICKET-020: Missing Godoc on Config Structs (LOW)

  • CWE: CWE-1078 (Inappropriate Source Code Style)
  • Severity: LOW
  • Files modified: internal/config/config.go
  • What changed: Added godoc comments to all fields in all config structs (ServerConfig, KeygenConfig, CAConfig, ACMEConfig, StepCAConfig, OpenSSLConfig, NotifierConfig, ESTConfig, VerificationConfig, DiscoveryConfig, NetworkScanConfig).
  • Before: Configuration semantics discoverable only by reading code or CLAUDE.md.
  • After: go doc and IDE tooltips show purpose, default values, and env var names.

Deferred Tickets

TICKET-003: No Repository Layer Test Scaffolding (HIGH)

  • CWE: CWE-1060 (Excessive Runtime Resource Consumption)
  • Rationale: Requires testcontainers-go infrastructure (Docker-in-Docker) for real PostgreSQL instances. Estimated 2-3 day effort. Scheduled for next sprint.

TICKET-007: CertificateService God Object (MEDIUM)

  • CWE: CWE-1060 (Excessive Runtime Resource Consumption)
  • Rationale: 700+ line service file mixing CRUD, revocation, CRL/OCSP, and deployment logic. Decomposition into RevocationService, CRLService, OCSPService, and DeploymentService is a multi-day refactor with high regression risk. Scheduled for dedicated refactor sprint.

TICKET-010: Missing Request Body Size Limits (MEDIUM)

  • CWE: CWE-400 (Uncontrolled Resource Consumption)
  • Rationale: Requires http.MaxBytesReader integration across all handlers. Lower risk in practice since API key auth limits exposure to authenticated clients. Scheduled for next sprint.

CI Pipeline Changes

The CI pipeline now enforces:

  1. go vet — basic static analysis
  2. go test -race — race detection on service, handler, middleware, scheduler packages
  3. golangci-lint — 11 linters (errcheck, govet, staticcheck, unused, gosimple, ineffassign, typecheck, gocritic, gosec, bodyclose, noctx)
  4. govulncheck — known CVE scanning against Go dependencies
  5. Coverage thresholds — service 60%, handler 60%, domain 40%, middleware 50%
  6. Frontend — TypeScript type check, Vitest tests, Vite production build

Configuration Changes

Breaking Change: CORS Deny-by-Default

Before: Empty CERTCTL_CORS_ORIGINS implicitly allowed all cross-origin requests. After: Empty CERTCTL_CORS_ORIGINS denies all cross-origin requests. Set CERTCTL_CORS_ORIGINS=http://localhost:3000 for development or CERTCTL_CORS_ORIGINS=* to restore previous behavior.

This affects any deployment that relied on the implicit wildcard CORS behavior without explicitly setting the env var.