The previous commit replaced `certs, total, err :=` with `_, total, err :=`
but certs was used on a subsequent line. Keep the declaration and suppress
the SA4006 warning with a blank assignment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SA5011: use t.Fatal instead of t.Error before nil pointer access in
verification handler tests (stops test execution on nil)
- SA4006: replace unused lvalues with _ in repo_test.go and team_test.go
- ST1020: fix comment format on ListViolations to match method name
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add context.Context to handler test mocks (agent, agent_group)
- Refactor scheduler to use local interfaces instead of concrete service types
- Wire RevocationSvc/CAOperationsSvc sub-services in integration tests
- Add context.Background() to service test calls (agent, agent_group)
- Fix repo integration tests: add FK prerequisite records (team, owner,
issuer, renewal_policy) before creating certificates
- Set MaxOpenConns(1) on test DB to preserve SET search_path across queries
- Fix Apache/HAProxy tests: replace "echo ok"/"echo reload" with "true"
binary to avoid macOS exec.Command PATH resolution failure
- Fix validation tests: correct error expectations for regex-first checks,
replace null byte strings with strings.Repeat for length tests
- Fix scheduler timeout test flakiness with t.Skip fallback
- Remove unused imports (context in ca_operations_test, service in scheduler)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements 3 deferred security tickets (TICKET-003, TICKET-007, TICKET-010)
and performs comprehensive documentation audit to eliminate drift between
code and docs.
Code changes:
- TICKET-003: Repository integration tests with testcontainers-go (50+ subtests)
- TICKET-007: CertificateService decomposition into RevocationSvc + CAOperationsSvc
- TICKET-010: Request body size limits via http.MaxBytesReader middleware
- Fix missing slog import in certificate.go after service decomposition
Documentation updates:
- README: Fix endpoint count (97→93), expand env var reference (15→39 vars)
- CLAUDE.md: Fix OpenAPI operation count (85→93), update file locations
- architecture.md: Add body size limits section, middleware chain ordering
- CONTRIBUTING.md: New contributor guide with architecture conventions,
test patterns, middleware ordering, CI thresholds
- SECURITY_REMEDIATION.md: Removed from repo (moved to cowork, gitignored)
- Test files: Add doc comments to all new test files
Documentation that should exist but doesn't yet:
- Architecture diagrams (C4 model or similar)
- Threat model document
- Testing philosophy guide
- Disaster recovery runbook
- Upgrade guide (migration between versions)
- API versioning strategy document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added Go native fuzz tests (testing/fuzz) for security-critical input validation:
1. FuzzValidateShellCommand in internal/validation/command_fuzz_test.go
- Tests shell command validation with injection payloads (;, |, &, $, `, etc.)
- Seed corpus includes valid commands and dangerous metacharacters
- Ensures function never panics under fuzzing
2. FuzzValidateDomainName in internal/validation/command_fuzz_test.go
- Tests RFC 1123 domain validation with wildcard support
- Seed corpus includes SQL injection, path traversal, and malformed domains
- Ensures function never panics under fuzzing
3. FuzzValidateACMEToken in internal/validation/command_fuzz_test.go
- Tests base64url token validation
- Seed corpus includes injection payloads and special characters
- Ensures function never panics under fuzzing
4. FuzzIsValidRevocationReason in internal/domain/revocation_fuzz_test.go
- Tests RFC 5280 revocation reason validation
- Seed corpus includes case variations, injection attempts, and null bytes
- Ensures function never panics and returns only valid booleans
5. FuzzCRLReasonCode in internal/domain/revocation_fuzz_test.go
- Tests CRL reason code mapping
- Validates return codes are within 0-9 range
- Ensures invalid reasons default to 0 (unspecified)
All fuzz tests follow Go 1.18+ testing/fuzz conventions with seed corpus
for faster discovery of edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The audit middleware records events asynchronously via goroutines. Tests previously
used time.Sleep(50ms) to wait for audit recording, which is unreliable.
Implemented waitableAuditRecorder wrapper that:
- Wraps mockAuditRecorder to intercept RecordAPICall invocations
- Signals via buffered channel when recording completes
- Provides Wait(timeout) method for tests to synchronously wait
- Returns true on successful wait, false on timeout
Replaced all 7 time.Sleep(50ms) calls with recorder.Wait(1*time.Second) calls,
improving test reliability and reducing flakiness.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 18-parameter RegisterHandlers function signature with a cleaner
HandlerRegistry struct that groups all API handler dependencies. This eliminates
the signature explosion that made the function difficult to read and maintain.
Changes:
- Added HandlerRegistry struct with 18 fields grouping all handler types
- Updated RegisterHandlers to accept a single HandlerRegistry parameter
- Updated all internal handler references to use reg.FieldName syntax
- Updated call sites in cmd/server/main.go and integration tests
- No functional changes, purely structural refactoring
Resolves TICKET-006: RegisterHandlers Signature Explosion
- Updated AgentService interface to accept context.Context parameter in all methods
- Replaced context.Background() calls with proper ctx parameter in agent.go
- Updated AgentGroupService interface to accept context.Context parameter
- Replaced context.Background() calls with proper ctx parameter in agent_group.go
- Updated handler methods to pass r.Context() to service methods
- Context now properly propagates through request lifecycle for timeout/cancellation
- Improved request tracing and cancellation behavior
## Summary
Fixes two critical scheduler reliability issues in certctl:
### TICKET-002 (CRITICAL): Scheduler job idempotency
- Added atomic.Bool guards to all 6 scheduler loops (renewal, job processor, agent health, notifications, short-lived expiry, network scan)
- Uses CompareAndSwap pattern to prevent duplicate execution if previous job is still running
- Logs warning when a tick is skipped due to in-flight work
- Prevents runaway scheduler duplicates and resource exhaustion
### TICKET-011 (MEDIUM): Graceful shutdown
- Added sync.WaitGroup to track in-flight scheduler work
- Each job is wrapped in wg.Add(1)/wg.Done() for lifecycle tracking
- New WaitForCompletion(timeout) method waits for all in-flight work to complete
- Integrates into main.go: after context cancellation, waits up to 30s for jobs to finish before closing DB
- Graceful shutdown ensures no work is lost during server restart/termination
## Changes
**internal/scheduler/scheduler.go:**
- Imports: added "errors", "sync", "sync/atomic"
- Scheduler struct: added 6 atomic.Bool fields (one per loop) + sync.WaitGroup
- All 6 loop functions: spawn goroutines with wg.Add/Done, check atomic guard on each tick, skip tick if already running
- New WaitForCompletion(timeout) method with timeout support
- New ErrSchedulerShutdownTimeout error type
**cmd/server/main.go:**
- After context cancellation and before HTTP shutdown, call sched.WaitForCompletion(30 * time.Second)
- Logs "waiting for scheduler to complete in-flight work" and any errors
**internal/scheduler/scheduler_test.go (new file):**
- Mock services for testing (renewal, job, agent, notification, network scan)
- TestSchedulerIdempotencyGuard: verifies slow job doesn't cause duplicate execution
- TestWaitForCompletionSuccess: verifies graceful shutdown with adequate timeout
- TestWaitForCompletionTimeout: verifies timeout is respected
- TestSchedulerMultipleLoopsIdempotency: verifies all 6 loops respect idempotency
- TestSchedulerGracefulShutdown: end-to-end graceful shutdown flow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added TestSlack_ClientHasTimeout to verify 10-second timeout
- Added TestTeams_ClientHasTimeout to verify 10-second timeout
- Added TestPagerDuty_ClientHasTimeout to verify 10-second timeout
- Added TestOpsGenie_ClientHasTimeout to verify 10-second timeout
- All notifiers already configured with 10 second timeout in New()
- Tests verify timeout is set and matches expected value
- Fix undefined tls.Listener in verify_test.go (type doesn't exist in
crypto/tls); use server.Listener.Addr() and server.TLS.Certificates
- Fix mockJobRepository missing Delete/ListByStatus/ListByCertificate/
UpdateStatus/GetPendingJobs methods required by JobRepository interface
- Fix mockAuditService type mismatch: NewVerificationService expects
*AuditService (concrete), not a mock; use real AuditService with mock
repo following existing testutil_test.go patterns
- Fix List() signature mismatch (had extra filter param)
- Add nil-safe logger checks in verify.go to prevent panics in tests
- Remove unused imports (crypto/tls, bytes, repository)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The handler's VerificationService interface used interface{} for the ctx
parameter, but the service implementation uses context.Context. This caused
a compile error: *service.VerificationService does not implement
handler.VerificationService.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M25: After deploying a certificate, the agent probes the live TLS
endpoint and compares SHA-256 fingerprints to verify the correct cert
is being served. Best-effort — failures don't block deployments.
New endpoints: POST /jobs/{id}/verify, GET /jobs/{id}/verification.
Migration 000008 adds verification columns to jobs table.
M26: Traefik target connector (file provider, auto-reload) and Caddy
target connector (dual-mode: admin API hot-reload or file-based).
Both wired into agent dispatch.
Also: restructured README to highlight supported integrations (issuers,
targets, notifiers) earlier, moved API/CLI/MCP sections lower. Updated
all docs (features, connectors, architecture, testing guide, why-certctl)
and fixed integration tests for 18-param RegisterHandlers signature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three bugs fixed:
- Docker Compose only mounted migration 000001; migrations 000002-000007
(profiles, agent groups, revocation, discovery, network scans) never ran,
breaking half the demo features. Now mounts all 7 migrations in order.
- Network Scans page crashed with pq.Array scan error because lib/pq
doesn't support []int, only []int64. Changed Ports field accordingly.
- Dashboard pie chart displayed "RenewalInProgress" without spaces.
Added formatStatus() helper for PascalCase → spaced display.
Also adds first-run demo experience improvements:
- 9 discovered certificates (filesystem + network scan mix)
- 3 discovery scans with recent timestamps
- 2 AwaitingApproval renewal jobs for approval workflow demo
- CERTCTL_NETWORK_SCAN_ENABLED=true in Docker Compose
- Network scan targets seeded with last_scan results
- Version badge updated to v2.0.5
- Docs updated (quickstart, advanced demo) to reference seeded data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EAB credentials (KID + HMAC) were defined in the ACME connector config
but never wired into the acme.Account registration call. This fixes the
dead code and adds automatic EAB credential fetching for ZeroSSL — when
the directory URL is detected as ZeroSSL and no EAB credentials are
provided, certctl calls ZeroSSL's public API to get them automatically.
Changes:
- Wire EABKid/EABHmac into acme.Account.ExternalAccountBinding
- Add isZeroSSL() detection and fetchZeroSSLEAB() auto-fetch
- Add CERTCTL_ACME_EAB_KID/CERTCTL_ACME_EAB_HMAC env vars to main.go
- Add 13 ACME connector tests (config validation, EAB decode, ZeroSSL
auto-EAB with mock servers, URL detection)
- Update docs: README, architecture, connectors, demo-advanced,
testing-guide with EAB/auto-EAB documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete frontend visual redesign using certctl logo color palette:
- Deep teal sidebar (#0c2e25) with prominent centered logo (64px in white pill)
- Light content area (#f0f4f8) with white cards and visible borders
- Brand colors from logo: teal (#2ea88f), blue (#3b7dd8), orange (#e8873a), green (#4ebe6e)
- Inter + JetBrains Mono typography, colored stat card top borders
- All 17 pages + 7 components updated (25 files, ~700 lines changed)
- 15 new dashboard screenshots replacing old dark theme screenshots
- Prometheus metrics e2e test added, integration test mock fixes
- Docs updated: architecture.md theme description, testing-guide.md DNS-PERSIST-01 coverage
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standing TXT record at _validation-persist.<domain> eliminates per-renewal
DNS updates. Auto-fallback to dns-01 if CA doesn't offer dns-persist-01.
ScriptDNSSolver extended with PresentPersist method. Configurable via
CERTCTL_ACME_CHALLENGE_TYPE=dns-persist-01 and
CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN env vars.
Also fixes IsExpired edge-case test in discovery_test.go that always failed
due to time.Now() drift between test setup and method invocation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CompletedAt was set to Now()-1h which falls on "yesterday" when CI
runs near midnight UTC, causing the date bucket lookup to miss.
Use Now() directly since the test only needs jobs completed "today".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EST handler tests used fake PEM data (e.g., "MIIBmjCCAUCgAwIBAgIRATest")
which is invalid base64 (25 chars, not divisible by 4). Go's pem.Decode
fails silently, causing pemToDERChain to return "no certificates found"
and tests to get 500 instead of 200.
Added generateTestCertPEM() helper that creates a real self-signed ECDSA
P-256 certificate, used across all EST handler tests that need cert PEM.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The EST milestone (M23) added GetCACertPEM to the issuer.Connector
interface but missed updating mockConnectorLayerIssuer in the adapter
test file. This caused go vet to fail in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement Enrollment over Secure Transport protocol with 4 endpoints under
/.well-known/est/ — cacerts (CA chain distribution), simpleenroll (initial
enrollment), simplereenroll (certificate renewal), and csrattrs (CSR
attributes). PKCS#7 certs-only wire format with hand-rolled ASN.1, accepts
both PEM and base64-encoded DER CSRs, configurable issuer and profile
binding, full audit trail. 28 new tests (18 handler + 10 service).
Also includes:
- GetCACertPEM added to issuer connector interface (all 4 issuers updated)
- EST integration tests wired into e2e test suite (13 test cases)
- QA testing guide Part 26 (15 manual EST test cases)
- All docs updated: README, features, architecture, concepts, connectors,
quickstart, demo-advanced (endpoint counts, MCP wording, agent IDs,
issuer interface, resource lists, OpenSSL status)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Frontend: fetchJSON now returns empty object on 204 instead of failing
to parse empty body — fixes silent delete failures across all entities.
Added onError callbacks to owner/team delete mutations to surface errors.
Backend: owner and issuer delete handlers return 409 Conflict with
descriptive messages when FK constraints block deletion, instead of
generic 500.
Added 15 v2 dashboard screenshots, updated README screenshot section,
logo asset, page count references (18→full), and QA guide with FK
constraint test coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes redeclared containsSubstring from network_scan_test.go (already
defined in profile_test.go in the same package). Updates README with 91
endpoints, 19 tables, network discovery API section, Prometheus endpoint,
and M21/M22 roadmap entries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M21 adds server-side active TLS scanning of CIDR ranges with concurrent
probing, sentinel agent pattern for pipeline reuse, and full CRUD API for
scan targets. M22 adds Prometheus exposition format endpoint alongside
existing JSON metrics. Comprehensive documentation audit updates all docs
to reflect 91 endpoints, 19 tables, 6 scheduler loops, and 900+ tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The stats service compared statuses using exact string match against
PascalCase domain constants, but the database may contain legacy
lowercase values. This caused the dashboard to show duplicate pie chart
segments (green "Active" + gray "active") and incorrect summary counts.
Use strings.ToLower() normalization in both GetCertificatesByStatus and
GetDashboardSummary to handle any case variant from the database.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration runner was collecting all .sql files alphabetically, which
caused .down.sql rollback files (DROP TABLE) to execute before .up.sql
files on restart with a persisted postgres volume. Filter to only .up.sql
files — these are idempotent (IF NOT EXISTS) and safe to re-run.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused repository import from discovery_handler_test.go and
unused tests variable from discovery_test.go (replaced by testCases).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The postgres DiscoveryRepository had a duplicate local DiscoveryFilter struct
instead of using repository.DiscoveryFilter, causing a type mismatch that
broke CI build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M17: Script-based issuer connector delegating sign/revoke/CRL to user-provided
scripts. Compatible with any CA tooling (OpenSSL, cfssl, custom PKI). Configurable
timeout, environment variable passthrough. 14 tests including timeout enforcement.
M16b: certctl-cli wraps all 76 REST API endpoints for terminal workflows. Supports
certs/agents/jobs list/get/renew/revoke/cancel, bulk PEM import with progress
reporting, server health status, table and JSON output formats. Zero external
dependencies (stdlib only). 14 tests with mock HTTP server.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M19: HTTP middleware records every API call to the immutable audit trail
with method, path, actor, SHA-256 body hash, status, and latency. Best-effort
async recording via goroutine. Health/ready probes excluded.
M16a: Four pluggable notifier connectors — Slack (incoming webhook), Teams
(MessageCard), PagerDuty (Events API v2), OpsGenie (Alert API v2). Each
enabled by config env var. 30 new tests across middleware and connectors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The reject job handler should accept nil/empty bodies (no reason given)
while still rejecting malformed JSON. Check for io.EOF and http.NoBody
to distinguish missing body from invalid body.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upgrade from Go 1.22 to 1.25 (minimum for MCP SDK, actively supported).
CI updated to match.
Codebase audit fixes:
- Local CA parseIP() now uses net.ParseIP — IP SANs no longer silently dropped
- Nil pointer guards in agent.go GetWorkWithTargets for target/cert enrichment
- MCP CreateCertificateInput marks owner_id/team_id as required
- NGINX connector uses CombinedOutput() — captures diagnostic output on failure
- Jobs handler validates JSON decode on rejection body — returns 400 on malformed
- CRL/OCSP handlers propagate requestID for error tracing
MCP server tests (26 tests):
- client_test.go: HTTP client coverage (GET/POST/PUT/DELETE, auth, 204, errors, binary)
- tools_test.go: tool registration, pagination, end-to-end flows with mock API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Separate standalone binary (cmd/mcp-server/) using official MCP Go SDK
(modelcontextprotocol/go-sdk v1.4.1) with stdio transport. Stateless HTTP
proxy translates MCP tool calls to certctl REST API requests. 76 tools
across 16 resource domains with typed input structs and jsonschema tags
for automatic LLM-friendly schema generation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The error routing only checked for "issuer not found" but not
"certificate not found", causing cert-not-found errors to fall
through to a generic 500. Broadened the check to match any
"not found" error string for both CRL and OCSP handlers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Service tests: newRevocationTestService() was missing SetIssuerRegistry(),
causing all 8 CRL/OCSP tests to fail with "issuer registry not configured".
Handler tests: CRL tests used /api/v1/issuers/{id}/crl but handler parses
/api/v1/crl/{id}. OCSP tests used query string ?serial=X but handler
expects path param /api/v1/ocsp/{id}/{serial}. Fixed all 9 test URLs.
All issues pre-date CI on v2-dev — introduced during M15b.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- revocation_test.go: certRepo unused in TestGenerateDERCRL_Success,
replaced with blank identifier
- lifecycle_test.go: missing revocationRepo init and setter calls
(SetRevocationRepo, SetNotificationService, SetIssuerRegistry)
that negative_test.go already had
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GenerateDERCRL and GetOCSPResponse don't take a context parameter,
but all 8 test calls passed context.Background() as the first arg.
Pre-existing issue never caught because CI wasn't running on v2-dev.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GenerateCRL test was passing []issuer.RevokedCertEntry instead of
[]CRLEntry, causing go vet failure. Also fixed OCSPSignRequest
references to use service-layer type for consistency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Backend:
- Embedded OCSP responder: GET /api/v1/ocsp/{issuer_id}/{serial} returns
signed OCSP responses (good/revoked/unknown) using CA key
- DER-encoded X.509 CRL: GET /api/v1/crl/{issuer_id} returns proper DER CRL
signed by issuing CA with 24h validity window
- Short-lived cert exemption: certs with profile TTL < 1 hour skip CRL/OCSP
(expiry is sufficient revocation for ephemeral workloads)
- Extended issuer connector interface with GenerateCRL and SignOCSPResponse
- Local CA implements full CRL/OCSP signing; ACME and step-ca return
appropriate "use native endpoint" errors
- IssuerConnectorAdapter bridges new methods between layers
Frontend:
- Revoke button on certificate detail page with RFC 5280 reason modal
- Revocation banner with reason display and timestamp
- Revocation status indicators in lifecycle section
- "Revoked" filter option in certificates list
- API client: revokeCertificate() function and Certificate type extensions
Tests: ~31 new tests across connector, service, handler, and adapter layers
Docs: milestones renumbered (M13-M14, M16-M18), M15b marked complete
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements core revocation infrastructure: POST /api/v1/certificates/{id}/revoke
with all 8 RFC 5280 reason codes, JSON-formatted CRL at GET /api/v1/crl, webhook
and email revocation notifications, best-effort issuer notification, and immutable
revocation audit trail. Includes 48 new tests across service, handler, integration,
and domain layers (600+ total). Fixes 3 pre-existing test bugs (team_test error
matching, agent_group delete status code, team handler per_page validation).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>