Operator decision answered as full soft-delete with optional forced
cascade — hard-delete is not reachable from any public surface. Prior
to this commit, DELETE /agents/{id} ran a plain `DELETE FROM agents`
whose schema-level `ON DELETE CASCADE` on deployment_targets.agent_id
silently wiped every target, orphaning certs and aborting in-flight
jobs. The finding closure reshapes the agent-removal contract around
soft retirement with explicit preflight counts, an opt-in cascade
gated by a mandatory reason, and unconditional protection for the
four reserved sentinel agents used by discovery sources.
Schema — migration 000015:
migrations/000015_agent_retire.up.sql flips
deployment_targets_agent_id_fkey from ON DELETE CASCADE to ON DELETE
RESTRICT, so a stray `DELETE FROM agents` now errors at the DB
boundary instead of quietly destroying targets. Both `agents` and
`deployment_targets` grow a retired_at TIMESTAMPTZ + retired_reason
TEXT pair (TEXT not VARCHAR so operator comments are never
truncated), indexed via partial indexes WHERE retired_at IS NOT
NULL. The migration is self-healing (ADD COLUMN IF NOT EXISTS, DROP
CONSTRAINT IF EXISTS then ADD CONSTRAINT, CREATE INDEX IF NOT
EXISTS) so repeated runs against partially-migrated databases
converge. migrations/000015_agent_retire.down.sql restores CASCADE
and drops the new columns for clean rollback. A dedicated
repository-layer testcontainers test
(internal/repository/postgres/migration_000015_test.go) asserts the
before/after FK action, column presence, index presence, and
round-trip idempotency under up→down→up.
Domain — sentinel guard + dependency counts:
internal/domain/connector.go gains IsRetired() on Agent, the
exported SentinelAgentIDs slice listing server-scanner,
cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm verbatim (matching the
four reserved IDs documented in CLAUDE.md and created at startup in
cmd/server/main.go), IsSentinelAgent(id string) predicate,
AgentDependencyCounts{ActiveTargets, ActiveCertificates,
PendingJobs} with a HasDependencies() method, and ActorTypeAgent /
ActorTypeSystem enum values used by audit emission downstream.
Coverage locked down by internal/domain/connector_test.go.
Service — 8-step ordered contract:
internal/service/agent_retire.go:RetireAgent(ctx, id, actor,
opts{Force, Reason}) enforces a fixed execution order:
(1) sentinel guard — IsSentinelAgent(id) returns ErrAgentIsSentinel
unconditionally; force=true does NOT bypass it.
(2) fetch — ErrAgentNotFound on miss.
(3) idempotency — if IsRetired() already, return
AgentRetirementResult{AlreadyRetired: true} with no new audit
event and no state change (safe to replay from flaky clients).
(4) preflight counts — collectAgentDependencyCounts runs
ActiveTargets, ActiveCertificates, PendingJobs sequentially
(not in parallel; keeps the per-query timeout predictable and
matches the repo's existing call-chain shape).
(5) force-reason guard — opts.Force=true with empty Reason returns
ErrForceReasonRequired (wired into the 400 status surface).
(6) dependency guard — HasDependencies() with opts.Force=false
returns BlockedByDependenciesError{Counts} (wired into the 409
body with per-bucket counts).
(7) mutation — single pinned retiredAt := time.Now(); agent
retirement first, then cascade target retirement if opts.Force,
all under the repo's single transaction so the two retired_at
stamps match to the second.
(8) best-effort audit — agent_retired always; agent_retirement_
cascaded additionally on the force path. Actor is whatever the
handler resolves from the request; actor type is mapped by
resolveActorType (system/agent-prefix→Agent/else→User). Audit
emission failures are logged via slog.Error but do not abort
the retirement (matches the house convention used by every
other scheduler-emitted event).
BlockedByDependenciesError implements Error() as
"active_targets=%d, active_certificates=%d, pending_jobs=%d" and
Unwrap() → ErrBlockedByDependencies. The single struct satisfies
errors.Is via Unwrap (used by scheduler-level tests) and errors.As
via the concrete type (used by the handler to fish out Counts for
the 409 body). ListRetiredAgents(page, perPage) adds a separate
paginated accessor with page<1→1 and perPage<1→50 normalization so
retired rows are queryable without polluting the default agent
listing.
Sentinel guard coverage is asymmetric by design: all four reserved
IDs are protected, and force=true cannot override. Regression tests
in internal/service/agent_retire_test.go assert each of the eight
steps in order, plus sentinel bypass attempts and idempotency
replay.
Handler + router — status-code surface:
internal/api/handler/agents.go:RetireAgent exposes seven status
codes on DELETE /agents/{id}:
200 on a fresh retirement (body echoes AgentRetirementResult).
204 on idempotent replay (AlreadyRetired=true; no new audit).
400 on ErrForceReasonRequired.
403 on ErrAgentIsSentinel.
404 on ErrAgentNotFound.
409 on BlockedByDependenciesError, with a custom body shape
{error, counts{active_targets, active_certificates,
pending_jobs}} that bypasses the default ErrorWithRequestID
envelope so callers get the per-bucket numbers directly.
500 on any other error.
Heartbeat HandleHeartbeat returns 410 Gone when the agent is
retired (ErrAgentRetired), signalling the agent to shut down.
Query params `force=true` and `reason=<text>` drive the cascade
path; both are forwarded as url.Values through the new MCP
transport.
internal/api/router/router.go registers GET /api/v1/agents/retired
literal-path BEFORE /api/v1/agents/{id} — Go 1.22 ServeMux's
literal-beats-pattern-var precedence routes "retired" to the
paginated retired-agents listing instead of fetching a hypothetical
agent named "retired".
Agent binary — clean shutdown on 410:
cmd/agent/main.go gains the ErrAgentRetired sentinel, a
retiredOnce sync.Once, and a retiredSignal chan struct{}. A
markRetired(source, statusCode, body) helper closes the channel
exactly once; the Run() select loop observes the close and returns
ErrAgentRetired; main() matches via errors.Is(err, ErrAgentRetired)
and exits cleanly instead of spinning in the heartbeat retry loop.
The 410 Gone surface is therefore terminal for the agent process.
MCP transport:
internal/mcp/client.go adds Client.DeleteWithQuery(path, query),
a new additive transport method. Client.Delete is path-only; without
this method the retire tool would silently drop `force` and `reason`,
turning every cascade retire into a default soft-retire. The new
method shares do()'s 204 normalization and 4xx/5xx error
propagation so tool authors get one contract.
internal/mcp/tools.go + internal/mcp/types.go expose the
retire_agent tool with Force+Reason inputs wired through
DeleteWithQuery.
CLI:
cmd/cli/main.go + internal/cli/client.go add two CLI surfaces:
`agents list --retired` (client-side strip of --retired then
delegation to ListRetiredAgents, sharing --page/--per-page parsing
with the default listing) and `agents retire <id> [--force --reason
"…"]` (mirrors ErrForceReasonRequired — force without reason is
rejected client-side before the request is sent). JSON + table
output modes both honor the new columns.
Frontend:
web/src/pages/AgentsPage.tsx surfaces retired/retire affordances.
web/src/api/client.ts + web/src/api/types.ts expose the retire
endpoint and the retired-listing. 4 new Vitest regression cases.
OpenAPI:
api/openapi.yaml documents DELETE /agents/{id} with all seven
status codes, 410 on heartbeat, and the 409 per-bucket body shape.
Regression coverage (six new test files, all green):
internal/service/agent_retire_test.go — 8-step contract + sentinel guards
internal/api/handler/agent_retire_handler_test.go — 7-status-code surface + 410 heartbeat
internal/mcp/retire_agent_test.go — DeleteWithQuery wire-through
internal/cli/agent_retire_test.go — --retired listing + --force/--reason pairing
internal/repository/postgres/migration_000015_test.go — FK flip + columns + indexes + up↔down
internal/domain/connector_test.go — IsRetired, IsSentinelAgent, SentinelAgentIDs, HasDependencies
Files:
api/openapi.yaml — DELETE + 410 + 409 body shape
cmd/agent/main.go — ErrAgentRetired, markRetired, retiredSignal
cmd/cli/main.go — handleAgents list/get/retire dispatch
docs/architecture.md, docs/concepts.md,
docs/testing-guide.md — retirement contract narrative
internal/api/handler/agents.go — RetireAgent, status surface, 410 on heartbeat
internal/api/handler/agent_handler_test.go — extended coverage
internal/api/handler/agent_retire_handler_test.go — new
internal/api/router/router.go — /agents/retired before /agents/{id}
internal/cli/agent_retire_test.go — new
internal/cli/client.go — ListRetiredAgents + RetireAgent
internal/domain/connector.go — IsRetired, SentinelAgentIDs,
IsSentinelAgent, AgentDependencyCounts,
ActorTypeAgent/System
internal/domain/connector_test.go — new
internal/integration/lifecycle_test.go — retirement fixture
internal/mcp/client.go — DeleteWithQuery additive transport
internal/mcp/retire_agent_test.go — new
internal/mcp/tools.go, internal/mcp/types.go — retire_agent tool + Force/Reason inputs
internal/repository/interfaces.go — AgentRepository retirement methods
internal/repository/postgres/agent.go — retire + cascade target retire + counts
internal/repository/postgres/migration_000015_test.go — new
internal/service/agent.go — wire into AgentService surface
internal/service/agent_retire.go — new 8-step contract
internal/service/agent_retire_test.go — new
internal/service/deployment.go — skip retired agents
internal/service/target.go — skip retired agents
internal/service/testutil_test.go — shared mocks extended
migrations/000015_agent_retire.up.sql — new
migrations/000015_agent_retire.down.sql — new
web/src/api/client.ts, types.ts + tests — retire endpoint wiring
web/src/pages/AgentsPage.tsx — retire UI
Problem:
5c01c7f closed C-001 at the handler boundary by tightening the
ValidateRequired contract on POST /api/v1/certificates to require six
fields: name, common_name, renewal_policy_id, issuer_id, owner_id,
team_id. (Correction re-derived from source: the handler
ValidateRequired calls on owner_id/team_id/renewal_policy_id were
actually installed in 4536147 under M-002/M-003/M-006 auth unification
— 5c01c7f's commit message overstates scope.) Post-audit on
2026-04-18 found three parallel call sites still shipping
three-to-four-field payloads that the newly strict handler would
reject with HTTP 400:
- GUI: OnboardingWizard CertificateStep (common_name + sans +
issuer_id + environment only)
- CLI: certctl-cli import (common_name + issuer_id + status only;
no required-flag gating)
- Tests: deploy/test/qa_test.go Part03 positive paths
Scope:
Bring every POST /api/v1/certificates caller to six-field parity. No
handler changes — the contract is authoritative; the callers must
conform.
Implementation:
GUI — OnboardingWizard CertificateStep expansion:
web/src/pages/OnboardingWizard.tsx adds name/owner_id/team_id/
renewal_policy_id state. React Query hooks for getOwners/
getTeams/getPolicies use per_page: '500' to populate dropdowns
without pagination-driven truncation. Payload ships all six
required fields plus sans/certificate_profile_id/environment.
nextDisabled gate enforces all six before the Continue button
activates.
CLI — ImportCertificates rewrite:
internal/cli/client.go rewrites ImportCertificates with
flag.NewFlagSet("import", flag.ContinueOnError). Required flags:
--owner-id, --team-id, --renewal-policy-id, --issuer-id. Optional:
--name-template (default {cn}, templated via strings.ReplaceAll
against cert.Subject.CommonName), --environment (default
imported). Missing required flags fail pre-HTTP with a clear
error. Request map ships all six required fields plus sans/
environment/status/optional serial_number.
cmd/cli/main.go — usage string updated to document the new
required/optional flags.
Tests — qa_test.go Part03 positive paths:
deploy/test/qa_test.go Part03 Create_Minimal and Create_Full
updated to include all six fields. Uses seed_demo.sql-supplied IDs
(o-alice, t-platform, rp-standard) — docker-compose.demo.yml is
the run context. C-001 explanatory comment added above
Create_Minimal so future readers understand why the minimal
payload is no longer minimal.
MCP parity:
Verified no-op. internal/mcp/types.go:28 CreateCertificateInput
already declares all six fields; internal/mcp/tools.go:102
forwards the typed struct unchanged.
Verification:
Go CLI regression tests (internal/cli/client_test.go):
* TestClient_ImportCertificates_MissingRequiredFlags — 5 subtests,
one per missing required flag, confirms flag.ContinueOnError
rejects with non-nil error before any HTTP call is attempted.
* TestClient_ImportCertificates_MissingPositionalArgs — confirms
the "usage: import <file>" error path when no PEM file is
supplied after the flags.
* TestClient_ImportCertificates_SixFieldPayload — uses httptest
to decode the POST body and assert all six required fields
plus sans/environment are present on the wire.
Frontend regression test (web/src/api/client.test.ts):
'createCertificate accepts and transmits all six required fields'
pins the wire shape for both GUI call sites (OnboardingWizard
CertificateStep + CertificatesPage CreateCertificateModal). If
either UI surface accidentally drops a field, this assertion
fails in CI rather than surfacing as a 400 at runtime.
Grep-based call-site sweep:
Enumerated every POST /api/v1/certificates create caller. Four
total: OnboardingWizard, CertificatesPage, MCP tools, CLI import.
All four now ship six-field payloads. Claim path
(internal/service/discovery.go) updates existing rows and does
not POST. EST/SCEP handlers invoke internal
certService.CreateVersion, not the public API. Negative-path
tests (qa_test.go:1085/1267/1274/1288/1298) remain valid: they
assert 400/non-500 on oversized/malformed/missing-CN/UTF-8/empty
bodies, and these properties still hold under the stricter
handler.
Static gates:
go build ./..., go vet ./..., go test ./internal/cli/..., and
cd web && npm run test deferred to operator pre-push — the Go
toolchain is not available in the session sandbox. Grep-based
verification confirms the syntactic shape of every changed file.
Residual:
None. Every POST /api/v1/certificates call site now conforms to the
six-field contract; the wire shape is pinned by both Go and
TypeScript regression tests.
Commit:
TBD-SHA (audit doc + CLAUDE.md carry TBD-SHA placeholders to be
amended after commit)
Operator decision answered as Option A: JobService.RetryFailedJobs is
now wired into the scheduler as an always-on 10th loop. Prior to this
commit the method was implemented, unit-tested, and exported but had
zero runtime callers — any job that transitioned to status=Failed stayed
Failed forever regardless of how many attempts it had remaining.
Scheduler — 10th loop:
internal/scheduler/scheduler.go grows a jobRetryLoop alongside the
existing nine loops (renewal, jobs, health, notifications, short-lived,
network scan, digest, health check, cloud discovery). The loop follows
the established run-immediately-then-tick pattern (same shape as
jobProcessorLoop), gated by a sync/atomic.Bool idempotency guard and
joined into the scheduler's sync.WaitGroup so WaitForCompletion drains
it on graceful shutdown. Each tick runs under a 2-minute context
timeout mirroring jobProcessorLoop's opCtx budget. The runJobRetry
helper invokes jobService.RetryFailedJobs(ctx, 3) — the advisory
maxRetries cap is belt-and-suspenders; per-job eligibility is still
enforced inside the service via Attempts < MaxAttempts.
The JobServicer scheduler-interface gains RetryFailedJobs so the
scheduler's dependency surface stays explicit and mockable.
Service — audit trail per retry:
internal/service/job.go:RetryFailedJobs now emits an audit event for
every Failed→Pending transition. Following the house convention used
by all scheduler-emitted events, actor='system' and actorType=
domain.ActorTypeSystem; action='job_retry'; details capture
old_status, new_status, attempts, max_attempts. JobService carries an
optional *AuditService (SetAuditService) that nil-guards to preserve
test-wiring ergonomics — existing tests that construct JobService
without an audit service continue to pass unchanged.
Config — env var with sane default:
internal/config/config.go:SchedulerConfig grows RetryInterval, wired
to CERTCTL_SCHEDULER_RETRY_INTERVAL with a 5-minute default. Validate
rejects intervals below 1 second (matches other scheduler interval
validators).
Server wiring:
cmd/server/main.go calls jobService.SetAuditService(auditService)
after JobService construction and sched.SetJobRetryInterval(
cfg.Scheduler.RetryInterval) alongside the other SetXxxInterval calls.
Regression coverage:
internal/service/job_test.go (3 new)
- TestJobService_RetryFailedJobs_EligibleJobTransitionsAndAudits
- TestJobService_RetryFailedJobs_SkipsJobsAtMaxAttempts
- TestJobService_RetryFailedJobs_NoAuditServiceOK
internal/scheduler/scheduler_test.go (3 new)
- TestScheduler_JobRetryLoop_CallsService
- TestScheduler_JobRetryLoop_IdempotencyGuard
- TestScheduler_JobRetryLoop_WaitForCompletion
The service tests assert status transitions, attempt-cap short-
circuiting, and audit event shape (actor='system', action='job_retry',
details keys). The scheduler tests assert the loop invokes the service,
the atomic.Bool guard skips overlapping ticks with the expected
'still running, skipping tick' log, and WaitForCompletion drains the
in-flight tick on Stop.
Residual follow-up (not in scope for this commit):
internal/service/renewal.go:RetryFailedJobs is a parallel dead-code
duplicate of the same logic on RenewalService — untested and has no
runtime caller. The audit finding called this out as 'implemented
twice'. Removing it is a separate cleanup and does not block the
Option-A wiring this commit delivers.
Files:
cmd/server/main.go — SetAuditService + SetJobRetryInterval
internal/config/config.go — RetryInterval field + env + validate
internal/scheduler/scheduler.go — 10th loop, interface, field, setter
internal/scheduler/scheduler_test.go — 3 new scheduler-loop tests
internal/service/job.go — RetryFailedJobs audit emission + SetAuditService
internal/service/job_test.go — 3 new service-layer tests
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
5c01c7f. The work lands as a single commit spanning server, docs, tests,
and the React client.
M-002 — Named API keys with per-key actor propagation
* Migration 000014 adds the 'api_keys' table (id, name, hash,
principal, role, created_at, last_used_at, disabled_at) so every
credential carries an identifiable principal instead of the
opaque 'anonymous'/'api-key' sentinel.
* Auth middleware now rotates through configured keys, performs
constant-time hash comparison, stamps 'last_used_at', and emits
an actor struct via contextWithActor(). The audit middleware,
bulk-revocation handler, approval handlers, and MCP tool layer
now read the principal off the context and persist it on every
audit_events row.
* Regression coverage:
- internal/api/middleware/audit_test.go — actor propagation,
principal redaction for disabled keys, anonymous fallback for
unauthenticated endpoints.
- internal/api/handler/bulk_revocation_handler_test.go,
job_handler_test.go — principal-on-audit assertions.
M-003 — Authorization gates (Phase B)
* Approval handler rejects self-approval / self-rejection with 403
when the actor principal equals the job's requested_by field.
* Bulk revocation is gated behind the 'admin' role; operators and
viewers receive 403.
* Regression coverage:
- internal/service/job_test.go — TestApproveJob_NotSelf,
TestRejectJob_NotSelf.
- internal/api/handler/bulk_revocation_handler_test.go —
TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds.
M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux
* Per RFC 8615, relying parties cannot reasonably be asked to
authenticate against the issuing certctl instance to retrieve
revocation material. CRL and OCSP move off the authenticated
'/api/v1/crl*' and '/api/v1/ocsp/*' paths onto:
GET /.well-known/pki/crl/{issuer_id}
Content-Type: application/pkix-crl (RFC 5280 §5)
GET /.well-known/pki/ocsp/{issuer_id}/{serial}
Content-Type: application/ocsp-response (RFC 6960)
* Non-standard JSON CRL shape is removed; only DER is served.
* Short-lived certificate exemption (profile TTL < 1h → skip
CRL/OCSP) is preserved; the response simply omits the serial.
* Routes are registered on the unauthenticated 'finalHandler' mux
in cmd/server/main.go alongside EST ('/.well-known/est/*') and
SCEP ('/scep'). Legacy authenticated paths return 404.
* Regression coverage:
- internal/api/handler/certificate_handler_test.go — content
type, DER parseability, 404 for unknown issuer.
- internal/api/handler/adversarial_path_test.go — unauthenticated
access asserted for CRL, OCSP, EST, SCEP.
- internal/api/router/router_test.go — route-table assertion
that '.well-known/pki/*', '.well-known/est/*', and '/scep' are
mounted on the unauthenticated branch.
M-001 — Auto-closed by M-002
EST and SCEP were already registered on the unauthenticated
'finalHandler' mux; the router comment at
internal/api/router/router.go:247 now matches reality. The
adversarial-path tests above lock the behavior in.
Verification (all gates green):
* go vet ./... — clean
* go build ./... — ok
* go test -short ./... (55+ packages) — all pass
* web/ : npm test (225 Vitest tests) — all pass
* web/ : npx tsc --noEmit — clean
* grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits,
all intentional M-006 tombstone/relocation comments.
Documentation:
* coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 →
Fixed, with per-finding resolution paragraphs citing regression
test IDs. (Audit file lives outside this repo; see cowork root.)
* CLAUDE.md Project Status line updated with the auth-unification
closure note.
* docs/features.md, docs/architecture.md, docs/quickstart.md,
docs/concepts.md, docs/connectors.md, docs/test-env.md,
docs/testing-guide.md, docs/compliance-*.md, docs/demo-advanced.md
— refreshed for the new '.well-known/pki/*' namespace and named
API keys.
* api/openapi.yaml — documents the new unauthenticated endpoints
and removes the legacy '/api/v1/crl*' + '/api/v1/ocsp/*' paths.
.gitignore: adds '/.gocache/' and '/.gomodcache/' for the session-
scoped Go caches so they never enter the tree.
D-008 was a three-part drift in the policy engine that made the
D-005/D-006 remediation cosmetic below the DB layer:
(a) migrations/seed.sql INSERTed rules with pre-D-005 lowercase
types ('ownership', 'environment', 'lifetime', 'renewal_window')
that the handler validator rejects on Create/Update but that
raw SQL INSERTs bypassed entirely. At runtime evaluateRule's
switch fell through to the default "unknown policy rule type"
error branch on every demo rule × every cert × every cycle,
flooding logs while emitting zero violations.
(b) migrations/seed_demo.sql persisted lowercase severity values
('critical', 'error', 'warning') on policy_violations rows.
INSERT succeeded because that column had no CHECK, but any
frontend comparing against the canonical PolicySeverity enum
mis-categorized every seeded violation.
(c) evaluateRule hardcoded Severity: PolicySeverityWarning on
every emitted violation and ignored rule.Config entirely —
so the D-006 per-rule severity column (000013) and every
per-arm Config JSON ({allowed_issuer_ids, allowed_domains,
required_keys, allowed, lead_time_days, max_days}) was dead
data below the evaluation layer.
This commit lands (a)+(b)+(c) atomically. Shipping any subset
leaves the feature half-working.
## Changes
Domain (internal/domain/policy.go):
* Add PolicyTypeCertificateLifetime as the 6th TitleCase canonical.
Pre-D-008 the seeded "max-certificate-lifetime" rule had no engine
arm — routing it through RenewalLeadTime would conflate "how
close to expiry before we renew" with "how long can the cert
possibly be", two distinct semantics. The new type accepts
config {"max_days": int} and flags certs whose
NotAfter - NotBefore exceeds the cap.
Handler validator (internal/api/handler/validation.go):
* ValidatePolicyType allowlist grown to 6 canonicals
(AllowedIssuers, AllowedDomains, RequiredMetadata,
AllowedEnvironments, RenewalLeadTime, CertificateLifetime).
OpenAPI (api/openapi.yaml):
* PolicyType enum grown to match domain.
Frontend (web/src/api/types.ts, types.test.ts):
* POLICY_TYPES tuple gains CertificateLifetime; pin test asserts
all 6 canonicals and rejects casing drift.
Migration 000014 (policy_violations severity CHECK):
* Named CHECK constraint (policy_violations_severity_check)
mirroring 000013's allowlist, defense-in-depth at the DB layer
against future drift from bypassed writes (migrations, psql
sessions, future callers). Symmetric down migration drops by
name.
Seed data:
* migrations/seed.sql rewritten to emit TitleCase canonicals with
per-arm config JSON that actually exercises the config-consuming
paths (not the missing-field backstops):
- pr-require-owner → RequiredMetadata {"required_keys":["owner"]} Warning
- pr-allowed-environments → AllowedEnvironments {"allowed":["production","staging","development"]} Error
- pr-max-certificate-lifetime → CertificateLifetime {"max_days":90} Critical
- pr-min-renewal-window → RenewalLeadTime {"lead_time_days":14} Warning
Severities are now differentiated per rule (D-006 intent).
* migrations/seed_demo.sql violation rows flipped to TitleCase
severity ('Critical', 'Error', 'Warning') so migration 000014
applies cleanly on upgrade paths.
Engine rewrite (internal/service/policy.go):
* evaluateRule rewritten. All six arms now:
1. Parse rule.Config into the per-arm typed struct.
2. Bad JSON → log at ValidateCertificate boundary and skip
this rule (no co-located poisoning of other rules in the
same batch).
3. Empty/null Config → emit the pre-D-008 missing-field
violation (backwards compat invariant — operators who
haven't reconfigured still see the same output).
4. Violations emitted carry rule.Severity (no more hardcoded
Warning); D-006 column is now load-bearing.
* CertificateLifetime arm reads NotBefore/NotAfter from the
certificate's latest version via CertRepo. Injected via
PolicyService.SetCertRepo() setter — avoids churning ~36
NewPolicyService call sites while keeping the lifetime arm
optional (degrades to a log+skip if the setter is not wired).
Server wiring (cmd/server/main.go):
* policyService.SetCertRepo(certRepo) wired after construction.
Tests (internal/service/policy_test.go):
* 25 new subtests across 5 groups:
- TestEvaluateRule_SeverityPassThrough (6): every rule type
emits violations carrying rule.Severity, not hardcoded.
- TestEvaluateRule_ConfigConsumed (12): every per-arm Config
path exercised positive + negative.
- TestEvaluateRule_EmptyConfig_BackCompat (3): empty/null
Config still emits pre-D-008 missing-field violations.
- TestEvaluateRule_BadConfig_SkipsRule: malformed JSON logs
and skips cleanly without poisoning neighbors.
- TestEvaluateRule_CertificateLifetime_RepoScenarios (3):
ok when repo wired, log+skip when not, handles missing
NotBefore/NotAfter edges.
Provenance: D-008 surfaced during D-005/D-006 remediation review
in 7a0ea35. That commit added persistence and CI pins for the
severity field but did not re-verify the evaluation layer
consumed it; this finding and fix close the audit-process gap.
Audit events spawned from the HTTP middleware ran in detached goroutines
using context.Background(). On SIGTERM the DB pool was closed before
those goroutines finished writing, silently dropping audit events
(CWE-662 Improper Synchronization / CWE-400 Uncontrolled Resource
Consumption).
NewAuditLog now returns an *AuditMiddleware struct that tracks every
spawned goroutine with sync.WaitGroup. Callers wire the middleware via
its Middleware method value (preserves the existing
func(http.Handler) http.Handler shape) and drain the WaitGroup with
Flush(ctx), which blocks until in-flight recordings complete or the
provided context is cancelled — mirroring scheduler.WaitForCompletion.
Flush is invoked in cmd/server/main.go between http.Server.Shutdown
(no new requests accepted) and db.Close (pool torn down), with a
timeout returning ErrAuditFlushTimeout wrapping ctx.Err().
Request-derived inputs (method, path, status) are snapshotted before
the goroutine spawn so the worker does not race with http.Server
reusing r after the handler returns.
Tests:
TestAuditLog_FlushDrainsInFlightGoroutines
TestAuditLog_FlushTimeoutReturnsErrAuditFlushTimeout
Verification:
go build ./... : 0
go vet ./... : 0
go test -race -short ./... : 0 (all packages)
go test -cover ./internal/api/middleware : 81.4%
golangci-lint run : 0 issues
govulncheck ./... : 0 vulns in called code
Sentinel agents (server-scanner, cloud-aws-sm, cloud-azure-kv,
cloud-gcp-sm) were created on startup with a plain INSERT whose
duplicate-key error was swallowed unconditionally. That silenced every
other DB failure too (connectivity drop, permissions change, unrelated
constraint violation) — a restart after the first boot quietly
de-fanged cloud discovery and the network scanner (CWE-662, CWE-209-
adjacent).
Shape A: add AgentRepository.CreateIfNotExists using ON CONFLICT (id)
DO NOTHING RETURNING id + sql.ErrNoRows discrimination. This keeps the
strict Create semantics (duplicate-key is an error) intact for real
agent registration and gives sentinels their own idempotent path.
- repo: CreateIfNotExists returns (created bool, err error); false,nil
on pre-existing row; false,wrapped err on anything else.
- interface: CreateIfNotExists added to AgentRepository.
- main.go: 4 sentinel sites log Error/Info/Debug distinctly.
- mocks: service + integration mocks implement the new method.
- tests: 4 new testcontainers integration tests cover first-insert,
idempotent second-call, concurrent 16-goroutine race (exactly one
creator, no duplicate-key panic), and pre-cancelled context
surfacing.
Coverage gates (go test -cover): service 67.6%/55, handler 78.6%/60,
domain 92.7%/40, middleware 80.0%/30, crypto 86.7%/85. Race/vet/
golangci-lint v2.11.4 (0 issues)/govulncheck v1.2.0 clean across all
touched packages.
Problem (CWE-306 Missing Authentication for Critical Function):
internal/service/scep.go PKCSReq skipped the shared-secret check when
s.challengePassword was empty. An unconfigured-but-enabled SCEP server
accepted any unauthenticated client reaching /scep and issued a
certificate against the configured issuer for any CSR with a valid
signature. No audit trail distinguished authenticated from
unauthenticated enrollments. This matches the two-layer fail-closed
pattern already used for C-2 (fb4ce1a): reject at startup AND reject
at the service boundary.
Fix (two layers, defense-in-depth):
Layer 1 — startup pre-flight in cmd/server/main.go:
preflightSCEPChallengePassword returns a non-nil error when SCEP is
enabled and CERTCTL_SCEP_CHALLENGE_PASSWORD is empty. main logs and
os.Exit(1)s before the SCEP service is constructed. Disabled SCEP is
unaffected. The helper is unit-testable in isolation.
Layer 2 — service-layer rejection in internal/service/scep.go:
PKCSReq refuses enrollment when s.challengePassword == "" even though
main already blocks this state — protects future call sites (tests,
library reuse, a REST-over-HTTPS wrapper). When a secret is
configured, the comparison now uses crypto/subtle.ConstantTimeCompare
so response time does not leak the configured secret through a
short-circuiting byte compare.
Files:
- cmd/server/main.go: preflightSCEPChallengePassword helper; call site
inside the `if cfg.SCEP.Enabled` block before issuer lookup; fatal
slog error references CWE-306 and names the env var so operators can
diagnose the startup failure without reading code.
- cmd/server/main_test.go: TestPreflightSCEPChallengePassword with five
table-driven subtests (disabled empty, disabled set, enabled empty
rejected, enabled set, single-char boundary). The enabled-empty case
asserts the error string contains both CERTCTL_SCEP_CHALLENGE_PASSWORD
and CWE-306 so the log message remains actionable.
- internal/config/config.go: SCEPConfig.ChallengePassword godoc now
states the field is REQUIRED when SCEP.Enabled and cross-references
preflightSCEPChallengePassword.
- internal/service/scep.go: imports crypto/subtle; PKCSReq rewritten
with the two-layer check; comment block cites H-2 / CWE-306 and the
constant-time rationale.
- internal/service/scep_test.go: existing tests that relied on the
vulnerable empty-password path now configure a secret on both sides.
TestSCEPService_PKCSReq_ChallengePassword_NotRequired is replaced by
TestSCEPService_PKCSReq_ChallengePassword_EmptyServerConfigRejected
which iterates ["", "any-value", "guess"] against an unconfigured
server and asserts "not configured" in the error. A new
TestSCEPService_PKCSReq_ChallengePassword_ConstantTimeLengthIndependence
exercises same-prefix-longer and wrong-case inputs to guard against a
regression from ConstantTimeCompare to a short-circuiting byte compare.
- internal/service/m11c_crypto_enforcement_test.go: four tests
(RejectsWeakKey, AcceptsStrongKey, MaxTTL_ForwardedToIssuer,
NoProfileRepo_PassesThrough) constructed NewSCEPService with an empty
challenge password and exercised PKCSReq through the now-rejected
vulnerable path. All four now configure "secret123" on both sides with
an inline H-2 comment; the crypto/MaxTTL/profile behavior they assert
is unchanged.
Wire-format / behavioral invariants preserved:
- RFC 8894 SCEP handler is untouched (internal/api/handler/scep.go and
internal/pkcs7/*): GetCACaps/GetCACert responses, PKIOperation request
parsing, and the PKCS#7 certs-only response format are byte-identical.
- RFC 7030 EST handler is untouched
(internal/api/handler/est.go + internal/pkcs7/*).
- Revocation idempotency composite key (H-1, migration 000012) untouched.
- AES-256-GCM config encryption (C-2) untouched.
- CRL DER bytes and OCSP response bytes unchanged.
Verification:
- go build ./... silent success
- go vet ./... silent success
- go test -race -count=1 ./internal/service/ ./cmd/server/
./internal/api/handler/ ./internal/integration/ all OK
- Coverage with comfortable headroom over CI gates:
service 67.8% (gate 55%)
handler 79.0% (gate 60%)
domain 92.7% (gate 40%)
middleware 80.0% (gate 30%)
cmd/server 1.6% (preflightSCEPChallengePassword: 100%)
internal/service/scep.go PKCSReq statement coverage: 100%.
- rg sweeps: no `s.challengePassword != ""` remains;
no `challengePassword != s.challengePassword` remains.
Operational note: operators with SCEP enabled but no challenge password
set will see a fatal startup error and a log line citing
CERTCTL_SCEP_CHALLENGE_PASSWORD and CWE-306 after upgrading. This is the
intended fail-closed behavior. Fix by either setting the env var to a
non-empty shared secret or setting CERTCTL_SCEP_ENABLED=false.
Audit report: certctl-audit-report.md (revision 5) logs this under
H-2 Resolution Log.
EncryptIfKeySet/DecryptIfKeySet in internal/crypto/encryption.go previously
returned plaintext + wasEncrypted=false when the operator had not configured
CERTCTL_CONFIG_ENCRYPTION_KEY. That produced a data-at-rest confidentiality
bypass (CWE-311): sensitive fields on dynamically-configured issuer and
target rows (source='database') were persisted to PostgreSQL without any
encryption, and no caller could distinguish the encrypted from the plaintext
branch at runtime. The only visible signal was a single warning log line
emitted once at startup.
Fail closed instead:
- EncryptIfKeySet / DecryptIfKeySet now return crypto.ErrEncryptionKeyRequired
(a new exported sentinel, errors.Is-unwrappable) when the key is empty or
nil, rather than silently emitting plaintext. The (result, wasEncrypted,
err) tuple signature is preserved for source compatibility; only the
semantics of the no-key branch changed.
- cmd/server/main.go grows a startup pre-flight check: if no encryption key
is configured the server lists issuers and targets, counts rows with
source='database', and refuses to start (os.Exit(1)) if any exist. Operators
must either configure CERTCTL_CONFIG_ENCRYPTION_KEY or remove the exposed
rows before the control plane can boot. The warning-only path is retained
for the clean-slate case (no database rows).
- internal/service/issuer.go's SeedFromEnvVars now guards the encryption call
with len(s.encryptionKey) > 0 so env-seeded rows (source='env', which are
reconstructable on every boot from process env) continue to persist as
plaintext in the 'config' column when no key is configured. Registry load
already falls through to cfg.Config when EncryptedConfig is nil. GUI/API
write paths (source='database') remain fail-closed via propagation of
ErrEncryptionKeyRequired.
- Integration tests that exercise CreateIssuer via the handler layer now
supply a real 32-byte AES-256 test key so the encrypt path runs instead of
returning ErrEncryptionKeyRequired. Same pattern in internal/service/
testutil_test.go for consolidated service-layer tests.
- internal/crypto/encryption_test.go grows regression guards:
TestEncryptIfKeySet_EmptyKeyFailsClosed (nil_key + empty_key subtests),
TestDecryptIfKeySet_EmptyKeyFailsClosed (nil_key + empty_key subtests),
TestEncryptDecryptIfKeySet_RoundTripProducesDifferentCiphertext,
TestDecryptIfKeySet_RejectsTamperedCiphertext, and
TestEncryptIfKeySet_PreservesErrEncryptionKeyRequiredSentinel (verifies
the sentinel unwraps through fmt.Errorf(%w)-style wrapping).
Wire format is unchanged: AES-256-GCM Encrypt/Decrypt/DeriveKey, the
12-byte nonce prefix, the GCM auth tag, the PBKDF2 salt
('certctl-config-encryption-v1'), and the 100,000 iteration count are all
byte-identical. Ciphertexts produced before this change remain decryptable.
Verified:
- go build ./... : clean
- go vet ./... : clean
- go test -race ./internal/crypto/... ./internal/service/... \
./internal/integration/... ./cmd/server/... : pass
- golangci-lint run ./... : 0 issues
- govulncheck ./... : 0 reachable vulnerabilities
- rg 'return plaintext, false, nil' internal/ : no matches
- Coverage: crypto 85.0% (unchanged), service 67.8% (was 67.9%, noise),
cmd/server 0.0% (unchanged baseline). All above CI thresholds.
See certctl-audit-report.md for the full finding record and resolution log.
Enforce certificate profile crypto constraints across all 5 issuance paths
(renewal, agent CSR, EST, SCEP). ValidateCSRAgainstProfile() rejects CSRs
with key algorithm/size that don't match profile rules. MaxTTL enforcement
caps certificate validity per issuer connector (Local CA, Vault, step-ca
enforce directly; ACME/DigiCert/Sectigo pass through). Key algorithm and
size are now persisted in certificate_versions for audit compliance.
16 new tests (12 service-layer + 4 Local CA connector). Removes hardcoded
version number from GUI sidebar. Documentation updated across architecture,
features, connectors, and README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements Simple Certificate Enrollment Protocol with single-endpoint
operation-based dispatch (GetCACaps, GetCACert, PKIOperation), PKCS#7
SignedData CSR extraction with fallback for raw/base64 CSR, challenge
password authentication via CSR attributes, and shared internal/pkcs7
package extracted from EST handler to eliminate code duplication.
24 new tests (11 service + 13 handler) plus 5 shared pkcs7 package tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SA1029: use typed context key instead of string in main_test.go
S1039: remove unnecessary fmt.Sprintf in validation_test.go
SA4023: fix unreachable nil check on concrete error type
SA4006: fix unused variable assignments in stepca_test.go (4 occurrences)
SA4000: fix duplicate expression in ssh_test.go (BEGIN vs END CERTIFICATE)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Close coverage gaps identified by dual-audit (qualitative + quantitative).
New test files for config (0%→98%), router (0%→100%), handler validation,
health, audit, response helpers, webhook notifier (0%→88%), email notifier,
middleware (recovery, rate limiter), domain profile, service nil-safety,
config helpers, issuer bootstrap, and server bootstrap wiring. Expanded
existing tests for ACME (34%→42%), step-ca (42%→52%), F5, SSH, agent
(43%→63%), scheduler (88%→99%), renewal service, and issuerfactory.
All tests pass: go test -short, go vet, go test -race clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a new target connector enabling certificate deployment to any
Linux/Unix server without installing the certctl agent binary. Uses the
proxy agent pattern — a single agent in the same network zone deploys
certs to remote servers over SSH/SFTP.
Key additions:
- SSH/SFTP connector with key auth (file/inline) + password auth
- Injectable SSHClient interface for cross-platform testing (25 tests)
- Shell injection prevention via validation.ValidateShellCommand()
- Configurable cert/key/chain paths with octal permissions
- GUI: 11 SSH config fields in target create wizard
Also fixes pre-existing frontend bug where all target type strings
(nginx, apache, etc.) were sent as lowercase but the backend expects
proper-case (NGINX, Apache, etc.), breaking GUI-created targets.
Adds missing TargetTypeSSH to validTargetTypes service map.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mirror M34's dynamic issuer config pattern for deployment targets: AES-256-GCM
encrypted config storage, sensitive field redaction in API responses, agent
heartbeat-based test connection endpoint, and full frontend updates including
test status indicators, source badges, and removal of stale hostname/status
fields from the Target interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace static env-var-based issuer wiring with GUI-driven dynamic
configuration stored encrypted in PostgreSQL. Operators can now
configure, test, enable/disable, and manage issuers from the dashboard
without restarting the server.
Key changes:
- AES-256-GCM encryption for sensitive issuer config at rest (PBKDF2
key derivation with 100k iterations)
- Dynamic IssuerRegistry with sync.RWMutex replacing static map
- Connector factory pattern (issuerfactory.NewFromConfig) replacing
140 lines of static wiring in main.go
- Migration 000009: encrypted_config, last_tested_at, test_status,
source columns on issuers table
- Env var seeding on first boot with ON CONFLICT DO NOTHING
- Registry Rebuild() for atomic map swap after CRUD operations
- Issuer type validation against domain constants on Create
- Audit trail for test connection results
- Conditional seeding for step-ca/OpenSSL (only when env vars set)
- GUI: source badge, connection test status on issuer detail page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Google Cloud Certificate Authority Service integration via REST API
with OAuth2 service account auth (JWT→access token). Synchronous
issuance model, CA pool selection, mutex-guarded token caching,
revocation with RFC 5280 reason mapping. No Google SDK dependency —
all stdlib. 19 tests with httptest mock OAuth2 + CAS API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dual-mode TLS connector for mail servers — single package with mode
field selecting Postfix or Dovecot defaults. File-based cert/key
deployment with correct permissions (cert 0644, key 0600), optional
chain append, shell injection prevention, and configurable
reload/validate commands. 18 tests covering config validation,
deployment, and security. GUI wizard fields and OpenAPI enum updated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
File-based deployment for Envoy service mesh — writes cert/key/chain
to watched directory with optional SDS JSON config for xDS bootstrap.
Path traversal prevention, configurable filenames, 15 tests passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete the IIS target connector with dual-mode deployment:
- WinRM proxy agent mode via masterzen/winrm for remote Windows servers
- Base64 PFX transfer with try/finally cleanup on remote host
- GUI wizard updated with 13 IIS config fields including WinRM settings
- TargetDetailPage sensitive field redaction (password/secret/token/key)
- OpenAPI TargetType enum updated (added Traefik, Caddy)
- connectors.md fully documented with WinRM proxy config example
- 38 total IIS tests (10 new WinRM tests), all passing with race detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes 12 production bugs preventing the full issuance→deployment flow
from working with ACME (Pebble/Let's Encrypt) and step-ca issuers:
ACME connector (acme.go):
- Save orderURI before WaitOrder overwrites it (Go crypto/acme bug)
- Add CreateOrderCert fallback via WaitOrder+FetchCert
- Remove defer-reset in ValidateConfig that caused nil pointer panic
- Add Insecure TLS option for self-signed ACME servers (Pebble)
step-ca connector (stepca.go, jwe.go):
- Real JWE provisioner key loading + decryption (was using ephemeral keys)
- Fix JWT audience (/1.0/sign), sha claim (key fingerprint), kid header
- Custom root CA trust via RootCertPath config
- Remove hardcoded 90-day validity default (let step-ca decide)
NGINX target connector (nginx.go):
- Use sh -c for validate/reload commands (shell interpretation)
- Use filepath.Dir instead of fragile string slicing
- Add private key file writing (agent-mode keys were never deployed)
- Make chain_path write conditional
Server/service layer:
- TriggerRenewalWithActor now creates actual Job records (was no-op)
- createDeploymentJobs falls back to DB query when cert.TargetIDs empty
- ProcessPendingJobs skips agent-routed deployment jobs
- Agent cert pickup path parsing: len(parts)<4 → len(parts)<3
- Health/ready/auth-info endpoints bypass auth middleware
- Write timeout 15s→120s for ACME issuance
- Cert fingerprint computed on CSR submission
Integration test environment (deploy/test/):
- 10-phase test script covering Local CA, ACME, step-ca, revocation,
discovery, renewal, and API spot checks
- Docker Compose with 7 containers (server, agent, postgres, nginx,
pebble, challtestsrv, step-ca) on isolated network
- TLS verification checks SAN (not just Subject CN) for modern CA compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deployment jobs now set agent_id from target→agent relationship at
creation time. GetPendingWork() uses ListPendingByAgentID() with a
3-way UNION query (direct match, legacy NULL fallback via target JOIN,
AwaitingCSR via cert→target→agent chain) so each agent only receives
its own jobs.
- Added AgentID *string to Job domain struct
- Added agent_id to all job SQL queries (5 SELECTs, INSERT, UPDATE, scanJob)
- New ListPendingByAgentID() repository method
- Rewrote GetPendingWork() from ~25 lines to single scoped query
- 4 new Go tests (3 agent routing + 1 deployment agent_id)
- Frontend: agent_id/target_id on Job type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add certificate export in PEM (JSON or file download) and PKCS#12 formats.
Private keys are never included — they stay on agents. Add EKU-aware
issuance threading profile EKUs (serverAuth, clientAuth, codeSigning,
emailProtection, timeStamping) through the full issuance pipeline. Fix
agent CSR SAN splitting for email addresses, adaptive KeyUsage flags for
S/MIME vs TLS, and a pre-existing generateID collision bug in deployment
job creation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Six pages were read-only viewers despite the API client having all
create functions wired up. Users deploying certctl had no way to create
CAs or other objects from the GUI — reported in GitHub issue.
- IssuersPage: 2-step create modal (type selection → config) for
Local CA, ACME, step-ca, OpenSSL/Custom issuer types
- PoliciesPage: create modal with type, severity, JSON config, enabled
- ProfilesPage: create modal with name, description, max TTL, short-lived
- OwnersPage: create modal with name, email, team dropdown
- TeamsPage: create modal with name, description
- AgentGroupsPage: create modal with match criteria fields
- Layout.tsx: version v2.0.5 → v2.0.7
- cmd/server/main.go: version 0.1.0 → 2.0.7
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements 3 deferred security tickets (TICKET-003, TICKET-007, TICKET-010)
and performs comprehensive documentation audit to eliminate drift between
code and docs.
Code changes:
- TICKET-003: Repository integration tests with testcontainers-go (50+ subtests)
- TICKET-007: CertificateService decomposition into RevocationSvc + CAOperationsSvc
- TICKET-010: Request body size limits via http.MaxBytesReader middleware
- Fix missing slog import in certificate.go after service decomposition
Documentation updates:
- README: Fix endpoint count (97→93), expand env var reference (15→39 vars)
- CLAUDE.md: Fix OpenAPI operation count (85→93), update file locations
- architecture.md: Add body size limits section, middleware chain ordering
- CONTRIBUTING.md: New contributor guide with architecture conventions,
test patterns, middleware ordering, CI thresholds
- SECURITY_REMEDIATION.md: Removed from repo (moved to cowork, gitignored)
- Test files: Add doc comments to all new test files
Documentation that should exist but doesn't yet:
- Architecture diagrams (C4 model or similar)
- Threat model document
- Testing philosophy guide
- Disaster recovery runbook
- Upgrade guide (migration between versions)
- API versioning strategy document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 18-parameter RegisterHandlers function signature with a cleaner
HandlerRegistry struct that groups all API handler dependencies. This eliminates
the signature explosion that made the function difficult to read and maintain.
Changes:
- Added HandlerRegistry struct with 18 fields grouping all handler types
- Updated RegisterHandlers to accept a single HandlerRegistry parameter
- Updated all internal handler references to use reg.FieldName syntax
- Updated call sites in cmd/server/main.go and integration tests
- No functional changes, purely structural refactoring
Resolves TICKET-006: RegisterHandlers Signature Explosion
The generateTestCert() function previously returned &x509.Certificate{Raw: []byte("test")},
which is not a valid DER-encoded certificate. Replace with a proper self-signed certificate
generator using ECDSA P-256 that creates valid X.509 certificates for testing.
Added imports: crypto/ecdsa, crypto/elliptic, crypto/rand, crypto/x509/pkix, math/big
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added TestSlack_ClientHasTimeout to verify 10-second timeout
- Added TestTeams_ClientHasTimeout to verify 10-second timeout
- Added TestPagerDuty_ClientHasTimeout to verify 10-second timeout
- Added TestOpsGenie_ClientHasTimeout to verify 10-second timeout
- All notifiers already configured with 10 second timeout in New()
- Tests verify timeout is set and matches expected value
- Fix undefined tls.Listener in verify_test.go (type doesn't exist in
crypto/tls); use server.Listener.Addr() and server.TLS.Certificates
- Fix mockJobRepository missing Delete/ListByStatus/ListByCertificate/
UpdateStatus/GetPendingJobs methods required by JobRepository interface
- Fix mockAuditService type mismatch: NewVerificationService expects
*AuditService (concrete), not a mock; use real AuditService with mock
repo following existing testutil_test.go patterns
- Fix List() signature mismatch (had extra filter param)
- Add nil-safe logger checks in verify.go to prevent panics in tests
- Remove unused imports (crypto/tls, bytes, repository)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M25: After deploying a certificate, the agent probes the live TLS
endpoint and compares SHA-256 fingerprints to verify the correct cert
is being served. Best-effort — failures don't block deployments.
New endpoints: POST /jobs/{id}/verify, GET /jobs/{id}/verification.
Migration 000008 adds verification columns to jobs table.
M26: Traefik target connector (file provider, auto-reload) and Caddy
target connector (dual-mode: admin API hot-reload or file-based).
Both wired into agent dispatch.
Also: restructured README to highlight supported integrations (issuers,
targets, notifiers) earlier, moved API/CLI/MCP sections lower. Updated
all docs (features, connectors, architecture, testing guide, why-certctl)
and fixed integration tests for 18-param RegisterHandlers signature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EAB credentials (KID + HMAC) were defined in the ACME connector config
but never wired into the acme.Account registration call. This fixes the
dead code and adds automatic EAB credential fetching for ZeroSSL — when
the directory URL is detected as ZeroSSL and no EAB credentials are
provided, certctl calls ZeroSSL's public API to get them automatically.
Changes:
- Wire EABKid/EABHmac into acme.Account.ExternalAccountBinding
- Add isZeroSSL() detection and fetchZeroSSLEAB() auto-fetch
- Add CERTCTL_ACME_EAB_KID/CERTCTL_ACME_EAB_HMAC env vars to main.go
- Add 13 ACME connector tests (config validation, EAB decode, ZeroSSL
auto-EAB with mock servers, URL detection)
- Update docs: README, architecture, connectors, demo-advanced,
testing-guide with EAB/auto-EAB documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standing TXT record at _validation-persist.<domain> eliminates per-renewal
DNS updates. Auto-fallback to dns-01 if CA doesn't offer dns-persist-01.
ScriptDNSSolver extended with PresentPersist method. Configurable via
CERTCTL_ACME_CHALLENGE_TYPE=dns-persist-01 and
CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN env vars.
Also fixes IsExpired edge-case test in discovery_test.go that always failed
due to time.Now() drift between test setup and method invocation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement Enrollment over Secure Transport protocol with 4 endpoints under
/.well-known/est/ — cacerts (CA chain distribution), simpleenroll (initial
enrollment), simplereenroll (certificate renewal), and csrattrs (CSR
attributes). PKCS#7 certs-only wire format with hand-rolled ASN.1, accepts
both PEM and base64-encoded DER CSRs, configurable issuer and profile
binding, full audit trail. 28 new tests (18 handler + 10 service).
Also includes:
- GetCACertPEM added to issuer connector interface (all 4 issuers updated)
- EST integration tests wired into e2e test suite (13 test cases)
- QA testing guide Part 26 (15 manual EST test cases)
- All docs updated: README, features, architecture, concepts, connectors,
quickstart, demo-advanced (endpoint counts, MCP wording, agent IDs,
issuer interface, resource lists, OpenSSL status)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>