M-004 — OCSP issuer binding (composite key):
The OCSP lookup path now binds (issuer_id, serial) as a composite key
rather than resolving by serial alone. CertificateRepository and
RevocationRepository gain GetByIssuerAndSerial methods; ca_operations.go
scopes both lookups by the issuer_id path param. When no managed cert
binds to that (issuer, serial) tuple, GetOCSPResponse constructs an
RFC 6960 §2.2 'unknown' response (CertStatus=2) instead of the prior
default 'good'. Short-lived cert exemption (profile TTL < 1h) is
preserved. Real repo errors (non-sql.ErrNoRows) fail closed with a log.
Regression coverage: internal/service/ca_operations_test.go
- TestCAOperationsSvc_GetOCSPResponse_Unknown_CrossIssuer
- TestCAOperationsSvc_GetOCSPResponse_Unknown_UnknownSerial
M-005 — Discovery Claim/Dismiss actor propagation:
DiscoveryService.ClaimDiscovered and DismissDiscovered now accept an
explicit 'actor string' parameter (propagation pattern mirrors
bulk_revocation.go / revocation_svc.go). The handler layer passes
resolveActor(r.Context()) — the named-key identity established by the
M-002 auth unification — and the service falls back to 'api' (the same
safe sentinel resolveActor uses when no auth context is present) only
when the caller passes an empty string. Never falls back to 'operator'.
Regression coverage: internal/service/discovery_test.go
- TestDiscoveryService_ClaimDiscovered_AuditActor
- TestDiscoveryService_DismissDiscovered_AuditActor
- TestDiscoveryService_ClaimDiscovered_EmptyActorFallsBackToAPI
- TestDiscoveryService_DismissDiscovered_EmptyActorFallsBackToAPI
Each new test asserts event.Actor matches the caller-supplied string (or
'api' on empty input) and explicitly asserts event.Actor != 'operator'
to lock in the historical fix intent.
Files:
internal/api/handler/discovery.go — pass resolveActor(ctx)
internal/api/handler/discovery_handler_test.go — updated call sites
internal/integration/lifecycle_test.go — updated mock wiring
internal/repository/interfaces.go — GetByIssuerAndSerial on
CertificateRepository +
RevocationRepository
internal/repository/postgres/certificate.go — composite key lookup
internal/service/ca_operations.go — (issuer_id, serial) scoping
internal/service/ca_operations_test.go — 2 new M-004 tests
internal/service/discovery.go — actor parameter + 'api' fallback
internal/service/discovery_test.go — 4 new M-005 tests
internal/service/shortlived_test.go — mock signature update
internal/service/testutil_test.go — mock GetByIssuerAndSerial
CI failure on master (commit 3287e17) — staticcheck ST1020:
internal/api/middleware/middleware.go:125:1: ST1020: comment on exported
function NewAuthWithNamedKeys should be of the form
"NewAuthWithNamedKeys ..." (staticcheck)
When NewAuth was renamed to NewAuthWithNamedKeys during the M-002 auth
unification, the leading godoc sentence was left pointing at the old name.
Rewrite the comment so its first sentence starts with the new function
name, and expand the body to describe the named-key + admin-flag contract
introduced in 3287e17.
Also gitignore /.gopath/ — session-scoped tool install cache, same
category as /.gocache/ and /.gomodcache/.
Verification:
go vet ./internal/api/middleware/... — clean
go build ./internal/api/middleware/... — clean
go test ./internal/api/middleware/... — PASS (0.245s)
staticcheck -checks=all,<project exclusions> — clean across
middleware, handler, service, domain, cmd/server, scheduler
Closes: CI failure on 3287e17.
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
a53a4b8. The work lands as a single commit spanning server, docs, tests,
and the React client.
M-002 — Named API keys with per-key actor propagation
* Migration 000014 adds the 'api_keys' table (id, name, hash,
principal, role, created_at, last_used_at, disabled_at) so every
credential carries an identifiable principal instead of the
opaque 'anonymous'/'api-key' sentinel.
* Auth middleware now rotates through configured keys, performs
constant-time hash comparison, stamps 'last_used_at', and emits
an actor struct via contextWithActor(). The audit middleware,
bulk-revocation handler, approval handlers, and MCP tool layer
now read the principal off the context and persist it on every
audit_events row.
* Regression coverage:
- internal/api/middleware/audit_test.go — actor propagation,
principal redaction for disabled keys, anonymous fallback for
unauthenticated endpoints.
- internal/api/handler/bulk_revocation_handler_test.go,
job_handler_test.go — principal-on-audit assertions.
M-003 — Authorization gates (Phase B)
* Approval handler rejects self-approval / self-rejection with 403
when the actor principal equals the job's requested_by field.
* Bulk revocation is gated behind the 'admin' role; operators and
viewers receive 403.
* Regression coverage:
- internal/service/job_test.go — TestApproveJob_NotSelf,
TestRejectJob_NotSelf.
- internal/api/handler/bulk_revocation_handler_test.go —
TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds.
M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux
* Per RFC 8615, relying parties cannot reasonably be asked to
authenticate against the issuing certctl instance to retrieve
revocation material. CRL and OCSP move off the authenticated
'/api/v1/crl*' and '/api/v1/ocsp/*' paths onto:
GET /.well-known/pki/crl/{issuer_id}
Content-Type: application/pkix-crl (RFC 5280 §5)
GET /.well-known/pki/ocsp/{issuer_id}/{serial}
Content-Type: application/ocsp-response (RFC 6960)
* Non-standard JSON CRL shape is removed; only DER is served.
* Short-lived certificate exemption (profile TTL < 1h → skip
CRL/OCSP) is preserved; the response simply omits the serial.
* Routes are registered on the unauthenticated 'finalHandler' mux
in cmd/server/main.go alongside EST ('/.well-known/est/*') and
SCEP ('/scep'). Legacy authenticated paths return 404.
* Regression coverage:
- internal/api/handler/certificate_handler_test.go — content
type, DER parseability, 404 for unknown issuer.
- internal/api/handler/adversarial_path_test.go — unauthenticated
access asserted for CRL, OCSP, EST, SCEP.
- internal/api/router/router_test.go — route-table assertion
that '.well-known/pki/*', '.well-known/est/*', and '/scep' are
mounted on the unauthenticated branch.
M-001 — Auto-closed by M-002
EST and SCEP were already registered on the unauthenticated
'finalHandler' mux; the router comment at
internal/api/router/router.go:247 now matches reality. The
adversarial-path tests above lock the behavior in.
Verification (all gates green):
* go vet ./... — clean
* go build ./... — ok
* go test -short ./... (55+ packages) — all pass
* web/ : npm test (225 Vitest tests) — all pass
* web/ : npx tsc --noEmit — clean
* grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits,
all intentional M-006 tombstone/relocation comments.
Documentation:
* coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 →
Fixed, with per-finding resolution paragraphs citing regression
test IDs. (Audit file lives outside this repo; see cowork root.)
* CLAUDE.md Project Status line updated with the auth-unification
closure note.
* docs/features.md, docs/architecture.md, docs/quickstart.md,
docs/concepts.md, docs/connectors.md, docs/test-env.md,
docs/testing-guide.md, docs/compliance-*.md, docs/demo-advanced.md
— refreshed for the new '.well-known/pki/*' namespace and named
API keys.
* api/openapi.yaml — documents the new unauthenticated endpoints
and removes the legacy '/api/v1/crl*' + '/api/v1/ocsp/*' paths.
.gitignore: adds '/.gocache/' and '/.gomodcache/' for the session-
scoped Go caches so they never enter the tree.
C-001 — CreateCertificate was server-accepted with null owner_id,
team_id, renewal_policy_id because the GUI neither collected the fields
nor enforced them, even though the backend's ManagedCertificate schema
and handler contract treat them as required. Fix the contract at all
four layers:
- web/src/pages/CertificatesPage.tsx: replace owner_id/team_id free-
text inputs with <select> elements fed by getOwners/getTeams/
getPolicies queries; mark all three required; gate the Create
button on owner_id + team_id + renewal_policy_id being set.
- internal/api/handler/certificates.go: ValidateRequired for
owner_id, team_id, renewal_policy_id on CreateCertificate so the
handler returns HTTP 400 with the offending field name before the
service layer is reached.
- internal/mcp/types.go: drop ',omitempty' from
CreateCertificateInput.RenewalPolicyID so the MCP schema reflects
the required contract; Update inputs keep partial-update semantics.
- api/openapi.yaml: 'required: [name, common_name, renewal_policy_id,
issuer_id, owner_id, team_id]' was already present on the Create
schema; clarified DeploymentTarget.agent_id description to note the
FK contract.
C-002 — CreateTargetWizard accepted an empty or bogus agent_id and the
service inserted directly, producing a Postgres 23503 FK-violation that
bubbled out as a generic HTTP 500. The FK itself (migration 000001 line
104: agent_id TEXT NOT NULL REFERENCES agents(id)) is correct; we keep
the schema strict and add validation at three layers:
- internal/service/target.go: introduce
ErrAgentNotFound sentinel and pre-validate agent_id in
TargetService.CreateTarget — empty string returns
'agent_id is required'; a nonexistent id returns the full
'referenced agent does not exist: <id>' error. Both wrap
ErrAgentNotFound via fmt.Errorf %w so callers can use errors.Is.
- internal/api/handler/targets.go: ValidateRequired on agent_id; map
errors.Is(err, service.ErrAgentNotFound) to HTTP 400 instead of
letting it fall through to the generic 500 branch.
- internal/mcp/types.go: drop ',omitempty' from
CreateTargetInput.AgentID to match the required contract.
- web/src/pages/TargetsPage.tsx: replace the free-text Agent ID input
with a <select> populated from getAgents(); include agent in the
canProceedToReview gate so Next is disabled until an agent is
chosen.
Regression coverage (21 new subtests total):
- TestCreateCertificate_MissingRequiredField_Returns400 — 6 subtests,
one per required field, each proves the handler guard fires before
the mock service is called.
- TestCreateTarget_MissingAgentID_Returns400 — handler guard.
- TestCreateTarget_NonexistentAgent_Returns400 — pins the
ErrAgentNotFound -> 400 translation.
- TestTargetService_CreateTarget_MissingAgentID — errors.Is sentinel.
- TestTargetService_CreateTarget_NonexistentAgentID — errors.Is.
- The existing TestTargetService_CreateTarget_Success, along with
TestCreateTarget_{MissingName,MissingType,NameTooLong}_* handler
tests, were updated to seed a real agent or include agent_id in
the request body so the happy paths still run cleanly.
Gates (Phase 4):
- go build/vet/test/race: green
- go test -cover: internal/service 68.7% (gate 55%),
internal/api/handler 78.9% (gate 60%)
- golangci-lint on service+handler+mcp: 0 issues
- govulncheck: no reachable vulns
- tsc --noEmit: clean
- vitest: 223/223 passing
See cowork/certctl-coverage-gap-audit.md entries C-001 and C-002.
D-008 was a three-part drift in the policy engine that made the
D-005/D-006 remediation cosmetic below the DB layer:
(a) migrations/seed.sql INSERTed rules with pre-D-005 lowercase
types ('ownership', 'environment', 'lifetime', 'renewal_window')
that the handler validator rejects on Create/Update but that
raw SQL INSERTs bypassed entirely. At runtime evaluateRule's
switch fell through to the default "unknown policy rule type"
error branch on every demo rule × every cert × every cycle,
flooding logs while emitting zero violations.
(b) migrations/seed_demo.sql persisted lowercase severity values
('critical', 'error', 'warning') on policy_violations rows.
INSERT succeeded because that column had no CHECK, but any
frontend comparing against the canonical PolicySeverity enum
mis-categorized every seeded violation.
(c) evaluateRule hardcoded Severity: PolicySeverityWarning on
every emitted violation and ignored rule.Config entirely —
so the D-006 per-rule severity column (000013) and every
per-arm Config JSON ({allowed_issuer_ids, allowed_domains,
required_keys, allowed, lead_time_days, max_days}) was dead
data below the evaluation layer.
This commit lands (a)+(b)+(c) atomically. Shipping any subset
leaves the feature half-working.
## Changes
Domain (internal/domain/policy.go):
* Add PolicyTypeCertificateLifetime as the 6th TitleCase canonical.
Pre-D-008 the seeded "max-certificate-lifetime" rule had no engine
arm — routing it through RenewalLeadTime would conflate "how
close to expiry before we renew" with "how long can the cert
possibly be", two distinct semantics. The new type accepts
config {"max_days": int} and flags certs whose
NotAfter - NotBefore exceeds the cap.
Handler validator (internal/api/handler/validation.go):
* ValidatePolicyType allowlist grown to 6 canonicals
(AllowedIssuers, AllowedDomains, RequiredMetadata,
AllowedEnvironments, RenewalLeadTime, CertificateLifetime).
OpenAPI (api/openapi.yaml):
* PolicyType enum grown to match domain.
Frontend (web/src/api/types.ts, types.test.ts):
* POLICY_TYPES tuple gains CertificateLifetime; pin test asserts
all 6 canonicals and rejects casing drift.
Migration 000014 (policy_violations severity CHECK):
* Named CHECK constraint (policy_violations_severity_check)
mirroring 000013's allowlist, defense-in-depth at the DB layer
against future drift from bypassed writes (migrations, psql
sessions, future callers). Symmetric down migration drops by
name.
Seed data:
* migrations/seed.sql rewritten to emit TitleCase canonicals with
per-arm config JSON that actually exercises the config-consuming
paths (not the missing-field backstops):
- pr-require-owner → RequiredMetadata {"required_keys":["owner"]} Warning
- pr-allowed-environments → AllowedEnvironments {"allowed":["production","staging","development"]} Error
- pr-max-certificate-lifetime → CertificateLifetime {"max_days":90} Critical
- pr-min-renewal-window → RenewalLeadTime {"lead_time_days":14} Warning
Severities are now differentiated per rule (D-006 intent).
* migrations/seed_demo.sql violation rows flipped to TitleCase
severity ('Critical', 'Error', 'Warning') so migration 000014
applies cleanly on upgrade paths.
Engine rewrite (internal/service/policy.go):
* evaluateRule rewritten. All six arms now:
1. Parse rule.Config into the per-arm typed struct.
2. Bad JSON → log at ValidateCertificate boundary and skip
this rule (no co-located poisoning of other rules in the
same batch).
3. Empty/null Config → emit the pre-D-008 missing-field
violation (backwards compat invariant — operators who
haven't reconfigured still see the same output).
4. Violations emitted carry rule.Severity (no more hardcoded
Warning); D-006 column is now load-bearing.
* CertificateLifetime arm reads NotBefore/NotAfter from the
certificate's latest version via CertRepo. Injected via
PolicyService.SetCertRepo() setter — avoids churning ~36
NewPolicyService call sites while keeping the lifetime arm
optional (degrades to a log+skip if the setter is not wired).
Server wiring (cmd/server/main.go):
* policyService.SetCertRepo(certRepo) wired after construction.
Tests (internal/service/policy_test.go):
* 25 new subtests across 5 groups:
- TestEvaluateRule_SeverityPassThrough (6): every rule type
emits violations carrying rule.Severity, not hardcoded.
- TestEvaluateRule_ConfigConsumed (12): every per-arm Config
path exercised positive + negative.
- TestEvaluateRule_EmptyConfig_BackCompat (3): empty/null
Config still emits pre-D-008 missing-field violations.
- TestEvaluateRule_BadConfig_SkipsRule: malformed JSON logs
and skips cleanly without poisoning neighbors.
- TestEvaluateRule_CertificateLifetime_RepoScenarios (3):
ok when repo wired, log+skip when not, handles missing
NotBefore/NotAfter edges.
Provenance: D-008 surfaced during D-005/D-006 remediation review
in eef1db0. That commit added persistence and CI pins for the
severity field but did not re-verify the evaluation layer
consumed it; this finding and fix close the audit-process gap.
Coverage Gap Audit findings D-005 (P0) + D-006 (P1) fixed together in a
single commit because they share the same root cause — policy CRUD sending
values the backend silently rejects — and splitting them would leave a
half-working UI between commits.
## D-005 (P0): PoliciesPage dropdown 400s every Create Policy
Root cause
----------
`web/src/pages/PoliciesPage.tsx` populated the Type `<select>` from a
hardcoded `['key_algorithm', 'ownership', 'allowed_issuers', ...]` array.
The backend's `internal/api/handler/validators.go::ValidatePolicyType`
enforces the TitleCase allowlist `AllowedIssuers`, `AllowedDomains`,
`RequiredMetadata`, `AllowedEnvironments`, `RenewalLeadTime` — defined in
`internal/domain/policy.go`. Every Create Policy request was rejected with
`400 invalid policy type`. The error surfaced only as a transient toast;
the modal closed anyway. Silent user-visible failure.
Fix
---
- `web/src/api/types.ts`: added `POLICY_TYPES` and `POLICY_SEVERITIES`
tuples with `as const` and narrowed `PolicyRule.type`, `.severity`, and
`PolicyViolation.severity` to the literal-union types. Dropdown is now
sourced from the tuple; casing drift becomes a compile error.
- `web/src/pages/PoliciesPage.tsx`: rekeyed `severityStyles` /
`severityDots` to the TitleCase values, added `humanize()` for display
(AllowedIssuers → "Allowed Issuers"), removed the `badge-neutral`
fallback that was papering over the mismatch.
- `web/src/api/types.test.ts` (new): pins both tuples exactly. If anyone
edits one side of the frontend/backend contract without the other, CI
fails with a clear assertion. Pure-TS vitest, no RTL dependency.
## D-006 (P1): `severity` field silently dropped on create/update
Root cause
----------
`PolicyRule` had no `Severity` field in `internal/domain/policy.go`. The
frontend has always sent `severity` on create/update, but Go's
`json.Decoder` (default settings, no `DisallowUnknownFields`) silently
dropped it. The value never reached PostgreSQL. Every rule rendered with
the same severity because there was no severity — just a display
computation downstream.
Fix: option (b), full-stack schema add (not delete-the-field)
-------------------------------------------------------------
- Migration `000013_policy_rule_severity` (up + down): adds
`severity VARCHAR(50) NOT NULL DEFAULT 'Warning'` to `policy_rules` with
CHECK constraint `severity IN ('Warning', 'Error', 'Critical')`. No
index — three-value column on a low-thousands-rows table, planner will
seq-scan regardless. PG 11+ metadata-only ADD COLUMN, safe on live data.
- `internal/domain/policy.go`: added `Severity PolicySeverity` field.
- `internal/repository/postgres/policy.go`: plumbed `severity` through
ListRules SELECT + Scan, GetRule SELECT + Scan, CreateRule INSERT,
UpdateRule UPDATE (4 queries).
- `internal/service/policy.go::UpdatePolicy`: if the client omits
severity on a PUT (zero-value empty string), fetch the existing rule
and preserve its severity. Without this, partial updates would trip the
NOT NULL CHECK and 500. Preserves pre-existing behavior for Name/Type
(out of scope).
- `internal/api/handler/policies.go::CreatePolicy`: default empty severity
to `'Warning'`, then validate via `ValidatePolicySeverity`. 400 with
clear message instead of 500 on CHECK violation. `UpdatePolicy`:
validates severity only when provided.
- `internal/mcp/types.go` + `internal/mcp/tools.go`: added optional
`severity` on the MCP `create_policy` / `update_policy` tool inputs so
LLM callers stay in sync with the wire contract.
- `api/openapi.yaml`: added `severity` to the `PolicyRule` schema with
the enum and default.
Acceptance criterion (user-defined)
-----------------------------------
"Create a rule with severity=Critical, reload the page, and still see
Critical — no silent drops." Verified end-to-end: frontend sends
`severity: "Critical"`, handler validates, service persists, DB stores,
GET returns, React renders the correct badge.
Seed data
---------
`migrations/seed.sql`: four demo rules now have differentiated severities
— `pr-require-owner` → Warning, `pr-allowed-environments` → Error,
`pr-max-certificate-lifetime` → Critical, `pr-min-renewal-window` →
Warning. The user called out that seeding all four at the same severity
makes the feature look decorative; differentiation demonstrates the
column carries real signal.
## Integration test fix (side effect of D-006)
`internal/integration/e2e_test.go::TestCrossResourceWorkflow/CreatePolicy`
was sending `"severity": "High"` — a value from the pre-audit severity
vocabulary that the new `ValidatePolicySeverity` correctly rejects with
400. Changed to `"Error"` (closest semantic match in the new TitleCase
allowlist). Only severity reference in the integration/ directory;
verified via grep.
## Out of scope, logged for follow-up (d/D-008)
Three policy-engine drift issues orthogonal to D-005 + D-006, explicitly
deferred per direction:
1. `migrations/seed.sql` policy_rules INSERTs use lowercase TYPE values
(`'ownership'`, `'environment'`, `'lifetime'`, `'renewal_window'`).
These are load-bearing on `internal/service/policy.go::evaluateRule`'s
`switch rule.Type` (which also uses the lowercase strings). Migrating
requires coordinated changes across seed + evaluation engine.
2. `migrations/seed_demo.sql:482-483` contains lowercase `'critical'`
severity — will now fail the new CHECK constraint. Separate fix.
3. `evaluateRule` hardcodes `Severity: domain.PolicySeverityWarning` on
emitted violations and ignores the configured `rule.Config`. The new
severity column is read correctly on the CRUD path but not yet
consulted during evaluation.
## Verification
Backend:
- `go build ./...` — clean
- `go vet ./...` — clean
- `go test -short ./...` — all packages green, including
`internal/service` (policy service), `internal/api/handler` (policy +
MCP handler tests), `internal/integration` (e2e_test.go after fix),
`internal/domain`, `internal/repository/postgres`.
Frontend:
- `tsc --noEmit` — clean
- `vitest run` — 223/223 passing (4 new assertions in types.test.ts)
- `vite build` — clean (only the pre-existing chunk-size warning)
Final PR in the six-commit M-2 sequence (PR-A: CertificateService cluster
cdc9d03, PR-B: IssuerService+TargetService eb14236, PR-C: Policy/Profile/
Owner/Team 2497be4, PR-D: Job/Notification/Audit ccd89c3, PR-E: AgentService
283ec27, PR-F: this commit). PR-A through PR-E collapsed the service-layer
shim methods and deleted every in-production context.Background() /
context.TODO() call from internal/service/; this PR completes the sweep
across the non-service tiers (HTTP middleware + ACME connector) and wires
the contextcheck linter so regressions fail CI.
Three narrow edits land the D-3 pattern (context.WithoutCancel for
subsidiary async writes and deferred shutdown contexts):
- internal/api/middleware/audit.go -- async audit goroutine now runs
on auditCtx := context.WithoutCancel(r.Context()) instead of
context.Background(). Preserves request-scoped values (trace ID, auth)
while detaching from the request's cancellation so the audit write
does not get killed when the response completes. Goroutine is still
tracked via a.wg (M-1 shutdown drain) so Flush(ctx) behaviour is
unchanged. CWE-770 Missing Release (goroutine leak potential) +
CWE-400 Resource Exhaustion (missed cancellation propagation).
- internal/api/middleware/middleware.go -- Recovery panic path now
logs via slog.ErrorContext(ctx, ...) instead of log.Printf. Request-
scoped trace/auth metadata now carries through the panic log, matching
every other request log. D-3 non-bypass: the context is r.Context()
captured before the defer, so even a panic mid-handler propagates
the ctx's trace ID into the ERROR log line.
- internal/connector/issuer/acme/acme.go (HTTP-01 challenge server
shutdown) -- defer shutdown context derived from
context.WithTimeout(context.WithoutCancel(ctx), 5s) instead of
context.Background(). Preserves parent ctx values, detaches from
parent cancellation so Shutdown always gets its full 5-second
budget even when the parent was cancelled. Matches the same pattern
applied in ACME's solveAuthorizationsDNS01 and solveAuthorizationsDNSPersist01.
Linter wiring: .golangci.yml adds `contextcheck` to the enabled set.
golangci-lint v2.11.4 now fails CI on any function that takes a
context.Context parameter but calls into context.Background() or
context.TODO() instead of propagating -- regression guard for all five
prior PRs.
Verification (CI parity, GOCACHE=/tmp/gocache GOMODCACHE=/tmp/gomodcache
GOLANGCI_LINT_CACHE=/tmp/lintcache):
- go build ./... -> 0
- go vet ./... -> 0
- golangci-lint run (contextcheck enabled) -> 0 issues
- go test -race -short ./internal/api/middleware/... -> PASS
- go test -race -short ./internal/scheduler/... -> PASS
- go test -race -short ./internal/connector/issuer/acme/... -> PASS
- go test -race -short ./internal/service/... -> PASS
- rg "context\.(Background|TODO)\(\)" internal/service/ internal/scheduler/
internal/connector/ internal/api/middleware/ -> 0 non-test hits
(one pedagogical godoc reference in audit.go documenting why
context.Background() would be wrong remains intentional)
Wire-format invariants preserved: 0 API routes, 0 SQL migrations, 0
frontend bytes, 0 OpenAPI bytes, 0 connector interface signature changes,
0 new env vars, 0 new external dependencies (pure context stdlib). The
AuditRecorder interface signature, the body-hash algorithm (SHA-256 16
hex chars), the excluded-path short-circuit, the actor-extraction path,
the responseWriter status-capture wrapper, the AuditServiceAdapter, and
all 116 API routes under /api/v1/, /.well-known/est/, /scep, /health,
/auth are byte-identical.
M-2 aggregate across PR-A through PR-F: 57 files, +635 / -613 (PR-A 12f
+227/-237, PR-B 9f +150/-146, PR-C 17f +156/-148, PR-D 11f +67/-63,
PR-E 4f +9/-15, PR-F 4f +26/-4). With M-2 closed, 8 of 10 Medium
findings resolved; M-9, M-10, L-1..L-4, I-1..I-8 remain post-v2.1.0
hardening batch.
Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
Collapse CancelJobWithContext into CancelJob; eliminate 10 context.Background()
hits across the Job+Notification+Audit service cluster by threading ctx
through their handler-facing service interfaces.
Services (ctx-first):
- service/job.go: ListJobs, GetJob, CancelJob, ApproveJob, RejectJob now
accept ctx; the CancelJobWithContext wrapper is removed (handler callers
continue to invoke CancelJob, now ctx-aware).
- service/notification.go: ListNotifications, GetNotification, MarkAsRead
accept ctx.
- service/audit.go: ListAuditEvents, GetAuditEvent accept ctx.
Handlers (interface + callsites):
- handler/jobs.go, handler/notifications.go, handler/audit.go: local
service interfaces updated, r.Context() threaded at every callsite.
Tests:
- Mock services updated to match the new interfaces (ctx accepted and
ignored via '_ context.Context' first parameter; Fn closure fields
unchanged).
- job_test.go / notification_test.go callsites thread context.Background()
to match production shape.
Verification:
go build ./... ok
go vet ./... ok
go test -short ./... ok
go test -race -short ./... ok
golangci-lint run ./... 0 issues
Locked decisions from the M-2 plan:
D-1 ctx-only signatures (no dual forms)
D-4 preserve handler method names facing the router
D-5 domain types stay ctx-free
Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
Collapses CertificateService, RevocationSvc, and CAOperationsSvc to
ctx-accepting method signatures. Removes context.Background() synthesis
at 24 internal call sites across certificate.go, revocation_svc.go, and
ca_operations.go.
- Primary repo calls inherit request cancellation via the passed ctx.
- Audit and notification dispatches use context.WithoutCancel(ctx) so
they survive client disconnect.
- Collapses TriggerRenewal/TriggerRenewalWithActor,
TriggerDeployment/TriggerDeploymentWithActor, and
RevokeCertificate/RevokeCertificateWithActor sibling pairs into single
canonical ctx-accepting methods (decisions D-1, D-2).
Handlers pass r.Context(). Mocks and tests updated to match new
signatures. No HTTP surface change, no OpenAPI change.
PR 1 of 6 in the M-2 remediation chain. Master green at this commit.
Refs: certctl-audit-report.md M-2 (L143, L224)
Audit events spawned from the HTTP middleware ran in detached goroutines
using context.Background(). On SIGTERM the DB pool was closed before
those goroutines finished writing, silently dropping audit events
(CWE-662 Improper Synchronization / CWE-400 Uncontrolled Resource
Consumption).
NewAuditLog now returns an *AuditMiddleware struct that tracks every
spawned goroutine with sync.WaitGroup. Callers wire the middleware via
its Middleware method value (preserves the existing
func(http.Handler) http.Handler shape) and drain the WaitGroup with
Flush(ctx), which blocks until in-flight recordings complete or the
provided context is cancelled — mirroring scheduler.WaitForCompletion.
Flush is invoked in cmd/server/main.go between http.Server.Shutdown
(no new requests accepted) and db.Close (pool torn down), with a
timeout returning ErrAuditFlushTimeout wrapping ctx.Err().
Request-derived inputs (method, path, status) are snapshotted before
the goroutine spawn so the worker does not race with http.Server
reusing r after the handler returns.
Tests:
TestAuditLog_FlushDrainsInFlightGoroutines
TestAuditLog_FlushTimeoutReturnsErrAuditFlushTimeout
Verification:
go build ./... : 0
go vet ./... : 0
go test -race -short ./... : 0 (all packages)
go test -cover ./internal/api/middleware : 81.4%
golangci-lint run : 0 issues
govulncheck ./... : 0 vulns in called code
Implements Simple Certificate Enrollment Protocol with single-endpoint
operation-based dispatch (GetCACaps, GetCACert, PKIOperation), PKCS#7
SignedData CSR extraction with fallback for raw/base64 CSR, challenge
password authentication via CSR attributes, and shared internal/pkcs7
package extracted from EST handler to eliminate code duplication.
24 new tests (11 service + 13 handler) plus 5 shared pkcs7 package tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes#7. The issuer create/update handlers swallowed all service errors
as generic 500s. Now differentiates: 409 for UNIQUE constraint violations,
400 for unsupported issuer type, 404 for not-found on update, 500 for
unknown errors. Adds structured error logging via slog.
OnboardingWizard now pre-populates config field defaults when a type is
selected (matching IssuersPage behavior), preventing empty required fields
from causing silent failures.
install-agent.sh hardened for curl|bash usage: --agent-id flag, =value
syntax, /dev/tty stdin reopening, proper stderr routing in download_binary,
non-interactive install examples in help text, and updated wizard commands.
Adds adversarial security tests for EST, path traversal, and query
injection handlers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SA1029: use typed context key instead of string in main_test.go
S1039: remove unnecessary fmt.Sprintf in validation_test.go
SA4023: fix unreachable nil check on concrete error type
SA4006: fix unused variable assignments in stepca_test.go (4 occurrences)
SA4000: fix duplicate expression in ssh_test.go (BEGIN vs END CERTIFICATE)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Close coverage gaps identified by dual-audit (qualitative + quantitative).
New test files for config (0%→98%), router (0%→100%), handler validation,
health, audit, response helpers, webhook notifier (0%→88%), email notifier,
middleware (recovery, rate limiter), domain profile, service nil-safety,
config helpers, issuer bootstrap, and server bootstrap wiring. Expanded
existing tests for ACME (34%→42%), step-ca (42%→52%), F5, SSH, agent
(43%→63%), scheduler (88%→99%), renewal service, and issuerfactory.
All tests pass: go test -short, go vet, go test -race clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mirror M34's dynamic issuer config pattern for deployment targets: AES-256-GCM
encrypted config storage, sensitive field redaction in API responses, agent
heartbeat-based test connection endpoint, and full frontend updates including
test status indicators, source badges, and removal of stale hostname/status
fields from the Target interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add S/MIME (emailProtection EKU) end-to-end test coverage:
- ValidateCommonName() now accepts email addresses for S/MIME certs
- S/MIME test profile (prof-test-smime) in seed data
- Phase 11 test: issuance, EKU, KeyUsage, email SAN verification
- EST config enabled in test Docker Compose
- Portable KeyUsage parsing (awk, works on BSD/GNU)
- Full test environment documentation (docs/test-env.md)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes 12 production bugs preventing the full issuance→deployment flow
from working with ACME (Pebble/Let's Encrypt) and step-ca issuers:
ACME connector (acme.go):
- Save orderURI before WaitOrder overwrites it (Go crypto/acme bug)
- Add CreateOrderCert fallback via WaitOrder+FetchCert
- Remove defer-reset in ValidateConfig that caused nil pointer panic
- Add Insecure TLS option for self-signed ACME servers (Pebble)
step-ca connector (stepca.go, jwe.go):
- Real JWE provisioner key loading + decryption (was using ephemeral keys)
- Fix JWT audience (/1.0/sign), sha claim (key fingerprint), kid header
- Custom root CA trust via RootCertPath config
- Remove hardcoded 90-day validity default (let step-ca decide)
NGINX target connector (nginx.go):
- Use sh -c for validate/reload commands (shell interpretation)
- Use filepath.Dir instead of fragile string slicing
- Add private key file writing (agent-mode keys were never deployed)
- Make chain_path write conditional
Server/service layer:
- TriggerRenewalWithActor now creates actual Job records (was no-op)
- createDeploymentJobs falls back to DB query when cert.TargetIDs empty
- ProcessPendingJobs skips agent-routed deployment jobs
- Agent cert pickup path parsing: len(parts)<4 → len(parts)<3
- Health/ready/auth-info endpoints bypass auth middleware
- Write timeout 15s→120s for ACME issuance
- Cert fingerprint computed on CSR submission
Integration test environment (deploy/test/):
- 10-phase test script covering Local CA, ACME, step-ca, revocation,
discovery, renewal, and API spot checks
- Docker Compose with 7 containers (server, agent, postgres, nginx,
pebble, challtestsrv, step-ca) on isolated network
- TLS verification checks SAN (not just Subject CN) for modern CA compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Issuer Catalog (M33):
- Shared issuer type config (issuerTypes.ts) with 6 supported + 2 coming-soon types
- Composable wizard components (TypeSelector, ConfigForm, ConfigDetailModal)
- Catalog card layout with Connected/Available/Coming Soon badges
- VaultPKI and DigiCert added to create wizard with full config fields
- ACME EAB fields (eab_kid, eab_hmac with sensitive flag)
- Issuer type filter dropdown on configured issuers table
- Config detail modal replacing 60-char truncation
- IssuerDetailPage uses shared typeLabels/redactConfig, Edit button, enabled/disabled status
- StatusBadge extended with Enabled/Disabled styles
- 2 new frontend tests (VaultPKI + DigiCert create payload verification)
Bug fixes:
- CertificateService.CreateCertificate now defaults Status to Pending and Tags to
empty map when not set (DB column DEFAULTs only apply when columns are omitted
from INSERT, but our repo always includes all columns)
- CreateCertificate handler now logs actual error via slog.Error before returning
generic 500, enabling root cause debugging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: certificate_versions.csr_pem is nullable in the schema but
Go code scanned it into a plain string. Used sql.NullString in
ListVersions and GetLatestVersion to handle NULL values correctly.
Also includes: partial update fetch-merge-update pattern to prevent FK
violations, nil directory guard in discovery service, diagnostic slog
logging in handlers, export handler 422 for unparseable PEM, OpenAPI
spec corrections, MCP tool description improvements, and test fixes.
Rewrites the Release Sign-Off section in testing-guide.md to individual
test-level granularity (320 rows) with smoke test results audited and
checked off (121 pass, 5 skip, 194 manual remaining).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add certificate export in PEM (JSON or file download) and PKCS#12 formats.
Private keys are never included — they stay on agents. Add EKU-aware
issuance threading profile EKUs (serverAuth, clientAuth, codeSigning,
emailProtection, timeStamping) through the full issuance pipeline. Fix
agent CSR SAN splitting for email addresses, adaptive KeyUsage flags for
S/MIME vs TLS, and a pre-existing generateID collision bug in deployment
job creation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SA5011: use t.Fatal instead of t.Error before nil pointer access in
verification handler tests (stops test execution on nil)
- SA4006: replace unused lvalues with _ in repo_test.go and team_test.go
- ST1020: fix comment format on ListViolations to match method name
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add context.Context to handler test mocks (agent, agent_group)
- Refactor scheduler to use local interfaces instead of concrete service types
- Wire RevocationSvc/CAOperationsSvc sub-services in integration tests
- Add context.Background() to service test calls (agent, agent_group)
- Fix repo integration tests: add FK prerequisite records (team, owner,
issuer, renewal_policy) before creating certificates
- Set MaxOpenConns(1) on test DB to preserve SET search_path across queries
- Fix Apache/HAProxy tests: replace "echo ok"/"echo reload" with "true"
binary to avoid macOS exec.Command PATH resolution failure
- Fix validation tests: correct error expectations for regex-first checks,
replace null byte strings with strings.Repeat for length tests
- Fix scheduler timeout test flakiness with t.Skip fallback
- Remove unused imports (context in ca_operations_test, service in scheduler)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements 3 deferred security tickets (TICKET-003, TICKET-007, TICKET-010)
and performs comprehensive documentation audit to eliminate drift between
code and docs.
Code changes:
- TICKET-003: Repository integration tests with testcontainers-go (50+ subtests)
- TICKET-007: CertificateService decomposition into RevocationSvc + CAOperationsSvc
- TICKET-010: Request body size limits via http.MaxBytesReader middleware
- Fix missing slog import in certificate.go after service decomposition
Documentation updates:
- README: Fix endpoint count (97→93), expand env var reference (15→39 vars)
- CLAUDE.md: Fix OpenAPI operation count (85→93), update file locations
- architecture.md: Add body size limits section, middleware chain ordering
- CONTRIBUTING.md: New contributor guide with architecture conventions,
test patterns, middleware ordering, CI thresholds
- SECURITY_REMEDIATION.md: Removed from repo (moved to cowork, gitignored)
- Test files: Add doc comments to all new test files
Documentation that should exist but doesn't yet:
- Architecture diagrams (C4 model or similar)
- Threat model document
- Testing philosophy guide
- Disaster recovery runbook
- Upgrade guide (migration between versions)
- API versioning strategy document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The audit middleware records events asynchronously via goroutines. Tests previously
used time.Sleep(50ms) to wait for audit recording, which is unreliable.
Implemented waitableAuditRecorder wrapper that:
- Wraps mockAuditRecorder to intercept RecordAPICall invocations
- Signals via buffered channel when recording completes
- Provides Wait(timeout) method for tests to synchronously wait
- Returns true on successful wait, false on timeout
Replaced all 7 time.Sleep(50ms) calls with recorder.Wait(1*time.Second) calls,
improving test reliability and reducing flakiness.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 18-parameter RegisterHandlers function signature with a cleaner
HandlerRegistry struct that groups all API handler dependencies. This eliminates
the signature explosion that made the function difficult to read and maintain.
Changes:
- Added HandlerRegistry struct with 18 fields grouping all handler types
- Updated RegisterHandlers to accept a single HandlerRegistry parameter
- Updated all internal handler references to use reg.FieldName syntax
- Updated call sites in cmd/server/main.go and integration tests
- No functional changes, purely structural refactoring
Resolves TICKET-006: RegisterHandlers Signature Explosion
- Updated AgentService interface to accept context.Context parameter in all methods
- Replaced context.Background() calls with proper ctx parameter in agent.go
- Updated AgentGroupService interface to accept context.Context parameter
- Replaced context.Background() calls with proper ctx parameter in agent_group.go
- Updated handler methods to pass r.Context() to service methods
- Context now properly propagates through request lifecycle for timeout/cancellation
- Improved request tracing and cancellation behavior
- Added TestSlack_ClientHasTimeout to verify 10-second timeout
- Added TestTeams_ClientHasTimeout to verify 10-second timeout
- Added TestPagerDuty_ClientHasTimeout to verify 10-second timeout
- Added TestOpsGenie_ClientHasTimeout to verify 10-second timeout
- All notifiers already configured with 10 second timeout in New()
- Tests verify timeout is set and matches expected value
The handler's VerificationService interface used interface{} for the ctx
parameter, but the service implementation uses context.Context. This caused
a compile error: *service.VerificationService does not implement
handler.VerificationService.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M25: After deploying a certificate, the agent probes the live TLS
endpoint and compares SHA-256 fingerprints to verify the correct cert
is being served. Best-effort — failures don't block deployments.
New endpoints: POST /jobs/{id}/verify, GET /jobs/{id}/verification.
Migration 000008 adds verification columns to jobs table.
M26: Traefik target connector (file provider, auto-reload) and Caddy
target connector (dual-mode: admin API hot-reload or file-based).
Both wired into agent dispatch.
Also: restructured README to highlight supported integrations (issuers,
targets, notifiers) earlier, moved API/CLI/MCP sections lower. Updated
all docs (features, connectors, architecture, testing guide, why-certctl)
and fixed integration tests for 18-param RegisterHandlers signature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three bugs fixed:
- Docker Compose only mounted migration 000001; migrations 000002-000007
(profiles, agent groups, revocation, discovery, network scans) never ran,
breaking half the demo features. Now mounts all 7 migrations in order.
- Network Scans page crashed with pq.Array scan error because lib/pq
doesn't support []int, only []int64. Changed Ports field accordingly.
- Dashboard pie chart displayed "RenewalInProgress" without spaces.
Added formatStatus() helper for PascalCase → spaced display.
Also adds first-run demo experience improvements:
- 9 discovered certificates (filesystem + network scan mix)
- 3 discovery scans with recent timestamps
- 2 AwaitingApproval renewal jobs for approval workflow demo
- CERTCTL_NETWORK_SCAN_ENABLED=true in Docker Compose
- Network scan targets seeded with last_scan results
- Version badge updated to v2.0.5
- Docs updated (quickstart, advanced demo) to reference seeded data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EST handler tests used fake PEM data (e.g., "MIIBmjCCAUCgAwIBAgIRATest")
which is invalid base64 (25 chars, not divisible by 4). Go's pem.Decode
fails silently, causing pemToDERChain to return "no certificates found"
and tests to get 500 instead of 200.
Added generateTestCertPEM() helper that creates a real self-signed ECDSA
P-256 certificate, used across all EST handler tests that need cert PEM.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement Enrollment over Secure Transport protocol with 4 endpoints under
/.well-known/est/ — cacerts (CA chain distribution), simpleenroll (initial
enrollment), simplereenroll (certificate renewal), and csrattrs (CSR
attributes). PKCS#7 certs-only wire format with hand-rolled ASN.1, accepts
both PEM and base64-encoded DER CSRs, configurable issuer and profile
binding, full audit trail. 28 new tests (18 handler + 10 service).
Also includes:
- GetCACertPEM added to issuer connector interface (all 4 issuers updated)
- EST integration tests wired into e2e test suite (13 test cases)
- QA testing guide Part 26 (15 manual EST test cases)
- All docs updated: README, features, architecture, concepts, connectors,
quickstart, demo-advanced (endpoint counts, MCP wording, agent IDs,
issuer interface, resource lists, OpenSSL status)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Frontend: fetchJSON now returns empty object on 204 instead of failing
to parse empty body — fixes silent delete failures across all entities.
Added onError callbacks to owner/team delete mutations to surface errors.
Backend: owner and issuer delete handlers return 409 Conflict with
descriptive messages when FK constraints block deletion, instead of
generic 500.
Added 15 v2 dashboard screenshots, updated README screenshot section,
logo asset, page count references (18→full), and QA guide with FK
constraint test coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M21 adds server-side active TLS scanning of CIDR ranges with concurrent
probing, sentinel agent pattern for pipeline reuse, and full CRUD API for
scan targets. M22 adds Prometheus exposition format endpoint alongside
existing JSON metrics. Comprehensive documentation audit updates all docs
to reflect 91 endpoints, 19 tables, 6 scheduler loops, and 900+ tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused repository import from discovery_handler_test.go and
unused tests variable from discovery_test.go (replaced by testCases).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
M19: HTTP middleware records every API call to the immutable audit trail
with method, path, actor, SHA-256 body hash, status, and latency. Best-effort
async recording via goroutine. Health/ready probes excluded.
M16a: Four pluggable notifier connectors — Slack (incoming webhook), Teams
(MessageCard), PagerDuty (Events API v2), OpsGenie (Alert API v2). Each
enabled by config env var. 30 new tests across middleware and connectors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The reject job handler should accept nil/empty bodies (no reason given)
while still rejecting malformed JSON. Check for io.EOF and http.NoBody
to distinguish missing body from invalid body.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upgrade from Go 1.22 to 1.25 (minimum for MCP SDK, actively supported).
CI updated to match.
Codebase audit fixes:
- Local CA parseIP() now uses net.ParseIP — IP SANs no longer silently dropped
- Nil pointer guards in agent.go GetWorkWithTargets for target/cert enrichment
- MCP CreateCertificateInput marks owner_id/team_id as required
- NGINX connector uses CombinedOutput() — captures diagnostic output on failure
- Jobs handler validates JSON decode on rejection body — returns 400 on malformed
- CRL/OCSP handlers propagate requestID for error tracing
MCP server tests (26 tests):
- client_test.go: HTTP client coverage (GET/POST/PUT/DELETE, auth, 204, errors, binary)
- tools_test.go: tool registration, pagination, end-to-end flows with mock API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The error routing only checked for "issuer not found" but not
"certificate not found", causing cert-not-found errors to fall
through to a generic 500. Broadened the check to match any
"not found" error string for both CRL and OCSP handlers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>