Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
5c01c7f. The work lands as a single commit spanning server, docs, tests,
and the React client.
M-002 — Named API keys with per-key actor propagation
* Migration 000014 adds the 'api_keys' table (id, name, hash,
principal, role, created_at, last_used_at, disabled_at) so every
credential carries an identifiable principal instead of the
opaque 'anonymous'/'api-key' sentinel.
* Auth middleware now rotates through configured keys, performs
constant-time hash comparison, stamps 'last_used_at', and emits
an actor struct via contextWithActor(). The audit middleware,
bulk-revocation handler, approval handlers, and MCP tool layer
now read the principal off the context and persist it on every
audit_events row.
* Regression coverage:
- internal/api/middleware/audit_test.go — actor propagation,
principal redaction for disabled keys, anonymous fallback for
unauthenticated endpoints.
- internal/api/handler/bulk_revocation_handler_test.go,
job_handler_test.go — principal-on-audit assertions.
M-003 — Authorization gates (Phase B)
* Approval handler rejects self-approval / self-rejection with 403
when the actor principal equals the job's requested_by field.
* Bulk revocation is gated behind the 'admin' role; operators and
viewers receive 403.
* Regression coverage:
- internal/service/job_test.go — TestApproveJob_NotSelf,
TestRejectJob_NotSelf.
- internal/api/handler/bulk_revocation_handler_test.go —
TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds.
M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux
* Per RFC 8615, relying parties cannot reasonably be asked to
authenticate against the issuing certctl instance to retrieve
revocation material. CRL and OCSP move off the authenticated
'/api/v1/crl*' and '/api/v1/ocsp/*' paths onto:
GET /.well-known/pki/crl/{issuer_id}
Content-Type: application/pkix-crl (RFC 5280 §5)
GET /.well-known/pki/ocsp/{issuer_id}/{serial}
Content-Type: application/ocsp-response (RFC 6960)
* Non-standard JSON CRL shape is removed; only DER is served.
* Short-lived certificate exemption (profile TTL < 1h → skip
CRL/OCSP) is preserved; the response simply omits the serial.
* Routes are registered on the unauthenticated 'finalHandler' mux
in cmd/server/main.go alongside EST ('/.well-known/est/*') and
SCEP ('/scep'). Legacy authenticated paths return 404.
* Regression coverage:
- internal/api/handler/certificate_handler_test.go — content
type, DER parseability, 404 for unknown issuer.
- internal/api/handler/adversarial_path_test.go — unauthenticated
access asserted for CRL, OCSP, EST, SCEP.
- internal/api/router/router_test.go — route-table assertion
that '.well-known/pki/*', '.well-known/est/*', and '/scep' are
mounted on the unauthenticated branch.
M-001 — Auto-closed by M-002
EST and SCEP were already registered on the unauthenticated
'finalHandler' mux; the router comment at
internal/api/router/router.go:247 now matches reality. The
adversarial-path tests above lock the behavior in.
Verification (all gates green):
* go vet ./... — clean
* go build ./... — ok
* go test -short ./... (55+ packages) — all pass
* web/ : npm test (225 Vitest tests) — all pass
* web/ : npx tsc --noEmit — clean
* grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits,
all intentional M-006 tombstone/relocation comments.
Documentation:
* coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 →
Fixed, with per-finding resolution paragraphs citing regression
test IDs. (Audit file lives outside this repo; see cowork root.)
* CLAUDE.md Project Status line updated with the auth-unification
closure note.
* docs/features.md, docs/architecture.md, docs/quickstart.md,
docs/concepts.md, docs/connectors.md, docs/test-env.md,
docs/testing-guide.md, docs/compliance-*.md, docs/demo-advanced.md
— refreshed for the new '.well-known/pki/*' namespace and named
API keys.
* api/openapi.yaml — documents the new unauthenticated endpoints
and removes the legacy '/api/v1/crl*' + '/api/v1/ocsp/*' paths.
.gitignore: adds '/.gocache/' and '/.gomodcache/' for the session-
scoped Go caches so they never enter the tree.
Addresses Medium finding M-4 in the audit report. The multi-stage
Dockerfiles previously had no ARG declarations for HTTP_PROXY,
HTTPS_PROXY, or NO_PROXY, so corporate-proxy environments silently
failed at 'npm ci' (frontend stage) and 'go mod download' (Go builder).
The npm retry idiom (`npm ci --include=dev || npm ci --include=dev`)
masked the failure because the upstream 'Exit handler never called!'
bug exits 0 despite the install crash.
Fix: thread HTTP_PROXY / HTTPS_PROXY / NO_PROXY ARGs through every
Docker build stage that performs network I/O, re-export them as ENV
with both upper- and lower-case aliases (apk/curl/npm read lowercase;
Go/Node read uppercase), and forward the host shell's environment via
`build.args:` in every compose file and `build-args:` in the release
workflow's docker/build-push-action steps. Defaults are empty strings
so un-proxied builds remain byte-identical to the pre-fix tree.
Scope: Dockerfile (frontend + Go builder stages), Dockerfile.agent
(Go builder stage), deploy/docker-compose.yml (server + agent),
deploy/docker-compose.dev.yml (server + agent), deploy/docker-compose.test.yml
(server + agent), .github/workflows/release.yml (both docker/build-push-action
v6 invocations). Zero Go, web, test, or runtime code changes. Zero
base-image changes. Existing npm `||` retry idiom and `ARG TARGETARCH`
preserved verbatim.
CWE-1173 (Improper Use of Validated Input) / CWE-16 (Configuration).
Verification:
- YAML parses clean across all four compose files and release.yml.
- yamllint -d relaxed: clean exit across all five YAML files.
- All six `build.args:` blocks expose HTTP_PROXY, HTTPS_PROXY, NO_PROXY
with default-empty ${VAR:-} substitution.
- Both release.yml docker/build-push-action steps expose the same
three keys sourced from ${{ secrets.HTTP_PROXY }}, etc.
- Dockerfiles contain 5 proxy ARG declarations total (Dockerfile has 2
stages × 3 ARGs = 6 lines, Dockerfile.agent has 1 stage × 3 ARGs = 3
lines); lowercase ENV aliases verified present in every stage.
- git diff --shortstat: 6 files changed, 117 insertions(+), 0 deletions.
Pure additive.
Docker-live verification (`docker build`, `docker compose config`)
deferred to CI / post-commit smoke because the sandbox has no Docker
runtime. hadolint, go, golangci-lint, govulncheck likewise unavailable
in the sandbox; per-layer CI coverage gates (service 55%, handler 60%,
domain 40%, middleware 30%) are trivially unaffected as M-4 touches
zero Go source files.
why-certctl.md said March 1, CHART_SUMMARY.md said March 28. The
LICENSE file is authoritative: Change Date is March 14, 2033.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrote docs/features.md from scratch as authoritative feature inventory
(1255 lines, every claim verified against source files).
Audited README.md and architecture.md against repo — fixed 19 stale
references: K8s Secrets status, issuer counts, dashboard page counts,
CI thresholds, missing connectors in Mermaid diagrams, OpenAPI operation
count, GetCACertPEM behavior, and V2/V4 roadmap accuracy.
Also includes related fixes discovered during audit:
- Scheduler skips expired/failed/revoked certs from auto-renewal
- Seed demo expiry dates moved outside 31-day scheduler query window
- Agent pages use correct last_heartbeat_at field name
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1717-line Go test file covering all 52 Parts of testing-guide.md against the
Docker Compose demo stack. ~120 automated subtests (API, DB, source, perf),
11 skipped Parts with reasons, ~270 manual gaps documented. Audited against
actual router, seed data, domain structs, and migrations — 8 factual bugs
caught and fixed during review. Companion guide at docs/qa-test-guide.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New deploy/ENVIRONMENTS.md: comprehensive walkthrough of all 4 compose
files with service-by-service explanations, beginner-friendly Docker
concepts, and expert-level networking/config details
- Fix docker-compose.dev.yml: agent LOG_LEVEL → CERTCTL_LOG_LEVEL (was
silently ignored without the CERTCTL_ prefix)
- Add CERTCTL_CONFIG_ENCRYPTION_KEY to base and test compose (enables
M34/M35 dynamic issuer/target config encryption)
- Add CERTCTL_DISCOVERY_DIRS to base compose agent (enables filesystem
certificate discovery in default deployment)
- Cross-link ENVIRONMENTS.md from README doc table and quickstart.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
4-step wizard (Connect CA → Deploy Agent → Add Certificate → Done) shown
on fresh installs when no user-configured issuers or certificates exist.
Auto-seeded env var issuers (source="env") are excluded from first-run
detection. Wizard state latches to prevent query refetches from dismissing
it mid-flow. Split docker-compose into clean default (wizard-compatible)
and demo override (seed_demo.sql). Added missing migrations 000009/000010
to test compose.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactors deploy/test/run-test.sh into a typed Go test file with
crypto/x509 certificate parsing, eliminating fragile openssl text
scraping. 12 phases, 35 subtests covering Local CA, ACME, step-ca,
revocation, discovery, renewal, EST, S/MIME, and API spot checks.
- testClient HTTP helper with Bearer auth
- testDB PostgreSQL helper (port 5432 now exposed)
- waitFor/waitForJobsDone polling helpers
- crypto/x509 for EKU, KeyUsage, SAN verification
- crypto/tls for NGINX deployment verification
- //go:build integration tag (not in CI yet)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add S/MIME (emailProtection EKU) end-to-end test coverage:
- ValidateCommonName() now accepts email addresses for S/MIME certs
- S/MIME test profile (prof-test-smime) in seed data
- Phase 11 test: issuance, EKU, KeyUsage, email SAN verification
- EST config enabled in test Docker Compose
- Portable KeyUsage parsing (awk, works on BSD/GNU)
- Full test environment documentation (docs/test-env.md)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes 12 production bugs preventing the full issuance→deployment flow
from working with ACME (Pebble/Let's Encrypt) and step-ca issuers:
ACME connector (acme.go):
- Save orderURI before WaitOrder overwrites it (Go crypto/acme bug)
- Add CreateOrderCert fallback via WaitOrder+FetchCert
- Remove defer-reset in ValidateConfig that caused nil pointer panic
- Add Insecure TLS option for self-signed ACME servers (Pebble)
step-ca connector (stepca.go, jwe.go):
- Real JWE provisioner key loading + decryption (was using ephemeral keys)
- Fix JWT audience (/1.0/sign), sha claim (key fingerprint), kid header
- Custom root CA trust via RootCertPath config
- Remove hardcoded 90-day validity default (let step-ca decide)
NGINX target connector (nginx.go):
- Use sh -c for validate/reload commands (shell interpretation)
- Use filepath.Dir instead of fragile string slicing
- Add private key file writing (agent-mode keys were never deployed)
- Make chain_path write conditional
Server/service layer:
- TriggerRenewalWithActor now creates actual Job records (was no-op)
- createDeploymentJobs falls back to DB query when cert.TargetIDs empty
- ProcessPendingJobs skips agent-routed deployment jobs
- Agent cert pickup path parsing: len(parts)<4 → len(parts)<3
- Health/ready/auth-info endpoints bypass auth middleware
- Write timeout 15s→120s for ACME issuance
- Cert fingerprint computed on CSR submission
Integration test environment (deploy/test/):
- 10-phase test script covering Local CA, ACME, step-ca, revocation,
discovery, renewal, and API spot checks
- Docker Compose with 7 containers (server, agent, postgres, nginx,
pebble, challtestsrv, step-ca) on isolated network
- TLS verification checks SAN (not just Subject CN) for modern CA compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Default to "changeme" so helm lint and helm template pass with stock
values. Operators override at install time via --set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use gt (int .Values.server.replicas) 1 to avoid incompatible type
comparison between YAML integer and template literal
- Remove fail directive for empty apiKey — lint runs with defaults,
operators set the key via --set at install time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use type conversion DigestStatusCount(c) instead of struct literal
- Replace with...else-if (invalid in Go templates) with if...else-if chain
- Add *.bak and cmd/agent/*.key/*.pem to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three bugs fixed:
- Docker Compose only mounted migration 000001; migrations 000002-000007
(profiles, agent groups, revocation, discovery, network scans) never ran,
breaking half the demo features. Now mounts all 7 migrations in order.
- Network Scans page crashed with pq.Array scan error because lib/pq
doesn't support []int, only []int64. Changed Ports field accordingly.
- Dashboard pie chart displayed "RenewalInProgress" without spaces.
Added formatStatus() helper for PascalCase → spaced display.
Also adds first-run demo experience improvements:
- 9 discovered certificates (filesystem + network scan mix)
- 3 discovery scans with recent timestamps
- 2 AwaitingApproval renewal jobs for approval workflow demo
- CERTCTL_NETWORK_SCAN_ENABLED=true in Docker Compose
- Network scan targets seeded with last_scan results
- Version badge updated to v2.0.5
- Docs updated (quickstart, advanced demo) to reference seeded data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- POSTGRES_PASSWORD and CERTCTL_API_KEY read from .env file
- Added deploy/.env.example with documentation
- Agent key volume (agent_keys) for key persistence across restarts
- Agent healthcheck via pgrep
- Resource limits: server 1CPU/512M, agent 0.5CPU/256M
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Private keys never leave agent infrastructure. Agents generate ECDSA P-256
key pairs locally, store them with 0600 permissions, and submit only the CSR
(public key) to the control plane. New AwaitingCSR job state pauses
renewal/issuance jobs until the agent submits its CSR. Server-side keygen
retained behind CERTCTL_KEYGEN_MODE=server for demo/development.
Key changes:
- Dual keygen mode via CERTCTL_KEYGEN_MODE (agent default, server for demo)
- AwaitingCSR job state with CommonName/SANs in work response
- Agent ECDSA P-256 keygen, local key storage, CSR-only submission
- CompleteAgentCSRRenewal server-side flow for agent-submitted CSRs
- DeploymentRequest.KeyPEM for agent-provided keys during deployment
- Dockerfile.agent creates /var/lib/certctl/keys with correct ownership
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Runtime fixes:
- Fix env var mismatch (CERTCTL_DB_URL → CERTCTL_DATABASE_URL)
- Fix table name mismatches (certificates → managed_certificates, notifications → notification_events)
- Add renewal_policy_id to certificate queries
- Remove non-existent created_at from notification queries
- Add env var fallback for agent CLI flags
- Graceful degradation for missing notifiers/issuers in demo mode
- Copy web/ directory in Dockerfile for dashboard serving
Service layer:
- Implement handler-service interface pattern across all services
- Wire up certificate, agent, job, policy, team, owner, audit, notification services
Documentation:
- Add concepts.md: beginner-friendly guide to TLS, CAs, private keys
- Rewrite quickstart.md with accurate API examples matching actual handlers
- Add demo-advanced.md: interactive demo with cert issuance and automated script
- Update architecture.md with correct table names and connector interfaces
- Update connectors.md to match actual Go interface signatures
- Update demo-guide.md with cross-references to new docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>