certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 21:41:39 +00:00

Author	SHA1	Message	Date
shankar0123	7a9ae3157f	fix(seed): repair deployment_targets FK violation crashing fresh demo boot The Rank 5 cloud-target seed rows in `seed_demo.sql` referenced a non-existent `ag-server` agent_id. On every fresh-clone `docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up` the server crash-looped at the demo-seed step: pq: insert or update on table "deployment_targets" violates foreign key constraint "deployment_targets_agent_id_fkey" Origin: commit `9a7e818` ("docs, seed: cloud-target operator runbook + AWS ACM / Azure KV demo seed rows") added the rows but didn't insert or rebind to a matching agents row. The `ag-server` ID never existed in seed_demo.sql or anywhere else. Fix: bind the two cloud targets to the existing cloud sentinel agents that were already inserted at lines 78-79 (alongside `cloud-gcp-sm`): - tgt-aws-acm-prod → cloud-aws-sm - tgt-azure-kv-prod → cloud-azure-kv These cloud sentinels were inserted in commit 9a7e818's same family specifically to back agentless cloud targets — exact semantic match. Why the existing test didn't catch this: TestRunDemoSeed_AppliesIdempotently in internal/repository/postgres/seed_test.go calls the same RunSeed + RunDemoSeed pair the server uses at boot, so it WOULD have caught the FK violation. But the test depends on a live PostgreSQL container via testcontainers-go and is gated under `testing.Short()` → the default `go test ./... -short` lane that `make verify` runs always skipped it. The dedicated integration lane that strips `-short` either wasn't run on commit `9a7e818` or the failure was missed. Promoting the test out from under `-short` is a separate hardening conversation (CI runs need docker-in-docker which isn't free); that's out of scope for this hotfix. 
Static FK audit confirms the fix: Defined agent IDs (12): ag-{data,edge-01,iis,k8s,lb,mac-dev, web-prod,web-staging}-prod, cloud-{aws-sm,azure-kv,gcp-sm}, server-scanner Referenced agent_id values in deployment_targets after fix: ag-data-prod, ag-edge-01, ag-iis-prod, ag-k8s-prod, ag-lb-prod, ag-web-prod, ag-web-staging, cloud-aws-sm, cloud-azure-kv Unresolved: zero. Acceptance gate (operator-side): - docker compose -f deploy/docker-compose.yml \ -f deploy/docker-compose.demo.yml up -d --build against a fresh clone — server boots clean within 30s, dashboard at https://localhost:8443 shows the seeded demo data. v2.0.71	2026-05-05 21:03:18 +00:00
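The static FK audit described in this commit amounts to a set difference: every `agent_id` referenced from `deployment_targets` must exist among the defined agent IDs. A minimal sketch of that check follows. It uses an illustrative subset of the IDs named in the message, and `unresolvedAgentIDs` is a hypothetical helper, not a function from the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// unresolvedAgentIDs returns every referenced agent_id that has no
// matching entry in the defined set, de-duplicated and sorted.
// Hypothetical helper for illustration; not part of certctl.
func unresolvedAgentIDs(defined, referenced []string) []string {
	known := make(map[string]struct{}, len(defined))
	for _, id := range defined {
		known[id] = struct{}{}
	}
	seen := make(map[string]struct{})
	var missing []string
	for _, id := range referenced {
		if _, ok := known[id]; ok {
			continue // FK resolves
		}
		if _, dup := seen[id]; dup {
			continue // already reported
		}
		seen[id] = struct{}{}
		missing = append(missing, id)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Illustrative subset of the agent IDs listed in the commit message.
	defined := []string{
		"ag-web-prod", "ag-web-staging",
		"cloud-aws-sm", "cloud-azure-kv", "cloud-gcp-sm", "server-scanner",
	}
	// Before the fix, both cloud targets pointed at the non-existent ag-server.
	before := []string{"ag-web-prod", "ag-server", "ag-server"}
	// After the fix, they bind to the cloud sentinel agents.
	after := []string{"ag-web-prod", "cloud-aws-sm", "cloud-azure-kv"}

	fmt.Println("unresolved before fix:", unresolvedAgentIDs(defined, before))
	fmt.Println("unresolved after fix: ", unresolvedAgentIDs(defined, after))
}
```

A check of this shape needs no database, so unlike the testcontainers-backed seed test it could run in the `-short` lane. Parsing the IDs out of `seed_demo.sql` is left out of the sketch.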
shankar0123	1720e11109	docs: fix broken single-file demo invocation in README + qa-prerequisites + ENVIRONMENTS The README's Quick Start, the qa-prerequisites contributor doc, and the landing page (separate repo, separate commit) all shipped a copy-paste command that produces: service "certctl-server" has neither an image nor a build context specified: invalid compose project The bug landed silently with commit `a3d8b9c` (the U-3 master). Pre-U-3, docker-compose.demo.yml was self-contained and could be invoked with a single -f flag. U-3 deliberately reduced it to a 27-line overlay — its only payload today is `CERTCTL_DEMO_SEED=true` on the certctl-server service — because the demo seed now applies at boot via postgres.RunDemoSeed, not via /docker-entrypoint-initdb.d/. The overlay no longer carries an image: or build: of its own, so it MUST be passed alongside the base file. The README/qa-doc/landing-page never picked up the rename of the contract. Every operator who copy-pasted the Quick Start since U-3 has hit the "invalid compose project" error and bounced. The operator caught it running the demo locally today. This commit fixes the three certctl-repo sites: README.md (Quick Start) docker compose -f deploy/docker-compose.demo.yml up -d --build → docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build Plus the "drop the -f flag for clean install" prose now spells out the correct fallback (`-f deploy/docker-compose.yml` alone). docs/contributor/qa-prerequisites.md (Step 1) Same single-file → two-file fix, plus an inline note explaining why the override-only file requires the base (so the next person who reads it understands the contract instead of re-discovering it). 
deploy/ENVIRONMENTS.md (Demo Overlay → What it adds) Replaced the stale "One line: mounts seed_demo.sql into PostgreSQL's init directory" claim — that hasn't been true since U-3 — with the accurate "One env var: CERTCTL_DEMO_SEED=true; server applies seed_demo.sql at boot via postgres.RunDemoSeed" description, plus the historical context for why the overlay can't stand alone. The certctl.io landing page hits the same bug (line 759); fix shipping in a separate commit in that repo. Acceptance gate (manual): - copy/paste the new README Quick Start command end-to-end against a fresh clone — succeeds, dashboard at https://localhost:8443 shows the seeded demo data within ~30s. - clean-install fallback (`docker compose -f deploy/docker-compose.yml up -d --build`) starts a working stack with no demo data.	2026-05-05 20:55:26 +00:00
shankar0123	f40e975439	gui(certificates): surface profile contract in create-cert form (closes P3-3, P3-4, P3-5) Closes findings P3-3, P3-4, P3-5 from the 2026-05-05 CLI/API/MCP↔GUI parity audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). The audit flagged three "hidden defaults" in the create-certificate form: environment='production', shortLived=false, selectedEkus=['serverAuth']. Re-grounding against the live source: P3-3 was a false positive. The form already exposes an environment selector with three options (Production / Staging / Development) and defaults to Production. No change needed — covered by new test pin. P3-4 + P3-5 misread the architecture. allow_short_lived and allowed_ekus are NOT per-cert form-state fields; they are properties of the CertificateProfile that the operator binds via the existing Profile dropdown. Adding form-level toggles for them would contradict the profile-as-primitive design (the profile carries the policy contract — TTL, EKUs, key-algo allow-list, short-lived eligibility — so the cert can inherit a coherent set rather than letting operators hand-mix invalid combinations). The genuine UX gap was opacity: operators picked a profile without seeing what allow_short_lived / allowed_ekus the profile carried. This commit closes the spirit of the finding by surfacing the selected profile's load-bearing properties in a read-only "Profile contract" panel that appears below the Profile dropdown once a profile is selected. 
The panel shows: - allowed_ekus list (so operators see whether a profile is serverAuth, emailProtection, codeSigning, or a mix) - allow_short_lived flag (highlighted when true so operators know they're picking a profile that allows TTL < 1h CRL/OCSP-exempt certs per the M15b regime) - explanatory text that EKUs and short-lived eligibility are profile-level (not per-cert), guiding operators to edit the profile or pick a different one Test pins (web/src/pages/CertificatesPage.test.tsx): - environment selector renders with 3 options, defaults to production - environment selector toggles to staging / development on change - Profile contract panel is hidden until a profile is selected - Profile contract panel surfaces allowed_ekus when a TLS-server profile is picked - Profile contract panel surfaces emailProtection EKU when an S/MIME profile is picked (closes the "S/MIME flows can't be initiated from the GUI" sub-finding — they can, by picking an emailProtection profile) - Profile contract panel flags allow_short_lived=true when an IoT short-lived profile is picked (closes the "operators can't issue short-lived certs through the GUI" sub-finding — they can, by picking an allow_short_lived profile) Implementation notes: - data-testid='cert-form-environment' + 'cert-form-profile' + 'cert-form-profile-detail' added to make the test selectors stable across DOM-restructuring refactors. No production behaviour change from the test IDs. - No new dependencies; no form-library introduction (per the prompt's out-of-scope list); uses the existing bare React state pattern. - No API changes — Certificate.allowed_ekus / allow_short_lived already exist on the CertificateProfile type in web/src/api/types.ts. Acceptance gate (verified): - npm test on src/pages/CertificatesPage.test.tsx: 12/12 pass (6 pre-existing T-1 tests + 6 new P3-3..P3-5 pins). - All sibling page tests (AuditPage, TargetDetailPage, ShortLivedPage, etc.) still pass.	2026-05-05 19:49:59 +00:00
shankar0123	0e06f6c4fc	cli: promote --force on renew + require --reason on revoke (closes P3-1, P3-2) Closes findings P3-1 and P3-2 from the 2026-05-05 CLI/API/MCP↔GUI parity audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Both findings flagged hidden defaults that the CLI was sending without exposing them to operators: `force=false` baked into every renew payload, and a silent fallback to `reason="unspecified"` whenever --reason was omitted. P3-1 — promote --force on `certs renew` (full end-to-end plumbing) The pre-2026-05-05 CLI sent `{"force": false}` in the renew body. The API handler never decoded it — a textbook "lying field" per the operator's CLAUDE.md "complete path, not the easy path" rule: the body field stored a value, claimed to do something, and silently did nothing because the wire never reached the consumer. Adding a --force flag that also went unread would have created another lying field. This commit takes the complete path: service.CertificateService.TriggerRenewal grew a `force bool` parameter (internal/service/certificate.go). When force=true, the RenewalInProgress block is overridden so operators can recover stuck in-flight renewals where a previous job hung without releasing the status flag. Archived and Expired remain terminal blockers regardless of force — those are semantic dead-ends that --force should not paper over (archived = decommissioned, expired = issue a new cert instead of renewing a dead one). handler.CertificateHandler.TriggerRenewal parses force from ?force=true (or ?force=1) query param, OR {"force": true} JSON body, whichever the client picks. Defaults to false. Passes through to the service. internal/cli/client.go::RenewCertificate(id, force bool) sends ?force=true on the URL when --force is set. The historical hardcoded `{"force": false}` body is gone — no more lying field. 
cmd/cli/main.go dispatches `certs renew <id> [--force]` (ID-first flag-second convention matches the existing `agents retire <id> [--force]`). P3-2 — require --reason on `certs revoke` (Option A: strict refusal) The pre-2026-05-05 CLI dropped to `--reason unspecified` whenever the operator omitted the flag. Compliance reporting (RFC 5280 §5.3.1, PCI- DSS §3.6, HIPAA §164.312) relies on the reason code being meaningful; silent fallback defeats the audit trail because every revocation looks identical. cmd/cli/main.go dispatch refuses to send when --reason is empty, prints the canonical RFC 5280 §5.3.1 reason-code menu, and exits non-zero. internal/cli/client.go exposes ValidRevokeReasons() returning the canonical camelCase list (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, removeFromCRL, privilegeWithdrawn, aaCompromise) and NormalizeRevokeReason() that accepts both camelCase and snake_case inputs and normalises to the canonical wire form. Off-list reasons are rejected at dispatch with the menu re-printed. Test pins: internal/cli/client_test.go::TestClient_RenewCertificate_ForceFlag — --force=true sends ?force=true with empty body; --force=false sends no query and no body. internal/cli/client_test.go::TestNormalizeRevokeReason + TestValidRevokeReasons — canonical-camelCase + snake_case + reject- off-enum behaviour. cmd/cli/dispatch_test.go::TestHandleCerts_Revoke_RequiresReason + TestHandleCerts_Revoke_RejectsUnknownReason + TestHandleCerts_Renew_ForceFlag — dispatch-layer pins for the same contracts. internal/api/handler/certificate_handler_test.go::TestTriggerRenewal_ ForceQueryParam — query-param passthrough (no-flag, force=true, force=1, force=false) flows through to the service-layer parameter. internal/service/certificate_test.go::TestTriggerRenewal_ ForceOverridesInProgress — force=false preserves the RenewalInProgress block; force=true clears it. 
Existing TestTriggerRenewal_Archived extended to assert force=true still blocks Archived (terminal-state guarantee). Docs: docs/reference/cli.md updated with the --force example for renew and the strict --reason semantics for revoke (including snake_case input acceptance). Acceptance gate (verified): - go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... ./cmd/mcp-server/... clean. - go vet ./... clean. - go test -short -count=1 ./... pass repo-wide. - bash scripts/ci-guards/openapi-handler-parity.sh clean (router 178, OpenAPI 144, exceptions 36 — unchanged; we add parameter parsing, not routes). - gofmt -l clean.	2026-05-05 19:49:34 +00:00
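The strict `--reason` contract described in this commit can be sketched as below. The reason list is copied from the commit message. The folding rule (lowercase, underscores removed) is an assumption about how camelCase and snake_case inputs might be reconciled, and `normalizeRevokeReason` here is an illustrative stand-in, not the repository's `NormalizeRevokeReason`.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalReasons is the camelCase list quoted in the commit message.
var canonicalReasons = []string{
	"unspecified", "keyCompromise", "caCompromise", "affiliationChanged",
	"superseded", "cessationOfOperation", "certificateHold",
	"removeFromCRL", "privilegeWithdrawn", "aaCompromise",
}

// foldReason lowercases the input and strips underscores so that
// "key_compromise" and "keyCompromise" compare equal (assumed rule).
func foldReason(s string) string {
	return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(s), "_", ""))
}

// normalizeRevokeReason maps a camelCase or snake_case input to the
// canonical wire form. An empty or off-list reason is an error that
// carries the menu of valid reasons, mirroring the strict refusal.
func normalizeRevokeReason(input string) (string, error) {
	menu := strings.Join(canonicalReasons, ", ")
	if strings.TrimSpace(input) == "" {
		return "", fmt.Errorf("--reason is required; valid reasons: %s", menu)
	}
	key := foldReason(input)
	for _, r := range canonicalReasons {
		if foldReason(r) == key {
			return r, nil
		}
	}
	return "", fmt.Errorf("unknown reason %q; valid reasons: %s", input, menu)
}

func main() {
	for _, in := range []string{"key_compromise", "removeFromCRL", "", "because"} {
		reason, err := normalizeRevokeReason(in)
		if err != nil {
			fmt.Println("refused:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", in, reason)
	}
}
```

Returning the menu inside the error keeps the dispatch layer simple: it only has to print the error and exit non-zero.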
shankar0123	ff75361553	mcp(coverage): add 34 tools across 7 domains to close 2026-05-05 parity audit P1 findings Closes findings P1-1..P1-35 from the 2026-05-05 CLI/API/MCP↔GUI parity audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Before this bundle, 35 operator-facing API endpoints had GUI surfaces but no MCP counterpart — operators using AI assistants for cert lifecycle work in regulated environments had to drop to curl for approve/reject, health-check acknowledgement, renewal-policy CRUD, network-scan triggering, discovery triage, intermediate-CA management, and job verification. Tool count: 87→121 in tools.go (+34), 6 unchanged in tools_est.go. Re-derive via grep -cE 'gomcp\\.AddTool\\(' internal/mcp/tools.go internal/mcp/tools_est.go. The 7 phases (matching the bundle prompt at cowork/mcp-coverage-expansion-prompt.md): Phase A — Approvals (P1-28..P1-31, 4 tools) list_approvals, get_approval, approve_request, reject_request. Two-person-integrity contract (ErrApproveBySameActor → HTTP 403) is preserved automatically: the decided_by actor is derived server-side from middleware.UserKey, NOT from request body, so the MCP server's authenticated API-key identity becomes the audit-trail actor. The MCP input schema deliberately omits any actor_id field to prevent client-side spoofing. Phase B — Health Checks (P1-20..P1-27, 8 tools) list, summary, get, create, update, delete, history, acknowledge. Mirrors the existing target-resource shape; acknowledge takes optional 'actor' string captured in the audit row (handler defaults to 'unknown' if absent). Phase C — Renewal Policies (P1-1..P1-5, 5 tools) Standard CRUD against /api/v1/renewal-policies. Distinct from the legacy 'policy' tools that point at the same path — these expose the renewal-policy domain explicitly with full alert_channels + alert_severity_map field shape. Phase D — Network Scan Targets (P1-14..P1-19, 6 tools) CRUD + trigger_scan. 
trigger_network_scan returns the discovery- scan body so the AI can chain into list_discovered_certificates filtered by agent_id. Phase E — Discovery read-side (P1-10..P1-13, 4 tools) list_discovered_certificates, get_discovered_certificate, list_discovery_scans, discovery_summary. Complements the pre-existing claim/dismiss tools (registered alongside Health historically per the I-2 closure). Phase F — Intermediate CAs (P1-6..P1-9, 4 tools) list, create (root + child via discriminator on body shape), get, retire. The handler is admin-gated via middleware.IsAdmin; the least-privilege boundary is enforced at the API layer (HTTP 403 for non-admin Bearer callers) — not by transport carve-out. Phase G — Verification + deployments (P1-32, P1-34, P1-35, 3 tools) list_certificate_deployments, verify_job, get_job_verification. P1-33 (POST /api/v1/agents/{id}/discoveries) is intentionally excluded — machine-to-machine push channel for agents reporting filesystem-scan results, not an operator-driven flow. Documented inline in the RegisterTools dispatch. Implementation: - 14 new input types in internal/mcp/types.go with jsonschema struct tags driving LLM tool discovery. - 7 register* functions in internal/mcp/tools.go each handling one phase, wired into RegisterTools dispatch in declaration order. - 34 new entries in tools_per_tool_test.go::allHappyPathCases — the existing in-process MCP harness (TestMCP_AllTools_HappyPath + TestMCP_AllTools_ErrorPath + TestMCP_RegisterTools_DispatchableToolCount) auto-extends coverage to cover every new tool: happy-path round- trip with fence-shape assertion, 5xx error-path with MCP_ERROR fence propagation, and 'every registered tool is dispatchable' guard. - docs/reference/mcp.md 'Available Tools' table expanded from 16 to 22 resource domains with current per-domain tool counts. Acceptance gate (verified): - go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... ./cmd/mcp-server/... clean across all four production binaries. - go vet ./... 
clean. - go test -short -count=1 ./internal/mcp/... pass (TestMCP_AllTools_* expanded to 127 tool round-trips). - go test -short -count=1 ./... pass repo-wide. - bash scripts/ci-guards/openapi-handler-parity.sh clean (router 178, OpenAPI 144, exceptions 36 — unchanged; we add MCP wrappers, not routes). - gofmt -l clean across the four touched files.	2026-05-05 19:29:57 +00:00
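The Phase A note says the approving actor comes from the authenticated request context rather than the request body. A rough sketch of that idea using only the standard library follows. `userKey` stands in for `middleware.UserKey` (whose real type is not shown in this log), and `approveInput`, `approve`, and `approveAs` are hypothetical names.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

type ctxKey string

// userKey is a stand-in for middleware.UserKey (assumption).
const userKey ctxKey = "user"

var errApproveBySameActor = errors.New("approver must differ from requester")

// approveInput deliberately has no actor field, so a spoofed
// "actor_id" in the body is silently dropped during decoding.
type approveInput struct {
	ID   string `json:"id"`
	Note string `json:"note,omitempty"`
}

// approve derives the deciding actor from the context only and
// enforces the two-person rule against the original requester.
func approve(ctx context.Context, requestedBy string, body []byte) (string, error) {
	var in approveInput
	if err := json.Unmarshal(body, &in); err != nil {
		return "", fmt.Errorf("decode body: %w", err)
	}
	actor, ok := ctx.Value(userKey).(string)
	if !ok || actor == "" {
		return "", errors.New("no authenticated actor in context")
	}
	if actor == requestedBy {
		return "", errApproveBySameActor
	}
	return actor, nil
}

// approveAs simulates the auth middleware by placing the
// authenticated identity into the context before calling approve.
func approveAs(authenticated, requestedBy string, body []byte) (string, error) {
	ctx := context.WithValue(context.Background(), userKey, authenticated)
	return approve(ctx, requestedBy, body)
}

func main() {
	// The body tries to claim a different actor; it has no effect.
	spoofed := []byte(`{"id":"apr-1","actor_id":"bob"}`)
	fmt.Println(approveAs("alice", "alice", spoofed))
	fmt.Println(approveAs("carol", "alice", spoofed))
}
```

Because the input type has no actor field, the spoofed `actor_id` never reaches the decision logic. This is the property the commit attributes to omitting `actor_id` from the MCP input schema.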
shankar0123	e0aaa967c9	docs(README): add MCP server bullet to capabilities list The README's 'What it does' section enumerated 11 capability bullets (issuers / targets / ACME server / SCEP server / EST server / hierarchy / approvals / discovery / revocation / alerts) but had zero mention of the MCP server. The 2026-05-05 CLI/API/MCP ↔ GUI parity audit confirmed 93 MCP tools shipped today (87 in internal/mcp/tools.go + 6 in internal/mcp/tools_est.go) covering the full API surface. That's a real differentiator hidden from anyone landing on the README. Adds a 12th bullet positioning the MCP server with concrete example queries operators can ask their AI client (expiring certs, revoke with key-compromise reason, agent offline check). Frames the architectural facts: separate binary at cmd/mcp-server/, stateless stdio transport, no extra auth surface beyond the existing API key, no extra attack surface. Links to docs/reference/mcp.md for setup details.	2026-05-05 19:10:27 +00:00
shankar0123	17455d2ea2	deps(web): pin picomatch to >=4.0.4 via npm override; clears 4 dependabot alerts Dependabot flagged four picomatch vulnerabilities in web/package-lock.json: #8 GHSA-?, ReDoS via extglob quantifiers #9 GHSA-?, ReDoS via extglob quantifiers (related to #8) #10 CVE-2026-33672 / GHSA-3v7f-55p6-f55p, method injection via POSIX character classes (related; affecting < 2.3.2) #11 CVE-2026-33672 / GHSA-3v7f-55p6-f55p, method injection via POSIX character classes — same advisory as #10, separate Dependabot row because it surfaces against a second copy of picomatch in the dep tree All four close on the same fix: every resolved picomatch instance must be >= 4.0.4 (or >= 3.0.2, or >= 2.3.2 — the patch shipped on all three release lines). Pre-fix the lockfile carried at least two vulnerable copies: node_modules/picomatch v2.3.1 (vuln) node_modules/vitest/node_modules/picomatch v4.0.3 (vuln for #11) node_modules/vite/node_modules/picomatch v4.0.4 (ok) node_modules/tinyglobby/node_modules/picomatch v4.0.4 (ok) Reachability check before fixing: - picomatch is a build-time glob-matching tool (used by tailwindcss → readdirp/anymatch/micromatch chain, plus by vite + vitest internals). - All instances in our tree are dev=true. None are bundled into the React production output (web/dist/assets/*.js) — that's just the React SPA, no node_modules at runtime. - The CVE only affects code that processes UNTRUSTED glob patterns. Our build pipeline only globs operator-controlled file patterns (TSX source files, Tailwind 'content' globs). Not network-reachable. So the CVE was not reachable from any shipped certctl artefact. Fix anyway because the alerts are noise. Fix mechanism: add an npm 'overrides' entry pinning picomatch to ^4.0.4 across all consumers. npm collapses every transitive picomatch resolution to the override, so the lockfile shrinks from 4 picomatch entries to 1, all on v4.0.4 (patched). 
Verification: npm install --package-lock-only → up to date, 0 vuln npm audit → found 0 vulnerabilities Diff: 2 files, 7 insertions / 43 deletions (net negative — the override de-duplicates the picomatch tree). Closes: GHSA-3v7f-55p6-f55p, CVE-2026-33672 (alerts #10, #11) + the two related ReDoS picomatch alerts (#8, #9)	2026-05-05 18:40:10 +00:00
shankar0123	f2c77ba3fb	deps: bump testcontainers-go v0.35.0 → v0.42.0; drops docker/docker dep entirely (clears CVE-2026-34040) Dependabot flagged GHSA-x744-4wpc-v9h2 / CVE-2026-34040 (Moby AuthZ plugin bypass on oversized request bodies, incomplete fix for CVE-2024-41110) on the transitive github.com/docker/docker v27.1.1+incompatible pulled in via testcontainers-go v0.35.0. Reachability check before fixing: - certctl does not run dockerd or configure AuthZ plugins. - go list -deps ./cmd/{server,agent,cli,mcp-server}/... finds zero docker/docker references in any production binary's transitive set. - testcontainers is consumed only by *_test.go files under internal/repository/postgres/ + deploy/test/ for ephemeral Postgres containers. So the CVE was not reachable from any shipped certctl artefact. Bump anyway because Dependabot noise is noise; the upgrade is mechanical. Bumping testcontainers-go v0.35.0 → v0.42.0 (latest, 2026-04-09) removes the direct docker/docker dependency entirely — testcontainers v0.42.0 reorganized away from the Moby SDK. After 'go mod tidy', docker/docker is GONE from both go.mod and go.sum, not merely bumped. The Dependabot alert closes automatically on push. Co-bumped transitives (cascading from testcontainers' new dep tree): go.opentelemetry.io/otel v1.24.0 → v1.41.0 go.opentelemetry.io/otel/{metric,trace} v1.24.0 → v1.41.0 go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 → v0.60.0 go.opentelemetry.io/auto/sdk added @ v1.2.1 golang.org/x/crypto v0.45.0 → v0.48.0 golang.org/x/net v0.47.0 → v0.49.0 golang.org/x/sync v0.18.0 → v0.19.0 golang.org/x/sys v0.40.0 → v0.42.0 golang.org/x/text v0.31.0 → v0.34.0 Verification (all green): go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... \ ./cmd/mcp-server/... → exit 0 go test -run=NONE -count=1 ./internal/repository/postgres/ → ok go test -tags=integration -run=NONE -count=1 ./deploy/test/ → ok go vet ./internal/repository/postgres/... 
→ clean go list -deps ./cmd/{server,agent,cli,mcp-server}/... | grep docker → zero hits Diff: 2 files (go.mod, go.sum), 129 insertions / 144 deletions. Closes: GHSA-x744-4wpc-v9h2, CVE-2026-34040	2026-05-05 18:34:31 +00:00
shankar0123	d2b62880ce		2026-05-05 18:18:38 +00:00
shankar0123	75097909e9		2026-05-05 18:18:29 +00:00
shankar0123	7c5cc57d75		2026-05-05 15:39:08 +00:00
shankar0123	9acf609ac9	docs: convert ASCII flow diagram to Mermaid in test-environment.md Per operator audit: every diagram in docs/ should be Mermaid except in the repo-root README.md. The 'Key Generation Flow (Agent-Side)' section in docs/contributor/test-environment.md was rendered as a plain code fence with arrow-prose: Server creates job (AwaitingCSR) → Agent polls, sees job → Agent generates ECDSA P-256 key pair locally → ... That was the only non-Mermaid diagram-shaped block left in docs/. Converted to a Mermaid sequenceDiagram with 5 participants (certctl-server, issuer connector, certctl-agent, local agent FS, shared volume) covering the full AwaitingCSR → CSR-submit → Deployment-job → cert-write → Completed lifecycle. Audit + verification script: cowork/docs-audit-2026-05-05/mermaid-audit.md. Re-running the detection script post-fix returns zero non-Mermaid diagram-like blocks across all 76 docs/ markdown files. Total Mermaid coverage in docs/ now: 14 docs / 40 blocks.	2026-05-05 06:18:24 +00:00
shankar0123	622cd29f20	docs: factuality sweep — fix 3 broken links + 12 count claims (audit findings 2026-05-05) Per the cowork/docs-audit-2026-05-05/ end-to-end factuality audit (20 confirmed findings across 76 docs, 7 parallel subagents + audit-of-the-audit). Hot + Warm tier fixes ship here; STALE findings (qa-test-suite.md test-count snapshot) need 'make qa-stats' which is operator-side. BROKEN links repaired (3): - docs/reference/api.md L195: [Quick Start](quickstart.md) → ../getting-started/quickstart.md (404 pre-fix) - docs/reference/api.md L196: [Connector Guide](connectors.md) → connectors/index.md (Phase 4 rename, was 404 pre-fix) - docs/reference/protocols/scep-intune.md L377: [legacy-est-scep.md](legacy-est-scep.md) → scep-server.md (file was deleted in Phase 7 commit `e9b1510`) INCORRECT count claims repaired (12): - api.md L5 + L18-19 + L155: '78 API operations' / '# 78' / 'all 78 documented operations' → re-derive via grep -cE '^\s+operationId:' (actual at HEAD: 144) - architecture.md L66 (Mermaid label) + L502 + L1047 + L1253: '8 always-on + 4 optional loops' / '12-loop topology' → 9 always-on + 5 opt-in loops (14 total). Always-on/opt-in breakdown derived from cmd/server/main.go startup wiring: always-on are agentHealthCheck, crlGeneration, jobProcessor, jobRetry, jobTimeout, notificationProcess, notificationRetry, renewalCheck, shortLivedExpiryCheck (9); opt-in are networkScan, digest, healthCheck, cloudDiscovery, acmeGC (5). Re-derive count via grep -cE '^func \(s \*Scheduler\) [a-zA-Z]+Loop' internal/scheduler/scheduler.go. - configuration.md L31: '12 loops, 8 always-on + 4 opt-in' → '14 loops, 9 always-on + 5 opt-in'. Self-introduced regression from commit `3275f9f` (2026-05-05). - mcp.md L11 + L65: 'all 78 API endpoints' / '78 available tools' → re-derive via grep -cE 'mcp\.AddTool\(' (actual at HEAD: 87 MCP tools, 144 API operations). 
- connectors/index.md L111: '9 built-in' issuer connectors → '12 built-in', extending the inline enumeration to include Entrust, GlobalSign, EJBCA (which had been added since the L111 prose was written). Local-CA framing extended to mention tree mode + ADCS sub-CA mode-doc. - connectors/index.md L112: '14 built-in' target connectors → '15 built-in', adding AWS ACM target + Azure Key Vault target (which had been added since the L112 prose was written). - why-certctl.md L37 + the inline list: 'Nine issuer connectors ship today' → 'Twelve issuer connectors', adding AWS ACM PCA, Entrust, GlobalSign, EJBCA to the list and removing the misleading 'EST enrollment' bullet (EST is a protocol surface, not an issuer; clarified in trailing note). - why-certctl.md L66: '13 deployment targets' → '15', adding Kubernetes Secrets, AWS ACM, and Azure KV to the inline list. - why-certctl.md L92: 'supports 9 issuer types' → '12 issuer types'. - quickstart.md L135: '35 demo certificates across 5 issuers' → re-derive cert count via 'grep -oE "mc-[a-z0-9_-]+" migrations/seed_demo.sql | sort -u | wc -l' (actual: 32, matches README L86; quickstart was off-by-3). - quickstart.md L452 (Demo Data Reference table): Certificates '35' → '32' (matches the cert count from seed_demo.sql). Verification: - grep confirms no remaining stale refs across the touched files (8 files, 31 insertions / 28 deletions). - All 24 ci-guards/*.sh pass locally. - The audit's STALE findings (S-1, S-2 qa-test-suite.md Bundle-P snapshot) are operator-side: run 'make qa-stats' to refresh the Test Suite Health table. Companion: cowork/docs-audit-2026-05-05/RESULTS.md captures the full audit with subagent false positives and missed findings called out.	2026-05-05 06:15:35 +00:00
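This commit re-derives the scheduler-loop count by grepping for method declarations on the scheduler type. The same count can be sketched in Go, assuming the intended pattern is `^func \(s \*Scheduler\) [a-zA-Z]+Loop` applied per line. The sample source and the `countLoops` helper are illustrative, not taken from `internal/scheduler/scheduler.go`.

```go
package main

import (
	"fmt"
	"regexp"
)

// loopMethod matches, line by line, method declarations with a
// (s *Scheduler) receiver whose name contains a "...Loop" segment.
// (?m) makes ^ anchor at the start of every line, as grep does.
var loopMethod = regexp.MustCompile(`(?m)^func \(s \*Scheduler\) [a-zA-Z]+Loop`)

// countLoops returns how many lines of src declare a scheduler loop.
func countLoops(src string) int {
	return len(loopMethod.FindAllStringIndex(src, -1))
}

func main() {
	// Hypothetical excerpt: two loop methods, one non-loop method,
	// and one free function that must not be counted.
	sample := "" +
		"func (s *Scheduler) renewalCheckLoop(ctx context.Context) {}\n" +
		"func (s *Scheduler) crlGenerationLoop(ctx context.Context) {}\n" +
		"func (s *Scheduler) Start(ctx context.Context) {}\n" +
		"func helperLoop() {}\n"
	fmt.Println("loop methods:", countLoops(sample))
}
```

Anchoring on the receiver keeps free functions and methods on other types out of the count, which matters when the number is quoted in several docs.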
shankar0123	d809874fa1	docs: retire compliance subtree + sweep framework name-drops from prose Per operator decision the framework-mapping docs are gone. They were aspirational (no audit, no certification, no validated mapping); keeping them around was misleading. Files deleted (1,883 lines): - docs/compliance/index.md - docs/compliance/soc2.md - docs/compliance/pci-dss.md - docs/compliance/nist-sp-800-57.md Hyperlinks removed: - README.md: 'Auditor / compliance' row in the doc table; the '(compliance mapping included)' parenthetical in the positioning paragraph - docs/README.md: the '## Compliance' section table; the 'Auditor / compliance team' reading-order-by-role row Prose name-drops swept across 24 files: - README.md: 'FedRAMP boundary CAs / financial-services policy CAs' → '4-level boundary CAs / 3-level policy CAs'; 'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA' → cut entirely - getting-started/{quickstart,concepts,examples,why-certctl, advanced-demo}.md: 'compliance' → 'audit' / 'policy'; 'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut; ''pci': 'true'' tag example → ''environment': 'production'' - migration/cert-manager-coexistence.md: 'compliance rules' → 'policy rules' - operator/approval-workflow.md: 'Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' → 'Operators'; entire 'Compliance control mapping' table (PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1 / HIPAA §164.308(a)(4)) deleted; 'compliance contract' → 'two-person-integrity contract'; 'compliance auditors' → 'reviewers' - operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5' audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5 attestation' section retitled to 'TLS posture summary' and rewritten without framework framing; 'PCI-DSS, NIST, and major browsers will eventually deprecate TLS 1.2' → 'Major browsers and OS vendors will eventually deprecate TLS 1.2' - operator/database-tls.md: 
PCI-DSS Req 4 §2.2.5 audit-ref → CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS Req 4 v4.0 prose footing → cut - operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI procurement-team deliverable' → 'on-call deliverable'; 'compliance auditors' → 'reviewers' - reference/connectors/{acme,aws-acm,azure-kv,globalsign, local-ca,openssl,ssh,index}.md: 'compliance reporting (PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting'; 'Compliance environments (PCI-DSS Level 1, FedRAMP High, HIPAA)' → 'Regulated environments'; 'compliance audits' → 'audit'; 'FedRAMP boundary CA' pattern names → '4-level boundary CA' (technically descriptive) - reference/protocols/est.md: 'compliance-hook seam' → 'device-state hook seam'; 'compliance gating' → 'device-state gating'; 'est_compliance_failed' → 'est_device_state_failed' - reference/protocols/scep-intune.md: 'Optional compliance check' → 'Optional device-state check'; failure-counter 'compliance_failed' → 'device_state_failed'; 'Conditional Access compliance gating' → 'Conditional Access device-state gating' - reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA deployments where the regulator requires...' 
→ 'Boundary-CA deployments where you want separation of policy and issuing authorities'; pattern A retitled '4-level FedRAMP boundary CA' → '4-level boundary CA' - reference/architecture.md: broken Related-docs link to compliance.md removed; the rest of that block had stale pre-Phase-2 paths (quickstart.md, demo-advanced.md, connectors.md, openapi.md, testing-guide.md, test-env.md) — retargeted to current locations - reference/deployment-model.md: 'SOC 2 evidence-report generator' → 'Audit-evidence report generator' - reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this into evidence packs' → 'reviewers paste this into vendor-evaluation packs' - contributor/qa-test-suite.md: 'compliance exist' coverage description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)' risk-class label → 'Audit-relevant' What was kept: - CWE references (legitimate technical pointers) - Microsoft API/feature names that happen to use 'compliance' literally ('Microsoft Graph compliance API', 'device-compliance validators' — these are MS product names, not framework name-drops) - 'NIST PQC' on the landing page (Post-Quantum Cryptography is the actual NIST standard family, not a compliance framework) Verified: zero hyperlinks into docs/compliance/ remain. All 24 ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean. Net diff: 26 files / -1,883 deletions in compliance/ + -32 net across the prose sweep. Companion edits in cowork/ (CLAUDE.md doc-tree summary + WORKSPACE-CHANGELOG.md retirement note) land separately.	2026-05-05 05:26:44 +00:00
shankar0123	5ea8fb48eb	ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit) Pure mode-change commit. The previous `3275f9f` commit dropped the executable bit (100755 → 100644) on five files in scripts/ci-guards/ plus scripts/qa-doc-seed-count.sh and scripts/dev-setup.sh (a sandbox-tooling artefact, not intentional). The CI pipeline calls each guard via 'bash "$g"' so the missing exec bit didn't break anything operationally, but operators who run a guard directly via './scripts/ci-guards/<id>.sh' would hit a permission-denied. Restore to 100755 to match the rest of scripts/ci-guards/*.sh. No content changes.	2026-05-05 04:56:43 +00:00
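The mode regression described in `5ea8fb48eb` (100755 → 100644) can be detected mechanically. The following is a minimal sketch, not the project's own tooling: the helper name `missing_exec_bit` is hypothetical, and the demo runs against a throwaway directory that stands in for `scripts/ci-guards/`. It assumes a POSIX filesystem where `chmod` mode bits are honoured.

```python
import os
import stat
import tempfile
from pathlib import Path


def missing_exec_bit(directory: str, pattern: str = "*.sh") -> list[str]:
    """Return names of files matching `pattern` that lack the owner-execute bit."""
    flagged = []
    for path in sorted(Path(directory).glob(pattern)):
        mode = path.stat().st_mode
        if not mode & stat.S_IXUSR:  # owner cannot execute the file directly
            flagged.append(path.name)
    return flagged


# Demo on a temporary directory (a stand-in for scripts/ci-guards/).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "a-guard.sh")  # hypothetical file names
    bad = Path(tmp, "b-guard.sh")
    for p in (good, bad):
        p.write_text("#!/usr/bin/env bash\nexit 0\n")
    os.chmod(good, 0o755)  # recorded by git as 100755
    os.chmod(bad, 0o644)   # recorded by git as 100644
    demo_result = missing_exec_bit(tmp)
    print(demo_result)
```

A check like this only matters for operators who invoke a guard as `./scripts/ci-guards/<id>.sh`; as the commit notes, the `bash "$g"` invocation in CI does not need the bit.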
shankar0123	3275f9f1e0	ci: post-Phase-2-docs-overhaul cleanup of stale guards + missing config doc CI run on the `ecb8896` push surfaced two real failures rooted in the 2026-05-04 docs overhaul: 1. G-3 env-docs-drift caught two phantom CERTCTL_* env vars I'd introduced in the Phase 4 follow-on connector pages (CERTCTL_CA_CERT_PATH_NEW in adcs.md was a placeholder I made up; CERTCTL_EJBCA_POLL_MAX_WAIT_SECONDS in ejbca.md does not exist in source). Both removed. 2. QA-doc Part-count drift guard tried to grep docs/qa-test-guide.md and docs/testing-guide.md, both of which were renamed/deleted in Phase 2/Phase 5. The Part-count drift class died with testing-guide.md (Phase 5 prune dispersed its content); the seed-count drift class is still live but pointed at the wrong path. Fixes: - Removed the QA-doc Part-count drift guard from ci.yml (premise dead) plus its standalone scripts/qa-doc-part-count.sh peer. - Retargeted the QA-doc seed-count drift guard from docs/qa-test-guide.md → docs/contributor/qa-test-suite.md (the Phase 2 target). Updated both ci.yml inline copy and scripts/qa-doc-seed-count.sh. - Updated Makefile qa-stats: target to drop the testing-guide.md Parts metric (file is gone). - Updated Makefile verify-docs: target to drop the part-count step. G-3 was also failing in the second direction (env vars defined in config.go but never documented anywhere). 16 vars surfaced — features.md (deleted Phase 6) and testing-guide.md (deleted Phase 5) had been their canonical home. Created docs/reference/configuration.md as the new home: a compact operator-facing env-var reference covering scheduler intervals, job lifecycle, rate limiting, audit, deploy verify, database, agent-side, and SCEP profile binding. Added to docs/README.md Reference table. Doc-side updates to qa-test-suite.md to reframe its references to the deleted testing-guide.md (it's now self-contained: the Part-by-Part Coverage Map IS the canonical Part inventory). 
Cosmetic comment-only updates in ci.yml + scripts/ci-guards/*.sh + scripts/dev-setup.sh to point at the new audience-organized doc paths (docs/operator/security.md, docs/operator/tls.md, docs/reference/architecture.md, etc.) instead of the pre-Phase-2 flat layout. Verified: all 24 ci-guards/*.sh pass locally; qa-doc-seed-count.sh clean. Net diff: 178 additions / 112 deletions across 13 files. One file deleted (qa-doc-part-count.sh) and one file added (docs/reference/configuration.md).	2026-05-05 04:56:26 +00:00
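The two failure directions that G-3 env-docs-drift reported in `3275f9f1e0` (variables documented but absent from source, and variables defined in source but never documented) reduce to two set differences. The sketch below illustrates that idea only; the real guard's implementation is not shown in this log, the function name `env_drift` and the regular expression are assumptions, and the `CERTCTL_EXAMPLE_*` names are invented placeholders rather than real configuration variables.

```python
import re

# Assumed naming convention: every variable starts with CERTCTL_.
ENV_VAR = re.compile(r"\bCERTCTL_[A-Z0-9_]+\b")


def env_drift(source_text: str, docs_text: str) -> tuple[set[str], set[str]]:
    """Return (phantom, undocumented).

    phantom:      named in docs but not defined in source
    undocumented: defined in source but not named in docs
    """
    in_source = set(ENV_VAR.findall(source_text))
    in_docs = set(ENV_VAR.findall(docs_text))
    return in_docs - in_source, in_source - in_docs


# Placeholder inputs; the variable names are invented for illustration.
source = 'os.Getenv("CERTCTL_EXAMPLE_A"); os.Getenv("CERTCTL_EXAMPLE_B")'
docs = "Set `CERTCTL_EXAMPLE_A` to tune X. `CERTCTL_EXAMPLE_MADE_UP` does Y."

phantom, undocumented = env_drift(source, docs)
print(sorted(phantom), sorted(undocumented))
```

In the commit's terms, the two removed connector-page variables fall in the `phantom` set and the 16 variables that prompted docs/reference/configuration.md fall in the `undocumented` set.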
shankar0123	ecb8896b1c	docs: cleanup pre-existing broken links in connector pages Phase 4 structural (commit `633e440`) moved 6 connector files into the new docs/reference/connectors/ subdirectory but didn't update all inter-doc references for the new path layout. Phase 11 caught the high-traffic ones; this sweep gets the rest, found by the Phase 4 follow-on verification pass. Mappings applied (relative to docs/reference/connectors/): deployment-atomicity.md → ../deployment-model.md deployment-vendor-matrix.md → ../vendor-matrix.md architecture.md → ../architecture.md est.md → ../protocols/est.md scep-intune.md → ../protocols/scep-intune.md async-polling.md → ../protocols/async-ca-polling.md quickstart.md → ../../getting-started/quickstart.md demo-advanced.md → ../../getting-started/advanced-demo.md legacy-est-scep.md → ../protocols/scep-server.md connectors.md → index.md Plus prose backtick references (`docs/architecture.md` etc.) updated to the new subdirectory paths. Files touched: apache, f5, iis, k8s, nginx, index. 33 line changes. Full link-check across docs/reference/connectors/*.md is now clean (0 broken inter-doc references).	2026-05-05 04:10:09 +00:00
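The relative-path mappings listed in `ecb8896b1c` can be sanity-checked by resolving each link against the directory of the file that contains it. This is a minimal sketch under stated assumptions: `resolve_link` and `broken_links` are hypothetical helpers, the set of existing files is hard-coded from paths named in the commit message, and the actual link-check tooling used for the sweep is not shown in this log.

```python
import posixpath


def resolve_link(source_doc: str, link: str) -> str:
    """Resolve a relative markdown link against the file that contains it."""
    target = link.split("#", 1)[0]  # drop any #anchor before resolving the path
    base = posixpath.dirname(source_doc)
    return posixpath.normpath(posixpath.join(base, target))


def broken_links(source_doc: str, links: list[str], existing: set[str]) -> list[str]:
    """Return the links whose resolved path is not in the set of known files."""
    return [l for l in links if resolve_link(source_doc, l) not in existing]


# Paths taken from the commit message; the set is a stand-in for the real tree.
existing = {
    "docs/reference/deployment-model.md",
    "docs/reference/protocols/est.md",
    "docs/getting-started/quickstart.md",
    "docs/reference/connectors/index.md",
}
links = [
    "../deployment-model.md",
    "../protocols/est.md",
    "../../getting-started/quickstart.md",
    "index.md",
    "quickstart.md",  # old flat-layout link, no longer valid from this directory
]

result = broken_links("docs/reference/connectors/nginx.md", links, existing)
print(result)
```

The last entry shows why the sweep was needed: a link that resolved in the flat `docs/` layout points at a non-existent sibling once the file lives two directories deeper.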
shankar0123	f179eab071	docs: expand docs/README.md connectors section to enumerate all 28 deep-dive pages After the Phase 4 follow-on (commits `fd94205` → `de06141` → `082b8cf` → `969853e`), the docs/reference/connectors/ tree carries 13 issuer per-pages + 15 target per-pages alongside the index. Update the top-level docs navigation to surface them all. Replaced the previous 5-row connectors table with two single-paragraph indexes (issuers, targets) listing every per-page in alphabetical order. The connectors index.md is still the canonical catalog (interfaces, registry, scanners + inline reference per built-in); the deep-dive pages cover operator-grade material on top. Net: docs/README.md gains coverage of 23 new pages without bloating the file (two prose paragraphs vs a 28-row table).	2026-05-05 04:08:08 +00:00
shankar0123	969853ee53	docs: Phase 4 follow-on batch 4 — 5 final target per-pages Extracts the remaining target connectors: - ssh.md (194 lines) — agentless SSH/SFTP deploy with full host-key-acceptance threat model (what's accepted, what's not, mitigations including known_hosts enforcement and SSH cert auth); V3-Pro forward path - wincertstore.md (118 lines) — non-IIS Windows services via local PowerShell or WinRM proxy mode; store selection (My / Root / WebHosting); private-key permissions guidance - jks.md (189 lines) — JKS / PKCS#12 via keytool with full atomic snapshot+rollback contract (Bundle 8 'snapshot → delete → import → reload'), keytool argv password exposure threat model + mitigations - aws-acm.md (208 lines) — ACM target with full IAM policy, IRSA / instance-profile / SSO auth recipes, atomic-rollback contract, ALB attachment Terraform recipe, procurement-checklist crib - azure-kv.md (195 lines) — Key Vault target with managed-identity / workload-identity / service-principal auth recipes, version- semantics rollback caveat (no in-place restore without soft-delete), App Gateway / Front Door attachment recipe Index forward-list expanded to enumerate all 15 target connectors (5 from Phase 4 structural + 5 from batch 3 + 5 from this batch) in alphabetical order. This is part 4 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 5 files, 904 lines. No content removed from index.md. End-state of Phase 4 follow-on: - 13 issuer per-pages (5 batch 1 + 8 batch 2) - 15 target per-pages (5 Phase 4 structural + 5 batch 3 + 5 batch 4) - index.md keeps its inline reference content; per-pages add operator depth on top, matching the pattern set by apache/f5/iis/k8s/nginx in Phase 4 structural	2026-05-05 04:07:21 +00:00
shankar0123	082b8cf660	docs: Phase 4 follow-on batch 3 — 5 file-based target per-pages Extracts the file-based deploy target connectors: - haproxy.md (107 lines) — combined-PEM (cert+chain+key) deploy with haproxy -c validate; multi-frontend + crt-list directory guidance - traefik.md (105 lines) — file-provider zero-reload deploy; file watcher latency notes; mixing with built-in ACME guidance - caddy.md (100 lines) — admin API mode (recommended) vs file mode; admin-API exposure threat model - envoy.md (112 lines) — file SDS mode (recommended) vs static bootstrap; service-mesh interactions - postfix.md (175 lines) — dual-mode (Postfix MTA / Dovecot IMAPS) connector with daemon-specific quirks (STARTTLS chain expectations, no shared session cache); Bundle 11 test pins Index forward-list expanded to enumerate all 10 target connectors (5 from Phase 4 structural + 5 from this batch) in alphabetical order. This is part 3 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 5 files, 599 lines. No content removed from index.md.	2026-05-05 04:02:25 +00:00
shankar0123	de06141ce5	docs: Phase 4 follow-on batch 2 — 8 remaining issuer per-pages Extracts the rest of the issuer per-connector deep-dive pages: - local-ca.md (170 lines) — Local CA self-signed / sub-CA / tree mode, CRL+OCSP endpoints, EKU support, MaxTTL enforcement, L-014 file-on- disk threat model carve-out - acme.md (235 lines) — RFC 8555 v2 client (HTTP-01 / DNS-01 / DNS-PERSIST-01), ARI per RFC 9773, EAB + ZeroSSL auto-EAB, Let's Encrypt profile selection, revoke-by-serial Top-10 fix #7 - step-ca.md (99 lines) — Smallstep JWK-provisioner synchronous issuance with MaxTTL enforcement - openssl.md (157 lines) — script-based shell-out with full threat model (what's accepted, what's not, mitigations, V3-Pro forward path) - sectigo.md (98 lines) — Sectigo SCM REST with bounded async polling - google-cas.md (89 lines) — GCP managed private CA with OAuth2 service-account auth + IAM-role guidance - entrust.md (96 lines) — Entrust CA Gateway mTLS-authenticated with approval-pending support and mTLS keypair caching - globalsign.md (122 lines) — Atlas HVCA dual auth (mTLS + API key/secret), region-aware base URLs, mTLS keypair caching Index forward-list expanded to enumerate all 13 issuer connectors (including the 5 pages from batch 1) in alphabetical order. This is part 2 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 8 files, 1,066 lines. No content removed from index.md.	2026-05-05 03:59:35 +00:00
shankar0123	fd94205cfa	docs: Phase 4 follow-on batch 1 — 5 issuer per-pages Extract the first 5 issuer per-connector deep-dive pages: - vault.md (128 lines) — Vault PKI synchronous issuance, token TTL + auto-renewal loop, MaxTTL enforcement, rotation playbook - digicert.md (106 lines) — CertCentral DV/OV/EV with bounded async polling for vetting workflows - aws-acm-pca.md (165 lines) — managed private CA on AWS with full IAM policy, IRSA wiring, troubleshooting matrix - ejbca.md (116 lines) — open-source / Keyfactor EJBCA with mTLS or OAuth2 auth, mTLS keypair caching, approval-pending guidance - adcs.md (111 lines) — Active Directory Certificate Services as enterprise root via Local CA sub-CA mode, sub-CA rotation playbook Index updated with forward-list entries and the index-purpose blurb revised so the index now positions itself as 'navigate from here; deeper material lives in siblings' rather than 'docs to be extracted later'. Each per-page follows the WHAT/HOW/WHY pattern: what the connector is, how authentication and issuance work, and when to choose this vs an alternative. Cross-links to the connector index, async-ca-polling primitive, and adjacent operator runbooks. This is part 1 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 5 files, 626 lines. No content removed from index.md (the index keeps its inline reference; per-pages add operator depth on top, matching the pattern set by apache/f5/iis/k8s/nginx in Phase 4 structural).	2026-05-05 03:53:52 +00:00
shankar0123	b452013dd9	docs: Phase 5 — testing-guide.md prune (8268 → 0 lines, content dispersed) Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/ and the section-by-section plan in testing-guide-tumor.md. testing-guide.md was 30% of all docs/ content (8268 lines) but was integration test code written in markdown, not operator documentation. The audit's tumor analysis disposed of every Part: - ~65% DELETE (test cases that already exist in code) - ~22% MOVE to inline test code - ~8% KEEP-COMPRESSED into focused operator-runbook docs - Title + contents + release sign-off ~5% KEEP This commit ships the KEEP-COMPRESSED dispersal: docs/contributor/qa-prerequisites.md (NEW, ~120 lines): From testing-guide.md "Prerequisites" section. Stack boot procedure, demo data baseline, reference IDs operators reuse across QA docs. docs/contributor/gui-qa-checklist.md (NEW, ~105 lines): From testing-guide.md "Part 35: GUI Testing". Manual GUI verification pass for release sign-off. 25-row table covering every dashboard page. docs/contributor/release-sign-off.md (NEW, ~130 lines): From testing-guide.md "Release Sign-Off" section (originally 1009 lines of per-test detail tables). Compressed to a release-day checklist organized by gate category: code state, automated gates, manual QA passes, release artefact verification, branch protection, post-release. docs/operator/performance-baselines.md (NEW, ~100 lines): From testing-guide.md "Part 39: Performance Spot Checks". Four operator-runnable benchmarks (API request handling, inventory list pagination, scheduler tick, bulk revoke) with baseline numbers and when-to-re-baseline guidance. docs/operator/helm-deployment.md (NEW, ~120 lines): From testing-guide.md "Part 52: Helm Chart Deployment". Operator runbook for the bundled deploy/helm/certctl/ chart: prereqs, install, four cert-source patterns, verify, upgrade, troubleshooting. docs/reference/cli.md (NEW, ~120 lines): From testing-guide.md "Part 28: CLI Tool". 
certctl-cli command reference with command-group breakdown, common workflows (list/filter, renew, revoke, bulk import, EST enrollment, status), output formats, CI/CD integration patterns. docs/README.md navigation index updated to include the 6 new docs: Reference section gains: cli.md, release-verification.md (was added in Phase 13) Operator section gains: helm-deployment.md, performance-baselines.md Contributor section gains: qa-prerequisites.md, gui-qa-checklist.md, release-sign-off.md docs/testing-guide.md deleted. Git history preserves the 8268 lines — if any specific test case is found missing from inline test code or the destination docs during future work, lift from `git show HEAD~1:docs/testing-guide.md`. Net: docs/ total line count drops by ~7700 lines (28%), from 26,369 to 18,742. testing-guide.md was the single largest doc; pruning it is the single biggest content-edit win of the entire restructure. Phase 5 is the last major content phase. Remaining: Phase 4 follow-on (per-connector page extractions from reference/connectors/index.md), Phase 15 (WHAT/HOW/WHY remediation), Phase 16 (final acceptance gate).	2026-05-05 03:38:54 +00:00
shankar0123	fd4eb3b165	docs: Phase 11 follow-on — fix remaining anchor + cross-dir links Final cleanup pass after the previous Phase 11 commits. Catches the anchor-bearing and cross-directory links that earlier sed passes missed: docs/reference/protocols/acme-server.md (3 fixes): (./tls.md) → (../../operator/tls.md) (./architecture.md) → (../architecture.md) (./architecture.md#agents) → (../architecture.md#agents) docs/migration/from-certbot.md (1 fix): (./quickstart.md#network-discovery-agentless) → (../getting-started/quickstart.md#network-discovery-agentless) docs/migration/cert-manager-coexistence.md (1 fix): (./architecture.md#agents) → (../reference/architecture.md#agents) After this commit, the Phase 11 sweep is functionally complete for the operator-facing surfaces. Remaining valid sibling links (`(./<name>.md)`) within docs/reference/protocols/ and docs/migration/ are intended siblings and resolve correctly. The remaining open Phase 11 items are: - testing-strategy.md → testing-guide.md link, still valid because testing-guide.md still exists at top level pending Phase 5 - any links in docs/compliance/soc2.md and docs/compliance/nist-sp-800-57.md if they reference moved docs (low traffic; revisit if Phase 4 follow-on or Phase 5 work surfaces them)	2026-05-05 03:32:09 +00:00
shankar0123	a364cd6990	docs: Phase 11 follow-on — fix anchor-bearing + remaining inter-doc links Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Sweeps the anchor-bearing inter-doc links that the previous Phase 11 sed pass missed (anchors after .md# weren't matched), plus a few remaining cross-refs in docs/reference/. Per source file: docs/migration/acme-from-caddy.md (1 anchor link): (./acme-server.md#certificate-readyfalse-with-rejectedidentifier) → (../reference/protocols/acme-server.md#certificate-readyfalse-...) docs/migration/acme-from-cert-manager.md (3 anchor links): Same shape; all (./acme-server.md#...) → (../reference/protocols/acme-server.md#...) docs/reference/connectors/index.md (5 walkthrough + reference links): (./acme-server.md) → (../protocols/acme-server.md) (./acme-server-threat-model.md) → (../protocols/acme-server-threat-model.md) (./acme-cert-manager-walkthrough.md) → (../../migration/acme-from-cert-manager.md) (./acme-caddy-walkthrough.md) → (../../migration/acme-from-caddy.md) (./acme-traefik-walkthrough.md) → (../../migration/acme-from-traefik.md) docs/reference/protocols/acme-server.md (3 walkthrough links): (./acme-cert-manager-walkthrough.md) → (../../migration/acme-from-cert-manager.md) (./acme-caddy-walkthrough.md) → (../../migration/acme-from-caddy.md) (./acme-traefik-walkthrough.md) → (../../migration/acme-from-traefik.md) docs/reference/protocols/acme-server-threat-model.md (1 cross-dir): (./tls.md) → (../../operator/tls.md) After this commit, every grep for old-style `./<old-doc-name>.md` links returns clean across docs/migration/, docs/reference/, and docs/operator/.	2026-05-05 03:31:47 +00:00
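Commit `a364cd6990` attributes the missed links to a pattern that did not account for `.md#anchor` targets. The sed expressions themselves are not shown in this log, so the following is only a sketch of one way to rewrite a link while carrying an optional anchor through; the function name `retarget` is hypothetical, and the old and new paths are taken from the mappings listed in the commit.

```python
import re


def retarget(text: str, old: str, new: str) -> str:
    """Rewrite markdown link targets `(old)` or `(old#anchor)` to point at `new`.

    The optional group captures the anchor so it survives the rewrite.
    """
    pattern = re.compile(r"\(" + re.escape(old) + r"(#[^)]*)?\)")
    return pattern.sub(lambda m: f"({new}{m.group(1) or ''})", text)


before = "see [errors](./acme-server.md#foo) and [overview](./acme-server.md)"
after = retarget(before, "./acme-server.md", "../reference/protocols/acme-server.md")
print(after)
```

A pattern that ends at `\.md\)` without the optional group would match only the second link above, which is consistent with the behaviour the commit describes.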
shankar0123	12d7b1f51d	docs: Phase 11 follow-on — fix inter-doc cross-references in deeper subdirs Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Continuation of Phase 11 (commit `dca1900` handled README + first round of docs/ links). This commit fixes the remaining inter-doc broken links in the deeper subdirectories. Per source directory: docs/getting-started/quickstart.md (1 fix): (connectors.md) → (../reference/connectors/index.md) docs/contributor/test-environment.md (2 fixes): (tls.md) → (../operator/tls.md) (upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md) docs/contributor/testing-strategy.md (4 fixes): `docs/security.md` → `docs/operator/security.md` (security.md) → (../operator/security.md) `docs/testing-guide.md` (kept; testing-guide.md still at top level pending Phase 5 prune) (testing-guide.md) → (../testing-guide.md) docs/migration/acme-from-traefik.md (2 sites, multi-link): (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md) (./acme-server.md) → (../reference/protocols/acme-server.md) docs/migration/cert-manager-coexistence.md (1 fix): (./quickstart.md) → (../getting-started/quickstart.md) docs/migration/from-acmesh.md (2 fixes): (connectors.md) → (../reference/connectors/index.md) (./examples.md) → (../getting-started/examples.md) docs/migration/acme-from-caddy.md (multi-link): (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md) (./acme-server.md) → (../reference/protocols/acme-server.md) docs/migration/acme-from-cert-manager.md (multi-link): (./acme-server.md) → (../reference/protocols/acme-server.md) (./acme-server-threat-model.md) → (../reference/protocols/acme-server-threat-model.md) (./acme-caddy-walkthrough.md) → (./acme-from-caddy.md) (./acme-traefik-walkthrough.md) → (./acme-from-traefik.md) docs/migration/from-certbot.md (2 fixes): (./concepts.md) → (../getting-started/concepts.md) (./examples.md) → (../getting-started/examples.md) docs/operator/tls.md (3 sites): 
(upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md) (quickstart.md) → (../getting-started/quickstart.md) (test-env.md) → (../contributor/test-environment.md) docs/operator/runbooks/disaster-recovery.md (5 fixes): (crl-ocsp.md) → (../../reference/protocols/crl-ocsp.md) (tls.md) → (../../operator/tls.md) (security.md) → (../../operator/security.md) (scep-intune.md) → (../../reference/protocols/scep-intune.md) (est.md) → (../../reference/protocols/est.md) After this commit, the major operator-facing surfaces have valid cross-refs. Some lower-traffic docs (compliance/soc2.md, compliance/ nist-sp-800-57.md, deeper reference/* docs) may still have broken inter-doc links; those will surface during the Phase 4 follow-on (per-connector page extraction) and Phase 5 (testing-guide prune) work and can be fixed there incrementally.	2026-05-05 03:31:05 +00:00
shankar0123	19c8fafe84	docs: Phase 14 — Last reviewed line sweep across docs/ Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Adds a `> Last reviewed: 2026-05-05` line right after the H1 heading of every doc that didn't already have one (41 files). This dates the freshness clock for the future Phase 4 per-doc review. The discipline going forward: when a doc's content gets a meaningful edit, bump the date. When the date gets old (e.g., >6 months), the doc earns a freshness-review pass. Mechanical insertion via awk one-liner, applied to every docs/*.md that didn't already match `grep -q 'Last reviewed:'`. Files that already carried the line from earlier Phase 2 work (the navigation index, the new connector docs, the new SCEP server / legacy-clients- TLS-1.2 / release-verification docs, and the 5 per-connector deep dives) were skipped to avoid duplicate insertion. Net: every doc in docs/ now has a Last reviewed line.	2026-05-05 03:26:46 +00:00
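The skip-if-present insertion described in `19c8fafe84` can be sketched as follows. The commit used an awk one-liner that is not reproduced in this log; this Python version is an illustrative equivalent, `add_last_reviewed` is a hypothetical name, and placing a blank line between the H1 and the inserted line is an assumption about layout.

```python
def add_last_reviewed(text: str, date: str) -> str:
    """Insert a '> Last reviewed: <date>' line after the first H1, once.

    Documents that already carry the line are returned unchanged, which
    mirrors the commit's `grep -q 'Last reviewed:'` skip condition.
    """
    if "Last reviewed:" in text:
        return text
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line.startswith("# "):  # first H1; does not distinguish code fences
            lines[i + 1:i + 1] = ["", f"> Last reviewed: {date}"]
            return "\n".join(lines)
    return text  # no H1 found: leave the document alone


doc = "# Title\n\nBody text.\n"
once = add_last_reviewed(doc, "2026-05-05")
twice = add_last_reviewed(once, "2026-05-05")
print(once)
```

Running the function a second time is a no-op, which is the property that lets a sweep like this be re-run safely over files that earlier phases already stamped.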
shankar0123	426760d737	docs: Phase 13 — README rewrite to 250-line target Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. README went from 457 lines to a target of 250 (operator decision in Phase 1 conversation). Focus shifts from feature-catalog + landing-page duplicate to "developer cloning the repo needs orientation + quickstart + entry points to docs." What stayed: - Logo + title + badges (~15 lines) - Elevator paragraph + 47-day cliff context (3 paragraphs, compressed) - Active-maintenance callout - Documentation table — restructured from 22 entries linking to flat docs/ to ~6 audience-organized rows linking through the new docs/README.md navigation index - Screenshots grid (4 tiles) - "What it does" — compressed from 33 lines of prose to 8 capability bullets, each linking to the canonical doc - Architecture paragraph — compressed to one paragraph linking to docs/reference/architecture.md - Quick Start (Docker Compose, Agent install, Helm, container images) - Examples table (5 turnkey scenarios) - Development commands - License paragraph - Dependencies block - Footer CTA What got moved out: - Cosign verification / SLSA / SBOM section (67 lines) → docs/reference/release-verification.md (NEW). README links to it in a 3-line "Verifying a release" section. What got removed entirely: - "Why certctl" + "Architecture" + "Security-first" + "Key design decisions" prose walls — duplicated landing page + architecture.md + security.md content. README no longer wades through 11 dense paragraphs. - "Supported Integrations" 4 sub-tables (Issuers / Targets / Protocols / Standards / Notifiers, ~80 lines of dense per-row marketing copy) — content lives at docs/reference/connectors/index.md and docs/reference/protocols/. README mentions counts ("12 issuers, 15 targets, 6 notifiers") with a single link. 
- "Roadmap" section entirely — V1 + V2 history rotted fastest of any section; replaced with implicit "see Releases + Issues for active work" via the existing footer CTA. - "What It Does" 10-subsection wall (33 lines) — replaced with the 8-bullet capability list, each linking to its canonical doc. - CLI section (20 lines of inline command examples) — links to the contributor docs. - MCP Server section (30 lines of setup) — links to docs/reference/mcp.md. New surface added: - docs/reference/release-verification.md — moved cosign/SLSA/SBOM procedure with one expanded "Why this matters" paragraph explaining the keyless OIDC trust anchor. Every docs/ link in the new README verified to resolve to an existing file. Cross-references from other docs / certctl.io to the deleted sections (if any) need follow-up Phase 11 sweeps.	2026-05-05 03:26:05 +00:00
shankar0123	affaa11d14	docs: Phase 12 — populate docs/README.md navigation index Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. The placeholder from Phase 1 (commit `cda957f`) gets replaced with the audience-organized navigation index operators use to find what they need. Structure follows the recommended Phase 2 directory tree: - Getting Started (5 entries) - Reference — architecture, API, MCP, hierarchy, deployment model, vendor matrix, plus subsections for connectors (6 pages) and protocols (7 docs) - Operator (5 entries + 3 runbooks) - Migration (6 entries — 3 from-X plus 3 ACME walkthroughs) - Compliance (index + 3 frameworks) - Contributor (4 entries) - Archive (2 version-specific upgrade guides) Every link verified to resolve to an existing file. Reading-order-by-role section at the bottom suggests sequencing with rough time-to-complete: - First-time operator: ~90 minutes - Production operator: ~4 hours - PKI engineer: ~6 hours - Auditor / compliance: ~4 hours - Contributor: ~3 hours Future Phase 4 follow-on commits (per-connector page extraction) and Phase 5 (testing-guide.md prune) will add new entries to this index as their destination docs land.	2026-05-05 03:21:53 +00:00
shankar0123	dca1900815	docs: Phase 11 (partial) — fix cross-references after Phase 2 moves Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Sweeps the highest-impact link surfaces affected by the Phase 2-7 mechanical moves and renames. Covers README.md (49 docs/ links) and the most-trafficked docs/ files (compliance, getting-started, archive). README.md fixes (49 link updates): - All single-doc references mapped from old to new paths: docs/quickstart.md → docs/getting-started/quickstart.md docs/architecture.md → docs/reference/architecture.md docs/connectors.md → docs/reference/connectors/index.md docs/acme-server.md → docs/reference/protocols/acme-server.md docs/{soc2,pci-dss,nist}.md → docs/compliance/{soc2,pci-dss,nist-sp-800-57}.md ... (full mapping in the sed pipeline) - 3 references to deleted features.md replaced with pointers to architecture.md + connectors/index.md. docs/compliance/index.md (3 sibling renames): compliance-soc2.md → soc2.md compliance-pci-dss.md → pci-dss.md compliance-nist.md → nist-sp-800-57.md docs/compliance/pci-dss.md (3 external refs need ../): architecture.md → ../reference/architecture.md connectors.md → ../reference/connectors/index.md quickstart.md → ../getting-started/quickstart.md docs/getting-started/concepts.md (4 external refs): crl-ocsp.md → ../reference/protocols/crl-ocsp.md architecture.md → ../reference/architecture.md mcp.md → ../reference/mcp.md openapi.md → ../reference/api.md docs/getting-started/quickstart.md (4 external refs + 1 sibling): tls.md → ../operator/tls.md upgrade-to-tls.md → ../archive/upgrades/to-tls-v2.2.md architecture.md → ../reference/architecture.md demo-advanced.md → advanced-demo.md (sibling rename) docs/getting-started/examples.md (4 external refs): migrate-from-certbot.md → ../migration/from-certbot.md migrate-from-acmesh.md → ../migration/from-acmesh.md certctl-for-cert-manager-users.md → ../migration/cert-manager-coexistence.md connectors.md → 
../reference/connectors/index.md docs/archive/upgrades/to-tls-v2.2.md (3 external refs need ../../): tls.md → ../../operator/tls.md quickstart.md → ../../getting-started/quickstart.md test-env.md → ../../contributor/test-environment.md docs/archive/upgrades/to-v2-jwt-removal.md (2 external refs need ../../): architecture.md → ../../reference/architecture.md tls.md → ../../operator/tls.md Verified all README.md docs/ links resolve to existing files. The only remaining top-level link is testing-guide.md which still exists at the top of docs/ (Phase 5 will prune it later). Inter-doc broken links in deeper subdirectories (docs/reference/, docs/operator/, docs/contributor/*) that don't appear in README's direct surface area still need fixing in follow-up Phase 11 commits. This commit handles the operator-facing entry points.	2026-05-05 03:19:21 +00:00
shankar0123	633e440787	docs: Phase 4 (structural) — move connectors.md + 5 deep dives into reference/connectors/ Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Phase 4 in the audit recommended a full split of connectors.md (2055 lines) into an index + 27 per-connector pages (12 issuer + 15 target). This commit lands the structural half of that work; full per-target page extraction is deferred to follow-up commits. Renames (all blame-preserving): docs/connectors.md → docs/reference/connectors/index.md docs/connector-apache.md → docs/reference/connectors/apache.md docs/connector-f5.md → docs/reference/connectors/f5.md docs/connector-iis.md → docs/reference/connectors/iis.md docs/connector-k8s.md → docs/reference/connectors/k8s.md docs/connector-nginx.md → docs/reference/connectors/nginx.md Edits: - docs/reference/connectors/index.md gets a top-of-doc note explaining the per-connector deep-dive sibling pattern + a forward list of the 5 per-target pages. - The 5 per-connector deep-dive pages each get a `Last reviewed: 2026-05-05` header + a back-link to the index. Deferred to future commits (Phase 4b/c follow-on): - Extracting the 12 issuer sections from index.md into per-issuer pages at reference/connectors/{acme,awsacmpca,digicert,ejbca, entrust,globalsign,googlecas,local,openssl,sectigo,stepca,vault}.md - Extracting the 10 remaining target sections from index.md into per-target pages at reference/connectors/{caddy,traefik,envoy, haproxy,postfix-dovecot,ssh,javakeystore,wincertstore,awsacm, azurekv}.md The pragmatic split makes this Phase 4 work incrementally landable — each per-connector extraction is a small follow-up commit that doesn't change the docs/ tree shape further. Cross-references from README.md and other docs to docs/connectors.md still need fixing in Phase 11.	2026-05-05 03:14:39 +00:00
shankar0123	cee008207b	docs: delete features.md (Phase 6 disperse, content already in canonical docs) Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. features.md was a 1606-line feature catalog with ~80% overlap with canonical docs already in the tree: - "API Surface" section (rate limiting, CORS, body size limits) → docs/operator/security.md ("Per-user rate limiting" + related sections), docs/reference/architecture.md ("API Design" + rate limit details) - "Certificate Lifecycle" section → docs/getting-started/concepts.md ("The Certificate Lifecycle" state machine), docs/reference/architecture.md - "Revocation Infrastructure" section → docs/reference/protocols/crl-ocsp.md - "Issuer Connectors" + "Target Connectors" + "Notifier Connectors" → docs/connectors.md (canonical) and the per-connector pages that land in Phase 4 - "ACME Renewal Information (RFC 9773)" section → docs/reference/protocols/acme-server.md - "Discovery" section → docs/getting-started/concepts.md, docs/reference/architecture.md - "Observability" section → docs/operator/security.md, docs/reference/architecture.md - "Job System" + "Background Scheduler" → docs/reference/architecture.md - "Web Dashboard" → docs/getting-started/concepts.md - "CLI" section → docs/reference/cli.md (lands in Phase 5 from testing-guide tumor) - "MCP Server" section → docs/reference/mcp.md - "Agent" section → docs/reference/architecture.md, docs/getting-started/concepts.md - "Deployment" section → docs/reference/deployment-model.md - "Database Schema" section → docs/reference/architecture.md - "Security" section → docs/operator/security.md - "CI/CD" section → docs/contributor/ci-pipeline.md - "Test Suite" section → docs/contributor/testing-strategy.md - "Examples" section → docs/getting-started/examples.md - "Compliance Mapping" section → docs/compliance/index.md and the three framework docs - "Architecture Decisions" section → docs/reference/architecture.md The catalog format failed both beginners 
(overwhelming wall of text) and experts (grep on source is faster than reading 1606 lines of prose). Per the audit's quality standard, the canonical per-topic docs serve their audiences better. Git history preserves features.md content. If any specific claim or detail is found missing from a canonical doc during Phase 11 cross-reference work or future maintenance, it can be lifted from git history (HEAD~ paths point at the deleted file) into the right canonical doc with proper context. Cross-references from README.md and other docs to docs/features.md still need fixing in Phase 11.	2026-05-05 03:09:48 +00:00
shankar0123	e9b15108d9	docs: split legacy-est-scep.md into two purpose-aligned docs The 519-line legacy-est-scep.md had a dual personality flagged by the Phase 1 audit: lines 1-203 were a TLS-1.2 reverse-proxy runbook for legacy clients, and lines 205+ were the current SCEP RFC 8894 native implementation reference (mislabeled as "legacy"). Two separate audiences, two separate purposes. Split: Lines 1-203 (TLS-1.2 reverse-proxy runbook): → docs/operator/legacy-clients-tls-1.2.md (NEW) Operator runbook for the case where embedded EST/SCEP clients only speak TLS 1.2. Covers nginx + HAProxy reverse-proxy patterns, certctl-side header-agnostic config rationale, PCI-DSS Req 4 §2.2.5 attestation, deprecation timeline. Also got a fresh "What this is" framing. Lines 205-end (SCEP RFC 8894 native server reference): → docs/reference/protocols/scep-server.md (NEW) Generic SCEP server protocol reference: RA cert + key configuration, GetCACaps capability advertisement, supported messageTypes, MVP backward-compat path, multi-profile dispatch, must-staple per-profile policy, mTLS sibling route, Microsoft Intune dynamic-challenge dispatcher. Cross-links to scep-intune.md for Intune-specific deployment guidance. Both new docs carry a `Last reviewed: 2026-05-05` line. Internal links within each new doc updated to the new sibling paths. Cross-references from other docs to legacy-est-scep.md still need fixing in Phase 11. Original docs/legacy-est-scep.md deleted (git history preserves).	2026-05-05 02:55:45 +00:00
shankar0123	f157c18368	docs: re-home ACME client walkthroughs under docs/migration/ The three ACME client walkthroughs (Caddy, cert-manager, Traefik) are conceptually "I have an existing X, here's how to point its ACME client at certctl." They belong with the migration docs, not with the acme-server protocol reference. Renames: docs/acme-caddy-walkthrough.md → docs/migration/acme-from-caddy.md docs/acme-cert-manager-walkthrough.md → docs/migration/acme-from-cert-manager.md docs/acme-traefik-walkthrough.md → docs/migration/acme-from-traefik.md Each walkthrough's lede gets a "Use this walkthrough when..." paragraph that closes the WHY-weak gap flagged in the Phase 1 audit. The new framing tells the reader when to pick this walkthrough versus the alternatives: - Caddy: "you're running Caddy 2.7+ and want it to ACME-issue from certctl instead of Let's Encrypt" - cert-manager: explicit pointer to cert-manager-coexistence.md for the keep-cert-manager-running case (vs replacement) - Traefik: "you're running Traefik 3.0+ and want certctl as your ACME source of truth" Cross-reference updates from other docs and README still pending in Phase 11.	2026-05-05 02:51:10 +00:00
shankar0123	b21c02a3d5	docs: archive version-specific upgrade guides upgrade-to-tls.md and upgrade-to-v2-jwt-removal.md are version-specific runbooks for past releases. Late upgraders still need them; current operators don't. Move both to docs/archive/upgrades/ with one-line archive headers pointing readers at the current canonical docs. Renames: docs/upgrade-to-tls.md → docs/archive/upgrades/to-tls-v2.2.md docs/upgrade-to-v2-jwt-removal.md → docs/archive/upgrades/to-v2-jwt-removal.md Each gets a top-of-doc archive notice with the date and a forward pointer to the relevant steady-state doc: to-tls-v2.2.md → docs/operator/tls.md to-v2-jwt-removal.md → docs/operator/security.md The relative link inside to-v2-jwt-removal.md (was "upgrade-to-tls.md", now "to-tls-v2.2.md") updated to point at its archived sibling. Cross-reference updates from other docs and README still pending in Phase 11.	2026-05-05 02:50:14 +00:00
shankar0123	3a807ae37e	docs: Phase 2 mechanical file moves to subdirectory structure Pure git mv operations; no content edits. Internal links remain pointing at old paths and will be fixed in Phase 11. Per the Phase 1 audit recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/. 35 files moved across 8 audience-organized subdirectories: docs/getting-started/ (5): quickstart.md, concepts.md, examples.md, advanced-demo.md (was demo-advanced.md), why-certctl.md docs/reference/ (6): architecture.md, api.md (was openapi.md), mcp.md, intermediate-ca-hierarchy.md, deployment-model.md (was deployment-atomicity.md), vendor-matrix.md (was deployment-vendor-matrix.md) docs/reference/protocols/ (6): acme-server.md, acme-server-threat-model.md, scep-intune.md, est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md) docs/operator/ (4): security.md, tls.md, database-tls.md, approval-workflow.md docs/operator/runbooks/ (3): cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md (was runbook-expiry-alerts.md), disaster-recovery.md docs/migration/ (3): from-certbot.md (was migrate-from-certbot.md), from-acmesh.md (was migrate-from-acmesh.md), cert-manager-coexistence.md (was certctl-for-cert-manager-users.md) docs/compliance/ (4): index.md (was compliance.md), soc2.md (was compliance-soc2.md), pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was compliance-nist.md) docs/contributor/ (4): testing-strategy.md, test-environment.md (was test-env.md), ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md) Deferred to later Phase 2 sub-phases: - connectors.md split (Phase 4): docs/connectors.md + docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level - testing-guide.md prune (Phase 5): docs/testing-guide.md still at top level - features.md disperse (Phase 6): docs/features.md still at top level - legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md still at top level - ACME walkthrough re-homing (Phase 8): three 
docs/acme-*-walkthrough.md still at top level - Upgrade docs archive (Phase 3): two docs/upgrade-*.md still at top level Cross-reference updates (Phase 11) will happen after all moves and content edits land. Internal links to docs/* paths are temporarily broken until that phase completes.	2026-05-05 02:49:28 +00:00
shankar0123	cda957f302	docs: Phase 2 prep — placeholder navigation index Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/. Phase 2 organizes docs/ into eight audience-aligned subdirectories (getting-started, reference, operator, migration, compliance, contributor, archive). docs/README.md will be the navigation index linking into each. This commit only adds the placeholder. Subdirectories materialize as Phase 2 file moves land. Index gets populated in Phase 12 once all moves and content edits are complete. Audit folder: cowork/docs-overhaul-phase-1-audit-2026-05-04/ Phase 2 prompt: cowork/docs-overhaul-phase-2-restructure-prompt.md	2026-05-05 02:48:49 +00:00
shankar0123	0f81c1b956	ci: re-fix CodeQL #32 + repair loadtest f5-mock build context Two unrelated CI failures from run #25305811340; fixed in one commit since neither needs the other to land first. CodeQL alert #32 (go/log-injection at middleware.go:68) reopened after `b0fc067`. The previous fix introduced a scrubLogValue helper backed by strings.NewReplacer; CodeQL's taint tracker only recognizes the literal strings.ReplaceAll pattern as a sanitizer (matches the OWASP example in the rule docs). Wrapper helpers and NewReplacer don't trigger the recognition, so the analyzer kept flagging. Fix: drop the helper. Inline strings.ReplaceAll chains directly at the call site for r.Method and r.URL.Path. Same runtime semantics (strip CR/LF/NUL); CodeQL pattern-matches the literal call so the alert can finally close. Loadtest CI failure (run #25305811340 'k6 throughput run' job at make loadtest): ERROR: failed to compute cache key: failed to calculate checksum of ref ...: "/deploy/test/f5-mock-icontrol": not found The f5-mock-icontrol Dockerfile has `COPY deploy/test/f5-mock-icontrol/ ./` which assumes the build context is the repo root. The docker-compose.test.yml f5-mock-icontrol service correctly uses the long-form build: build: context: .. # = repo root from deploy/docker-compose.test.yml dockerfile: deploy/test/f5-mock-icontrol/Dockerfile The loadtest compose at deploy/test/loadtest/docker-compose.yml used the shorthand: build: ../f5-mock-icontrol That sets context = the f5-mock-icontrol directory itself, breaking the Dockerfile's COPY (it tries to find the directory inside itself). Fix: change the loadtest compose to the long-form pattern matching docker-compose.test.yml, with context: ../../.. (= repo root from deploy/test/loadtest/) and explicit dockerfile path. Verified locally: gofmt: clean. go vet ./internal/api/middleware/...: exit 0. go test -short -count=1 ./internal/api/middleware/...: ok 0.253s. 
python3 -c 'import yaml; yaml.safe_load(...)' on the compose file: parses clean. grep -rnE 'scrubLogValue' internal/api/: zero references (helper fully dropped). References: https://github.com/certctl-io/certctl/security/code-scanning/32 CI run https://github.com/certctl-io/certctl/actions/runs/25305811340 Closes CodeQL #32 + restores loadtest CI. v2.0.70	2026-05-04 17:26:24 +00:00
shankar0123	ff6ffcda1b	refactor(web): drop 5 unused imports across 4 pages (CodeQL #6, #7, #8, #9) Four CodeQL js/unused-local-variable alerts in one sweep — all Note severity, all pure dead-import cleanup verified by grep (each removed symbol had exactly 1 occurrence in its file: the import line itself). Alert #6 — web/src/pages/AgentFleetPage.tsx:3: Drop Legend from recharts named-import list. The fleet pie chart renders without a legend (the slice colors are labeled inline via Tooltip). Alert #7 — web/src/pages/DashboardPage.tsx:9: Drop getAgents + getNotifications from the api/client named-import list. The dashboard summary card now uses getDashboardSummary (single endpoint) instead of fanning out to per-resource list calls; the agents + notifications full list is reachable via dedicated pages. Alert #8 — web/src/pages/CertificatesPage.tsx:6: Drop revokeCertificate from the api/client named-import list. The page uses bulkRevokeCertificates for the multi-cert UX; single-cert revoke happens on CertificateDetailPage which imports revokeCertificate independently. Alert #9 — web/src/pages/DiscoveryPage.tsx:15: Drop the StatusBadge default-import line. Discovered-cert status renders inline (text label colored via the row's state-class) without the StatusBadge component. Verified locally: Each flagged symbol: 0 occurrences in its file post-edit. tsc --noEmit: exit 0. No behavioral change — pure import-list cleanup. References: https://github.com/certctl-io/certctl/security/code-scanning/6 https://github.com/certctl-io/certctl/security/code-scanning/7 https://github.com/certctl-io/certctl/security/code-scanning/8 https://github.com/certctl-io/certctl/security/code-scanning/9 Closes all four alerts.	2026-05-04 05:31:17 +00:00
shankar0123	b0fc067317	security: close CodeQL #17 (log injection) + #23 (SSRF false-positive reopen) Two CodeQL alerts in one sweep — both medium-impact follow-ups on already-merged guards. Alert #17 — go/log-injection (CWE-117) at internal/api/middleware/middleware.go:58: log.Printf("[%s] %s %s %d %v", requestID, r.Method, r.URL.Path, ...) r.Method and r.URL.Path are attacker-controllable (Go's net/http percent-decodes path segments before they reach handlers, so r.URL.Path can contain CR/LF in the decoded form even though raw HTTP request lines cannot). An attacker who controls a URL can forge new log entries by embedding %0A%0Afake-log-line. Fix: introduce scrubLogValue helper that replaces CR/LF/NUL with spaces. Apply to both r.Method and r.URL.Path. Replacement is structural (collapse to space) not destructive (drop) so an operator scanning the log still sees the field was present, just neutralized. Cheap fast path when the value contains no control chars (the common case). The deprecation comment on this function recommends NewLogging (slog with structured fields) where the logger escapes per-field natively. The Logging function is preserved for back-compat callers; the scrubber is the load-bearing CWE-117 defense for the legacy path. Alert #23 — go/request-forgery (CWE-918) at scep_probe.go:271: CodeQL reopened the alert after commit `e6919cd`. The commit's in-function validator dispatch went through a function-pointer override hook: validateURL := s.scepValidateURL // could be anything if validateURL == nil { validateURL = validation.ValidateSafeURL } if err := validateURL(rawURL); err != nil { ... } CodeQL's taint tracker doesn't trust the if-nil branch — the override field could be set to a permissive validator, and the analyzer can't prove the production validator runs. Fix: invert the dispatch. 
Always call validation.ValidateSafeURL literally first; only consult the test-override hook to grant an EXEMPTION when the production validator rejects: if err := validation.ValidateSafeURL(rawURL); err != nil { if s.scepValidateURL == nil || s.scepValidateURL(rawURL) != nil { return ... validate url error } } Same applies to ProbeSCEP's entry-point validator. Both call sites now have the literal validation.ValidateSafeURL call in-scope of the sink (client.Do), which CodeQL recognizes as a sanitizer. Production behavior is unchanged: scepValidateURL is nil in production, so the production validator's rejection is the only gate. Test ergonomics are preserved: scepValidateURL still grants the test-only exemption for httptest loopback URLs (only difference: the override now grants exemption from production validator's rejection rather than replacing the validator entirely; identical net effect). Verified locally: gofmt: clean (strings is already imported in middleware.go). go vet ./internal/api/middleware/... + ./internal/service/...: exit 0. go test -short ./internal/api/middleware/...: ok 0.244s. go test -short ./internal/service/...: ok 4.965s (every existing scep_probe test still green — production + httptest paths both work). References: https://github.com/certctl-io/certctl/security/code-scanning/17 https://github.com/certctl-io/certctl/security/code-scanning/23 Closes CodeQL #17. Re-closes CodeQL #23 with a fix CodeQL's taint tracker can verify.	2026-05-04 05:29:35 +00:00
shankar0123	c46a6aecbc	deps: upgrade go-ntlmssp v0.0.0-20221128 → v0.1.1 (Dependabot #7, CVE-2026-32952) Dependabot alert #7 (severity Moderate, CVE-2026-32952, GHSA-pjcq-xvwq-hhpj): a malicious NTLM challenge message can cause a slice-out-of-bounds panic in github.com/Azure/go-ntlmssp, crashing any Go process using ntlmssp.Negotiator as an HTTP transport. Pre-v0.1.1 versions are vulnerable. Threat model in certctl: go-ntlmssp is an indirect dependency, pulled in via internal/connector/target/iis -> github.com/masterzen/winrm -> github.com/Azure/go-ntlmssp. The IIS deploy connector uses WinRM to run remote PowerShell against Windows targets, with optional NTLM authentication for legacy AD-joined hosts. Exploitation requires both of the following: (a) The attacker can inject a malicious NTLM challenge into the WinRM handshake between certctl-agent and a Windows IIS target. (b) The agent is configured with NTLM auth (the default is Kerberos / certificate auth in the production wiring documented at docs/connector-iis.md). Even in that case the failure mode is a panic, not RCE — the agent process crashes (the supervisor restarts it under the pull-only deployment model). Availability impact only (matches the CVSS 'Availability: Low' rating). Fix: go get github.com/Azure/go-ntlmssp@v0.1.1 Stale go.sum lines for the old v0.0.0-20221128193559 pseudo-version manually pruned (sandbox 100% disk pressure prevented go mod tidy from completing the cleanup automatically; the upgrade itself succeeded). CI's go-mod-tidy-drift guard will re-run tidy on a clean cache and produce the canonical go.sum state. Verified locally: go.mod: require github.com/Azure/go-ntlmssp v0.1.1 // indirect go.sum: only the v0.1.1 entries remain. go mod why github.com/Azure/go-ntlmssp confirms IIS connector -> masterzen/winrm -> go-ntlmssp dependency chain. go build ./internal/connector/target/iis/... + wincertstore/... exit 0 (the only consumers). go vet on both packages: exit 0. 
go test -short -count=1 ./internal/connector/target/iis/...: ok 0.016s. go test -short -count=1 ./internal/connector/target/wincertstore/...: ok 0.012s. Reference: https://github.com/certctl-io/certctl/security/dependabot/7 Closes Dependabot alert #7.	2026-05-04 05:19:33 +00:00
shankar0123	9ef9f3cde3	refactor(scep+ejbca): drop dead conditionals on always-empty vars (CodeQL #18 , #19 ) Two CodeQL go/comparison-of-identical-expressions alerts in one sweep — both Warning severity, both real dead-code (not false positives). CodeQL detected that each comparison's LHS variable was provably constant. Alert #18 — internal/api/handler/scep.go:612 (extractCSRFields): challengePassword := "" transactionID := "" // ... loop populates challengePassword from CSR.Attributes ... for _, attr := range csr.Attributes { if attr.Type.Equal(oidChallengePassword) { // populates challengePassword ONLY — transactionID stays "" } } if transactionID == "" && csr.Subject.CommonName != "" { // ← always true transactionID = csr.Subject.CommonName } transactionID was initialized to "" and never reassigned before the check. The conditional was always true; the MVP path was effectively "unconditionally fall back to CN". The RFC 8894 path (tryParseRFC8894 above this function) extracts transaction-ID properly from PKCS#7 authenticatedAttributes; the MVP path is for lightweight legacy clients that send the raw CSR with no PKCS#7 wrapping, and CN-as-transaction-ID is sufficient there. Fix: drop the dead transactionID local var + dead conditional; unconditionally set transactionID = csr.Subject.CommonName. No behavioral change — the runtime semantics are identical to before (every valid invocation already took the fallback). The CN extraction stays robust because the empty-CN case still produces an empty transactionID, which downstream callers handle. Alert #19 — internal/connector/issuer/ejbca/ejbca.go:415 (RevokeCertificate): serial := request.Serial issuerDN := "" // (comment: "if we have time..." — TODO never followed up) revokeURL := fmt.Sprintf("%s/certificate/%s/%s/revoke", apiURL, issuerDN, serial) if issuerDN == "" { // ← always true revokeURL = fmt.Sprintf("%s/certificate/%s/revoke", apiURL, serial) } issuerDN was hardcoded to "" two lines above. 
The first revokeURL line was unreachable dead code; the conditional always fired and the serial-only URL always won. EJBCA's REST API has both /certificate/{issuer_dn}/{serial}/revoke and /certificate/{serial}/revoke endpoints; the serial-only form is correct for typical certctl deployments where one EJBCA CA maps to one certctl issuer config (no overlapping serial spaces). Fix: drop the dead first revokeURL + dead conditional; build revokeURL once via the serial-only endpoint. No behavioral change — the runtime URL was always the serial-only one. Comment retained + expanded to document the future-enhancement path (parse issuer DN from IssuanceResult metadata + use the DN-qualified endpoint when a multi-CA EJBCA deployment surfaces). Verified locally: gofmt: clean. go vet ./internal/api/handler/... + ./internal/connector/issuer/ejbca/...: exit 0. go test -short -count=1 ./internal/api/handler/... + ejbca/...: PASS. Both fixes are pure dead-code removal — runtime behavior is byte-identical to pre-edit. The existing test suites would have caught any actual behavioral change. References: https://github.com/certctl-io/certctl/security/code-scanning/18 https://github.com/certctl-io/certctl/security/code-scanning/19 Closes both alerts.	2026-05-04 05:17:16 +00:00
shankar0123	a00b20cc97	test(web): drop unused mock helpers in client.error.test.ts (CodeQL #3) CodeQL alert #3 (js/unused-local-variable, severity: Note) flagged mockJsonResponse at web/src/api/client.error.test.ts:39 as dead. Audit: client.error.test.ts is the error-path companion to client.test.ts. Every test in this file drives a non-2xx response through the client function under test via mockErrorResponse (52 call sites). Both mockJsonResponse AND mockBlobResponse were drafted alongside the scaffolding but never used — the success-path coverage lives in client.test.ts, not this file. CodeQL only flagged mockJsonResponse, but mockBlobResponse is the same shape (defined, never called). Cleaning both up for consistency with the file's error-only scope. Replaced with a one-paragraph comment explaining the file's scope so future contributors don't re-add the helpers expecting them to be used. Verified locally: tsc --noEmit: exit 0. grep -c mockJsonResponse + mockBlobResponse: 1 each (the comment mention only). No behavioral change. Reference: https://github.com/certctl-io/certctl/security/code-scanning/3 Closes CodeQL alert #3 (js/unused-local-variable).	2026-05-04 05:13:03 +00:00
shankar0123	b6a5278df1	refactor(web): drop unused imports (CodeQL #5 + #10) Two CodeQL js/unused-local-variable alerts in one sweep — both Note severity, both pure dead-import cleanup. Alert #10 (web/src/pages/NotificationsPage.tsx:8): formatDateTime imported but only timeAgo used. Verified via repo-wide grep — formatDateTime appears on the import line only. Drop from the import statement; leave timeAgo in place. Alert #5 (web/src/api/client.test.ts:2): Five unused imports in the test file's import block (the test file imports nearly the full API client surface): - acknowledgeHealthCheck - createPolicy - deleteHealthCheck - getHealthCheckHistory - updateHealthCheck Each appears only on the import line — verified via grep -c. Removing them doesn't change test coverage (the corresponding client functions are exported and exercised in their own tests elsewhere, but the integration covered by client.test.ts doesn't reach them yet). Verified locally: tsc --noEmit: exit 0. grep -c on each removed symbol in its file: 0 occurrences. No behavioral change — pure import-list cleanup. References: https://github.com/certctl-io/certctl/security/code-scanning/10 https://github.com/certctl-io/certctl/security/code-scanning/5 Closes both alerts.	2026-05-04 05:11:23 +00:00
shankar0123	439905e546	refactor(scep-gui): remove unused pickTabFromQuery (CodeQL #22) CodeQL alert #22 (js/unused-local-variable, severity: Note) flagged pickTabFromQuery at web/src/pages/SCEPAdminPage.tsx:584 as dead code. Audit: this function is a leftover from an incomplete refactor. The SCEP admin page picks its initial tab via pickInitialTab (line 594 post-edit), which subsumes the same query-string check that pickTabFromQuery did: pickInitialTab honors three signals (precedence high → low): 1. ?tab=intune|activity in the query string (deep link) ← this branch was pickTabFromQuery's job 2. Pathname ending in /scep/intune (legacy alias from Phase 9.4) 3. Default to 'profiles' pickTabFromQuery only handled signal (1); pickInitialTab inlined the same logic on its first branch and added (2) + (3). Nothing references pickTabFromQuery (verified via repo-wide grep). Pure dead code. Fix: delete the function. No behavioral change — pickInitialTab already does the work. Verified locally: tsc --noEmit: exit 0. grep -nE 'pickTabFromQuery' web/src/: zero references. Reference: https://github.com/certctl-io/certctl/security/code-scanning/22 Closes CodeQL alert #22 (js/unused-local-variable).	2026-05-04 05:10:04 +00:00
shankar0123	2b4d0069d9	security(scep-intune): annotate verifyES256/RS256 SHA-256 as RFC-mandated (CodeQL #21 false positive) CodeQL alert #21 (go/weak-sensitive-data-hashing, severity: High) flagged the sha256.Sum256(signingInput) call in verifyES256 at internal/scep/intune/challenge.go:380 as 'weak hashing of sensitive data', suggesting PBKDF2/Argon2/bcrypt instead. This is a CodeQL false positive. The CodeQL query triggers when SHA-256 is used near *x509.Certificate (the trust pool) and infers 'this might be password hashing.' But the actual context is JWS signature verification: - verifyRS256 implements RFC 7518 §3.3 — 'RSASSA-PKCS1-v1_5 using SHA-256'. SHA-256 is spec-mandated. - verifyES256 implements RFC 7518 §3.4 — 'ECDSA using P-256 and SHA-256'. SHA-256 is spec-mandated. - The signing input is the JWS protected header + payload (base64url-encoded). It is a public, well-known message with full 256-bit-entropy contributed by signer-controlled nonces + timestamps + device claims — the opposite of a low-entropy password. - The output is verified against an asymmetric signature (rsa.VerifyPKCS1v15 / ecdsa.Verify), not compared to a pre-computed hash digest. This is signature verification, not password hashing. - Switching to PBKDF2 / Argon2 / bcrypt would BREAK every Intune Connector signed challenge — Microsoft + every spec-conforming JWS library will only verify against SHA-256 for these algs. Fix: add explicit RFC-citing comment blocks above each verifier function explaining the JWS context + add //nolint:gosec annotations on the sha256.Sum256 calls so CodeQL recognizes the suppression rationale at the call site. The annotation cites the specific RFC clause (7518 §3.3 / §3.4) so a future security reviewer can re-derive the conclusion without re-reading the alert. 
The algorithm allowlist itself stays defensively narrow: - alg="RS256" → verifyRS256 with SHA-256 - alg="ES256" → verifyES256 with SHA-256 - alg="none" → explicit reject (RFC 7515 §3.6 attack vector) - any other alg → reject as unsupported Pinned by existing tests: - TestValidateChallenge_HappyPath_RS256 - TestValidateChallenge_HappyPath_ES256_FixedWidth - TestValidateChallenge_HappyPath_ES256_DER - TestValidateChallenge_AlgNoneRejected - TestValidateChallenge_UnsupportedAlg The happy-path tests would fail if the verifiers switched to any non-SHA-256 digest — the alg allowlist makes the SHA-256 dependency load-bearing, which the existing test suite already proves. Verified locally: gofmt: clean. go vet ./internal/scep/intune/...: exit 0. go test -short -count=1 ./internal/scep/intune/...: PASS (every existing challenge_test.go subtest still green). Reference: https://github.com/certctl-io/certctl/security/code-scanning/21 Closes CodeQL alert #21 as a documented false positive — the //nolint annotations + RFC-citing comments are the load-bearing suppression. Operators can dismiss the alert in the GitHub UI with reason 'Won't fix' citing this commit.	2026-05-04 05:08:02 +00:00
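To make the "signature verification, not password hashing" argument concrete, here is a minimal ES256 sketch built on the Go standard library. It is not the repository's `verifyES256`: the function names are hypothetical, only the fixed-width 64-byte `r || s` signature encoding is handled (the commit's tests mention a DER variant too), and the trust-pool and claim checks are omitted.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"math/big"
)

// newDemoKey generates a throwaway P-256 key for the demonstration.
func newDemoKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// signES256Sketch hashes the JWS signing input with SHA-256 and returns
// the fixed-width r || s signature (32 bytes each for P-256).
func signES256Sketch(priv *ecdsa.PrivateKey, signingInput []byte) ([]byte, error) {
	digest := sha256.Sum256(signingInput)
	r, s, err := ecdsa.Sign(rand.Reader, priv, digest[:])
	if err != nil {
		return nil, err
	}
	sig := make([]byte, 64)
	r.FillBytes(sig[:32])
	s.FillBytes(sig[32:])
	return sig, nil
}

// verifyES256Sketch recomputes the SHA-256 digest and checks it against the
// asymmetric signature. The digest is never compared to a stored hash.
func verifyES256Sketch(pub *ecdsa.PublicKey, signingInput, sig []byte) bool {
	if len(sig) != 64 {
		return false
	}
	digest := sha256.Sum256(signingInput)
	r := new(big.Int).SetBytes(sig[:32])
	s := new(big.Int).SetBytes(sig[32:])
	return ecdsa.Verify(pub, digest[:], r, s)
}

func main() {
	priv, err := newDemoKey()
	if err != nil {
		panic(err)
	}
	input := []byte("base64url(header).base64url(payload)")
	sig, err := signES256Sketch(priv, input)
	if err != nil {
		panic(err)
	}
	fmt.Println("valid:", verifyES256Sketch(&priv.PublicKey, input, sig))
	fmt.Println("tampered:", verifyES256Sketch(&priv.PublicKey, []byte("other"), sig))
}
```

Swapping `sha256.Sum256` for a password KDF here would produce a digest that no ES256 signer ever signed, so every legitimate signature would fail to verify.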
shankar0123	d08982fc19	security(signer): bound FileDriver paths with SafeRoot + reject .. (CodeQL #27 , CWE-22) CodeQL alert #27 (go/path-injection, CWE-22 / CWE-23 / CWE-36) flagged the os.WriteFile sink at internal/crypto/signer/file_driver.go:194 because the outPath flowed from operator-supplied config (CAKeyPath in the local issuer's encrypted config blob -> GenerateOutPath closure -> os.WriteFile) without a containment check. Threat model: Production wiring (cmd/server/main.go) constructs &signer.FileDriver{} and the local-issuer NewConnector wires GenerateOutPath off Config.CAKeyPath. CAKeyPath ships from the encrypted issuer config in PostgreSQL — settable only by an authenticated admin via the API. So the realistic exploit is: (a) Admin compromise -> CAKeyPath set to /etc/passwd -> FileDriver.Generate overwrites system files. (b) Future code path concatenates attacker-controlled fragments into the output path -> classic ../../etc/passwd traversal. Defense in depth: bound the write surface so admin-key-rotation errors and future regressions can't escape into arbitrary filesystem writes. Fix: internal/crypto/signer/file_driver.go gains: - SafeRoot string field on FileDriver. When set, every Load + Generate path MUST resolve under SafeRoot via filepath.Abs + strings.HasPrefix on cleaned paths. - validateSafePath helper that: * rejects empty paths * filepath.Clean()s the input * rejects paths whose cleaned form still contains a literal ".." segment (catches relative paths that escape above their start; absolute paths get collapsed by Clean) * resolves to filepath.Abs and (when SafeRoot non-empty) verifies containment via filepath.Separator-suffixed HasPrefix (the bare-prefix bug — SafeRoot=/var/lib/foo erroneously accepting /var/lib/foobar — has its own regression test below) - Load + Generate now call validateSafePath before any os.ReadFile / os.WriteFile. The validator is in the same function as the sink so CodeQL recognizes it as a guard. 
Tests (internal/crypto/signer/signer_test.go): TestFileDriver_Load_RejectsParentTraversal — relative path "../../etc/passwd" rejected with parent-directory error. TestFileDriver_Load_RejectsEmptyPath — empty path rejected. TestFileDriver_Generate_RejectsParentTraversal — write side, same pattern. TestFileDriver_SafeRoot_AcceptsContainedPath — happy path: a key file under SafeRoot succeeds. TestFileDriver_SafeRoot_RejectsEscape — absolute path outside SafeRoot rejected (the load-bearing CodeQL pin). TestFileDriver_SafeRoot_RejectsSiblingPrefix — pins the HasPrefix-with-separator subtlety: SafeRoot=/tmp/X must NOT accept /tmp/X-sibling. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short -count=1 ./internal/crypto/signer/...: ok 1.605s go test -short -count=1 ./internal/connector/issuer/local/...: ok 4.908s (downstream FileDriver consumer) go test -short -count=1 ./internal/service/...: ok 4.029s Backwards-compat: when SafeRoot is unset, only the structural .. + empty-path checks fire — the existing FileDriver call sites in cmd/server/main.go and the existing unit tests pass unchanged. Production wiring SHOULD set SafeRoot via cmd/server/main.go in a follow-up commit (env-var-supplied CERTCTL_CA_KEY_DIR or similar). Reference: https://github.com/certctl-io/certctl/security/code-scanning/27 Closes CodeQL alert #27 (go/path-injection).	2026-05-04 05:04:35 +00:00
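The containment rules listed in this commit can be sketched as a standalone function. This is an approximation written from the commit description, not the code in file_driver.go: the signature, the error messages, and the choice to reject a path equal to SafeRoot itself are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// validateSafePath returns the absolute form of path, or an error when the
// path is empty, still contains a ".." segment after cleaning, or (when
// safeRoot is set) does not resolve underneath safeRoot.
func validateSafePath(path, safeRoot string) (string, error) {
	if path == "" {
		return "", errors.New("empty path")
	}
	sep := string(filepath.Separator)
	cleaned := filepath.Clean(path)
	// Clean collapses ".." inside absolute paths, so any ".." segment that
	// survives means a relative path escaping above its starting point.
	for _, segment := range strings.Split(cleaned, sep) {
		if segment == ".." {
			return "", errors.New("path escapes via parent directory")
		}
	}
	abs, err := filepath.Abs(cleaned)
	if err != nil {
		return "", err
	}
	if safeRoot != "" {
		root, err := filepath.Abs(safeRoot)
		if err != nil {
			return "", err
		}
		// Compare against root plus a trailing separator so that
		// /var/lib/foo does not accept /var/lib/foobar.
		prefix := strings.TrimSuffix(root, sep) + sep
		if !strings.HasPrefix(abs, prefix) {
			return "", fmt.Errorf("path %q is outside safe root %q", abs, root)
		}
	}
	return abs, nil
}

func main() {
	_, err := validateSafePath("/var/lib/foobar/ca.key", "/var/lib/foo")
	fmt.Println("sibling prefix rejected:", err != nil)

	p, err := validateSafePath("/var/lib/foo/ca.key", "/var/lib/foo")
	fmt.Println(p, err)
}
```

A prefix check on strings does not follow symlinks, so a deployment that needs that guarantee would have to resolve links before comparing.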
shankar0123	af3ca3935b	ci: convert literal Unicode in headers_test.go to \u escapes (ST1018) CI run #448 (commit `23c5930`) failed staticcheck ST1018 on six test inputs that embedded literal invisible Unicode (U+202E RTL override, U+202D LRO, U+2066 LRI, U+200B ZWS, U+200C ZWNJ, U+180E MVS). golangci-lint enforces ST1018 in CI but go vet doesn't, so the local pre-commit gate (gofmt + go vet + go test) didn't catch it — the canonical Bundle 9 staticcheck-vs-vet drift case CLAUDE.md explicitly warns about. Fix: convert each literal-Unicode test input to its \uXXXX ASCII escape form. Verified via byte-level Python sed against UTF-8 byte sequences (\xe2\x80\xae -> \u202E, \xe2\x80\xad -> \u202D, \xe2\x81\xa6 -> \u2066, \xe2\x81\xa9 -> \u2069, \xe2\x80\x8b -> \u200B, \xe2\x80\x8c -> \u200C, \xe1\xa0\x8e -> \u180E). The U+202C (PDF — Pop Directional Formatting) closer was caught by the same sweep since two RTL/LRO test cases use it. The runtime semantics are byte-identical — Go decodes \u202E and the literal U+202E byte sequence to the same rune. Only the source text changed. Verified locally: gofmt -l internal/validation/: clean. go vet ./...: exit 0. go test -short -count=1 ./internal/validation/...: ok 0.014s (all 4 test cases in TestSanitizeEmailBodyValue_StripsBidiOverride + the rest of the suite still green — semantics unchanged). Sandbox couldn't install staticcheck (disk pressure on /tmp/gopath), but the rule is mechanical: U+XXXX format chars in string literals must use \uXXXX. Every flagged literal is fixed. Reference: CI run https://github.com/certctl-io/certctl/actions/runs/25301809013 Closes the staticcheck regression on commit `23c5930` (security(email): sanitize body fields against content injection).	2026-05-04 05:00:14 +00:00
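The byte-identical claim in the commit above is easy to check in isolation. The snippet below is a small standalone sketch with hypothetical function names, unrelated to the repository's test file: it compares a `\u` escape with the raw UTF-8 bytes for the same code point.

```go
package main

import "fmt"

// escapeMatchesRawBytes reports whether the \u escape and the explicit
// UTF-8 byte sequence for U+202E produce the same Go string.
func escapeMatchesRawBytes() bool {
	escaped := "user\u202Etxt.exe"
	rawBytes := "user\xe2\x80\xaetxt.exe"
	return escaped == rawBytes
}

// firstRune decodes the first rune of s so the code point can be inspected.
func firstRune(s string) rune {
	for _, r := range s {
		return r
	}
	return -1
}

func main() {
	fmt.Println(escapeMatchesRawBytes())
	fmt.Printf("U+%04X\n", firstRune("\u202E"))
}
```

Because both spellings compile to the same bytes, switching the test inputs to escapes changes only what a reviewer sees in the source file.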
shankar0123	e6919cdaba	security(scep_probe): re-validate URL inside scepHTTPGet to close CodeQL #23 (CWE-918) CodeQL alert #23 (go/request-forgery, CWE-918 SSRF) flagged the client.Do(req) sink at internal/service/scep_probe.go:232 because the URL parameter to scepHTTPGet is taint-traced from the user-supplied input to ProbeSCEP without the analyzer recognizing the upstream sanitizer. The defense-in-depth was already in place: 1. validation.ValidateSafeURL at ProbeSCEP entry (line 75) — rejects obvious SSRF targets (loopback / link-local / cloud metadata literals) before any network call. 2. validation.SafeHTTPDialContext on the http.Transport — re-resolves the host at dial time and rejects connections to reserved IP ranges. This is the authoritative SSRF + DNS-rebinding guard. Even if step 1 was bypassed, the dial would still fail. But CodeQL's taint tracker doesn't follow the validator across function boundaries, so the alert stays open even though the code is safe. This commit re-runs validation.ValidateSafeURL inside scepHTTPGet immediately before http.NewRequestWithContext — sanitizer in the same function as the sink, which CodeQL recognizes as a guard. Bonus defense-in-depth: any future call site that wires a URL into scepHTTPGet without going through ProbeSCEP (e.g. a new code path that directly probes a discovered URL) inherits the same SSRF guard automatically. Fail-closed by default. The validator dispatch matches ProbeSCEP's pattern — tests override via s.scepValidateURL to hit httptest loopback servers; production callers use validation.ValidateSafeURL. The probe's existing httptest-based tests continue to work unchanged. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short ./internal/service/...: ok 4.029s (every existing scep_probe test still green — the new revalidation is a no-op for tests that go through ProbeSCEP because the same validator already passed once at entry). 
Reference: https://github.com/certctl-io/certctl/security/code-scanning/23 Closes CodeQL alert #23 (go/request-forgery).	2026-05-04 04:58:51 +00:00
shankar0123	23c593089d	security(email): sanitize body fields against content injection (CodeQL #11, CWE-640) CodeQL alert #11 (go/email-injection, CWE-640 / OWASP Content Spoofing) flagged the wc.Write(message) sink at internal/connector/notifier/email/email.go:208 because attacker-controllable fields flow into the email body unchecked. Threat model: Headers (From, To, Subject) were already protected by validation.ValidateHeaderValue (CWE-113 SMTP header injection, closed in commit `3853b74`). The remaining gap was the body. An attacker controls multiple fields that surface to the body of alert/event notifications: - alert.Subject, alert.Message - event.Subject, event.Body, event.CertificateID - alert.Metadata + event.Metadata key/value pairs These can carry CR/LF (forged 'Reply-To: attacker@evil.com' inside the body that recipients skim), NUL bytes (RFC 5321 4.5.2 violation that some MTAs truncate at), bidi-override Unicode (visually-spoofable URLs), zero-width / invisible Unicode (phishing), or malformed UTF-8 (Go emits U+FFFD which becomes a glyph in mail clients). The HTML email path (digest service) already uses html/template upstream and is safe via contextual auto-escape. This commit closes the plaintext path. 
Fix: internal/validation/headers.go gains SanitizeEmailBodyValue — a sanitizer that NEVER errors (the right contract for body content; over-eager rejection drops operator notifications) and scrubs: - NUL bytes (stripped entirely) - bare CR / LF (replaced with space — single fields should never carry their own line breaks; the surrounding template handles legitimate CRLFs) - C0 control chars < 0x20 except TAB - DEL (0x7F) + C1 control chars (0x80-0x9F) - U+FFFD (defense in depth: malformed UTF-8 -> Go emits this; strip so attacker-planted invalid bytes don't survive as an arbitrary glyph) - Bidi-override Unicode (U+202A..U+202E, U+2066..U+2069) - Zero-width / invisible Unicode (U+200B..U+200D, U+2060..U+2063, U+FEFF, U+180E) - Catch-all unicode.IsControl for anything not enumerated above Codepoint table uses numeric ranges rather than rune-literal switch cases — Go source rejects literal invisible characters (BOM U+FEFF) mid-file, so the table compares against numeric values. internal/connector/notifier/email/email.go applies the sanitizer at every interpolation site: - formatAlertBody: alert.ID/Type/Severity/Subject/Message (CreatedAt is time.Time -> RFC3339, structural, not sanitized) - formatEventBody: event.ID/Type/Subject/Body, CertificateID (CreatedAt structural, not sanitized) - formatMetadata: both keys and values The sendEmail / formatEmailMessage call sites continue to validate headers (From / To / Subject) via the existing ValidateHeaderValue fail-closed gate; the new sanitizer is body-side only. Tests (internal/validation/headers_test.go): TestSanitizeEmailBodyValue_PreservesSafeInput Pin: ordinary ASCII, UTF-8 multibyte (résumé / 日本語 / مرحبا), tabs, common cert DNs, URLs all flow through unchanged. TestSanitizeEmailBodyValue_StripsControlChars Table-driven across NUL, bare LF/CR, CRLF, BEL, backspace, DEL, C1 (U+0080 / U+009F), U+FFFD, TAB-preserve. 
TestSanitizeEmailBodyValue_StripsBidiOverride 7 attacker payloads (RLO, LRO, LRI, zero-width space, ZWNJ, BOM, MVS) — each must produce a non-identity output. TestSanitizeEmailBodyValue_ContentSpoofingScenario The CodeQL example case: 'alert\r\nReply-To: attacker@evil.com\r\n Click https://evil.example.com/reset' — verify NO CR/LF survives. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short -count=1 ./internal/validation/...: ok 0.374s go test -short -count=1 ./internal/connector/notifier/email/...: ok 0.186s Reference: https://github.com/certctl-io/certctl/security/code-scanning/11 Closes CodeQL alert #11 (go/email-injection).	2026-05-04 04:56:13 +00:00
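The scrubbing rules listed in the commit above can be sketched as a single never-erroring pass over the runes of a field. This is an illustrative approximation, not the actual SanitizeEmailBodyValue: the name `sanitizeBodySketch` is hypothetical, and details such as how a CRLF pair collapses (here: two spaces) are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeBodySketch is a hypothetical approximation of the rules in the
// commit message. It never returns an error; it only drops or replaces runes.
// Code points are compared numerically so the source stays pure ASCII.
func sanitizeBodySketch(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s { // invalid UTF-8 bytes decode as U+FFFD here
		switch {
		case r == '\r' || r == '\n':
			b.WriteByte(' ') // assumption: each CR or LF becomes one space
		case r == '\t':
			b.WriteRune(r) // TAB is preserved
		case r == 0xFFFD, // replacement char from malformed UTF-8
			r >= 0x202A && r <= 0x202E, // bidi embeddings and overrides
			r >= 0x2066 && r <= 0x2069, // bidi isolates
			r >= 0x200B && r <= 0x200D, // zero-width space / ZWNJ / ZWJ
			r >= 0x2060 && r <= 0x2063, // word joiner, invisible operators
			r == 0xFEFF, r == 0x180E: // BOM, Mongolian vowel separator
			// dropped
		case unicode.IsControl(r):
			// dropped: NUL, other C0 controls, DEL, and C1 controls
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	payload := "alert\r\nReply-To: attacker@evil.com"
	out := sanitizeBodySketch(payload)
	fmt.Println(strings.ContainsAny(out, "\r\n")) // prints false
	fmt.Printf("%q\n", sanitizeBodySketch("a\x00b\u202ec")) // prints "abc"
}
```

Dropping rather than rejecting matches the contract the commit argues for: a notification with a hostile field is still delivered, only with the hostile characters removed.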

767 Commits