mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 17:31:30 +00:00
dcd82d062f03e8c3dde64600eb37b70849bb7130
182 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
dcd82d062f |
docs: convert all 9 ASCII diagrams to mermaid
Audit of docs/ found 32 diagrams: 23 already in mermaid, 9 in ASCII
art (box-drawing chars / +-pipe boxes). Converting all 9 to mermaid
so GitHub renders them as actual diagrams in the docs preview.
Files affected (9 diagram blocks across 6 files):
docs/architecture.md block 1 line 706 EST request flow
docs/architecture.md block 2 line 798 SCEP request flow
docs/architecture.md block 3 line 893 Per-profile TrustAnchor +
Intune challenge dispatch
docs/architecture.md block 4 line 935 signer.Driver interface +
4 implementations
docs/ci-pipeline.md block 1 line 20 On-push pipeline tree
docs/est.md block 1 line 254 WiFi 802.1X / EAP-TLS flow
docs/legacy-est-scep.md block 1 line 40 TLS-version-bridging proxy
docs/qa-test-guide.md block 1 line 41 qa_test.go to demo stack
docs/scep-intune.md block 1 line 39 Intune cloud chain
Conversion notes:
- Linear flows → flowchart TD/LR. Per-step annotations that the
ASCII had as floating text between arrows are now edge labels —
cleaner and easier to read.
- architecture.md block 4 (signer drivers) → flowchart LR with a
subgraph for the Driver interface. Cleaner than a class diagram
for the "code uses one of these implementations" semantics.
- ci-pipeline.md tree → flowchart TD. Adds a dotted '-.depends
on.->' arrow making the go-build-and-test → deploy-vendor-e2e
dependency visually obvious (was a parenthetical in the ASCII).
- est.md WiFi/RADIUS → flowchart LR with EAP, Radius, trusts,
and EST as four distinct labeled arrows. The 'trusts' annotation
was floating off to the side in the ASCII; now it's the arrow
label between Radius and certctl CA.
- All semantic detail preserved: every node label, arrow direction,
inline annotation, and multi-line cell content carries through.
Verified: post-conversion audit shows 32 mermaid blocks, 0 ASCII.
Diff is symmetric — 108 inserts, 123 deletes — because mermaid is
slightly more compact than the box-drawing characters it replaces.
GitHub renders mermaid blocks natively in markdown previews since
2022, so all 9 diagrams now render as real flowcharts in the docs
view rather than as monospaced character art.
|
||
|
|
e06447b763 |
Revert CodeQL custom config + sanitizer model — leave alert #23 open
Reverts: |
||
|
|
1122f5a097 |
ci(codeql): teach analyzer about ValidateSafeURL SSRF barrier
Closes CodeQL alert #23 (go/request-forgery, Critical) at the structural level — by telling CodeQL what the runtime code already does — rather than via per-line `// codeql[...]` suppressions. Background. internal/service/scep_probe.go:232 calls client.Do(req) where the request URL is built from operator-supplied input. The runtime defense is two-layer: 1. validation.ValidateSafeURL(rawURL) at scep_probe.go:86 rejects non-http(s) schemes, empty hosts, literal-IP hosts in reserved ranges (loopback, link-local incl. cloud metadata 169.254.169.254, multicast, broadcast, unspecified, IPv6 link-local), and DNS names whose A/AAAA resolution returns any reserved IP. RFC 1918 is intentionally NOT blocked — see internal/validation/ssrf.go:17-21 for the design rationale. 2. validation.SafeHTTPDialContext on the http.Transport (line 254) re-resolves at dial time, applies the same reserved-IP set, and pins the dial to a literal non-reserved IP — defeating DNS rebinding between validate and dial. CodeQL's go/request-forgery query is a syntactic taint-tracking rule with no built-in knowledge of either validator, so it reports the finding even though the runtime is correctly defended. The fix. Add a Models-as-Data (MaD) extension at .github/codeql/ declaring ValidateSafeURL as a request-forgery barrier. The barrier applies to Argument[0] (the URL parameter), which means the analyzer treats every URL flowing through ValidateSafeURL as sanitized for the request-forgery taint set. After this lands: - Alert #23 dismisses at scep_probe.go:232. - The same model applies to the second site of this exact shape — webhook notifier's outbound client.Do (internal/connector/ notifier/webhook/webhook.go) — without per-line annotations. - Future code that flows operator URLs through ValidateSafeURL inherits the barrier automatically. This is the structural fix, not a band-aid: - Band-aid (rejected): `// codeql[go/request-forgery]` suppression on line 232. Suppresses one alert; doesn't teach the analyzer. Webhook notifier would need the same comment when its sibling rule landing fires. - Structural (this change): teach CodeQL via models-as-data, in config checked into the repo, that lives next to the workflow that uses it. The validators ARE sanitizers in the runtime — this PR makes the analyzer's model match reality. Files: - .github/codeql/qlpack.yml — local model pack manifest, declares extensionTargets: codeql/go-all: '*' - .github/codeql/models/request-forgery-sanitizers.model.yml — barrierModel row for validation.ValidateSafeURL Argument[0] / request-forgery taint kind / manual provenance - .github/codeql/codeql-config.yml — references the local pack + keeps security-and-quality query suite scope - .github/workflows/codeql.yml — Initialize CodeQL step picks up config-file: ./.github/codeql/codeql-config.yml. The existing `queries: security-and-quality` line stays so even if the config file fails to load, the suite scope is preserved. - docs/architecture.md::Input Validation and SSRF Protection — extended to name the egress validators (ValidateSafeURL + SafeHTTPDialContext) and the call sites (SCEP probe + webhook notifier). Closes the docs gap surfaced during the audit; the egress threat-model previously lived only in source comments. Requires CodeQL CLI ≥ 2.25.2 for the barrierModel extensible predicate (Go MaD support added 2026-04-21). github/codeql-action@v3 ships a recent enough CLI by default; if a future analysis fails with "unknown extensible predicate barrierModel", the action's CLI has regressed below 2.25.2 — pin a newer action version rather than reverting this pack. Documented inline in qlpack.yml. References: - https://codeql.github.com/docs/codeql-language-guides/customizing-library-models-for-go/ - https://github.blog/changelog/2026-04-21-codeql-now-supports-sanitizers-and-validators-in-models-as-data/ |
||
|
|
30970ab8a1 |
ci-pipeline-cleanup Phase 12: docs/ci-pipeline.md + bundle artefacts
Bundle: ci-pipeline-cleanup, Phase 12. NEW docs/ci-pipeline.md (operator-facing guide to the on-push pipeline): - Trigger model (push, daily, tag) - Per-job deep-dive for all 5 CI jobs + 2 CodeQL jobs - The 20 regression guards table with what each catches - Coverage threshold management - Three-tier make convention (verify, verify-deploy, verify-docs) - Adding a new check (where it goes, auto-pickup) - Troubleshooting matrix - Status check accounting (19 → 7) - Required GitHub branch protection list (operator action) NEW cowork/ci-pipeline-cleanup/v2.X.0-release-notes.md — operator-facing release notes covering all 13 phases + the operator action items post-merge. NEW cowork/ci-pipeline-cleanup/reddit-beat.md — Reddit / HN announce draft (don't auto-post; operator times manually after the tag lands). Active Focus updated in cowork/CLAUDE.md (workspace, separate edit since CLAUDE.md isn't in the repo) — added ci-pipeline-cleanup entry to 'Recently shipped bundles' + new env-var summary line + two new operator-decision items (RAM headroom + branch protection rules). |
||
|
|
b9a63a2521 |
ci-pipeline-cleanup Phase 6 follow-up: IIS operator playbook + matrix doc
Bundle: ci-pipeline-cleanup, Phase 6 follow-up. Phase 5+6 commit removed the deploy-vendor-e2e-windows matrix from ci.yml; this commit closes the Phase 6 deliverables that aren't ci.yml-side: 1. NEW docs/connector-iis.md::Operator validation playbook (Windows host) — the procedure operators run pre-release to flip the IIS / WinCertStore vendor-matrix cells from 'operator-playbook' → '✓'. Mirrors the Bundle II frozen decision 0.14 third-criterion (operator manual smoke required). 2. docs/deployment-vendor-matrix.md — IIS + WinCertStore rows status updated from 'pending' → 'operator-playbook' with link to the new playbook section. 3. deploy/docker-compose.test.yml — windows-iis-test sidecar comment updated to reflect that CI no longer activates this profile; sidecar definition preserved for operator local use via 'docker compose --profile deploy-e2e-windows up -d windows-iis-test'. Operator workflow going forward: - Pre-release: run the playbook on a Windows host - Record validation date + Windows Server version in cowork/<bundle>/iis-validation-receipts.md - Update docs/deployment-vendor-matrix.md cells if applicable |
||
|
|
0834bc1ad5 |
docs: deployment vendor matrix + per-connector deep-dive docs (NGINX + K8s + IIS + Apache + F5)
Phase 14 of the deploy-hardening II master bundle. The procurement- team headline doc + per-connector operator guides for the top 5 most-deployed connectors. NEW docs/deployment-vendor-matrix.md (~30 rows): - Per (connector × vendor-version) status: ✓ / CI / mock / pending / n/a - Known issues + workarounds + e2e test name reference - LTS + current-stable scope per frozen decision 0.1 - Quarterly re-pin cadence guidance for sidecar digests - "How to add a new vendor version" recipe Per frozen decision 0.14: a (connector × vendor-version) cell is "verified" only when ALL apply: ≥1 happy-path e2e green; ≥1 specific-quirk test green for that version; operator manual smoke completed at least once. Cells lacking the third criterion show "CI" status (auto-tests green but pending operator validation). Status snapshot at bundle close: - NGINX 1.25 + 1.27: CI - Apache 2.4: CI - HAProxy 2.6 + 2.8 + 3.0: CI - Traefik 2.x + 3.x: CI - Caddy 2.x: CI - Envoy 1.30 + 1.32: CI (file-mode SDS only; gRPC SDS V3-Pro) - Postfix 3.6 + 3.8: CI - Dovecot 2.3: CI - IIS 10 (2019, 2022): pending (Windows-host-only CI) - F5 v15.1 + v17.0 + v17.5: mock (real-F5 vagrant box documented) - SSH OpenSSH 8.x + 9.x: CI - WinCertStore (2019, 2022): pending (Windows-host-only) - JavaKeystore JDK 11 + 17 + 21: pending - K8s 1.28 + 1.30 + 1.31: CI NEW per-connector deep-dive docs: - docs/connector-nginx.md (~150 lines, 10 quirks documented) - docs/connector-k8s.md (~110 lines, 10 quirks) - docs/connector-iis.md (~120 lines, 10 quirks; Windows-host-only CI constraint loud) - docs/connector-apache.md (~80 lines, 10 quirks) - docs/connector-f5.md (~190 lines, 10 quirks; two-tier validation recipe for operator-supplied real-F5 vagrant box) Each doc follows the same structure: - Overview - Vendor versions tested - Per-quirk operator guidance (one section per TestVendorEdge_<vendor>_<edge>_E2E) - Troubleshooting matrix - V3-Pro deferrals - Related docs cross-refs Other connector docs (HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, SSH, WinCertStore, JavaKeystore) live in docs/connectors.md + are referenced from the matrix. Phase 15 next: per-vendor CI matrix job in .github/workflows/ci.yml. |
||
|
|
b95a548f65 |
docs: deploy-hardening I — atomic deploy + post-verify operator guide + connectors / README updates
Phase 12 of the deploy-hardening I master bundle. NEW docs/deployment-atomicity.md (12 sections, ~280 lines): 1. Overview — the three procurement-checklist gaps closed 2. The atomic-write primitive (Plan / File / Apply algorithm) 3. Per-connector atomic contract table (all 13 connectors) 4. Post-deploy TLS verification (handshake + SHA-256 + retries) 5. Rollback semantics (3 triggers + escalation path) 6. ValidateOnly dry-run mode (per-connector matrix) 7. File ownership + mode preservation (precedence + per-distro defaults) 8. Per-target deploy mutex (Phase 2) 9. Idempotency via SHA-256 (defends against retry storms) 10. Troubleshooting matrix (one row per failure mode) 11. V3-Pro deferrals (multi-region, pin manifests, SOC 2 export) 12. Per-connector quick reference (paste-able config snippets) UPDATE README.md::Deployment Targets — every connector row now notes the atomic + verify + rollback semantics that landed in deploy-hardening I. Added a closing paragraph linking to the new docs/deployment-atomicity.md. UPDATE docs/features.md — two new env-var rows: - CERTCTL_DEPLOY_BACKUP_RETENTION (default 3, -1 disables) - CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT (default 60s) The G-3 docs-drift CI guard is satisfied: every new CERTCTL_DEPLOY_* env var documented here also appears in source (internal/deploy/types.go for BACKUP_RETENTION, k8ssecret config for KUBELET_SYNC_TIMEOUT). S-1 stale-counts guard: no literal-number current-state counts in the new doc — the per-connector tests are referenced via the file:line pattern (internal/connector/target/<name>/<name>_atomic_test.go) so the operator can grep for the actual count. Phase 13 next: pre-commit verification (full matrix + CI guard reproductions). |
||
|
|
faf580aa10 |
docs: production hardening II — DR runbook + crl-ocsp updates + features.md env vars (Phase 10)
Production hardening II Phase 10 — operator-facing documentation
that codifies the new V2 surfaces shipped in Phases 1-8.
NEW docs/disaster-recovery.md (8 sections, ~280 lines):
- Overview of automatic fail-safes already in code
- CRL cache recovery (delete row + scheduler regenerates)
- OCSP responder cert recovery (delete row + ensureOCSPResponder
re-bootstraps on next request)
- OCSP response cache recovery (delete row + read-through fallback)
- CA private-key rotation procedure (9-step playbook)
- Postgres restore (with explicit list of operator-managed
artifacts NOT in DB)
- Trust-bundle reload semantics (SCEP / EST / Intune SIGHUP-
equivalent fail-safe behavior)
- DR checklist (printable; pin near on-call)
This is the SOC 2 / PCI procurement-team deliverable. Auditors and
on-call operators get a single document that tells them what to do
when state corrupts, when keys need rotation, when Postgres needs
restoring. Nothing in the runbook requires new code — it codifies
behaviors already in the codebase.
UPDATED docs/crl-ocsp.md:
- New "Production hardening II additions" section: OCSP nonce
extension, OCSP pre-signed cache (with the load-bearing security
wire called out), per-source-IP OCSP rate limit, per-actor cert-
export rate limit, CRL HTTP caching headers (RFC 7232), CRL
DistributionPoints auto-injection, cert-export typed audit
codes, per-area Prometheus metrics with operator alert
recommendations.
- Pruned the V3-Pro deferral list to remove items that this
bundle SHIPPED (OCSP rate-limiting moved out; remaining V3-Pro:
delta CRLs, OCSP stapling, OCSP request signature verification,
HA / multi-region replication, IDP extension for sharded CRLs).
UPDATED docs/features.md:
- CERTCTL_OCSP_RATE_LIMIT_PER_IP_MIN row (default 1000)
- CERTCTL_CERT_EXPORT_RATE_LIMIT_PER_ACTOR_HR row (default 50)
G-3 docs-drift CI guard reproduced clean: every new CERTCTL_* env
var documented in features.md AND consumed in Go source. S-1 stale-
counts guard clean (no literal-number prose for current-state
counts in README/docs).
|
||
|
|
6622883989 |
docs(est): EST RFC 7030 operator guide + WiFi/802.1X recipe + IoT bootstrap recipe + FreeRADIUS integration + architecture + README
EST RFC 7030 hardening master bundle Phase 12 — comprehensive operator- facing documentation for the Phases 1-11 backend work that shipped on 2026-04-29. NEW docs/est.md (19 sections, ~810 lines): Concepts (host vs user enrollment, profile-driven policy, multi-profile dispatch); 5-minute single-profile Quick start with curl + openssl recipes; Multi-profile dispatch (CERTCTL_EST_PROFILES=corp,iot,wifi setup with PathID rules enforced at boot); Authentication modes (mTLS / Basic / both / empty with cross-check semantics); RFC 9266 channel binding (failure-mode HTTP mapping table — ErrChannelBindingMissing/Mismatch/NotTLS13 → 400/409/426); WiFi/802.1X recipe with end-to-end FreeRADIUS integration (EAP-TLS supplicant config, mods-available/eap tls-common block, CRL distribution endpoint cross-ref, troubleshooting playbook); IoT bootstrap recipe (factory provisioning, first boot, steady-state renewal, compromise/decommission via bulk-revoke, recommended cert lifetimes per master prompt §7.7); serverkeygen for resource-constrained devices (CMS EnvelopedData wrap, RSA-only at this revision, zeroize discipline, Phase-1 cross-check refusing _SERVERKEYGEN_ENABLED=true with empty _PROFILE_ID); HSM-backed CA signing for EST cross-ref (signer interface seam); Operator GUI tabbed surface tour (/est: Profiles / Recent Activity / Trust Bundle); CLI + 6 MCP tools; Renewal device-driven model (RFC 7030 §4.2.2 mandate, renewal-trigger ratios for laptops/IoT, operator-push via webhook); Troubleshooting matrix (one row per typed audit-action constant in internal/service/est_audit_actions.go); TLS 1.2 reverse-proxy runbook cross-ref (channel-binding caveat explained); Threat model (load-bearing properties: trust-anchor reload fail-safety, per-profile counter isolation, mTLS cross-profile bleed defense, source-IP limiter process-locality, server-keygen heap residency, HTTP Basic in-process-only, legacy-anonymous-default back-compat carve-out); V3-Pro deferrals; Appendix A (libest sidecar reproducer + 5 integration test names); Appendix B (Cisco IOS 15.x + 16.x + Apple MDM + OpenWRT + libest <v3.0 wire-format quirks tested in internal/api/handler/cisco_ios_quirks_test.go). UPDATED docs/architecture.md: new "EST Server (RFC 7030) — Production Deployment" section under the existing baseline EST section. Mermaid diagram of multi-profile dispatch + mTLS sibling route + per-profile gate ordering + audit + GUI + SIGHUP-equivalent reload. Existing authentication paragraph updated with forward-ref to the hardening section. Audit paragraph updated to enumerate the 13 typed est_* action codes operators grep on. Trust-anchor reload semantics + libest interop tested in CI both called out. UPDATED README.md::Enrollment Protocols: replaced the one-line EST row with the full production-grade surface description matching the SCEP analog. Cross-references docs/est.md. UPDATED docs/connectors.md::EST/SCEP Integration: extended the EST-or-SCEP shared paragraph to point at the per-profile env-var form for both protocols + linked the new architecture.md section. NEW "Multi-profile EST dispatch + production hardening" subsection mirrors the SCEP equivalent: 9-row env-var table, cross-ref to docs/est.md. G-3 docs-drift CI guard reproduced locally clean — every CERTCTL_EST_* mention in docs maps back to internal/config/config.go, and every defined env var is documented. The `<NAME>` placeholder convention matches the SCEP idiom so the docs grep doesn't extract per-deploy profile names as phantom env vars. No new env vars introduced — this is a pure docs commit. |
||
|
|
8cc1153bd9 |
fix(docs/est): drop CERTCTL_EST_* wildcard prose to satisfy G-3 docs-drift guard
The previous commit (
|
||
|
|
827b9cb6c8 |
docs(est): document CERTCTL_EST_PROFILES + per-profile env-var family (G-3 fix)
The Phase 1 commit (
|
||
|
|
530593507b |
fix(scep-intune): close 11 audit gaps from 2026-04-29 pre-tag review
Closes the eleven gaps identified in the pre-v2.1.0 audit of the SCEP
RFC 8894 + Intune master bundle (cowork/scep-bundle-gap-closure-prompt.md).
Constitutional rule from cowork/CLAUDE.md::Operating Rules — 'Always
take the complete path, not the easy path' — drove this closure: each
gap was a load-bearing wire that crossed multiple layers (config →
validator → service wire-up → tests → docs) and shipping the bundle
without them would have produced lying-field footguns where operator-
visible config options stored values without affecting behavior.
WHAT LANDS:
Phase A — Clock-skew tolerance (master prompt §15 hazard closure)
internal/scep/intune/challenge.go: ValidateChallenge migrated from
positional args to ValidateOptions{} struct; new ClockSkewTolerance
field with default 0 (strict). 24 call sites updated mechanically.
Asymmetric application: now+tolerance >= iat AND now-tolerance < exp.
internal/config/config.go: SCEPIntuneProfileConfig.ClockSkewTolerance
default 60s + Validate() refusal when >= ChallengeValidity.
cmd/server/main.go: SetIntuneIntegration signature extended;
per-profile env-var loader honors CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CLOCK_SKEW_TOLERANCE.
internal/service/scep.go: intuneClockSkew field + IntuneStatsSnapshot
surfaces clock_skew_tolerance_ns. web/src/api/types.ts mirrors.
4 new tests in challenge_test.go covering accept-within-tolerance,
reject-beyond-tolerance, accept-expired-within-tolerance,
negative-treated-as-zero defensive normalization.
docs/scep-intune.md updated with the new env var + time-bounds rule.
Phase B — unknown-version-rejected golden test
internal/scep/intune/golden_helper_test.go: goldenUnknownVersionPayload
helper + signGoldenChallengeAny generic signer.
challenge_golden_test.go: TestGoldenChallenge_UnknownVersionRejected
uses an in-process ECDSA fixture (the on-disk PEM was generated with
a Go-stdlib version that produces different ecdsa.GenerateKey bytes
from the current call). TestRegenerateGoldenFixtures emits the new
unknown_version fixture file too.
Phase C — Two named Intune e2e tests
internal/api/handler/scep_intune_e2e_test.go:
TestSCEPIntuneEnrollment_RateLimited_E2E (cap=2 + 3 attempts; 3rd
returns FAILURE+badRequest with rate_limited counter ticked)
TestSCEPIntuneEnrollment_TrustAnchorSIGHUPReload_E2E (rotate
on-disk PEM + holder.Reload(); old-key challenge fails with
badMessageCheck; signature_invalid counter ticked)
intuneE2EFixture struct extended with trustHolder + trustPath fields
so tests can rotate.
Phase D — Four new ChromeOS hermetic tests (10 total now)
internal/api/handler/scep_chromeos_test.go:
_RAKeyMismatch — PKIMessage encrypted to wrong RA cert; handler
rejects without reaching service.
_3DESBackwardCompat — RFC 8894 §3.5.2 legacy fallback verified.
_RSACSR + _ECDSACSR — explicit matrix-pair pinning.
buildTestECDSACSR helper for ECDSA P-256 CSR construction;
tripleDESCBCEncrypt mirrors aesCBCEncrypt for 3DES-CBC;
assertChromeOSPositiveCertRep shared assertion.
Phase E — Per-profile counter isolation test
internal/api/handler/scep_profile_counter_isolation_test.go:
TestSCEPHandler_PerProfileIntuneCountersIsolated wires two
SCEPService instances + drives distinct PKIMessages + asserts
counter isolation. Guards against a future cmd/server/main.go
refactor that shares a *intuneCounterTab across profiles.
buildPerProfileIntuneFixture parameterized helper.
Phase F — Server-boot regression tests
cmd/server/preflight_scep_intune_test.go: 3 named tests covering
disabled-backward-compat, broken-config-with-PathID, expired-cert
refusal. preflightSCEPIntuneTrustAnchor signature extended with
pathID arg so error messages carry PathID= for operator log-grep.
Phase G — docs/connectors.md
Four new subsections under §EST/SCEP Integration: multi-profile
dispatch + mTLS sibling route + Intune Connector dispatcher + SCEP
probe in network scanner. Each has a one-paragraph operator
explanation + an env-var or endpoint table.
Phase H — Coverage uplift
internal/service/scep_probe_persist_test.go: 5 unit tests on
persistProbeResult (nil-safe + nil-repo-safe + repo-error swallow +
nil-logger guard) + ListRecentSCEPProbes (empty-slice-not-nil + repo
pass-through) + describeCertAlgorithm (RSA/ECDSA/QF1008-nil-curve
defensive branch/Ed25519/DSA/empty). CI gates (service ≥70, handler
≥75) PASS at 70.9% / 79.3%.
Phase I — deploy/test integration variant
deploy/test/scep_intune_e2e_test.go (//go:build integration):
TestSCEPIntuneEnrollment_Integration + _RateLimited_Integration
against the live docker-compose certctl container. Skip-when-
stack-missing semantics so sandbox + CI both work.
deploy/docker-compose.test.yml: new e2eintune SCEP profile env
vars + bind-mount of deploy/test/fixtures/.
deploy/test/fixtures/README.md: documents the deterministic trust
anchor regeneration recipe.
VERIFICATION (sandbox):
gofmt -d — clean for all changed files
staticcheck — clean for intune + handler + config + service +
cmd/server packages
go vet — clean for the same packages
go test -short — green for intune (95.3% cov), service (70.9%),
handler (79.3%), config (94.0%), cmd/server (boot
path; my preflight tests cover the directly-
testable function), pkcs7 (80.5% informational)
DEFERRED (per closure prompt §7 out-of-scope):
- V3-Pro Conditional Access gating + Microsoft Graph integration
- Standalone certctl-scan CLI binary
- OCSP rate-limiting, OCSP stapling, delta CRLs
Spec preserved at cowork/scep-bundle-gap-closure-prompt.md;
journal at cowork/scep-rfc8894-intune/progress.md (audit-closure
section appended).
|
||
|
|
0be889ff1d |
refactor(scep-gui): rebrand SCEP admin surface to per-profile tabbed interface (Profiles + Intune + Recent Activity)
Phase 9 follow-up to the SCEP RFC 8894 + Intune master bundle. The
Phase 9.4 GUI shipped 'SCEP Intune Monitoring' at /scep/intune, which
made the per-profile observability surface look Intune-only — operators
running EJBCA + Jamf would never click that nav link expecting per-
profile RA cert + mTLS observability. The page is per-profile keyed
under the hood; this commit rebrands + restructures so the surface
matches what operators actually need.
Spec: cowork/scep-gui-restructure-prompt.md.
User-visible change:
- Nav link renamed: 'SCEP Intune' → 'SCEP Admin'.
- Route: /scep is the new canonical path; /scep/intune kept as a
backward-compat alias that lands directly on the Intune tab.
- Page header: 'SCEP Administration'.
- Three tabs:
* Profiles (default) — per-profile lean cards with RA cert
expiry countdown, mTLS sibling-route status badge, Intune
enabled/disabled badge, challenge-password-set indicator.
'View Intune details →' link on Intune-enabled cards
deep-links into the Intune tab.
* Intune Monitoring — the existing Phase 9.4 deep-dive
(per-status counters, trust anchor expiry, recent failures
table, reload-trust button + confirmation modal).
* Recent Activity — full SCEP audit log filter merging all
four action codes (scep_pkcsreq + scep_renewalreq +
scep_pkcsreq_intune + scep_renewalreq_intune); chip filters
for All / Initial / Renewal / Intune / Static.
Backend:
* internal/service/scep.go — new SCEPProfileStatsSnapshot type +
IntuneSection sub-block + ProfileStats(now) accessor. Adds
raCertSubject/raCertNotBefore/raCertNotAfter + mtlsEnabled +
mtlsTrustBundlePath fields with SetRACert + SetMTLSConfig setters.
Existing IntuneStatsSnapshot + IntuneStats(now) preserved
UNCHANGED for /admin/scep/intune/stats backward compat (the
JSON shape stays byte-stable for external consumers — the
aliasing approach the prompt initially suggested doesn't work
because the new shape nests Intune while the old one is flat).
ChallengePasswordSet is derived from challengePassword != ''
(the secret value itself is never surfaced).
* internal/api/handler/admin_scep_intune.go — new Profiles handler
method on AdminSCEPIntuneHandler with the same M-008 admin gate.
AdminSCEPIntuneServiceImpl extended (in place; same
map[string]*service.SCEPService) to satisfy the new
AdminSCEPProfileService interface. Single handler file gets the
third method so the M-008 pin entry count stays steady (no new
file, no new triplet of admin-gate test files — just three new
Profiles tests inside the existing test file).
* internal/api/router/router.go — one new route
'GET /api/v1/admin/scep/profiles' registered to
reg.AdminSCEPIntune.Profiles. HandlerRegistry unchanged.
* api/openapi.yaml — new operation 'listSCEPProfiles' documenting
the request body / response shape / error mapping. Existing
Intune entries unchanged.
* cmd/server/main.go — per-profile loop now calls
scepService.SetMTLSConfig(profile.MTLSEnabled,
profile.MTLSClientCATrustBundlePath) right after SetPathID, and
scepService.SetRACert(raCert) right after loadSCEPRAPair returns
the leaf cert. Both setters are nil-safe.
* internal/api/handler/m008_admin_gate_test.go — extended the
existing admin_scep_intune.go entry's justification to mention
the third endpoint. No new map entry needed (file already
listed).
Backend tests (8 new):
* TestAdminSCEPProfiles_NonAdmin_Returns403
* TestAdminSCEPProfiles_AdminExplicitFalse_Returns403
* TestAdminSCEPProfiles_AdminPermitted_ForwardsActor — also pins
that Intune-enabled profiles emit an 'intune' sub-block while
Intune-disabled profiles OMIT it.
* TestAdminSCEPProfiles_RejectsNonGetMethod
* TestAdminSCEPProfiles_PropagatesServiceError
* TestAdminSCEPProfilesServiceImpl_NilMapReturnsEmpty
* (existing 16 Phase 9 admin tests still pass — backward-compat
preserved)
Frontend:
* web/src/api/types.ts — new SCEPProfileStatsSnapshot +
IntuneSection + SCEPProfilesResponse types. Existing
IntuneStatsSnapshot et al unchanged.
* web/src/api/client.ts — new getAdminSCEPProfiles helper.
* web/src/pages/SCEPAdminPage.tsx — full rewrite as the tabbed
surface. Reuses the existing ConfirmReloadModal and Intune
deep-dive card components verbatim; adds ProfileSummaryCard
(lean card for the Profiles tab) and ActivityTab. URL state
sync via useSearchParams so deep links survive reloads + browser
back/forward. The legacy /scep/intune route alias defaults the
activeTab to 'intune' on mount.
* web/src/main.tsx — new <Route path='scep' /> + preserved
<Route path='scep/intune' /> alias. Both render SCEPAdminPage.
* web/src/components/Layout.tsx — nav link rebranded:
label 'SCEP Intune' → 'SCEP Admin', to '/scep/intune' → '/scep'.
Frontend tests (20 — full rebuild):
* Admin gate (non-admin sees gated banner + zero admin API calls)
* Profiles tab default + Intune tab tabswitch + ?tab=intune deep
link + legacy /scep/intune alias all land on Intune
* Profiles tab status badges (Intune + mTLS + challenge-set)
reflect each profile's flags
* RA cert expiry tone bands (good ≥30d / warn 7-30d / bad <7d /
EXPIRED) verified across three fixture profiles
* 'View Intune details →' only renders for Intune-enabled
profiles AND switches tabs on click
* Empty-state banner when no profiles configured
* Intune tab counters render with the existing Phase 9 deep-dive
shape; reload modal Open/Confirm/Cancel/Error paths all pinned
* Recent Activity tab merges all four SCEP audit actions across
four parallel useQuery calls; filter chips
(all/initial/renewal/intune/static) narrow correctly
* Error path surfaces ErrorState on the active tab
Docs:
* docs/scep-intune.md — Operational monitoring section heading
expanded to '(SCEP Administration → Intune Monitoring tab)'.
Page-surface description rewritten for the tabbed shape;
admin-endpoints list extended with the new /admin/scep/profiles
entry.
* docs/architecture.md — Microsoft Intune Connector trust anchor
subsection updated to reference the Intune Monitoring tab inside
the SCEP Administration page + lists all three admin endpoints.
* docs/legacy-est-scep.md — forward-ref expanded with a parallel
sentence for the per-profile observability surface (independent
of Intune).
* README.md — Enrollment Protocols bullet for Intune updated to
'admin GUI SCEP Administration page at /scep' with the three
tabs called out.
Verification:
* gofmt clean on touched files
* go vet ./... clean
* staticcheck on intune+service+handler+router+cmd-server clean
* go test -short across intune+service+handler+router+cmd-server:
all green (existing Phase 9 tests + new Profiles tests)
* Frontend tsc --noEmit clean
* Vitest: 20/20 SCEPAdminPage tests + 3/3 sibling AuditPage tests
pass
* G-3 docs-drift CI guard reproduced locally: clean (no new env
vars; existing CERTCTL_SCEP_ allowlist prefix covers everything)
* M-009 hard-zero useMutation guard reproduced locally: clean
(the existing reload mutation already used useTrackedMutation
from the Phase 9 follow-up commit
|
||
|
|
5d080c86fd |
docs(scep-intune): deployment guide + troubleshooting + Microsoft support statement
Phase 11 of the SCEP RFC 8894 + Intune master bundle.
Phase 11.1 — docs/scep-intune.md (new, ~340 lines):
* TL;DR — drop-in NDES replacement framing; what an operator gets
over NDES (per-profile endpoints, audit-log forensics, SIGHUP
reload, GUI monitoring, per-device rate limit).
* Architecture diagram — Intune cloud → Connector → certctl SCEP
→ issuer connector. Explicit 'certctl replaces NDES, NOT the
Connector' framing; nine-gate dispatcher walk (shape pre-check,
JWS sig, version dispatch, time bounds, audience pin, CSR binding,
replay, per-device rate limit, optional compliance).
* Migration playbook (NDES + EJBCA / NDES + ADCS) — 9-step run-book:
install alongside, configure per-profile endpoint, extract trust
anchor, configure CONNECTOR_CERT_PATH + AUDIENCE, configure
issuer connector, migrate one profile, verify enrollment, roll
out fleet, decommission NDES.
* Intune SCEP profile field mapping table — every Intune admin
center field mapped to certctl's behavior (cert type, subject
name format, SAN, validity, key storage provider, key usage,
EKU, hash algorithm, SCEP server URL).
* Trust anchor extraction recipe — step-by-step certlm.msc export
of the 'CN=Microsoft Intune Certificate Connector' cert, PEM
rename, env-var configuration, HA Connector concatenation, SIGHUP
rotation flow.
* Troubleshooting matrix — 10 failure modes mapped to root causes
and operator actions: signature_invalid (trust anchor stale),
claim_mismatch (Intune profile SAN config), expired (clock skew /
Connector cert past NotAfter), not_yet_valid (reverse skew),
wrong_audience (URL mismatch), replay (retry-window collision),
rate_limited (limiter doing its job), unknown_version (Microsoft
shipped new format), malformed (proxy mangling body),
compliance_failed (V3-Pro hook returned non-compliant).
* Operational monitoring — admin GUI surface description, expiry
badge tone bands (≥30d green / 7-30d amber / <7d red / EXPIRED),
per-status counter polling cadence, audit log filter, recommended
Prometheus alert thresholds.
* Limitations — explicit V3-Pro deferrals: native Microsoft Graph
integration, Conditional Access compliance gating, per-tenant
trust anchors (MSP scoping), OCSP stapling at SCEP-response time,
auto-discovery of Connector signing cert.
* Microsoft support statement — three Microsoft Learn URLs (verified
live with HTTP 200): Connector overview, SCEP profile setup,
Connector install validation. Microsoft documents the Connector
as RFC-8894-compliant and supports its use against any RFC 8894
SCEP server.
Phase 11.2 — Cross-references:
* docs/legacy-est-scep.md — the previous forward-ref pointed at
'the Phase 11 doc this bundle ships'; updated to a richer pointer
that lists what scep-intune.md covers (architecture, migration,
profile mapping, extraction, troubleshooting, monitoring,
limitations, Microsoft support).
* README.md — new bullet under Enrollment Protocols table:
'Microsoft Intune SCEP fleet (drop-in NDES replacement)' with
the per-profile dispatcher feature list + link to scep-intune.md.
Procurement teams scanning the README see the Intune story
alongside ChromeOS / Jamf in the same table row.
* docs/architecture.md — new 'Microsoft Intune Connector trust
anchor (per-profile, opt-in)' subsection in the Security Model
section. ASCII diagram showing the dispatcher walk; calls out
the SIGHUP reload + admin-gated GUI surface; forward-link to
scep-intune.md.
Verification:
* All linked anchors inside scep-intune.md resolve to existing
headings: #limitations, #microsoft-support-statement,
#operational-monitoring, #trust-anchor-extraction.
* All linked doc paths resolve: legacy-est-scep.md, architecture.md,
features.md, tls.md.
* All three Microsoft Learn URLs return HTTP 200 (verified via curl).
* G-3 docs-drift CI guard reproduced locally and clean — the
migration playbook uses the <NAME> placeholder convention
consistently (matching features.md style) so the docs scanner
doesn't extract literal env-var names that aren't in config.go.
* Backend tests across intune+handler+service+router still green.
Refs: cowork/scep-rfc8894-intune-master-prompt.md::Phase 11
cowork/scep-rfc8894-intune/progress.md
|
||
|
|
7612da783a |
feat(scep-intune): per-profile dispatcher + SIGHUP reload + per-device rate limit + compliance hook seam
Phase 8 of the SCEP RFC 8894 + Intune master bundle. Wires the internal/scep/intune validator from Phase 7 into the SCEPService dispatch path, with a SIGHUP-reloadable trust anchor holder, a per-(Subject, Issuer) sliding-window rate limiter, and a nil-default ComplianceCheck seam for V3-Pro. Operator-visible surface (per-profile, all default to off): CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CONNECTOR_CERT_PATH=/etc/certctl/intune.pem CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_AUDIENCE=https://certctl.example.com/scep/corp CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CHALLENGE_VALIDITY=60m CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_PER_DEVICE_RATE_LIMIT_24H=3 Per-profile dispatch (Phase 8.8): an operator running corp-laptops through Intune AND IoT devices through static challenge configures INTUNE_ENABLED=true on the corp profile only — the IoT profile's PKCSReq path skips the dispatcher entirely. Mirrors the per-profile shape established by Phase 1.5. Wire-in surfaces: * config.go (Phase 8.1): SCEPProfileConfig.Intune sub-config of type SCEPIntuneProfileConfig (Enabled/ConnectorCertPath/Audience/ ChallengeValidity/PerDeviceRateLimit24h). Loaded from the indexed CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_* env-var family. Per-profile Validate gate refuses INTUNE_ENABLED=true with empty ConnectorCertPath OR negative PerDeviceRateLimit24h. * cmd/server/main.go (Phase 8.2 + wire-in): preflightSCEPIntuneTrustAnchor helper mirrors preflightSCEPRACertKey/preflightSCEPMTLSTrustBundle shape — fail-loud at boot when the trust anchor file is missing / unreadable / empty / contains an expired cert. The per-profile loop builds the holder + replay cache + rate limiter, calls SetIntuneIntegration on the SCEPService, and starts the SIGHUP watcher. A deferred sweep stops every watcher at shutdown. * internal/scep/intune/trust_anchor_holder.go (Phase 8.5): TrustAnchorHolder mirrors cmd/server/tls.go::certHolder. RWMutex- guarded pool + Reload that swaps a fresh slice on success + WatchSIGHUP goroutine that responds to the same SIGHUP the existing TLS-cert watcher uses. A bad reload (parse error, expired cert) keeps the OLD pool in place so a half-rotation doesn't take Intune enrollment down — same fail-safe pattern. Operators rotate via the on-disk file then 'kill -HUP <certctl-pid>'. * internal/scep/intune/rate_limit.go (Phase 8.6): hand-rolled sliding-window-log limiter keyed by (Subject, Issuer). 100k-entry map cap (matches replay cache); at-cap drops the bucket whose newest timestamp is the oldest. Default 3 enrollments per 24h covers legitimate first-cert + recovery + post-wipe re-enrollment but blocks bulk enumeration from a compromised Connector signing key. maxN <= 0 disables the limiter for tests + the rare operator who wants no per-device cap. Empty subject short-circuits to allow (defense-in-depth: caller's claim validation rejects empty-subject upstream; no shared bucket on ''). Why hand-rolled instead of golang.org/x/time/rate: the rate package is in go.sum as an indirect transitive but not a direct dep. ~30 LoC of stdlib avoids creating a new direct dep. * internal/service/scep.go (Phase 8.3 + 8.4 + 8.7): - SCEPService gains intuneEnabled / intuneTrust / intuneAudience / intuneValidity / intuneReplayCache / intuneRateLimiter / complianceCheck fields. - SetIntuneIntegration() constructor-time injection wires the per-profile state. Profiles with INTUNE_ENABLED=false never call this method, so they pay zero overhead. - SetComplianceCheck() installs the V3-Pro plug-in (see Phase 8.7). - looksIntuneShaped(): JWT-shape pre-check (length > 200 + exactly two dots). Allowed to false-positive (validator catches malformed → ErrChallengeMalformed); MUST NOT false-negative on real Intune challenges. - dispatchIntuneChallenge(): the load-bearing core. Runs ValidateChallenge → CSR-binding via DeviceMatchesCSR → replay cache CheckAndInsert → per-device Allow → optional ComplianceCheck. Each failure leg increments a typed metric label and emits an audit-friendly Warn log line. - PKCSReq + PKCSReqWithEnvelope + RenewalReqWithEnvelope all call dispatchIntuneChallenge first; on outcome.decided=true they either short-circuit (with a typed-error → SCEPFailInfo mapping) or call processEnrollment with action='scep_pkcsreq_intune' (so audit greps can count Intune-vs-static enrollments). - mapIntuneErrorToFailInfo(): typed-error → SCEPFailInfo per RFC 8894 §3.2.1.4.5 (signature/replay/expired → BadMessageCheck; claim-mismatch → BadRequest; default → BadRequest). - intuneFailReason(): typed-error → metric label ('signature_invalid' / 'expired' / 'rate_limited' / etc.). Default 'malformed' so a previously-unseen error category still surfaces in the metric for follow-up. - ComplianceCheck (Phase 8.7): nil-default no-op gate. V3-Pro plugs in via SetComplianceCheck to call Microsoft Graph's compliance API. Returns (compliant, reason, err). nil-err + compliant=false → CertRep FAILURE + 'compliance' reason in audit. err != nil → fail-safe deny (V3-Pro module is responsible for any 'permit on API failure' policy). * internal/service/scep.go also gains parseCSRForIntune() — small private wrapper around encoding/pem + x509 used by the dispatcher for the claim ↔ CSR binding check (separated from the broader processEnrollment because we want to bind BEFORE consuming the replay-cache slot). Tests (gates: ≥85% coverage on intune package, ≥70% on service): * scep_intune_test.go (in internal/service): 14 dispatcher tests covering happy-path Intune enrollment + static-challenge fallback + tampered-challenge reject + claim-mismatch reject + replay detected + rate-limited + compliance-hook nil-default + compliance- hook denies non-compliant + compliance-hook error fails closed + IntuneEnabled accessor + 'no IntuneEnabled = static path unchanged' regression pin + intuneFailReason mapping for every typed error + looksIntuneShaped boundary cases. * trust_anchor_holder_test.go (in internal/scep/intune): NewLoadsBundle, NewRequiresLogger, NewSurfacesLoadError, ReloadHappyPath, ReloadKeepsOldOnFailure, ReloadKeepsOldOnExpired (the fail-safe semantics that make the SIGHUP path operator-friendly), WatchSIGHUPReloadsPool (real SIGHUP to self with poll-for-swap pattern mirroring cmd/server/tls_test.go), WatchSIGHUPStopIsClean (does NOT fire SIGHUP after stop — same caveat as the TLS test: the Go runtime would otherwise terminate the test runner on the next SIGHUP since signal.Stop has removed the handler). * rate_limit_test.go (in internal/scep/intune): AllowsUpToCap, DistinctKeysIndependent, WindowExpiry, DisabledBypass (maxN=0), NegativeCapDisabled, EmptySubjectShortCircuits (defense-in-depth against an empty-subject DoS chokepoint), DefaultCapsHonored, MapCapEvictsOldest (at-cap eviction branch), ConcurrentRaceFree (50 goroutines × 200 inserts), pruneOlderThan + the no-op case. Verification: * gofmt -l on all touched files: clean * go vet ./... : clean * staticcheck on intune/service/config/cmd-server: clean * go test -count=1 -cover ./internal/scep/intune/...: 94.8% (target ≥85%) * go test -short across intune+service+config+handler+cmd-server: all green * G-3 docs-drift CI guard reproduced locally: docs-only filtered= empty, config-only=empty. The new env vars match the existing CERTCTL_SCEP_ allowlist prefix. Refs: cowork/scep-rfc8894-intune-master-prompt.md::Phase 8 cowork/scep-rfc8894-intune/progress.md Constitutional rule: 'Always take the complete path, not the easy path' (cowork/CLAUDE.md::Operating Rules) — operator can flip CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true and observe the dispatcher pick up Intune-shaped challenges end-to-end with no further code changes. Foundation + plumbing ship together. |
||
|
|
a12a437664 |
feat(scep): mTLS sibling route /scep-mtls/<pathID> (opt-in)
SCEP RFC 8894 + Intune master bundle — Phase 6.5 of 14 (opt-in,
enterprise-procurement-checkbox).
Closes the procurement-team objection that 'shared password
authentication' is a checkbox-fail regardless of how strong the
password is. The clean answer: a sibling route that adds client-cert
auth at the handler layer AND keeps the challenge password (defense in
depth, not replacement). Devices present a bootstrap cert from a
trusted CA (e.g. a manufacturing-time cert), then SCEP-enroll for
their long-lived cert. Same model Apple's MDM and Cisco's BRSKI use.
internal/config/config.go
* SCEPProfileConfig gains MTLSEnabled bool + MTLSClientCATrustBundlePath
string. Indexed env-var loader reads
CERTCTL_SCEP_PROFILE_<NAME>_MTLS_ENABLED +
CERTCTL_SCEP_PROFILE_<NAME>_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH.
* Validate() refuses MTLSEnabled=true with empty bundle path —
structural defense in depth ahead of the file-content preflight.
cmd/server/main.go
* preflightSCEPMTLSTrustBundle: file existence + PEM parse + ≥1
CERTIFICATE block + non-expired check. Returns the parsed
*x509.CertPool ready to inject into the per-profile SCEPHandler.
Failures os.Exit(1) with the offending PathID in the structured log.
* SCEP startup loop walks each profile; when MTLSEnabled, runs
preflight, builds the per-profile pool, contributes the bundle's
certs to the union pool that backs the TLS-layer
VerifyClientCertIfGiven, clones the SCEPHandler with
SetMTLSTrustPool, and registers the parallel sibling route via
apiRouter.RegisterSCEPMTLSHandlers.
* Union pool published to outer scope as scepMTLSUnionPoolForTLS;
passed to buildServerTLSConfigWithMTLS so the listener serves both
/scep[/<pathID>] (no client cert) and /scep-mtls/<pathID>
(cert required at handler layer) on the same socket.
* Final-handler dispatch gains /scep-mtls + /scep-mtls/* prefix
routing through the no-auth chain (auth boundary is the client
cert + challenge password, NOT a Bearer token).
cmd/server/tls.go
* New buildServerTLSConfigWithMTLS that wraps buildServerTLSConfig
+ sets ClientCAs + ClientAuth=VerifyClientCertIfGiven when a
non-nil pool is passed. nil pool = identical TLS shape to the
pre-Phase-6.5 builder (no behavior change for deploys without
mTLS profiles).
* Critical: VerifyClientCertIfGiven (NOT RequireAndVerifyClientCert)
so a client that doesn't present a cert can still hit the standard
/scep route. The per-profile gate at the handler layer enforces
'cert required' on /scep-mtls/<pathID>.
internal/api/handler/scep.go
* SCEPHandler gains mtlsTrustPool *x509.CertPool field +
SetMTLSTrustPool method. Per-profile pool injected by
cmd/server/main.go after preflight.
* HandleSCEPMTLS wrapper: gates on r.TLS.PeerCertificates non-empty
+ per-profile cert.Verify against THIS profile's pool. Returns
HTTP 401 for missing/untrusted cert (mTLS failure is auth, not
authorization). Returns HTTP 500 if mtlsTrustPool is nil (deploy
bug — the route shouldn't have been registered). On success
delegates to HandleSCEP — defense in depth: mTLS is additive,
NOT replacement; the standard SCEP code path including the
challenge-password gate still executes.
* Per-profile re-verification via cert.Verify(...) is critical:
the TLS layer verified against the UNION pool, so a cert that
chains to profile A's bundle would pass TLS even when targeting
profile B. The handler-layer gate prevents cross-profile
bleed-through.
internal/api/router/router.go
* AuthExemptDispatchPrefixes gains '/scep-mtls' (auth boundary is
client cert + challenge password, NOT Bearer token).
* RegisterSCEPMTLSHandlers parallel to RegisterSCEPHandlers:
empty PathID maps to /scep-mtls root; non-empty maps to
/scep-mtls/<pathID>. Each handler in the map MUST have had
SetMTLSTrustPool called.
internal/api/router/openapi_parity_test.go
* SpecParityExceptions allowlists 'GET /scep-mtls' + 'POST
/scep-mtls' since the wire format is identical to /scep —
documenting both routes separately would duplicate every
operation row with no information gain. Documented alternative
in docs/legacy-est-scep.md.
internal/api/handler/scep_mtls_test.go (new, ~210 LoC)
* 6 tests + 2 helpers covering the auth contract:
1. RejectsMissingClientCert — request with r.TLS=nil → 401
2. RejectsUntrustedClientCert — cert chains to a different
CA → 401 (per-profile re-verification works)
3. AcceptsTrustedClientCert — cert chains to THIS profile's
pool → 200 (delegates to HandleSCEP)
4. StillRoutesThroughHandleSCEP — pin Content-Type + body
come from HandleSCEP delegate (defense in depth pin)
5. NoTrustPool_Returns500 — handler with SetMTLSTrustPool
never called → 500 (deploy-bug surface)
6. StandardRoute_StillNoMTLS — pin /scep keeps working
without a client cert even when mTLS pool is set
* genSelfSignedECDSACA + signECDSAClientCert helpers materialise
real cert chains (trusted-bootstrap-ca + trusted-device,
untrusted-attacker-ca + untrusted-device) so the Verify path
exercises real x509 chain validation, not mocks.
docs/features.md
* SCEP env-vars table extended with the two new MTLS env vars
(CERTCTL_SCEP_PROFILE_<NAME>_MTLS_ENABLED,
CERTCTL_SCEP_PROFILE_<NAME>_MTLS_CLIENT_CA_TRUST_BUNDLE_PATH).
Closes the G-3 'env var defined in Go but never documented' gate.
docs/legacy-est-scep.md
* New 'mTLS sibling route (Phase 6.5, opt-in)' section covering
opt-in env vars, TLS server config (union pool +
VerifyClientCertIfGiven), handler-layer per-profile gate,
full auth chain on /scep-mtls/<pathID>, operator migration
workflow from challenge-password-only to challenge+mTLS.
cowork/CLAUDE.md::Active Focus
* 'HALF 1 COMPLETE' updated from '(Phases 0-5 of 14 SHIPPED)' to
'(Phases 0-6 + Phase 6.5 of 14 SHIPPED)'.
Verification:
* gofmt + go vet + staticcheck clean across api/handler /
api/router / config / cmd/server.
* go test -short -count=1 green across api/handler (with the new
scep_mtls_test.go) / api/router / service / config / pkcs7 /
cmd/server / connector/issuer/local.
* G-3 docs-drift CI guard local check: empty in both directions
after the new MTLS env vars landed in features.md.
* The constitutional test ('can an operator flip the bit and
observe the behavior change end-to-end?') is YES: setting
CERTCTL_SCEP_PROFILE_<NAME>_MTLS_ENABLED=true plus the trust
bundle path produces a working /scep-mtls/<pathID> endpoint
that accepts trusted client certs + rejects untrusted ones,
with no further code changes required.
Phase 6.5 of 14 in SCEP RFC 8894 + Intune master bundle.
Half 1 (Phases 0-6 + 6.5) is now FEATURE-COMPLETE for the
ChromeOS / general-MDM use case. Half 2 (Phases 7-12) adds the
Microsoft Intune dynamic-challenge layer.
|
||
|
|
b857bdc560 |
docs(scep): close G-3 docs-only drift in legacy-est-scep.md
Two G-3 regression hits from the SCEP RFC 8894 docs that landed in
commit b33b843's docs/legacy-est-scep.md addition:
1. CERTCTL_SCEP_PROFILE_CORP_* (5 vars) — the multi-profile dispatch
recipe used literal CORP placeholders in the example block, which
the G-3 scanner treats as phantom env vars (the loader expands
<NAME> at runtime; CORP is never a literal env-var key in Go
source). Replaced the literal example with a prose description
that uses the <NAME> token explicitly + cross-references
docs/features.md where the per-profile suffix table lives. The
G-3 scanner sees only CERTCTL_SCEP_PROFILES + the prefix
CERTCTL_SCEP_ (already on the ALLOWED list per commit
|
||
|
|
23603f5174 |
docs(scep): RFC 8894 hardening — README + architecture + connectors
SCEP RFC 8894 + Intune master bundle — Phase 6 of 14.
Closes Half 1 of the bundle (Phases 0-6). The certctl SCEP server now
ships full RFC 8894 wire format (EnvelopedData decrypt + signerInfo POPO
verify + CertRep PKIMessage builder), tested against ChromeOS-shape
hermetic E2E requests, with multi-profile dispatch and must-staple
per-profile policy. Half 2 (Phases 7-12) adds the Microsoft Intune
dynamic-challenge layer; Phase 6.5 (mTLS sibling route) is independently
shippable as an opt-in enterprise-procurement feature.
README.md
* Standards & Revocation table SCEP row updated to mention full RFC
8894 wire format (EnvelopedData decryption, signerInfo POPO
verification, CertRep PKIMessage builder), PKCSReq + RenewalReq +
GetCertInitial messageType dispatch, multi-profile dispatch
(/scep/<pathID>), per-profile RA cert + key, MVP fall-through for
lightweight clients.
* Enrollment protocols paragraph extended with the same scope, plus
a link to docs/legacy-est-scep.md for the operator + device-
integration guide.
docs/architecture.md
* SCEP wire format paragraph rewritten to describe the two paths
(RFC 8894 first, MVP fall-through), the messageType dispatch
table, the EnvelopedData decrypt (constant-time PKCS#7 unpad
closing the padding-oracle leg), the SET-OF Attribute
re-serialisation quirk per RFC 5652 §5.4, and the CertRep
PKIMessage shape (cert chain encrypted to req.SignerCert, NOT
the RA cert).
* SCEP service interface updated to show the three new
*WithEnvelope variants alongside the legacy PKCSReq method.
* Added 'Capabilities advertised', 'Multi-profile dispatch', and
'Must-staple per profile' subsections covering the RFC 7633
extension policy.
docs/connectors.md
* EST/SCEP Integration section extended with the per-profile
issuer-binding env-var form (CERTCTL_SCEP_PROFILE_<NAME>_ISSUER_ID).
* New SCEP RA cert + key paragraph pointing operators at the
legacy-est-scep.md openssl recipe + ChromeOS Admin Console
pointer + must-staple per-profile policy.
cowork/CLAUDE.md::Active Focus
* 2026-04-29 SCEP RFC 8894 + Intune master bundle status updated
to 'HALF 1 COMPLETE (Phases 0-5 of 14 SHIPPED)' with the full
chain of commit SHAs (
|
||
|
|
b33b843908 |
feat(scep): RenewalReq + GetCertInitial + ChromeOS E2E + caps + must-staple
SCEP RFC 8894 + Intune master bundle — Phase 4 + Phase 5 of 14.
Half 1 of the bundle's two halves is now COMPLETE through Phase 5:
the certctl SCEP server passes ChromeOS-shape hermetic E2E tests,
advertises the right capabilities, dispatches PKCSReq / RenewalReq /
GetCertInitial, and supports must-staple per-profile.
== Phase 4: RenewalReq + GetCertInitial wiring ============================
internal/service/scep.go
* RenewalReqWithEnvelope (RFC 8894 §3.3.1.2) — re-enrollment with an
existing valid cert. Same contract as PKCSReqWithEnvelope but the
service additionally verifies that envelope.SignerCert chains to
the issuer's CA (verifyRenewalSignerCertChain). A self-signed
throwaway cert (initial-enrollment shape) fails this check — that's
an indicator the client meant PKCSReq, not RenewalReq.
* GetCertInitialWithEnvelope (RFC 8894 §3.3.3) — polling stub.
Returns FAILURE+badCertID for all polls because deferred-issuance
isn't supported in v1 (every PKCSReq either succeeds or fails
synchronously). Wiring stays in place for a future enhancement.
* Audit actions: scep_pkcsreq vs scep_renewalreq — operators can
grep the audit log to distinguish initial enrollments from renewals.
internal/api/handler/scep.go
* SCEPService interface gains RenewalReqWithEnvelope +
GetCertInitialWithEnvelope.
* pkiOperation RFC 8894 path now switches on envelope.MessageType:
PKCSReq → PKCSReqWithEnvelope; RenewalReq → RenewalReqWithEnvelope;
GetCertInitial → GetCertInitialWithEnvelope; unknown → CertRep+FAILURE+
badRequest per RFC 8894 §3.3.2.2.
== Phase 5.1: GetCACaps capability advertisement =========================
internal/service/scep.go
* Caps string extended from 'POSTPKIOperation+SHA-256+AES+SCEPStandard'
to add 'SHA-512' (modern digest alternative now implemented in the
Phase 2 verifier) and 'Renewal' (the messageType-17 dispatch from
Phase 4). ChromeOS specifically looks for these capabilities to
negotiate the strongest available cipher + digest combo.
* scep_test.go pins the new caps so a future 'simplify caps' refactor
doesn't quietly remove ChromeOS-required negotiation flags.
== Phase 5.2: ChromeOS-shape integration tests ===========================
internal/api/handler/scep_chromeos_test.go (new, ~570 LoC)
* 6 hermetic E2E tests + ~12 helpers. Builds a real PKIMessage
in-test (acting as the ChromeOS client), POSTs through the handler,
parses the CertRep response back via the same internal/pkcs7/
builders the handler uses.
* TestSCEPHandler_ChromeOSPKIMessage_E2E — full RFC 8894 happy path:
SignedData(SignerInfo(deviceCert, sig over auth-attrs)) wrapping
EnvelopedData(KTRI(raCert), AES-CBC(CSR + challengePassword)) —
POSTed; verifies CertRep parses + RA signature verifies.
* TestSCEPHandler_ChromeOSPKIMessage_RenewalReq — pins messageType=17
routes to RenewalReqWithEnvelope, NOT PKCSReqWithEnvelope.
* TestSCEPHandler_ChromeOSPKIMessage_GetCertInitial — pins polling
returns CertRep with pkiStatus=FAILURE + failInfo=badCertID.
* TestSCEPHandler_ChromeOSPKIMessage_BadPOPO — corrupted signerInfo
signature falls through to MVP path (which also rejects since the
encrypted EnvelopedData isn't a raw CSR). No silent acceptance.
* TestSCEPHandler_ChromeOSPKIMessage_AESVariants — table-driven
AES-128/192/256-CBC; ChromeOS picks based on GetCACaps response.
* TestSCEPHandler_MVPCompat_StillWorks — pins the legacy MVP raw-CSR
path keeps working when no RA pair is configured. Backward compat
is non-negotiable.
== Phase 5.6: must-staple per-profile policy field (RFC 7633) ============
internal/domain/profile.go
* Added MustStaple bool to CertificateProfile. Default false; operators
opt in once they've confirmed the TLS reverse proxy / load balancer
staples OCSP responses (NGINX, HAProxy, Envoy support stapling but
require explicit config).
internal/connector/issuer/interface.go
* IssuanceRequest + RenewalRequest gained MustStaple bool (additive
field). Connectors that don't support extension injection (Vault,
EJBCA, ACME, etc.) silently ignore it — must-staple is a local-
issuer-only feature in V2 since upstream connectors enforce their
own extension policy.
internal/connector/issuer/local/local.go
* Added oidMustStaple (1.3.6.1.5.5.7.1.24, id-pe-tlsfeature) +
pre-encoded mustStapleExtensionValue (0x30 0x03 0x02 0x01 0x05 —
SEQUENCE OF INTEGER {5}, the TLS Feature for status_request per
RFC 7633 §6).
* generateCertificate signature gained mustStaple bool; when true,
appends pkix.Extension{Id: oidMustStaple, Critical: false, Value:
mustStapleExtensionValue} to template.ExtraExtensions before
x509.CreateCertificate.
internal/connector/issuer/local/must_staple_test.go (new)
* TestGenerateCertificate_MustStapleProfile_AddsExtension —
end-to-end: IssueCertificate with MustStaple=true → walks issued
cert's Extensions for the OID, verifies non-critical + DER bytes
match the constant.
* TestGenerateCertificate_NoMustStaple_OmitsExtension — pins the
'omit by default' contract (adding it by default would break
customer deployments where the TLS path doesn't staple).
* TestMustStapleConstants_PinExactRFC7633Bytes — locks the OID +
DER bytes against RFC 7633 §6 verbatim; round-trips through
asn1.Unmarshal as []int{5}.
Note: full service-layer plumbing (CertificateProfile.MustStaple →
IssuanceRequest.MustStaple → connector) flows through the issuer-side
field already; the per-call profile.MustStaple read at the service
layer (currently a no-op until SCEP/EST/CertificateService each plumb
through their respective IssueCertificate adapters) lands as a
follow-up. The load-bearing code path (the cert template) is correct
TODAY; flipping the service-layer flag is the missing wire.
== Phase 5.4: docs/legacy-est-scep.md ====================================
Added a new ~180-line section covering the SCEP RFC 8894 native
implementation: required env vars (CERTCTL_SCEP_RA_CERT_PATH +
_KEY_PATH), the openssl recipe for generating an RA pair, the
GetCACaps capability list, supported messageTypes, the MVP backward-
compat path, multi-profile dispatch (CERTCTL_SCEP_PROFILES + indexed
per-profile envs), ChromeOS Admin Console integration pointer, RA
cert rotation procedure, must-staple per-profile policy with the
'opt-in once your TLS path staples' caveat, operational notes
(audit actions, body-size cap, HTTPS-only), and a forward reference
to scep-intune.md (Phase 11).
== Verification ==========================================================
* gofmt + go vet clean for the files I touched.
* staticcheck ./internal/api/handler/... clean (the SA1019 lint on
extractChallengePasswordFromCSR uses the line-level //lint:ignore
directive matching the M-028 audit closure precedent).
* go test -short -count=1 green across api/handler / api/router /
service / pkcs7 / connector/issuer/local / domain / cmd/server.
* G-3 docs-drift CI guard local check: empty diff in both directions.
Phase 4 + Phase 5 of 14 in SCEP RFC 8894 + Intune master bundle.
Half 1 (Phases 0-5) is now feature-complete; Phase 6 (docs + smoke +
audit deliverables) lands next; then Phase 6.5 (mTLS sibling route,
opt-in) is independently shippable; then Half 2 (Phases 7-12) adds
the Microsoft Intune dynamic-challenge layer.
Living progress at cowork/scep-rfc8894-intune/progress.md.
|
||
|
|
5c7c125d9d |
ci+docs(scep): close G-3 docs-only drift for SCEP placeholder + wildcard
Commit
|
||
|
|
294f6cff52 |
docs(scep): document multi-profile env vars (CERTCTL_SCEP_PROFILES + per-profile prefix)
Phase 1.5 added two new env-var literals to internal/config/config.go
that the G-3 docs-drift CI guard picked up but I forgot to document
when shipping commit
|
||
|
|
105c307d62 |
feat(scep): add RFC 8894 message-type constants + RA cert/key config
SCEP RFC 8894 + Intune master bundle — Phase 0 + Phase 1 of 14.
Phase 0 (recon, no code changes):
Baseline tests green at HEAD
|
||
|
|
2519da85f0 |
docs: README + concepts + features reflect CRL/OCSP responder bundle
Audit pass against cowork/crl-ocsp-responder-prompt.md found three
operator-facing docs still describing the pre-bundle CRL/OCSP surface
(GET-only OCSP, CA-key-direct signing, no scheduler-driven cache). Each
claim updated below was ground-truthed against repo HEAD before edit.
README.md
* Standards & Revocation table — CRL row now mentions
scheduler-pre-generated cache (CERTCTL_CRL_GENERATION_INTERVAL,
crl_cache table); OCSP row mentions GET + POST forms, dedicated
responder cert per RFC 6960 §2.6, id-pkix-ocsp-nocheck per
§4.2.2.2.1, 7d auto-rotation grace.
* Revocation paragraph — corrected the 'Embedded OCSP responder'
one-liner to call out the dedicated-responder-cert design (the CA
private key is never used directly for OCSP signing, which is the
load-bearing security property for the future PKCS#11/HSM driver
path) and added the link to the relying-party guide.
docs/concepts.md
* CRL paragraph — added the scheduler pre-generation + singleflight
coalescing detail. Kept the existing 24h validity claim (verified
against internal/connector/issuer/local/local.go:956 — 'NextUpdate:
now.Add(24 * time.Hour)').
* OCSP paragraph — corrected the description so it covers both GET
and POST forms (POST per RFC 6960 §A.1.1 is what production
clients use: Firefox, OpenSSL s_client -status, cert-manager,
Intune); added the dedicated-responder-cert + nocheck-extension +
auto-rotation explanation; cross-link to docs/crl-ocsp.md.
docs/features.md
* Revocation Infrastructure section — CRL Endpoint, OCSP Responder,
new Admin Cache Observability subsection, new GUI Revocation
Endpoints Panel subsection. Corrected the previously-wrong 'Signs
with the issuing CA key' OCSP claim — the bundle's load-bearing
security improvement is exactly that the CA key is NOT used
directly. Cross-link to crl-ocsp.md.
* Local CA env vars table — added all four new
CERTCTL_CRL_GENERATION_INTERVAL / CERTCTL_OCSP_RESPONDER_KEY_DIR
(with the prod 'MUST set' callout) / _ROTATION_GRACE / _VALIDITY
rows. Closes the G-3 'env var defined in Go but never documented'
drift that broke CI on commit
|
||
|
|
b4334edda1 |
docs: CRL/OCSP user guide + architecture cross-reference — Phase 6
Audit of cowork/crl-ocsp-responder-prompt.md against repo HEAD found
two prompt deliverables still missing after the Phase 5 + Phase 6 code
landed: the docs/crl-ocsp.md operator+relying-party guide (Phase 6.2)
and the docs/architecture.md cross-reference. This commit closes both.
docs/crl-ocsp.md (329 lines) covers:
* Conceptual overview — why both CRL and OCSP, why a separate
responder cert (RFC 6960 §2.6 / §4.2.2.2.1) keeps the CA key cold
* Endpoints — GET CRL, GET + POST OCSP, admin observability endpoint
(M-008 admin-gated) with full request/response shape examples
* Configuration — every CERTCTL_CRL_* / CERTCTL_OCSP_RESPONDER_*
env var with default + meaning + 'MUST set in prod' callout for
OCSP_RESPONDER_KEY_DIR
* OCSP responder cert lifecycle — first-request bootstrap, disk
self-healing when keydir is pruned out from under the DB row,
rotation grace, ExtraExtensions wiring for id-pkix-ocsp-nocheck
* Consumer integration recipes — cert-manager (AIA/CDP automatic),
Firefox (about:preferences quirk), OpenSSL (ocsp + s_client -status),
Intune (CRL pull cadence)
* V3-Pro deferred (delta CRLs, OCSP rate-limiting, OCSP stapling)
* Troubleshooting (404 on issuer that doesn't support CRL, hex
serial format, admin-gated 403, scheduler not running)
docs/architecture.md: extended the existing 'Certificate revocation'
paragraph to explicitly call out the new pipeline (crl_cache table,
OCSP responder cert per RFC 6960 §2.6, POST + GET OCSP endpoints,
auto-rotation grace) and added the 'See docs/crl-ocsp.md for the
operator + relying-party guide' link so future readers can find the
deep dive.
Closes the prompt's Phase 6.2 + 6.3 exit criteria. Combined with
the Phase 5 GUI panel (
|
||
|
|
9039cef390 |
crypto/signer: introduce Signer interface; refactor local issuer to use it
This is a load-bearing internal refactor with no user-visible behavior
change. The new internal/crypto/signer package abstracts CA private-key
signing behind a Signer interface (embeds stdlib crypto.Signer + adds
Algorithm()). The local issuer now consumes this interface; the
historical c.caKey crypto.Signer field is renamed c.caSigner signer.Signer.
What landed:
* internal/crypto/signer/ — new stdlib-only package
- Signer interface: crypto.Signer + Algorithm()
- Algorithm enum: RSA-2048, RSA-3072, RSA-4096, ECDSA-P256, ECDSA-P384
- Driver interface: Load / Generate / Name
- FileDriver: production driver, wraps file-on-disk PEM, hooks for
DirHardener + Marshaler so the local package can inject Bundle 9
keystore.ensureKeyDirSecure + keymem.marshalPrivateKeyAndZeroize
- MemoryDriver: in-memory test driver; safe for concurrent use
- parse.go: ParsePrivateKey moved here from local.go (PKCS#1, SEC 1, PKCS#8)
- 91.6% coverage (gate ≥85)
* internal/connector/issuer/local/local.go — refactor
- Rename c.caKey crypto.Signer → c.caSigner signer.Signer
- Rewire 4 signing call sites: leaf cert (line ~613), CRL (~849),
OCSP response (~887), CA bootstrap (~482) — all access the
interface; the bootstrap also switches to interface-level
Public() + Signer
- Wrap freshly-generated and freshly-loaded keys; reject Ed25519
and other unsupported algorithms at load time (was silently
accepted before, would have failed at first sign)
- Delete the duplicated parsePrivateKey helper (single source of
truth now lives in the signer package)
- Update the L-014 threat-model comment block (lines 1-29) with a
forward-reference paragraph: file-on-disk caveats apply only to
FileDriver-backed signers; alternative drivers close that leg
- Coverage 86.7 → 86.5 (above CI floor of 86); the 0.2pp drop is
mechanical from deleting parsePrivateKey, partially recovered by
a new test pinning the Wrap error path
* internal/crypto/signer/equivalence_test.go — Phase 3 safety net
- RSA byte-strict equality for leaf certs / CRLs / OCSP responses
(PKCS#1 v1.5 is deterministic)
- ECDSA TBS-strict equality (signature differs because of random k)
- Both signatures independently validate against the CA
- Negative sentinel proves the equivalence checker isn't trivially-
passing
* docs/architecture.md — new 'CA Signing Abstraction' section under
Security Model, with ASCII diagram of FileDriver / MemoryDriver /
future PKCS11Driver / future CloudKMSDriver
* Test file mechanical edits (only):
- bundle9_coverage_test.go: parsePrivateKey → signer.ParsePrivateKey
(function moved, not behavior changed)
- local_test.go: append one targeted test
(TestSubCA_LoadCAFromDisk_RejectsUnsupportedKeyAlgorithm) that
pins the new Wrap error path I introduced — recovers coverage
cost of the deletion above
What did NOT change (verified empty diffs):
* api/openapi.yaml
* migrations/
* internal/connector/issuer/interface.go
* go.mod / go.sum (no new dependencies; stdlib only)
This refactor is the prerequisite for three downstream items:
- PKCS#11/HSM driver (V3-Pro)
- CRL/OCSP responder (V2)
- SSH CA lifecycle (V2)
Each of those adds a new signing call site. Doing the abstraction now
costs once; deferring would cost three times.
|
||
|
|
51e0999888 |
Bundle P.2-extended CI follow-up: rephrase aspirational env-var references to fix G-3 guard
CI's G-3 env-var docs drift guard caught four aspirational env vars
referenced in the Bundle P.2-extended RFC test-vector subsections that
aren't actually defined in internal/config/config.go:
- CERTCTL_EST_KEYGEN_MODE -> typo for CERTCTL_KEYGEN_MODE (corrected)
- CERTCTL_OCSP_DELEGATED_RESPONDER_CERT_PATH -> not implemented (rephrased
as forward-looking; v2 only supports byName ResponderID)
- CERTCTL_CRL_VALIDITY_DURATION -> not implemented (rephrased; v2 has
a hard-coded 7-day validity)
- CERTCTL_CRL_PARTITIONED -> not implemented (rephrased; v2 emits
full CRLs only with no IDP extension)
The byKey ResponderID, partitioned-CRL IDP, and configurable CRL
validity test vectors remain documented but are now framed as 'becomes
a positive test once <feature> support lands' rather than as currently-
implemented configuration. Same applies to the OCSP delegated-responder
mode test vector.
This keeps the RFC conformance documentation intact while staying
honest about what's actually wired up in v2.
CI guard verification (locally simulated):
G-3 env-var docs drift guard: CLEAN
Bundle: P.2-extended-ci-fix
|
||
|
|
6cb007eaaa |
Bundle P.2-extended (Coverage Audit Extension): RFC test-vector subsections — M-008 closed
Pure doc work. Three new subsections added to docs/testing-guide.md: Part 21.99 — RFC 7030 EST test vectors - /cacerts response framing (§4.1.3) - /simpleenroll request framing (§4.2.1) - /serverkeygen multipart response (§4.4.2) Part 23.99 — RFC 5280 SAN/EKU test vectors - IPv4 SAN encoding (§4.2.1.6, [7] OCTET STRING 4 bytes) - IPv6 SAN encoding (§4.2.1.6, 16 bytes; v4-mapped canonicalization) - IDN dNSName (§4.2.1.6 + RFC 3490 Punycode) - otherName UPN (§4.2.1.6, [0] AnotherName SEQUENCE) - EKU encoding (§4.2.1.12, SEQUENCE OF OID + standard OIDs) - EKU criticality (§4.2.1.12 + CA/B Forum BR §7.1.2.7) Part 24.99 — RFC 6960 OCSP / RFC 5280 §5 CRL test vectors - OCSP response status (§4.2.2.3, tryLater vs HTTP 5xx) - OCSP ResponderID byName vs byKey (§4.2.2.2) - OCSP nonce extension (§4.4.1, browser-cache-friendly handling) - CRL TBSCertList nextUpdate (§5.1.2 + CA/B Forum BR §7.2.2) - CRL reason codes (§5.3.1, reserved 7 + out-of-range rejection) - CRL IDP extension (§5.2.5, partitioned vs full) - CRL no-delta (§5.2.4, certctl emits full CRLs only) Each vector cites RFC section + provides ASN.1 byte snippet where relevant + names the certctl pin location (file + test name) so a reviewer can spot wire-level drift without re-reading the RFC. Verification - grep -cE '^### [0-9]+\.99' docs/testing-guide.md == 3 (the new subs) - grep -cE '^## Part [0-9]+:' docs/testing-guide.md == 56 (unchanged) - file size: 8266 lines (+~190 from baseline) Audit deliverables - gap-backlog.md M-008 row: full strikethrough + closure note enumerating all three subsections + the 14 specific test vectors - coverage-audit-2026-04-27/extension-progress.md: P.2 marked DONE Closes: M-008 Bundle: P.2-extended (Coverage Audit Extension) |
||
|
|
95d0d85391 |
Bundle Q (Coverage Audit Closure): property-based pilot + hygiene — L-001/L-002/L-003/L-004/I-001 closed
Five small closures wrapping the Low-tier and Info-tier audit findings. Q.1 — cmd/cli round-out (L-001 closed) ====================================== cmd/cli/dispatch_test.go: ~30 dispatch tests across handleCerts / handleAgents / handleJobs / handleImport / handleStatus. httptest.NewTLSServer mocks the API; cli.NewClient(_, _, _, _, true) constructs an insecure-skip-verify client. Each test pins the missing-args usage-print path AND the happy-path delegation. Result: 7.1% -> 63.5% coverage (gate: >=30%). Q.2 — awssm round-out (L-002 closed) ====================================== internal/connector/discovery/awssm/awssm_edge_test.go: New() default constructor, extractKeyInfo (ECDSA/Ed25519/unknown — was RSA-only), processSecret filter arms (NamePrefix mismatch / TagFilter mismatch / empty-value / GetSecretValue error), realSMClient stub-contract pin (ListSecrets / GetSecretValue / NewRealSMClient), and EmailAddresses SAN extraction. Result: 78.2% -> 96.0% coverage (gate: >=85%). Q.3 — Property-based testing pilot (L-003 closed) ====================================== gopter@v0.2.11 added to go.mod (test-only). internal/crypto/encryption_property_test.go: - TestProperty_EncryptDecryptRoundTrip — 50 successful tests, DecryptIfKeySet(EncryptIfKeySet(x, k), k) == x - TestProperty_WrongPassphraseRejected — 30 successful tests, AEAD never returns nil-error AND bytes-equal plaintext under wrong passphrase Both skipped under -short to keep developer loop fast (PBKDF2 600k rounds × 50 iters ≈ 15s on -race CI). internal/pkcs7/length_property_test.go: - TestProperty_ASN1LengthRoundTrip — three sub-properties: decodeLength(encode(x)) == x for x ∈ [0, 2³¹−1]; short-form invariant (length<128 → 1 byte == length); long-form invariant (length>=128 → high bit set + N bytes follow). 500 successful tests in <10ms. Q.4 — Architecture diagram multi-agent update (L-004 closed) ====================================== docs/qa-test-guide.md::Architecture: ASCII diagram updated to show 'certctl-agent (×N)' + callout explaining seed_demo.sql provisions 12 agent rows (1 active, 2 retired, 9 reserved/sentinel) for Parts 04, 05, 55 + FSM coverage. Operators running parallel-agent topologies guided to AGENT_COUNT=N + 'make qa-stats'. Q.5 — Test-naming CI guard (I-001 closed) ====================================== .github/workflows/ci.yml: Test-naming convention guard added after the QA-doc seed-count drift guard. Greps for func Test<X>( missing the <X>_<Scenario> suffix. Prints first 20 non-conformant as ::warning:: annotations. continue-on-error: true (informational). Excludes TestMain + TestProperty_*. Promotion to hard-fail tracked as I-001-extended. Verification ====================================== - python3 yaml.safe_load on ci.yml: OK - go vet ./cmd/cli/... ./internal/connector/discovery/awssm/... ./internal/crypto/... ./internal/pkcs7/...: clean - go test -short -count=1 across all four packages: PASS - go test -count=1 (full property tests): PASS - crypto 15.4s (50 + 30 × 600k PBKDF2) - pkcs7 5ms Audit deliverables ====================================== - gap-backlog.md: strikethroughs on L-001/L-002/L-003/L-004/I-001 with per-finding closure note - closure-plan.md: ticks Bundle Q [x] with per-item breakdown Closes: L-001, L-002, L-003, L-004, I-001 Bundle: Q (Property-Based + Hygiene) |
||
|
|
30ac7910c2 |
Bundle P (Coverage Audit Closure): QA doc strengthening — M-007/M-009/M-010/M-011/M-012 closed; M-008 deferred
Six structural strengthenings to certctl QA documentation surface, raising acquisition-readiness QA-doc score 4.0 -> 4.7. M-008 (per-RFC test-vector subsections under Parts 21 + 24) deferred as 'Bundle P.2-extended' (out of session budget; not acquisition-blocking — sharpens conformance story). P.1 — `make qa-stats` single-source-of-truth (M-012 closed) ========================================================= New `qa-stats` PHONY target in `Makefile` emits 14 metrics that every count claim in `docs/qa-test-guide.md` and `docs/testing-guide.md` is derived from: backend test files / Test functions / t.Run subtests, frontend test files, fuzz targets, t.Skip sites, qa_test.go Part_ subtests, testing-guide.md Parts, and unique seed IDs (mc-* / ag-* / iss-* / tgt-* / nst-*). Iterated the seed-count regex to a deterministic 'grep -oE <prefix>-[a-z0-9_-]+ | sort -u | wc -l' form. Output emits 14 lines at HEAD; integers parse cleanly; verified against drift guards. P.2 — CI drift guards (M-011 closed) ========================================================= Two new CI steps in `.github/workflows/ci.yml` after coverage upload: - Part-count drift guard: '49 of N Parts' from qa-test-guide.md vs '^## Part N:' header count in testing-guide.md. Fails on mismatch. - Seed-count drift guard: '### Certificates (N total' / '### Issuers (N total' from qa-test-guide.md vs unique mc-* / iss-* IDs in seed_demo.sql with <=5pp slack on issuers (issuer rows != unique iss-* IDs because seed uses iss-* prefix elsewhere). Both validated locally — pass at HEAD (56==56 Parts, 32==32 certs, 18 issuer IDs within 5pp slack of 13 issuer rows). YAML lint clean. P.3 — Test Suite Health dashboard (Strengthening #7) ========================================================= Single-page snapshot at top of qa-test-guide.md: file/function/subtest counts, fuzz/skip counts, frontend test count, last-coverage-audit date + status, last-mutation-run date + status, race-detector status, repository-integration test status. Designed for first-look auditor / acquirer / new-engineer scanning. P.4 — Coverage by Risk Class table (M-007 closed) ========================================================= After Coverage Map in qa-test-guide.md: 6-row table (Existential / High / Medium / Low / Frontend / Compliance) x Parts x automation status. Cross-references each row to coverage-matrix.md. Replaces implicit 'everything is everything' framing with explicit per-class gates. P.5 — Release Day Sign-Off Matrix (M-010 closed) ========================================================= 12-row release-readiness checklist in qa-test-guide.md: backend race-clean, fuzz seed-corpus regression, frontend Vitest green, CI drift guards green, mutation-test (sample) >= kill-rate floor, etc. Each row cites verification command + gate value. Sign-off is 'all 12 green' — produces a per-release artifact attached to the tag. P.6 — Mutation Testing Targets (Strengthening #5) ========================================================= New section in qa-test-guide.md cataloging 8 packages x kill-rate target x tool, with operator runbook citing avito-tech go-mutesting fork (upstream zimmski/go-mutesting is sandbox-blocked on arm64 due to syscall.Dup2). Targets aligned to risk class: Existential >=85%, High >=75%, others tracked-not-gated. P.7 — Per-Connector Failure-Mode Matrix (M-009 closed, condensed) ========================================================= New 'Part 9.0 Per-Connector Failure-Mode Matrix' in docs/testing-guide.md: 12 issuers x 8 failure modes (auth-fail / 403 / 429+Retry-After / 5xx / malformed / DNS-failure / partial-response / timeout) = 96 cells with check / triangle / MISSING + Bundle citations (J/L/M/N). Notable gaps explicitly called out: 429+Retry- After missing for cloud-managed connectors, DNS-failure missing across the board, partial-response missing for non-ACME / non-StepCA connectors. Each gap is a follow-on-bundle candidate. Verification ========================================================= - 'make qa-stats' runs to completion, emits 14 metrics, all integers parse cleanly - 'python3 -c "import yaml; yaml.safe_load(...)"' clean on ci.yml - Both CI drift guards executed locally — both PASS at HEAD - git diff --stat: 5 files changed, +249 / -1 Audit deliverables ========================================================= - gap-backlog.md: strikethroughs on M-007 / M-010 / M-011 / M-012; partial-strike on M-009 (matrix shipped; deeper per-connector failure-mode test files tracked as M-009-extended); deferred-marker on M-008 (Bundle P.2-extended); Bundle P closure-log entry - closure-plan.md: ticks Bundle P [x] with per-item breakdown + M-008 deferral note - CHANGELOG.md: full Bundle P [unreleased] entry above Bundle O - testing-guide.md: new Part 9.0 Per-Connector Failure-Mode Matrix - qa-test-guide.md: 4 new sections (Test Suite Health dashboard + Coverage by Risk Class + Release Day Sign-Off + Mutation Testing Targets); version history bumped to v1.3 - Makefile: new qa-stats PHONY target - ci.yml: 2 new drift-guard steps after coverage upload Closes: M-007, M-010, M-011, M-012 Closes (condensed): M-009 (matrix shipped; deeper test files = M-009-extended) Deferred: M-008 (Bundle P.2-extended; not acquisition-blocking) Bundle: P (QA Doc Strengthening) |
||
|
|
834389621c |
Bundle I (Coverage Audit Closure): QA-doc drift cleanup — H-007 + H-008 closed
Applies Patches 1-7 from coverage-audit-2026-04-27/tables/qa-doc-patches.md
(Patch 5 re-anchored against actual HEAD seed counts after Phase 0 recon
discovered the original patch's anticipated counts were themselves drifted).
docs/qa-test-guide.md:
- Patch 1: 'all 54 Parts' -> '49 of 56 Parts' + not-yet-automated callout
- Patch 2: Totals line replaced with verified-2026-04-27 breakdown + recompute commands
- Patch 3: Coverage Map gains Parts 23, 24, 55, 56 (each '0 (NOT AUTOMATED)')
- Patch 4: 'Not Yet Automated' subsection added under 'What This Test Does NOT Cover'
- Patch 5: Seed Data Reference re-anchored to authoritative HEAD counts:
32 certs (already correct), 12 agents (was 9), 13 issuers (was 9),
8 targets (already correct), 4 nst (already correct).
Replaced narrow ID enumerations with sed | grep recompute commands.
Added maintenance-note pointer to Strengthening #6 (CI guard).
- Patch 6: Version History entry v1.2 added
- Bonus: integration_test comparison row updated (12 agents + 13 issuers)
deploy/test/qa_test.go (Patch 7):
4 new t.Run('PartN_*', ...) blocks for Parts 23, 24, 55, 56 — each calls
t.Skip with a docs/testing-guide.md::Part N pointer + automation candidates.
Skip-with-rationale form keeps Part numbering consistent + makes the
manual-test pointer machine-readable. Replacing each Skip with a real
test body is gap-backlog work.
Verification:
grep -cE '^## Part [0-9]+:' docs/testing-guide.md == 56 PASS
grep -cE 't\.Run("Part[0-9]+_' deploy/test/qa_test.go == 53 PASS
go vet -tags qa ./deploy/test/... PASS
go test -tags qa -run='__nope__' ./deploy/test/... PASS (compile)
(Full SKIP-grep gate requires the live demo stack; t.Skip bodies trivial.)
Audit deliverables:
findings.yaml: H-007 (-0014), H-008 (-0015) status open -> closed
gap-backlog.md: strikethrough both rows + Bundle I closure-log entry
tables/qa-doc-drift.md: 'PATCHES APPLIED' header marker (not retro-edited)
acquisition-readiness.md: QA-doc rigor 2.5 -> 4.0
closure-plan.md: Bundle I checklist box ticked
CHANGELOG.md: [unreleased] Bundle I entry
|
||
|
|
6b5af27546 |
Bundle G: Final audit closure — L-004 + D-003/4/5/7 closed; 54/55 + 7/7
Closes the 2026-04-25 audit's final-closure cluster. Score 51/55 -> 54/55
(98% closed); deferred 4/7 -> 7/7 (100%). All severity-graded findings now
closed except M-029 (frontend per-PR migration backlog, by design incremental).
L-004 (CWE-924) — dual-key API rotation overlap window:
internal/config/config.go::ParseNamedAPIKeys rewritten to allow same-name
duplicate entries iff admin flag matches. Mismatched-admin entries rejected
at startup (privilege escalation guard); exact (name,key) duplicates rejected
(typo guard — rotation requires DIFFERENT keys under the same name). Startup
INFO log per name with multiple entries surfaces the active rotation window.
NewAuthWithNamedKeys was already shaped correctly (constant-time hash compare
across all entries, same UserKey + AdminKey for either bearer); Bundle B's
M-025 per-user rate-limit bucket and audit-trail actor inherit consistency
across the rollover automatically. 8 new tests pin the contract end-to-end.
docs/security.md::API key rotation walks the 6-step zero-downtime rollover.
D-003 — Mutation testing wired:
security-deep-scan.yml gets a go-mutesting step covering ./internal/crypto/...,
./internal/pkcs7/..., ./internal/connector/issuer/local/... with per-package
summary lines extracted into go-mutesting.txt artefact.
D-007 — Frontend semgrep wired (recon found Bundle 7's wiring claim was false):
security-deep-scan.yml gets a 'semgrep p/react-security' step running
returntocorp/semgrep:latest --config=p/react-security against /src/web/src;
results uploaded as semgrep-react.json.
D-004 + D-005 — Operator runbook published:
docs/testing-strategy.md (NEW) consolidates per-tool local-run procedures,
acceptance thresholds, and triage paths for go-mutesting, ZAP baseline DAST,
testssl.sh, and semgrep p/react-security. Closes the 'wired CI-only, no
local-run validation' framing for D-004/D-005 by giving operators the same
commands the CI workflow runs.
Verification:
gofmt -l no diff
go vet ./internal/config/... ./internal/api/middleware/... clean
go test -short -count=1 ./internal/config/... ./internal/api/middleware/... PASS
python3 -c 'yaml.safe_load(...)' YAML OK
G-3 env-var docs guard no phantom env-vars
Audit deliverables:
audit-report.md: L-004 + D-003/4/5/7 boxes flipped [x]; score 51/55 -> 54/55
findings.yaml: 5 status flips; new bundle-G-final-closure closure_log entry
CHANGELOG.md: Bundle G entry under [unreleased]; supersedes Bundle E + F
L-004-deferred framing
|
||
|
|
389f6b8233 |
Bundle F follow-up: M-023 doc env-var cleanup (G-3 guard fix)
CI on the bundle-F merge (run #24972730564) failed the G-3 env-var docs guardrail because docs/legacy-est-scep.md mentioned CERTCTL_EST_PROXY_TRUSTED_SOURCES CERTCTL_EST_TRUST_PROXY_CLIENT_CERT_HEADER which are documented as future-feature env vars but don't exist in config.go. The G-3 guard treats any env-var name in docs that's not either defined in source OR on the documented integration-surface allowlist as drift. The runbook's 'certctl-side configuration' section was over-promising features that haven't shipped yet. Rewritten to be honest: - Current implementation is header-agnostic (X-SSL-Client-Cert is ignored). EST/SCEP authentication still works correctly because both protocols carry their own auth (CSR signature for EST, challengePassword for SCEP) inside the request body. - The reverse proxy is purely a TLS-version bridge. - Future-feature description retained in prose form (without literal env-var names) so an operator who needs proxy-supplied client identity knows to open an issue. The nginx config block's comment was also rewritten to reflect the header-agnostic default. The proxy still SETS the headers (cheap, no-op when ignored); a future commit can flip certctl to read them behind a fail-closed CIDR allowlist + opt-in toggle. Verification: grep -rnE 'CERTCTL_EST_PROXY|CERTCTL_EST_TRUST' README.md docs/ deploy/helm/ — empty (G-3 guard now passes for these names) |
||
|
|
8aff1c16f8 |
Bundle F: Compliance tail + CI gate hardening — 2 findings closed; audit closure complete
Closes M-023 + M-024 from comprehensive-audit-2026-04-25. Final
audit-bundle commit. Score 51/55 closed (93%); High 9/9 (100%);
Medium 26/27 (96%); Low 19/19 (100%); Deferred 4/7.
M-023 (PCI-DSS Req 4 §2.2.5) — Legacy EST/SCEP reverse-proxy runbook
docs/legacy-est-scep.md (NEW): operator runbook for embedded
EST/SCEP clients that only speak TLS 1.2 against a TLS-1.3-pinned
certctl listener. Sections:
- 3-condition gate for when this runbook applies
- Architecture diagram (legacy client -> proxy TLS 1.2 -> certctl TLS 1.3)
- Full nginx config with ssl_protocols TLSv1.2 TLSv1.3 + ECDHE
AEAD-only ciphers + mTLS optional verification + proxy_ssl_protocols
TLSv1.3 on the backend hop
- HAProxy alternative config with ssl-min-ver TLSv1.2 frontend +
ssl-min-ver TLSv1.3 backend
- certctl-side env vars: CERTCTL_EST_PROXY_TRUSTED_SOURCES (CIDR
allowlist of trusted proxies) + CERTCTL_EST_TRUST_PROXY_CLIENT_CERT_HEADER
(toggle header-as-identity). Dual-knob design forces operators
to think about header spoofing.
- PCI-DSS Req 4 v4.0 §2.2.5 attestation language
- Forward-look on TLS 1.2 deprecation watch
certctl listener stays pinned at TLS 1.3 minimum (cmd/server/tls.go:131);
the proxy-to-certctl hop is also TLS 1.3.
M-024 (NIST SSDF PW.7.2) — govulncheck hard gate
.github/workflows/ci.yml: 'Run govulncheck' step renamed to
'Run govulncheck (M-024 hard gate)' with updated comment block
documenting why no carve-out is needed.
Bundle E's transitive bumps (x/net 0.42->0.47, x/crypto 0.41->0.45)
cleared the 5 L-021 deferred-call advisories that the original
Bundle F prompt designed an exception list for. Plain
'govulncheck ./...' is now the right gate; default exit-code
semantics fail on any future called-vuln advisory. Deferred-call
advisories that legitimately can't be remediated should land in
a NIST SSDF deviation log in docs/security.md, not be silenced.
Audit endgame:
51/55 closed (93%). Remaining open items don't require further
bundle work:
- M-029 frontend per-page migration backlog — closes per-PR
- L-004 rotation infra — explicit scope-pivot defer
- D-003 mutation testing — sandbox-blocked
- D-004 DAST suite — wired CI-only via security-deep-scan.yml
- D-005 testssl.sh — wired CI-only
- D-007 frontend semgrep — wired CI-only
Audit deliverables:
audit-report.md: score 49/55 -> 51/55 closed; M-023 + M-024
boxes flipped [x] with closure notes.
findings.yaml: 2 status flips
CHANGELOG.md: Bundle F section + 'Audit endgame' summary
|
||
|
|
e720474fb7 |
Bundle D: Documentation & transparency sweep — 8 findings closed
Closes H-009 + L-001 + L-007 + L-008 + L-016 + L-017 + L-018 + M-027
from comprehensive-audit-2026-04-25.
H-009 — README JWT verified-already-clean
README has zero JWT mentions at audit time. docs/architecture.md
correctly documents JWT/OIDC integration via authenticating-gateway
pattern (line 905-912).
.github/workflows/ci.yml: new step
'Forbidden README JWT advertising regression guard (H-009)'
greps README for JWT-as-supported phrasing; passes verbatim
(gateway / pre-G-1) but fails build on net-new advertising.
L-001 (CWE-295) — InsecureSkipVerify per-site justification
Audit count was 8; recon found 13 production sites.
docs/tls.md: new 'InsecureSkipVerify justifications' table
enumerates each site by file:line with per-site rationale.
cmd/agent/verify.go:78, internal/tlsprobe/probe.go:54,
internal/service/network_scan.go:460: each previously-bare
InsecureSkipVerify: true now carries //nolint:gosec.
.github/workflows/ci.yml: new step
'Forbidden bare InsecureSkipVerify regression guard (L-001)'
fails build if any net-new ISV lands in non-test .go without
nolint:gosec on the same or preceding line.
L-007 — README dependency-audit commands
README.md: new Dependencies section with go list -m all | wc -l,
go mod why, govulncheck ./.... Honors operating-rules invariant.
L-008 — Release-time govulncheck gate
.github/workflows/release.yml: new 'Install govulncheck' +
'Run govulncheck (release gate)' steps in the matrix job.
Pinned to same install path as ci.yml. Default exit code
semantics (fail on called-vuln only, deferred-call advisories
tracked on master via L-021) keeps the gate appropriate.
L-016 — architecture.md drift fixes
docs/architecture.md: system-components diagram's '21 tables'
annotation removed (current 23; replaced with TEXT-keys
descriptor); connector-architecture '9 connectors' prose
replaced with grep ref + current 12-issuer list (added
Entrust/GlobalSign/EJBCA which were missing); API-design
'97 operations / 107 total' replaced with grep commands.
Connector subgraphs verified-current at 12/13/6.
L-017 — workspace CLAUDE.md verified-already-clean
Bundle B's pre-commit-gate refactor already converted current-
state numeric claims to grep commands. Phase 0 recon confirmed
zero remaining hardcoded counts.
L-018 — Defect age table
cowork/comprehensive-audit-2026-04-25/defect-age.md (NEW):
Tabulates all 9 High findings with first-mentioned commit,
closing bundle, days-open. Methodology snippet for re-running.
Key finding: 8 of 9 closed within 24h of audit publication.
M-027 — OpenAPI parity verified-already-clean
Audit's 'router 121 vs OpenAPI 125 — 4-op gap' was wrong
methodology. The 4-op 'gap' was exactly the 4 routes registered
via r.mux.Handle (auth-exempt allowlist) instead of r.Register.
When you count both dispatch shapes the totals match exactly.
internal/api/router/openapi_parity_test.go (NEW):
TestRouter_OpenAPIParity AST-walks router.go for both
Register and mux.Handle calls + walks api/openapi.yaml's
path/method nesting + asserts the sets match. Adding a route
without updating the spec fails CI permanently.
Audit deliverables:
audit-report.md: score 38/55 -> 46/55 closed
(High 7/9 -> 8/9; Medium 20/27 -> 21/27; Low 8/19 -> 14/19)
findings.yaml: 8 status flips open -> closed
defect-age.md: new file
certctl/CHANGELOG.md: Bundle D section
Verification:
TestRouter_OpenAPIParity PASS
L-001 grep guard self-test (after //nolint:gosec adds) PASS
H-009 grep guard self-test PASS
go test -count=1 -short on changed packages green
|
||
|
|
62a412c488 |
Bundle C: Renewal/reliability cluster — 7 findings closed
Closes M-006 + M-007 + M-008 + M-015 + M-016 + M-019 + M-020 from
comprehensive-audit-2026-04-25. M-028 was already closed by the
Bundle B CI follow-up.
M-006 (CWE-913) — Idempotent migration 000014
migrations/000014_policy_violation_severity_check.up.sql:
Prepended ALTER TABLE ... DROP CONSTRAINT IF EXISTS before the
ADD. Mirrors the down migration's existing IF EXISTS shape and
the M-7 idempotent-index idiom. Re-runs against partially-applied
DBs now succeed.
M-007 — Bulk-op partial-failure tests (3 new)
internal/api/handler/bulk_partial_failure_test.go:
TestBulkRevoke_PartialFailure_ReportsBoth
TestBulkRenew_PartialFailure_ReportsBoth
TestBulkReassign_PartialFailure_ReportsBoth
Each asserts HTTP 200 + both success/failure counters round-trip
+ per-cert errors[] preserved with non-empty messages so operators
can correlate each failure to its certificate ID.
M-008 — Admin-gated handler enumeration pin (verified-already-clean)
Recon: only one admin-gated handler — bulk_revocation.go — with
full 3-branch test triplet already in place. health.go calls
IsAdmin informationally to surface the flag to the GUI without
gating.
internal/api/handler/m008_admin_gate_test.go:
Walks every handler .go file, asserts every middleware.IsAdmin
call site is in AdminGatedHandlers (with required test triplet)
or InformationalIsAdminCallers (justified). Adding a new admin
gate without updating both the constant AND adding the test
triplet fails CI.
M-015 — Single-profile cardinality pin (verified-already-clean)
Audit claim 'no cardinality validation' was wrong — enforced at
struct level. domain.ManagedCertificate.{CertificateProfileID,
RenewalPolicyID,IssuerID,OwnerID} and RenewalPolicy.
CertificateProfileID are bare strings, not slices.
internal/domain/m015_cardinality_test.go:
reflect-based pin on kind=String. Schema change to N:N would
have to update renewal.go's lookup loop in the same commit.
M-016 (CWE-754) — Reap stale-agent jobs
internal/repository/postgres/job.go::ListJobsWithOfflineAgents:
JOIN jobs to agents on agent_id, filter (status=Running AND
a.last_heartbeat_at < cutoff), exclude server-keygen jobs.
internal/service/job.go::ReapJobsWithOfflineAgents:
Flips matched jobs to Failed reason agent_offline so I-001
retry loop re-queues them on a healthy agent. Records audit
event per reap.
internal/scheduler/scheduler.go:
Scheduler.runJobTimeout cycle now calls both reaper arms.
agentOfflineJobTTL default 5min (5x agent-health-check default);
SetAgentOfflineJobTTL knob for operator override.
internal/service/job_offline_agent_reaper_test.go: 6 unit tests
cover happy path, server-keygen-skip, non-Running-skip, non-
positive-TTL fail-loud, repo-error propagation, audit-event
recording.
M-019 — Configurable ARI HTTP timeout
Audit claim 'no fallback timeout' was wrong — ari.go:52 already
had a 15s timeout. Bundle C makes it configurable.
internal/connector/issuer/acme/acme.go:
Config.ARIHTTPTimeoutSeconds field with env path
CERTCTL_ACME_ARI_HTTP_TIMEOUT_SECONDS.
internal/connector/issuer/acme/ari.go:
Both HTTP clients (GetRenewalInfo + getARIEndpoint) now use the
new ariHTTPTimeout() helper. Zero / negative / nil-config all
fall back to the historic 15s default.
ari_timeout_test.go: 4 dispatch arm tests.
M-020 (CWE-770) — OCSP DoS hardening
Pre-bundle the noAuthHandler chain had no rate limit. An attacker
could DoS the OCSP responder, which for fail-open relying parties
is a revocation bypass.
cmd/server/main.go:
noAuthHandler refactored from fixed middleware.Chain(...) to a
conditional slice that appends middleware.NewRateLimiter when
cfg.RateLimit.Enabled. Per-IP keying applies; OCSP/CRL/EST/SCEP
are unauth.
docs/security.md (NEW):
Operator runbook documenting Must-Staple TLS Feature extension
RFC 7633 as the architectural fix for fail-open relying parties.
Profile-flip guidance + nginx/Apache/HAProxy/Envoy stapling
snippets + explicit scope statement on what the rate limiter
alone does NOT solve.
Audit deliverables:
cowork/comprehensive-audit-2026-04-25/audit-report.md: score
31/55 -> 38/55 closed (Medium 13/27 -> 20/27).
cowork/comprehensive-audit-2026-04-25/findings.yaml: 7 status
flips open -> closed with closure notes citing the Bundle C
mechanism.
certctl/CHANGELOG.md: Bundle C section under [unreleased].
Verification:
go vet ./internal/service ./internal/scheduler ./internal/connector/issuer/acme
./internal/api/handler ./internal/domain ./cmd/server clean
go test -count=1 -short on the same packages all green
helm template + helm lint clean
internal/repository/postgres setup-fail sandbox disk
pressure (same on master HEAD before this branch)
|
||
|
|
a172b6ed3b |
Bundle B CI follow-up: G-3 env-var docs + M-028 closure (final 5 SA1019 sites)
Two CI failures on master after Bundle B merge:
1. Frontend Build / G-3 env-var docs guardrail
Bundle B introduced CERTCTL_RATE_LIMIT_PER_USER_RPS and
CERTCTL_RATE_LIMIT_PER_USER_BURST without adding them to
docs/features.md. The guardrail step that scans Go source for
getEnv* calls and asserts each appears in a doc page failed.
Fix: docs/features.md rate-limit section extended with both new
env vars + a paragraph explaining the per-key keying contract
from M-025.
2. Go Build & Test / staticcheck SA1019 hits (6 errors)
The CI workflow runs staticcheck without continue-on-error. Bundle
7 opened M-028 to track 6 deprecated-API sites; Bundle 9 closed 1
of them (the elliptic.Marshal in local.go) but kept a deliberate
regression-oracle reference in bundle9_coverage_test.go protected
only by golangci-lint's //nolint comment — staticcheck-as-CLI does
not honor that, only its native //lint:ignore directive.
Closure of remaining 5 sites:
cmd/server/main_test.go:47, 163, 192, 465 — 4 × middleware.NewAuth
migrated to middleware.NewAuthWithNamedKeys with explicit
NamedAPIKey entries. The auth=none case at line 465 maps to a
nil NamedAPIKey slice (no-op pass-through, matches the
NewAuthWithNamedKeys contract for empty input). Audit count was
3; recon found a 4th at line 465 that was missed.
internal/api/handler/scep.go:266 — csr.Attributes is a real RFC
2985 §5.4.1 challengePassword carve-out. Go's stdlib deprecation
note explicitly applies only to OID 1.2.840.113549.1.9.14
(requestedExtensions), NOT to OID 1.2.840.113549.1.9.7
(challengePassword), for which there is no non-deprecated
stdlib API. Suppressed with native //lint:ignore SA1019 +
comment block citing the RFC.
internal/connector/issuer/local/bundle9_coverage_test.go:342 —
deliberate regression-oracle that calls elliptic.Marshal to
prove the new crypto/ecdh path is byte-identical. Comment
converted from //nolint:staticcheck to native //lint:ignore
SA1019 so staticcheck-as-CLI honors the suppression.
Audit deliverables:
cowork/comprehensive-audit-2026-04-25/audit-report.md: M-028 box
flipped [x]; score 30/55 -> 31/55 (Medium 12/27 -> 13/27).
cowork/comprehensive-audit-2026-04-25/findings.yaml: M-028 status
partial_closed -> closed with closure note.
Verification:
go test -count=1 -short ./cmd/server ./internal/api/handler
./internal/connector/issuer/local ./internal/api/middleware
./internal/config — all green.
staticcheck on each changed package — 0 SA1019 hits.
Bundle C had M-028 in scope; this CI-fix lift moves it forward so
master CI goes green immediately. Bundle C scope adjusts to remove
M-028 and focuses on M-006 / M-015 / M-016 / M-019 / M-020 plus the
M-007 / M-008 coverage gaps.
|
||
|
|
45ba27693b | Update LICENSE metadata | ||
|
|
30f9f1e712 |
Bundle B: Auth & transport surface tightening — 5 findings closed
Closes M-001 + M-002 + M-013 + M-018 + M-025 from
comprehensive-audit-2026-04-25.
M-001 (CWE-916) — PBKDF2 100k -> 600k via v3 blob format
internal/crypto/encryption.go:
- New v3Magic (0x03), pbkdf2IterationsV3 (600,000 — OWASP 2024
Password Storage Cheat Sheet floor), v3SaltSize (16 bytes),
deriveKeyWithSaltV3 helper.
- EncryptIfKeySet now unconditionally writes v3:
magic(0x03) || salt(16) || nonce(12) || ciphertext+tag
- DecryptIfKeySet falls through v3 -> v2 -> v1 with AEAD verification
at each step. Wrong-passphrase v3 reads cannot be silently
misattributed to v2/v1.
- IsLegacyFormat updated to recognize 0x03 as non-legacy.
internal/crypto/encryption_v3_test.go (NEW, 7 tests):
V3 round-trip / V2 read-fallback against deterministic v2 fixture /
V3 wrong-passphrase fails / V3-vs-V2 dispatch order / V2 vs V3 keys
differ for same (passphrase, salt) / iteration-count pin at OWASP
2024 floor / IsLegacyFormat-recognises-V3.
Coverage internal/crypto: 86.7% -> 88.2%.
M-002 (CWE-862) — Auth-exempt allowlist constants + AST regression test
Recon found auth-exempt surface spans TWO layers (audit's claim was
incomplete):
Layer 1 (router.go direct r.mux.Handle):
GET /health, GET /ready, GET /api/v1/auth/info, GET /api/v1/version
Layer 2 (cmd/server/main.go::buildFinalHandler URL-prefix dispatch):
/.well-known/pki/*, /.well-known/est/*, /scep[/...]*
internal/api/router/router.go:
- New AuthExemptRouterRoutes constant with per-entry justifications.
- New AuthExemptDispatchPrefixes constant.
internal/api/router/auth_exempt_test.go (NEW, 2 tests):
AST-walks router.go for every direct mux.Handle call and asserts
set equals AuthExemptRouterRoutes; reads source bytes of Register /
RegisterFunc and asserts they still wrap with middleware.Chain.
cmd/server/auth_exempt_test.go (NEW, 2 tests):
14-case table test on buildFinalHandler asserting documented
prefixes route to noAuthHandler and authenticated routes route to
apiHandler; inverse-overlap pin proves no documented bypass shadows
an authenticated prefix.
M-013 (CWE-942) — CORS deny-by-default verified-already-clean + pin
Audit claim 'default allows all origins if env-var unset' was WRONG.
internal/api/middleware/middleware.go::NewCORS already denies cross-
origin requests when len(cfg.AllowedOrigins) == 0 (no
Access-Control-Allow-Origin header is emitted, same-origin policy
applies).
internal/api/middleware/cors_test.go: +TestNewCORS_NilOriginsDeniesAll
+ TestNewCORS_M013_ContractDocumentedInOrder (5-case table test
pinning the 3-arm dispatch contract).
M-018 (CWE-319 / PCI-DSS Req 4) — Postgres TLS opt-in toggle
deploy/helm/certctl/values.yaml: new postgresql.tls.{mode,caSecretRef}
operator-facing knobs. Default 'disable' preserves in-cluster pod-
network behavior; PCI-scoped operators set verify-full.
deploy/helm/certctl/templates/_helpers.tpl: certctl.databaseURL helper
pipes postgresql.tls.mode into ?sslmode=.
deploy/helm/certctl/templates/server-secret.yaml: uses the helper
instead of hardcoded sslmode=disable.
deploy/docker-compose.yml: CERTCTL_DATABASE_URL is now
${CERTCTL_DATABASE_URL:-...} so operators override without editing.
docs/database-tls.md (NEW): operator runbook covering 4 deployment
shapes, RDS verify-full example with PGSSLROOTCERT mount, and
pg_stat_ssl verification query.
helm template + helm lint clean.
M-025 (OWASP ASVS L2 §11.2.1) — Per-key rate limiting
internal/api/middleware/middleware.go::NewRateLimiter rewritten from
a single global tokenBucket to a keyedRateLimiter map keyed on
'user:'+GetUser(ctx) for authenticated callers
'ip:'+RemoteAddr-host for unauthenticated
- Empty UserKey strings treated as unauthenticated.
- X-Forwarded-For intentionally NOT consulted (header-spoofing risk).
- Create-on-demand bucket allocation under sync.RWMutex with double-
check pattern.
RateLimitConfig.PerUserRPS / PerUserBurstSize fields with env vars
CERTCTL_RATE_LIMIT_PER_USER_RPS / CERTCTL_RATE_LIMIT_PER_USER_BURST
allow per-user budgets distinct from per-IP.
internal/api/middleware/ratelimit_keyed_test.go (NEW, 5 tests):
TwoIPsHaveIndependentBuckets / SameUserDifferentIPsShareBucket /
TwoUsersHaveIndependentBuckets / PerUserBudgetOverride /
EmptyUserKeyTreatedAsAnonymous.
Coverage internal/api/middleware: 82.1% -> 83.7%.
Audit deliverables:
cowork/comprehensive-audit-2026-04-25/audit-report.md: score
25/55 -> 30/55 closed (High 7/9, Medium 7/27 -> 12/27, Low 8/19).
cowork/comprehensive-audit-2026-04-25/findings.yaml: 5 status flips
open -> closed with closure notes citing the Bundle B mechanism.
certctl/CHANGELOG.md: Bundle B section under [unreleased].
Verification:
go test -count=1 -short ./... all green
staticcheck on changed packages no new SA*/ST* hits
(the 4 pre-existing SA1019 sites in cmd/server/main_test.go are
Bundle 9 / M-028 partial closure leftovers tracked in Bundle C)
helm template + helm lint clean
internal/repository/postgres setup-fail sandbox disk pressure,
same on master HEAD before this branch — environmental, not Bundle B
|
||
|
|
1d6c7a0552 |
fix(bundle-6): Audit Integrity + Privacy — 3 audit findings closed
Closes Audit-2026-04-25 H-008 (High), M-017 (Medium), M-022 (Medium).
Hardens audit-trail tamper-resistance + minimizes PII leakage in one
cohesive change, with both controls applying automatically and no
operator action required at install time.
What changed
- internal/service/audit_redact.go (NEW) — RedactDetailsForAudit:
* credentialKeys deny-list (api_key, password, *_pem, eab_secret, ...)
* piiKeys deny-list (email, phone, ssn, name, address, ip_address, ...)
* case-insensitive key match; recurses into nested maps + arrays
* mutation-free; surfaces redacted_keys array for operator visibility
* nil/empty input → nil out (preserves pre-Bundle-6 behaviour)
- internal/service/audit.go — RecordEvent now routes details through
RedactDetailsForAudit BEFORE marshaling. No call-site changes required.
- internal/service/audit_redact_test.go (NEW) — full coverage:
* credential keys (~30 entries)
* PII keys (~20 entries)
* nested maps + arrays
* case-insensitivity
* mutation-free invariant
* JSON round-trip (catches type-assertion regressions)
* scalar pass-through (no panic on int/bool/nil)
- migrations/000018_audit_events_worm.up.sql (NEW) — DB-level WORM:
* BEFORE UPDATE OR DELETE trigger raises check_violation with
diagnostic citing the rationale + compliance-superuser hint
* REVOKE UPDATE,DELETE ON audit_events FROM certctl (defence-in-depth)
* REVOKE wrapped in pg_roles existence check so test fixtures
without the certctl role stay idempotent
- migrations/000018_audit_events_worm.down.sql (NEW) — clean teardown
for dev resets; not for production use.
- internal/repository/postgres/audit_worm_test.go (NEW, testcontainers,
-short gated) — INSERT succeeds; UPDATE + DELETE fail with
check_violation; second INSERT after blocked modification still
succeeds (no trigger-state corruption).
- docs/compliance.md — new section "Audit-Trail Integrity & Privacy
(Bundle 6)" with verification psql snippet, compliance-superuser
pattern (NOT auto-created), redactor before/after example, and a
maintenance note for adding new credential keys.
Compliance mapping
- H-008 (CWE-532 Insertion of Sensitive Information into Log File)
- M-017 (HIPAA Technical Safeguards §164.312(b) — audit controls)
- M-022 (GDPR Art. 32 — data minimization)
Threat model: TB-3 (audit log tampering), TB-1 (operator/orchestrator).
Verification
- go vet ./... → clean
- go build ./... → clean
- go test -short -count=1 ./... → all packages pass
- go test -count=1 -run TestRedactDetailsForAudit ./internal/service/...
→ all pass
- (testcontainers, gated by -short) audit_worm_test.go pins WORM contract
- npx tsc --noEmit (web) → clean (no frontend changes)
- python3 yaml.safe_load(api/openapi.yaml) → 89 paths
Backward compatibility
- Trigger applies forward only — existing rows unchanged.
- nil/empty details from RecordEvent callers → nil out (preserves prior
behaviour for the many existing call sites that pass nil).
- Compliance superusers (provisioned out-of-band) bypass the trigger.
Bundle 6 of the 2026-04-25 comprehensive audit.
|
||
|
|
a2a82a6cf8 |
fix(bundle-5): CI green-up — drop unused sync.Once + document new env vars
Two CI gate failures from the Bundle 5 push:
1. golangci-lint (unused) — agent_bootstrap.go declared
`var bootstrapWarnOnce sync.Once` but never called .Do(). The
one-shot WARN actually lives in cmd/server/main.go (per-process at
startup, not per-request) so the handler-side variable was dead code.
Dropped the var + sync import; left a comment explaining where the
WARN lives.
2. G-3 env-var docs guardrail — Bundle 5 added two new env vars
(CERTCTL_AGENT_BOOTSTRAP_TOKEN, CERTCTL_AUDIT_FLUSH_TIMEOUT_SECONDS)
but the G-3 closure CI step asserts every CERTCTL_* env defined in
internal/config/config.go is mentioned in docs/features.md. Added
three new sub-sections to docs/features.md after the Body Size
Limits block:
* Agent Bootstrap Token (H-007 contract + generation guidance)
* Graceful Shutdown Audit Flush (M-011 timeout knob)
* Liveness vs Readiness Probes (H-006 /health vs /ready table)
No production behaviour change; pure CI-gate fix.
Verification
- go vet ./internal/api/handler/... → clean
- go test -count=1 -run 'TestVerifyBootstrapToken|TestRegisterAgent_BootstrapToken' ./internal/api/handler/... → all pass
- grep CERTCTL_AGENT_BOOTSTRAP_TOKEN docs/features.md → present
- grep CERTCTL_AUDIT_FLUSH_TIMEOUT_SECONDS docs/features.md → present
|
||
|
|
c6a9a76147 |
docs(features): document CERTCTL_SHORT_LIVED_EXPIRY_CHECK_INTERVAL (G-3 fix)
CI on the S-2 merge ( |
||
|
|
1c6009a920 |
chore(cleanup,docs): vite proxy + dead scheduler setter wired + registerAgent/CLI docs (C-1 master)
Closes six 2026-04-24 audit findings (3 P2 + 3 P3) — a cleanup-and-doc
tail bundle that drains the smallest remaining leaves of the audit:
- cat-u-vite_dev_proxy_plaintext_drift (P2): web/vite.config.ts
proxied dev requests to http://localhost:8443 against an HTTPS-only
backend (HTTPS-only since v2.0.47). Every dev-server API call 502'd.
Fix: targets are now object-form `{target: 'https://...', secure: false,
changeOrigin: true}` — the dev cert is self-signed by the
deploy/test bootstrap and changes per-checkout.
- cat-g-7e38f9708e20 (P3): Scheduler.SetShortLivedExpiryCheckInterval
was defined + tested but never called from cmd/server/main.go.
Operators tuning CERTCTL_SHORT_LIVED_EXPIRY_CHECK_INTERVAL got
no effect — the 30s default in scheduler.NewScheduler was
effectively hardcoded. Fix: added Config.Scheduler.ShortLivedExpiryCheckInterval
+ getEnvDuration in Load() reading the env var with a 30s default,
+ sched.SetShortLivedExpiryCheckInterval(...) call in main.go
alongside the other scheduler-interval setters.
- diff-10xmain-2bf4a0a60388 (P3): same root cause as cat-g-7e38f9708e20;
closes as ride-along.
- cat-b-6177f36636fb (P2): registerAgent client fn orphan. By-design
per pull-only deployment model. Fix (audit recommendation:
"document"): added a closure docblock above the export in
client.ts + a new "Registration is by-design pull-only" paragraph
in docs/architecture.md::Agents section explaining when/why a
future GUI-driven enrollment feature might reach the endpoint
(proxy-agent topologies for network appliances).
- cat-i-7c8b28936e3d (P2): CLI scope intentionally narrow but
undocumented. Fix: new "Scope (intentionally narrow)" subsection
in docs/features.md::CLI capturing the SSH-into-prod / day-to-day
GUI / AI-automation MCP three-way split.
Verification:
- go build ./... — clean
- go vet ./... — clean
- go test ./internal/scheduler/... ./internal/config/... — pass
- golangci-lint v2.11.4 run ./... — 0 issues
- tsc --noEmit (frontend) — clean
- All sibling guardrails (S-1 / G-3 / D-1+D-2 / B-1 / L-1 / H-1) still pass
Audit findings closed:
- cat-u-vite_dev_proxy_plaintext_drift (P2)
- cat-g-7e38f9708e20 (P3)
- diff-10xmain-2bf4a0a60388 (P3)
- cat-b-6177f36636fb (P2)
- cat-i-7c8b28936e3d (P2)
- (audit-bookkeeping ride-along: ensures every closed-bundle row has a non-empty merge SHA)
Deferred follow-ups: none from this bundle. The remaining audit
backlog (frontend test campaign, F-1 CertificatesPage UX, P-1
orphan-fn sweep, S-2 handler error-mapping refactor) is sibling
sub-bundles in this mega-prompt.
|
||
|
|
2419f8cd27 |
docs(features): reconcile env-var inventory with config.go (G-3 master)
Closes three 2026-04-24 audit findings (all P2, all category cat-g):
- cat-g-renewal_check_interval_rename_drift: features.md:152
advertised CERTCTL_RENEWAL_CHECK_INTERVAL but config.go renamed
that to CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL. Fixed in prose
+ the scheduler-loops table on line 1117.
- cat-g-b8f8f8796159: 6 env vars in config.go that were never
documented:
CERTCTL_DATABASE_MIGRATIONS_PATH
CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT
CERTCTL_JOB_AWAITING_CSR_TIMEOUT
CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL
CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL
CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL
Added to the scheduler-loops table at features.md:1117 and
(DATABASE_MIGRATIONS_PATH) to the new Database Schema preamble.
- cat-g-163dae19bc59: 37 env vars in docs not defined in config.go.
The audit's strict comm over-flagged this set: most "phantoms"
are integration-surface contracts (script env vars certctl
EXPORTS to user-provided ACME DNS-01 / OpenSSL CA scripts;
StepCA / Webhook per-issuer-or-notifier config-blob field
names; CERTCTL_QA_* test fixtures; agent-side env vars defined
in cmd/agent/main.go). The closure narrows the gate to the
one true phantom (the rename) and allowlists the documented
integration contracts in the CI guard. Each allowlist entry
has a one-line justification.
CI regression guardrail:
- .github/workflows/ci.yml::"Forbidden env-var docs drift regression
guard (G-3)" — runs `comm -23` both ways between the env vars
defined in Go source (config.go + cmd/* + ACME DNS export +
test fixtures) and env vars mentioned in README + docs/ +
deploy/helm/. Fails the build if either set is non-empty modulo
the documented integration-surface allowlist.
Verification:
- comm -23 docs vs defined → empty post-fix (allowlist applied)
- comm -23 defined vs docs → empty post-fix
- golangci-lint v2.11.4 run ./... → 0 issues
- tsc --noEmit → clean
- S-1 stale-counts guardrail still passes
Audit findings closed:
- cat-g-163dae19bc59 (P2, docs-only env vars)
- cat-g-b8f8f8796159 (P2, config-only env vars)
- cat-g-renewal_check_interval_rename_drift (P2, renamed env var still in docs)
Deferred follow-ups:
- The 26 documented-but-unimplemented integration contracts on the
allowlist (CERTCTL_OPENSSL_*, CERTCTL_ACME_EAB_*, CERTCTL_WEBHOOK_*,
CERTCTL_AUDIT_EXCLUDE_PATHS, CERTCTL_TLS_*, CERTCTL_ACME_DNS_PROPAGATION_WAIT)
are documented in features.md / connectors.md / demo-advanced.md but
not yet read by any Go source. Either implement in config.go (each is
its own M-X) or delete from docs (separate cleanup PR). Neither
expansion fits inside G-3's "reconcile drift" scope.
|
||
|
|
530da674f8 |
docs(README,features,examples): replace stale source counts with rebuild commands (S-1 master)
Closes two 2026-04-24 audit findings — one P1 (cat-s1-9ce1cbe26876,
README + features.md cite stale numeric counts) and one P2
(cat-s1-features_md_issuer_count_contradiction, features.md self-
disagreed on issuer count saying 9 in two places + 12 in two others).
Both root in a CLAUDE.md invariant: "Numeric claims about current
state rot the instant the next release lands... Before adding any
current-state count, delete it and write the command instead."
Per-site changes:
- docs/features.md::"At a Glance" table — replaced 12 hardcoded counts
with `rebuild via <command>` references quoting the canonical
source-of-truth grep from CLAUDE.md::"Current-state commands".
- docs/features.md::Issuer Connectors section — dropped "9 issuer
connectors" (stale; live: 12) and "12 IssuerType constants" prose;
prose now references the rebuild command.
- docs/features.md::Target Connectors section — same treatment for
"14 target connector types".
- docs/features.md::"Per-type config schema validation for all 9
issuer types" — same treatment.
- docs/features.md::"80 MCP tools covering all API endpoints" — same.
- docs/features.md::Web Dashboard section — dropped "24 pages wired"
+ the "(25 Route elements, 24 pages)" comment.
- docs/examples.md::"Beyond These Examples" — dropped "7 issuer
backends and 10 target connectors" prose; references features.md
and the rebuild commands.
CI regression guardrail:
- .github/workflows/ci.yml::"Forbidden hardcoded source-count prose
regression guard (S-1)" — grep-fails the build if any of the
blocked phrases (e.g. "9 issuer connectors", "21 database tables",
"80 MCP tools") reappears in README or docs/. Allowlists demo-
fixture prose ("32 certificates" — seed_demo.sql facts), historical
WORKSPACE-CHANGELOG counts, the testing-guide example phrasing,
and any number adjacent to a quoted rebuild command.
Verification:
- S-1 guardrail dry-run on post-fix tree → empty (good)
- golangci-lint v2.11.4 run ./... → 0 issues
- tsc --noEmit → clean
- vitest, vite build unchanged from pre-S-1 baseline (no JS/TS touched)
Audit findings closed:
- cat-s1-9ce1cbe26876 (P1, README + features.md stale numeric counts)
- cat-s1-features_md_issuer_count_contradiction (P2, features.md
self-contradiction on issuer count)
Deferred follow-ups:
- WORKSPACE-CHANGELOG.md historical-milestone counts intentionally
preserved (those are point-in-time facts about shipped slices, not
current-state claims). README demo-fixture counts ("32 certs, 10
issuers") preserved — those describe the seed_demo.sql shape, not
the live source surface.
|
||
|
|
55eb7135be |
fix(web,ci): close TS↔Go type drift across 5 entities (D-2 master)
Closes five 2026-04-24 audit findings (all P2, all category cat-f /
diff-05x06-*) by reconciling the TypeScript interfaces in
web/src/api/types.ts with the on-wire JSON shape Go's
internal/domain/*.go structs actually emit. D-1 closed the same pattern
for one entity (Certificate / ManagedCertificate); D-2 covers the
remaining five.
Per-entity verdicts (audit's "stricter side is the contract"):
Agent — TRIM 5 phantoms (last_heartbeat, capabilities, tags,
created_at, updated_at). Go emits last_heartbeat_at only.
Target — ADD 2 (retired_at?, retired_reason?) — I-004 fields.
DiscCert — ADD pem_data? — real field, real Go emit, omitempty.
Issuer — TRIM phantom status. Go has Enabled bool only.
Notif — TRIM phantom subject. Go has Message string only.
Certificate — verify-only; D-1 closure confirmed clean at recon.
Consumer fixes (same commit as the trim):
- AgentDetailPage.tsx — remove dead Capabilities + Tags sections (always
rendered empty); replace agent.created_at/updated_at row with the
Go-emitted registered_at; widen heartbeatStatus() to accept undefined.
- AgentsPage.tsx — same heartbeatStatus widening.
- IssuersPage.tsx + IssuerDetailPage.tsx — issuerStatus() now derives
from `enabled` exclusively; the dead `issuer.status || 'Unknown'`
fallback is gone.
- NotificationsPage.tsx — drop dead `|| n.subject` fallback.
- NotificationsPage.test.tsx — drop dead `subject:` from mocks.
- api/utils.ts::timeAgo widened to accept string | undefined | null.
- api/types.test.ts — Agent (I-004) fixture trimmed of the 5 phantoms.
Tests (Vitest):
- 5 new describe blocks in web/src/api/types.test.ts:
- Agent interface (D-2 phantom-fields trim) — 2 it blocks
- Target interface (D-2 retirement fields) — 2 it blocks
- DiscoveredCertificate interface (D-2 pem_data ADD) — 2 it blocks
- Issuer interface (D-2 status phantom trim) — 1 it block
- Notification interface (D-2 subject phantom trim) — 1 it block
- Each block uses the literal-construction pattern from D-1; trimmed
fields are pinned via excess-property comments that compile-fail when
uncommented if a phantom is reintroduced.
CI regression guardrail:
- .github/workflows/ci.yml — existing D-1 step renamed to "Forbidden
StatusBadge dead-key + TS phantom-field regression guard (D-1 + D-2)".
Three new awk-windowed greps over Agent / Issuer / Notification
interfaces in types.ts. The Agent grep includes a `grep -v
'last_heartbeat_at'` filter to avoid false positives on the
legitimate Go-emitted heartbeat field.
Documentation:
- CHANGELOG.md — new D-2 section above B-1 under [unreleased] with full
Added/Removed/Audit findings closed/Known follow-ups breakdown.
- docs/architecture.md — Web Dashboard section gains a new "TS ↔ Go
type contract rule (D-1 + D-2 closure)" paragraph capturing the
stricter-side-wins rule and the CI guardrail it's anchored by.
- coverage-gap-audit-2026-04-24-v5/unified-audit.md — Live Tracker score
20/47 → 25/47 (P2: 6/27 → 11/27). Per-finding ✅ RESOLVED Status
blocks added to all 5 diff-05x06-* entries plus the verify-only
Certificate entry. Closed-bundle index gets D-2 row.
Verification (all gates green):
- cd web && tsc --noEmit → clean
- cd web && vitest run --reporter=dot → 9 files, 302 tests passing
(was 294 → +8 D-2 cases)
- cd web && vite build → clean
- go vet ./internal/... ./cmd/... → clean (no Go touched)
- golangci-lint v2.11.4 run ./... → 0 issues
- D-2 Agent guardrail dry-run → empty (good)
- D-2 Issuer guardrail dry-run → empty (good)
- D-2 Notification guardrail dry-run → empty (good)
- D-2 Target ADD-shape sanity → 2 retirement fields present
- D-2 DiscCert ADD-shape sanity → pem_data present
- D-1 Certificate guardrail still clean → empty (good)
- OpenAPI YAML parses → 89 paths
Audit findings closed:
- diff-05x06-7cdf4e78ae24 (P2, Agent TS↔Go drift)
- diff-05x06-2044a46f4dd0 (P2, Target TS↔DeploymentTarget Go drift)
- diff-05x06-85ab6b98a2f7 (P2, DiscoveredCertificate TS↔Go drift)
- diff-05x06-97fab8783a5c (P2, Issuer TS↔Go drift)
- diff-05x06-caba9eb3620e (P2, Notification TS↔NotificationEvent drift)
- diff-05x06-af18a8d7ef41 (P2) — verified clean since D-1; no edit
Deferred follow-ups:
- Issuer richer status view (enabled × test_status) — UX scope, not drift.
- Real Agent metadata (capabilities, tags) — backend feature, not drift.
- DiscoveredCertificate pem_data list-response perf — separate backend change.
|
||
|
|
097995e503 |
fix(web,ci): close orphan-CRUD GUI gaps + dead exportCertificatePEM (B-1 master)
Closes four 2026-04-24 audit findings via per-page Edit modals on five
existing pages, a brand-new RenewalPoliciesPage for the rp-* CRUD surface,
and removal of one dead duplicate so the public client surface stops
growing without consumers. Anchored by a CI grep guardrail that fails
the build if any of the eight previously-orphan client functions loses
its non-test page consumer or if exportCertificatePEM is resurrected.
Per-page Edit modals (mirroring existing CreateXModal scaffolding):
- web/src/pages/OwnersPage.tsx — EditOwnerModal (name/email/team_id)
- web/src/pages/TeamsPage.tsx — EditTeamModal (name/description)
- web/src/pages/AgentGroupsPage.tsx — EditAgentGroupModal (full match-rule
set: name/description/match_os/match_architecture/match_ip_cidr/
match_version/enabled)
- web/src/pages/IssuersPage.tsx — EditIssuerModal (rename-only; type
locked, config blob preserved untouched, footer note about delete+
recreate for credential rotation)
- web/src/pages/ProfilesPage.tsx — EditProfileModal (rename + description
only; policy fields preserved untouched, footer note about deferred
policy editing)
New page (closes cat-b-4631ca092bee — RenewalPolicy CRUD orphan):
- web/src/pages/RenewalPoliciesPage.tsx — full CRUD page with shared
PolicyFormModal for Create + Edit (form shape identical), 7-column
DataTable (Policy/RenewalWindow/Auto/Retries/AlertThresholds/Created/
Actions), comma-separated alert_thresholds_days input parser, and
alert() surfacing of repository.ErrRenewalPolicyInUse (409) on Delete
so operators can re-target dependent certs before deletion.
- web/src/main.tsx — adds /renewal-policies route.
- web/src/components/Layout.tsx — adds sidebar nav item slotted between
Policies and Profiles.
Removed (closes cat-b-9b97ffb35ef7 — dead duplicate):
- web/src/api/client.ts::exportCertificatePEM — zero consumers across
web/, MCP, CLI, tests; downloadCertificatePEM is the actual call site
in CertificateDetailPage. Test references in client.test.ts and
client.error.test.ts also removed.
CI regression guardrail:
- .github/workflows/ci.yml — adds 'Forbidden orphan-CRUD client function
regression guard (B-1)' step. Greps for all eight previously-orphan
fns (updateOwner/updateTeam/updateAgentGroup/updateIssuer/updateProfile
+ createRenewalPolicy/updateRenewalPolicy/deleteRenewalPolicy) under
web/src/pages/ and fails the build if any has zero non-test consumers.
Also blocks resurrection of exportCertificatePEM. Verified locally
(all 8 fns have ≥2 consumers; exportCertificatePEM is gone) and
against synthetic regressions.
Documentation:
- CHANGELOG.md — new B-1 section above L-1 under [unreleased].
- docs/architecture.md — Web Dashboard section gains a new paragraph
capturing the 'every backend CRUD must have a GUI consumer' rule
with reference to the CI guardrail.
- coverage-gap-audit-2026-04-24-v5/unified-audit.md — flips four
findings to ✅ RESOLVED with detailed Status blocks; bumps Live
Tracker score 16/47 → 20/47 (P1: 9→12, P3: 1→2); adds B-1 row to
closed-bundle index.
Verification:
- cd web && tsc --noEmit — clean
- cd web && vitest run — 9 test files, 294 tests, all passing
- cd web && vite build — clean (no new warnings)
- B-1 guardrail dry-run — all 8 client fns have ≥2 page consumers,
exportCertificatePEM removed (good), FAIL=0
Audit findings closed:
- cat-b-31ceb6aaa9f1 (P1, updateOwner/updateTeam/updateAgentGroup orphan)
- cat-b-7a34f893a8f9 (P1, updateIssuer/updateProfile orphan, rename-only)
- cat-b-4631ca092bee (P1, RenewalPolicy CRUD orphan)
- cat-b-9b97ffb35ef7 (P3, exportCertificatePEM dead duplicate)
Deferred follow-ups:
- Fuller EditIssuerModal with credential-rotation flow (needs threat
model: rotation reuse window, in-flight CSR cancellation, audit-trail
granularity).
- Fuller EditProfileModal with policy-field editing (max-TTL, allowed
EKUs, allowed key algorithms — affect already-issued cert evaluation).
- Per-page Vitest coverage for the new Edit modals (CI grep guardrail
catches the same regression vector at lower cost).
|
||
|
|
a3d8b9c607 |
fix(deploy,db,handler): close fresh-clone postgres init failure + 4 ride-along audit findings (U-3 master)
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started. Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release. Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches. Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once: cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds cat-o-notification_created_at_dead_field [P2] — add column + populate cat-o-health_check_column_orphans [P1] — drop unwired columns cat-u-no_version_endpoint [P2] — add /api/v1/version Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO \$\$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one. Backend - internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly. - cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations. - internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended. - internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites. - internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override. - internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info. - cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail. - internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var. Migration - migrations/000017_db_coupling_cleanup.up.sql + .down.sql: (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO \$\$ guard, idempotent re-application) (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds - migrations/seed.sql: column reference updated to retry_interval_seconds. - migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted). Compose - deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot. - deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period. - deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server. Tests - internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride. - internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips). - internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow. CI guardrail - .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it. Spec / Docs - api/openapi.yaml: add /api/v1/version operation under Health tag. - docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation. - CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var. Audit doc - coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit. Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree. |
||
|
|
86fffa305a |
fix(deploy,helm,docs): published-image HEALTHCHECK speaks HTTPS + Helm /ready path + docs HTTPS sweep (U-2)
Pre-U-2 the published `ghcr.io/shankar0123/certctl-server` image shipped with `HEALTHCHECK CMD curl -f http://localhost:8443/health`. The server has been HTTPS-only since the v2.2 HTTPS-Everywhere milestone (`cmd/server/main.go::ListenAndServeTLS`, no plaintext fallback, TLS 1.3 pinned), so the probe failed on every interval and Docker marked the container `unhealthy` indefinitely. Operators inside docker- compose / Helm / the example stacks were unaffected — compose overrides the HEALTHCHECK with `--cacert + https://`, Helm uses explicit `httpGet` probes that ignore Docker's HEALTHCHECK, and every example compose file overrides with `curl -sfk https://localhost:8443/health`. But anyone running bare `docker run` / Docker Swarm / Nomad / ECS — exactly the "I just pulled the published image" path — saw permanent `unhealthy` status and (depending on orchestrator policy) a restart- loop. (Audit: cat-u-healthcheck_protocol_mismatch in coverage-gap-audit-2026-04-24-v5/unified-audit.md.) Recon for U-2 surfaced two adjacent bugs from the same v2.2 milestone gap, both bundled into this commit because they share the same root cause and the same operator surface: 1. Helm chart `server.readinessProbe.httpGet.path` pointed at `/readyz`, the kube-flavored convention. The certctl server doesn't register `/readyz` (only `/health` and `/ready` are wired and bypass the auth middleware — see internal/api/router/router.go:81 and cmd/server/main.go:920). K8s readiness probes therefore got 401 (api-key auth rejection) or 404 (when auth was disabled), pods stayed `NotReady` indefinitely, and Helm rollouts stalled. 2. The agent image (`Dockerfile.agent`) had no HEALTHCHECK at all, so bare-`docker run` agents got zero health signal. The compose override at `deploy/docker-compose.yml:173` called `pgrep -f certctl-agent` against the agent image, but the agent image didn't ship `procps` — pgrep was missing too. The compose probe was a latent always-fail. We fixed all three with the audit-recommended shape (option (a) — `-k`) plus three structural backstops: Files changed: Phase 1 — Dockerfile fix: - Dockerfile: HEALTHCHECK switched from `curl -f http://localhost:8443/ health` to `curl -fsk https://localhost:8443/health`. `-k` (insecure) is acceptable because the probe is localhost-to-localhost: the same process serving the cert is being probed, no network hop. Pinning `--cacert` is not viable for the published image because the bootstrap cert is per-deploy (generated into the `certs` named volume on first up; operator-supplied via Helm's `existingSecret` or cert-manager). Long-form docblock cross-references the audit closure, the compose vs Helm vs examples coverage matrix, and the CI guardrail. - Dockerfile.agent: added HEALTHCHECK using `pgrep -f certctl-agent` matching the compose pattern. Added `procps` to the runtime apk install — fixes both the new image-level HEALTHCHECK AND the pre-existing compose probe that was silently failing. Phase 2 — Helm readiness probe path: - deploy/helm/certctl/values.yaml: server.readinessProbe.httpGet.path changed from `/readyz` to `/ready`. Liveness probe path (`/health`) was correct and is unchanged. Probes block now carries an explanatory comment naming the registered no-auth probe routes and the U-2 closure rationale. Phase 3 — Image-level integration tests: - deploy/test/healthcheck_test.go (new, //go:build integration): TestPublishedServerImage_HealthcheckSpecUsesHTTPS builds the server image, inspects `Config.Healthcheck.Test` via `docker inspect`, and asserts the array contains `https://localhost:8443/health` and `-k`, and does NOT contain `http://localhost:8443/health` (positive + negative regression contracts). TestPublishedAgentImage_HealthcheckSpecExists builds the agent image and asserts the HEALTHCHECK uses `pgrep` against `certctl-agent`. Both tests `t.Skip` cleanly when docker isn't available (sandbox / CI without docker-in-docker) — verified locally: tests skip with the diagnostic and the suite returns PASS. TestPublishedServerImage_HealthcheckTransitionsToHealthy is a documented `t.Skip` placeholder until the harness wires a sidecar postgres for image-level smoke; the spec-level tests above cover the audit-flagged regression. Phase 4 — CI guardrail: - .github/workflows/ci.yml: new "Forbidden plaintext HEALTHCHECK regression guard (U-2)" step. Scoped patterns catch `HEALTHCHECK.*http://` and `curl -f http://localhost:8443/health` in any `Dockerfile*`. Comment lines exempt; docs/upgrade-to-tls.md out of scope (the post-cutover invariant string at line 182 is intentionally a documented expected-failure assertion). Verified locally on the real tree (passes) and against synthetic regressions (each fires the guard). Phase 5 — Docs sweep: - docs/connectors.md: 15 stale curl examples updated from `http://localhost:8443/...` to `https://localhost:8443/...` with `--cacert "$CA"` injected on every site. Added a one-time introductory note documenting the `$CA` extraction with `docker compose ... exec ... cat /etc/certctl/tls/ca.crt`, matching the pattern in docs/quickstart.md. Pre-U-2 these examples silently failed against the HTTPS listener. Phase 6 — Release surface: - CHANGELOG.md: appended U-2 section to the existing [unreleased] block (immediately below the G-1 entry). Sections: explanatory blockquote covering all three bugs (primary + 2 adjacent), Fixed, Added, Changed. Verification (all gates pass): - go build ./... — clean - go vet ./... — clean - go vet -tags integration ./deploy/test/ — clean - go test -short ./... — every package green - go test -tags integration -v -run TestPublishedServerImage|TestPublishedAgentImage ./deploy/test/ — three tests SKIP cleanly with "docker not available" diagnostic - helm lint deploy/helm/certctl/ — clean - helm template smoke render — succeeds; rendered Deployment carries `path: /ready` and zero `/readyz` matches - python3 yaml.safe_load on api/openapi.yaml — parses - govulncheck ./... — no vulnerabilities in our code - CI guardrail mirror: clean on real tree, fires on synthetic regression patterns Out of scope (intentionally untouched): - cmd/server/main.go::ListenAndServeTLS — HTTPS-only is correct, this finding does NOT propose adding back a plaintext listener. - deploy/docker-compose.yml:126 HEALTHCHECK — already correct. - deploy/docker-compose.test.yml HEALTHCHECK blocks — already correct. - All 5 examples/*/docker-compose.yml HEALTHCHECK overrides — already correct (they ALSO use `-fsk https://localhost:8443/health`). - Helm server.livenessProbe.httpGet — already uses `scheme: HTTPS` + `path: /health`, correct. - docs/upgrade-to-tls.md:182 `curl ... http://localhost:8443/health` invariant line — that's the expected-failure assertion for the post-cutover state ("plaintext is gone, expect Connection refused"); intentionally left intact. - Go production code — this is purely a deploy-image / probe / docs / Helm-chart fix. Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md §2 P1 cluster, cat-u-healthcheck_protocol_mismatch Audit recommendation followed verbatim: 'change Dockerfile:80 to CMD curl -kf https://localhost:8443/health'. |
||
|
|
87213128cc |
fix(security,domain): redact Agent.APIKeyHash from JSON wire shape (G-2)
Pre-G-2 internal/domain/connector.go::Agent::APIKeyHash was tagged
`json:"api_key_hash"` and shipped on every wire surface that returned
domain.Agent — GET /api/v1/agents (PagedResponse{Data: agents}),
GET /api/v1/agents/{id}, GET /api/v1/agents/retired, and the
POST /api/v1/agents registration response. Every authenticated client
(browser, CLI --json, MCP tool calls) received the SHA-256-of-the-API-key
string. The browser silently dropped it because web/src/api/types.ts
omits the field, but CLI and MCP consumers print full JSON so the hash
was visible there. Even though the value is a hash and not the plaintext
key, shipping it gives an attacker an offline brute-force target if the
API-key entropy is low (certctl doesn't enforce a minimum on operator-
supplied keys), and there's no business reason for any client to ever
receive it — the value is server-internal, used only for the lookup at
internal/repository/postgres/agent.go::GetByAPIKey. (Audit:
cat-s5-apikey_leak in coverage-gap-audit-2026-04-24-v5/unified-audit.md.)
We chose the audit's recommended fix (json:"-") plus a defense-in-depth
MarshalJSON plus a CI guardrail. Three layers because struct-tag
redaction alone is one rebase away from being silently reverted, the
custom MarshalJSON catches the case where a parent struct embeds Agent
under a different tag, and the CI grep blocks reintroduction at the spec
or frontend boundary even without a code review catching it.
Files changed:
Phase 1 — Domain redaction:
- internal/domain/connector.go: APIKeyHash tag flipped from
`json:"api_key_hash"` to `json:"-"`. New Agent.MarshalJSON
with value receiver + type-alias-recursion-break that explicitly
zeroes APIKeyHash on the marshal-time copy. Long-form docblock
explaining the G-2 closure rationale + cross-references to
service.RegisterAgent (populator), repository.AgentRepository::
GetByAPIKey (consumer), docs/architecture.md (DB-shape vs
API-shape distinction), and the audit finding.
Phase 2 — Domain tests (5 test functions):
- internal/domain/connector_test.go: TestAgent_MarshalJSON_RedactsAPIKeyHash
pins the marshal-boundary contract on a value receiver. ...RedactsViaPointer
pins the *Agent path. ...RedactsInSlice pins the []Agent path that the
ListAgents handler actually emits via PagedResponse. ...DoesNotMutateReceiver
pins the by-value-receiver contract so a future refactor that switches
to pointer-receiver gets caught. ...RoundTrip pins the wire-shape
guarantee that APIKeyHash is dropped on encode and cannot reappear on
decode. Single sentinel value ("sha256:LEAKED-CREDENTIAL-DERIVATIVE-
SENTINEL") flows through every fixture for grep-ability on regression.
Phase 3 — Handler tests (4 test functions):
- internal/api/handler/agent_handler_test.go: TestListAgents_DoesNotLeakAPIKeyHash,
TestGetAgent_DoesNotLeakAPIKeyHash, TestRegisterAgent_DoesNotLeakAPIKeyHash,
TestListRetiredAgents_DoesNotLeakAPIKeyHash. Each asserts (a) the
literal substring "api_key_hash" is absent from the httptest-captured
body, (b) the leak sentinel value is absent, (c) the non-leaked fields
ARE present (sanity that the handler is serving real data, not just
empty payloads). Shared sentinel "sha256:LEAKED-CREDENTIAL-DERIVATIVE-
HANDLER-SENTINEL" so a single grep over a failing test's output
identifies the leak surface immediately.
Phase 4 — Spec / docs:
- api/openapi.yaml: api_key_hash property REMOVED from Agent schema
(was at line 3690). Inline G-2 comment naming the closure + the
database-vs-API-shape distinction so a future spec edit doesn't
silently re-introduce the field.
- docs/architecture.md: ER-diagram block already documents the agents
table including api_key_hash (DB shape — correct). Added a sibling
note paragraph immediately below the diagram explaining that several
columns are intentionally server-internal (api_key_hash redaction
+ issuers.config / deployment_targets.config encrypted shadow), with
cross-references to the redaction enforcement site, the OpenAPI
schema, the frontend interface, and the CI guardrail.
- web/src/api/types.ts: Agent interface unchanged in shape (already
omitted the field) but added a leading comment block explaining
WHY the omission is intentional — stops a future frontend dev from
"completing" the interface from the OpenAPI spec or the Go struct.
Phase 5 — CI guardrail:
- .github/workflows/ci.yml: new "Forbidden api_key_hash JSON-shape
regression guard (G-2)" step. Scoped patterns catch the actual
regression shapes — Go struct tag (json:"api_key_hash"), frontend
interface declaration, OpenAPI schema property, YAML enum/array
membership. Repository / migration / seed / service / integration /
unit-test / comment lines exempt. Verified locally on the real tree
(passes) and against 4 synthetic regression patterns (each fires
the guardrail). Mirrors the G-1 pattern from .github/workflows/
ci.yml lines 47-108.
Phase 5b — Sweep verification (no changes, results documented for the
next reader):
- internal/api/middleware/audit.go: doesn't serialize Agent struct;
records request body only. No leak.
- service.RegisterAgent audit-event payload: `map[string]interface{}{
"name": name, "hostname": hostname}` — name + hostname only,
no APIKeyHash. No leak.
- All 9 slog sites that mention agent: scalar attrs only ("agent_id",
"error", "agent_hostname"), never the full struct. No leak.
- internal/mcp, internal/cli, cmd/cli, cmd/mcp-server: zero matches
for APIKeyHash / api_key_hash. Both pass server JSON verbatim, so
the wire-side fix transitively closes them.
Verification (all gates pass):
- go build ./...
- go vet ./...
- go test -short ./... — every package green
- go test -short -race ./internal/domain/... ./internal/api/handler/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template smoke render — succeeds
- python3 yaml.safe_load on api/openapi.yaml — parses
- OpenAPI Agent schema scan: no api_key_hash property
- CI guardrail mirror: clean on real tree, fires on all 4 synthetic
regression patterns
- Domain pkg coverage: Agent.MarshalJSON 100%, connector.go total 87.5%
- Handler pkg coverage: 79.2%
Sample response body (httptest captured during verification, GET
/api/v1/agents/{id} via the new handler test):
{"id":"agent-demo","name":"demo-agent","hostname":"demo.host",
"status":"Online","last_heartbeat_at":"2026-04-24T11:59:30Z",
"registered_at":"2026-04-24T12:00:00Z","os":"linux",
"architecture":"amd64","ip_address":"10.0.0.42",
"version":"v2.0.49"}
Note the absence of any api_key_hash key, even though the in-memory
struct passed to the handler had APIKeyHash set to a sentinel.
Out of scope (intentionally untouched):
- internal/repository/postgres/agent.go SELECT/INSERT/UPDATE/scan
paths and GetByAPIKey lookup — DB column stays, repo still
populates the struct, auth lookup still works. The redaction is a
marshal-boundary concern.
- migrations/000001_initial_schema.up.sql + migrations/seed_*.sql —
DB schema and seed data unchanged.
- internal/service/agent.go::RegisterAgent — service-side hashing
and persistence unchanged.
- Other domain types with potential credential-derivative fields
(Issuer.Config, DeploymentTarget.Config, notifier configs). Not
flagged by the audit; some are already protected (e.g.,
DeploymentTarget.EncryptedConfig []byte `json:"-"`). File a
separate audit pass if recon surfaces additional leaks.
- Per-resource DTO layer across every handler. Single audit
finding, single domain type.
- A separate possible follow-up: the v2 RegisterAgent endpoint
doesn't return the plaintext API key to the agent, which may
mean self-bootstrap via POST /api/v1/agents is broken. Verified
during recon; out of scope for G-2; should be its own ticket.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-s5-apikey_leak
Audit recommendation: 'json:"-" or API-response DTO
excluding APIKeyHash' — went with the json:"-" + MarshalJSON
defense-in-depth pair plus CI guardrail and structural docs.
|
||
|
|
9c1d446e40 |
fix(security,config): remove unimplemented JWT auth-type, close silent downgrade (G-1)
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.
We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.
The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.
Files changed:
Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
literal 'jwt' through a dedicated multi-line diagnostic naming the
authenticating-gateway pattern, then cross-checks against
ValidAuthTypes(). Secret-required branch simplified to api-key-only.
Field comment on AuthConfig.Type rewritten to drop jwt and point at
the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
config.Load() — exits 1 on any unsupported auth-type that bypassed the
validator. Auth-disabled startup log explicitly names the
authenticating-gateway pattern.
Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
(two table rows pinning the dedicated G-1 error fires regardless of
whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
guard against future re-introduction),
TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
hit the generic invalid-auth-type error). Removed the prior
TestValidate_JWTAuth_MissingSecret happy-path since its premise is
inverted post-G-1.
- internal/api/handler/health_test.go: removed
TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
into the regression suite). Pre-existing _APIKey test continues to
cover the api-key happy path.
Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
jwt and point at the gateway pattern; secret-required conditional
simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
section explaining the design rationale and listing oauth2-proxy /
Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
forward_auth / Apache mod_auth_openidc / nginx auth_request as the
standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
with preconditions, what-changes, both recovery paths, complete
docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
ext_authz patterns, rollback posture.
Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
helper mirroring the existing certctl.tls.required pattern. Fails
template render on any server.auth.type outside {api-key, none} with
a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
server-configmap.yaml, server-secret.yaml: invoke the helper at the
top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
walkthrough.
Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
Added / Changed sections; explicit pointer at
docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.
Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
guard (G-1)' step. Scoped patterns catch the actual regression shapes
(map literal, slice literal, switch case, OpenAPI enum, env-file
default, AuthType('jwt') cast). Comments and the dedicated rejection
branch are intentionally exempt; connector-package JWT references
(Google OAuth2 / step-ca) are exempt as out-of-scope external
protocols. Verified locally: the guard passes on the actual tree and
fires on all 4 synthetic regression patterns.
Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.
Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
longer accepted (G-1 silent auth downgrade): no JWT middleware ships
with certctl. To use JWT/OIDC, run an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
See docs/architecture.md "Authenticating-gateway pattern" and
docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'
config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-g-jwt_silent_auth_downgrade
Audit recommendation followed verbatim: 'Remove jwt from
validAuthTypes until middleware ships'.
|