Two unrelated CI failures from run #25305811340; fixed in one
commit since neither needs the other to land first.
CodeQL alert #32 (go/log-injection at middleware.go:68) reopened
after edb71fb. The previous fix introduced a scrubLogValue helper
backed by strings.NewReplacer; CodeQL's taint tracker only
recognizes the literal strings.ReplaceAll pattern as a sanitizer
(matches the OWASP example in the rule docs). Wrapper helpers and
NewReplacer don't trigger the recognition, so the analyzer kept
flagging.
Fix: drop the helper. Inline strings.ReplaceAll chains directly at
the call site for r.Method and r.URL.Path. Same runtime semantics
(strip CR/LF/NUL); CodeQL pattern-matches the literal call so the
alert can finally close.
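A minimal sketch of the inlined pattern, for illustration only. `formatAccessLog` is a hypothetical stand-in for the log call in middleware.go, and it takes plain strings where the real call site reads r.Method and r.URL.Path; the actual logger and message format are not shown in this message.

```go
package main

import (
	"fmt"
	"strings"
)

// formatAccessLog is a hypothetical stand-in for the log call in middleware.go.
// The strings.ReplaceAll chains are written inline (no wrapper helper, no
// strings.NewReplacer) so the sanitizer stays visible at the call site.
func formatAccessLog(rawMethod, rawPath string) string {
	// Strip CR, LF and NUL from each user-controlled value before logging it.
	method := strings.ReplaceAll(strings.ReplaceAll(strings.ReplaceAll(rawMethod, "\r", ""), "\n", ""), "\x00", "")
	path := strings.ReplaceAll(strings.ReplaceAll(strings.ReplaceAll(rawPath, "\r", ""), "\n", ""), "\x00", "")
	return fmt.Sprintf("method=%s path=%s", method, path)
}

func main() {
	// A forged path that tries to start a second log line.
	fmt.Println(formatAccessLog("GET", "/api/v1/certificates\r\nlevel=error msg=forged"))
}
```

With the line breaks removed, the forged suffix stays on the same log line as the request it came from.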
Loadtest CI failure (run #25305811340 'k6 throughput run' job at
make loadtest):
ERROR: failed to compute cache key: failed to calculate checksum
of ref ...: "/deploy/test/f5-mock-icontrol": not found
The f5-mock-icontrol Dockerfile has `COPY deploy/test/f5-mock-icontrol/
./` which assumes the build context is the repo root. The
docker-compose.test.yml f5-mock-icontrol service correctly uses the
long-form build:
    build:
      context: ..   # = repo root from deploy/docker-compose.test.yml
      dockerfile: deploy/test/f5-mock-icontrol/Dockerfile
The loadtest compose at deploy/test/loadtest/docker-compose.yml
used the shorthand:
build: ../f5-mock-icontrol
That sets context = the f5-mock-icontrol directory itself, breaking
the Dockerfile's COPY (it tries to find the directory inside itself).
Fix: change the loadtest compose to the long-form pattern matching
docker-compose.test.yml, with context: ../../.. (= repo root from
deploy/test/loadtest/) and explicit dockerfile path.
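The path arithmetic behind the failure can be sketched as follows. This is only an illustration: `resolveContext` and `copySource` are hypothetical helpers that mimic two rules (Compose resolves a relative build context against the compose file's directory, and a Dockerfile COPY source is looked up inside the build context). All paths are relative to the repo root.

```go
package main

import (
	"fmt"
	"path"
)

// resolveContext mimics how a relative build context is resolved:
// against the directory that holds the compose file.
func resolveContext(composeDir, context string) string {
	return path.Join(composeDir, context)
}

// copySource mimics where a COPY source is looked up: inside the build context.
func copySource(contextDir, src string) string {
	return path.Join(contextDir, src)
}

func main() {
	const composeDir = "deploy/test/loadtest"
	const src = "deploy/test/f5-mock-icontrol" // the COPY source in the Dockerfile

	short := resolveContext(composeDir, "../f5-mock-icontrol")
	long := resolveContext(composeDir, "../../..")

	fmt.Println("shorthand context:", short, "-> COPY looks in", copySource(short, src))
	fmt.Println("long-form context:", long, "-> COPY looks in", copySource(long, src))
}
```

With the shorthand, the COPY source resolves to a path nested inside the f5-mock-icontrol directory, which does not exist; with the long form it resolves to the real directory.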
Verified locally:
gofmt: clean.
go vet ./internal/api/middleware/...: exit 0.
go test -short -count=1 ./internal/api/middleware/...: ok 0.253s.
python3 -c 'import yaml; yaml.safe_load(...)' on the compose
file: parses clean.
grep -rnE 'scrubLogValue' internal/api/: zero references (helper
fully dropped).
References:
https://github.com/certctl-io/certctl/security/code-scanning/32
CI run https://github.com/certctl-io/certctl/actions/runs/25305811340
Closes CodeQL #32 + restores loadtest CI.
certctl Load-Test Harness
Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer
coverage audit (cowork/issuer-coverage-audit-2026-05-01/RESULTS.md).
Pre-fix, certctl had zero benchmarks or load tests for any API path; an
acquirer evaluating "can certctl handle our 50k-cert fleet at 47-day
rotation" had nothing to point at. This harness is the substantiation.
What it measures
A k6 driver hits two scenarios in parallel for 5 minutes at a fixed 50 req/s each (100 req/s combined):
- `POST /api/v1/certificates`: the issuance-acceptance hot path. Exercises auth, JSON decode, validation, `service.CreateCertificate`, and the `managed_certificates` insert. This is the operator-facing request-acceptance throughput an automation client (Terraform, Crossplane, GitOps controller) would generate.
- `GET /api/v1/certificates?per_page=50`: the most-trafficked read endpoint. Exercises pagination + filtering on the cert list query.
Latency is reported as avg / min / med / p95 / p99 / max. The
error-rate threshold is < 1% (any 4xx/5xx counts as failed).
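For readers less familiar with the percentile columns, here is a minimal sketch of how a p95 or p99 can be derived from raw latency samples. It uses the nearest-rank definition, which is an assumption made for simplicity; k6 computes its own percentiles and may interpolate, so small differences from the k6 summary are expected. `percentile` and `ramp` are hypothetical helpers, not part of the harness.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of samples.
// It sorts a copy so the caller's slice is left untouched.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p * float64(len(sorted)) / 100)) // 1-based rank
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// ramp builds hypothetical latencies of 1 ms, 2 ms, ..., n ms.
func ramp(n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	for i := 1; i <= n; i++ {
		out = append(out, time.Duration(i)*time.Millisecond)
	}
	return out
}

func main() {
	samples := ramp(100)
	fmt.Println("med:", percentile(samples, 50))
	fmt.Println("p95:", percentile(samples, 95))
	fmt.Println("p99:", percentile(samples, 99))
}
```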
What it explicitly does NOT measure
- Issuer connector latency. Connector calls (DigiCert, ACME, Vault, AWS ACM PCA, etc.) happen asynchronously via the renewal scheduler. Their latency is pinned by the `certctl_issuance_duration_seconds{issuer_type=...}` Prometheus histogram (audit fix #4). Driving them through k6 would load-test someone else's API, which is wrong.
- Full ACME enrollment flow. The audit prompt mentioned ACME-via-pebble; sustained 100/s through a multi-RTT order/challenge/finalize flow requires pebble tuning + crypto helpers k6 doesn't ship out of the box. Deferred to a follow-up.
- Bulk-revoke / bulk-renew. Those are admin endpoints with their own throughput characteristics and warrant a separate scenario.
- Scheduler concurrency under bulk renewal. That's audit fix #9's scope; the harness here measures the API tier, not the scheduler.
Threshold contract
Any future change that breaches one of these fails the test:
| Scenario | p95 | p99 | Error rate |
|---|---|---|---|
| `issuance_acceptance` | < 2 s | < 5 s | n/a |
| `list_certificates` | < 800 ms | < 2 s | n/a |
| All requests | n/a | n/a | < 1% |
These are the regression guards, not the SLO. The SLO is whatever the operator chooses based on the baseline below.
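The contract above can be read as a small set of strict comparisons. The sketch below restates it in Go purely as an illustration; the harness enforces these limits inside k6, not with this code, and the names `guard`, `guards`, `breaches` and `errorRateOK` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// guard holds the latency regression guards for one scenario.
type guard struct {
	p95, p99 time.Duration
}

// guards mirrors the threshold table for the two API-tier scenarios.
var guards = map[string]guard{
	"issuance_acceptance": {p95: 2 * time.Second, p99: 5 * time.Second},
	"list_certificates":   {p95: 800 * time.Millisecond, p99: 2 * time.Second},
}

// maxErrorRate is the global limit: fewer than 1% of requests may fail.
const maxErrorRate = 0.01

// errorRateOK reports whether a failure fraction (0.0 to 1.0) is under the limit.
func errorRateOK(rate float64) bool { return rate < maxErrorRate }

// breaches lists every guard the observed numbers violate for one scenario.
// The limits are strict ("< 2 s"), so a value equal to the limit is a breach.
func breaches(scenario string, p95, p99 time.Duration) []string {
	g, ok := guards[scenario]
	if !ok {
		return []string{"unknown scenario " + scenario}
	}
	var out []string
	if p95 >= g.p95 {
		out = append(out, fmt.Sprintf("%s p95 %v >= %v", scenario, p95, g.p95))
	}
	if p99 >= g.p99 {
		out = append(out, fmt.Sprintf("%s p99 %v >= %v", scenario, p99, g.p99))
	}
	return out
}

func main() {
	// Hypothetical healthy run: well under both guards.
	fmt.Println(breaches("issuance_acceptance", 40*time.Millisecond, 90*time.Millisecond), errorRateOK(0))
	// Hypothetical regression on the list endpoint: p95 over 800 ms.
	fmt.Println(breaches("list_certificates", 900*time.Millisecond, 1500*time.Millisecond), errorRateOK(0.02))
}
```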
How to run
From the repo root:
make loadtest
This:
- Builds the certctl image from the repo root `Dockerfile`.
- Spins up postgres, the tls-init bootstrap, certctl-server (with `CERTCTL_DEMO_SEED=true` so the FK rows the script needs exist), and the k6 driver.
- Runs the k6 script for ~5 minutes 5 seconds (5s stagger between scenarios + 5m duration).
- Prints the summary text to stdout.
- Exits non-zero if any threshold was breached.
The full machine-readable summary lands at
deploy/test/loadtest/results/summary.json (gitignored). The
human-readable summary lands at results/summary.txt.
To run against a server already booted on the host (skip the compose spin-up):
docker run --rm \
-e CERTCTL_BASE=https://localhost:8443 \
-e CERTCTL_TOKEN=load-test-token \
-e K6_INSECURE_SKIP_TLS_VERIFY=true \
-v "$(pwd)/deploy/test/loadtest/k6.js:/scripts/k6.js:ro" \
-v "$(pwd)/deploy/test/loadtest/results:/results" \
--network host \
grafana/k6:0.54.0 run /scripts/k6.js
Current baseline
The first operator run on canonical hardware captures real numbers and
commits them into this section. The threshold rows below are the agreed
minimum-acceptable limits, not a captured baseline. The baseline rows
currently hold a sandbox placeholder (see the methodology below); once
the canonical numbers are captured, they replace the placeholder so
future regressions have a diff target.
| Scenario | p50 | p95 | p99 | Error rate |
|---|---|---|---|---|
| issuance_acceptance (threshold) | — | < 2 s | < 5 s | < 1% |
| issuance_acceptance (baseline)[^1] | 2.12 ms | 6.19 ms | 8.58 ms | 0.00% |
| list_certificates (threshold) | — | < 800 ms | < 2 s | < 1% |
| list_certificates (baseline)[^1] | 2.12 ms | 6.19 ms | 8.58 ms | 0.00% |
Methodology of the sandbox-placeholder capture above:
- Hardware: Linux/aarch64 unprivileged sandbox (uid 1019, no root, ~1.2 GiB free disk). NOT canonical hardware.
- Postgres: 14.22 (Ubuntu, native binaries, unix-socket dir `/tmp/pg-sock`), unix sockets only, port 55432.
- certctl: built from HEAD via `go build -o bin/certctl-server ./cmd/server`.
- Concurrency: 50 req/s sustained per scenario, both scenarios in parallel (= 100 req/s combined).
- Duration: 10 seconds per scenario (NOT 5 minutes; the sandbox bash-call budget is bounded, and the canonical-hardware run uses 5 minutes).
- TLS: ECDSA-P256 self-signed `localhost` cert at `/tmp/certctl-tls/`.
- Auth: api-key, single Bearer token (`CERTCTL_AUTH_SECRET=load-test-token`).
- Rate limiting: disabled (`CERTCTL_RATE_LIMIT_ENABLED=false`). Without this, the 100 req/s combined load trips the default token-bucket and drives error rate to ~40%, masking real latency.
- Encryption: `CERTCTL_CONFIG_ENCRYPTION_KEY` set (32+ bytes).
- Captured: 2026-05-02. Total: 1002 requests, 100.15 req/s sustained, 0 failures, 100% checks passed. Raw `summary.json` is not committed (gitignored per the existing `results/` convention).
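As a quick sanity check on the totals, the nominal request count of a constant-rate run is scenarios × rate × duration. The sketch below works that out for the sandbox capture and for the 5-minute canonical run; `expectedRequests` is a hypothetical helper, and the figures are nominal targets rather than measured results.

```go
package main

import "fmt"

// expectedRequests returns the nominal request count for a constant-rate run:
// number of scenarios x requests per second x duration in seconds.
func expectedRequests(scenarios, ratePerSec, durationSec int) int {
	return scenarios * ratePerSec * durationSec
}

func main() {
	// Sandbox placeholder: 2 scenarios at 50 req/s for 10 s each.
	fmt.Println("sandbox nominal:", expectedRequests(2, 50, 10))
	// Canonical run: 2 scenarios at 50 req/s for 5 minutes each.
	fmt.Println("canonical nominal:", expectedRequests(2, 50, 5*60))
}
```

The sandbox nominal of 1000 requests is in line with the 1002 requests reported in the capture.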
Methodology pinned at canonical baseline capture (replace placeholder):
- Hardware: GitHub-hosted `ubuntu-latest` runner (4 vCPU / 16 GiB / SSD). Run via `gh workflow run loadtest.yml`; raw `summary.json` is available for 90 days as a workflow artifact.
- Postgres: 16-alpine in compose, default config.
- certctl: image built from this repo at the commit referenced below.
- Concurrency: 50 req/s sustained per scenario (100 req/s total).
- Duration: 5 minutes per scenario, 5s stagger.
- Auth: api-key (Bearer token, single key).
- Encryption: `CERTCTL_CONFIG_ENCRYPTION_KEY` set (32+ bytes).
To recapture the baseline after a tuning commit:
make loadtest
# Inspect deploy/test/loadtest/results/summary.txt for the new numbers.
# Update the table above + the methodology line, commit alongside the
# tuning commit.
Interpreting a regression
If a future PR's make loadtest run pushes p99 above the threshold,
the make target exits non-zero and CI fails. The summary.txt prints
which threshold breached. Triage:
- Look at the per-scenario `http_req_duration` p95 + p99 in `summary.json`. If only one scenario regressed, the change is localized to that endpoint's hot path.
- Look at the `iteration_duration` per scenario. If total iteration time grew but `http_req_duration` is flat, the latency is in k6 client setup (rare; suggests something changed in the script).
- Compare against the committed baseline. If p99 was 800 ms at baseline and is now 1.5 s but still under the 5 s threshold, the change is below the regression guard but still meaningful; flag it in the PR description.
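The baseline comparison in the last triage step can be sketched as a three-way classification. This is a hedged illustration, not part of the harness: `classify` is a hypothetical helper, and the 1.5× growth cut-off in `meaningfulGrowth` is an assumption, since this document does not define what counts as "meaningful".

```go
package main

import (
	"fmt"
	"time"
)

// meaningfulGrowth is a hypothetical cut-off: a p99 that grew by this factor
// over the committed baseline is worth flagging even if the guard still holds.
const meaningfulGrowth = 1.5

// classify compares a current p99 against the committed baseline and the guard.
func classify(baseline, current, threshold time.Duration) string {
	switch {
	case current >= threshold:
		return "breach" // the guard is violated, so the run fails
	case baseline > 0 && float64(current) >= meaningfulGrowth*float64(baseline):
		return "flag" // under the guard, but call it out in the PR description
	default:
		return "ok"
	}
}

func main() {
	// The example from the triage step: 800 ms baseline, 1.5 s now, 5 s guard.
	fmt.Println(classify(800*time.Millisecond, 1500*time.Millisecond, 5*time.Second))
	// A hypothetical run that crosses the guard.
	fmt.Println(classify(800*time.Millisecond, 6*time.Second, 5*time.Second))
}
```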
The harness deliberately does NOT auto-tune. Tuning is informed by the data; tuning commits land separately, each with their own captured baseline update.
CI cadence
Defined in .github/workflows/loadtest.yml:
- `workflow_dispatch`: manual trigger from the Actions tab. Used before tagging a release or after a meaningful tuning commit.
- Weekly cron: Mondays at 06:00 UTC. Catches gradual regressions from cumulative changes that no single PR triggered.
The workflow does not run per-push. Load tests are minutes long
and would not provide useful per-PR signal; per-push pressure goes
through make verify (which is fast) and the deploy-vendor-e2e job.
Connector-tier baseline (Bundle 10 of the 2026-05-02 deployment-target audit)
Bundle 10 extended the harness to cover per-target-type handshake throughput
in addition to the API-tier issuance/list throughput documented above. The
docker-compose stack now boots four target sidecars (nginx, apache, haproxy,
f5-mock) each serving a starter cert from a shared target-tls-init
container, and k6 runs four additional scenarios — nginx_handshake,
apache_handshake, haproxy_handshake, f5_handshake — at sustained
100 conns/min for 5 minutes against each.
What the connector tier measures
End-to-end TCP connect + TLS handshake + tiny HTTP request/response latency
per target type, tagged via the k6 target_type label so summary.json's
connector_tier section breaks the numbers out per sidecar:
{
"connector_tier": {
"nginx": { "p50": ..., "p95": ..., "p99": ..., "error_rate": ..., "iterations": ... },
"apache": { ... },
"haproxy": { ... },
"f5": { ... }
}
}
This validates the target sidecar daemons are operational under sustained connection load. Procurement asks "can certctl's nginx target handle 5,000 endpoints at 47-day rotation?" — the connector code's correctness is pinned by per-connector unit tests; the underlying daemon's connection-rate ceiling is what these scenarios pin.
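A minimal sketch of reading the `connector_tier` object shown above, for example when recapturing a baseline. The field names come from the JSON shape above; everything else is an assumption: the numeric types, the unit of the latency fields (taken here as milliseconds), `error_rate` being a fraction, and the sample values, which are hypothetical because the real ones are elided.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// targetStats mirrors one per-sidecar entry of the connector_tier object.
type targetStats struct {
	P50        float64 `json:"p50"`
	P95        float64 `json:"p95"`
	P99        float64 `json:"p99"`
	ErrorRate  float64 `json:"error_rate"`
	Iterations int     `json:"iterations"`
}

// summary captures only the part of summary.json this sketch needs.
type summary struct {
	ConnectorTier map[string]targetStats `json:"connector_tier"`
}

// parseConnectorTier decodes the connector_tier section keyed by target type.
func parseConnectorTier(raw []byte) (map[string]targetStats, error) {
	var s summary
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("decode summary: %w", err)
	}
	return s.ConnectorTier, nil
}

func main() {
	// Hypothetical values for illustration only.
	raw := []byte(`{"connector_tier": {
		"nginx": {"p50": 3.1, "p95": 9.4, "p99": 15.2, "error_rate": 0, "iterations": 500},
		"f5":    {"p50": 4.0, "p95": 12.8, "p99": 21.7, "error_rate": 0, "iterations": 500}
	}}`)
	tiers, err := parseConnectorTier(raw)
	if err != nil {
		panic(err)
	}
	// Sort the target types so the report order is stable.
	names := make([]string, 0, len(tiers))
	for name := range tiers {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		t := tiers[name]
		fmt.Printf("%-8s p95=%.1f p99=%.1f errors=%.2f%% iterations=%d\n",
			name, t.P95, t.P99, t.ErrorRate*100, t.Iterations)
	}
}
```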
What the connector tier explicitly does NOT measure (v1)
- The full agent-driven deploy hot path. v1 measures handshake throughput against the sidecars directly. v2 of the harness is a follow-up that POSTs cert requests bound to per-target-type targets, polls the deployments endpoint until the agent reports complete, and measures the full POST → poll → cert-served loop. v2 needs the agent registration + target-binding API surface plumbed end-to-end in the loadtest stack: meaningful work, but not a blocker for the connection-rate procurement question.
- Kubernetes connector. kind-in-docker requires `privileged: true` and is operationally fragile in CI. Deferred until Bundle 2 (real `k8s.io/client-go`) lands and a CI-friendly envtest harness is wired.
- Real F5 BIG-IP. The harness uses the in-tree `f5-mock-icontrol` Go server (already used by the deploy-vendor-e2e CI job). Real F5 appliance benchmarking is out of scope; operators with a real F5 vagrant box per `docs/connector-f5.md` can substitute it manually.
Threshold contract
Defined in k6.js's thresholds block. Any change pushing past these
fails the test:
| Target type | p95 | p99 | Error rate |
|---|---|---|---|
| nginx | < 1 s | < 3 s | < 1% (global) |
| apache | < 1 s | < 3 s | < 1% (global) |
| haproxy | < 1 s | < 3 s | < 1% (global) |
| f5 | < 1.5 s | < 5 s | < 1% (global) |
f5-mock's threshold is looser because the iControl REST handler does slightly more work per request. The login + upload + install dance that the F5 connector itself drives is not exercised here, but the daemon's request handler is still heavier.
Connector-tier captured baseline
| Target type | p50 | p95 | p99 | Error rate | Iterations |
|---|---|---|---|---|---|
| nginx (threshold) | — | < 1 s | < 3 s | < 1% | n/a |
| nginx (baseline) | TBD | TBD | TBD | TBD | TBD |
| apache (threshold) | — | < 1 s | < 3 s | < 1% | n/a |
| apache (baseline) | TBD | TBD | TBD | TBD | TBD |
| haproxy (threshold) | — | < 1 s | < 3 s | < 1% | n/a |
| haproxy (baseline) | TBD | TBD | TBD | TBD | TBD |
| f5 (threshold) | — | < 1.5 s | < 5 s | < 1% | n/a |
| f5 (baseline) | TBD | TBD | TBD | TBD | TBD |
The TBD placeholders are deliberate: do not commit numeric values
without running the loadtest on canonical hardware first. Numbers from a
developer laptop are misleading. The first gh workflow run loadtest.yml
on a clean GitHub runner captures the baseline; commit the captured numbers
into the table above as a follow-up commit alongside the methodology line.
Methodology pinned at baseline capture (canonical hardware):
- Hardware: GitHub-hosted `ubuntu-latest` runners (currently 4 vCPU / 16 GiB / SSD-backed). Operator captures from `gh workflow run loadtest.yml` to keep the hardware constant across runs.
- Sidecar images: nginx:1.27-alpine, httpd:2.4-alpine, haproxy:2.9-alpine, in-tree f5-mock-icontrol (built from `deploy/test/f5-mock-icontrol/`).
- Concurrency: 100 conns/min sustained per target type (400 conns/min total across the four target scenarios + 100 req/s on the API tier).
- Duration: 5 minutes per scenario, 10s stagger between API tier and connector tier so warmup overlap doesn't skew the first 30 seconds.
- TLS: starter cert from `target-tls-init` (ECDSA P-256, multi-SAN). The loadtest scenarios connect with `K6_INSECURE_SKIP_TLS_VERIFY=true`.
To recapture the connector-tier baseline after a tuning commit affecting target sidecars or the connector code:
make loadtest
# Inspect deploy/test/loadtest/results/summary.json for the
# connector_tier object and update the table above.
Files in this directory
deploy/test/loadtest/
├── README.md (this file)
├── docker-compose.yml
├── k6.js (the load script)
├── certs/ (gitignored — tls-init writes here)
├── fixtures/ (Bundle 10: target sidecar configs + shared starter cert)
│ ├── nginx.conf
│ ├── httpd.conf
│ ├── haproxy.cfg
│ └── target-certs/ (gitignored — target-tls-init writes here)
└── results/ (gitignored — k6 writes summary.{json,txt} here)
ACME flows (Phase 5)
The deploy/test/loadtest/k6/acme_flow.js scenario hammers the
unauthenticated ACME surface (directory + new-nonce + ARI synthetic
lookups) at constant 100 VUs for 5 minutes. JWS-signed paths
(new-account / new-order / finalize) are intentionally out of scope:
k6 doesn't ship JWS, and bundling lego inside k6 would obscure the
underlying-server p95 we're trying to measure. Instead, the
make acme-rfc-conformance-test target drives lego against the same
stack for the full happy-path conformance gate.
Run it:
cd deploy/test/loadtest
docker compose up -d certctl postgres
k6 run --env CERTCTL_ACME_DIRECTORY=https://localhost:8443/acme/profile/prof-test/directory \
k6/acme_flow.js
Baseline (ACME flows, 100 VUs × 5m)
The baseline is operator-captured on a workstation-class machine with a single certctl-server container + a single postgres container. Re-capture after schema migrations or transport changes; commit the new numbers so regressions are visible in code review.
| Metric | Threshold | Last captured | Notes |
|---|---|---|---|
| `directory_duration` p95 | < 500 ms | operator | Unauth GET; cache-friendly. |
| `new_nonce_duration` p95 | < 300 ms | operator | Single Postgres INSERT under the hood. |
| `renewal_info_duration` p95 (synthetic id) | < 800 ms | operator | Synthetic cert-id → 4xx fast path. |
| `http_req_failed` rate | < 1% | operator | Should be ~0; failures here mean transport issues. |
Capture command: make loadtest after pointing the compose stack at
the ACME flow scenario. Operators with kind / cert-manager available
should pair this with make acme-cert-manager-test for end-to-end
verification.
Audit references
- API tier: `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` fix #8.
- Connector tier: `cowork/deployment-target-audit-2026-05-02/RESULTS.md` Bundle 10.
- ACME flows: Phase 5 master prompt (`cowork/acme-server-prompts/06-phase-5-certmanager-hardening-prompt.md`).
[^1]: Sandbox-aggregate placeholder, captured at HEAD on a Linux/aarch64 unprivileged sandbox (no Docker, no GitHub-hosted runner). Both rows show the same aggregate combined-load numbers because the sandbox run did not break out per-scenario tags in `summary.json`. Treat these as a sanity floor (proof the API tier handles 100 req/s combined with zero errors and sub-10ms p99), not as the per-scenario baselines the threshold contract is written against. Replace via `gh workflow run loadtest.yml` on the canonical `ubuntu-latest` runner, which produces per-scenario tagged metrics in `summary.json`.