Sprint 5 unified-master-audit closure. Pre-fix:
- docs/operator/scale.md L163-185 held a TBD-laden table with 5
scenario rows. The Phase 8 scenarios shipped 2026-05-14; baseline
capture on canonical hardware was 'the next operational step'
that had not been taken.
- Acquirers + operators asking 'what's the scale ceiling?' got
'TBD' as the in-tree answer.
The audit's fix wanted three things:
1. Capture p50/p95/p99 + error rate + memory profile on a fixed-
spec runner.
2. Replace the scale.md TBD rows with real numbers.
3. Archive k6 artifacts under deploy/test/loadtest-artifacts/.
The actual capture is a workflow_dispatch run the operator triggers
on a real Linux runner — it can't happen from a sandbox without
Docker. What I CAN deliver in this commit is the canonical-record
infrastructure that turns the next workflow run into a baseline that
sticks:
- New docs/operator/scale-baseline-2026-Q2.md is the canonical
record. Documents the three scenarios, the methodology, the
capture procedure, and a 'Latest capture' table with
placeholder rows ready to receive the workflow_dispatch run's
numbers. The doc explicitly defends the 'ubuntu-latest runner'
choice (reproducibility > paid-AWS-account specificity).
- docs/operator/scale.md L163-185 — the TBD table — replaced with
a pointer paragraph to the new baseline file. Per the
canonical-doc-pointer pattern: the operator-posture doc changes
when scenarios change; the baseline doc changes on every
capture. Splitting them avoids review-noise on per-capture
commits.
- New deploy/test/loadtest-artifacts/ directory with a README
documenting the long-term-archive contract (the GHA artifact
retention is 90 days; numbers acquisition reviewers look at
months later need a committed home).
Operator next steps to fill the placeholders:
1. Trigger Actions → loadtest → Run workflow.
2. Download the three matrix-leg artifacts.
3. Update the baseline doc's 'Latest capture' rows.
4. Commit the raw artifacts (or git-lfs for >100 MB archives) to
deploy/test/loadtest-artifacts/.
Closes TEST-005 (infrastructure side). Numbers land on the next
canonical-runner workflow_dispatch capture.
Scale baseline — 2026 Q2 canonical-hardware capture
Last reviewed: 2026-05-16
What this file is
The canonical record of certctl's load-test baselines for the
2026-Q2 reporting window. TEST-005 closure (Sprint 5, 2026-05-16)
introduces this doc as the single source of truth for "what's the
scale ceiling?" — replacing the TBD-laden table at
docs/operator/scale.md that had been
unfilled since the scenarios shipped in Phase 8.
The numbers below come from the loadtest GitHub Actions workflow
running its three canonical scenarios on ubuntu-latest runners:
- bulk-renewal: 10,000-cert seed + criteria-mode POST /api/v1/certificates/bulk-renew, 200 concurrent VUs over 10 minutes.
- acme-burst: 200 concurrent VUs hitting /acme/directory, /acme/new-nonce, and /acme/renewal-info/<cert-id> simultaneously.
- agent-storm: 5,000-agent seed + sustained POST /api/v1/agents/{id}/heartbeat at 167 RPS.
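As a sanity check on the agent-storm figure, the sketch below relates the seed size to the request rate. It assumes heartbeats are spread evenly across agents, which the scenario description does not state; the roughly 30-second interval is derived from the two quoted numbers, not read from any certctl configuration.

```python
# Assumption: every seeded agent heartbeats at the same fixed interval and
# the load is spread evenly. Only AGENTS and TARGET_RPS come from the text.
AGENTS = 5_000
TARGET_RPS = 167


def implied_interval_seconds(agents: int, rps: float) -> float:
    """Seconds between heartbeats per agent needed to sustain `rps` overall."""
    return agents / rps


def aggregate_rps(agents: int, interval_seconds: float) -> float:
    """Overall request rate when each agent heartbeats every `interval_seconds`."""
    return agents / interval_seconds


if __name__ == "__main__":
    print(f"{implied_interval_seconds(AGENTS, TARGET_RPS):.1f} s per agent")
    print(f"{aggregate_rps(AGENTS, 30):.1f} RPS at a 30 s interval")
```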
Thresholds are enforced inline in deploy/test/loadtest/k6.js (p99 < 5s
for issuance-acceptance, p99 < 2s for list, error rate < 1%). k6 exits
non-zero on any breach, and that exit code propagates through docker compose up --exit-code-from k6 → make loadtest → workflow exit.
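The gate logic described above can be sketched outside k6 as follows. This is an illustration only: the real thresholds live in k6.js, and the key names (`issuance_p99_ms`, `list_p99_ms`, `error_rate`) are hypothetical labels chosen for this example, not k6 metric names.

```python
# Limits quoted in the text: latencies in milliseconds, error rate as a fraction.
# The key names are hypothetical; they are not taken from k6.js.
LIMITS = {
    "issuance_p99_ms": 5_000.0,
    "list_p99_ms": 2_000.0,
    "error_rate": 0.01,
}


def breaches(observed: dict[str, float]) -> list[str]:
    """Return the limits that `observed` fails.

    The limits are strict ("p99 < 5s"), so a value equal to the limit
    counts as a breach. A missing key raises KeyError rather than passing
    silently.
    """
    return [name for name, limit in LIMITS.items() if observed[name] >= limit]


def exit_code(observed: dict[str, float]) -> int:
    """Mirror the non-zero-on-breach behaviour: 0 when clean, 1 otherwise."""
    return 1 if breaches(observed) else 0
```

Treating a missing measurement as an error rather than a pass is a deliberate choice in this sketch, so that a scenario that emitted no samples cannot look healthy.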
Capture procedure
- Trigger the workflow:
  - Actions → loadtest → Run workflow, branch master.
  - Wait ~25 minutes for the three matrix legs to finish.
- Download each scenario's artifact from the workflow run page:
  - k6-scale-bulk-renewal-<run-id>
  - k6-scale-acme-burst-<run-id>
  - k6-scale-agent-storm-<run-id>
  - Each archive contains the k6 summary.json + raw NDJSON points (90-day GHA retention).
- Run scripts/scale-baseline/extract.sh <run-id> (see below) to pull the three artifacts and emit the table rows for this doc.
- Paste the rows under the Latest capture section. Update > Last reviewed: to today.
- Commit the artifacts you want long-term-retained to deploy/test/loadtest-artifacts/ using git lfs if the archives exceed 100 MB; otherwise commit them inline.
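The last step's inline-versus-LFS decision can be sketched as a size check over the downloaded archives. The function name and the reading of "100 MB" as 100 MiB are assumptions made for this example; it does not call git or git lfs, it only sorts paths into two groups.

```python
from pathlib import Path

# Assumption: "100 MB" is read as 100 MiB. Adjust if the repo uses a different cutoff.
LFS_LIMIT_BYTES = 100 * 1024 * 1024


def split_by_size(paths, limit=LFS_LIMIT_BYTES):
    """Split archive paths into (commit_inline, track_with_lfs) by file size."""
    inline, lfs = [], []
    for path in map(Path, paths):
        if path.stat().st_size > limit:
            lfs.append(path)
        else:
            inline.append(path)
    return inline, lfs
```

Running it over the three downloaded archives before `git add` gives the list of files that would need an LFS tracking rule first.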
Latest capture
| Scenario | Run ID | Date | p50 | p95 | p99 | Error rate | Peak server RSS | Notes |
|---|---|---|---|---|---|---|---|---|
| bulk-renewal | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | First post-TEST-005 capture; trigger via workflow_dispatch + extract via the procedure above. |
| acme-burst directory | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | — |
| acme-burst new-nonce | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | — |
| acme-burst renewal-info | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | — |
| agent-storm | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | capture pending | — |
The "capture pending" placeholders are deliberate: the operator
fills them after the next loadtest workflow_dispatch run. Replace the
placeholders with that first capture's numbers; for later captures, add new
rows instead of editing filled rows in place (the historical row stays as evidence).
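A row in the shape of the table above could be produced by a helper like the one below. This is a sketch, not the contents of extract.sh: the function name, the units (milliseconds, MiB) and the rounding are assumptions, and the inputs are whatever the operator reads out of the k6 summary and the memory samples.

```python
# Column order copied from the "Latest capture" table header.
COLUMNS = ["Scenario", "Run ID", "Date", "p50", "p95", "p99",
           "Error rate", "Peak server RSS", "Notes"]


def format_row(scenario, run_id, date, p50_ms, p95_ms, p99_ms,
               error_rate, peak_rss_bytes, notes):
    """Render one markdown table row; units and rounding are assumptions."""
    cells = [
        scenario,
        str(run_id),
        date,
        f"{p50_ms:.0f} ms",
        f"{p95_ms:.0f} ms",
        f"{p99_ms:.0f} ms",
        f"{error_rate:.2%}",                   # fraction in, percentage out
        f"{peak_rss_bytes / 2**20:.0f} MiB",   # bytes in, MiB out
        notes,
    ]
    assert len(cells) == len(COLUMNS)
    return "| " + " | ".join(cells) + " |"
```

Keeping the column list next to the formatter makes it harder for the row and the header to drift apart when a column is added.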
Why "ubuntu-latest" instead of RDS-shaped hardware
The audit's fix language preferred RDS-shaped Postgres on a fixed-spec runner. ubuntu-latest's 2-vCPU / 7-GB-RAM shape is narrower than typical production Postgres, but it has two virtues:
- Reproducibility. Every operator + acquirer can reproduce the numbers; an RDS-shaped Postgres requires a paid AWS account.
- Conservative ceiling. If the published numbers come from a constrained runner, real-world deployments on production Postgres sizes (db.m5.large +) only get better.
When an acquirer or operator asks for a production-equivalent baseline, capture a second run on whatever infrastructure they want to validate against and add it under a new 2026 Q3 capture section.
Methodology
Hardware
- Runner: GitHub Actions
ubuntu-latest(currently Ubuntu 24.04, 2-vCPU, 7-GB RAM). - certctl image: built from the same commit the workflow runs on.
- Postgres:
postgres:16-alpine@sha256:890480b08124ce7f79960a9bb16fe39729aa302bd384bfd7c408fee6c8f7adb7, in-cluster, default config (no operator tuning). - Network: runner localhost.
Software
- k6: version pinned in deploy/test/loadtest/Dockerfile.
- certctl tag: the v* tag at workflow trigger time (matches openapi.yaml info.version).
Metrics captured
- p50 / p95 / p99 latency: k6's http_req_duration percentiles.
- Error rate: k6 http_req_failed rate (non-2xx + connection errors).
- Peak server RSS: docker stats polled at 1 Hz for the duration of the run; max(memory_stats.usage) taken from the emitted JSON.
- Acceptance gate: the k6 thresholds in k6.js; if exceeded the workflow fails.
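The peak-memory reduction described above amounts to a maximum over the polled samples. The sketch below assumes the samples are stored one JSON object per line, each carrying a `memory_stats.usage` field in bytes as in the Docker Engine stats payload; the storage format and the function name are assumptions, not something the workflow is confirmed to do.

```python
import json


def peak_memory_bytes(ndjson_lines):
    """Return max(memory_stats.usage) across samples, or 0 if none carry it.

    Blank lines and samples without a usage value (for example, a stats
    object emitted while the container is starting) are skipped.
    """
    peak = 0
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        sample = json.loads(line)
        usage = sample.get("memory_stats", {}).get("usage")
        if usage is not None:
            peak = max(peak, usage)
    return peak
```

Note that this usage counter generally includes page cache, so the maximum is best read as an upper bound on resident memory rather than an exact RSS figure.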
What's NOT captured
- Cold-start latency — these are steady-state baselines after the k6 warmup ramp. Cold-start is a separate concern (renewal-loop startup, scheduler tick boundary), not covered by these scenarios.
- WAN latency — runs are localhost; production-WAN-RTT additions fall outside scope.
- Federation overhead — single-instance only; HA + replicas runs are a future deliverable.
Related reading
- docs/operator/scale.md: the operator-facing scale posture doc; baseline rows there point at this file.
- deploy/test/loadtest/README.md: scenario semantics + how to read the k6 output.
- deploy/test/loadtest-artifacts/: long-term archive of captured k6 results.