From 6acf3559a31cb1e5a06efbae4d6f6f808a16df0a Mon Sep 17 00:00:00 2001
From: shankar0123
Date: Sat, 16 May 2026 05:19:57 +0000
Subject: [PATCH] =?UTF-8?q?docs(scale):=20TEST-005=20=E2=80=94=20split=20s?=
 =?UTF-8?q?cale=20baseline=20into=20its=20own=20canonical=20record?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Sprint 5 unified-master-audit closure.

Pre-fix:
- docs/operator/scale.md L163-185 held a TBD-laden table with 5
  scenario rows. The Phase 8 scenarios shipped 2026-05-14; baseline
  capture on canonical hardware was 'the next operational step' that
  had not been taken.
- Acquirers + operators asking 'what's the scale ceiling?' got 'TBD'
  as the in-tree answer.

The audit's fix wanted three things:
1. Capture p50/p95/p99 + error rate + memory profile on a fixed-spec
   runner.
2. Replace the scale.md TBD rows with real numbers.
3. Archive k6 artifacts under deploy/test/loadtest-artifacts/.

The actual capture is a workflow_dispatch run the operator triggers
on a real Linux runner — it can't happen from a sandbox without
Docker. What I CAN deliver in this commit is the canonical-record
infrastructure that turns the next workflow run into a baseline that
sticks:

- New docs/operator/scale-baseline-2026-Q2.md is the canonical
  record. Documents the three scenarios, the methodology, the capture
  procedure, and a 'Latest capture' table with placeholder rows ready
  to receive the workflow_dispatch run's numbers. The doc explicitly
  defends the 'ubuntu-latest runner' choice (reproducibility >
  paid-AWS-account specificity).
- docs/operator/scale.md L163-185 — the TBD table — replaced with a
  pointer paragraph to the new baseline file. Per the
  canonical-doc-pointer pattern: the operator-posture doc changes
  when scenarios change; the baseline doc changes on every capture.
  Splitting them avoids review-noise on per-capture commits.
- New deploy/test/loadtest-artifacts/ directory with a README
  documenting the long-term-archive contract (the GHA artifact
  retention is 90 days; numbers acquisition reviewers look at months
  later need a committed home).

Operator next steps to fill the placeholders:
1. Trigger Actions → loadtest → Run workflow.
2. Download the three matrix-leg artifacts.
3. Update the baseline doc's 'Latest capture' rows.
4. Commit the raw artifacts (or git-lfs for >100 MB archives) to
   deploy/test/loadtest-artifacts/.

Closes TEST-005 (infrastructure side). Numbers land on the next
canonical-runner workflow_dispatch capture.
---
 deploy/test/loadtest-artifacts/README.md |  52 ++++++++++
 docs/operator/scale-baseline-2026-Q2.md  | 123 +++++++++++++++++++++++
 docs/operator/scale.md                   |  36 +++----
 3 files changed, 190 insertions(+), 21 deletions(-)
 create mode 100644 deploy/test/loadtest-artifacts/README.md
 create mode 100644 docs/operator/scale-baseline-2026-Q2.md

diff --git a/deploy/test/loadtest-artifacts/README.md b/deploy/test/loadtest-artifacts/README.md
new file mode 100644
index 0000000..b702e0d
--- /dev/null
+++ b/deploy/test/loadtest-artifacts/README.md
@@ -0,0 +1,52 @@
+# loadtest-artifacts/
+
+> Last reviewed: 2026-05-16
+
+Long-term archive of k6 load-test results from the `loadtest` GitHub
+Actions workflow. TEST-005 closure (Sprint 5, 2026-05-16) introduces
+this directory as the committed home for captures the operator
+chooses to retain past GitHub's 90-day artifact-retention window.
+
+## What lands here
+
+After a `loadtest` workflow_dispatch run, follow the procedure in
+[`docs/operator/scale-baseline-2026-Q2.md`](../../../docs/operator/scale-baseline-2026-Q2.md#capture-procedure):
+
+1. Download the three matrix-leg artifacts from the workflow page.
+2. Update the latest-capture table in the baseline doc with the
+   extracted percentiles.
+3. Commit the raw artifacts you want long-term-retained here, named:
+
+   ```
+   2026-Q2-bulk-renewal-.tar.gz
+   2026-Q2-acme-burst-.tar.gz
+   2026-Q2-agent-storm-.tar.gz
+   ```
+
+4. If any single archive exceeds 100 MB, route it through `git lfs`
+   (configured at repo root via `.gitattributes`).
+
+## Why commit artifacts rather than rely on GHA retention
+
+- **GitHub Actions retains workflow artifacts for 90 days by default.**
+  Acquisition-diligence reviewers looking at scale evidence months
+  later get a 404 unless we keep the raw NDJSON in tree.
+- **Reproducibility.** Pinning the k6 NDJSON to a SHA makes it
+  cheap to re-derive percentiles with a different filter (e.g.
+  "p99 excluding the warmup ramp's first 30 seconds") without
+  re-running the workflow.
+
+## What does NOT belong here
+
+- **Per-PR ephemeral runs.** The `loadtest` workflow runs on
+  workflow_dispatch + weekly cron; per-PR runs would be too noisy
+  and aren't retained.
+- **Production-environment captures.** These artifacts are the
+  ubuntu-latest reference baseline. An operator capturing their
+  production-environment scale should put the artifacts in their
+  own observability platform — committing them here would imply
+  "this is what certctl's reference numbers are" which it isn't.
+- **Manual k6 captures from a developer's laptop.** Same rationale
+  as the visual-regression snapshot runbook
+  ([`docs/operator/runbooks/e2e-snapshot-update.md`](../../../docs/operator/runbooks/e2e-snapshot-update.md))
+  — only the CI environment produces canonical numbers.
diff --git a/docs/operator/scale-baseline-2026-Q2.md b/docs/operator/scale-baseline-2026-Q2.md
new file mode 100644
index 0000000..9021072
--- /dev/null
+++ b/docs/operator/scale-baseline-2026-Q2.md
@@ -0,0 +1,123 @@
+# Scale baseline — 2026 Q2 canonical-hardware capture
+
+> Last reviewed: 2026-05-16
+
+## What this file is
+
+The canonical record of certctl's load-test baselines for the
+2026-Q2 reporting window. TEST-005 closure (Sprint 5, 2026-05-16)
+introduces this doc as the single source of truth for "what's the
+scale ceiling?" — replacing the TBD-laden table at
+[`docs/operator/scale.md`](scale.md#measured-baseline) that had been
+unfilled since the scenarios shipped in Phase 8.
+
+The numbers below come from the `loadtest` GitHub Actions workflow
+running its three canonical scenarios on `ubuntu-latest` runners:
+
+- `bulk-renewal` — 10,000-cert seed + criteria-mode
+  `POST /api/v1/certificates/bulk-renew`, 200 concurrent VUs over 10
+  minutes.
+- `acme-burst` — 200 concurrent VUs hitting `/acme/directory`,
+  `/acme/new-nonce`, and `/acme/renewal-info/` simultaneously.
+- `agent-storm` — 5,000-agent seed + sustained
+  `POST /api/v1/agents/{id}/heartbeat` at 167 RPS.
+
+Thresholds enforced inline in `deploy/test/loadtest/k6.js` (p99 < 5s
+for issuance-acceptance, p99 < 2s for list, error rate < 1%). k6 exits
+non-zero on any breach, which propagates through `docker compose up
+--exit-code-from k6 → make loadtest → workflow exit`.
+
+## Capture procedure
+
+1. Trigger the workflow:
+   - **Actions** → `loadtest` → **Run workflow**, branch `master`.
+   - Wait ~25 minutes for the three matrix legs to finish.
+2. Download each scenario's artifact from the workflow run page:
+   - `k6-scale-bulk-renewal-`
+   - `k6-scale-acme-burst-`
+   - `k6-scale-agent-storm-`
+   - Each archive contains the k6 `summary.json` + raw NDJSON
+     points (90-day GHA retention).
+3. Run `scripts/scale-baseline/extract.sh ` (see below) to
+   pull the three artifacts and emit the table rows for this doc.
+4. Paste the rows under the **Latest capture** section. Update
+   `> Last reviewed:` to today.
+5. Commit the artifacts you want long-term-retained to
+   [`deploy/test/loadtest-artifacts/`](../../deploy/test/loadtest-artifacts/)
+   using `git lfs` if the archives exceed 100 MB; otherwise commit
+   them inline.
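The raw NDJSON points listed in step 2 allow the extracted percentiles to be cross-checked independently. The following is a minimal Python sketch, not the repository's `extract.sh`. It assumes the archive holds k6's standard JSON output, where each line is an object and samples have `"type": "Point"`, a `"metric"` name, and `data.value` in milliseconds. The function names `percentile` and `summarize` are hypothetical, and the sketch uses the nearest-rank method, so results can differ slightly from the percentiles k6 itself reports.

```python
import json
import math
from typing import Iterable


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(ndjson_lines: Iterable[str],
              metric: str = "http_req_duration") -> dict[str, float]:
    """Collect Point samples for one metric and return p50/p95/p99."""
    values: list[float] = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the capture
        record = json.loads(line)
        # k6 also emits "Metric" declaration records; only Points carry samples.
        if record.get("type") == "Point" and record.get("metric") == metric:
            values.append(float(record["data"]["value"]))
    if not values:
        raise ValueError(f"no Point records for metric {metric!r}")
    values.sort()
    return {
        "p50": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
    }
```

Passing an open file handle for one scenario's NDJSON file to `summarize` is enough for a spot check. A filter such as dropping the warmup ramp could be added by inspecting `record["data"]["time"]` before appending.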
+
+## Latest capture
+
+| Scenario | Run ID | Date | p50 | p95 | p99 | Error rate | Peak server RSS | Notes |
+|---|---|---|---|---|---|---|---|---|
+| **bulk-renewal** | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | First post-TEST-005 capture; trigger via workflow_dispatch + extract via the procedure above. |
+| **acme-burst** directory | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | — |
+| **acme-burst** new-nonce | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | — |
+| **acme-burst** renewal-info | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | — |
+| **agent-storm** | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | _capture pending_ | — |
+
+The "_capture pending_" placeholders are deliberate — the operator
+fills them after the next `loadtest` workflow_dispatch run. Once
+filled, replace these rows; do not edit them in place across runs
+(the historical row stays as evidence).
+
+## Why "ubuntu-latest" instead of RDS-shaped hardware
+
+The audit's fix language preferred RDS-shaped Postgres on a
+fixed-spec runner. ubuntu-latest's 2-vCPU / 7-GB-RAM shape is
+narrower than typical production Postgres, but it has two virtues:
+
+1. **Reproducibility.** Every operator + acquirer can reproduce the
+   numbers; an RDS-shaped Postgres requires a paid AWS account.
+2. **Conservative ceiling.** If the published numbers come from a
+   constrained runner, real-world deployments on production Postgres
+   sizes (db.m5.large +) only get better.
+
+When an acquirer or operator asks for a production-equivalent
+baseline, capture a second run on whatever infrastructure they want
+to validate against and add it under a new **2026 Q3 capture**
+section.
+
+## Methodology
+
+### Hardware
+
+- **Runner:** GitHub Actions `ubuntu-latest` (currently Ubuntu 24.04, 2-vCPU, 7-GB RAM).
+- **certctl image:** built from the same commit the workflow runs on.
+- **Postgres:** `postgres:16-alpine@sha256:890480b08124ce7f79960a9bb16fe39729aa302bd384bfd7c408fee6c8f7adb7`, in-cluster, default config (no operator tuning).
+- **Network:** runner localhost.
+
+### Software
+
+- **k6:** version pinned in `deploy/test/loadtest/Dockerfile`.
+- **certctl tag:** the v* tag at workflow trigger time (matches `openapi.yaml info.version`).
+
+### Metrics captured
+
+- **p50 / p95 / p99 latency** — k6's `http_req_duration` percentiles.
+- **Error rate** — k6 `http_req_failed` rate (non-2xx + connection errors).
+- **Peak server RSS** — `docker stats` polled at 1-Hz for the
+  duration of the run; `max(memory_stats.usage)` taken from the
+  emitted JSON.
+- **Acceptance gate** — the k6 thresholds in `k6.js`; if exceeded
+  the workflow fails.
+
+### What's NOT captured
+
+- **Cold-start latency** — these are steady-state baselines after the
+  k6 warmup ramp. Cold-start is a separate concern (renewal-loop
+  startup, scheduler tick boundary), not covered by these scenarios.
+- **WAN latency** — runs are localhost; production-WAN-RTT additions
+  fall outside scope.
+- **Federation overhead** — single-instance only; HA + replicas runs
+  are a future deliverable.
+
+## Related reading
+
+- [`docs/operator/scale.md`](scale.md) — the operator-facing scale
+  posture doc; baseline rows there point at this file.
+- [`deploy/test/loadtest/README.md`](../../deploy/test/loadtest/README.md) —
+  scenario semantics + how to read the k6 output.
+- [`deploy/test/loadtest-artifacts/`](../../deploy/test/loadtest-artifacts/) —
+  long-term archive of captured k6 results.
diff --git a/docs/operator/scale.md b/docs/operator/scale.md
index 9ccefd7..e8c173f 100644
--- a/docs/operator/scale.md
+++ b/docs/operator/scale.md
@@ -1,6 +1,6 @@
 # Operator scale guide
 
-> Last reviewed: 2026-05-14
+> Last reviewed: 2026-05-16
 
 Use this when:
 - You're sizing a new certctl deployment for a target fleet count.
@@ -160,29 +160,23 @@ the RFC 7807 `application/problem+json` shape with the
 returned plain-text 429 or a different problem type would surface as
 `(rate_limited_count - shape_ok_count) > 0` in the summary.
 
-### Measured baseline — TBD pending canonical-hardware capture
+### Measured baseline
 
-The Phase 8 scenarios shipped 2026-05-14. Baseline capture on a
-canonical `ubuntu-latest` GitHub runner is the next operational step;
-until then, the table below holds TBD placeholders. **Do NOT publish
-sandbox-captured numbers here** — the same anti-pattern the original
-loadtest README guards against (sandbox-aggregate placeholder vs
-canonical hardware) applies to Phase 8.
+TEST-005 closure (Sprint 5, 2026-05-16) moved the baseline table out
+of this file into its own canonical record:
+[`docs/operator/scale-baseline-2026-Q2.md`](scale-baseline-2026-Q2.md).
+That doc owns the capture procedure, the methodology, and the
+per-scenario rows; this page links to it as the authoritative
+source.
 
-| Scenario | p50 | p95 | p99 | Error rate | Date measured | Commit |
-|---|---|---|---|---|---|---|
-| **bulk_renewal** | TBD | TBD | TBD | TBD | — | — |
-| **acme_burst** directory | TBD | TBD | TBD | TBD | — | — |
-| **acme_burst** new-nonce | TBD | TBD | TBD | TBD | — | — |
-| **acme_burst** renewal-info | TBD | TBD | TBD | TBD | — | — |
-| **agent_storm** | TBD | TBD | TBD | TBD | — | — |
+The split exists because the baseline table is mutable on every
+loadtest workflow_dispatch run, while this page (the operator-facing
+scale posture doc) changes only when the underlying scenarios or
+thresholds change. Keeping them in separate files avoids
+review-noise on per-capture commits.
 
-Capture procedure: trigger `loadtest.yml` from the Actions tab against
-the current `master` SHA; wait for the `k6-scale` matrix jobs to
-complete; download the per-scenario summary artifacts; copy p50/p95/
-p99 from `summary-.json` into the table; commit the
-captured numbers alongside the date + SHA. Replace this paragraph
-with the captured-on row when the first canonical run lands.
+Long-term k6 NDJSON artifacts beyond GHA's 90-day retention live at
+[`deploy/test/loadtest-artifacts/`](../../deploy/test/loadtest-artifacts/).
 
 ### How to run the scale tier locally