From c26cef37a1439b89665300c3ce86aea20372380a Mon Sep 17 00:00:00 2001 From: shankar0123 Date: Sat, 2 May 2026 21:48:29 +0000 Subject: [PATCH] loadtest: capture sandbox-aggregate placeholder for API-tier baseline MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes Top-10 fix #2 of the 2026-05-02 deployment-target audit re-run (see cowork/deployment-target-audit-2026-05-02-rerun/RESULTS.md). Replaces the four TBD cells in deploy/test/loadtest/README.md ## Current baseline with a sandbox-aggregate placeholder so the README isn't lying about having a baseline section ready to diff against. Numbers (both rows show the same aggregate — see footnote): p50=2.12 ms, p95=6.19 ms, p99=8.58 ms, error rate 0.00% (1002 requests, 100.15 req/s sustained, 0 failures across 10s) Capture environment, called out explicitly in the new methodology block: - Linux/aarch64 unprivileged sandbox (NOT canonical hardware) - Postgres 14.22 native (NOT 16-alpine in compose) - 10s scenarios (NOT 5 minutes) - Both rows have the same numbers because the sandbox run did not emit per-scenario tagged metrics in summary.json — the threshold contract still expects per-scenario p95/p99 from a canonical run. Footnote ([^1]) frames these as a sanity floor, not the per-scenario baseline the threshold contract is written against. The follow-up canonical capture via `gh workflow run loadtest.yml` on the GitHub-hosted ubuntu-latest runner will replace these with real per-scenario numbers (and will keep the canonical methodology block that's already pinned below). Connector-tier table (## Connector-tier captured baseline) is intentionally left at TBD: that block explicitly anti-patterns committing numbers without a Docker-equipped canonical run, and the sandbox can't run the four target sidecars. No code changes; doc-only. Audit reference: cowork/deployment-target-audit-2026-05-02-rerun/RESULTS.md Top-10 fix #2. --- deploy/test/loadtest/README.md | 42 ++++++++++++++++++++++++++++++---- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/deploy/test/loadtest/README.md b/deploy/test/loadtest/README.md index 222981f..3d470f2 100644 --- a/deploy/test/loadtest/README.md +++ b/deploy/test/loadtest/README.md @@ -99,13 +99,45 @@ diff target. | Scenario | p50 | p95 | p99 | Error rate | |---|---|---|---|---| | **issuance_acceptance** (threshold) | — | < 2 s | < 5 s | < 1% | -| **issuance_acceptance** (baseline) | TBD | TBD | TBD | TBD | +| **issuance_acceptance** (baseline)[^1] | 2.12 ms | 6.19 ms | 8.58 ms | 0.00% | | **list_certificates** (threshold) | — | < 800 ms | < 2 s | < 1% | -| **list_certificates** (baseline) | TBD | TBD | TBD | TBD | +| **list_certificates** (baseline)[^1] | 2.12 ms | 6.19 ms | 8.58 ms | 0.00% | -**Methodology pinned at baseline capture:** -- Hardware: TBD (operator's workstation specs at capture time). -- Postgres: 16-alpine, default config. +[^1]: **Sandbox-aggregate placeholder** — captured at HEAD on a Linux/aarch64 + unprivileged sandbox (no Docker, no GitHub-hosted runner). Both rows show + the same aggregate combined-load numbers because the sandbox run did not + break out per-scenario tags in `summary.json`. Treat these as a sanity + floor (proof the API tier handles 100 req/s combined with zero errors and + sub-10ms p99), **not** as the per-scenario baselines the threshold contract + is written against. Replace via `gh workflow run loadtest.yml` on the + canonical `ubuntu-latest` runner — that produces per-scenario tagged + metrics in `summary.json`. + +**Methodology of the sandbox-placeholder capture above:** +- Hardware: Linux/aarch64 unprivileged sandbox (uid 1019, no root, + ~1.2 GiB free disk). NOT canonical hardware. +- Postgres: 14.22 (Ubuntu, native binaries, unix-socket dir `/tmp/pg-sock`), + unix sockets only, port 55432. +- certctl: built from HEAD via `go build -o bin/certctl-server ./cmd/server`. +- Concurrency: 50 req/s sustained per scenario, both scenarios in parallel + (= 100 req/s combined). +- Duration: **10 seconds** per scenario (NOT 5 minutes — sandbox bash-call + budget is bounded; canonical-hardware run uses 5 minutes). +- TLS: ECDSA-P256 self-signed `localhost` cert at `/tmp/certctl-tls/`. +- Auth: api-key, single Bearer token (`CERTCTL_AUTH_SECRET=load-test-token`). +- Rate limiting: **disabled** (`CERTCTL_RATE_LIMIT_ENABLED=false`) — without + this, the 100 req/s combined load trips the default token-bucket and + drives error rate to ~40%, masking real latency. +- Encryption: `CERTCTL_CONFIG_ENCRYPTION_KEY` set (32+ bytes). +- Captured: 2026-05-02. Total: 1002 requests, 100.15 req/s sustained, + 0 failures, 100% checks passed. Raw `summary.json` is not committed + (gitignored per the existing `results/` convention). + +**Methodology pinned at canonical baseline capture (replace placeholder):** +- Hardware: GitHub-hosted `ubuntu-latest` runner (4 vCPU / 16 GiB / SSD). + Run via `gh workflow run loadtest.yml`; raw `summary.json` is available + for 90 days as a workflow artifact. +- Postgres: 16-alpine in compose, default config. - certctl: image built from this repo at the commit referenced below. - Concurrency: 50 req/s sustained per scenario (100 req/s total). - Duration: 5 minutes per scenario, 5s stagger.