From c26cef37a1439b89665300c3ce86aea20372380a Mon Sep 17 00:00:00 2001
From: shankar0123 <skreddy040@gmail.com>
Date: Sat, 2 May 2026 21:48:29 +0000
Subject: [PATCH] loadtest: capture sandbox-aggregate placeholder for API-tier
 baseline
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes Top-10 fix #2 of the 2026-05-02 deployment-target audit re-run
(see cowork/deployment-target-audit-2026-05-02-rerun/RESULTS.md).
Replaces the four TBD cells in deploy/test/loadtest/README.md ## Current
baseline with a sandbox-aggregate placeholder so the README isn't lying
about having a baseline section ready to diff against.

Numbers (both rows show the same aggregate — see footnote):
  p50=2.12 ms, p95=6.19 ms, p99=8.58 ms, error rate 0.00%
  (1002 requests, 100.15 req/s sustained, 0 failures across 10s)

Capture environment, called out explicitly in the new methodology block:
  - Linux/aarch64 unprivileged sandbox (NOT canonical hardware)
  - Postgres 14.22 native (NOT 16-alpine in compose)
  - 10s scenarios (NOT 5 minutes)
  - Both rows have the same numbers because the sandbox run did not
    emit per-scenario tagged metrics in summary.json — the threshold
    contract still expects per-scenario p95/p99 from a canonical run.

Footnote ([^1]) frames these as a sanity floor, not the per-scenario
baseline the threshold contract is written against. The follow-up
canonical capture via `gh workflow run loadtest.yml` on the
GitHub-hosted ubuntu-latest runner will replace these with real
per-scenario numbers (and will keep the canonical methodology block
that's already pinned below).

Connector-tier table (## Connector-tier captured baseline) is intentionally
left at TBD: that block explicitly anti-patterns committing numbers without
a Docker-equipped canonical run, and the sandbox can't run the four target
sidecars.

No code changes; doc-only.

Audit reference: cowork/deployment-target-audit-2026-05-02-rerun/RESULTS.md
Top-10 fix #2.
---
 deploy/test/loadtest/README.md | 42 ++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/deploy/test/loadtest/README.md b/deploy/test/loadtest/README.md
index 222981f..3d470f2 100644
--- a/deploy/test/loadtest/README.md
+++ b/deploy/test/loadtest/README.md
@@ -99,13 +99,45 @@ diff target.
 | Scenario | p50 | p95 | p99 | Error rate |
 |---|---|---|---|---|
 | **issuance_acceptance** (threshold) | — | < 2 s | < 5 s | < 1% |
-| **issuance_acceptance** (baseline) | TBD | TBD | TBD | TBD |
+| **issuance_acceptance** (baseline)[^1] | 2.12 ms | 6.19 ms | 8.58 ms | 0.00% |
 | **list_certificates** (threshold) | — | < 800 ms | < 2 s | < 1% |
-| **list_certificates** (baseline) | TBD | TBD | TBD | TBD |
+| **list_certificates** (baseline)[^1] | 2.12 ms | 6.19 ms | 8.58 ms | 0.00% |
 
-**Methodology pinned at baseline capture:**
-- Hardware: TBD (operator's workstation specs at capture time).
-- Postgres: 16-alpine, default config.
+[^1]: **Sandbox-aggregate placeholder** — captured at HEAD on a Linux/aarch64
+  unprivileged sandbox (no Docker, no GitHub-hosted runner). Both rows show
+  the same aggregate combined-load numbers because the sandbox run did not
+  break out per-scenario tags in `summary.json`. Treat these as a sanity
+  floor (proof the API tier handles 100 req/s combined with zero errors and
+  sub-10ms p99), **not** as the per-scenario baselines the threshold contract
+  is written against. Replace via `gh workflow run loadtest.yml` on the
+  canonical `ubuntu-latest` runner — that produces per-scenario tagged
+  metrics in `summary.json`.
+
+**Methodology of the sandbox-placeholder capture above:**
+- Hardware: Linux/aarch64 unprivileged sandbox (uid 1019, no root,
+  ~1.2 GiB free disk). NOT canonical hardware.
+- Postgres: 14.22 (Ubuntu, native binaries, unix-socket dir `/tmp/pg-sock`),
+  unix sockets only, port 55432.
+- certctl: built from HEAD via `go build -o bin/certctl-server ./cmd/server`.
+- Concurrency: 50 req/s sustained per scenario, both scenarios in parallel
+  (= 100 req/s combined).
+- Duration: **10 seconds** per scenario (NOT 5 minutes — sandbox bash-call
+  budget is bounded; canonical-hardware run uses 5 minutes).
+- TLS: ECDSA-P256 self-signed `localhost` cert at `/tmp/certctl-tls/`.
+- Auth: api-key, single Bearer token (`CERTCTL_AUTH_SECRET=load-test-token`).
+- Rate limiting: **disabled** (`CERTCTL_RATE_LIMIT_ENABLED=false`) — without
+  this, the 100 req/s combined load trips the default token-bucket and
+  drives error rate to ~40%, masking real latency.
+- Encryption: `CERTCTL_CONFIG_ENCRYPTION_KEY` set (32+ bytes).
+- Captured: 2026-05-02. Total: 1002 requests, 100.15 req/s sustained,
+  0 failures, 100% checks passed. Raw `summary.json` is not committed
+  (gitignored per the existing `results/` convention).
+
+**Methodology pinned at canonical baseline capture (replace placeholder):**
+- Hardware: GitHub-hosted `ubuntu-latest` runner (4 vCPU / 16 GiB / SSD).
+  Run via `gh workflow run loadtest.yml`; raw `summary.json` is available
+  for 90 days as a workflow artifact.
+- Postgres: 16-alpine in compose, default config.
 - certctl: image built from this repo at the commit referenced below.
 - Concurrency: 50 req/s sustained per scenario (100 req/s total).
 - Duration: 5 minutes per scenario, 5s stagger.