loadtest: per-connector deploy throughput scenarios + target sidecars + README baseline section

Closes Bundle 10 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md).

Pre-fix, deploy/test/loadtest/k6.js drove only the API-tier throughput
path (POST /api/v1/certificates + GET /api/v1/certificates): the
operator-facing rate at which an automation client can submit cert
requests. The deploy hot path (cert deployed to a target, i.e.
connector-tier latency) had no benchmarks. Procurement asks "can
certctl handle our 5,000-NGINX fleet at 47-day rotation?" and the
answer should be a number with methodology, not a claim.

This commit ships v1 of the connector-tier loadtest harness:

1. Target-side sidecars added to docker-compose.yml: nginx-target,
   apache-target, haproxy-target, f5-mock-target. Each daemon serves a
   starter cert (ECDSA P-256, multi-SAN) written into a shared
   ./fixtures/target-certs/ volume by a new target-tls-init container.
   f5-mock-target re-uses the in-tree deploy/test/f5-mock-icontrol/
   image (already used by the deploy-vendor-e2e CI job) and generates
   its own self-signed cert via tls.go::selfSignedCert at startup.

2. Fixture configs committed under deploy/test/loadtest/fixtures/:
   - nginx.conf: minimal HTTPS server, single 200 OK location.
   - httpd.conf: self-contained Apache config with the minimum module
     set + SSL vhost.
   - haproxy.cfg: minimal SSL-terminating frontend backed by a static
     "ok" backend.

3. k6 scenarios added (4 new): nginx_handshake, apache_handshake,
   haproxy_handshake, f5_handshake. Each runs constant-arrival-rate at
   100 conns/min for 5 minutes. Latency is reported through k6's
   http_req_duration metric. Note that k6 defines http_req_duration as
   sending + waiting + receiving only; TCP connect and TLS handshake
   time are reported separately (http_req_connecting,
   http_req_tls_handshaking), so the end-to-end "connection readiness"
   latency a deploy connector cares about is the sum of the three.

4. summary.json gains a connector_tier object with per-target
   p50/p95/p99/max/avg/error_rate/iterations breakdowns. Operators
   tracking a connector regression diff connector_tier.<type> between
   runs.
   Implementation: a new enrichWithConnectorTier helper that reads
   data.metrics keyed by target_type tag and shallow-merges the
   breakdown into the summary before serialisation.

5. Threshold contract per target type:
   - nginx/apache/haproxy: p99 < 3s, p95 < 1s.
   - f5-mock: p99 < 5s, p95 < 1.5s (iControl REST handler does
     slightly more work per request than pure TLS termination).
   - All scenarios: error rate < 1% (k6 default; any 4xx/5xx counts
     as failed).
   Any change pushing past these fails the workflow.

6. README documents the methodology + the baseline-number table for
   the connector tier. Baseline values are TBD placeholders pending
   the first clean canonical-hardware run; the accompanying commit
   message in that follow-up captures the methodology line alongside
   the numbers. Out-of-scope is documented explicitly:
   - Full agent-driven deploy poll loop (POST cert with target
     binding → poll deployments endpoint → verify served cert). v2 of
     the harness; needs the agent registration + target-binding API
     surface plumbed end-to-end in the loadtest stack.
   - Kubernetes target via kind-in-docker. kind requires
     `privileged: true` and is operationally fragile in CI; deferred
     until Bundle 2 (real k8s.io/client-go) lands and a CI-friendly
     envtest harness is wired.
   - Real F5 BIG-IP. CI uses the in-tree f5-mock; real-appliance
     benchmarking is out of scope.

7. CI workflow .github/workflows/loadtest.yml timeout-minutes bumped
   from 15 to 25. The harness now boots four additional target
   sidecars before the k6 run; their healthchecks add ~30-60s. The k6
   scenarios themselves are still 5 minutes (run in parallel, not
   serially). 25 minutes absorbs that plus slow CI runners and cold
   image caches without letting a stuck container consume the runner
   indefinitely. Trigger remains workflow_dispatch + cron: sustained
   25-minute runs are too slow for per-PR signal.
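
For illustration, the scenario and threshold wiring from items 3 and 5
could be expressed roughly as below. This is a sketch, not the contents
of k6.js: buildOptions, the LIMITS table, the exec function names and
preAllocatedVUs are assumptions. Only the scenario names, the
target_type tag, the rate/duration and the numeric limits come from
this message.

```javascript
// Sketch only. Hypothetical names: buildOptions, LIMITS, *Handshake.
const TARGETS = ["nginx", "apache", "haproxy", "f5"];

// Latency limits in milliseconds, taken from the threshold contract.
const LIMITS = {
  nginx: { p95: 1000, p99: 3000 },
  apache: { p95: 1000, p99: 3000 },
  haproxy: { p95: 1000, p99: 3000 },
  f5: { p95: 1500, p99: 5000 },
};

function buildOptions() {
  const scenarios = {};
  // Global error-rate gate: fewer than 1% failed requests.
  const thresholds = { http_req_failed: ["rate<0.01"] };

  for (const t of TARGETS) {
    scenarios[`${t}_handshake`] = {
      executor: "constant-arrival-rate",
      rate: 100, // 100 iterations ...
      timeUnit: "1m", // ... per minute
      duration: "5m",
      preAllocatedVUs: 5, // assumption: not stated in the commit
      exec: `${t}Handshake`, // hypothetical exported function name
      tags: { target_type: t },
    };
    // Per-target latency gates on the tagged sub-metric.
    thresholds[`http_req_duration{target_type:${t}}`] = [
      `p(95)<${LIMITS[t].p95}`,
      `p(99)<${LIMITS[t].p99}`,
    ];
  }
  return { scenarios, thresholds };
}
```

In a k6 script the returned object would be exported as `options`.
Generating it from one table is a design choice of this sketch: it
keeps the four scenarios and their thresholds from drifting apart.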

What this connector tier explicitly does NOT measure (documented in
the k6.js header + README):
- The agent-driven full deploy hot path (v2 follow-up).
- K8s target (Bundle 2 dependency).
- Real F5 appliance.
- Issuer-side throughput (handled by issuer-coverage-audit fix #8).

Verified locally:
- python3 -c "import yaml; yaml.safe_load(...)" on docker-compose.yml
  and .github/workflows/loadtest.yml: clean.
- node -c on k6.js: clean syntax.
- gofmt / go vet on the rest of the tree (no Go diff in this commit).
- Manual smoke against docker-compose pending: operator validates on
  the canonical-hardware first run; if any fixture config is off, a
  fix-up commit lands separately so the methodology change and the
  numeric baseline have independent reviewability.

No Go code changes; this is a loadtest-harness-only commit.

Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 10.
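
The enrichWithConnectorTier helper from item 4 is not shown in this
excerpt. A minimal sketch of the idea follows, assuming k6's
handleSummary data shape (data.metrics[name].values) and assuming each
tagged sub-metric is present in data.metrics because a threshold
references it. The pick helper, the stat keys and the iterations
sub-metric name are assumptions, not the committed implementation.

```javascript
// Sketch only; not the helper shipped in k6.js.
const TARGET_TYPES = ["nginx", "apache", "haproxy", "f5"];

// Return the first defined value among the candidate stat keys.
function pick(values, ...keys) {
  for (const k of keys) {
    if (values && values[k] !== undefined) return values[k];
  }
  return null;
}

function enrichWithConnectorTier(data) {
  const tier = {};
  for (const t of TARGET_TYPES) {
    const dur = data.metrics[`http_req_duration{target_type:${t}}`];
    if (!dur) continue; // sub-metric absent: skip this target
    const failed = data.metrics[`http_req_failed{target_type:${t}}`];
    const iters = data.metrics[`iterations{target_type:${t}}`];
    tier[t] = {
      p50: pick(dur.values, "med", "p(50)"),
      p95: pick(dur.values, "p(95)"),
      // Assumes "p(99)" is listed in the summaryTrendStats option.
      p99: pick(dur.values, "p(99)"),
      max: pick(dur.values, "max"),
      avg: pick(dur.values, "avg"),
      error_rate: failed ? pick(failed.values, "rate") : null,
      iterations: iters ? pick(iters.values, "count") : null,
    };
  }
  // Shallow merge: the input object is left untouched.
  return Object.assign({}, data, { connector_tier: tier });
}
```

A handleSummary function could then serialise the enriched object with
JSON.stringify and return it under the summary.json path. Missing
stats come back as null rather than being invented, so a diff between
two runs shows gaps explicitly.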
2026-06-07 14:01:36 +00:00 · 2026-05-02 19:28:45 +00:00
parent 08a86d355d
commit e292faafc6
8 changed files with 677 additions and 47 deletions
@@ -155,6 +155,116 @@ The workflow does **not** run per-push. Load tests are minutes long
 and would not provide useful per-PR signal; per-push pressure goes
 through `make verify` (which is fast) and the deploy-vendor-e2e job.

+## Connector-tier baseline (Bundle 10 of the 2026-05-02 deployment-target audit)
+
+Bundle 10 extended the harness to cover per-target-type handshake throughput
+in addition to the API-tier issuance/list throughput documented above. The
+docker-compose stack now boots four target sidecars (nginx, apache, haproxy,
+f5-mock). The first three serve a starter cert written by a shared
+`target-tls-init` container; f5-mock generates its own self-signed cert. k6
+runs four additional scenarios (`nginx_handshake`, `apache_handshake`,
+`haproxy_handshake`, `f5_handshake`) at a sustained 100 conns/min for 5 minutes each.
+
+### What the connector tier measures
+
+End-to-end TCP connect + TLS handshake + tiny HTTP request/response latency
+per target type, tagged via the k6 `target_type` label so summary.json's
+`connector_tier` section breaks the numbers out per sidecar:
+
+```json
+{
+  "connector_tier": {
+    "nginx":   { "p50": ..., "p95": ..., "p99": ..., "error_rate": ..., "iterations": ... },
+    "apache":  { ... },
+    "haproxy": { ... },
+    "f5":      { ... }
+  }
+}
+```
+
+This validates the target sidecar daemons are operational under sustained
+connection load. Procurement asks "can certctl's nginx target handle 5,000
+endpoints at 47-day rotation?" The connector code's correctness is pinned
+by per-connector unit tests; **what these scenarios pin is the underlying
+daemon's latency and error rate at a fixed 100 conns/min arrival rate**.
+
+### What the connector tier explicitly does NOT measure (v1)
+
+- **The full agent-driven deploy hot path.** v1 measures handshake
+  throughput against the sidecars directly. v2 of the harness is a
+  follow-up that POSTs cert requests bound to per-target-type targets,
+  polls the deployments endpoint until the agent reports complete, and
+  measures the full POST → poll → cert-served loop. v2 needs the agent
+  registration + target-binding API surface plumbed end-to-end in the
+  loadtest stack — meaningful work, but not a blocker for the connection-
+  rate procurement question.
+- **Kubernetes connector.** kind-in-docker requires `privileged: true`
+  and is operationally fragile in CI. Deferred until Bundle 2 (real
+  `k8s.io/client-go`) lands and a CI-friendly envtest harness is wired.
+- **Real F5 BIG-IP.** The harness uses the in-tree `f5-mock-icontrol`
+  Go server (already used by the deploy-vendor-e2e CI job). Real F5
+  appliance benchmarking is out of scope; operators with a real F5
+  vagrant box per `docs/connector-f5.md` can substitute it manually.
+
+### Threshold contract
+
+Defined in `k6.js`'s `thresholds` block. Any change pushing past these
+fails the test:
+
+| Target type | p95 | p99 | Error rate |
+|---|---|---|---|
+| `nginx`   | < 1 s   | < 3 s | < 1% (global) |
+| `apache`  | < 1 s   | < 3 s | < 1% (global) |
+| `haproxy` | < 1 s   | < 3 s | < 1% (global) |
+| `f5`      | < 1.5 s | < 5 s | < 1% (global) |
+
+f5-mock's threshold is looser because the iControl REST handler does
+slightly more work per request. The login+upload+install sequence that the
+F5 connector drives is not exercised here, but the daemon's request
+handler is still heavier than pure TLS termination.
+
+### Connector-tier captured baseline
+
+| Target type | p50 | p95 | p99 | Error rate | Iterations |
+|---|---|---|---|---|---|
+| **nginx** (threshold)   | — | < 1 s   | < 3 s | < 1% | n/a |
+| **nginx** (baseline)    | TBD | TBD | TBD | TBD | TBD |
+| **apache** (threshold)  | — | < 1 s   | < 3 s | < 1% | n/a |
+| **apache** (baseline)   | TBD | TBD | TBD | TBD | TBD |
+| **haproxy** (threshold) | — | < 1 s   | < 3 s | < 1% | n/a |
+| **haproxy** (baseline)  | TBD | TBD | TBD | TBD | TBD |
+| **f5** (threshold)      | — | < 1.5 s | < 5 s | < 1% | n/a |
+| **f5** (baseline)       | TBD | TBD | TBD | TBD | TBD |
+
+The `TBD` placeholders in the baseline rows are deliberate: do **not** commit numeric values
+without running the loadtest on canonical hardware first. Numbers from a
+developer laptop are misleading. The first `gh workflow run loadtest.yml`
+on a clean GitHub runner captures the baseline; commit the captured numbers
+into the table above as a follow-up commit alongside the methodology line.
+
+**Methodology pinned at baseline capture (canonical hardware):**
+
+- Hardware: GitHub-hosted `ubuntu-latest` runners (currently 4 vCPU /
+  16 GiB / SSD-backed). Operator captures from `gh workflow run loadtest.yml`
+  to keep the hardware constant across runs.
+- Sidecar images: nginx:1.27-alpine, httpd:2.4-alpine, haproxy:2.9-alpine,
+  in-tree f5-mock-icontrol (built from `deploy/test/f5-mock-icontrol/`).
+- Arrival rate: 100 conns/min sustained per target type (400 conns/min
+  total across the four target scenarios + 100 req/s on the API tier).
+- Duration: 5 minutes per scenario, 10s stagger between API tier and
+  connector tier so warmup overlap doesn't skew the first 30 seconds.
+- TLS: starter cert from `target-tls-init` (ECDSA P-256, multi-SAN). The
+  loadtest scenarios connect with `K6_INSECURE_SKIP_TLS_VERIFY=true`.
+
+To recapture the connector-tier baseline after a tuning commit affecting
+target sidecars or the connector code:
+
+```sh
+make loadtest
+# Inspect deploy/test/loadtest/results/summary.json for the
+# connector_tier object and update the table above.
+```
+
 ## Files in this directory

 ```
@@ -163,9 +273,15 @@ deploy/test/loadtest/
 ├── docker-compose.yml
 ├── k6.js             (the load script)
 ├── certs/            (gitignored — tls-init writes here)
+├── fixtures/         (Bundle 10: target sidecar configs + shared starter cert)
+│   ├── nginx.conf
+│   ├── httpd.conf
+│   ├── haproxy.cfg
+│   └── target-certs/ (gitignored — target-tls-init writes here)
 └── results/          (gitignored — k6 writes summary.{json,txt} here)
 ```

-## Audit reference
+## Audit references

-`cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` Top-10 fix #8.
+- API tier:       `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` fix #8.
+- Connector tier: `cowork/deployment-target-audit-2026-05-02/RESULTS.md` Bundle 10.