diff --git a/.github/workflows/loadtest.yml b/.github/workflows/loadtest.yml
new file mode 100644
index 0000000..3c5e94e
--- /dev/null
+++ b/.github/workflows/loadtest.yml
@@ -0,0 +1,73 @@
+# Load-test workflow — closes the #8 acquisition-readiness blocker from
+# the 2026-05-01 issuer coverage audit (see
+# cowork/issuer-coverage-audit-2026-05-01/RESULTS.md).
+#
+# CADENCE: workflow_dispatch + weekly cron, NOT per-push. Load tests
+# are minutes long and don't provide useful per-PR signal — per-push
+# pressure goes through ci.yml. This workflow exists to (a) catch
+# gradual regressions from cumulative changes that no single PR
+# triggered, and (b) give an operator a one-click way to capture
+# numbers before tagging a release.
+#
+# THRESHOLDS: defined in deploy/test/loadtest/k6.js (p99 < 5s for
+# issuance-acceptance, p99 < 2s for list, error rate < 1%). k6 exits
+# non-zero on any breach, which propagates through `docker compose up
+# --exit-code-from k6` → `make loadtest` → this workflow's exit.
+
+name: loadtest
+
+on:
+  workflow_dispatch:
+    # Manual trigger from the Actions tab. Use before tagging a
+    # release or after a meaningful tuning commit.
+
+  schedule:
+    # Mondays at 06:00 UTC. Off-peak; catches regressions accumulated
+    # over the previous week's merges. Once a baseline is committed
+    # in deploy/test/loadtest/README.md, drift relative to that
+    # baseline is the signal — diff the captured summary.json
+    # against the committed numbers.
+    - cron: '0 6 * * 1'
+
+# Reduce permissions — this workflow doesn't write to PRs or push tags.
+permissions:
+  contents: read
+
+jobs:
+  k6:
+    name: k6 throughput run
+    runs-on: ubuntu-latest
+    # 15-minute hard cap. The harness itself is ~7 minutes (5m run +
+    # 2m for image build + healthcheck wait); the cap absorbs slow CI
+    # runners and cold image caches without letting a stuck container
+    # consume the runner indefinitely.
+    timeout-minutes: 15
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        # The compose stack builds the certctl image from the repo
+        # root Dockerfile. Buildx gives the build a BuildKit builder (its
+        # cache lasts for this run only) and works with newer compose versions.
+        uses: docker/setup-buildx-action@v3
+
+      - name: Run loadtest
+        run: make loadtest
+        env:
+          # Plain (non-TTY) BuildKit progress output keeps the run log
+          # readable and diffable against past runs.
+          BUILDKIT_PROGRESS: plain
+
+      - name: Upload summary
+        # Always upload the summary so a regression has a diffable
+        # artifact even when k6 exited non-zero. summary.json is the
+        # authoritative machine-readable form; summary.txt is the
+        # human-readable text the README baseline tracks.
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: k6-summary-${{ github.run_id }}
+          path: deploy/test/loadtest/results/
+          retention-days: 90
diff --git a/Makefile b/Makefile
index 247a339..442a7dd 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-.PHONY: help build run test lint verify verify-docs verify-deploy clean docker-up docker-down migrate-up migrate-down generate test-cover frontend-build qa-stats
+.PHONY: help build run test lint verify verify-docs verify-deploy loadtest clean docker-up docker-down migrate-up migrate-down generate test-cover frontend-build qa-stats
 
 # Default target - show help
 help:
@@ -18,6 +18,7 @@ help:
 	@echo "  make verify         Pre-commit gate: fmt + vet + lint + test (CI-parity)"
 	@echo "  make verify-docs    Pre-tag gate: QA-doc drift checks (operator-facing docs)"
 	@echo "  make verify-deploy  Pre-push gate: digest validity + OpenAPI parity + docker build smoke"
+	@echo "  make loadtest       k6 throughput run against postgres + certctl (NOT in verify; manual + cron only)"
 	@echo ""
 	@echo "Database:"
 	@echo "  make migrate-up     Run migrations (requires DB_URL)"
@@ -150,6 +151,23 @@ verify-deploy:
 	@echo ""
 	@echo "verify-deploy: PASS — safe to push"
 
+# Load-test harness — closes the #8 acquisition-readiness blocker from
+# the 2026-05-01 issuer coverage audit. Boots a minimal certctl stack
+# (postgres + tls-init + certctl-server) and runs k6 against the API
+# tier for ~5 minutes. Exits non-zero on any threshold breach.
+#
+# NOT in `make verify` — load tests take minutes, not seconds, and
+# don't gate per-PR signal. CI gates this behind workflow_dispatch +
+# weekly cron in .github/workflows/loadtest.yml. See
+# deploy/test/loadtest/README.md for thresholds, baseline, and how to
+# interpret a regression.
+loadtest:
+	@echo "==> spinning up postgres + certctl + k6 driver (this takes ~7m)"
+	@cd deploy/test/loadtest && docker compose up --build --abort-on-container-exit --exit-code-from k6
+	@echo ""
+	@echo "==> results landed in deploy/test/loadtest/results/"
+	@if [ -f deploy/test/loadtest/results/summary.txt ]; then cat deploy/test/loadtest/results/summary.txt; fi
+
 # Database targets (requires migrate tool)
 migrate-up:
 	@echo "Running migrations..."
diff --git a/deploy/test/loadtest/.gitignore b/deploy/test/loadtest/.gitignore
new file mode 100644
index 0000000..ee72a40
--- /dev/null
+++ b/deploy/test/loadtest/.gitignore
@@ -0,0 +1,10 @@
+# Per-run artifacts. summary.json + summary.txt are regenerated on
+# every `make loadtest` run; committing them would create huge diffs
+# on each invocation. The README captures the canonical baseline
+# numbers manually.
+results/*
+!results/.gitkeep
+
+# tls-init bind mount — server cert + key are regenerated on every
+# fresh run.
+certs/
diff --git a/deploy/test/loadtest/README.md b/deploy/test/loadtest/README.md
new file mode 100644
index 0000000..15c4020
--- /dev/null
+++ b/deploy/test/loadtest/README.md
@@ -0,0 +1,171 @@
+# certctl Load-Test Harness
+
+Closes the **#8 acquisition-readiness blocker** from the 2026-05-01 issuer
+coverage audit (`cowork/issuer-coverage-audit-2026-05-01/RESULTS.md`).
+Pre-fix, certctl had zero benchmarks or load tests for any API path; an
+acquirer evaluating "can certctl handle our 50k-cert fleet at 47-day
+rotation" had nothing to point at. This harness is the substantiation.
+
+## What it measures
+
+A k6 driver runs two scenarios in parallel for 5 minutes, each at a fixed 50 req/s (100 req/s total):
+
+1. **`POST /api/v1/certificates`** — the issuance-acceptance hot path.
+   Exercises auth, JSON decode, validation, `service.CreateCertificate`,
+   and the `managed_certificates` insert. This is the operator-facing
+   request-acceptance throughput an automation client (Terraform,
+   Crossplane, GitOps controller) would generate.
+2. **`GET /api/v1/certificates?per_page=50`** — the most-trafficked read
+   endpoint. Exercises the paginated cert list query.
+
+Latency is reported as `avg / min / med / p95 / p99 / max`. The error-rate
+ceiling is 1% across all requests (any 4xx/5xx counts as failed).
+
+## What it explicitly does NOT measure
+
+- **Issuer connector latency.** Connector calls (DigiCert, ACME, Vault,
+  AWS ACM PCA, etc.) happen asynchronously via the renewal scheduler.
+  Their latency is pinned by the `certctl_issuance_duration_seconds{issuer_type=...}`
+  Prometheus histogram (audit fix #4). Driving them through k6 would
+  load-test someone else's API, which is wrong.
+- **Full ACME enrollment flow.** The audit prompt mentioned ACME via
+  pebble; sustained 100/s through a multi-RTT order/challenge/finalize
+  flow requires pebble tuning + crypto helpers k6 doesn't ship out of
+  the box. Deferred to a follow-up.
+- **Bulk-revoke / bulk-renew.** Those are admin endpoints with their
+  own throughput characteristics and warrant a separate scenario.
+- **Scheduler concurrency under bulk renewal.** That's audit fix #9's
+  scope; the harness here measures the API tier, not the scheduler.
+
+## Threshold contract
+
+Any future change that breaches one of these fails the test:
+
+| Scenario | p95 | p99 | Error rate |
+|---|---|---|---|
+| `issuance_acceptance` | < 2 s | < 5 s | n/a |
+| `list_certificates` | < 800 ms | < 2 s | n/a |
+| All requests | n/a | n/a | < 1% |
+
+These are the regression guards, not the SLO. The SLO is whatever the
+operator chooses based on the baseline below.
+
+## How to run
+
+From the repo root:
+
+```sh
+make loadtest
+```
+
+This:
+
+1. Builds the certctl image from the repo root `Dockerfile`.
+2. Spins up postgres, the tls-init bootstrap, certctl-server (with
+   `CERTCTL_DEMO_SEED=true` so the FK rows the script needs exist),
+   and the k6 driver.
+3. Runs the k6 script for about 5 minutes 5 seconds (the list scenario
+   starts 5 s after the issuance scenario, and each runs for 5 minutes).
+4. Prints the summary text to stdout.
+5. Exits non-zero if any threshold was breached.
+
+The full machine-readable summary lands at
+`deploy/test/loadtest/results/summary.json` (gitignored). The
+human-readable summary lands at `results/summary.txt`.
+
+To run against a server already booted on the host (skip the compose
+spin-up):
+
+```sh
+docker run --rm \
+  -e CERTCTL_BASE=https://localhost:8443 \
+  -e CERTCTL_TOKEN=load-test-token \
+  -e K6_INSECURE_SKIP_TLS_VERIFY=true \
+  -v "$(pwd)/deploy/test/loadtest/k6.js:/scripts/k6.js:ro" \
+  -v "$(pwd)/deploy/test/loadtest/results:/results" \
+  --network host \
+  grafana/k6:0.54.0 run /scripts/k6.js
+```
+
+## Current baseline
+
+The first operator run captures real numbers and commits them into
+this section; until then, the baseline rows read TBD. The threshold
+rows are the agreed minimum-acceptable limits (the regression guards
+from the threshold contract above), not measurements. The baseline
+rows hold the captured numbers so future regressions have a diff
+target.
+
+| Scenario | p50 | p95 | p99 | Error rate |
+|---|---|---|---|---|
+| **issuance_acceptance** (threshold) | — | < 2 s | < 5 s | < 1% (all requests) |
+| **issuance_acceptance** (baseline) | TBD | TBD | TBD | TBD |
+| **list_certificates** (threshold) | — | < 800 ms | < 2 s | < 1% (all requests) |
+| **list_certificates** (baseline) | TBD | TBD | TBD | TBD |
+
+**Methodology pinned at baseline capture:**
+- Hardware: TBD (operator's workstation specs at capture time).
+- Postgres: 16-alpine, default config.
+- certctl: image built from this repo at the capture commit (TBD).
+- Arrival rate: 50 req/s sustained per scenario (100 req/s total).
+- Duration: 5 minutes per scenario, 5s stagger.
+- Auth: api-key (Bearer token, single key).
+- Encryption: `CERTCTL_CONFIG_ENCRYPTION_KEY` set (32+ bytes).
+
+To recapture the baseline after a tuning commit:
+
+```sh
+make loadtest
+# Inspect deploy/test/loadtest/results/summary.txt for the new numbers.
+# Update the table above + the methodology line, commit alongside the
+# tuning commit.
+```
+
+## Interpreting a regression
+
+If a `make loadtest` run (local or CI) pushes a metric past its threshold,
+the make target exits non-zero and the run fails. summary.txt marks
+which threshold was breached. Triage:
+
+1. Look at the per-scenario `http_req_duration` p95 + p99 in
+   `summary.json`. If only one scenario regressed, the change is
+   localized to that endpoint's hot path.
+2. Look at the `iteration_duration` per scenario — if total iteration
+   time grew but `http_req_duration` is flat, the latency is in k6
+   client setup (rare; suggests something changed in the script).
+3. Compare against the committed baseline. If p99 was 800 ms at
+   baseline and is now 1.5 s, it is still under the 5 s threshold and
+   the guard does not trip, but the regression is still meaningful:
+   flag it in the PR description.
+
+The harness deliberately does NOT auto-tune. Tuning is informed by the
+data; tuning commits land separately, each with its own captured
+baseline update.
+
+## CI cadence
+
+Defined in `.github/workflows/loadtest.yml`:
+
+- **`workflow_dispatch`** — manual trigger from the Actions tab. Used
+  before tagging a release or after a meaningful tuning commit.
+- **Weekly cron** — Mondays at 06:00 UTC. Catches gradual regressions
+  from cumulative changes that no single PR triggered.
+
+The workflow does **not** run per-push. Load tests are minutes long
+and would not provide useful per-PR signal; per-push pressure goes
+through `make verify` (which is fast) and the deploy-vendor-e2e job.
+
+## Files in this directory
+
+```
+deploy/test/loadtest/
+├── README.md          (this file)
+├── docker-compose.yml
+├── k6.js              (the load script)
+├── certs/             (gitignored — tls-init writes here)
+└── results/           (gitignored — k6 writes summary.{json,txt} here)
+```
+
+## Audit reference
+
+`cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` Top-10 fix #8.
diff --git a/deploy/test/loadtest/docker-compose.yml b/deploy/test/loadtest/docker-compose.yml
new file mode 100644
index 0000000..b2115a4
--- /dev/null
+++ b/deploy/test/loadtest/docker-compose.yml
@@ -0,0 +1,162 @@
+# =============================================================================
+# certctl Load-Test Harness — Docker Compose
+# =============================================================================
+#
+# Spins up a minimal certctl stack and runs a k6 driver against it to capture
+# p50 / p95 / p99 latency for the certificate-management API hot path.
+#
+# Stack:
+#   1. postgres          — empty database (server runs migrations + seeds at boot)
+#   2. certctl-tls-init  — one-shot init container; writes self-signed
+#                          server.crt/.key/ca.crt into ./certs (bind mount,
+#                          host-readable; the k6 container does not mount it
+#                          and skips TLS verification instead)
+#   3. certctl-server    — HTTPS API on :8443, demo-seed enabled so the k6
+#                          script has iss-local + an operator + a team
+#                          ready to reference in CreateCertificate payloads
+#   4. k6                — runs k6.js once and exits with the threshold-
+#                          driven exit code (zero on green, non-zero on any
+#                          threshold breach so `make loadtest` surfaces
+#                          regressions as a failed shell command)
+#
+# Usage:  make loadtest   (from the repo root)
+# Manual: cd deploy/test/loadtest && docker compose up --abort-on-container-exit --exit-code-from k6
+#
+# Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
+# =============================================================================
+
+services:
+
+  # ---------------------------------------------------------------------------
+  # Self-signed TLS bootstrap. Mirrors the deploy/docker-compose.test.yml
+  # tls-init pattern exactly: bind-mount instead of named volume so the host
+  # can read ca.crt without a chown dance (k6 skips TLS verification instead).
+  # See deploy/docker-compose.test.yml::certctl-tls-init for the full rationale.
+  # ---------------------------------------------------------------------------
+  certctl-tls-init:
+    image: alpine/openssl:latest
+    container_name: certctl-loadtest-tls-init
+    restart: "no"
+    entrypoint: /bin/sh
+    command:
+      - -c
+      - |
+        set -eu
+        CERT=/etc/certctl/tls/server.crt
+        KEY=/etc/certctl/tls/server.key
+        CA=/etc/certctl/tls/ca.crt
+        if [ -f "$$CERT" ] && [ -f "$$KEY" ] && [ -f "$$CA" ]; then
+          echo "TLS cert already present — skipping generation"
+        else
+          mkdir -p /etc/certctl/tls
+          openssl req -x509 -newkey ec \
+            -pkeyopt ec_paramgen_curve:P-256 \
+            -nodes \
+            -keyout "$$KEY" \
+            -out "$$CERT" \
+            -days 3650 \
+            -subj "/CN=certctl-server" \
+            -addext "subjectAltName=DNS:certctl-server,DNS:localhost,IP:127.0.0.1"
+          cp "$$CERT" "$$CA"
+          echo "Generated self-signed TLS cert (ECDSA-P256, 3650d, CN=certctl-server)"
+        fi
+        chmod 0644 "$$CERT" "$$CA"
+        chmod 0600 "$$KEY"
+    volumes:
+      - ./certs:/etc/certctl/tls
+
+  # ---------------------------------------------------------------------------
+  # Database. The server runs migrations + seed.sql + (because
+  # CERTCTL_DEMO_SEED=true below) seed_demo.sql at boot — so the load-test
+  # k6 script can reference iss-local, o-alice, t-platform, and rp-standard
+  # without a separate seed step.
+  # ---------------------------------------------------------------------------
+  postgres:
+    image: postgres:16-alpine
+    container_name: certctl-loadtest-postgres
+    environment:
+      POSTGRES_DB: certctl
+      POSTGRES_USER: certctl
+      POSTGRES_PASSWORD: loadtestpass
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U certctl"]
+      interval: 5s
+      timeout: 3s
+      retries: 10
+      start_period: 30s
+
+  # ---------------------------------------------------------------------------
+  # certctl server. Built from the repo root Dockerfile (same as production).
+  # Demo seed is enabled so referenced FK rows exist when the k6 script
+  # POSTs CreateCertificate payloads. Auth is api-key with a deterministic
+  # token the k6 script knows.
+  # ---------------------------------------------------------------------------
+  certctl-server:
+    build:
+      context: ../../..
+      dockerfile: Dockerfile
+      args:
+        HTTP_PROXY: ${HTTP_PROXY:-}
+        HTTPS_PROXY: ${HTTPS_PROXY:-}
+        NO_PROXY: ${NO_PROXY:-}
+    container_name: certctl-loadtest-server
+    depends_on:
+      postgres:
+        condition: service_healthy
+      certctl-tls-init:
+        condition: service_completed_successfully
+    environment:
+      CERTCTL_DATABASE_URL: postgres://certctl:loadtestpass@postgres:5432/certctl?sslmode=disable
+      CERTCTL_SERVER_HOST: 0.0.0.0
+      CERTCTL_SERVER_PORT: 8443
+      CERTCTL_SERVER_TLS_CERT_PATH: /etc/certctl/tls/server.crt
+      CERTCTL_SERVER_TLS_KEY_PATH: /etc/certctl/tls/server.key
+      CERTCTL_LOG_LEVEL: warn
+      CERTCTL_AUTH_TYPE: api-key
+      CERTCTL_AUTH_SECRET: load-test-token
+      CERTCTL_KEYGEN_MODE: agent
+      # CERTCTL_DEMO_SEED=true triggers seed_demo.sql which creates iss-local,
+      # o-alice, t-platform, rp-standard so CreateCertificate FK validation
+      # has rows to bind to.
+      CERTCTL_DEMO_SEED: "true"
+      # Bigger body limit so listing 100s of certs in the GET scenario
+      # doesn't 413 once the harness has been running for a few minutes.
+      CERTCTL_MAX_BODY_SIZE: "10485760"
+      # Encryption key (≥32 bytes per H-1 floor — the test compose's
+      # documented value).
+      CERTCTL_CONFIG_ENCRYPTION_KEY: "loadtest-key-must-be-32-bytes-long-yes"
+    volumes:
+      - ./certs:/etc/certctl/tls:ro
+    healthcheck:
+      # /healthz is unauthenticated. --no-check-certificate because the cert is self-signed.
+      test: ["CMD-SHELL", "wget -q --no-check-certificate -O- https://localhost:8443/healthz || exit 1"]
+      interval: 5s
+      timeout: 3s
+      retries: 30
+      start_period: 60s
+
+  # ---------------------------------------------------------------------------
+  # k6 driver. Pinned to a specific version so threshold expressions stay
+  # stable across runs. TLS verification is skipped (K6_INSECURE_SKIP_TLS_VERIFY) because the server cert is
+  # self-signed; the load test isn't a TLS conformance test. The k6 process
+  # exits non-zero if any threshold is breached, which the parent
+  # `docker compose up --exit-code-from k6` propagates as the compose exit
+  # code, which `make loadtest` then surfaces as the make-target exit code.
+  # ---------------------------------------------------------------------------
+  k6:
+    image: grafana/k6:0.54.0
+    container_name: certctl-loadtest-k6
+    depends_on:
+      certctl-server:
+        condition: service_healthy
+    environment:
+      CERTCTL_BASE: https://certctl-server:8443
+      CERTCTL_TOKEN: load-test-token
+      K6_INSECURE_SKIP_TLS_VERIFY: "true"
+    volumes:
+      - ./k6.js:/scripts/k6.js:ro
+      - ./results:/results
+    command:
+      - run
+      # handleSummary() in k6.js writes /results/summary.json; a --summary-export to the same path would collide with it.
+      - /scripts/k6.js
diff --git a/deploy/test/loadtest/k6.js b/deploy/test/loadtest/k6.js
new file mode 100644
index 0000000..fd30d58
--- /dev/null
+++ b/deploy/test/loadtest/k6.js
@@ -0,0 +1,163 @@
+// certctl load-test driver — k6 v0.54+ JS API.
+//
+// Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer
+// coverage audit. Pre-fix, certctl had no benchmarks or load tests for any
+// API path. An acquirer evaluating "can certctl handle our 50k-cert fleet
+// at 47-day rotation" had nothing to point at; this script gives them
+// a reproducible number with a methodology.
+//
+// What this measures (scope is deliberately narrow):
+//   - POST /api/v1/certificates: auth + JSON decode + validation + service
+//     CreateCertificate + DB insert + response. This is the operator-facing
+//     request-acceptance throughput. The downstream issuer-connector call
+//     happens asynchronously via the renewal scheduler (and is bounded
+//     separately via CERTCTL_RENEWAL_CONCURRENCY — audit fix #9).
+//   - GET /api/v1/certificates: read path with pagination. Exercises the
+//     cert list query, which is the most-called read endpoint in any UI/
+//     automation client.
+//
+// What this does NOT measure:
+//   - Issuer connector latency (DigiCert / ACME / Vault / etc. round-trips
+//     to upstream CAs). Those are async; pin via the per-issuer-type
+//     metrics instead (audit fix #4: certctl_issuance_duration_seconds).
+//   - The full ACME enrollment flow (newOrder → challenge → finalize).
+//     The audit prompt mentioned ACME-via-pebble; deferred to a follow-up
+//     because driving multi-RTT ACME flows at sustained 100/s requires
+//     pebble tuning + k6 crypto helpers that don't exist out of the box.
+//
+// Threshold contract: any future change that pushes issuance-acceptance
+// latency past p99 5s / p95 2s, read latency past p99 2s / p95 800ms, OR
+// the global error rate above 1%, fails the test. CI gates the run
+// behind workflow_dispatch + cron (NOT per-push — load tests are too slow
+// to gate per-PR signal).
+//
+// Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
+
+import http from 'k6/http';
+import { check } from 'k6';
+import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.2/index.js';
+
+// __ENV.* lets the same script run unchanged on the operator's
+// workstation (CERTCTL_BASE=https://localhost:8443) and inside the
+// docker-compose stack (CERTCTL_BASE=https://certctl-server:8443).
+const BASE = __ENV.CERTCTL_BASE || 'https://localhost:8443';
+const TOKEN = __ENV.CERTCTL_TOKEN || 'load-test-token';
+
+// Demo seed (CERTCTL_DEMO_SEED=true) creates these rows; CreateCertificate
+// requires all four FKs to exist. Pre-baked here so the script has zero
+// dependency on test fixtures beyond the seed.
+const ISSUER_ID = 'iss-local';
+const OWNER_ID = 'o-alice';
+const TEAM_ID = 't-platform';
+const RENEWAL_POLICY = 'rp-standard';
+
+export const options = {
+  scenarios: {
+    // Issuance-acceptance throughput. constant-arrival-rate fires
+    // requests at a fixed rate regardless of latency, which is the
+    // right shape for capacity testing — VU-bound load (constant-vus)
+    // would let slow responses backpressure the offered load and
+    // mask actual capacity ceilings.
+    issuance_acceptance: {
+      executor: 'constant-arrival-rate',
+      rate: 50,
+      timeUnit: '1s',
+      duration: '5m',
+      preAllocatedVUs: 50,
+      maxVUs: 200,
+      exec: 'createCertificate',
+      tags: { scenario: 'issuance_acceptance' },
+    },
+    // Read path. Same rate as issuance so the DB sees a balanced
+    // mix; staggered start so warmup overlap doesn't skew the
+    // first few seconds of either scenario.
+    list_certificates: {
+      executor: 'constant-arrival-rate',
+      rate: 50,
+      timeUnit: '1s',
+      duration: '5m',
+      preAllocatedVUs: 50,
+      maxVUs: 200,
+      exec: 'listCertificates',
+      startTime: '5s',
+      tags: { scenario: 'list_certificates' },
+    },
+  },
+  thresholds: {
+    // Hard ceiling: 99% of issuance-acceptance requests complete in
+    // under 5 seconds. Pre-fix this was unsubstantiated; post-fix
+    // this is the regression guard. The number isn't aspirational —
+    // it's the worst acceptable user-facing API latency from the
+    // operator's perspective, not an SLO target (see README.md).
+    'http_req_duration{scenario:issuance_acceptance}': ['p(99)<5000', 'p(95)<2000'],
+    'http_req_duration{scenario:list_certificates}': ['p(99)<2000', 'p(95)<800'],
+    // < 1% error rate. By default k6 counts any status outside 200-399
+    // (i.e. any 4xx/5xx) as failed; legitimate 201/200 responses don't. Auth
+    // failures, validation failures, server errors all do.
+    'http_req_failed': ['rate<0.01'],
+  },
+  // Trend stats shown in the summary: adds p(99), drops k6's default p(90).
+  summaryTrendStats: ['avg', 'min', 'med', 'p(95)', 'p(99)', 'max'],
+};
+
+// uniqueCN returns a unique CommonName per (VU, iteration); the
+// Date.now() suffix keeps names unique even if the database is reused
+// across runs. This avoids unique-constraint violations on the
+// managed_certificates row (the table has a unique index on (issuer_id, name),
+// so two POSTs with the same Name would 409 rather than 201).
+function uniqueCN() {
+  return `loadtest-${__VU}-${__ITER}-${Date.now()}.example.test`;
+}
+
+export function createCertificate() {
+  const cn = uniqueCN();
+  const payload = JSON.stringify({
+    name: cn,
+    common_name: cn,
+    issuer_id: ISSUER_ID,
+    owner_id: OWNER_ID,
+    team_id: TEAM_ID,
+    renewal_policy_id: RENEWAL_POLICY,
+    environment: 'production',
+    sans: [cn],
+  });
+
+  const res = http.post(`${BASE}/api/v1/certificates`, payload, {
+    headers: {
+      'Content-Type': 'application/json',
+      'Authorization': `Bearer ${TOKEN}`,
+    },
+    tags: { scenario: 'issuance_acceptance' },
+  });
+
+  check(res, {
+    'create status 201': (r) => r.status === 201,
+  });
+}
+
+export function listCertificates() {
+  const res = http.get(`${BASE}/api/v1/certificates?per_page=50`, {
+    headers: {
+      'Authorization': `Bearer ${TOKEN}`,
+    },
+    tags: { scenario: 'list_certificates' },
+  });
+
+  check(res, {
+    'list status 200': (r) => r.status === 200,
+  });
+}
+
+// handleSummary writes the full results to /results/summary.{json,txt}
+// so the operator can commit the baseline numbers into README.md after
+// each run and so CI can ingest the JSON for diffing.
+//
+// stdout reproduces the textSummary so the docker compose log shows
+// the same numbers an operator running it manually would see.
+export function handleSummary(data) {
+  return {
+    '/results/summary.json': JSON.stringify(data, null, 2),
+    '/results/summary.txt': textSummary(data, { indent: ' ', enableColors: false }),
+    stdout: textSummary(data, { indent: ' ', enableColors: true }),
+  };
+}
diff --git a/deploy/test/loadtest/results/.gitkeep b/deploy/test/loadtest/results/.gitkeep
new file mode 100644
index 0000000..f87f41c
--- /dev/null
+++ b/deploy/test/loadtest/results/.gitkeep
@@ -0,0 +1,3 @@
+# Placeholder so `results/` exists in a fresh checkout. The k6
+# container mounts this directory and writes summary.{json,txt} into
+# it on every run; both outputs are gitignored.
diff --git a/docs/architecture.md b/docs/architecture.md
index 395297d..8811906 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1312,6 +1312,16 @@ certctl is extensively tested across eight layers with CI-enforced coverage gate
 For detailed test procedures, smoke tests, and the release sign-off checklist, see the [Testing Guide](testing-guide.md). For setting up the Docker Compose test environment with real CA backends, see [Test Environment](test-env.md).
 
+## Performance Characteristics
+
+Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (see `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md`). Pre-audit, certctl had no benchmarks or load tests for any API path, so any throughput claim was hand-waved; the harness in `deploy/test/loadtest/` provides a reproducible methodology for substantiating API-tier capacity numbers.
+
+The harness drives a k6 client at sustained 50 req/s × 2 scenarios × 5 minutes against a docker-compose stack of postgres + tls-init + certctl-server. Two scenarios run in parallel: `POST /api/v1/certificates` (issuance-acceptance hot path: auth + JSON decode + validation + service `CreateCertificate` + `managed_certificates` insert) and `GET /api/v1/certificates?per_page=50` (most-trafficked read endpoint). Hard regression-guard thresholds: p99 < 5 s and p95 < 2 s for issuance-acceptance, p99 < 2 s and p95 < 800 ms for list, error rate < 1% globally. k6 exits non-zero on any threshold breach, so a future change that pushes latency or errors past the bar fails `make loadtest`. Run via `make loadtest` from the repo root or via `.github/workflows/loadtest.yml` (`workflow_dispatch` + weekly cron — never per-push).
+
+What this measures vs what it does NOT: the harness intentionally measures the API tier (auth → DB), not the issuer connector round-trip latency. Connector calls (DigiCert, ACME, Vault, AWS ACM PCA, etc.) happen asynchronously through the renewal scheduler and are pinned by the `certctl_issuance_duration_seconds{issuer_type=...}` Prometheus histogram (audit fix #4 from the same audit). Driving them through k6 would amount to load-testing someone else's API, which is the wrong thing to do. The full ACME enrollment flow (multi-RTT order/challenge/finalize against pebble) is deferred — sustained 100/s through that flow needs pebble tuning + crypto helpers k6 doesn't ship out of the box.
+
+Captured baseline numbers will be committed in `deploy/test/loadtest/README.md` once an operator runs the harness on a representative workstation; future tuning commits land alongside refreshed baseline numbers so each commit's impact is diffable. Operators considering certctl for a 50k-cert fleet at 47-day TLS rotation (CA/B Forum SC-081v3, lands 2029) will then have a published number with documented methodology to compare against, rather than an unsubstantiated claim.
+
 ## What's Next
 
 - [Quick Start](quickstart.md) — Get certctl running locally
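The fleet-sizing question raised above (a 50k-certificate fleet on 47-day rotation) can be turned into an average request rate with a few lines of arithmetic. This is a sketch in plain JavaScript with no k6 imports, so it runs under Node; the function names are illustrative and not part of certctl. It assumes every certificate is renewed exactly once per renewal interval and that renewals are spread evenly. Schedulers renew ahead of expiry, so plugging in the 47-day lifetime understates demand, and a fleet issued in one batch renews in bursts, so the result is an average, not a peak.

```javascript
// Average renewal demand for a fleet, assuming every certificate is renewed
// exactly once per `renewEveryDays` and renewals are spread evenly over time.
// `renewEveryDays` should be the renewal interval actually used, which is
// shorter than the certificate lifetime when renewals happen ahead of expiry.
function averageRenewalsPerSecond(fleetSize, renewEveryDays) {
  const secondsPerDay = 24 * 60 * 60;
  return fleetSize / (renewEveryDays * secondsPerDay);
}

// Ratio of an offered request-acceptance rate to that average demand.
// It says nothing about bursts or about the async issuer-connector calls,
// which the harness does not exercise.
function headroomRatio(offeredPerSecond, fleetSize, renewEveryDays) {
  return offeredPerSecond / averageRenewalsPerSecond(fleetSize, renewEveryDays);
}

const demand = averageRenewalsPerSecond(50000, 47);
console.log(`average demand: ${demand.toFixed(4)} renewals/s`);
console.log(`headroom at 50 req/s: ${headroomRatio(50, 50000, 47).toFixed(0)}x`);
```

With these inputs the average is about 0.0123 renewals per second (roughly 1,064 per day), so a 50 req/s acceptance rate is about 4,061 times the average. The more useful comparison for a batch-issued fleet is against its renewal burst, which this average hides.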
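The regression triage in the README, and the workflow's advice to diff the captured summary.json against the committed numbers, can be partly scripted. This is a sketch, assuming summary.json holds the object k6 passes to `handleSummary()`: each metric, including threshold sub-metrics such as `http_req_duration{scenario:issuance_acceptance}`, carries a `values` map keyed by the configured trend stats (for example `p(95)`) and a `thresholds` map of `{ ok: boolean }` entries. The baseline object shape and the 20% default tolerance are hypothetical choices for illustration; certctl does not ship this script.

```javascript
// Threshold expressions that k6 marked as failed (ok === false),
// reported as "metric: expression" strings.
function failedThresholds(summary) {
  const failed = [];
  for (const [metric, entry] of Object.entries(summary.metrics || {})) {
    for (const [expr, result] of Object.entries(entry.thresholds || {})) {
      if (!result.ok) failed.push(`${metric}: ${expr}`);
    }
  }
  return failed;
}

// Stats that got slower than a committed baseline by more than `tolerance`
// (0.2 = 20%), even when they are still inside the hard thresholds.
// Hypothetical baseline shape: { metricName: { statName: milliseconds } }.
function driftFromBaseline(summary, baseline, tolerance = 0.2) {
  const drift = [];
  for (const [metric, stats] of Object.entries(baseline)) {
    const values = summary.metrics?.[metric]?.values;
    for (const [stat, base] of Object.entries(stats)) {
      const current = values ? values[stat] : undefined;
      if (current === undefined) {
        // Report missing stats instead of silently skipping them.
        drift.push({ metric, stat, reason: 'not in summary' });
      } else if (current > base * (1 + tolerance)) {
        drift.push({ metric, stat, base, current, reason: 'slower than baseline' });
      }
    }
  }
  return drift;
}

// Usage under Node, reading the harness output:
//   const fs = require('fs');
//   const summary = JSON.parse(
//     fs.readFileSync('deploy/test/loadtest/results/summary.json', 'utf8'));
//   console.log(failedThresholds(summary), driftFromBaseline(summary, baseline));
```

Keeping the drift check separate from the threshold check mirrors the README's distinction between the hard regression guards and a slower-but-still-passing run that is worth flagging in a PR.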