Mirror of https://github.com/shankar0123/certctl.git (synced 2026-06-07 13:41:30 +00:00)
3a665ae6ba
Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit. Pre-fix, certctl had zero benchmarks or load tests for any API path. An acquirer evaluating "can certctl handle our 50k-cert fleet at 47-day rotation?" had nothing to point at. CA/B Forum SC-081v3 lands 47-day TLS in 2029, and operators need real numbers, not hand-waved capacity claims.

What landed:

- deploy/test/loadtest/docker-compose.yml — minimal stack with four services:
  - postgres
  - a tls-init bootstrap
  - certctl-server with CERTCTL_DEMO_SEED=true, so the FK rows the script needs exist
  - a grafana/k6:0.54.0 driver
  The k6 version is pinned so threshold expressions stay stable across runs. The k6 command runs the script once and exits with the threshold-driven exit code, so `--exit-code-from k6` propagates a non-zero status on any regression.
- deploy/test/loadtest/k6.js — two scenarios, each at 50 req/s for 5 min, with start times staggered by 5s:
  - Scenario 1: POST /api/v1/certificates. This is the issuance-acceptance hot path: auth + JSON decode + validation + service CreateCertificate + DB insert.
  - Scenario 2: GET /api/v1/certificates. This is the most-trafficked read endpoint and exercises pagination.
  Hard thresholds:
  - issuance-acceptance: p99 < 5s and p95 < 2s
  - list: p99 < 2s and p95 < 800ms
  - error rate < 1% globally
  The script uses the constant-arrival-rate executor (NOT constant-vus) so VU-bound load doesn't backpressure the offered rate and mask capacity ceilings. __ENV.CERTCTL_BASE lets the same script run on the operator's workstation (https://localhost:8443) and inside the compose stack (https://certctl-server:8443).
- deploy/test/loadtest/README.md — documents what's measured (API tier: auth → DB) vs. what's NOT:
  - issuer connector latency, pinned separately by certctl_issuance_duration_seconds from audit fix #4
  - the full ACME enrollment flow, deferred because sustained 100/s through multi-RTT pebble takes pebble tuning + crypto helpers k6 doesn't ship with
  The threshold contract is pinned. The baseline numbers row reads TBD until the operator captures it on a representative workstation. The methodology is pinned so future tuning commits land alongside refreshed, diffable baselines.
- deploy/test/loadtest/.gitignore — ignores results/{summary.json,summary.txt} and certs/ (per-run TLS bootstrap output). Both regenerate on every run; committing them would create huge per-run diffs.
- deploy/test/loadtest/results/.gitkeep — placeholder so the directory exists in fresh checkouts (the k6 container mounts it).
- Makefile: new `loadtest` target that spins up the compose stack with --abort-on-container-exit --exit-code-from k6 and prints the summary. Added to .PHONY and help. Explicitly NOT in `make verify`: load tests are minutes long and don't gate per-PR signal.
- .github/workflows/loadtest.yml — workflow_dispatch (manual) plus a weekly cron at Mon 06:00 UTC; NOT per-push.
  - 15-minute hard cap.
  - Always uploads results/ as an artifact (90-day retention), so a regression has a diffable artifact even when k6 exited non-zero.
  - Read-only repo permissions.
- docs/architecture.md: new "Performance Characteristics" section, inserted before the existing "What's Next" section. It cites:
  - the harness location
  - the scenarios and thresholds
  - the scope (what's measured vs. not)
  - where the captured baseline lives

Scope decisions documented in the README and this commit message:

- The audit prompt's k6 example targeted POST /api/v1/certificates plus ACME-via-pebble. CreateCertificate exercises auth + DB, but the downstream issuer-connector call is async (renewal scheduler). That makes it the right surface for "request-acceptance" throughput. Driving the connectors directly would load-test someone else's API.
- Pebble was excluded from the harness stack. Sustained 100/s through ACME's order/challenge/finalize flow needs pebble tuning plus k6 crypto helpers that don't exist out of the box. The README flags this as a deferred follow-up.

Acquirer impact: the diligence question "what's your throughput?" now has a reproducible methodology and a regression guard behind it, not a claim. The first operator run captures the baseline into README.md so subsequent tuning commits are diffable.

Verified locally:

- gofmt -l . — clean
- go vet ./... — clean
- staticcheck ./... — clean
- go build ./... — clean
- bash scripts/ci-guards/H-1-encryption-key-min-length.sh — clean (the 38-byte loadtest key is above the 32-byte floor)
- bash scripts/ci-guards/openapi-handler-parity.sh — clean
- bash scripts/ci-guards/test-compose-scep-coherence.sh — clean
- make -n loadtest — produces the expected command sequence

Follow-up: the first `make loadtest` run from the operator's workstation will populate the README baseline numbers, committed separately.

Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md Top-10 fix #8.
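A few lines of arithmetic check three things: the offered load, the global error budget, and the reason for preferring constant-arrival-rate over constant-vus. This is a back-of-envelope sketch, not derived from k6.js. It rests on three assumptions:

- Every scheduled iteration completes, so each scenario sends exactly λT requests.
- Little's law applies (concurrency = arrival rate × response time) with zero think time between requests.
- The VU counts below are illustrative and are not the script's actual preAllocatedVUs/maxVUs settings.

```latex
\begin{aligned}
N_{\text{scenario}} &= \lambda T = 50\,\text{req/s} \times 300\,\text{s} = 15{,}000 \\
N_{\text{total}} &= 2 \times 15{,}000 = 30{,}000 \\
E_{\text{failed}} &< 0.01 \times 30{,}000 = 300 \quad \text{(global error budget)} \\[4pt]
X_{\text{constant-vus}} &= \frac{N_{\text{VU}}}{R}, \qquad N_{\text{VU}} = 20,\ R = 2\,\text{s} \;\Rightarrow\; X = 10\,\text{req/s} \\
N_{\text{VU}}^{\text{arrival-rate}} &\ge \lambda R = 50\,\text{req/s} \times 5\,\text{s} = 250
\end{aligned}
```

Under constant-vus, the achieved rate X shrinks as response time R grows, so a slowing server quietly lowers its own load. Under constant-arrival-rate, λ stays fixed. The last line sizes the VU pool for the worst case where every request takes the full 5 s issuance p99 ceiling. If the pool cannot cover λR, k6 records dropped_iterations rather than reducing the offered rate, which keeps a capacity ceiling visible.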
74 lines
2.7 KiB
YAML
# Load-test workflow — closes the #8 acquisition-readiness blocker from
# the 2026-05-01 issuer coverage audit (see
# cowork/issuer-coverage-audit-2026-05-01/RESULTS.md).
#
# CADENCE: workflow_dispatch + weekly cron, NOT per-push. Load tests
# are minutes long and don't provide useful per-PR signal — per-push
# pressure goes through ci.yml. This workflow exists to (a) catch
# gradual regressions from cumulative changes that no single PR
# triggered, and (b) give an operator a one-click way to capture
# numbers before tagging a release.
#
# THRESHOLDS: defined in deploy/test/loadtest/k6.js (p99 < 5s and
# p95 < 2s for issuance-acceptance, p99 < 2s and p95 < 800ms for list,
# error rate < 1% globally). k6 exits non-zero on any breach, which
# propagates through `docker compose up --exit-code-from k6` →
# `make loadtest` → this workflow's exit.

name: loadtest

on:
  workflow_dispatch:
    # Manual trigger from the Actions tab. Use before tagging a
    # release or after a meaningful tuning commit.

  schedule:
    # Mondays at 06:00 UTC. Off-peak; catches regressions accumulated
    # over the previous week's merges. Once a baseline is committed
    # in deploy/test/loadtest/README.md, drift relative to that
    # baseline is the signal — diff the captured summary.json
    # against the committed numbers.
    - cron: '0 6 * * 1'

# Reduce permissions — this workflow doesn't write to PRs or push tags.
permissions:
  contents: read

jobs:
  k6:
    name: k6 throughput run
    runs-on: ubuntu-latest
    # 15-minute hard cap. The harness itself takes ~7 minutes (5m run +
    # ~2m for image build + healthcheck wait); the cap absorbs slow CI
    # runners and cold image caches without letting a stuck container
    # consume the runner indefinitely.
    timeout-minutes: 15

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        # The compose stack builds the certctl image from the repo
        # root Dockerfile. Buildx gives the build a usable cache and
        # works with newer compose versions.
        uses: docker/setup-buildx-action@v3

      - name: Run loadtest
        run: make loadtest
        env:
          # Use BuildKit's plain, line-oriented progress output instead
          # of the interactive renderer so the run log is diffable
          # against past runs.
          BUILDKIT_PROGRESS: plain

      - name: Upload summary
        # Always upload the summary so a regression has a diffable
        # artifact even when k6 exited non-zero. summary.json is the
        # authoritative machine-readable form; summary.txt is the
        # human-readable text the README baseline tracks.
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: k6-summary-${{ github.run_id }}
          path: deploy/test/loadtest/results/
          retention-days: 90
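The exit-code chain described in the workflow header only works if the compose stack runs k6 once and then exits. The committed deploy/test/loadtest/docker-compose.yml is not reproduced on this page. The fragment below is therefore a hypothetical sketch of how its k6 service could be wired.

Only the image tag and the CERTCTL_BASE value come from the commit message. Everything else is an assumption for illustration:

- the mount paths (/scripts, /results)
- the use of `--summary-export`
- the K6_INSECURE_SKIP_TLS_VERIFY setting
- the `service_healthy` dependency

```yaml
# Hypothetical sketch of the k6 service; NOT the committed docker-compose.yml.
services:
  k6:
    # Pinned so threshold expressions behave the same across runs.
    image: grafana/k6:0.54.0
    # The image's entrypoint is `k6`, so this runs the script once and exits.
    # k6 returns a non-zero exit code when any threshold is breached.
    command: ["run", "--summary-export=/results/summary.json", "/scripts/k6.js"]
    environment:
      # Target the script reads via __ENV.CERTCTL_BASE inside the stack.
      CERTCTL_BASE: https://certctl-server:8443
      # Assumption: the tls-init bootstrap issues a self-signed certificate.
      K6_INSECURE_SKIP_TLS_VERIFY: "true"
    volumes:
      # Assumed mount paths; adjust to the real layout.
      - ./k6.js:/scripts/k6.js:ro
      - ./results:/results
    depends_on:
      certctl-server:
        # Assumes certctl-server defines a healthcheck.
        condition: service_healthy
```

Start the stack with `docker compose up --abort-on-container-exit --exit-code-from k6`. Compose then stops the remaining services when k6 exits and returns k6's exit code. (`--exit-code-from` already implies `--abort-on-container-exit`, so passing both is harmless.) `make loadtest` and the "Run loadtest" step inherit that exit code.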