mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 21:51:30 +00:00
e292faafc6
Closes Bundle 10 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix,
deploy/test/loadtest/k6.js drove only the API-tier throughput path
(POST /api/v1/certificates + GET /api/v1/certificates) — the operator-
facing rate at which an automation client can submit cert requests.
The deploy hot path (cert deployed to a target — connector-tier
latency) had no benchmarks. Procurement asks "can certctl handle our
5,000-NGINX fleet at 47-day rotation?" and the answer should be a
number with methodology, not a claim.
This commit ships v1 of the connector-tier loadtest harness:
1. Target-side sidecars added to docker-compose.yml: nginx-target,
apache-target, haproxy-target, f5-mock-target. Each daemon serves
a starter cert (ECDSA P-256, multi-SAN) written into a shared
./fixtures/target-certs/ volume by a new target-tls-init
container. f5-mock-target re-uses the in-tree
deploy/test/f5-mock-icontrol/ image (already used by the deploy-
vendor-e2e CI job) and generates its own self-signed cert via
tls.go::selfSignedCert at startup.
2. Fixture configs committed under deploy/test/loadtest/fixtures/:
- nginx.conf — minimal HTTPS server, single 200 OK location.
- httpd.conf — self-contained Apache config with the minimum
module set + SSL vhost.
- haproxy.cfg — minimal SSL-terminating frontend backed by a
static "ok" backend.
3. k6 scenarios added (4 new): nginx_handshake, apache_handshake,
haproxy_handshake, f5_handshake. Each uses the constant-arrival-rate
executor at 100 conns/min for 5 minutes. Every iteration performs a
TCP connect + TLS handshake + tiny HTTP request/response, which is
the end-to-end "connection readiness" latency a deploy connector
cares about. Caveat: k6's built-in http_req_duration covers only
send + wait + receive; connect and TLS handshake time are reported
separately as http_req_connecting and http_req_tls_handshaking.
4. summary.json gains a connector_tier object with per-target
p50/p95/p99/max/avg/error_rate/iterations breakdowns. Operators
tracking a connector regression diff connector_tier.<type>
between runs. Implementation: a new enrichWithConnectorTier
helper that reads data.metrics keyed by target_type tag and
shallow-merges the breakdown into the summary before
serialisation.
5. Threshold contract per target type:
- nginx/apache/haproxy: p99 < 3s, p95 < 1s.
- f5-mock: p99 < 5s, p95 < 1.5s (the iControl REST handler does
slightly more work per request than pure TLS termination).
- All scenarios: error rate < 1% (using k6's default failure
classification, where any 4xx/5xx counts as failed).
Any change that pushes a target past these limits fails the workflow.
6. README documents the methodology + the baseline-number table for
the connector tier. Numeric values are em-dash placeholders
pending the first clean canonical-hardware run; the commit message
of that follow-up will record the methodology line alongside the
numbers. Out-of-scope items are documented explicitly:
- Full agent-driven deploy poll loop (POST cert with target
binding → poll deployments endpoint → verify served cert).
v2 of the harness — needs the agent registration + target-
binding API surface plumbed end-to-end in the loadtest stack.
- Kubernetes target via kind-in-docker. kind requires
`privileged: true` and is operationally fragile in CI;
deferred until Bundle 2 (real k8s.io/client-go) lands and a
CI-friendly envtest harness is wired.
- Real F5 BIG-IP. CI uses the in-tree f5-mock; real-appliance
benchmarking is out of scope.
7. CI workflow .github/workflows/loadtest.yml timeout-minutes
bumped from 15 to 25. The harness now boots four additional
target sidecars before the k6 run; their healthchecks add
~30-60s. The k6 scenarios themselves are still 5 minutes (run
in parallel, not serially). 25 minutes absorbs that plus slow
CI runners and cold image caches without letting a stuck
container consume the runner indefinitely. Trigger remains
workflow_dispatch + cron — sustained 25-minute runs are too
slow for per-PR signal.
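The scenario and threshold items above (3 and 5) can be sketched as a k6 options object. This is a minimal illustration, not the contents of k6.js: the `target_type` tag values, the `f5` key, the `exec` function names, and the VU sizing are all assumptions, and thresholds are expressed in milliseconds as k6 expects.

```javascript
// Per-target latency limits in milliseconds, taken from the threshold
// contract in the commit message. Keys double as ASSUMED target_type tags.
const TARGET_LIMITS = {
  nginx: { p95: 1000, p99: 3000 },
  apache: { p95: 1000, p99: 3000 },
  haproxy: { p95: 1000, p99: 3000 },
  f5: { p95: 1500, p99: 5000 },
};

// Build a k6-style options object: one constant-arrival-rate scenario
// per target, plus latency and error-rate thresholds scoped by tag.
function buildConnectorOptions(limits) {
  const scenarios = {};
  const thresholds = {};
  for (const [type, { p95, p99 }] of Object.entries(limits)) {
    scenarios[`${type}_handshake`] = {
      executor: 'constant-arrival-rate',
      rate: 100,                // 100 iterations ...
      timeUnit: '1m',           // ... per minute
      duration: '5m',
      preAllocatedVUs: 5,       // hypothetical sizing
      maxVUs: 20,               // hypothetical sizing
      exec: `${type}Handshake`, // hypothetical exported function name
      tags: { target_type: type },
    };
    thresholds[`http_req_duration{target_type:${type}}`] = [
      `p(95)<${p95}`,
      `p(99)<${p99}`,
    ];
    thresholds[`http_req_failed{target_type:${type}}`] = ['rate<0.01'];
  }
  return { scenarios, thresholds };
}

const options = buildConnectorOptions(TARGET_LIMITS);
console.log(Object.keys(options.scenarios).join(', '));
```

In an actual k6 script the result would be exposed as `export const options`, and each `exec` name would need a matching exported function that issues the HTTPS request.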
What this connector tier explicitly does NOT measure (documented in
the k6.js header + README):
- The agent-driven full deploy hot path (v2 follow-up).
- K8s target (Bundle 2 dependency).
- Real F5 appliance.
- Issuer-side throughput (handled by issuer-coverage-audit fix #8).
Verified locally:
- python3 -c "import yaml; yaml.safe_load(...)" on docker-compose.yml
and .github/workflows/loadtest.yml — clean.
- node -c on k6.js — clean syntax.
- gofmt / go vet on the rest of the tree (no Go diff in this commit).
- Manual smoke against docker-compose pending — operator validates
on the canonical-hardware first run; if any fixture config is off,
fix-up commit lands separately so the methodology change and the
numeric baseline have independent reviewability.
No Go code changes; this is a loadtest-harness-only commit.
Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 10.
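The connector_tier breakdown described in item 4 could look roughly like the sketch below. The commit only names the enrichWithConnectorTier helper, so the signature, the sub-metric key format, and the sample numbers here are assumptions. It relies on the shape k6 passes to handleSummary(), where each metric carries a `values` object; `p(99)` is only present when summaryTrendStats requests it, and tagged sub-metrics only appear when a threshold references them.

```javascript
// Hypothetical sketch of a connector_tier enrichment step. It reads
// per-target sub-metrics from a k6 handleSummary() data object and
// shallow-merges the breakdown into the summary.
function enrichWithConnectorTierSketch(summary, data, targetTypes) {
  const connectorTier = {};
  for (const type of targetTypes) {
    const tag = `{target_type:${type}}`; // ASSUMED tag name and format
    const duration = data.metrics[`http_req_duration${tag}`];
    if (!duration) continue; // target not exercised in this run
    const failed = data.metrics[`http_req_failed${tag}`];
    const iterations = data.metrics[`iterations${tag}`];
    const v = duration.values;
    connectorTier[type] = {
      p50: v.med,
      p95: v['p(95)'],
      p99: v['p(99)'],
      max: v.max,
      avg: v.avg,
      error_rate: failed ? failed.values.rate : null,
      iterations: iterations ? iterations.values.count : null,
    };
  }
  // Shallow merge: existing summary keys are preserved.
  return { ...summary, connector_tier: connectorTier };
}

// Made-up sample data in the handleSummary() shape, for illustration only.
const sampleData = {
  metrics: {
    'http_req_duration{target_type:nginx}': {
      values: { med: 12, 'p(95)': 40, 'p(99)': 90, max: 120, avg: 15 },
    },
    'http_req_failed{target_type:nginx}': { values: { rate: 0 } },
    'iterations{target_type:nginx}': { values: { count: 500 } },
  },
};

const enriched = enrichWithConnectorTierSketch(
  { api_tier: {} },
  sampleData,
  ['nginx', 'apache'],
);
console.log(JSON.stringify(enriched.connector_tier));
```

Targets with no matching sub-metric are skipped rather than emitted as empty objects, which keeps a diff between two runs limited to targets that actually ran.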
78 lines
3.0 KiB
YAML
# Load-test workflow — closes the #8 acquisition-readiness blocker from
# the 2026-05-01 issuer coverage audit (see
# cowork/issuer-coverage-audit-2026-05-01/RESULTS.md).
#
# CADENCE: workflow_dispatch + weekly cron, NOT per-push. Load tests
# are minutes long and don't provide useful per-PR signal — per-push
# pressure goes through ci.yml. This workflow exists to (a) catch
# gradual regressions from cumulative changes that no single PR
# triggered, and (b) give an operator a one-click way to capture
# numbers before tagging a release.
#
# THRESHOLDS: defined in deploy/test/loadtest/k6.js (p99 < 5s for
# issuance-acceptance, p99 < 2s for list, error rate < 1%). k6 exits
# non-zero on any breach, which propagates through `docker compose up
# --exit-code-from k6` → `make loadtest` → this workflow's exit.

name: loadtest

on:
  workflow_dispatch:
    # Manual trigger from the Actions tab. Use before tagging a
    # release or after a meaningful tuning commit.

  schedule:
    # Mondays at 06:00 UTC. Off-peak; catches regressions accumulated
    # over the previous week's merges. Once a baseline is committed
    # in deploy/test/loadtest/README.md, drift relative to that
    # baseline is the signal — diff the captured summary.json
    # against the committed numbers.
    - cron: '0 6 * * 1'

# Reduce permissions — this workflow doesn't write to PRs or push tags.
permissions:
  contents: read

jobs:
  k6:
    name: k6 throughput run
    runs-on: ubuntu-latest
    # 25-minute hard cap. Pre-Bundle-10: 15min was enough for the API
    # tier alone (~7 minutes total). Post-Bundle-10 the harness boots
    # four additional target sidecars (nginx, apache, haproxy, f5-mock)
    # before the k6 run; their healthchecks add ~30-60s. The k6 scenarios
    # themselves are still 5 minutes (run in parallel with the API
    # scenarios, not serially). 25 minutes absorbs that plus slow CI
    # runners and cold image caches without letting a stuck container
    # consume the runner indefinitely.
    timeout-minutes: 25

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        # The compose stack builds the certctl image from the repo
        # root Dockerfile. Buildx gives the build a usable cache and
        # works with newer compose versions.
        uses: docker/setup-buildx-action@v3

      - name: Run loadtest
        run: make loadtest
        env:
          # Disable BuildKit progress noise so the run log is
          # diff-able against past runs.
          BUILDKIT_PROGRESS: plain

      - name: Upload summary
        # Always upload the summary so a regression has a diffable
        # artifact even when k6 exited non-zero. summary.json is the
        # authoritative machine-readable form; summary.txt is the
        # human-readable text the README baseline tracks.
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: k6-summary-${{ github.run_id }}
          path: deploy/test/loadtest/results/
          retention-days: 90
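The schedule comment in the workflow treats drift between the uploaded summary.json and the committed baseline as the regression signal. A minimal sketch of that comparison for the connector tier follows; the function name, the field list, and the sample values are hypothetical, and it assumes both files carry a `connector_tier` object keyed by target type as the commit message describes.

```javascript
// Hypothetical helper: report per-target latency deltas between a
// baseline summary and a freshly captured one.
function diffConnectorTier(baseline, current, fields = ['p50', 'p95', 'p99']) {
  const rows = [];
  const baseTier = baseline.connector_tier || {};
  for (const [type, cur] of Object.entries(current.connector_tier || {})) {
    const base = baseTier[type];
    if (!base) continue; // target has no baseline yet
    for (const field of fields) {
      // Skip placeholders or missing values rather than guessing.
      if (typeof base[field] !== 'number' || typeof cur[field] !== 'number') {
        continue;
      }
      rows.push({
        type,
        field,
        baseline: base[field],
        current: cur[field],
        delta: cur[field] - base[field],
      });
    }
  }
  return rows;
}

// Made-up numbers, for illustration only.
const baselineSummary = { connector_tier: { nginx: { p50: 10, p95: 40, p99: 90 } } };
const currentSummary = {
  connector_tier: {
    nginx: { p50: 12, p95: 55, p99: 90 },
    haproxy: { p50: 9, p95: 30, p99: 70 }, // no baseline, so it is skipped
  },
};

const drift = diffConnectorTier(baselineSummary, currentSummary);
for (const row of drift) {
  console.log(`${row.type} ${row.field}: ${row.baseline} -> ${row.current} (delta ${row.delta})`);
}
```

Non-numeric fields are skipped so that a baseline still holding placeholder values does not produce misleading deltas.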