mirror of https://github.com/shankar0123/certctl.git
synced 2026-06-09 11:28:52 +00:00

loadtest: per-connector deploy throughput scenarios + target sidecars + README baseline section

Closes Bundle 10 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix,
deploy/test/loadtest/k6.js drove only the API-tier throughput path
(POST /api/v1/certificates + GET /api/v1/certificates), i.e. the
operator-facing rate at which an automation client can submit cert
requests. The deploy hot path (cert deployed to a target, i.e.
connector-tier latency) had no benchmarks. Procurement asks "can certctl
handle our 5,000-NGINX fleet at 47-day rotation?" and the answer should
be a number with methodology, not a claim.
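
To put the procurement question in numbers, the sketch below works out the steady-state deploy rate that a 5,000-endpoint fleet on a 47-day rotation implies and compares it with the 100 conns/min rate used by the connector-tier scenarios in this commit. The `deployRate` helper is hypothetical, and the even-spread assumption is mine, not something the commit states; real fleets often renew in bursts.

```javascript
// Back-of-envelope sketch. Assumptions (not from the commit): renewals are
// spread evenly over the rotation window, and each rotation means exactly
// one deploy per endpoint.
function deployRate(endpoints, rotationDays) {
  const perDay = endpoints / rotationDays;
  return { perDay, perHour: perDay / 24, perMinute: perDay / (24 * 60) };
}

const fleet = deployRate(5000, 47);

// The connector-tier scenarios in this commit drive 100 conns/min per target.
const scenarioPerMinute = 100;
const headroom = scenarioPerMinute / fleet.perMinute;

console.log(`deploys/day:    ${fleet.perDay.toFixed(1)}`);
console.log(`deploys/hour:   ${fleet.perHour.toFixed(2)}`);
console.log(`deploys/minute: ${fleet.perMinute.toFixed(3)}`);
console.log(`scenario rate is ~${Math.round(headroom)}x the steady-state rate`);
```

Under those assumptions the fleet needs roughly 106 deploys per day, so the scenario rate leaves a wide margin for bursty renewals.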

This commit ships v1 of the connector-tier loadtest harness:

1. Target-side sidecars added to docker-compose.yml: nginx-target,
   apache-target, haproxy-target, f5-mock-target. Each daemon serves
   a starter cert (ECDSA P-256, multi-SAN) written into a shared
   ./fixtures/target-certs/ volume by a new target-tls-init
   container. f5-mock-target re-uses the in-tree
   deploy/test/f5-mock-icontrol/ image (already used by the
   deploy-vendor-e2e CI job) and generates its own self-signed cert
   via tls.go::selfSignedCert at startup.

2. Fixture configs committed under deploy/test/loadtest/fixtures/:
   - nginx.conf: minimal HTTPS server, single 200 OK location.
   - httpd.conf: self-contained Apache config with the minimum
     module set + SSL vhost.
   - haproxy.cfg: minimal SSL-terminating frontend backed by a
     static "ok" backend.

3. k6 scenarios added (4 new): nginx_handshake, apache_handshake,
   haproxy_handshake, f5_handshake. Each runs constant-arrival-rate
   at 100 conns/min for 5 minutes. Latency captured by k6's
   http_req_duration metric covers TCP connect + TLS handshake +
   tiny HTTP request/response; that's the end-to-end "connection
   readiness" latency a deploy connector cares about.

4. summary.json gains a connector_tier object with per-target
   p50/p95/p99/max/avg/error_rate/iterations breakdowns. Operators
   tracking a connector regression diff connector_tier.<type>
   between runs. Implementation: a new enrichWithConnectorTier
   helper that reads data.metrics keyed by target_type tag and
   shallow-merges the breakdown into the summary before
   serialisation.

5. Threshold contract per target type:
   - nginx/apache/haproxy: p99 < 3s, p95 < 1s.
   - f5-mock: p99 < 5s, p95 < 1.5s (the iControl REST handler does
     slightly more work per request than pure TLS termination).
   - All scenarios: error rate < 1% (k6 default; any 4xx/5xx counts
     as failed).
   Any change pushing past these fails the workflow.

6. README documents the methodology + the baseline-number table for
   the connector tier. Numeric values are em-dash placeholders
   pending the first clean canonical-hardware run; the commit
   message of that follow-up captures the methodology line alongside
   the numbers. Out-of-scope is documented explicitly:
   - Full agent-driven deploy poll loop (POST cert with target
     binding → poll deployments endpoint → verify served cert).
     This is v2 of the harness; it needs the agent registration +
     target-binding API surface plumbed end-to-end in the loadtest
     stack.
   - Kubernetes target via kind-in-docker. kind requires
     `privileged: true` and is operationally fragile in CI;
     deferred until Bundle 2 (real k8s.io/client-go) lands and a
     CI-friendly envtest harness is wired.
   - Real F5 BIG-IP. CI uses the in-tree f5-mock; real-appliance
     benchmarking is out of scope.

7. CI workflow .github/workflows/loadtest.yml timeout-minutes
   bumped from 15 to 25. The harness now boots four additional
   target sidecars before the k6 run; their healthchecks add
   ~30-60s. The k6 scenarios themselves are still 5 minutes (run
   in parallel, not serially). 25 minutes absorbs that plus slow
   CI runners and cold image caches without letting a stuck
   container consume the runner indefinitely. Trigger remains
   workflow_dispatch + cron; sustained 25-minute runs are too
   slow for per-PR signal.
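
A point worth checking when reading the latency numbers from item 3: k6's documentation defines http_req_duration as sending + waiting + receiving, and reports TCP connect and TLS handshake time separately (http_req_connecting, http_req_tls_handshaking). k6 also reuses connections within a VU unless the noConnectionReuse option is set, so many iterations may not perform a fresh handshake at all. The sketch below shows one way a per-request "connection readiness" figure could be assembled from those parts. It is an illustration, not what the committed script does: `connectionReadinessMs` is a hypothetical helper, and the sample object only mimics the field names of k6's `Response.timings` with made-up values.

```javascript
// Hypothetical helper: sums the phases that "connection readiness" is meant
// to cover. `timings` mirrors the field names of k6's Response.timings
// (all values in milliseconds).
function connectionReadinessMs(timings) {
  return timings.connecting + timings.tls_handshaking + timings.duration;
}

// Made-up sample values for illustration only, not measurements.
const sampleTimings = {
  blocked: 0,
  connecting: 2,       // TCP connect
  tls_handshaking: 8,  // TLS handshake
  sending: 1,
  waiting: 3,
  receiving: 1,
  duration: 5,         // sending + waiting + receiving
};

const readiness = connectionReadinessMs(sampleTimings);
console.log(`http_req_duration part: ${sampleTimings.duration} ms`);
console.log(`connect + TLS + request: ${readiness} ms`);
```

Inside a k6 script, a value like this could be recorded on a custom Trend metric from `k6/metrics` and tagged by target_type, which would keep the per-target breakdown while making the handshake cost explicit.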

What this connector tier explicitly does NOT measure (documented in
the k6.js header + README):

- The agent-driven full deploy hot path (v2 follow-up).
- K8s target (Bundle 2 dependency).
- Real F5 appliance.
- Issuer-side throughput (handled by issuer-coverage-audit fix #8).

Verified locally:

- python3 -c "import yaml; yaml.safe_load(...)" on docker-compose.yml
  and .github/workflows/loadtest.yml: clean.
- node -c on k6.js: clean syntax.
- gofmt / go vet on the rest of the tree (no Go diff in this commit).
- Manual smoke against docker-compose pending: the operator validates
  on the canonical-hardware first run; if any fixture config is off,
  a fix-up commit lands separately so the methodology change and the
  numeric baseline can be reviewed independently.

No Go code changes; this is a loadtest-harness-only commit.

Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 10.
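
The commit message says operators track regressions by diffing connector_tier.<type> between runs. A minimal sketch of such a comparison follows. `diffConnectorTier` is a hypothetical helper that is not part of this commit, and the two summary objects contain placeholder values chosen for illustration, not baseline measurements.

```javascript
// Hypothetical helper: per-target latency deltas (current minus baseline)
// between two summary.json objects that carry a connector_tier section.
// A target that is null or missing on either side yields null.
function diffConnectorTier(baseline, current, fields = ['p50', 'p95', 'p99']) {
  const base = baseline.connector_tier || {};
  const cur = current.connector_tier || {};
  const out = {};
  for (const type of Object.keys(cur)) {
    const b = base[type];
    const c = cur[type];
    if (!b || !c) {
      out[type] = null;
      continue;
    }
    out[type] = {};
    for (const f of fields) {
      out[type][f] = b[f] == null || c[f] == null ? null : c[f] - b[f];
    }
  }
  return out;
}

// Placeholder values (milliseconds) for illustration only.
const baselineRun = { connector_tier: { nginx: { p50: 10, p95: 20, p99: 30 }, apache: null } };
const currentRun = { connector_tier: { nginx: { p50: 12, p95: 25, p99: 45 }, apache: null } };

const delta = diffConnectorTier(baselineRun, currentRun);
console.log(JSON.stringify(delta, null, 2));
```

Positive deltas mean the current run is slower than the baseline for that percentile.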
+220
-28
@@ -1,37 +1,67 @@
 // certctl load-test driver — k6 v0.54+ JS API.
 //
-// Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer
-// coverage audit. Pre-fix, certctl had no benchmarks or load tests for any
-// API path. An acquirer evaluating "can certctl handle our 50k-cert fleet
-// at 47-day rotation" had nothing to point at; this script gives them
-// a reproducible number with a methodology.
+// Two tiers of scenarios:
 //
-// What this measures (be honest about scope):
+// API tier (issuer-coverage audit fix #8, 2026-05-01):
+// - issuance_acceptance: POST /api/v1/certificates throughput.
+// - list_certificates: GET /api/v1/certificates throughput.
+//
+// Connector tier (Bundle 10 of the deployment-target audit, 2026-05-02):
+// - nginx_handshake / apache_handshake / haproxy_handshake / f5_handshake:
+// per-target-type TCP+TLS handshake throughput against the four
+// target sidecars at sustained 100 conns/min for 5 minutes. Latency
+// is tagged by target_type so summary.json's connector_tier section
+// breaks out p50/p95/p99 per target.
+//
+// What the API tier measures (be honest about scope):
 // - POST /api/v1/certificates: auth + JSON decode + validation + service
 // CreateCertificate + DB insert + response. This is the operator-facing
 // request-acceptance throughput. The downstream issuer-connector call
 // happens asynchronously via the renewal scheduler (and is bounded
-// separately via CERTCTL_RENEWAL_CONCURRENCY — audit fix #9).
+// separately via CERTCTL_RENEWAL_CONCURRENCY — issuer audit fix #9).
 // - GET /api/v1/certificates: read path with pagination. Exercises the
 // cert list query, which is the most-called read endpoint in any UI/
 // automation client.
 //
-// What this does NOT measure:
+// What the connector tier measures:
+// - Per-target-type TCP+TLS handshake completion latency. Validates that
+// each target sidecar (nginx, apache, haproxy, f5-mock) is operational
+// and serving its starter cert under sustained connection load.
+// Procurement asks "can certctl's nginx target handle 5,000 endpoints
+// at 47-day rotation"; the answer requires (a) the connector code
+// handles deploys correctly (covered by per-connector unit tests) AND
+// (b) the underlying daemon serves TLS at the connection rates a
+// 5,000-endpoint fleet implies. The connector-tier scenarios pin (b).
+//
+// What this does NOT measure (documented limits, not lazy gaps):
 // - Issuer connector latency (DigiCert / ACME / Vault / etc. round-trips
 // to upstream CAs). Those are async; pin via the per-issuer-type
-// metrics instead (audit fix #4: certctl_issuance_duration_seconds).
-// - The full ACME enrollment flow (newOrder → challenge → finalize).
-// The audit prompt mentioned ACME-via-pebble; deferred to a follow-up
-// because driving multi-RTT ACME flows at sustained 100/s requires
-// pebble tuning + k6 crypto helpers that don't exist out of the box.
+// metrics instead (issuer audit fix #4:
+// certctl_issuance_duration_seconds).
+// - Full ACME enrollment (newOrder → challenge → finalize).
+// - The full agent-driven deploy hot path (POST cert with target
+// binding → poll deployments endpoint → verify served cert matches).
+// v1 of the connector-tier harness measures handshake throughput
+// against the sidecars directly. v2 is a follow-up that needs the
+// agent registration + target-binding API surface plumbed end-to-end
+// in the loadtest stack — a meaningful addition but not a blocker
+// for the Bundle 10 procurement question.
+// - Kubernetes connector. kind-in-docker requires `privileged: true`
+// and is operationally fragile in CI. Deferred until Bundle 2 (real
+// k8s.io/client-go) lands.
 //
-// Threshold contract: any future change that pushes p99 above 5s for the
-// issuance-acceptance scenario or 2s for the read scenario, OR any change
-// that pushes the error rate above 1%, fails the test. CI gates the run
-// behind workflow_dispatch + cron (NOT per-push — load tests are too slow
-// to gate per-PR signal).
+// Threshold contract:
+// - API tier: p99 < 5s for issuance, < 2s for list, error rate < 1%.
+// - Connector tier: p99 < 3s per handshake target (5s for f5-mock,
+// iControl REST is slower), error rate < 1%.
+// Any change pushing past these fails the workflow.
 //
-// Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
+// CI gates the run behind workflow_dispatch + cron (NOT per-push — load
+// tests are too slow to gate per-PR signal).
+//
+// Audit references:
+// - API tier: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
+// - Connector tier: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
 
 import http from 'k6/http';
 import { check } from 'k6';
@@ -43,6 +73,18 @@ import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.2/index.js';
 const BASE = __ENV.CERTCTL_BASE || 'https://localhost:8443';
 const TOKEN = __ENV.CERTCTL_TOKEN || 'load-test-token';
 
+// Bundle 10: per-target sidecar URLs. Defaults match the docker-compose
+// stack's internal DNS; operators running k6 manually against a different
+// stack override these via env. Empty default → the corresponding
+// scenario is skipped (the scenarioFor* helper guards).
+const NGINX_TARGET_URL = __ENV.NGINX_TARGET_URL || 'https://nginx-target:443';
+const APACHE_TARGET_URL = __ENV.APACHE_TARGET_URL || 'https://apache-target:443';
+const HAPROXY_TARGET_URL = __ENV.HAPROXY_TARGET_URL || 'https://haproxy-target:443';
+// f5-mock's iControl REST `/healthz` endpoint is the CI-friendly
+// per-handshake probe — hits the path the F5 connector itself uses for
+// reachability. Real F5 BIG-IP also exposes /healthz under /mgmt/.
+const F5_TARGET_URL = __ENV.F5_TARGET_URL || 'https://f5-mock-target:443';
+
 // Demo seed (CERTCTL_DEMO_SEED=true) creates these rows; CreateCertificate
 // requires all four FKs to exist. Pre-baked here so the script has zero
 // dependency on test fixtures beyond the seed.
@@ -82,18 +124,75 @@ export const options = {
       startTime: '5s',
       tags: { scenario: 'list_certificates' },
     },
+
+    // Bundle 10: connector-tier per-target-type handshake scenarios.
+    // 100 conns/min sustained for 5 minutes against each sidecar.
+    // The handshake measurement captures TCP connect + TLS
+    // handshake + tiny HTTP GET (`/` for nginx/apache/haproxy,
+    // `/healthz` for f5-mock); k6's http_req_duration aggregates
+    // all three so the numbers are end-to-end "respond to the
+    // operator's connection" latency, not isolated TLS-handshake
+    // microseconds.
+    nginx_handshake: {
+      executor: 'constant-arrival-rate',
+      rate: 100,
+      timeUnit: '1m',
+      duration: '5m',
+      preAllocatedVUs: 10,
+      maxVUs: 50,
+      exec: 'nginxHandshake',
+      startTime: '10s',
+      tags: { scenario: 'nginx_handshake', target_type: 'nginx' },
+    },
+    apache_handshake: {
+      executor: 'constant-arrival-rate',
+      rate: 100,
+      timeUnit: '1m',
+      duration: '5m',
+      preAllocatedVUs: 10,
+      maxVUs: 50,
+      exec: 'apacheHandshake',
+      startTime: '10s',
+      tags: { scenario: 'apache_handshake', target_type: 'apache' },
+    },
+    haproxy_handshake: {
+      executor: 'constant-arrival-rate',
+      rate: 100,
+      timeUnit: '1m',
+      duration: '5m',
+      preAllocatedVUs: 10,
+      maxVUs: 50,
+      exec: 'haproxyHandshake',
+      startTime: '10s',
+      tags: { scenario: 'haproxy_handshake', target_type: 'haproxy' },
+    },
+    f5_handshake: {
+      executor: 'constant-arrival-rate',
+      rate: 100,
+      timeUnit: '1m',
+      duration: '5m',
+      preAllocatedVUs: 10,
+      maxVUs: 50,
+      exec: 'f5Handshake',
+      startTime: '10s',
+      tags: { scenario: 'f5_handshake', target_type: 'f5' },
+    },
   },
   thresholds: {
-    // Hard floor: 99% of issuance-acceptance requests complete in
-    // under 5 seconds. Pre-fix this was unsubstantiated; post-fix
-    // this is the regression guard. The number isn't aspirational —
-    // it's the worst-acceptable user-facing API SLO from the
-    // operator perspective.
+    // API tier — issuer audit fix #8.
     'http_req_duration{scenario:issuance_acceptance}': ['p(99)<5000', 'p(95)<2000'],
     'http_req_duration{scenario:list_certificates}': ['p(99)<2000', 'p(95)<800'],
-    // < 1% error rate. The k6 default is "any 4xx/5xx counts as
-    // failed"; legitimate 201/200 responses don't count. Auth
-    // failures, validation failures, server errors all do.
+
+    // Bundle 10 connector tier. nginx/apache/haproxy are pure TLS
+    // termination → tight thresholds. f5-mock includes a tiny Go
+    // server response on top of the handshake → slightly looser.
+    'http_req_duration{target_type:nginx}': ['p(99)<3000', 'p(95)<1000'],
+    'http_req_duration{target_type:apache}': ['p(99)<3000', 'p(95)<1000'],
+    'http_req_duration{target_type:haproxy}': ['p(99)<3000', 'p(95)<1000'],
+    'http_req_duration{target_type:f5}': ['p(99)<5000', 'p(95)<1500'],
+
+    // < 1% error rate across ALL scenarios. Auth failures, validation
+    // failures, server errors, connection refused all count.
     'http_req_failed': ['rate<0.01'],
   },
   // Smaller summary payload — strip per-VU metrics we don't read.
@@ -148,16 +247,109 @@ export function listCertificates() {
   });
 }
 
+// --- Bundle 10: connector-tier handshake scenarios ---
+//
+// Each per-target function does a single HTTPS GET against its target
+// sidecar. k6's http_req_duration metric captures TCP connect + TLS
+// handshake + HTTP request/response — that's the end-to-end "connection
+// readiness" latency a deploy connector cares about. The target_type
+// tag groups results in summary.json's connector_tier section.
+//
+// Status-check threshold: any 4xx/5xx counts as failed (k6 default
+// behaviour for http_req_failed). f5-mock's /healthz returns 200; the
+// other three nginx/apache/haproxy default vhost configs all return
+// 200 on `/`.
+//
+// Bundle 10 of the 2026-05-02 deployment-target audit.
+
+export function nginxHandshake() {
+  const res = http.get(`${NGINX_TARGET_URL}/`, {
+    tags: { scenario: 'nginx_handshake', target_type: 'nginx' },
+  });
+  check(res, {
+    'nginx 2xx': (r) => r.status >= 200 && r.status < 300,
+  });
+}
+
+export function apacheHandshake() {
+  const res = http.get(`${APACHE_TARGET_URL}/`, {
+    tags: { scenario: 'apache_handshake', target_type: 'apache' },
+  });
+  check(res, {
+    'apache 2xx': (r) => r.status >= 200 && r.status < 300,
+  });
+}
+
+export function haproxyHandshake() {
+  const res = http.get(`${HAPROXY_TARGET_URL}/`, {
+    tags: { scenario: 'haproxy_handshake', target_type: 'haproxy' },
+  });
+  check(res, {
+    'haproxy 2xx': (r) => r.status >= 200 && r.status < 300,
+  });
+}
+
+export function f5Handshake() {
+  const res = http.get(`${F5_TARGET_URL}/healthz`, {
+    tags: { scenario: 'f5_handshake', target_type: 'f5' },
+  });
+  check(res, {
+    'f5 2xx': (r) => r.status >= 200 && r.status < 300,
+  });
+}
+
 // handleSummary writes the full results to /results/summary.{json,txt}
 // so the operator can commit the baseline numbers into README.md after
 // each run and so CI can ingest the JSON for diffing.
 //
+// Bundle 10 added a `connector_tier` aggregation alongside the API tier
+// — same source data (data.metrics), grouped by target_type tag for
+// per-connector-type p50/p95/p99/error breakdowns. Operators tracking a
+// connector regression diff `connector_tier.<type>` between runs.
+//
 // stdout reproduces the textSummary so the docker compose log shows
 // the same numbers an operator running it manually would see.
 export function handleSummary(data) {
+  const enriched = enrichWithConnectorTier(data);
   return {
-    '/results/summary.json': JSON.stringify(data, null, 2),
+    '/results/summary.json': JSON.stringify(enriched, null, 2),
     '/results/summary.txt': textSummary(data, { indent: ' ', enableColors: false }),
     stdout: textSummary(data, { indent: ' ', enableColors: true }),
   };
 }
+
+// enrichWithConnectorTier appends a connector_tier object to the k6
+// summary data. Each target_type entry contains:
+// { p50, p95, p99, max, avg, error_rate, iterations }
+// Missing tags (e.g. an operator runs only the API tier scenarios) are
+// reported as null so callers can detect them without a separate scan.
+function enrichWithConnectorTier(data) {
+  const targetTypes = ['nginx', 'apache', 'haproxy', 'f5'];
+  const connectorTier = {};
+  for (const t of targetTypes) {
+    const reqDurKey = `http_req_duration{target_type:${t}}`;
+    const reqFailKey = `http_req_failed{target_type:${t}}`;
+    const iterKey = `iterations{target_type:${t}}`;
+
+    const dur = data.metrics[reqDurKey];
+    const fail = data.metrics[reqFailKey];
+    const iters = data.metrics[iterKey];
+
+    if (!dur || !dur.values) {
+      connectorTier[t] = null;
+      continue;
+    }
+    connectorTier[t] = {
+      p50: dur.values['med'] ?? null,
+      p95: dur.values['p(95)'] ?? null,
+      p99: dur.values['p(99)'] ?? null,
+      max: dur.values['max'] ?? null,
+      avg: dur.values['avg'] ?? null,
+      error_rate: fail && fail.values ? (fail.values['rate'] ?? null) : null,
+      iterations: iters && iters.values ? (iters.values['count'] ?? null) : null,
+    };
+  }
+  // Shallow-merge so existing summary fields (data.metrics, data.options,
+  // etc.) stay untouched. The connector_tier key is additive.
+  return Object.assign({}, data, { connector_tier: connectorTier });
+}
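
The enrichWithConnectorTier helper has no k6 dependency, so its output shape can be checked in plain Node. The sketch below copies the function from the diff (formatting only) and feeds it a hand-built summary object. The input values are synthetic, not measurements. One assumption to verify against the k6 documentation: tagged sub-metrics appear in the summary data only when something, typically a threshold, asks k6 to track them. The committed thresholds cover http_req_duration{target_type:...} but not the http_req_failed or iterations variants, so the synthetic input omits those two to show the null fallback.

```javascript
// Copied from the diff; only the formatting differs.
function enrichWithConnectorTier(data) {
  const targetTypes = ['nginx', 'apache', 'haproxy', 'f5'];
  const connectorTier = {};
  for (const t of targetTypes) {
    const dur = data.metrics[`http_req_duration{target_type:${t}}`];
    const fail = data.metrics[`http_req_failed{target_type:${t}}`];
    const iters = data.metrics[`iterations{target_type:${t}}`];
    if (!dur || !dur.values) {
      connectorTier[t] = null;
      continue;
    }
    connectorTier[t] = {
      p50: dur.values['med'] ?? null,
      p95: dur.values['p(95)'] ?? null,
      p99: dur.values['p(99)'] ?? null,
      max: dur.values['max'] ?? null,
      avg: dur.values['avg'] ?? null,
      error_rate: fail && fail.values ? (fail.values['rate'] ?? null) : null,
      iterations: iters && iters.values ? (iters.values['count'] ?? null) : null,
    };
  }
  return Object.assign({}, data, { connector_tier: connectorTier });
}

// Synthetic summary data (milliseconds); made-up values for illustration.
const summaryData = {
  metrics: {
    'http_req_duration{target_type:nginx}': {
      values: { med: 4, 'p(95)': 9, 'p(99)': 15, max: 20, avg: 5 },
    },
  },
};

const enriched = enrichWithConnectorTier(summaryData);
console.log(JSON.stringify(enriched.connector_tier, null, 2));
```

If the per-target error_rate and iterations fields turn out to be null in real runs, adding thresholds on `http_req_failed{target_type:...}` and `iterations{target_type:...}` is one possible way to make k6 emit those sub-metrics.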