Files
certctl/deploy/test/loadtest/README.md
T
shankar0123 bee47f0318 acme-server: cert-manager integration test + production hardening (Phase 5/7)
Closes the production-readiness loop on the ACME surface. After this
commit, certctl ships per-account rate limits + a GC sweeper for
expired ACME state + a kind-driven cert-manager 1.15 integration test
+ a lego-driven RFC conformance harness + a k6 loadtest scenario for
the unauthenticated ACME path.

Architecture:
  - Rate limits live in-memory + per-replica. Restart wipes the
    counters; orders/hour caps are eventual-consistency anyway. A
    3-replica certctl-server fleet behind an LB effectively has 3x
    the configured throughput per account; persistent rate limiting
    is a follow-up if production telemetry shows abuse patterns we
    can't catch in a single restart cycle. Per-key + per-action
    isolation: ActionNewOrder/acc-1, ActionKeyChange/acc-1, and
    ActionChallengeRespond/<challenge-id> are independent buckets.
  - GC loop follows the existing scheduler-loop pattern (atomic.Bool
    + sync.WaitGroup; see crlGenerationLoop for shape). Three
    independent SQL sweeps per tick (DELETE expired nonces; UPDATE
    pending authzs whose expires_at < now() to expired; UPDATE
    pending/ready/processing orders whose expires_at < now() to
    invalid). Each sweep is a single statement; failures are logged-
    and-continued so a failing nonces sweep doesn't block authzs.
    Per-sweep 1m timeout bounds a stuck Postgres.
  - cert-manager integration test is gated on KIND_AVAILABLE so CI
    skips it cleanly (kind is too heavy for per-PR). Operators run
    locally via 'make acme-cert-manager-test'; the harness brings up
    a fresh cluster each run + tears it down on Cleanup.
  - lego conformance harness drives a real ACME client through
    register → run → cert-PEM-landed against a hermetic certctl
    stack. Catches RFC-shape regressions third-party clients would
    hit before they ship.
  - k6 ACME-flow scenario hammers the unauthenticated surface
    (directory + new-nonce + ARI synthetic-id) at 100 VUs × 5m. JWS-
    signed flows are out of scope for k6 (no JWS support); they're
    covered by the lego harness above.

What ships:
  - internal/api/acme/ratelimit.go (+ ratelimit_test.go: 7 cases —
    disable-when-perHour-zero, capacity, per-key isolation, per-
    action isolation, refill-over-time, RetryAfter, concurrent-access
    with -race + 200 goroutines × 200 calls).
  - internal/repository/postgres/acme.go: 4 new methods —
    CountActiveOrdersByAccount + GCExpiredNonces + GCExpireAuthorizations
    + GCInvalidateExpiredOrders. Each a single SQL statement.
  - internal/service/acme.go: SetRateLimiter + GarbageCollect +
    rate-limit gates at 3 entry points (CreateOrder + RotateAccountKey
    + RespondToChallenge) + concurrent-orders gate at CreateOrder.
    2 new sentinels (ErrACMERateLimited, ErrACMEConcurrentOrdersExceeded);
    5 new GC metrics (gc_runs / gc_run_failures / gc_nonces_reaped /
    gc_authzs_expired / gc_orders_invalidated).
  - internal/scheduler/scheduler.go: ACMEGarbageCollector interface +
    acmeGCRunning atomic.Bool + acmeGCInterval + 2 setters (SetACME-
    GarbageCollector + SetACMEGCInterval) + acmeGCLoop following the
    crlGenerationLoop shape.
  - internal/api/handler/acme.go: writeServiceError gains rateLimited
    (429 + RFC 8555 §6.7) + concurrent-orders-exceeded mappings.
  - internal/config/config.go: 5 new env vars
    (CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR=100,
    CERTCTL_ACME_SERVER_RATE_LIMIT_CONCURRENT_ORDERS=5,
    CERTCTL_ACME_SERVER_RATE_LIMIT_KEY_CHANGE_PER_HOUR=5,
    CERTCTL_ACME_SERVER_RATE_LIMIT_CHALLENGE_RESPONDS_PER_HOUR=60,
    CERTCTL_ACME_SERVER_GC_INTERVAL=1m).
  - cmd/server/main.go: NewRateLimiter() + SetRateLimiter() at
    startup; conditional SetACMEGarbageCollector(acmeService) +
    SetACMEGCInterval(cfg.ACMEServer.GCInterval) when Enabled+
    GCInterval > 0.
  - deploy/test/acme-integration/: kind-config.yaml + cert-manager-
    install.sh + clusterissuer-trust-authenticated.yaml +
    clusterissuer-challenge.yaml + certificate-test.yaml + conformance-
    lego.sh + certmanager_test.go (//go:build integration + KIND_AVAILABLE
    gate).
  - deploy/test/loadtest/k6/acme_flow.js + README ACME-flows section.
  - Makefile: 2 new PHONY targets (acme-cert-manager-test +
    acme-rfc-conformance-test).
  - docs/acme-server.md: status flipped to Phase 5; Configuration
    table grows 5 rows; new 'Phase 5 — operational guidance' section
    explaining rate-limit math + GC sweeper semantics + cert-manager
    integration + lego conformance + k6 baseline.

Tests:
  - 'go vet ./...' clean across the repo.
  - 'go test -short -count=1 ./internal/...' green across every
    affected package (service / acme / handler / scheduler / repo /
    config).
  - 'go vet -tags=integration ./deploy/test/acme-integration/' clean
    (the integration test compiles cleanly with the build tag).
  - The kind/cert-manager harness is gated behind KIND_AVAILABLE so
    CI skips by default; operators run locally via 'make acme-cert-
    manager-test'.

Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-5'.
2026-05-03 19:42:03 +00:00

16 KiB
Raw Blame History

certctl Load-Test Harness

Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (cowork/issuer-coverage-audit-2026-05-01/RESULTS.md). Pre-fix, certctl had zero benchmarks or load tests for any API path; an acquirer evaluating "can certctl handle our 50k-cert fleet at 47-day rotation" had nothing to point at. This harness is the substantiation.

What it measures

A k6 driver hits two scenarios in parallel for 5 minutes at a fixed 50 req/s:

  1. POST /api/v1/certificates — the issuance-acceptance hot path. Exercises auth, JSON decode, validation, service.CreateCertificate, and the managed_certificates insert. This is the operator-facing request-acceptance throughput an automation client (Terraform, Crossplane, GitOps controller) would generate.
  2. GET /api/v1/certificates?per_page=50 — the most-trafficked read endpoint. Exercises pagination + filtering on the cert list query.

Latency is reported as avg / min / med / p95 / p99 / max. The error floor is < 1% (any 4xx/5xx counts as failed).

What it explicitly does NOT measure

  • Issuer connector latency. Connector calls (DigiCert, ACME, Vault, AWS ACM PCA, etc.) happen asynchronously via the renewal scheduler. Their latency is pinned by the certctl_issuance_duration_seconds{issuer_type=...} Prometheus histogram (audit fix #4). Driving them through k6 would load-test someone else's API, which is wrong.
  • Full ACME enrollment flow. The audit prompt mentioned ACME-via- pebble; sustained 100/s through a multi-RTT order/challenge/finalize flow requires pebble tuning + crypto helpers k6 doesn't ship out of the box. Deferred to a follow-up.
  • Bulk-revoke / bulk-renew. Those are admin endpoints with their own throughput characteristics and warrant a separate scenario.
  • Scheduler concurrency under bulk renewal. That's audit fix #9's scope; the harness here measures the API tier, not the scheduler.

Threshold contract

Any future change that breaches one of these fails the test:

Scenario p95 p99 Error rate
issuance_acceptance < 2 s < 5 s n/a
list_certificates < 800 ms < 2 s n/a
All requests n/a n/a < 1%

These are the regression guards, not the SLO. The SLO is whatever the operator chooses based on the baseline below.

How to run

From the repo root:

make loadtest

This:

  1. Builds the certctl image from the repo root Dockerfile.
  2. Spins up postgres, the tls-init bootstrap, certctl-server (with CERTCTL_DEMO_SEED=true so the FK rows the script needs exist), and the k6 driver.
  3. Runs the k6 script for ~5 minutes 5 seconds (5s stagger between scenarios + 5m duration).
  4. Prints the summary text to stdout.
  5. Exits non-zero if any threshold was breached.

The full machine-readable summary lands at deploy/test/loadtest/results/summary.json (gitignored). The human-readable summary lands at results/summary.txt.

To run against a server already booted on the host (skip the compose spin-up):

docker run --rm \
  -e CERTCTL_BASE=https://localhost:8443 \
  -e CERTCTL_TOKEN=load-test-token \
  -e K6_INSECURE_SKIP_TLS_VERIFY=true \
  -v "$(pwd)/deploy/test/loadtest/k6.js:/scripts/k6.js:ro" \
  -v "$(pwd)/deploy/test/loadtest/results:/results" \
  --network host \
  grafana/k6:0.54.0 run /scripts/k6.js

Current baseline

The first operator run captures real numbers and commits them into this section. Pre-baseline this section reads "TBD — operator captures on first make loadtest run." The numbers below are the agreed minimum-acceptable thresholds, not the captured baseline; once captured, the baseline goes here as a separate row so future regressions have a diff target.

Scenario p50 p95 p99 Error rate
issuance_acceptance (threshold) < 2 s < 5 s < 1%
issuance_acceptance (baseline)1 2.12 ms 6.19 ms 8.58 ms 0.00%
list_certificates (threshold) < 800 ms < 2 s < 1%
list_certificates (baseline)1 2.12 ms 6.19 ms 8.58 ms 0.00%

Methodology of the sandbox-placeholder capture above:

  • Hardware: Linux/aarch64 unprivileged sandbox (uid 1019, no root, ~1.2 GiB free disk). NOT canonical hardware.
  • Postgres: 14.22 (Ubuntu, native binaries, unix-socket dir /tmp/pg-sock), unix sockets only, port 55432.
  • certctl: built from HEAD via go build -o bin/certctl-server ./cmd/server.
  • Concurrency: 50 req/s sustained per scenario, both scenarios in parallel (= 100 req/s combined).
  • Duration: 10 seconds per scenario (NOT 5 minutes — sandbox bash-call budget is bounded; canonical-hardware run uses 5 minutes).
  • TLS: ECDSA-P256 self-signed localhost cert at /tmp/certctl-tls/.
  • Auth: api-key, single Bearer token (CERTCTL_AUTH_SECRET=load-test-token).
  • Rate limiting: disabled (CERTCTL_RATE_LIMIT_ENABLED=false) — without this, the 100 req/s combined load trips the default token-bucket and drives error rate to ~40%, masking real latency.
  • Encryption: CERTCTL_CONFIG_ENCRYPTION_KEY set (32+ bytes).
  • Captured: 2026-05-02. Total: 1002 requests, 100.15 req/s sustained, 0 failures, 100% checks passed. Raw summary.json is not committed (gitignored per the existing results/ convention).

Methodology pinned at canonical baseline capture (replace placeholder):

  • Hardware: GitHub-hosted ubuntu-latest runner (4 vCPU / 16 GiB / SSD). Run via gh workflow run loadtest.yml; raw summary.json is available for 90 days as a workflow artifact.
  • Postgres: 16-alpine in compose, default config.
  • certctl: image built from this repo at the commit referenced below.
  • Concurrency: 50 req/s sustained per scenario (100 req/s total).
  • Duration: 5 minutes per scenario, 5s stagger.
  • Auth: api-key (Bearer token, single key).
  • Encryption: CERTCTL_CONFIG_ENCRYPTION_KEY set (32+ bytes).

To recapture the baseline after a tuning commit:

make loadtest
# Inspect deploy/test/loadtest/results/summary.txt for the new numbers.
# Update the table above + the methodology line, commit alongside the
# tuning commit.

Interpreting a regression

If a future PR's make loadtest run pushes p99 above the threshold, the make target exits non-zero and CI fails. The summary.txt prints which threshold breached. Triage:

  1. Look at the per-scenario http_req_duration p95 + p99 in summary.json. If only one scenario regressed, the change is localized to that endpoint's hot path.
  2. Look at the iteration_duration per scenario — if total iteration time grew but http_req_duration is flat, the latency is in k6 client setup (rare; suggests something changed in the script).
  3. Compare against the committed baseline. If p99 was 800 ms at baseline and is now 1.5 s but still under the 5 s threshold, the change is below the regression guard but still meaningful — flag in the PR description.

The harness deliberately does NOT auto-tune. Tuning is informed by the data; tuning commits land separately, each with their own captured baseline update.

CI cadence

Defined in .github/workflows/loadtest.yml:

  • workflow_dispatch — manual trigger from the Actions tab. Used before tagging a release or after a meaningful tuning commit.
  • Weekly cron — Mondays at 06:00 UTC. Catches gradual regressions from cumulative changes that no single PR triggered.

The workflow does not run per-push. Load tests are minutes long and would not provide useful per-PR signal; per-push pressure goes through make verify (which is fast) and the deploy-vendor-e2e job.

Connector-tier baseline (Bundle 10 of the 2026-05-02 deployment-target audit)

Bundle 10 extended the harness to cover per-target-type handshake throughput in addition to the API-tier issuance/list throughput documented above. The docker-compose stack now boots four target sidecars (nginx, apache, haproxy, f5-mock) each serving a starter cert from a shared target-tls-init container, and k6 runs four additional scenarios — nginx_handshake, apache_handshake, haproxy_handshake, f5_handshake — at sustained 100 conns/min for 5 minutes against each.

What the connector tier measures

End-to-end TCP connect + TLS handshake + tiny HTTP request/response latency per target type, tagged via the k6 target_type label so summary.json's connector_tier section breaks the numbers out per sidecar:

{
  "connector_tier": {
    "nginx":   { "p50": ..., "p95": ..., "p99": ..., "error_rate": ..., "iterations": ... },
    "apache":  { ... },
    "haproxy": { ... },
    "f5":      { ... }
  }
}

This validates the target sidecar daemons are operational under sustained connection load. Procurement asks "can certctl's nginx target handle 5,000 endpoints at 47-day rotation?" — the connector code's correctness is pinned by per-connector unit tests; the underlying daemon's connection-rate ceiling is what these scenarios pin.

What the connector tier explicitly does NOT measure (v1)

  • The full agent-driven deploy hot path. v1 measures handshake throughput against the sidecars directly. v2 of the harness is a follow-up that POSTs cert requests bound to per-target-type targets, polls the deployments endpoint until the agent reports complete, and measures the full POST → poll → cert-served loop. v2 needs the agent registration + target-binding API surface plumbed end-to-end in the loadtest stack — meaningful work, but not a blocker for the connection- rate procurement question.
  • Kubernetes connector. kind-in-docker requires privileged: true and is operationally fragile in CI. Deferred until Bundle 2 (real k8s.io/client-go) lands and a CI-friendly envtest harness is wired.
  • Real F5 BIG-IP. The harness uses the in-tree f5-mock-icontrol Go server (already used by the deploy-vendor-e2e CI job). Real F5 appliance benchmarking is out of scope; operators with a real F5 vagrant box per docs/connector-f5.md can substitute it manually.

Threshold contract

Defined in k6.js's thresholds block. Any change pushing past these fails the test:

Target type p95 p99 Error rate
nginx < 1 s < 3 s < 1% (global)
apache < 1 s < 3 s < 1% (global)
haproxy < 1 s < 3 s < 1% (global)
f5 < 1.5 s < 5 s < 1% (global)

f5-mock's threshold is looser because the iControl REST handler does slightly more work per request (login+upload+install dance the F5 connector itself drives — not exercised here, but the daemon's request handler is heavier).

Connector-tier captured baseline

Target type p50 p95 p99 Error rate Iterations
nginx (threshold) < 1 s < 3 s < 1% n/a
nginx (baseline) TBD TBD TBD TBD TBD
apache (threshold) < 1 s < 3 s < 1% n/a
apache (baseline) TBD TBD TBD TBD TBD
haproxy (threshold) < 1 s < 3 s < 1% n/a
haproxy (baseline) TBD TBD TBD TBD TBD
f5 (threshold) < 1.5 s < 5 s < 1% n/a
f5 (baseline) TBD TBD TBD TBD TBD

The em-dash placeholders are deliberate: do not commit numeric values without running the loadtest on canonical hardware first. Numbers from a developer laptop are misleading. The first gh workflow run loadtest.yml on a clean GitHub runner captures the baseline; commit the captured numbers into the table above as a follow-up commit alongside the methodology line.

Methodology pinned at baseline capture (canonical hardware):

  • Hardware: GitHub-hosted ubuntu-latest runners (currently 4 vCPU / 16 GiB / SSD-backed). Operator captures from gh workflow run loadtest.yml to keep the hardware constant across runs.
  • Sidecar images: nginx:1.27-alpine, httpd:2.4-alpine, haproxy:2.9-alpine, in-tree f5-mock-icontrol (built from deploy/test/f5-mock-icontrol/).
  • Concurrency: 100 conns/min sustained per target type (400 conns/min total across the four target scenarios + 100 req/s on the API tier).
  • Duration: 5 minutes per scenario, 10s stagger between API tier and connector tier so warmup overlap doesn't skew the first 30 seconds.
  • TLS: starter cert from target-tls-init (ECDSA P-256, multi-SAN). The loadtest scenarios connect with K6_INSECURE_SKIP_TLS_VERIFY=true.

To recapture the connector-tier baseline after a tuning commit affecting target sidecars or the connector code:

make loadtest
# Inspect deploy/test/loadtest/results/summary.json for the
# connector_tier object and update the table above.

Files in this directory

deploy/test/loadtest/
├── README.md         (this file)
├── docker-compose.yml
├── k6.js             (the load script)
├── certs/            (gitignored — tls-init writes here)
├── fixtures/         (Bundle 10: target sidecar configs + shared starter cert)
│   ├── nginx.conf
│   ├── httpd.conf
│   ├── haproxy.cfg
│   └── target-certs/ (gitignored — target-tls-init writes here)
└── results/          (gitignored — k6 writes summary.{json,txt} here)

ACME flows (Phase 5)

The deploy/test/loadtest/k6/acme_flow.js scenario hammers the unauthenticated ACME surface (directory + new-nonce + ARI synthetic lookups) at constant 100 VUs for 5 minutes. JWS-signed paths (new-account / new-order / finalize) are intentionally out of scope: k6 doesn't ship JWS, and bundling lego inside k6 would obscure the underlying-server p95 we're trying to measure. Instead, the make acme-rfc-conformance-test target drives lego against the same stack for the full happy-path conformance gate.

Run it:

cd deploy/test/loadtest
docker compose up -d certctl postgres
k6 run --env CERTCTL_ACME_DIRECTORY=https://localhost:8443/acme/profile/prof-test/directory \
       k6/acme_flow.js

Baseline (ACME flows, 100 VUs × 5m)

The baseline is operator-captured on a workstation-class machine with a single certctl-server container + a single postgres container. Re-capture after schema migrations or transport changes; commit the new numbers so regressions are visible in code review.

Metric Threshold Last captured Notes
directory_duration p95 < 500 ms operator Unauth GET; cache-friendly.
new_nonce_duration p95 < 300 ms operator Single Postgres INSERT under the hood.
renewal_info_duration p95 (synthetic id) < 800 ms operator Synthetic cert-id → 4xx fast path.
http_req_failed rate < 1% operator Should be ~0 — failures here mean transport issues.

Capture command: make loadtest after pointing the compose stack at the ACME flow scenario. Operators with kind / cert-manager available should pair this with make acme-cert-manager-test for end-to-end verification.

Audit references

  • API tier: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
  • Connector tier: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
  • ACME flows: Phase 5 master prompt (cowork/acme-server-prompts/06-phase-5-certmanager-hardening-prompt.md).

  1. Sandbox-aggregate placeholder — captured at HEAD on a Linux/aarch64 unprivileged sandbox (no Docker, no GitHub-hosted runner). Both rows show the same aggregate combined-load numbers because the sandbox run did not break out per-scenario tags in summary.json. Treat these as a sanity floor (proof the API tier handles 100 req/s combined with zero errors and sub-10ms p99), not as the per-scenario baselines the threshold contract is written against. Replace via gh workflow run loadtest.yml on the canonical ubuntu-latest runner — that produces per-scenario tagged metrics in summary.json. ↩︎