mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:51:30 +00:00
acme-server: cert-manager integration test + production hardening (Phase 5/7)
Closes the production-readiness loop on the ACME surface. After this
commit, certctl ships per-account rate limits + a GC sweeper for
expired ACME state + a kind-driven cert-manager 1.15 integration test
+ a lego-driven RFC conformance harness + a k6 loadtest scenario for
the unauthenticated ACME path.
Architecture:
- Rate limits live in-memory + per-replica. Restart wipes the
counters; orders/hour caps are eventual-consistency anyway. A
3-replica certctl-server fleet behind an LB effectively has 3x
the configured throughput per account; persistent rate limiting
is a follow-up if production telemetry shows abuse patterns we
can't catch in a single restart cycle. Per-key + per-action
isolation: ActionNewOrder/acc-1, ActionKeyChange/acc-1, and
ActionChallengeRespond/<challenge-id> are independent buckets.
- GC loop follows the existing scheduler-loop pattern (atomic.Bool
+ sync.WaitGroup; see crlGenerationLoop for shape). Three
independent SQL sweeps per tick (DELETE expired nonces; UPDATE
pending authzs whose expires_at < now() to expired; UPDATE
pending/ready/processing orders whose expires_at < now() to
invalid). Each sweep is a single statement; failures are logged-
and-continued so a failing nonces sweep doesn't block authzs.
Per-sweep 1m timeout bounds a stuck Postgres.
- cert-manager integration test is gated on KIND_AVAILABLE so CI
skips it cleanly (kind is too heavy for per-PR). Operators run
locally via 'make acme-cert-manager-test'; the harness brings up
a fresh cluster each run + tears it down on Cleanup.
- lego conformance harness drives a real ACME client through
register → run → cert-PEM-landed against a hermetic certctl
stack. Catches RFC-shape regressions that third-party clients would
hit, before those regressions ship.
- k6 ACME-flow scenario hammers the unauthenticated surface
(directory + new-nonce + ARI synthetic-id) at 100 VUs × 5m. JWS-
signed flows are out of scope for k6 (no JWS support); they're
covered by the lego harness above.
What ships:
- internal/api/acme/ratelimit.go (+ ratelimit_test.go: 7 cases —
disable-when-perHour-zero, capacity, per-key isolation, per-
action isolation, refill-over-time, RetryAfter, concurrent-access
with -race + 200 goroutines × 200 calls).
- internal/repository/postgres/acme.go: 4 new methods —
CountActiveOrdersByAccount + GCExpiredNonces + GCExpireAuthorizations
+ GCInvalidateExpiredOrders. Each a single SQL statement.
- internal/service/acme.go: SetRateLimiter + GarbageCollect +
rate-limit gates at 3 entry points (CreateOrder + RotateAccountKey
+ RespondToChallenge) + concurrent-orders gate at CreateOrder.
2 new sentinels (ErrACMERateLimited, ErrACMEConcurrentOrdersExceeded);
5 new GC metrics (gc_runs / gc_run_failures / gc_nonces_reaped /
gc_authzs_expired / gc_orders_invalidated).
- internal/scheduler/scheduler.go: ACMEGarbageCollector interface +
acmeGCRunning atomic.Bool + acmeGCInterval + 2 setters
(SetACMEGarbageCollector + SetACMEGCInterval) + acmeGCLoop following
the crlGenerationLoop shape.
- internal/api/handler/acme.go: writeServiceError gains rateLimited
(429 + RFC 8555 §6.7) + concurrent-orders-exceeded mappings.
- internal/config/config.go: 5 new env vars
(CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR=100,
CERTCTL_ACME_SERVER_RATE_LIMIT_CONCURRENT_ORDERS=5,
CERTCTL_ACME_SERVER_RATE_LIMIT_KEY_CHANGE_PER_HOUR=5,
CERTCTL_ACME_SERVER_RATE_LIMIT_CHALLENGE_RESPONDS_PER_HOUR=60,
CERTCTL_ACME_SERVER_GC_INTERVAL=1m).
- cmd/server/main.go: NewRateLimiter() + SetRateLimiter() at
startup; conditional SetACMEGarbageCollector(acmeService) +
SetACMEGCInterval(cfg.ACMEServer.GCInterval) when Enabled+
GCInterval > 0.
- deploy/test/acme-integration/: kind-config.yaml + cert-manager-
install.sh + clusterissuer-trust-authenticated.yaml +
clusterissuer-challenge.yaml + certificate-test.yaml + conformance-
lego.sh + certmanager_test.go (//go:build integration + KIND_AVAILABLE
gate).
- deploy/test/loadtest/k6/acme_flow.js + README ACME-flows section.
- Makefile: 2 new PHONY targets (acme-cert-manager-test +
acme-rfc-conformance-test).
- docs/acme-server.md: status flipped to Phase 5; Configuration
table grows 5 rows; new 'Phase 5 — operational guidance' section
explaining rate-limit math + GC sweeper semantics + cert-manager
integration + lego conformance + k6 baseline.
Tests:
- 'go vet ./...' clean across the repo.
- 'go test -short -count=1 ./internal/...' green across every
affected package (service / acme / handler / scheduler / repo /
config).
- 'go vet -tags=integration ./deploy/test/acme-integration/' clean
(the integration test compiles cleanly with the build tag).
- The kind/cert-manager harness is gated behind KIND_AVAILABLE so
CI skips by default; operators run locally via 'make acme-cert-
manager-test'.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-5'.
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
#
# Phase 5 — install cert-manager 1.15.0 into the kind cluster brought
# up by kind-config.yaml. Idempotent: re-running re-applies the same
# manifest (a no-op when nothing changed), then waits for Available.
#
# Called from: deploy/test/acme-integration/certmanager_test.go
# Standalone: bash deploy/test/acme-integration/cert-manager-install.sh
set -euo pipefail

CERT_MANAGER_VERSION="${CERT_MANAGER_VERSION:-v1.15.0}"
KUBECTL="${KUBECTL:-kubectl}"

echo "Installing cert-manager ${CERT_MANAGER_VERSION}..."
${KUBECTL} apply -f \
  "https://github.com/cert-manager/cert-manager/releases/download/${CERT_MANAGER_VERSION}/cert-manager.yaml"

echo "Waiting for cert-manager deployments to be Available (timeout 5m)..."
${KUBECTL} -n cert-manager wait --for=condition=Available --timeout=5m \
  deployment/cert-manager \
  deployment/cert-manager-cainjector \
  deployment/cert-manager-webhook

echo "cert-manager ${CERT_MANAGER_VERSION} ready."
@@ -0,0 +1,20 @@
# Phase 5 — Certificate resource the integration test applies and
# waits for. The certctl-test-trust ClusterIssuer (trust_authenticated
# mode) issues the cert without any solver round-trip; the resulting
# Secret 'test-com-tls' is asserted to carry a non-empty tls.crt.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-com
  namespace: default
spec:
  secretName: test-com-tls
  commonName: test.example.com
  dnsNames:
    - test.example.com
    - www.test.example.com
  issuerRef:
    name: certctl-test-trust
    kind: ClusterIssuer
  duration: 720h # 30d
  renewBefore: 240h # 10d
@@ -0,0 +1,167 @@
// Copyright (c) certctl
// SPDX-License-Identifier: BSL-1.1

//go:build integration

// Phase 5 — kind-driven cert-manager integration test. Verifies the
// certctl ACME server end-to-end against a real cert-manager 1.15+
// deployment in a kind cluster. The test sequences:
//
//  1. Bring up the kind cluster (kind-config.yaml).
//  2. Install cert-manager 1.15 (cert-manager-install.sh).
//  3. Helm-install certctl-server with acmeServer.enabled=true.
//  4. Apply the ClusterIssuer + Certificate.
//  5. Wait for the Certificate to become Ready.
//  6. Assert the Secret has a non-empty tls.crt.
//
// Gated behind KIND_AVAILABLE — CI doesn't run kind and skips this
// cleanly. Operators run locally via `make acme-cert-manager-test`.

package acmeintegration

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"testing"
	"time"
)

// kindAvailable returns true when the operator opted into the kind-
// driven test path. CI default is opt-out (env unset → skip).
func kindAvailable() bool {
	return os.Getenv("KIND_AVAILABLE") != ""
}

// kindClusterName is the name passed to `kind create/delete cluster`.
// Kept as a const so the test cleanup uses the exact same name as
// setup (avoid orphan-cluster-after-flake).
const kindClusterName = "certctl-acme-test"

// TestCertManagerTrustAuthenticatedIssuance is the happy-path
// integration: cert-manager submits a new-order against a profile in
// trust_authenticated mode; certctl auto-resolves authzs (no solver
// round-trip in this mode); cert-manager finalizes; the Secret lands.
//
// Runtime: ~6-8 minutes wall-clock on a workstation (most of which is
// kind-create + cert-manager-controller-bootstrap, both cached on
// re-runs after the first). Skips cleanly when KIND_AVAILABLE is
// unset.
func TestCertManagerTrustAuthenticatedIssuance(t *testing.T) {
	if !kindAvailable() {
		t.Skip("KIND_AVAILABLE unset — kind-driven cert-manager integration test skipped")
	}
	ctx := context.Background()

	// Register teardown before creation so a cluster that was only
	// partially created (create failed midway) is still deleted.
	// Best-effort: never fail the test on cleanup failure (operator
	// can `kind delete cluster` manually).
	t.Cleanup(func() {
		_ = exec.Command("kind", "delete", "cluster", "--name", kindClusterName).Run()
	})
	t.Log("creating kind cluster")
	runCmd(t, ctx, "kind", "create", "cluster",
		"--name", kindClusterName,
		"--config", "kind-config.yaml")

	t.Log("installing cert-manager")
	runCmd(t, ctx, "bash", "cert-manager-install.sh")

	// Step 3 — deploy certctl-server. The Helm chart at
	// deploy/helm/certctl/ takes acmeServer.enabled=true; the operator
	// is expected to have built + pushed (or kind-loaded) a `:test`
	// image tag before the test runs. Document this in docs/acme-server.md.
	t.Log("helm-installing certctl-test")
	runCmd(t, ctx, "helm", "install", "certctl-test", "../../helm/certctl/",
		"--set", "acmeServer.enabled=true",
		"--set", "acmeServer.defaultProfileId=prof-test",
		"--set", "image.tag=test",
	)
	waitForDeploymentReady(t, ctx, "default", "certctl-test", 3*time.Minute)

	t.Log("applying ClusterIssuer + Certificate")
	runCmd(t, ctx, "kubectl", "apply", "-f", "clusterissuer-trust-authenticated.yaml")
	runCmd(t, ctx, "kubectl", "apply", "-f", "certificate-test.yaml")

	t.Log("waiting for Certificate to become Ready")
	waitForCertificateReady(t, ctx, "default", "test-com", 3*time.Minute)

	t.Log("asserting Secret has tls.crt")
	assertSecretHasCert(t, ctx, "default", "test-com-tls")

	t.Log("happy-path issuance verified end-to-end")
}

// runCmd runs the command; failures fail the test immediately. The
// combined stdout+stderr is logged via t.Logf once the command exits,
// so the operator can read the kubectl/kind output in CI logs (when
// run there with KIND_AVAILABLE=1).
func runCmd(t *testing.T, ctx context.Context, name string, args ...string) {
	t.Helper()
	cmd := exec.CommandContext(ctx, name, args...) //nolint:gosec // ARGS are test-controlled literals.
	out, err := cmd.CombinedOutput()
	if err != nil {
		t.Fatalf("%s %s failed: %v\n%s", name, strings.Join(args, " "), err, out)
	}
	t.Logf("%s %s: %s", name, strings.Join(args, " "), strings.TrimSpace(string(out)))
}

// waitForDeploymentReady polls until the named deployment reports
// Available=True. Wraps `kubectl wait` with a Go-level timeout so test
// hangs are bounded.
func waitForDeploymentReady(t *testing.T, ctx context.Context, namespace, name string, timeout time.Duration) {
	t.Helper()
	cctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	cmd := exec.CommandContext(cctx, "kubectl", "-n", namespace, "wait",
		"--for=condition=Available", fmt.Sprintf("--timeout=%ds", int(timeout.Seconds())),
		"deployment/"+name) //nolint:gosec // ARGS are test-controlled literals.
	out, err := cmd.CombinedOutput()
	if err != nil {
		t.Fatalf("deployment %s/%s did not become Ready in %v: %v\n%s",
			namespace, name, timeout, err, out)
	}
}

// waitForCertificateReady polls until the cert-manager Certificate
// resource transitions to Ready=True. cert-manager's own
// reconciliation loop is what advances the state; this just blocks
// until the controller is happy.
func waitForCertificateReady(t *testing.T, ctx context.Context, namespace, name string, timeout time.Duration) {
	t.Helper()
	cctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	cmd := exec.CommandContext(cctx, "kubectl", "-n", namespace, "wait",
		"--for=condition=Ready", fmt.Sprintf("--timeout=%ds", int(timeout.Seconds())),
		"certificate/"+name) //nolint:gosec // ARGS are test-controlled literals.
	out, err := cmd.CombinedOutput()
	if err != nil {
		// Dump the Certificate's events on failure so the operator
		// can see exactly which reconciliation step failed.
		describe := exec.Command("kubectl", "-n", namespace, "describe", "certificate", name)
		describeOut, _ := describe.CombinedOutput()
		t.Fatalf("certificate %s/%s did not become Ready in %v: %v\n%s\n--- describe ---\n%s",
			namespace, name, timeout, err, out, describeOut)
	}
}

// assertSecretHasCert checks that the named Secret has a non-empty
// tls.crt entry. We don't validate the chain itself here — that's the
// job of certctl's own integration test layer; this just confirms
// cert-manager wrote something into the Secret on the
// trust_authenticated happy-path.
func assertSecretHasCert(t *testing.T, ctx context.Context, namespace, name string) {
	t.Helper()
	cctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	cmd := exec.CommandContext(cctx, "kubectl", "-n", namespace, "get", "secret", name,
		"-o", "jsonpath={.data.tls\\.crt}") //nolint:gosec // ARGS are test-controlled literals.
	out, err := cmd.CombinedOutput()
	if err != nil {
		t.Fatalf("get secret %s/%s: %v\n%s", namespace, name, err, out)
	}
	if len(out) == 0 {
		t.Fatalf("secret %s/%s has empty tls.crt", namespace, name)
	}
}
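`assertSecretHasCert` deliberately stops at "tls.crt is non-empty". If a stricter check is wanted later, the jsonpath output (base64 of the PEM) can be decoded and parsed with the standard library. A sketch under that assumption; `checkTLSCrt` and the `selfSignedB64` demo helper are hypothetical and not part of the shipped test.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/base64"
	"encoding/pem"
	"fmt"
	"math/big"
	"strings"
	"time"
)

// checkTLSCrt decodes the base64 value printed by
// `kubectl get secret NAME -o jsonpath={.data.tls\.crt}`, parses the
// first PEM block as the leaf, and checks it is valid for every name.
func checkTLSCrt(b64 string, names ...string) (*x509.Certificate, error) {
	raw, err := base64.StdEncoding.DecodeString(strings.TrimSpace(b64))
	if err != nil {
		return nil, fmt.Errorf("decode base64: %w", err)
	}
	block, _ := pem.Decode(raw)
	if block == nil || block.Type != "CERTIFICATE" {
		return nil, fmt.Errorf("tls.crt does not start with a CERTIFICATE PEM block")
	}
	leaf, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parse leaf: %w", err)
	}
	for _, name := range names {
		if err := leaf.VerifyHostname(name); err != nil {
			return nil, err
		}
	}
	return leaf, nil
}

// selfSignedB64 makes a throwaway cert in the same base64(PEM) shape,
// so the check can be exercised without a cluster.
func selfSignedB64(names ...string) (string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: names[0]},
		DNSNames:     names,
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})), nil
}

func main() {
	b64, err := selfSignedB64("test.example.com", "www.test.example.com")
	if err != nil {
		panic(err)
	}
	leaf, err := checkTLSCrt(b64, "test.example.com", "www.test.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("leaf covers:", leaf.DNSNames)
}
```

Checking against the Certificate's `dnsNames` (test.example.com, www.test.example.com) would catch an issuer that silently drops a SAN, which a non-empty check cannot.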
@@ -0,0 +1,31 @@
# Phase 5 — sample ClusterIssuer for the certctl challenge auth mode
# (RFC 8555 §8.3 HTTP-01 / §8.4 DNS-01, RFC 8737 TLS-ALPN-01). Use this
# for public-trust-style deployments where per-identifier ownership
# proof is required.
#
# Same bootstrap-root caBundle requirement as the trust_authenticated
# variant — see clusterissuer-trust-authenticated.yaml comments.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: certctl-test-challenge
spec:
  acme:
    email: test@example.com
    # Point at a profile whose certificate_profiles.acme_auth_mode is
    # set to 'challenge'. The certctl operator manages this column
    # per-profile; see certctl/docs/acme-server.md "Per-profile auth
    # mode" section.
    server: https://certctl-test.default.svc.cluster.local:8443/acme/profile/prof-challenge/directory
    caBundle: |
      LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCi4uLgotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    privateKeySecretRef:
      name: certctl-test-challenge-account-key
    solvers:
      # HTTP-01 via the in-cluster ingress-nginx. The cert-manager
      # http-solver pod publishes the key authorization at
      # http://<identifier>/.well-known/acme-challenge/<token>; the
      # certctl HTTP01Validator (Phase 3) fetches it.
      - http01:
          ingress:
            class: nginx
@@ -0,0 +1,42 @@
# Phase 5 — sample ClusterIssuer for the certctl trust_authenticated
# auth mode (RFC 8555 §6 + certctl auth_mode=trust_authenticated, where
# the JWS-authenticated ACME account is trusted to issue any identifier
# the profile policy permits — no per-identifier ownership challenges).
#
# Use this as the starting template for any internal-PKI rollout.
# Replace the caBundle placeholder with the base64-encoded PEM of the
# certctl-server's self-signed bootstrap root, then `kubectl apply`.
#
# Generate the caBundle via:
#   cat deploy/test/certs/ca.crt | base64 -w0
# (See certctl/docs/acme-server.md "TLS trust bootstrap" section for the
# end-to-end walkthrough — this is the single biggest first-time-deploy
# footgun on cert-manager, captured as audit fix #9.)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: certctl-test-trust
spec:
  acme:
    email: test@example.com
    # Replace 'certctl-test' with your release name + adjust the
    # profile path segment. Default profile path:
    #   https://<service>.<namespace>.svc.cluster.local:8443/acme/profile/<profile-id>/directory
    server: https://certctl-test.default.svc.cluster.local:8443/acme/profile/prof-test/directory
    # caBundle: Audit fix #9. cert-manager validates the ACME server's
    # TLS chain before submitting any account/order/finalize. With a
    # self-signed bootstrap root, the ClusterIssuer MUST carry the root
    # explicitly via this field.
    caBundle: |
      LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCi4uLgotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    privateKeySecretRef:
      name: certctl-test-trust-account-key
    solvers:
      # In trust_authenticated mode the solver is unused at the
      # validation step but cert-manager still requires at least one
      # solver in the spec. http01-via-ingress-nginx is the cheapest
      # placeholder shape that round-trips correctly through cert-
      # manager's validation webhooks.
      - http01:
          ingress:
            class: nginx
@@ -0,0 +1,56 @@
#!/usr/bin/env bash
#
# Phase 5 — lego-driven RFC 8555 conformance test. Drives a real ACME
# client (lego v4) against the certctl ACME server in trust_authenticated
# mode and exercises the full happy-path: register → new-order →
# finalize → cert download.
#
# Caller (`make acme-rfc-conformance-test`) brings up the certctl
# docker-compose stack first; this script just runs lego against it.
#
# Exits 1 with a usage hint when CERTCTL_ACME_DIR is unset (the
# operator probably meant to run the make target instead).
set -euo pipefail

if [[ -z "${CERTCTL_ACME_DIR:-}" ]]; then
  echo "CERTCTL_ACME_DIR unset — point at the certctl ACME directory URL"
  echo "  e.g. CERTCTL_ACME_DIR=https://localhost:8443/acme/profile/prof-test/directory"
  exit 1
fi

WORKDIR="$(mktemp -d -t certctl-lego-conf-XXXXXX)"
trap 'rm -rf "${WORKDIR}"' EXIT

# Skip TLS verification — the test stack uses certctl's self-signed
# bootstrap cert. Operators in production use --insecure-skip-verify=false
# and pass --tls-bundle for the real CA.
LEGO_INSECURE="--insecure-skip-verify"

# Step 1: register a fresh account.
echo "==> lego: register account"
lego --server "${CERTCTL_ACME_DIR}" \
  --email conformance@example.com \
  --domains conformance.example.com \
  --path "${WORKDIR}" \
  --accept-tos \
  ${LEGO_INSECURE} \
  register

# Step 2: issue a cert (trust_authenticated mode auto-resolves authzs).
echo "==> lego: run (issue conformance.example.com)"
lego --server "${CERTCTL_ACME_DIR}" \
  --email conformance@example.com \
  --domains conformance.example.com \
  --path "${WORKDIR}" \
  --accept-tos \
  ${LEGO_INSECURE} \
  run

# Step 3: assert the cert PEM landed.
CERT_FILE="${WORKDIR}/certificates/conformance.example.com.crt"
if [[ ! -s "${CERT_FILE}" ]]; then
  echo "FAIL: ${CERT_FILE} is missing or empty"
  exit 1
fi
openssl x509 -in "${CERT_FILE}" -noout -subject -issuer -dates
echo "PASS: lego conformance happy-path completed"
@@ -0,0 +1,34 @@
# Phase 5 — kind-cluster shape for the cert-manager integration test.
#
# Single control-plane + single worker. Host ports 80/443 (ingress-nginx
# for the HTTP-01 solver) and 8443 (certctl ACME server, NodePort 30843)
# are mapped from the control-plane node so the host can curl them.
#
# Used by: deploy/test/acme-integration/certmanager_test.go
# Invoked via: kind create cluster --name certctl-acme-test --config <this file>
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: certctl-acme-test
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      # ingress-nginx HTTP/HTTPS, needed for the challenge-mode solver.
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
      # certctl-server HTTPS (the ACME directory + JWS-authenticated
      # POST surface). Only required for out-of-cluster smoke tests; the
      # in-cluster ClusterIssuer talks via Service DNS.
      - containerPort: 30843
        hostPort: 8443
        protocol: TCP
  - role: worker
@@ -313,7 +313,47 @@ deploy/test/loadtest/
└── results/ (gitignored — k6 writes summary.{json,txt} here)
```

## ACME flows (Phase 5)

The `deploy/test/loadtest/k6/acme_flow.js` scenario hammers the
unauthenticated ACME surface (directory + new-nonce + ARI synthetic
lookups) at constant 100 VUs for 5 minutes. JWS-signed paths
(new-account / new-order / finalize) are intentionally out of scope:
k6 doesn't ship JWS, and bundling lego inside k6 would obscure the
underlying-server p95 we're trying to measure. Instead, the
`make acme-rfc-conformance-test` target drives lego against the same
stack for the full happy-path conformance gate.

Run it:

```
cd deploy/test/loadtest
docker compose up -d certctl postgres
k6 run --env CERTCTL_ACME_DIRECTORY=https://localhost:8443/acme/profile/prof-test/directory \
  k6/acme_flow.js
```

### Baseline (ACME flows, 100 VUs × 5m)

The baseline is operator-captured on a workstation-class machine with
a single certctl-server container + a single postgres container.
Re-capture after schema migrations or transport changes; commit the
new numbers so regressions are visible in code review.

| Metric                                     | Threshold | Last captured | Notes |
|--------------------------------------------|-----------|---------------|-------|
| `directory_duration` p95                   | < 500 ms  | _operator_    | Unauth GET; cache-friendly. |
| `new_nonce_duration` p95                   | < 300 ms  | _operator_    | Single Postgres INSERT under the hood. |
| `renewal_info_duration` p95 (synthetic id) | < 800 ms  | _operator_    | Synthetic cert-id → 4xx fast path. |
| `http_req_failed` rate                     | < 1%      | _operator_    | Should be ~0 — failures here mean transport issues. |

Capture command: `make loadtest` after pointing the compose stack at
the ACME flow scenario. Operators with kind / cert-manager available
should pair this with `make acme-cert-manager-test` for end-to-end
verification.

## Audit references

- API tier: `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` fix #8.
- Connector tier: `cowork/deployment-target-audit-2026-05-02/RESULTS.md` Bundle 10.
- ACME flows: Phase 5 master prompt (`cowork/acme-server-prompts/06-phase-5-certmanager-hardening-prompt.md`).

@@ -0,0 +1,80 @@
// Phase 5 — k6 scenario for the unauthenticated ACME surface. Each VU
// executes directory + new-nonce + a synthetic ARI renewal-info lookup
// against an operator-provided certctl-server. Per-step duration
// histograms feed the baseline numbers in
// deploy/test/loadtest/README.md (ACME flows section).
//
// Default scenario: 100 concurrent VUs for 5 minutes. Override via
// K6_VUS / K6_DURATION env vars.
//
// Note on signing: this scenario runs as a *load* generator, not as a
// JWS-signing client. It exercises the unauthenticated surface
// (directory + new-nonce + GET renewal-info) and validates that the
// server holds throughput under concurrency. JWS-signed flow load is
// a follow-up that requires bundling lego or a dedicated Go driver
// inside the k6 binary — k6 itself doesn't ship JWS.

import http from "k6/http";
import { check, sleep } from "k6";
import { Trend } from "k6/metrics";

const directoryURL =
  __ENV.CERTCTL_ACME_DIRECTORY ||
  "https://certctl:8443/acme/profile/prof-test/directory";

export const options = {
  scenarios: {
    acme_directory_and_nonce: {
      executor: "constant-vus",
      vus: parseInt(__ENV.K6_VUS || "100", 10),
      duration: __ENV.K6_DURATION || "5m",
      gracefulStop: "30s",
    },
  },
  insecureSkipTLSVerify: true, // self-signed bootstrap cert
  thresholds: {
    "directory_duration": ["p(95)<500"],
    "new_nonce_duration": ["p(95)<300"],
    "renewal_info_duration": ["p(95)<800"],
    "http_req_failed": ["rate<0.01"],
  },
};

const directoryDuration = new Trend("directory_duration", true);
const newNonceDuration = new Trend("new_nonce_duration", true);
const renewalInfoDuration = new Trend("renewal_info_duration", true);

// The ARI probe below expects a 4xx. Mark those statuses as expected so
// they don't count toward http_req_failed; otherwise one request in
// every iteration would "fail" and the < 1% threshold could never pass.
const renewalInfoExpected = http.expectedStatuses(400, 404);

export default function () {
  // Step 1 — directory.
  let res = http.get(directoryURL);
  directoryDuration.add(res.timings.duration);
  check(res, { "directory 200": (r) => r.status === 200 });

  if (res.status !== 200) {
    sleep(1); // keep per-VU pacing on failure instead of spinning
    return;
  }
  const dir = res.json();

  // Step 2 — new-nonce.
  if (dir.newNonce) {
    res = http.head(dir.newNonce);
    newNonceDuration.add(res.timings.duration);
    check(res, {
      "new-nonce 200 + Replay-Nonce": (r) =>
        r.status === 200 && !!r.headers["Replay-Nonce"],
    });
  }

  // Step 3 — ARI smoke (with a synthetic cert-id to exercise the
  // error path; the full happy-path needs a real cert, which requires
  // JWS signing — out of scope for this baseline scenario).
  if (dir.renewalInfo) {
    res = http.get(dir.renewalInfo + "/" + "aaaa.bbbb", {
      responseCallback: renewalInfoExpected,
    });
    renewalInfoDuration.add(res.timings.duration);
    // 400 (cert-id rejected as malformed) OR 404 (cert not found).
    check(res, {
      "renewal-info 4xx for synthetic cert-id": (r) =>
        r.status === 400 || r.status === 404,
    });
  }

  sleep(1);
}