mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 21:01:31 +00:00
b452013dd9
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/
and the section-by-section plan in testing-guide-tumor.md.
testing-guide.md was 30% of all docs/ content (8268 lines) but was
integration test code written in markdown, not operator documentation.
The audit's tumor analysis disposed of every Part:
- ~65% DELETE (test cases that already exist in code)
- ~22% MOVE to inline test code
- ~8% KEEP-COMPRESSED into focused operator-runbook docs
- Title + contents + release sign-off ~5% KEEP
This commit ships the KEEP-COMPRESSED dispersal:
docs/contributor/qa-prerequisites.md (NEW, ~120 lines):
From testing-guide.md "Prerequisites" section. Stack boot procedure,
demo data baseline, reference IDs operators reuse across QA docs.
docs/contributor/gui-qa-checklist.md (NEW, ~105 lines):
From testing-guide.md "Part 35: GUI Testing". Manual GUI verification
pass for release sign-off. 25-row table covering every dashboard page.
docs/contributor/release-sign-off.md (NEW, ~130 lines):
From testing-guide.md "Release Sign-Off" section (originally 1009
lines of per-test detail tables). Compressed to a release-day
checklist organized by gate category: code state, automated gates,
manual QA passes, release artefact verification, branch protection,
post-release.
docs/operator/performance-baselines.md (NEW, ~100 lines):
From testing-guide.md "Part 39: Performance Spot Checks". Four
operator-runnable benchmarks (API request handling, inventory list
pagination, scheduler tick, bulk revoke) with baseline numbers and
when-to-re-baseline guidance.
docs/operator/helm-deployment.md (NEW, ~120 lines):
From testing-guide.md "Part 52: Helm Chart Deployment". Operator
runbook for the bundled deploy/helm/certctl/ chart: prereqs,
install, four cert-source patterns, verify, upgrade, troubleshooting.
docs/reference/cli.md (NEW, ~120 lines):
From testing-guide.md "Part 28: CLI Tool". certctl-cli command
reference with command-group breakdown, common workflows
(list/filter, renew, revoke, bulk import, EST enrollment, status),
output formats, CI/CD integration patterns.
docs/README.md navigation index updated to include the 6 new docs:
Reference section gains: cli.md, release-verification.md (was added
in Phase 13)
Operator section gains: helm-deployment.md, performance-baselines.md
Contributor section gains: qa-prerequisites.md, gui-qa-checklist.md,
release-sign-off.md
docs/testing-guide.md deleted. Git history preserves the 8268 lines —
if any specific test case is found missing from inline test code or
the destination docs during future work, lift from `git show
HEAD~1:docs/testing-guide.md`.
Net: docs/ total line count drops by ~7700 lines (28%), from 26,369
to 18,742. testing-guide.md was the single largest doc; pruning it is
the single biggest content-edit win of the entire restructure.
Phase 5 is the last major content phase. Remaining: Phase 4 follow-on
(per-connector page extractions from reference/connectors/index.md),
Phase 15 (WHAT/HOW/WHY remediation), Phase 16 (final acceptance gate).
107 lines
4.1 KiB
Markdown
# Performance Baselines

> Last reviewed: 2026-05-05

Operator-runnable benchmarks for spot-checking certctl performance against published baselines. Useful as a regression detector after upgrades or infrastructure changes.

## Why these specific spots?

certctl's hot paths are dominated by three workloads:

1. **API request handling**: auth, rate-limit decision, route dispatch, DB read
2. **Renewal scheduler**: periodic scan and dispatch
3. **Certificate inventory queries**: large list returns with sparse fields

Baselines #1 through #3 below cover those three workloads. Baseline #4 adds a spot check for bulk revocation.
## Baseline #1: API request handling (single endpoint)

Hit a hot read endpoint in a tight loop and compare against the baseline.

```bash
SERVER=https://localhost:8443
# Left unquoted at the call sites on purpose so it splits into the flag and its path
CACERT="--cacert ./deploy/test/certs/ca.crt"
AUTH="Authorization: Bearer change-me-in-production"

# Warm the connection pool (5 requests, discard timing)
for i in $(seq 1 5); do
  curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary > /dev/null
done

# Measured run: 100 requests; divide the reported real time by 100 for the mean latency
time (for i in $(seq 1 100); do
  curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary > /dev/null
done)
```
**Baseline (M3 MacBook Pro, Docker Desktop):** real time under 5 seconds for 100 sequential requests, which works out to a mean of under 50 ms per request.

If the mean exceeds 100 ms, something is wrong. Likely causes: PostgreSQL connection pool exhaustion, an agent flooding the work-poll endpoint, or a mis-tuned rate limiter.
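The `time` loop above reports only a total, and every `curl` invocation pays for process start-up and a fresh TLS handshake, so the derived mean is an upper bound on server latency. As a minimal sketch, per-request timings can be collected with curl's `-w '%{time_total}'` write-out variable and summarized afterwards. The helper names `summarize_ms` and `sample_endpoint` are hypothetical and not part of certctl; `SERVER`, `CACERT`, and `AUTH` are assumed to be set as in the block above.

```bash
# Hypothetical helpers (not part of certctl): summarize per-request latencies.

# Reads one duration in seconds per line on stdin (the format curl's
# %{time_total} emits) and prints "n=<count> mean_ms=<mean> p50_ms=<median>".
summarize_ms() {
  sort -n | awk '
    { v[NR] = $1 * 1000; sum += v[NR] }
    END {
      if (NR == 0) { print "n=0"; exit 1 }
      # Median: the middle value, or the mean of the two middle values.
      if (NR % 2) { p50 = v[(NR + 1) / 2] }
      else        { p50 = (v[NR / 2] + v[NR / 2 + 1]) / 2 }
      printf "n=%d mean_ms=%.1f p50_ms=%.1f\n", NR, sum / NR, p50
    }'
}

# Issues $1 sequential requests against /api/v1/stats/summary and prints one
# time_total value per line. Assumes SERVER, CACERT, and AUTH are already set.
sample_endpoint() {
  local n=$1 i
  for i in $(seq 1 "$n"); do
    curl -s $CACERT -H "$AUTH" -o /dev/null -w '%{time_total}\n' \
      "$SERVER/api/v1/stats/summary"
  done
}
```

Usage: `sample_endpoint 100 | summarize_ms`. Reporting the mean and the median side by side makes a few slow outliers visible instead of letting them inflate a single number.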
## Baseline #2: Inventory list with cursor pagination

```bash
# Cursor-paginated full inventory walk
# Reuses SERVER, CACERT, and AUTH from Baseline #1
NEXT=""
PAGES=0
START=$(date +%s)
while true; do
  RESP=$(curl -s $CACERT -H "$AUTH" "$SERVER/api/v1/certificates?limit=100&cursor=$NEXT")
  NEXT=$(echo "$RESP" | jq -r '.next_cursor // empty')
  PAGES=$((PAGES + 1))
  # An empty next_cursor means the last page has been read
  [ -z "$NEXT" ] && break
done
END=$(date +%s)
echo "Walked $PAGES pages in $((END - START))s"
```
**Baseline:** for the demo dataset (15 certificates, 1 page), under 1 second total. For a 1000-cert inventory (10 pages of 100), under 3 seconds total, or about 300 ms per page.

If a page takes more than 1 second on a 1000-cert inventory, either the cursor index on `managed_certificates(created_at, id)` is missing or the query plan has gone wrong.
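`date +%s` has one-second resolution, so it cannot distinguish a 200 ms walk from a 900 ms one, which matters for the "under 1 second" threshold. A minimal sketch of a millisecond timer follows; `now_ms` and `elapsed_ms` are hypothetical helper names, and the fallback assumes `perl` is installed (BSD/macOS `date` does not support `%N`).

```bash
# Hypothetical helpers (not part of certctl): millisecond wall-clock timing.

# Prints the current time as whole milliseconds since the Unix epoch.
# Uses bash 5's EPOCHREALTIME when available, otherwise perl's Time::HiRes.
now_ms() {
  if [ -n "${EPOCHREALTIME:-}" ]; then
    # Read once: the value looks like 1700000000.123456 (some locales use ",").
    local t=$EPOCHREALTIME
    local secs=${t%[.,]*} frac=${t#*[.,]}
    # Seconds followed by the first three fractional digits gives milliseconds.
    echo "${secs}${frac%???}"
  else
    perl -MTime::HiRes=time -e 'printf "%d\n", time() * 1000'
  fi
}

# Prints the milliseconds elapsed since the now_ms timestamp passed as $1.
elapsed_ms() {
  echo $(( $(now_ms) - $1 ))
}
```

To use it with the walk above, set `START=$(now_ms)` before the loop and print `$(elapsed_ms "$START")` after it; dividing by `$PAGES` gives the per-page figure to compare against the 300 ms baseline.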
## Baseline #3: Scheduler tick (renewal scan)

The renewal scheduler runs every hour by default. Force a tick and observe the time-to-completion in the logs:

```bash
# Trigger an immediate renewal scan via the admin endpoint
curl -s $CACERT -H "$AUTH" -X POST $SERVER/api/v1/admin/scheduler/run-now/renewal | jq .

# Tail the log and look for the matching `renewal scan complete` line
docker compose logs -f certctl-server | grep 'renewal'
```
**Baseline (15-cert demo dataset):** "renewal scan complete" within 100 ms of the trigger.

For a 1000-cert inventory: under 5 seconds. The dominant cost is resolving each certificate's profile, policy, and alert channel, plus the threshold comparison. If the scan takes more than 10 seconds, profile resolution is likely doing N+1 queries.
## Baseline #4: Bulk revoke

```bash
# Bulk-revoke all certs from a (test) issuer
# Reuses SERVER, CACERT, and AUTH from Baseline #1
CT="Content-Type: application/json"
START=$(date +%s)
curl -s $CACERT -H "$AUTH" -H "$CT" -X POST $SERVER/api/v1/certificates/bulk-revoke \
  -d '{"filter":{"issuer_id":"iss-test"},"reason":"superseded"}' | jq .
echo "Bulk revoke: $(($(date +%s) - START))s"
```
**Baseline:** linear in cert count. For 100 certs from one issuer: under 5 seconds. For 1000 certs: under 30 seconds (dominated by per-cert audit row + per-cert CRL refresh).
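Because the cost is expected to scale linearly, runs of different sizes are easiest to compare as a per-certificate figure. The sketch below does that arithmetic for the two documented ceilings; `per_cert_ms` is a hypothetical helper, not a certctl command.

```bash
# Hypothetical helper (not part of certctl): integer per-certificate cost in ms.
# $1 = number of certificates revoked, $2 = elapsed wall time in ms.
per_cert_ms() {
  local count=$1 elapsed=$2
  if [ "$count" -le 0 ]; then
    echo "per_cert_ms: count must be positive" >&2
    return 1
  fi
  echo $(( elapsed / count ))
}

# The documented ceilings expressed as per-certificate budgets:
per_cert_ms 100 5000     # 100 certs in 5 s   -> prints 50
per_cert_ms 1000 30000   # 1000 certs in 30 s -> prints 30
```

A per-certificate cost that climbs as the batch grows, rather than staying roughly flat, would suggest the operation is no longer scaling linearly.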
## When to re-baseline

Re-run the baselines after any of the following:

- Postgres major-version upgrade
- Go major-version upgrade
- Significant migration (adding a column to `managed_certificates`, adding an index)
- Connection pool config change
- Changing the renewal scheduler interval

Capture timing in `cowork/loadtest-baselines/<date>.md` so future regressions surface against a real baseline rather than the operator's gut feeling.
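Recording a measurement can be scripted so that every run lands in the same place. This is a minimal sketch only: `record_baseline` is a hypothetical helper, and both the ISO `YYYY-MM-DD` file name and the one-bullet-per-measurement layout are assumptions, since the file format is not prescribed here.

```bash
# Hypothetical helper (not part of certctl): append one measurement to
# <dir>/<YYYY-MM-DD>.md, creating the directory and a file header as needed.
# $1 = directory, $2 = benchmark name, $3 = measured value (free-form text).
# Prints the path of the file that was written.
record_baseline() {
  local dir=$1 name=$2 value=$3
  local today file
  today=$(date +%Y-%m-%d)
  file="$dir/$today.md"
  mkdir -p "$dir" || return 1
  if [ ! -f "$file" ]; then
    printf '# Load-test baselines %s\n\n' "$today" > "$file"
  fi
  # "--" stops printf from parsing the leading "-" of the format as an option.
  printf -- '- %s: %s\n' "$name" "$value" >> "$file"
  echo "$file"
}
```

Usage: `record_baseline cowork/loadtest-baselines "API mean (100 requests)" "48 ms"`. Appending rather than overwriting keeps all four baselines from one session in a single dated file.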
## Related docs

- [`docs/contributor/ci-pipeline.md`](../contributor/ci-pipeline.md): CI guard for performance regression
- [`docs/operator/security.md`](security.md): rate limit tuning
- [`docs/reference/architecture.md`](../reference/architecture.md): request path through handler → service → repository