ci: post-Phase-2-docs-overhaul cleanup of stale guards + missing config doc

CI run on the ecb8896 push surfaced two real failures rooted in the
2026-05-04 docs overhaul:

  1. G-3 env-docs-drift caught two phantom CERTCTL_* env vars I'd
     introduced in the Phase 4 follow-on connector pages
     (CERTCTL_CA_CERT_PATH_NEW in adcs.md was a placeholder I made
     up; CERTCTL_EJBCA_POLL_MAX_WAIT_SECONDS in ejbca.md does not
     exist in source). Both removed.

  2. QA-doc Part-count drift guard tried to grep
     docs/qa-test-guide.md and docs/testing-guide.md, both of which
     were renamed/deleted in Phase 2/Phase 5. The Part-count drift
     class died with testing-guide.md (Phase 5 prune dispersed its
     content); the seed-count drift class is still live but pointed
     at the wrong path.

Fixes:

- Removed the QA-doc Part-count drift guard from ci.yml (premise
  dead) plus its standalone scripts/qa-doc-part-count.sh peer.
- Retargeted the QA-doc seed-count drift guard from
  docs/qa-test-guide.md → docs/contributor/qa-test-suite.md (the
  Phase 2 target). Updated both ci.yml inline copy and
  scripts/qa-doc-seed-count.sh.
- Updated Makefile qa-stats: target to drop the testing-guide.md
  Parts metric (file is gone).
- Updated Makefile verify-docs: target to drop the part-count step.

G-3 was also failing in the second direction (env vars defined in
config.go but never documented anywhere). 16 vars surfaced —
features.md (deleted Phase 6) and testing-guide.md (deleted Phase 5)
had been their canonical home. Created
docs/reference/configuration.md as the new home: a compact
operator-facing env-var reference covering scheduler intervals, job
lifecycle, rate limiting, audit, deploy verify, database,
agent-side, and SCEP profile binding. Added to docs/README.md
Reference table.

Doc-side updates to qa-test-suite.md to reframe its references to
the deleted testing-guide.md (it's now self-contained: the
Part-by-Part Coverage Map IS the canonical Part inventory).

Cosmetic comment-only updates in ci.yml + scripts/ci-guards/*.sh +
scripts/dev-setup.sh to point at the new audience-organized doc
paths (docs/operator/security.md, docs/operator/tls.md,
docs/reference/architecture.md, etc.) instead of the pre-Phase-2
flat layout.

Verified: all 24 ci-guards/*.sh pass locally; qa-doc-seed-count.sh
clean. Net diff: 178 additions / 112 deletions across 13 files.
One file deleted (qa-doc-part-count.sh) and one file added
(docs/reference/configuration.md).
This commit is contained in:
shankar0123
2026-05-05 04:56:26 +00:00
parent ecb8896b1c
commit 3275f9f1e0
14 changed files with 178 additions and 112 deletions
+22 -32
View File
@@ -79,7 +79,7 @@ jobs:
# does call, this step fails the build until either upstream
# ships a fix OR we cut the dep. Deferred-call advisories that
# legitimately can't be remediated yet should be added to the
# NIST SSDF deviation log in docs/security.md, not silenced here.
# NIST SSDF deviation log in docs/operator/security.md, not silenced here.
run: govulncheck ./...
- name: Install staticcheck (Bundle-7 / D-001)
@@ -135,48 +135,38 @@ jobs:
GITHUB_REPOSITORY: ${{ github.repository }}
run: bash scripts/coverage-pr-comment.sh
# Bundle P / Strengthening #6 — QA-doc drift guards. Forces every PR
# that adds a Part to docs/testing-guide.md OR a seed row to
# migrations/seed_demo.sql to keep docs/qa-test-guide.md in sync. This
# eliminates the doc-drift class structurally — the symptom Bundle I
# had to clean up by hand becomes a CI-time error going forward.
- name: QA-doc Part-count drift guard
run: |
set -e
DOC_PARTS=$(grep -oE '49 of [0-9]+ Parts' docs/qa-test-guide.md | grep -oE '[0-9]+' | tail -1)
GUIDE_PARTS=$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md)
if [ -z "$DOC_PARTS" ]; then
echo "::error::Could not extract Part count from docs/qa-test-guide.md headline."
echo " Expected pattern: '49 of <N> Parts'"
exit 1
fi
if [ "$DOC_PARTS" != "$GUIDE_PARTS" ]; then
echo "::error::DRIFT — qa-test-guide.md headline claims $DOC_PARTS Parts; testing-guide.md has $GUIDE_PARTS Parts."
echo " Update docs/qa-test-guide.md to match. Bundle I patched this once;"
echo " Bundle P added this guard so the drift cannot recur silently."
exit 1
fi
echo "QA-doc Part-count drift guard: clean ($DOC_PARTS == $GUIDE_PARTS)."
# Bundle P / Strengthening #6 — QA-doc seed-count drift guard. Forces
# every PR that adds a seed row to migrations/seed_demo.sql to keep
# docs/contributor/qa-test-suite.md::Seed Data Reference in sync.
#
# Phase 5 of the 2026-05-04 docs overhaul (commit c64777f) deleted
# docs/testing-guide.md (its content dispersed across the new
# audience-organized doc tree); the previous QA-doc Part-count drift
# guard tracked Part counts between testing-guide.md and the old
# qa-test-guide.md headline. With testing-guide.md gone, that guard's
# premise is dead and it has been removed. The seed-count drift class
# is still live: qa-test-suite.md::Seed Data Reference enumerates
# certs/issuers and seed_demo.sql is the source of truth.
- name: QA-doc seed-count drift guard
run: |
set -e
DOC=docs/contributor/qa-test-suite.md
# Seed-cert count: agnostic to documented header format. The current
# documented count lives in `### Certificates (32 total in ...` —
# extract the first integer in that header.
DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' "$DOC" | grep -oE '[0-9]+' | head -1)
# Authoritative count: unique mc-* IDs in seed_demo.sql.
SEED_CERTS=$(grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
if [ -z "$DOC_CERTS" ]; then
echo "::warning::Could not extract documented cert count from docs/qa-test-guide.md."
echo "::warning::Could not extract documented cert count from $DOC."
echo " Skipping cert-count drift check (header format may have changed)."
elif [ "$DOC_CERTS" != "$SEED_CERTS" ]; then
echo "::error::DRIFT — qa-test-guide.md says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
echo " Update docs/qa-test-guide.md::Seed Data Reference to match."
echo "::error::DRIFT — $DOC says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
echo " Update $DOC::Seed Data Reference to match."
exit 1
fi
# Issuers: seed-table count vs doc claim.
DOC_ISS=$(grep -oE '### Issuers \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
DOC_ISS=$(grep -oE '### Issuers \([0-9]+' "$DOC" | grep -oE '[0-9]+' | head -1)
# Authoritative: unique iss-* IDs (close enough proxy; the issuers
# table count IS the unique-ID count for this prefix).
SEED_ISS=$(grep -oE 'iss-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
@@ -186,7 +176,7 @@ jobs:
# Allow up to 5pp slack — iss-* IDs appear in audit_events and
# other reference tables that aren't issuer-table rows. Drift
# only flags when the spread grows large.
echo "::error::DRIFT — qa-test-guide.md says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
echo "::error::DRIFT — $DOC says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
exit 1
fi
echo "QA-doc seed-count drift guard: clean."
@@ -209,7 +199,7 @@ jobs:
# 167 legitimate tests for no observable behavior change. The
# Test<Func>_<Scenario>_<ExpectedResult> form remains documented as
# the recommended pattern for parameterized scenarios in
# docs/qa-test-guide.md, but is not gated.
# docs/contributor/qa-test-suite.md, but is not gated.
- name: Regression guards (extracted to scripts/ci-guards/)
# All named regression guards live at scripts/ci-guards/<id>.sh per
# ci-pipeline-cleanup bundle Phase 1. Each guard is callable locally:
@@ -289,7 +279,7 @@ jobs:
# HTTPS-Everywhere (v2.0.47): the chart fails render when no TLS source is
# configured. Every lint/template invocation below must pick exactly one
# provisioning mode — see deploy/helm/certctl/templates/_helpers.tpl
# (certctl.tls.required) and docs/tls.md.
# (certctl.tls.required) and docs/operator/tls.md.
- name: Lint Helm Chart
run: |
helm lint deploy/helm/certctl/ \
+17 -12
View File
@@ -119,15 +119,18 @@ verify:
@echo ""
@echo "verify: PASS — safe to commit"
# verify-docs: pre-tag gate. Runs the QA-doc Part-count + seed-count
# drift guards that ci-pipeline-cleanup Phase 11 / frozen decision 0.13
# moved out of CI (was per-push blocking; now operator-runs pre-tag).
# These guards protect docs/qa-test-guide.md headlines from drifting
# vs the underlying source-of-truth (testing-guide Part count, seed
# row count). Operator-facing docs only — not product-affecting.
# verify-docs: pre-tag gate. Runs the QA-doc seed-count drift guard
# that ci-pipeline-cleanup Phase 11 / frozen decision 0.13 moved out
# of CI (was per-push blocking; now operator-runs pre-tag). Protects
# docs/contributor/qa-test-suite.md::Seed Data Reference from
# drifting vs migrations/seed_demo.sql. Operator-facing docs only —
# not product-affecting.
#
# The QA-doc Part-count drift guard retired in the 2026-05-04 docs
# overhaul Phase 5 when docs/testing-guide.md was pruned (its content
# dispersed across the audience-organized doc tree); the Part-count
# class no longer exists outside the qa_test.go file itself.
verify-docs:
@echo "==> QA-doc Part-count drift"
@bash scripts/qa-doc-part-count.sh
@echo "==> QA-doc seed-count drift"
@bash scripts/qa-doc-seed-count.sh
@echo ""
@@ -263,9 +266,12 @@ frontend-build:
@echo "Frontend build complete"
# QA Suite Stats — Bundle P / Strengthening #8.
# Single source-of-truth for every count claim in docs/qa-test-guide.md +
# docs/testing-guide.md. The Strengthening #6 CI drift guards consume the
# same numbers, eliminating the doc-drift class structurally.
# Single source-of-truth for every count claim in
# docs/contributor/qa-test-suite.md. The Strengthening #6 CI drift guards
# (now scoped to the seed-count class only — the Part-count class retired
# in the 2026-05-04 docs overhaul Phase 5 when testing-guide.md was
# pruned) consume the same numbers, eliminating the doc-drift class
# structurally.
qa-stats:
@echo "=== certctl QA Suite Stats ==="
@echo "Date: $$(date +%Y-%m-%d)"
@@ -278,7 +284,6 @@ qa-stats:
@echo "Fuzz targets: $$(grep -rE 'func Fuzz[A-Z]' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
@echo "t.Skip sites: $$(grep -rE 't\.Skip(Now|f)?\(' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
@echo "qa_test.go Part_ subtests: $$(grep -cE 't\.Run\(\"Part[0-9]+_' deploy/test/qa_test.go 2>/dev/null || echo 0)"
@echo "testing-guide.md Parts: $$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md 2>/dev/null || echo 0)"
@echo "Seed unique mc-* IDs: $$(grep -oE "mc-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
@echo "Seed unique ag-* IDs: $$(grep -oE "ag-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (incl. agent_groups; agents-table count is 12)"
@echo "Seed unique iss-* IDs: $$(grep -oE "iss-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (issuers table count is 13)"
+1
View File
@@ -29,6 +29,7 @@ You're operating certctl in production or building integrations and need authori
| [Architecture](reference/architecture.md) | System design, data flow, security model, deployment topologies |
| [API](reference/api.md) | OpenAPI 3.1 spec, integration patterns, client SDK generation |
| [CLI](reference/cli.md) | certctl-cli command reference and CI/CD integration patterns |
| [Configuration](reference/configuration.md) | `CERTCTL_*` environment variable reference (scheduler, rate limits, deploy verify, audit, agent) |
| [MCP server](reference/mcp.md) | Model Context Protocol integration for AI assistants |
| [Release verification](reference/release-verification.md) | Cosign / SLSA / SBOM verification procedure |
| [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md) | Multi-level CA tree management — RFC 5280 §3.2/§4.2.1.9/§4.2.1.10 enforcement |
+16 -17
View File
@@ -4,13 +4,13 @@
> **Audience:** Anyone running release QA for certctl — whether you're a first-time contributor or the maintainer cutting a release tag.
>
> **Companion to:** `docs/testing-guide.md` (the *what* to test). This document explains the *how* — the automated test file, what it covers, what it skips, and how to fill the gaps manually.
> **Self-contained.** Through 2026-05-04 this doc was a companion to a separate `docs/testing-guide.md` (the *what* to test) — that companion was pruned during the Phase 5 docs overhaul (its content dispersed across the audience-organized doc tree). The Part-by-Part Coverage Map below is now the canonical inventory of QA Parts.
---
## Test Suite Health (regenerate via `make qa-stats`)
> Snapshot at HEAD. Re-run `make qa-stats` to refresh; CI's QA-doc drift guards (`.github/workflows/ci.yml`) catch out-of-date Part / cert / issuer counts on every PR. **Last regenerated: 2026-04-27 (Bundle P).**
> Snapshot at HEAD. Re-run `make qa-stats` to refresh; the QA-doc seed-count drift guard (`.github/workflows/ci.yml::QA-doc seed-count drift guard`) catches out-of-date cert / issuer counts on every PR. The Part-count drift guard retired in the 2026-05-04 docs overhaul Phase 5 (testing-guide.md was pruned; Part counts are now tracked inside `qa_test.go` itself, not against an external doc). **Last regenerated: 2026-04-27 (Bundle P).**
| Metric | Value | Target | Status |
|---|---|---|---|
@@ -20,23 +20,22 @@
| Frontend test files | 38 | n/a | |
| Fuzz targets | 11 | ≥10 (one per hand-rolled parser) | ✓ |
| `t.Skip` sites | 60 | each carries valid rationale (Bundle O audit) | ✓ |
| `qa_test.go` Part_* subtests | 53 | tracks `testing-guide.md` Parts (3 `## Part 15-17` covered indirectly via Parts 4246) | ✓ |
| `testing-guide.md` Parts | 56 | n/a | |
| `qa_test.go` Part_* subtests | 53 | covers 49 of 56 historical QA Parts directly + Parts 1517 indirectly via Parts 4246 | ✓ |
| Existential cluster line cov (post-Bundle-J + L.B + Bundle 0.7) | acme 55.6%, stepca 90.4%, local-issuer ≥86%, crypto ≥85% | ≥95% | △ ACME below; tracked in `coverage-matrix.md` |
| Mutation kill rate (Existential) | unmeasured (operator-runnable per Strengthening #5) | ≥90% | ⚠ |
| Race detector clean (`-count=10`) | partial (`-count=3` clean per Phase 0) | 0 races | ⚠ |
## What Is This File?
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates as much of `docs/testing-guide.md` as possible against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates the historical QA Part inventory (preserved in the Part-by-Part Coverage Map below) against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
It covers **49 of 56 Parts** of the testing guide as automation; the remaining 7 are
either manual-only by design or pending QA-suite coverage:
- **49 `Part_*` automation wrappers**, **~159 leaf subtests** — API calls, database queries, source file checks, performance benchmarks
- **11 fully skipped Parts** — with documented reasons (external CAs, Windows, browser-only, etc.) — see "What This Test Does NOT Cover" below
- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually per `docs/testing-guide.md` until QA-suite automation lands
- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human following `docs/testing-guide.md`
- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually until QA-suite automation lands; the Part-by-Part Coverage Map below describes the surface area each Part covers
- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human (Coverage Map below describes each)
## Architecture
@@ -149,8 +148,8 @@ This table shows what each Part tests and what's left for manual verification.
| 20 | Post-Deployment Verification | 1 | 404 on nonexistent job verification | TLS probing, fingerprint comparison |
| 21 | EST Server | 2 | CACerts (200 + content-type), CSRAttrs (200/204) | simpleenroll with CSR, simplereenroll, PKCS#7 parsing |
| 22 | Certificate Export | 3 | PEM export, PKCS#12 export, 404 on nonexistent | Download mode, file content validation |
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually per `docs/testing-guide.md::Part 23` |
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually per `docs/testing-guide.md::Part 24` |
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually — see the Coverage Map row |
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually — see the Coverage Map row |
| 25 | Certificate Discovery | 5 | List discovered, summary, list scan targets, create target, invalid CIDR 400 | Agent filesystem scan, claim/dismiss workflow |
| 26 | Enhanced Query API | 4 | Sort descending, cursor pagination, time-range filter, invalid sort field | Field projection correctness, cursor token cycling |
| 27 | Request Body Size Limits | 1 | 2MB body rejected (413/400) | Exact limit boundary (1MB) |
@@ -180,12 +179,12 @@ This table shows what each Part tests and what's left for manual verification.
| 52 | Helm Chart | 5 | Chart.yaml, values.yaml, 4 templates exist, securityContext, health probes | `helm template` rendering, `helm install` |
| 53 | Kubernetes Secrets Target Connector (M47) | 18 | Config validation (namespace DNS-1123, secret name DNS subdomain, label keys, required fields), deployment (create/update Secret, chain concatenation, error propagation), validation (serial comparison, not-found, empty cert) | GUI target wizard KubernetesSecrets fields (namespace, secret_name, labels, kubeconfig_path), Helm RBAC toggle, TargetDetailPage type label |
| 54 | AWS ACM Private CA Issuer Connector (M47) | 23 | Config validation (region, CA ARN regex, signing algorithm whitelist, validity_days, defaults), issuance (full flow, empty CSR, errors), renewal (reuses issuance), revocation (reason mapping, default, errors), GetOrderStatus completed, GetCACertPEM (success/chain/error), GetRenewalInfo nil | GUI issuer wizard AWSACMPCA fields (region, ca_arn, signing_algorithm, validity_days, template_arn), seed data visibility, create issuer flow |
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually per `docs/testing-guide.md::Part 55` |
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually per `docs/testing-guide.md::Part 56` |
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually — see the Coverage Map row |
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually — see the Coverage Map row |
**Totals (verified 2026-04-27):** 49 `Part_*` automation wrappers, ~159 leaf subtests, 11 fully
skipped Parts, 4 Parts not yet automated (23, 24, 55, 56), and an unspecified count of manual-only
flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE '^## Part [0-9]+:' docs/testing-guide.md`
flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go` to count Part_* automation wrappers
and `grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go` to re-verify.
## Coverage by Risk Class
@@ -194,14 +193,14 @@ A buyer's QA lead reading this doc wants "where are the existential bugs caught?
| Risk class | Description | Parts in scope | Automation status |
|---|---|---|---|
| **Existential** (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in `testing-guide.md`) |
| **Existential** (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in the Coverage Map below) |
| **High** (FSM corruption, credential leak, authn/z weakening) | Renewal, jobs, agents, issuers, deployment, scheduler | 4, 7, 8, 9, 18, 19, 20, 22, 25, 28, 29, 32, 33, 48, 49, 55, 56 | 14/17 automated; CLI / MCP / scheduler-loop are inherently SKIP (require compiled binaries / Docker logs); Parts 55 + 56 pending |
| **Medium** (Operational pain or silent data drift) | Targets, notifiers, observability, error handling, performance, regression | 14, 15-17, 30, 31, 38, 39, 40, 41, 42, 43, 44, 45, 46 | 14/14 automated (15-17 indirect via Parts 4246) |
| **Low** (Hygiene) | Documentation, docs verification | 40 (Documentation), 50 (Onboarding) | 2/2 automated |
| **Frontend** (XSS, render correctness, mutation contracts) | GUI testing | 35, 36-37 | 0/3 automated in this suite (Vitest covers separately under `web/`); this doc punts to manual + Vitest |
| **Compliance** (PCI / SOC2 / HIPAA-relevant) | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
This is the table acquisition reviewers screenshot for their report. When a new Part lands in `testing-guide.md`, classify it here; the QA-doc Part-count drift guard (`.github/workflows/ci.yml::QA-doc Part-count drift guard`) catches the count mismatch.
This is the table acquisition reviewers screenshot for their report. When a new Part_* subtest lands in `qa_test.go`, classify it here.
## Test Categories
@@ -233,11 +232,11 @@ Timed API requests with threshold assertions:
## What This Test Does NOT Cover
These gaps must be filled by manual testing per `docs/testing-guide.md`:
These gaps must be filled by manual testing — see each Coverage Map row for surface-area description:
### Not Yet Automated (Parts 23, 24, 55, 56)
These Parts are documented in `docs/testing-guide.md` but have no `Part_*` automation
These historical QA Parts are listed in the Coverage Map below but have no `Part_*` automation
in `qa_test.go` yet. They are operator-runnable from the manual playbook; QA-suite
automation should land before the next acquisition-grade release.
@@ -431,7 +430,7 @@ grep -oE 'mutation score is [0-9.]+' tool-output/mutation-crypto.txt | tail -1
When a new feature ships:
1. **Add a Part section** in `qa_test.go` following the numbering in `docs/testing-guide.md`
1. **Add a Part section** in `qa_test.go` following the numbering convention in the Coverage Map below
2. **API tests**: use `c.get()`, `c.post()`, `c.bodyStr()`, `c.getJSON()`, `c.timedGet()`
3. **Source checks**: use `fileExists(t, "relative/path")` and `fileContains(t, "path", "substring")`
4. **DB checks**: use `openQADB(t)` and `db.queryInt(t, "SELECT ...")`
+98
View File
@@ -0,0 +1,98 @@
# Configuration Reference
> Last reviewed: 2026-05-05
Compact reference for `CERTCTL_*` environment variables consumed by
`certctl-server` and `certctl-agent`. Most operators don't need to
touch these — defaults are tuned for the common case. Reach for them
when the system's behaviour needs tuning beyond what's exposed in the
GUI / API.
This page enumerates the operator-tunable knobs that don't have a
dedicated home elsewhere. Connector-specific env vars are documented
on the per-connector pages under
[`docs/reference/connectors/`](connectors/index.md). Protocol env
vars (ACME server, EST, SCEP) are documented under
[`docs/reference/protocols/`](protocols/). TLS env vars are
documented in [`docs/operator/tls.md`](../operator/tls.md).
## Scheduler intervals
The scheduler runs N background loops; intervals are tunable for
performance / contention tuning.
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL` | `2m` | How often the agent-health loop scans for stale heartbeats and transitions agents to `Unhealthy` / `Offline`. |
| `CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL` | `30s` | How often the job-processor loop dispatches `Pending` jobs to agents. |
| `CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL` | `1m` | How often the notification-dispatcher loop fans out queued alerts to channels. |
| `CERTCTL_SHORT_LIVED_EXPIRY_CHECK_INTERVAL` | `5m` | How often the short-lived-expiry loop watches certs whose TTL is less than 1h for imminent expiry. |
For the full scheduler topology (12 loops, 8 always-on + 4 opt-in)
see [`architecture.md`](architecture.md) "Scheduler topology".
## Job lifecycle
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_JOB_AWAITING_CSR_TIMEOUT` | `24h` | How long a job stays in `AwaitingCSR` before the scheduler marks it `Failed` (the agent never picked it up). |
## Rate limiting
The control plane API is rate-limited by default; tune for
high-volume environments (mass-rotation events, bulk imports).
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_RATE_LIMIT_ENABLED` | `true` | Master toggle. Disable only for trusted-network single-tenant deploys where the API is firewall-protected. |
| `CERTCTL_RATE_LIMIT_PER_USER_RPS` | `0` (= use global default) | Per-user requests-per-second cap. Zero opts each user into the global default in `internal/api/middleware`. |
| `CERTCTL_RATE_LIMIT_PER_USER_BURST` | `0` (= use global default) | Per-user token-bucket burst size. Same opt-in semantics. |
## Audit trail
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_AUDIT_FLUSH_TIMEOUT_SECONDS` | `30` | How long the audit-event flush worker waits for the buffered batch to drain before forcing a flush at shutdown. |
## Deploy verification
The deploy-hardening primitive wraps every cert deploy in
atomic-write + post-verify + rollback. These env vars tune the
post-deploy TLS verification phase.
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_VERIFY_DEPLOYMENT` | `true` | Master toggle for post-deploy TLS verify. Disable only for connectors / environments where the verify endpoint is not reachable from the agent. |
| `CERTCTL_VERIFY_DELAY` | `2s` | How long to wait after the reload command completes before the first verify-handshake attempt (gives the daemon time to pick up new keys). |
| `CERTCTL_VERIFY_TIMEOUT` | `10s` | Per-attempt TLS-handshake timeout. |
| `CERTCTL_DEPLOY_BACKUP_RETENTION` | `3` | How many `.certctl-bak.<unix-nanos>.<ext>` rollback snapshots to keep per target after a successful deploy. `0` uses the default of 3; `-1` opts out of pruning entirely. |
For the full deploy contract see
[`deployment-model.md`](deployment-model.md).
## Database
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_DATABASE_MIGRATIONS_PATH` | `./migrations` | Filesystem path to the `*.up.sql` / `*.down.sql` migration set. Override only when running `certctl-server` from a non-standard layout. |
## Agent
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_AGENT_ID` | (none — required) | The agent's unique ID, issued by `POST /api/v1/agents/register` and bundled into the agent's registration response. Pass via this env var when the agent runs as a systemd unit / container without the `-agent-id` CLI flag. |
## SCEP profile binding (single-profile back-compat)
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_SCEP_PROFILE_ID` | (empty) | Optional certificate profile ID for the legacy single-profile SCEP path. The multi-profile path uses `CERTCTL_SCEP_PROFILES=<list>` + `CERTCTL_SCEP_PROFILE_<NAME>_PROFILE_ID` instead — see [`scep-server.md`](protocols/scep-server.md). |
## Related references
- [`architecture.md`](architecture.md) — scheduler topology, system design, security model
- [`deployment-model.md`](deployment-model.md) — atomic write + verify + rollback contract
- [`operator/security.md`](../operator/security.md) — full security posture (auth, rate limits, encryption at rest)
- [`operator/tls.md`](../operator/tls.md) — control-plane TLS env vars
- Per-connector pages under [`reference/connectors/`](connectors/index.md) for connector-specific config
- Per-protocol pages under [`reference/protocols/`](protocols/) for ACME / SCEP / EST / CRL+OCSP / async-CA polling
+4 -3
View File
@@ -87,10 +87,11 @@ When the certctl sub-CA cert is approaching expiry:
1. Generate a new keypair (re-keying is recommended at sub-CA
rotation time).
2. CSR + ADCS signing cycle as above.
3. Stage the new cert and key at fresh paths
(`CERTCTL_CA_CERT_PATH_NEW` etc.) and follow the
3. Stage the new cert and key at fresh on-disk paths and follow the
[intermediate-CA hierarchy
runbook](../intermediate-ca-hierarchy.md) for the cutover. The
runbook](../intermediate-ca-hierarchy.md) for the cutover (rotate
`CERTCTL_CA_CERT_PATH` / `CERTCTL_CA_KEY_PATH` to the new files
when ready). The
key concern is overlap: both the old and new sub-CA certs must
chain to the ADCS root during the rollover so existing leaves
keep validating.
+3 -4
View File
@@ -103,11 +103,10 @@ replaces the connector without restart. Prior issuance state
### Diagnosing approval-pending hangs
If `GetOrderStatus` consistently times out, the operator approval
queue in EJBCA is the most common cause. Bump
`CERTCTL_EJBCA_POLL_MAX_WAIT_SECONDS` so a single tick can wait
through the full approval window — see
queue in EJBCA is the most common cause. The connector consumes
the shared bounded-polling primitive — see
[async-ca-polling.md](../protocols/async-ca-polling.md) for the
schedule shape.
schedule shape and tuning approach.
## Related docs
+1 -1
View File
@@ -101,7 +101,7 @@ if [ -n "$CONFIG_ONLY" ]; then
echo "::error::G-3 regression: env var(s) defined in Go source but never documented:"
echo "$CONFIG_ONLY"
echo ""
echo "Add an entry to docs/features.md (or another canonical doc) so operators can find it."
echo "Add an entry to the canonical config doc (docs/reference/architecture.md or the per-connector pages under docs/reference/connectors/) (or another canonical doc) so operators can find it."
exit 1
fi
echo "G-3 env-docs-drift: clean."
+2 -2
View File
@@ -4,7 +4,7 @@
# H-009 closed by Bundle D as verified-already-clean: at audit time
# the README does NOT advertise JWT support (certctl does not ship
# in-process JWT middleware; JWT/OIDC integration is via an
# authenticating gateway, see docs/architecture.md "Authenticating-
# authenticating gateway, see docs/reference/architecture.md "Authenticating-
# gateway pattern"). This script grep-fails the build if README ever
# re-introduces a sentence advertising JWT as a supported auth mode.
# Pattern: "JWT" within ~6 words of "support|auth|enabled|mode" in
@@ -20,7 +20,7 @@ if grep -inE 'JWT.{0,40}(support|auth|enabled|mode|provider)' README.md \
echo "::error::H-009 regression: README.md appears to advertise JWT auth support."
echo "certctl does NOT ship in-process JWT middleware. JWT/OIDC"
echo "integration is via an authenticating gateway — see"
echo "docs/architecture.md::Authenticating-gateway pattern."
echo "docs/reference/architecture.md::Authenticating-gateway pattern."
echo "If you added a sentence about JWT to README, either remove"
echo "it or rewrite it to point at the gateway pattern."
exit 1
+3 -3
View File
@@ -2,11 +2,11 @@
# scripts/ci-guards/L-001-insecure-skip-verify.sh
#
# L-001 audited every production InsecureSkipVerify=true call site
# and documented the justification per site in docs/tls.md. This
# and documented the justification per site in docs/operator/tls.md. This
# script grep-fails the build if any new `InsecureSkipVerify: true`
# lands in a non-test Go file without a `//nolint:gosec` comment
# carrying the justification. Test files (_test.go) are exempt.
# Updating the documented surface goes through the docs/tls.md
# Updating the documented surface goes through the docs/operator/tls.md
# table — net-new sites must be reasoned about before merge.
set -e
@@ -32,7 +32,7 @@ if [ -n "$BAD" ]; then
echo -e "$BAD"
echo ""
echo "Add a //nolint:gosec comment with justification on the same"
echo "or preceding line, AND add a row to the docs/tls.md table."
echo "or preceding line, AND add a row to the docs/operator/tls.md table."
exit 1
fi
echo "L-001 insecure-skip-verify: clean."
+1 -1
View File
@@ -11,7 +11,7 @@
# HEALTHCHECK that targets `http://` against the certctl server
# port.
#
# Comment lines and the docs/upgrade-to-tls.md:182 expected-to-
# Comment lines and the docs/archive/upgrades/to-tls-v2.2.md:182 expected-to-
# fail invariant ("plaintext is gone, expect Connection refused")
# are intentionally exempt — we DO want the upgrade-doc string
# `http://localhost:8443/health` to remain there, since it
Executable → Regular
+3 -3
View File
@@ -135,7 +135,7 @@ echo " 3. Test the API:"
echo " curl --cacert ./deploy/test/certs/ca.crt https://localhost:8443/health"
echo ""
echo " 4. Try the quick start guide:"
echo " cat docs/quickstart.md"
echo " cat docs/getting-started/quickstart.md"
echo ""
echo " 5. Access PgAdmin (optional):"
echo " make docker-up-dev"
@@ -150,6 +150,6 @@ echo " make docker-logs - View service logs"
echo ""
echo "For more information, see:"
echo " • README.md"
echo " • docs/architecture.md"
echo " • docs/quickstart.md"
echo " • docs/reference/architecture.md"
echo " • docs/getting-started/quickstart.md"
echo ""
-27
View File
@@ -1,27 +0,0 @@
#!/usr/bin/env bash
# scripts/qa-doc-part-count.sh
#
# Bundle P / Strengthening #6 — QA-doc Part-count drift guard.
# Forces every PR that adds a Part to docs/testing-guide.md to keep
# docs/qa-test-guide.md headline in sync.
#
# Per ci-pipeline-cleanup bundle Phase 11 / frozen decision 0.13:
# moved out of CI (was in ci.yml) — operator runs via 'make verify-docs'
# pre-tag. Protects docs-the-operator-reads, not anything the product
# depends on; CI-blocking on every push was overkill.
set -e
DOC_PARTS=$(grep -oE '49 of [0-9]+ Parts' docs/qa-test-guide.md | grep -oE '[0-9]+' | tail -1)
GUIDE_PARTS=$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md)
if [ -z "$DOC_PARTS" ]; then
echo "::error::Could not extract Part count from docs/qa-test-guide.md headline."
echo " Expected pattern: '49 of <N> Parts'"
exit 1
fi
if [ "$DOC_PARTS" != "$GUIDE_PARTS" ]; then
echo "::error::DRIFT — qa-test-guide.md headline claims $DOC_PARTS Parts; testing-guide.md has $GUIDE_PARTS Parts."
echo " Update docs/qa-test-guide.md to match. Bundle I patched this once;"
echo " Bundle P added this guard so the drift cannot recur silently."
exit 1
fi
echo "qa-doc-part-count: clean ($DOC_PARTS == $GUIDE_PARTS)."
Executable → Regular
+7 -7
View File
@@ -3,7 +3,7 @@
#
# Bundle P / Strengthening #6 — QA-doc seed-count drift guard.
# Forces every PR that adds a seed row to migrations/seed_demo.sql
# to keep docs/qa-test-guide.md::Seed Data Reference in sync.
# to keep docs/contributor/qa-test-suite.md::Seed Data Reference in sync.
#
# Per ci-pipeline-cleanup bundle Phase 11 / frozen decision 0.13:
# moved out of CI (was in ci.yml) — operator runs via 'make verify-docs'
@@ -13,19 +13,19 @@ set -e
# Seed-cert count: agnostic to documented header format. The current
# documented count lives in `### Certificates (32 total in ...` —
# extract the first integer in that header.
DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' docs/contributor/qa-test-suite.md | grep -oE '[0-9]+' | head -1)
# Authoritative count: unique mc-* IDs in seed_demo.sql.
SEED_CERTS=$(grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
if [ -z "$DOC_CERTS" ]; then
echo "::warning::Could not extract documented cert count from docs/qa-test-guide.md."
echo "::warning::Could not extract documented cert count from docs/contributor/qa-test-suite.md."
echo " Skipping cert-count drift check (header format may have changed)."
elif [ "$DOC_CERTS" != "$SEED_CERTS" ]; then
echo "::error::DRIFT — qa-test-guide.md says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
echo " Update docs/qa-test-guide.md::Seed Data Reference to match."
echo "::error::DRIFT — qa-test-suite.md says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
echo " Update docs/contributor/qa-test-suite.md::Seed Data Reference to match."
exit 1
fi
# Issuers: seed-table count vs doc claim.
DOC_ISS=$(grep -oE '### Issuers \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
DOC_ISS=$(grep -oE '### Issuers \([0-9]+' docs/contributor/qa-test-suite.md | grep -oE '[0-9]+' | head -1)
# Authoritative: unique iss-* IDs (close enough proxy; the issuers
# table count IS the unique-ID count for this prefix).
SEED_ISS=$(grep -oE 'iss-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
@@ -35,7 +35,7 @@ elif [ "$DOC_ISS" != "$SEED_ISS" ] && [ "$((SEED_ISS - DOC_ISS))" -gt 5 ]; then
# Allow up to 5pp slack — iss-* IDs appear in audit_events and
# other reference tables that aren't issuer-table rows. Drift
# only flags when the spread grows large.
echo "::error::DRIFT — qa-test-guide.md says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
echo "::error::DRIFT — qa-test-suite.md says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
exit 1
fi
echo "qa-doc-seed-count: clean."