docs: Phase 2 mechanical file moves to subdirectory structure

Pure git mv operations; no content edits. Internal links remain pointing at old paths and will be fixed in Phase 11. Per the Phase 1 audit recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/.

35 files moved across 8 audience-organized subdirectories:

docs/getting-started/ (5): quickstart.md, concepts.md, examples.md, advanced-demo.md (was demo-advanced.md), why-certctl.md
docs/reference/ (6): architecture.md, api.md (was openapi.md), mcp.md, intermediate-ca-hierarchy.md, deployment-model.md (was deployment-atomicity.md), vendor-matrix.md (was deployment-vendor-matrix.md)
docs/reference/protocols/ (6): acme-server.md, acme-server-threat-model.md, scep-intune.md, est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md)
docs/operator/ (4): security.md, tls.md, database-tls.md, approval-workflow.md
docs/operator/runbooks/ (3): cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md (was runbook-expiry-alerts.md), disaster-recovery.md
docs/migration/ (3): from-certbot.md (was migrate-from-certbot.md), from-acmesh.md (was migrate-from-acmesh.md), cert-manager-coexistence.md (was certctl-for-cert-manager-users.md)
docs/compliance/ (4): index.md (was compliance.md), soc2.md (was compliance-soc2.md), pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was compliance-nist.md)
docs/contributor/ (4): testing-strategy.md, test-environment.md (was test-env.md), ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md)

Deferred to later Phase 2 sub-phases:
- connectors.md split (Phase 4): docs/connectors.md + docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level
- testing-guide.md prune (Phase 5): docs/testing-guide.md still at top level
- features.md disperse (Phase 6): docs/features.md still at top level
- legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md still at top level
- ACME walkthrough re-homing (Phase 8): three docs/acme-*-walkthrough.md still at top level
- Upgrade docs archive (Phase 3): two docs/upgrade-*.md still at top level

Cross-reference updates (Phase 11) will happen after all moves and content edits land. Internal links to docs/* paths are temporarily broken until that phase completes.
2026-06-09 06:58:54 +00:00 · 2026-05-05 02:49:28 +00:00
parent f347811cfb
commit b375df767e
35 changed files with 0 additions and 0 deletions
@@ -0,0 +1,230 @@
+# CI Pipeline — Operator Guide
+
+> Authoritative guide to certctl's CI pipeline shape.
+> Per `cowork/ci-pipeline-cleanup-prompt.md` Phase 12.
+
+## Trigger model
+
+Three triggers, each with its own scope. Don't mix.
+
+| Trigger | Workflow | Scope | Wall-clock target |
+|---|---|---|---|
+| Push to master, PR to master | `.github/workflows/ci.yml` + `.github/workflows/codeql.yml` | Blocking — every check earns its keep | <10 min |
+| Daily 06:00 UTC + `workflow_dispatch` | `.github/workflows/security-deep-scan.yml` | Slow scans (gosec, osv, trivy, ZAP, schemathesis, nuclei, testssl, semgrep, mutation, `-race -count=10`); best-effort, never blocks | 60 min budget |
+| Tag push (`v*`) | `.github/workflows/release.yml` | Cross-platform binaries, ghcr.io push, SLSA provenance, GitHub release | n/a |
+
+This guide covers the **on-push pipeline** only.
+
+## On-push pipeline (7 status checks)
+
+```mermaid
+flowchart TD
+    Push["push to master"]
+    CI["CI workflow (5 jobs)"]
+    CodeQL["CodeQL workflow (2 jobs)"]
+    GoBuild["go-build-and-test<br/>~6-7 min"]
+    Frontend["frontend-build<br/>~1 min"]
+    HelmLint["helm-lint<br/>~10 sec"]
+    Vendor["deploy-vendor-e2e<br/>~5 min, depends on go-build-and-test"]
+    Image["image-and-supply-chain<br/>~3 min, parallel"]
+    AnalyzeGo["Analyze (go)<br/>~5 min, parallel"]
+    AnalyzeJS["Analyze (javascript-typescript)<br/>~5 min, parallel"]
+    Push --> CI
+    Push --> CodeQL
+    CI --> GoBuild
+    CI --> Frontend
+    CI --> HelmLint
+    CI --> Vendor
+    CI --> Image
+    CodeQL --> AnalyzeGo
+    CodeQL --> AnalyzeJS
+    GoBuild -.depends on.-> Vendor
+```
+
+End-to-end wall-clock time is dominated by the `go-build-and-test` → `deploy-vendor-e2e` chain (~12 min), which runs in parallel with CodeQL (~5 min). The target is ~10 min.
+
+## Per-job deep-dive
+
+### `go-build-and-test` (Ubuntu, ~6-7 min)
+
+Runs the Go build/test suite + 18 of 20 regression guards.
+
+Steps:
+1. `actions/checkout@v4`
+2. `actions/setup-go@v5` (Go 1.25.9)
+3. `go build ./cmd/...` (server, agent, mcp-server, cli)
+4. **gofmt drift** — `gofmt -l .` must be empty (Makefile::verify parity)
+5. **go mod tidy drift** — `go mod tidy && git diff --exit-code go.mod go.sum`
+6. `go vet ./...`
+7. Install + run **golangci-lint** v2.11.4 (`--timeout 5m`)
+8. Install + run **govulncheck** (hard gate)
+9. Install + run **staticcheck** (hard gate; `continue-on-error: false`)
+10. **Race Detection** — `go test -race -count=1 ./internal/...` (9-package list, 5min timeout)
+11. **Go Test with Coverage** — full coverage profile to `coverage.out`
+12. **Check Coverage Thresholds** — `bash scripts/check-coverage-thresholds.sh` (reads `.github/coverage-thresholds.yml`)
+13. **Upload Coverage Report** — artifact (`go-coverage`, 30-day retention)
+14. **Coverage PR comment** — posts/updates per-PR coverage table (PR builds only)
+15. **Regression guards** — loop runs all `scripts/ci-guards/*.sh` (18 of 20 guards)
+
+Local equivalent: `make verify` covers steps 4, 6, 7, 11 (with `-short`).
+
+### `frontend-build` (Ubuntu, ~1 min)
+
+Vitest tests + tsc check + vite build + 2 of 20 regression guards (already covered by the ci-guards loop in `go-build-and-test`).
+
+Steps:
+1. `actions/checkout@v4`
+2. `actions/setup-node@v4` (Node 22)
+3. `npm ci`
+4. `npx tsc --noEmit`
+5. `npx vitest run`
+6. `npx vite build`
+7. **Regression guards** — same `scripts/ci-guards/*.sh` loop as `go-build-and-test` (catches frontend-side guards: S-1, P-1, T-1, L-015, L-019, M-009, G-3)
+
+### `helm-lint` (Ubuntu, ~10 sec)
+
+Helm chart validation in 3 modes + inverse fail-loud test:
+1. `helm lint` with existingSecret
+2. `helm template` (existingSecret mode)
+3. `helm template` (cert-manager mode)
+4. `helm template` (no TLS source — MUST fail per fail-loud guard)
+
+### `deploy-vendor-e2e` (Ubuntu, ~5 min, depends on `go-build-and-test`)
+
+Single-job collapse of the prior 12-job matrix (per ci-pipeline-cleanup Phase 5 / frozen decision 0.4 — revises Bundle II decision 0.9).
+
+Steps:
+1. `actions/checkout@v5`
+2. `actions/setup-go@v5` (Go 1.25.9, cache: true)
+3. **Build f5-mock-icontrol sidecar** — only sidecar without published image
+4. **Bring up all vendor sidecars** — `docker compose --profile deploy-e2e up -d` (11 sidecars)
+5. **Run all vendor-edge e2e** — `go test -tags integration -race -count=1 -run 'VendorEdge_'`; output captured to `test-output.log`
+6. **Skip-count enforcement** — `bash scripts/ci-guards/vendor-e2e-skip-check.sh test-output.log` (catches sidecar boot failures via skip-count vs allowlist)
+7. **Tear down sidecars** — `docker compose down -v` (always runs)
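The skip-count check in step 6 can be pictured as a comparison between the tests that `go test` reports as skipped and the allowlist file. The following is a minimal sketch, not the contents of `vendor-e2e-skip-check.sh`: the function name `unexpected_skips`, the one-test-name-per-line allowlist format, and the assumption that the log contains verbose `--- SKIP:` lines are all made up for illustration.

```shell
# unexpected_skips LOG ALLOWLIST
# Prints the name of every skipped test in LOG that is not listed in ALLOWLIST.
# Assumes LOG holds verbose `go test -v` output and ALLOWLIST has one test name per line.
unexpected_skips() {
  log=$1
  allow=$2
  # Pull the test name out of each "--- SKIP: Name (0.00s)" line.
  sed -n 's/^ *--- SKIP: \([^ ]*\).*/\1/p' "$log" | sort -u |
    while read -r name; do
      # Exact, whole-line match against the allowlist.
      grep -qxF -- "$name" "$allow" || echo "$name"
    done
}
```

A caller could fail the job whenever the function prints anything, for example with `[ -z "$(unexpected_skips test-output.log scripts/ci-guards/vendor-e2e-skip-allowlist.txt)" ]`.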
+
+The `deploy-vendor-e2e-windows` matrix was deleted entirely (per ci-pipeline-cleanup Phase 6 / frozen decision 0.5 — revises Bundle II decision 0.4). IIS + WinCertStore validation moved to [`docs/connector-iis.md::Operator validation playbook`](connector-iis.md#operator-validation-playbook-windows-host).
+
+### `image-and-supply-chain` (Ubuntu, ~3 min, parallel)
+
+Three checks bundled (per ci-pipeline-cleanup Phases 7-9 / frozen decision 0.8):
+1. **Digest validity** — `bash scripts/ci-guards/digest-validity.sh`. Resolves every `@sha256:<digest>` ref in `deploy/**/*.{yml,Dockerfile*}` against its registry. Closes the H-001 lying-field gap.
+2. **Docker build smoke** — builds all 4 Dockerfiles (`Dockerfile`, `Dockerfile.agent`, `deploy/test/f5-mock-icontrol/Dockerfile`, `deploy/test/libest/Dockerfile`).
+3. **OpenAPI ↔ handler operationId parity** — `bash scripts/ci-guards/openapi-handler-parity.sh`. Every router route must have a matching `operationId` in `api/openapi.yaml` or be documented in `api/openapi-handler-exceptions.yaml`.
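Before a digest can be resolved against its registry, it has to be found and be well-formed. The sketch below covers only that offline half of check 1. It is not the contents of `digest-validity.sh`, it never contacts a registry, and the function names `list_digest_refs` and `malformed_digests` are hypothetical.

```shell
# list_digest_refs FILE...
# Prints each distinct image@sha256:<digest> reference found in the given files.
list_digest_refs() {
  grep -ohE '[A-Za-z0-9./_:-]+@sha256:[0-9a-f]+' "$@" | sort -u
}

# malformed_digests
# Reads references on stdin and prints those whose digest is not exactly 64 hex characters.
malformed_digests() {
  # grep -v exits 1 when every line is well-formed; treat that as success.
  grep -vE '@sha256:[0-9a-f]{64}$' || true
}
```

For example, `list_digest_refs deploy/docker-compose.yml | malformed_digests` prints nothing when every pinned digest has the expected shape.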
+
+### CodeQL (Ubuntu × 2 languages, ~5 min)
+
+`.github/workflows/codeql.yml` — interprocedural taint tracking. Two matrix jobs: `go` and `javascript-typescript`. Triggers on push, PR, and weekly Sunday cron.
+
+## The 20 regression guards
+
+Located at `scripts/ci-guards/<id>.sh`. Each script is callable locally:
+
+```bash
+bash scripts/ci-guards/G-3-env-docs-drift.sh
+```
+
+Or run all of them:
+
+```bash
+for g in scripts/ci-guards/*.sh; do
+  echo "=== $(basename "$g") ==="
+  bash "$g" || echo "  FAILED"
+done
+```
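The loop above prints `FAILED` but still exits 0, so it cannot gate a pre-push hook by itself. A minimal variant that counts failures and returns a non-zero status could look like the following. The function name `run_guards` and the `GUARD_SHELL` override are hypothetical and not part of the repository.

```shell
# run_guards DIR
# Runs every *.sh script in DIR, prints a summary, and returns 1 if any script failed.
# GUARD_SHELL (hypothetical) selects the interpreter; it defaults to bash.
run_guards() {
  dir=$1
  total=0
  failed=0
  for g in "$dir"/*.sh; do
    [ -e "$g" ] || continue        # the glob matched nothing
    total=$((total + 1))
    echo "=== $(basename "$g") ==="
    if ! "${GUARD_SHELL:-bash}" "$g"; then
      echo "  FAILED"
      failed=$((failed + 1))
    fi
  done
  echo "$failed of $total guards failed"
  [ "$failed" -eq 0 ]
}
```

Called as `run_guards scripts/ci-guards || exit 1`, it still runs every guard before reporting, so one failure does not hide the others.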
+
+| ID | Catches |
+|---|---|
+| `G-1-jwt-auth-literal` | JWT silent auth downgrade reappearing |
+| `L-001-insecure-skip-verify` | Bare `InsecureSkipVerify: true` without `//nolint:gosec` |
+| `H-001-bare-from` | Bare Dockerfile `FROM` without `@sha256:` digest pin |
+| `M-012-no-root-user` | Dockerfile missing terminal `USER <non-root>` |
+| `H-009-readme-jwt` | README re-introducing JWT-as-supported claim |
+| `G-2-api-key-hash-json` | `api_key_hash` in JSON-emitting surface |
+| `U-2-plaintext-healthcheck` | Plaintext `http://` in HEALTHCHECK |
+| `U-3-migration-mount` | Migration file mounted into postgres initdb |
+| `D-1-D-2-statusbadge-phantom` | Dead StatusBadge keys + 8 TS phantom fields across 4 interfaces |
+| `L-1-bulk-action-loop` | Client-side `for ... await` bulk action loops |
+| `B-1-orphan-crud` | 8 update/create/delete fns lose page consumers |
+| `S-2-strings-contains-err` | `strings.Contains(err.Error(), ...)` brittle dispatch |
+| `G-3-env-docs-drift` | `CERTCTL_*` env var defined OR documented but not both |
+| `test-naming-convention` | `func TestXxx` lowercase first letter (Go silently skips) |
+| `S-1-hardcoded-source-counts` | Hardcoded "N issuer connectors" prose |
+| `P-1-documented-orphan-fns` | 16 read-fn names removed from client.ts exports |
+| `T-1-frontend-page-coverage` | New page in `web/src/pages/` without sibling `.test.tsx` |
+| `bundle-8-L-015-target-blank-rel-noopener` | `target="_blank"` without `rel="noopener noreferrer"` |
+| `bundle-8-L-019-dangerously-set-inner-html` | `dangerouslySetInnerHTML` outside `safeHtml.ts` |
+| `bundle-8-M-009-bare-usemutation` | Bare `useMutation()` outside the `useTrackedMutation` wrapper |
+
+Plus five additional scripts for non-guard operator workflows:
+- `scripts/ci-guards/vendor-e2e-skip-check.sh` — vendor-e2e skip-count enforcement (used by `deploy-vendor-e2e` job)
+- `scripts/ci-guards/digest-validity.sh` — used by `image-and-supply-chain` job
+- `scripts/ci-guards/openapi-handler-parity.sh` — used by `image-and-supply-chain` job
+- `scripts/ci-guards/coverage-pr-comment.sh` — used by `go-build-and-test` job
+- `scripts/check-coverage-thresholds.sh` — used by `go-build-and-test` job
+
+## Coverage thresholds
+
+Manifest at `.github/coverage-thresholds.yml`. Each entry has `floor:` (integer percentage) + `why:` (load-bearing context). Lowering a floor REQUIRES corresponding code-side test work — never lower the gate to make CI green.
+
+To add a new gated package: add an entry to the YAML; no script changes needed.
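The gate itself reduces to one comparison per package: measured coverage against the integer `floor:` from the manifest. The sketch below shows only that comparison. It is an illustration rather than the logic of `scripts/check-coverage-thresholds.sh`, it does not parse YAML, and the function name `check_floor` and the package name in the usage line are hypothetical.

```shell
# check_floor PKG COVERAGE FLOOR
# Succeeds when COVERAGE (a percentage such as 87.3) is at least FLOOR (integer percent).
check_floor() {
  pkg=$1
  cov=$2
  floor=$3
  # awk handles the fractional comparison that plain shell arithmetic cannot.
  if awk -v c="$cov" -v f="$floor" 'BEGIN { exit !(c + 0 >= f + 0) }'; then
    echo "ok   $pkg: ${cov}% >= ${floor}%"
  else
    echo "FAIL $pkg: ${cov}% < ${floor}%"
    return 1
  fi
}
```

For instance, `check_floor internal/example 87.3 85` succeeds, while a measured value below the floor makes the function return 1.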
+
+## Make targets — three-tier convention
+
+| Target | When | What |
+|---|---|---|
+| `make verify` | **Required pre-commit** | gofmt + vet + golangci-lint + go test -short |
+| `make verify-deploy` | Optional pre-push | digest-validity + OpenAPI parity + Docker build smoke (server + agent only — fast subset) |
+| `make verify-docs` | **Required pre-tag** | QA-doc Part-count + seed-count drift checks |
+
+## Adding a new check
+
+| Check type | Where it goes | Auto-picked-up by CI? |
+|---|---|---|
+| Regression guard (grep / shape pattern) | New `scripts/ci-guards/<id>.sh` script | Yes — loop step iterates `*.sh` |
+| Coverage threshold (per-package) | New entry in `.github/coverage-thresholds.yml` | Yes — bash loop reads YAML |
+| OpenAPI route exception | New entry in `api/openapi-handler-exceptions.yaml` | Yes — parity script reads YAML |
+| Vendor-e2e expected skip | New line in `scripts/ci-guards/vendor-e2e-skip-allowlist.txt` | Yes — skip-check script reads file |
+| New CI job | Edit `.github/workflows/ci.yml` directly | n/a (job definition is the source) |
+
+## Troubleshooting
+
+| CI step fails | Likely cause | Fix |
+|---|---|---|
+| `gofmt drift` | source needs `gofmt -w` | `make fmt` locally + commit |
+| `go mod tidy drift` | imported a package without committing go.mod | `go mod tidy` + commit |
+| `Run staticcheck` | new SA1019 deprecated-API site | migrate the API OR add `//lint:ignore SA1019 <reason>` |
+| `Check Coverage Thresholds` | per-package coverage dropped below floor | add tests; do NOT lower the floor |
+| `Regression guards` (any `<id>.sh`) | the audit-finding the guard pinned reappeared | read the guard's head-comment block for the closure rationale + fix the regression |
+| `Skip-count enforcement` | a vendor sidecar failed to start | check docker logs; fix sidecar; OR if a new Windows-only test was added, add to `scripts/ci-guards/vendor-e2e-skip-allowlist.txt` |
+| `Digest validity` | a `@sha256` digest doesn't resolve | re-resolve from registry, replace in compose / Dockerfile |
+| `OpenAPI ↔ handler parity` | new router route without operationId | add to `api/openapi.yaml` (preferred) OR `api/openapi-handler-exceptions.yaml` |
+| `Docker build smoke` | Dockerfile syntax error or COPY path drift | fix the Dockerfile |
+| `CodeQL Analyze` | interprocedural dataflow finding | review the SARIF in Security → Code scanning tab |
+
+## Status check accounting
+
+**Current (post-cleanup):** 7 status checks per push.
+- 1 × `Go Build & Test`
+- 1 × `Frontend Build`
+- 1 × `Helm Chart Validation`
+- 1 × `deploy-vendor-e2e`
+- 1 × `image-and-supply-chain`
+- 2 × `CodeQL Analyze (<lang>)` (go + javascript-typescript)
+
+**Pre-cleanup (HEAD `1de61e91`):** 19 status checks. The 12-vendor matrix + 2-vendor Windows matrix collapsed to 1 + 0 respectively; the 3 Go/Frontend/Helm jobs unchanged; 2 CodeQL unchanged; 1 new `image-and-supply-chain` added.
+
+## Required GitHub branch protection list
+
+When updating the `master` branch protection rule (Settings → Branches), the "Require status checks to pass" list should be exactly:
+
+```
+Go Build & Test
+Frontend Build
+Helm Chart Validation
+deploy-vendor-e2e
+image-and-supply-chain
+Analyze (go)
+Analyze (javascript-typescript)
+```
+
+Old-name checks (`deploy-vendor-e2e (<vendor>)` × 12, `deploy-vendor-e2e-windows (<vendor>)` × 2) won't appear on new PRs after the workflow change, so the operator must remove them from the required list.
@@ -0,0 +1,444 @@
+# QA Test Suite Guide (`qa_test.go`)
+
+> **Audience:** Anyone running release QA for certctl — whether you're a first-time contributor or the maintainer cutting a release tag.
+>
+> **Companion to:** `docs/testing-guide.md` (the *what* to test). This document explains the *how* — the automated test file, what it covers, what it skips, and how to fill the gaps manually.
+
+---
+
+## Test Suite Health (regenerate via `make qa-stats`)
+
+> Snapshot at HEAD. Re-run `make qa-stats` to refresh; CI's QA-doc drift guards (`.github/workflows/ci.yml`) catch out-of-date Part / cert / issuer counts on every PR. **Last regenerated: 2026-04-27 (Bundle P).**
+
+| Metric | Value | Target | Status |
+|---|---|---|---|
+| Backend test files | 221 | n/a | ℹ |
+| Backend `Test*` functions | 2,454 | n/a | ℹ |
+| Backend `t.Run` subtests | 778 | n/a | ℹ |
+| Frontend test files | 38 | n/a | ℹ |
+| Fuzz targets | 11 | ≥10 (one per hand-rolled parser) | ✓ |
+| `t.Skip` sites | 60 | each carries valid rationale (Bundle O audit) | ✓ |
+| `qa_test.go` Part_* subtests | 53 | tracks `testing-guide.md` Parts (3 `## Part 15-17` covered indirectly via Parts 42–46) | ✓ |
+| `testing-guide.md` Parts | 56 | n/a | ℹ |
+| Existential cluster line cov (post-Bundle-J + L.B + Bundle 0.7) | acme 55.6%, stepca 90.4%, local-issuer ≥86%, crypto ≥85% | ≥95% | △ ACME below; tracked in `coverage-matrix.md` |
+| Mutation kill rate (Existential) | unmeasured (operator-runnable per Strengthening #5) | ≥90% | ⚠ |
+| Race detector clean (`-count=10`) | partial (`-count=3` clean per Phase 0) | 0 races | ⚠ |
+
+## What Is This File?
+
+`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates as much of `docs/testing-guide.md` as possible against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
+
+It covers **49 of 56 Parts** of the testing guide as automation; the remaining 7 are
+either manual-only by design or pending QA-suite coverage:
+
+- **49 `Part_*` automation wrappers**, **~159 leaf subtests** — API calls, database queries, source file checks, performance benchmarks
+- **11 fully skipped Parts** — with documented reasons (external CAs, Windows, browser-only, etc.) — see "What This Test Does NOT Cover" below
+- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually per `docs/testing-guide.md` until QA-suite automation lands
+- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human following `docs/testing-guide.md`
+
+## Architecture
+
+```mermaid
+flowchart LR
+    QA["qa_test.go (//go:build qa)<br/><br/>TestQA(t *testing.T)<br/>├─ Part01_Infra<br/>├─ Part02_Auth<br/>├─ Part03_CertCRUD<br/>├─ ...<br/>└─ Part52_HelmChart"]
+    subgraph Stack["certctl demo stack<br/>docker-compose.yml + docker-compose.demo.yml"]
+        Server["certctl-server :8443"]
+        Postgres["postgres :5432"]
+        Agents["certctl-agent (×N)<br/>↑ seed_demo.sql provisions 12 agent rows<br/>(1 active, 2 retired, 9 reserved/sentinel)<br/>for the soft-retire / FSM coverage Parts 55–56 exercise"]
+    end
+    QA --> Stack
+```
+
+> **Multi-agent demo stack (Bundle Q / L-004 closure).** The demo
+> stack runs a single live `certctl-agent` container by default but
+> the database is seeded with 12 agent rows (`migrations/seed_demo.sql`,
+> grep `mc-* | ag-*` IDs). The "(×N)" notation reflects the seed-data
+> reality: Parts 04 (Agents Listing), 05 (Agent Heartbeats), 55
+> (Agent Soft-Retirement), and FSM coverage tables in
+> `coverage-audit-2026-04-27/tables/fsm-coverage.md` exercise the full
+> multi-agent population, not the one live container. Operators
+> running the QA suite in a parallel-agent topology should set
+> `AGENT_COUNT=N` in compose-override and re-derive the seed counts
+> via `make qa-stats`.
+
+Key design choices:
+
+- **Build tag:** `//go:build qa` — never runs during `go test ./...` or CI. Only runs when explicitly requested.
+- **Package:** `integration_test` — same package as `integration_test.go` (which uses `//go:build integration` for the test stack). They coexist but never run together.
+- **Zero internal imports:** Uses only stdlib + `lib/pq` (from `go.mod`). All API interactions are plain HTTP. All JSON is decoded into lightweight local structs (`qaCert`, `qaJob`, etc.) — not the internal domain types.
+- **Self-cleaning:** Tests that create data use `t.Cleanup()` to delete it afterward. The seed data is not modified.
+
+## Prerequisites
+
+1. **Docker Compose demo stack running:**
+   ```bash
+   cd deploy
+   docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build -d
+   ```
+   Wait ~15 seconds for health checks to pass.
+
+2. **Go 1.22+** installed (the project uses Go 1.25 in `go.mod`, but 1.22+ works for running tests).
+
+3. **PostgreSQL port exposed** — the demo stack exposes port 5432 for database verification tests (table counts, schema checks).
+
+4. **Repository checkout** — source file verification tests (`fileExists`, `fileContains`) read files relative to `qaRepoDir` (default: `../..` from `deploy/test/`).
+
+## Running the Tests
+
+### Full suite
+```bash
+cd deploy/test
+go test -tags qa -v -timeout 10m ./...
+```
+
+### Single Part
+```bash
+go test -tags qa -v -run TestQA/Part03 ./...
+```
+
+### Single subtest
+```bash
+go test -tags qa -v -run TestQA/Part03_CertCRUD/Create_Minimal ./...
+```
+
+### With custom environment
+```bash
+CERTCTL_QA_SERVER_URL=https://staging.internal:8443 \
+CERTCTL_QA_API_KEY=my-staging-key \
+CERTCTL_QA_DB_URL=postgres://certctl:secret@db.internal:5432/certctl?sslmode=require \
+CERTCTL_QA_REPO_DIR=/path/to/certctl \
+go test -tags qa -v -timeout 10m ./...
+```
+
+### Environment Variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `CERTCTL_QA_SERVER_URL` | `https://localhost:8443` | certctl server URL (HTTPS-only as of v2.2) |
+| `CERTCTL_QA_API_KEY` | `change-me-in-production` | API key for Bearer auth |
+| `CERTCTL_QA_DB_URL` | `postgres://certctl:certctl@localhost:5432/certctl?sslmode=disable` | PostgreSQL connection string |
+| `CERTCTL_QA_REPO_DIR` | `../..` | Path to certctl repo root (for source file checks) |
+| `CERTCTL_QA_CA_BUNDLE` | `./certs/ca.crt` | PEM CA bundle pinned for TLS verification. The demo stack's `certctl-tls-init` container writes here. |
+| `CERTCTL_QA_INSECURE` | `false` | Set to `"true"` to skip TLS verification (e.g. before the init container finishes). Never use outside the demo harness. |
+
+## Part-by-Part Coverage Map
+
+This table shows what each Part tests and what's left for manual verification.
+
+| Part | Testing Guide Section | Automated Subtests | What's Automated | What's Manual |
+|------|----------------------|-------------------|-----------------|--------------|
+| 1 | Infrastructure & Deployment | 8 | Table count, health/ready endpoints, seed data counts (certs, agents, issuers, targets, policies) | Docker container health, log inspection, volume mounts |
+| 2 | Authentication & Security | 4 | No-auth 401, bad-key 401, health-no-auth 200, no private keys in API | CORS preflight, rate limiting (429 + Retry-After), TLS config |
+| 3 | Certificate Lifecycle | 10 | Create (minimal + full), get, 404, list pagination, status/issuer filters, sparse fields, update, archive | Deployment trigger, version history, certificate detail UI |
+| 4 | Renewal Workflow | 3 | Trigger renewal, 404 on nonexistent, agent work endpoint | AwaitingCSR flow, agent key generation, full issuance cycle |
+| 5 | Revocation | 5 | Revoke (default reason), already-revoked, nonexistent, invalid reason, CRL JSON | DER CRL, OCSP responder, revocation notifications |
+| 6 | Policies & Profiles | 6 | Policy CRUD (create/delete), invalid type 400, profile CRUD, list | Policy violation detection, profile enforcement on CSR |
+| 7 | Ownership & Teams | 4 | Team CRUD, owner CRUD, agent groups list | Owner notification routing, dynamic group matching |
+| 8 | Job System | 2 | List jobs, 404 on nonexistent | Job state transitions, approval workflow, cancellation |
+| 9 | Issuer Connectors | 4 | List, get detail, create (GenericCA), missing name 400 | Test connection, issuer-specific issuance flow |
+| 10 | Sub-CA Mode | SKIP | — | Requires CA cert+key on disk |
+| 11 | ACME ARI | SKIP | — | Requires ARI-capable CA |
+| 12 | Vault PKI | SKIP | — | Requires live Vault server |
+| 13 | DigiCert | SKIP | — | Requires DigiCert sandbox |
+| 14 | Target Connectors | 3 | List, create NGINX target, delete 204 | Deploy to real target, validate deployment |
+| 15–17 | Apache/HAProxy, Traefik/Caddy, IIS | — | (Covered by source checks in Parts 42–46) | Requires real services or Windows |
+| 18 | Agent Operations | 3 | Heartbeat (register), metadata check, auto-create on heartbeat | Agent binary behavior, key storage, discovery scan |
+| 19 | Agent Work Routing | 1 | Empty work for agent with no targets | Scoped job assignment, multi-target fan-out |
+| 20 | Post-Deployment Verification | 1 | 404 on nonexistent job verification | TLS probing, fingerprint comparison |
+| 21 | EST Server | 2 | CACerts (200 + content-type), CSRAttrs (200/204) | simpleenroll with CSR, simplereenroll, PKCS#7 parsing |
+| 22 | Certificate Export | 3 | PEM export, PKCS#12 export, 404 on nonexistent | Download mode, file content validation |
+| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually per `docs/testing-guide.md::Part 23` |
+| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually per `docs/testing-guide.md::Part 24` |
+| 25 | Certificate Discovery | 5 | List discovered, summary, list scan targets, create target, invalid CIDR 400 | Agent filesystem scan, claim/dismiss workflow |
+| 26 | Enhanced Query API | 4 | Sort descending, cursor pagination, time-range filter, invalid sort field | Field projection correctness, cursor token cycling |
+| 27 | Request Body Size Limits | 1 | 2MB body rejected (413/400) | Exact limit boundary (1MB) |
+| 28 | CLI | SKIP | — | Requires compiled `certctl-cli` binary |
+| 29 | MCP Server | SKIP | — | Requires compiled `mcp-server` binary + stdio |
+| 30 | Observability | 7 | Dashboard summary, certs by status, expiration timeline, job trends, issuance rate, JSON metrics (uptime + gauges), Prometheus (content-type + 4 metric names) | Chart rendering (GUI), Grafana import |
+| 31 | Notifications | 2 | List, 404 on nonexistent | Notification content, mark-read, email/Slack delivery |
+| 32 | Audit Trail | 3 | List events (≥10), PUT immutability, DELETE immutability | Actor attribution, body hash, time range filters |
+| 33 | Background Scheduler | SKIP | — | Timing-dependent; verify via Docker logs |
+| 34 | Structured Logging | SKIP | — | Requires Docker log inspection |
+| 35 | GUI Testing | SKIP | — | Requires browser |
+| 36–37 | Issuer Catalog, Frontend Audit | SKIP | — | Requires browser |
+| 38 | Error Handling | 5 | Malformed JSON, missing required field, method not allowed, UTF-8 CN, empty body | Stack trace suppression, error response format |
+| 39 | Performance | 5 | List certs < 200ms, stats < 500ms, metrics < 200ms, Prometheus < 300ms, audit < 500ms | Load testing, concurrent request handling |
+| 40 | Documentation | 8 | README, quickstart, architecture, connectors, compliance exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
+| 41 | Regression | 3 | DELETE 204, per_page max fallback, network scan target seed count | `errors.Is(errors.New())` anti-pattern source scan |
+| 42 | Envoy Target | 5 | Domain type, connector file, test file, OpenAPI, agent dispatch | Envoy deployment test, SDS config |
+| 43 | Postfix/Dovecot | 3 | Domain types (Postfix + Dovecot), connector file, OpenAPI | Mail server deployment test |
+| 44 | SSH Target | 4 | Domain type, connector file, agent dispatch (`sshconn`), OpenAPI | SSH deployment test (requires target host) |
+| 45 | Windows Certificate Store | 3 | Domain type, connector file, shared certutil package | Windows deployment (requires Windows) |
+| 46 | Java Keystore | 3 | Domain type, connector file, OpenAPI | JKS deployment (requires keytool) |
+| 47 | Certificate Digest Email | 3 | Preview endpoint (200/503), service file, adapter file | SMTP delivery, HTML template rendering |
+| 48 | Dynamic Issuer Config | 4 | Crypto package exists, create ACME issuer via API, config redaction check, migration exists | Test connection flow, registry rebuild |
+| 49 | Dynamic Target Config | 2 | Create NGINX target via API, migration exists | Test connection via agent heartbeat |
+| 50 | Onboarding Wizard | 2 | Wizard component exists, docker-compose split (clean vs demo) | Wizard UI flow, step completion |
+| 51 | ACME Profile Selection | 3 | Profile module exists, frontend config, RFC 9702→9773 renumber check | Profile-aware issuance against real CA |
+| 52 | Helm Chart | 5 | Chart.yaml, values.yaml, 4 templates exist, securityContext, health probes | `helm template` rendering, `helm install` |
+| 53 | Kubernetes Secrets Target Connector (M47) | 18 | Config validation (namespace DNS-1123, secret name DNS subdomain, label keys, required fields), deployment (create/update Secret, chain concatenation, error propagation), validation (serial comparison, not-found, empty cert) | GUI target wizard KubernetesSecrets fields (namespace, secret_name, labels, kubeconfig_path), Helm RBAC toggle, TargetDetailPage type label |
+| 54 | AWS ACM Private CA Issuer Connector (M47) | 23 | Config validation (region, CA ARN regex, signing algorithm whitelist, validity_days, defaults), issuance (full flow, empty CSR, errors), renewal (reuses issuance), revocation (reason mapping, default, errors), GetOrderStatus completed, GetCACertPEM (success/chain/error), GetRenewalInfo nil | GUI issuer wizard AWSACMPCA fields (region, ca_arn, signing_algorithm, validity_days, template_arn), seed data visibility, create issuer flow |
+| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually per `docs/testing-guide.md::Part 55` |
+| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually per `docs/testing-guide.md::Part 56` |
+
+**Totals (verified 2026-04-27):** 49 `Part_*` automation wrappers, ~159 leaf subtests, 11 fully
+skipped Parts, 4 Parts not yet automated (23, 24, 55, 56), and an unspecified count of manual-only
+flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE '^## Part [0-9]+:' docs/testing-guide.md`
+and `grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go` to re-verify.
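The two `grep` commands above can be wrapped so that both counts are printed side by side. This is a sketch of a local helper, not the CI drift guard. The function name `part_counts` is hypothetical, and the patterns are copied from the commands above.

```shell
# part_counts GUIDE QA_TEST
# Prints the number of "## Part N:" headings in GUIDE and of Part wrappers in QA_TEST.
part_counts() {
  guide=$1
  qa=$2
  # grep -c exits 1 when the count is zero, so tolerate that status.
  parts=$(grep -cE '^## Part [0-9]+:' "$guide" || true)
  wrappers=$(grep -cE 't\.Run\("Part[0-9]+_' "$qa" || true)
  echo "guide_parts=$parts qa_wrappers=$wrappers"
}
```

Run from the repository root as `part_counts docs/testing-guide.md deploy/test/qa_test.go` and compare the output with the totals quoted in this document.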
+
+## Coverage by Risk Class
+
+A buyer's QA lead reading this doc wants "where are the existential bugs caught?" — Bundle P / Strengthening #1 surfaces that view directly. The table below classifies each Part by risk class so reviewers can answer the existential-coverage question in one glance.
+
+| Risk class | Description | Parts in scope | Automation status |
+|---|---|---|---|
+| **Existential** (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in `testing-guide.md`) |
+| **High** (FSM corruption, credential leak, authn/z weakening) | Renewal, jobs, agents, issuers, deployment, scheduler | 4, 7, 8, 9, 18, 19, 20, 22, 25, 28, 29, 32, 33, 48, 49, 55, 56 | 14/17 automated; CLI / MCP / scheduler-loop are inherently SKIP (require compiled binaries / Docker logs); Parts 55 + 56 pending |
+| **Medium** (Operational pain or silent data drift) | Targets, notifiers, observability, error handling, performance, regression | 14, 15-17, 30, 31, 38, 39, 41, 42, 43, 44, 45, 46 | 14/14 automated (15-17 indirect via Parts 42–46) |
+| **Low** (Hygiene) | Documentation, docs verification | 40 (Documentation), 50 (Onboarding) | 2/2 automated |
+| **Frontend** (XSS, render correctness, mutation contracts) | GUI testing | 35, 36-37 | 0/3 automated in this suite (Vitest covers separately under `web/`); this doc punts to manual + Vitest |
+| **Compliance** (PCI / SOC2 / HIPAA-relevant) | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
+
+This is the table acquisition reviewers screenshot for their report. When a new Part lands in `testing-guide.md`, classify it here; the QA-doc Part-count drift guard (`.github/workflows/ci.yml::QA-doc Part-count drift guard`) catches the count mismatch.
+
+## Test Categories
+
+The automated tests fall into four categories:
+
+### 1. API Integration Tests (majority)
+Make real HTTP requests to the running server and verify status codes, response structure, and JSON field values. Examples:
+- `POST /api/v1/certificates` with valid payload → 201
+- `GET /api/v1/certificates?status=Active` → all returned certs have `status: "Active"`
+- `DELETE /api/v1/certificates/mc-qa-full` → 204
+
+### 2. Database Verification Tests
+Connect directly to PostgreSQL and verify schema state:
+- Table count ≥ 19 (from migrations 000001–000010)
+- Useful for catching migration regressions
+
+### 3. Source File Verification Tests
+Read files from the repo checkout and verify structure:
+- Domain types exist in `internal/domain/connector.go` (e.g., `TargetTypeEnvoy`)
+- Connector implementations exist (e.g., `internal/connector/target/envoy/envoy.go`)
+- Documentation contains expected content (all issuer/target types listed)
+- No stale RFC 9702 references (replaced by RFC 9773)
+
+### 4. Performance Spot Checks
+Timed API requests with threshold assertions:
+- `GET /api/v1/certificates?per_page=15` < 200ms
+- `GET /api/v1/stats/summary` < 500ms
+- `GET /api/v1/metrics/prometheus` < 300ms
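+
+A minimal sketch of one such spot check from the shell, assuming the demo stack is up. `BASE_URL` and `CA_CERT` are hypothetical variable names whose defaults are borrowed from the troubleshooting commands in this doc. `within_budget_ms` and `timed_get` are illustrative helpers, not part of `qa_test.go`, and the sketch sends no credentials, which may not match how your stack is configured:
+
+```bash
+#!/usr/bin/env bash
+# Assumed defaults; adjust for your stack.
+BASE_URL="${BASE_URL:-https://localhost:8443}"
+CA_CERT="${CA_CERT:-./deploy/test/certs/ca.crt}"
+
+# within_budget_ms MEASURED_SECONDS BUDGET_MS
+# Exits 0 when the measured time (seconds, as printed by curl's
+# %{time_total}) is strictly below the budget in milliseconds.
+within_budget_ms() {
+  awk -v t="$1" -v b="$2" 'BEGIN { exit !((t * 1000) < b) }'
+}
+
+# timed_get PATH BUDGET_MS (hypothetical helper; needs a running stack)
+# Exits 2 when the request itself fails, 1 when it is over budget.
+timed_get() {
+  local secs
+  secs=$(curl --cacert "$CA_CERT" -s -o /dev/null \
+    -w '%{time_total}' "$BASE_URL$1") || return 2
+  within_budget_ms "$secs" "$2"
+}
+```
+
+For example, `timed_get '/api/v1/stats/summary' 500` exits 0 only when the request completes in under 500 ms.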
+
+## What This Test Does NOT Cover
+
+These gaps must be filled by manual testing per `docs/testing-guide.md`:
+
+### Not Yet Automated (Parts 23, 24, 55, 56)
+
+These Parts are documented in `docs/testing-guide.md` but have no `Part_*` automation
+in `qa_test.go` yet. They are operator-runnable from the manual playbook; QA-suite
+automation should land before the next acquisition-grade release.
+
+- **Part 23: S/MIME & EKU Support** — profile-driven EKU enforcement; SMIMECapabilities extension
+- **Part 24: OCSP Responder & DER CRL** — OCSP request/response correctness, CRL generation, Must-Staple coordination
+- **Part 55: Agent Soft-Retirement (I-004)** — soft vs hard retire, FK cascade, reactivation
+- **Part 56: Notification Retry & Dead-Letter Queue (I-005)** — retry semantics, dead-letter transition, requeue
+
+### External CA Integrations (Parts 10–13)
+- **Sub-CA mode** — requires CA cert+key files on disk
+- **ACME ARI** — requires a CA that supports RFC 9773 Renewal Information
+- **Vault PKI** — requires a running HashiCorp Vault instance
+- **DigiCert / Sectigo / Google CAS** — requires sandbox API credentials
+
+### Browser/GUI Testing (Parts 35–37, 50)
+- Dashboard chart rendering (Recharts)
+- Onboarding wizard step-by-step flow
+- Issuer catalog card layout and create wizard
+- Bulk operations UI (multi-select, progress bars)
+- Discovery triage workflow
+
+### Real Deployment Testing (Parts 15–17)
+- NGINX/Apache/HAProxy file write + reload
+- Traefik/Caddy file provider or API reload
+- IIS PowerShell/WinRM (requires Windows)
+- F5 BIG-IP iControl REST (requires appliance or mock)
+- SSH agentless deployment (requires target host)
+
+### Agent Binary Behavior (Parts 18, 28–29)
+- Agent-side ECDSA key generation and CSR submission
+- Agent filesystem discovery scan
+- CLI tool (`certctl-cli`) — all 10 subcommands
+- MCP server (`mcp-server`) — stdio transport
+
+### Timing-Dependent Tests (Parts 33–34)
+- Background scheduler loop execution (renewal, jobs, health, notifications, digest, network scan)
+- Structured logging format verification (requires Docker log parsing)
+
+## How This Relates to `integration_test.go`
+
+Both files live in `deploy/test/` in the same Go package (`integration_test`):
+
+| | `qa_test.go` | `integration_test.go` |
+|---|---|---|
+| **Build tag** | `//go:build qa` | `//go:build integration` |
+| **Target stack** | Demo (`docker-compose.yml` + `docker-compose.demo.yml`) | Test (`docker-compose.test.yml`) |
+| **Port** | 8443 | Different (test stack config) |
+| **Seed data** | `seed_demo.sql` (32 certs, 12 agents, 13 issuers, 8 targets, realistic history) | Minimal (created by tests) |
+| **CA backends** | Local CA only (demo mode) | Pebble ACME, step-ca, NGINX |
+| **Purpose** | Release QA — broad coverage, spot checks | Functional — end-to-end issuance, renewal, revocation against real CAs |
+| **Run frequency** | Before each release tag | CI on every PR |
+
+They are complementary. Integration tests prove the machinery works. QA tests prove the product works at release quality.
+
+## Seed Data Reference
+
+The QA tests depend on `migrations/seed_demo.sql`. Key IDs used:
+
+### Certificates (32 total in `managed_certificates`)
+
+The full canonical list is generated by:
+```bash
+sed -n '/^INSERT INTO managed_certificates/,/^;/p' migrations/seed_demo.sql \
+  | grep -oE "^\s*\('mc-[a-z0-9_-]+" | sed -E "s/^\s*\('//" | sort -u
+```
+
+Hand-listing is unsustainable as the seed grows; tests reference IDs by lookup, not by enumeration.
+Sample IDs: `mc-api-prod`, `mc-web-prod`, `mc-pay-prod`, `mc-compromised`, `mc-smime-bob`, `mc-edge-eu`, `mc-k8s-ingress`, `mc-wildcard-prod`. See `migrations/seed_demo.sql:147` onward.
+
+### Agents (12 total in `agents` table)
+
+8 named workload agents + 1 server-side sentinel + 3 cloud-discovery sentinels:
+
+- **Workload agents:** `ag-web-prod`, `ag-web-staging`, `ag-lb-prod`, `ag-iis-prod`, `ag-data-prod`, `ag-edge-01`, `ag-k8s-prod`, `ag-mac-dev`
+- **Server-side sentinel:** `server-scanner`
+- **Cloud-discovery sentinels:** `cloud-aws-sm`, `cloud-azure-kv`, `cloud-gcp-sm`
+
+Full list via:
+```bash
+sed -n '/^INSERT INTO agents/,/^;/p' migrations/seed_demo.sql \
+  | grep -oE "^\s*\('[a-z][a-z0-9_-]+" | sed -E "s/^\s*\('//"
+```
+
+(The `agent_groups` table also contains entries with `ag-*` IDs — `ag-linux-prod`, `ag-windows`, `ag-datacenter-a`, `ag-arm64`, `ag-manual` — but those are *group* IDs, not agents. Don't confuse the two.)
+
+### Issuers (13 total)
+
+`iss-local`, `iss-acme-le`, `iss-stepca`, `iss-acme-zs`, `iss-openssl`, `iss-vault`, `iss-digicert`, `iss-sectigo`, `iss-googlecas`, `iss-awsacmpca`, `iss-entrust`, `iss-globalsign`, `iss-ejbca`.
+
+Full list via:
+```bash
+sed -n '/^INSERT INTO issuers/,/^;/p' migrations/seed_demo.sql \
+  | grep -oE "^\s*\('iss-[a-z0-9_-]+" | sed -E "s/^\s*\('//"
+```
+
+### Targets (8 total in `deployment_targets`)
+`tgt-nginx-prod`, `tgt-nginx-staging`, `tgt-haproxy-prod`, `tgt-apache-prod`, `tgt-iis-prod`, `tgt-traefik-prod`, `tgt-caddy-prod`, `tgt-nginx-data`
+
+### Network Scan Targets (4 total in `network_scan_targets`)
+`nst-dc1-web`, `nst-dc2-apps`, `nst-dmz`, `nst-edge`
+
+**Maintenance note:** when adding new seed rows, also update this section, OR remove the
+per-table counts and rely on the `sed | grep` commands so the doc stops drifting on every
+seed-data change. A CI guard that fails when the doc count diverges from the seed file is
+proposed in `coverage-audit-2026-04-27/tables/qa-doc-strengthening.md` (Strengthening #6).
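+
+One possible shape for such a guard, as a sketch only. It reuses the `sed | grep` extraction from this section and compares the result with the `(N total` figure in the matching heading. The function names are hypothetical, the heading format is assumed from this section, and this is not necessarily the guard wired into `ci.yml`:
+
+```bash
+#!/usr/bin/env bash
+# count_seed_ids SEED_FILE TABLE ID_PREFIX
+# Counts distinct quoted IDs starting with ID_PREFIX inside the
+# "INSERT INTO <TABLE>" statement (terminated by a line starting with ';').
+count_seed_ids() {
+  sed -n "/^INSERT INTO $2/,/^;/p" "$1" \
+    | grep -oE "^[[:space:]]*\('$3[a-z0-9_-]+" \
+    | sort -u | wc -l | tr -d ' '
+}
+
+# doc_count DOC_FILE HEADING
+# Reads N from a heading such as "### Issuers (13 total)".
+doc_count() {
+  grep -E "^### $2 \(" "$1" | grep -oE '[0-9]+ total' \
+    | grep -oE '[0-9]+' | head -n 1
+}
+
+# check_drift DOC_FILE SEED_FILE HEADING TABLE ID_PREFIX
+# Exits 1 and prints a message when the doc and the seed disagree.
+check_drift() {
+  local want got
+  want=$(doc_count "$1" "$3")
+  got=$(count_seed_ids "$2" "$4" "$5")
+  if [ "$want" != "$got" ]; then
+    echo "drift: doc says '$want' for $3, seed has $got" >&2
+    return 1
+  fi
+}
+```
+
+A call such as `check_drift <this-doc> migrations/seed_demo.sql Issuers issuers iss-` would then fail whenever the heading count and the seed file diverge.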
+
+## Troubleshooting
+
+### "Server unreachable" on startup
+The test pings `GET /health` before running anything. If this fails:
+```bash
+# Check if the stack is running
+docker compose -f docker-compose.yml -f docker-compose.demo.yml ps
+
+# Check server logs
+docker compose -f docker-compose.yml -f docker-compose.demo.yml logs certctl-server
+
+# Check if the port is exposed (self-signed cert — pin CA bundle)
+curl --cacert ./deploy/test/certs/ca.crt -s https://localhost:8443/health
+```
+
+### "connect to QA DB" failure
+The database tests connect directly to PostgreSQL. Ensure port 5432 is exposed:
+```bash
+docker compose -f docker-compose.yml -f docker-compose.demo.yml port postgres 5432
+```
+
+### Performance tests flaking
+The performance thresholds (200ms, 300ms, 500ms) assume a local Docker stack. On slow CI runners or remote Docker hosts, increase the thresholds or skip Part 39:
+```bash
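+# Go's -run regexp is RE2 and has no negative lookahead; use -skip (Go 1.20+).
+# Adjust the pattern to the actual Part 39 subtest name in qa_test.go.
+go test -tags qa -v -run 'TestQA' -skip 'TestQA/Part_?39' ./...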
+```
+
+### Source file checks failing
+The `fileExists` and `fileContains` helpers read from `CERTCTL_QA_REPO_DIR` (default `../..`). If running from a non-standard location:
+```bash
+CERTCTL_QA_REPO_DIR=/absolute/path/to/certctl go test -tags qa -v ./...
+```
+
+## Release Day Sign-Off Matrix
+
+Before tagging a release, the QA-on-call engineer signs off on each row. This matrix replaces the previous ad-hoc release checklist and ties test execution directly to release approval. Acquisition-grade releases have this kind of matrix; the doc previously didn't.
+
+| Sign-off | Evidence | Owner | Result | Date |
+|---|---|---|---|---|
+| `make verify` clean on master | CI run URL | Eng-on-call | ☐ | |
+| `go test -tags qa ./deploy/test/...` ≥ 95% pass rate (skips counted as pass) | Test output | QA-on-call | ☐ | |
+| `go test -race -count=10 ./internal/...` 0 races | `tool-output/race-x10.txt` | QA-on-call | ☐ | |
+| Coverage ≥ thresholds in `ci.yml` (service / handler / crypto / local-issuer / acme / stepca / mcp) | `tool-output/cover-summary.txt` | QA-on-call | ☐ | |
+| Helm chart `helm lint && helm template` clean | `tool-output/helm.txt` | DevOps-on-call | ☐ | |
+| All `t.Skip` sites have current rationales (see Bundle O audit; CI guard catches new orphans) | `make qa-stats` t.Skip count | QA-on-call | ☐ | |
+| Frontend: Vitest run clean; per-page coverage ≥ 70% | `web/tool-output/vitest.txt` | Frontend-on-call | ☐ | |
+| Manual Parts 23, 24, 55, 56 executed (or explicit defer with rationale) | This sheet | QA-on-call | ☐ | |
+| Demo stack `docker compose up -d --build` smoke (`/health` 200, `/ready` 200) | curl receipt | QA-on-call | ☐ | |
+| `govulncheck ./...` clean (or deferred-call advisories tracked in `gap-backlog`) | `tool-output/govulncheck.json` | Security-on-call | ☐ | |
+| QA-doc drift guards green (Part-count + cert-count) | CI run URL | QA-on-call | ☐ | |
+| FSM transition coverage tables (`coverage-audit-2026-04-27/tables/fsm-coverage.md`) — Existential FSMs ≥80% legal + 100% illegal | This sheet | QA-on-call | ☐ | |
+
+**Sign-off owner:** ______________________ &nbsp;&nbsp;**Date:** ______ &nbsp;&nbsp;**Tag:** v__.__.__
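+
+The pass-rate row can be computed rather than eyeballed. The sketch below reads `go test -v` output and treats `SKIP` as a pass, as that row specifies. `qa_pass_rate` and `meets_gate` are hypothetical helper names, and the count includes parent tests as well as subtests, which may differ slightly from how the QA-on-call tallies results:
+
+```bash
+#!/usr/bin/env bash
+# qa_pass_rate: reads `go test -v` output on stdin and prints the pass
+# rate as a percentage with one decimal, counting SKIP as a pass.
+qa_pass_rate() {
+  awk '
+    /^[ \t]*--- (PASS|SKIP):/ { ok++ }
+    /^[ \t]*--- FAIL:/        { bad++ }
+    END {
+      total = ok + bad
+      if (total == 0) { print "0.0"; exit }
+      printf "%.1f\n", 100 * ok / total
+    }'
+}
+
+# meets_gate RATE THRESHOLD: exits 0 when RATE >= THRESHOLD.
+meets_gate() {
+  awk -v r="$1" -v t="$2" 'BEGIN { exit !(r >= t) }'
+}
+```
+
+Typical use would be `rate=$(go test -tags qa -v ./deploy/test/... 2>&1 | qa_pass_rate)` followed by `meets_gate "$rate" 95`.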
+
+## Mutation Testing Targets & Kill Rate
+
+Mutation testing exposes which assertions are actually load-bearing: a surviving mutant means the tests still pass against deliberately broken code, a gap that line coverage alone hides. The audit's Phase 0 attempted to run `go-mutesting` on the Existential cluster but was blocked by a Go 1.25 / arm64 incompatibility in `osutil@v1.6.1` (it uses `syscall.Dup2`, which is undefined on linux/arm64). The operator-runnable workaround uses a fork that targets `unix.Dup3` instead.
+
+| Package | Risk class | Target kill rate | Last measured | Tool |
+|---|---|---|---|---|
+| `internal/crypto` | Existential | ≥90% | unmeasured (sandbox-blocked, operator-runnable) | go-mutesting |
+| `internal/pkcs7` | Existential | ≥90% | unmeasured | go-mutesting |
+| `internal/connector/issuer/local` | Existential | ≥90% | unmeasured | go-mutesting |
+| `internal/connector/issuer/acme` | Existential | ≥80% (catch-up; failure-mode coverage 55.6% per Bundle J) | unmeasured | go-mutesting |
+| `internal/connector/issuer/stepca` | Existential | ≥85% (post-Bundle-L.B coverage at 90.4%) | unmeasured | go-mutesting |
+| `internal/api/middleware` | High | ≥80% | unmeasured | go-mutesting |
+| `internal/validation` | Existential (CWE-78 / CWE-113 boundary) | ≥90% | unmeasured | go-mutesting |
+| `web/src/utils/safeHtml.ts` | Frontend (XSS gate) | ≥90% | unmeasured | Stryker |
+
+### Operator command (per package)
+
+```bash
+# Use the avito-tech fork that supports linux/arm64 + Go 1.25.
+go install github.com/avito-tech/go-mutesting/cmd/go-mutesting@latest
+
+mkdir -p tool-output
+$(go env GOPATH)/bin/go-mutesting --debug ./internal/crypto/... \
+  > tool-output/mutation-crypto.txt 2>&1
+grep -oE 'mutation score is [0-9.]+' tool-output/mutation-crypto.txt | tail -1
+```
+
+**Acceptance:** the release-gate floor is ≥80% (Existential) / ≥70% (High); the per-package figures in the table above are targets beyond that floor. Anything below the floor is a Medium finding; triage entries go in `coverage-audit-2026-04-27/gap-backlog.md`. This subsection moves mutation testing from "future work" to "documented release gate."
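+
+The acceptance check can be scripted against the saved tool output. This is a sketch with hypothetical helper names; it assumes the tool prints the score as a fraction between 0 and 1 (for example `0.85`), so the percentage threshold is scaled down before comparing:
+
+```bash
+#!/usr/bin/env bash
+# mutation_score FILE: prints the last "mutation score is X" value in FILE.
+mutation_score() {
+  grep -oE 'mutation score is [0-9.]+' "$1" | tail -n 1 | awk '{ print $4 }'
+}
+
+# score_meets FILE MIN_PERCENT
+# Exits 0 when the score (assumed to be a 0..1 fraction) is at least
+# MIN_PERCENT / 100, 1 when it is lower, 2 when no score was found.
+score_meets() {
+  local score
+  score=$(mutation_score "$1")
+  if [ -z "$score" ]; then
+    echo "no mutation score found in $1" >&2
+    return 2
+  fi
+  awk -v s="$score" -v m="$2" 'BEGIN { exit !((s + 0) >= (m / 100)) }'
+}
+```
+
+For the command above, `score_meets tool-output/mutation-crypto.txt 80` would exit non-zero when `internal/crypto` falls below an 80% threshold.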
+
+## Adding New Tests
+
+When a new feature ships:
+
+1. **Add a Part section** in `qa_test.go` following the numbering in `docs/testing-guide.md`
+2. **API tests**: use `c.get()`, `c.post()`, `c.bodyStr()`, `c.getJSON()`, `c.timedGet()`
+3. **Source checks**: use `fileExists(t, "relative/path")` and `fileContains(t, "path", "substring")`
+4. **DB checks**: use `openQADB(t)` and `db.queryInt(t, "SELECT ...")`
+5. **Cleanup**: always use `t.Cleanup()` for data created during tests
+6. **Skip if external**: use `t.Skip("Requires X — manual test")` with a clear reason
+
+## Version History
+
+- **v1.3** (April 2026, post-Bundle-P) — QA Doc Strengthening shipped. New top-of-doc Test Suite Health dashboard (regenerated via `make qa-stats`). New Coverage by Risk Class table after the Coverage Map. New Release Day Sign-Off Matrix and Mutation Testing Targets sections. CI seed-count + Part-count drift guards land in `.github/workflows/ci.yml` so future doc drift fails CI. Bundle P closes M-007 / M-010 / M-011 / M-012 (structural strengthening) + M-008 (Mutation Testing Targets).
+- **v1.2** (April 2026, post-coverage-audit) — Documented Parts 55–56 (I-004 Agent Soft-Retirement, I-005 Notification Retry & Dead-Letter) and surfaced Parts 23–24 (S/MIME & EKU; OCSP/CRL) as not-yet-automated. 56 Parts total in `testing-guide.md`; 49 live `Part_*` automation wrappers in `qa_test.go` + 4 new `Skip` stubs for Parts 23/24/55/56 = 53 wrappers (Parts 15–17 remain covered by source-checks in Parts 42–46). Reconciled seed-data section to actual `seed_demo.sql` counts (12 agents, 13 issuers; certs were already accurate at 32). Bundle I of the 2026-04-27 coverage-audit closure plan.
+- **v1.1** (April 2026) — Added Parts 53–54 (M47: Kubernetes Secrets target + AWS ACM PCA issuer). 54 Parts total, ~164 automated subtests.
+- **v1.0** (April 2026) — Initial release covering all 52 Parts of testing-guide.md v2.1. Replaces `qa-smoke-test.sh`.
@@ -0,0 +1,198 @@
+# certctl Testing Strategy & Deep-Scan Operator Runbook
+
+This doc covers the **testing topology** (per-PR fast gates vs. daily deep-scan
+gates) and the **operator runbook** for re-running each deep-scan tool locally
+when the CI receipt is ambiguous or when an operator wants to validate a fix
+before the next scheduled scan.
+
+For the manual end-to-end QA playbook, see [`testing-guide.md`](testing-guide.md).
+For the security posture / per-finding closure log, see [`security.md`](security.md).
+
+## CI workflow split
+
+certctl runs two GitHub Actions workflows:
+
+- **`.github/workflows/ci.yml`** — runs on every push/PR. Fast feedback only.
+  Includes `gofmt`, `go vet`, `golangci-lint`, `go test -short -count=1`,
+  `govulncheck`, the per-layer coverage gates, and the regression-grep guards
+  (the M-009 mutation budget, the L-001 InsecureSkipVerify guard, the H-001
+  Dockerfile SHA-pin guard, the M-012 USER-directive guard, etc.).
+- **`.github/workflows/security-deep-scan.yml`** — runs daily 06:00 UTC and on
+  manual dispatch. Heavyweight tools that need docker, network egress to
+  scanner registries, or wall-clock budgets the per-PR check can't tolerate.
+  Includes `gosec`, `osv-scanner`, the `-race -count=10` full-suite run,
+  `trivy` image scan, `syft` SBOM, ZAP baseline DAST, `nuclei`,
+  `schemathesis` OpenAPI fuzz, `testssl.sh`, `go-mutesting` mutation testing,
+  and `semgrep p/react-security`.
+
+Receipts from each scheduled run are uploaded as a 30-day-retention artefact
+named `security-deep-scan-<run-id>`. Audit them via the GitHub Actions UI;
+download the artefact zip for any scan that surfaces a finding.
+
+## Operator runbook — local re-run procedures
+
+These are the same commands the workflow runs, intended for an operator with
+a workstation that has docker + the Go toolchain installed. The local-run
+shape is identical to CI; the difference is wall-clock and the artefact
+location (CI uploads; local writes to `$PWD`).
+
+### Mutation testing (D-003)
+
+**Tool:** [`go-mutesting`](https://github.com/zimmski/go-mutesting). Mutates
+each AST node in turn (flips comparisons, swaps return values, removes
+statements) and re-runs the package's tests. A mutant is **killed** if any
+test fails; **surviving** mutants indicate a coverage gap (no test caught
+the bug the mutant introduced).
+
+**Targets:** the three security-critical packages whose coverage gate is
+**85%** in `ci.yml`:
+
+- `internal/crypto/`
+- `internal/pkcs7/`
+- `internal/connector/issuer/local/`
+
+**Acceptance threshold:** ≥80% mutation kill ratio per package. Surviving
+mutants below that threshold get triaged in
+`cowork/comprehensive-audit-2026-04-25/d003-mutation-results.md` — either
+ship a targeted unit test that kills the mutant, or document an
+equivalent-mutation justification.
+
+**Local run:**
+
+```
+go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest
+for pkg in ./internal/crypto/... ./internal/pkcs7/... ./internal/connector/issuer/local/...; do
+  echo "=== $pkg ==="
+  $(go env GOPATH)/bin/go-mutesting "$pkg"
+done
+```
+
+The tool prints one line per mutant (`PASS` = killed, `FAIL` = surviving)
+plus a per-package summary `The mutation score is X.YZ`. CPU-bound, single
+core, takes ~10 minutes on a 2024-era laptop for the three packages combined.
+
+**Sandbox note:** `go-mutesting` writes a mutant copy of the source tree to
+`/tmp/go-mutesting/` per run; needs ≥2 GB free disk. Sandboxed CI runners
+are sized for this; constrained dev sandboxes are not.
+
+### DAST baseline (D-004)
+
+**Tool:** [OWASP ZAP `baseline`](https://www.zaproxy.org/docs/docker/baseline-scan/).
+Spiders the running server's URL surface and runs ZAP's passive-scan rules
+against the responses. **Baseline** mode performs no active attacks, so it's safe
+against a non-throwaway environment.
+
+**Target:** the live `deploy/docker-compose.yml` stack on `https://localhost:8443`.
+
+**Acceptance:** zero HIGH/CRITICAL alerts. WARN/INFO alerts get triaged in the
+ZAP report; some are unavoidable (e.g., HSTS preload-list nag is a deployment
+recommendation, not a server defect).
+
+**Local run:**
+
+```
+docker compose -f deploy/docker-compose.yml up -d
+sleep 20  # wait for /ready to flip OK; check `curl --cacert deploy/test/certs/ca.crt https://localhost:8443/ready`
+docker run --rm --network host \
+  -v "$PWD":/zap/wrk \
+  ghcr.io/zaproxy/zaproxy:stable \
+  zap-baseline.py -t https://localhost:8443 \
+  -r zap-report.html -J zap-report.json
+docker compose -f deploy/docker-compose.yml down
+```
+
+The HTML report opens in a browser; the JSON is machine-readable for triage.
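+
+The fixed `sleep 20` in the commands above can be swapped for a polling loop so that the scan starts as soon as the server reports ready. This is a sketch: `wait_until` and `ready_probe` are hypothetical helper names, and the probe reuses the `/ready` URL and CA bundle path mentioned in the comment above:
+
+```bash
+#!/usr/bin/env bash
+# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
+# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
+wait_until() {
+  local attempts=$1 delay=$2 i=0
+  shift 2
+  while [ "$i" -lt "$attempts" ]; do
+    if "$@" >/dev/null 2>&1; then
+      return 0
+    fi
+    i=$((i + 1))
+    sleep "$delay"
+  done
+  return 1
+}
+
+# ready_probe: assumed readiness check against the local stack.
+ready_probe() {
+  curl --cacert deploy/test/certs/ca.crt -fsS https://localhost:8443/ready
+}
+```
+
+Replacing the `sleep 20` line with `wait_until 30 2 ready_probe || exit 1` would wait up to roughly a minute and abort if the stack never becomes ready.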
+
+### TLS audit (D-005)
+
+**Tool:** [`testssl.sh`](https://testssl.sh/). Probes the TLS handshake and
+each enabled cipher suite; reports protocol-version weaknesses, cipher
+weaknesses, certificate-chain issues, and known CVE patterns (Heartbleed,
+ROBOT, BEAST, etc.).
+
+**Target:** the live stack on `https://localhost:8443`.
+
+**Acceptance:** zero HIGH/CRITICAL findings. certctl pins
+`tls.Config.MinVersion = tls.VersionTLS13` (`cmd/server/tls.go`), so anything
+that surfaces is either (a) a real defect, (b) a testssl false positive, or
+(c) a deployment-config issue worth documenting in the operator runbook.
+
+**Local run:**
+
+```
+docker compose -f deploy/docker-compose.yml up -d
+sleep 20
+docker run --rm --network host \
+  -v "$PWD":/data \
+  drwetter/testssl.sh:latest \
+  --jsonfile /data/testssl.json https://localhost:8443
+docker compose -f deploy/docker-compose.yml down
+
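+# Filter to actionable severities (--jsonfile writes a flat array of findings;
+# the nested .scanResult layout only appears with --jsonfile-pretty)
+jq '[.[] | select(.severity == "HIGH" or .severity == "CRITICAL")]' testssl.json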
+```
+
+### Frontend semgrep (D-007)
+
+**Tool:** [`semgrep`](https://semgrep.dev/) with the maintained
+[`p/react-security` ruleset](https://semgrep.dev/p/react-security). Catches
+React-specific XSS / injection patterns: `dangerouslySetInnerHTML` without
+sanitization, `target="_blank"` without `rel="noopener noreferrer"`,
+`href={userInput}`, `eval`, `document.write`, etc.
+
+**Target:** the frontend source tree at `web/src/`.
+
+**Acceptance:** zero findings. Bundle 8 already verified
+`dangerouslySetInnerHTML` count at zero and the `target="_blank"`
+rel-noopener pin via simple grep guards in `ci.yml`; semgrep adds defence
+in depth — it catches escape patterns the greps don't see (e.g.,
+`href={user_input}`, runtime `eval`, `document.write`).
+
+**Local run:**
+
+```
+docker run --rm -v "$PWD":/src returntocorp/semgrep:latest \
+  semgrep --config=p/react-security --json /src/web/src \
+  > semgrep-react.json
+
+# Count findings
+jq '.results | length' semgrep-react.json
+
+# Pretty-print findings
+jq '.results[] | {rule_id: .check_id, path, line: .start.line, message: .extra.message}' semgrep-react.json
+```
+
+If the count is non-zero, every result has a `check_id` (e.g.
+`react.dangerouslySetInnerHTML`) and a `message` describing the escape
+pattern. Triage each: either fix the call site, or — for legitimate edge
+cases — add a `// nosem: <check_id> — <reason>` directive on the
+preceding line.
+
+## Cadence
+
+| Tool                 | Trigger                            | Wall-clock | Owner          |
+|----------------------|------------------------------------|------------|----------------|
+| go-mutesting         | daily deep-scan + manual dispatch  | ~10 min    | maintainers    |
+| ZAP baseline (DAST)  | daily deep-scan + manual dispatch  | ~5 min     | maintainers    |
+| testssl.sh           | daily deep-scan + manual dispatch  | ~3 min     | maintainers    |
+| semgrep react        | daily deep-scan + manual dispatch  | ~1 min     | maintainers    |
+| `make verify`        | every commit (pre-push)            | ~1 min     | every developer |
+| ci.yml fast gates    | every push/PR                      | ~3 min     | every developer |
+
+Re-run any of the deep-scan tools locally when:
+
+- A CI receipt surfaces an unexpected finding and you want to bisect against
+  a local change before pushing.
+- You're cutting a release tag and want belt-and-suspenders evidence beyond
+  the most recent scheduled scan.
+- You're adding a new feature in the relevant surface (crypto code →
+  re-run mutation testing; new HTTP handler → re-run schemathesis + ZAP;
+  new TLS-config knob → re-run testssl).
+
+## Related docs
+
+- [`docs/security.md`](security.md) — security posture, per-finding closure log.
+- [`docs/testing-guide.md`](testing-guide.md) — manual end-to-end QA playbook.
+- [`.github/workflows/ci.yml`](../.github/workflows/ci.yml) — per-PR fast gates.
+- [`.github/workflows/security-deep-scan.yml`](../.github/workflows/security-deep-scan.yml) — daily deep-scan gates.
+- [`scripts/install-security-tools.sh`](../scripts/install-security-tools.sh) — Go-host-installed tools (the docker-based tools are not in this script).