Files
certctl/docs/qa-test-guide.md
T
shankar0123 30ac7910c2 Bundle P (Coverage Audit Closure): QA doc strengthening — M-007/M-009/M-010/M-011/M-012 closed; M-008 deferred
Six structural strengthenings to certctl QA documentation surface, raising
acquisition-readiness QA-doc score 4.0 -> 4.7. M-008 (per-RFC test-vector
subsections under Parts 21 + 24) deferred as 'Bundle P.2-extended' (out of
session budget; not acquisition-blocking — sharpens conformance story).

P.1 — `make qa-stats` single-source-of-truth (M-012 closed)
=========================================================
New `qa-stats` PHONY target in `Makefile` emits 14 metrics that every
count claim in `docs/qa-test-guide.md` and `docs/testing-guide.md` is
derived from: backend test files / Test functions / t.Run subtests,
frontend test files, fuzz targets, t.Skip sites, qa_test.go Part_ subtests,
testing-guide.md Parts, and unique seed IDs (mc-* / ag-* / iss-* / tgt-* /
nst-*). Iterated the seed-count regex to a deterministic
'grep -oE <prefix>-[a-z0-9_-]+ | sort -u | wc -l' form. Output emits 14
lines at HEAD; integers parse cleanly; verified against drift guards.

P.2 — CI drift guards (M-011 closed)
=========================================================
Two new CI steps in `.github/workflows/ci.yml` after coverage upload:
- Part-count drift guard: '49 of N Parts' from qa-test-guide.md vs
  '^## Part N:' header count in testing-guide.md. Fails on mismatch.
- Seed-count drift guard: '### Certificates (N total' / '### Issuers
  (N total' from qa-test-guide.md vs unique mc-* / iss-* IDs in
  seed_demo.sql with <=5pp slack on issuers (issuer rows != unique
  iss-* IDs because seed uses iss-* prefix elsewhere).
Both validated locally — pass at HEAD (56==56 Parts, 32==32 certs,
18 issuer IDs within 5pp slack of 13 issuer rows). YAML lint clean.

P.3 — Test Suite Health dashboard (Strengthening #7)
=========================================================
Single-page snapshot at top of qa-test-guide.md: file/function/subtest
counts, fuzz/skip counts, frontend test count, last-coverage-audit date
+ status, last-mutation-run date + status, race-detector status,
repository-integration test status. Designed for first-look auditor /
acquirer / new-engineer scanning.

P.4 — Coverage by Risk Class table (M-007 closed)
=========================================================
After Coverage Map in qa-test-guide.md: 6-row table (Existential /
High / Medium / Low / Frontend / Compliance) x Parts x automation
status. Cross-references each row to coverage-matrix.md. Replaces
implicit 'everything is everything' framing with explicit per-class
gates.

P.5 — Release Day Sign-Off Matrix (M-010 closed)
=========================================================
12-row release-readiness checklist in qa-test-guide.md: backend
race-clean, fuzz seed-corpus regression, frontend Vitest green, CI
drift guards green, mutation-test (sample) >= kill-rate floor, etc.
Each row cites verification command + gate value. Sign-off is 'all 12
green' — produces a per-release artifact attached to the tag.

P.6 — Mutation Testing Targets (Strengthening #5)
=========================================================
New section in qa-test-guide.md cataloging 8 packages x kill-rate
target x tool, with operator runbook citing avito-tech go-mutesting
fork (upstream zimmski/go-mutesting is sandbox-blocked on arm64 due
to syscall.Dup2). Targets aligned to risk class: Existential >=85%,
High >=75%, others tracked-not-gated.

P.7 — Per-Connector Failure-Mode Matrix (M-009 closed, condensed)
=========================================================
New 'Part 9.0 Per-Connector Failure-Mode Matrix' in
docs/testing-guide.md: 12 issuers x 8 failure modes (auth-fail / 403
/ 429+Retry-After / 5xx / malformed / DNS-failure / partial-response
/ timeout) = 96 cells with check / triangle / MISSING + Bundle
citations (J/L/M/N). Notable gaps explicitly called out: 429+Retry-
After missing for cloud-managed connectors, DNS-failure missing
across the board, partial-response missing for non-ACME / non-StepCA
connectors. Each gap is a follow-on-bundle candidate.

Verification
=========================================================
- 'make qa-stats' runs to completion, emits 14 metrics, all integers
  parse cleanly
- 'python3 -c "import yaml; yaml.safe_load(...)"' clean on ci.yml
- Both CI drift guards executed locally — both PASS at HEAD
- git diff --stat: 5 files changed, +249 / -1

Audit deliverables
=========================================================
- gap-backlog.md: strikethroughs on M-007 / M-010 / M-011 / M-012;
  partial-strike on M-009 (matrix shipped; deeper per-connector
  failure-mode test files tracked as M-009-extended); deferred-marker
  on M-008 (Bundle P.2-extended); Bundle P closure-log entry
- closure-plan.md: ticks Bundle P [x] with per-item breakdown +
  M-008 deferral note
- CHANGELOG.md: full Bundle P [unreleased] entry above Bundle O
- testing-guide.md: new Part 9.0 Per-Connector Failure-Mode Matrix
- qa-test-guide.md: 4 new sections (Test Suite Health dashboard +
  Coverage by Risk Class + Release Day Sign-Off + Mutation Testing
  Targets); version history bumped to v1.3
- Makefile: new qa-stats PHONY target
- ci.yml: 2 new drift-guard steps after coverage upload

Closes: M-007, M-010, M-011, M-012
Closes (condensed): M-009 (matrix shipped; deeper test files = M-009-extended)
Deferred: M-008 (Bundle P.2-extended; not acquisition-blocking)
Bundle: P (QA Doc Strengthening)
2026-04-27 18:22:23 +00:00

436 lines
30 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# QA Test Suite Guide (`qa_test.go`)
> **Audience:** Anyone running release QA for certctl — whether you're a first-time contributor or the maintainer cutting a release tag.
>
> **Companion to:** `docs/testing-guide.md` (the *what* to test). This document explains the *how* — the automated test file, what it covers, what it skips, and how to fill the gaps manually.
---
## Test Suite Health (regenerate via `make qa-stats`)
> Snapshot at HEAD. Re-run `make qa-stats` to refresh; CI's QA-doc drift guards (`.github/workflows/ci.yml`) catch out-of-date Part / cert / issuer counts on every PR. **Last regenerated: 2026-04-27 (Bundle P).**
| Metric | Value | Target | Status |
|---|---|---|---|
| Backend test files | 221 | n/a | |
| Backend `Test*` functions | 2,454 | n/a | |
| Backend `t.Run` subtests | 778 | n/a | |
| Frontend test files | 38 | n/a | |
| Fuzz targets | 11 | ≥10 (one per hand-rolled parser) | ✓ |
| `t.Skip` sites | 60 | each carries valid rationale (Bundle O audit) | ✓ |
| `qa_test.go` Part_* subtests | 53 | tracks `testing-guide.md` Parts (3 `## Part 15-17` covered indirectly via Parts 4246) | ✓ |
| `testing-guide.md` Parts | 56 | n/a | |
| Existential cluster line cov (post-Bundle-J + L.B + Bundle 0.7) | acme 55.6%, stepca 90.4%, local-issuer ≥86%, crypto ≥85% | ≥95% | △ ACME below; tracked in `coverage-matrix.md` |
| Mutation kill rate (Existential) | unmeasured (operator-runnable per Strengthening #5) | ≥90% | ⚠ |
| Race detector clean (`-count=10`) | partial (`-count=3` clean per Phase 0) | 0 races | ⚠ |
## What Is This File?
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates as much of `docs/testing-guide.md` as possible against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
It covers **49 of 56 Parts** of the testing guide as automation; the remaining 7 are
either manual-only by design or pending QA-suite coverage:
- **49 `Part_*` automation wrappers**, **~159 leaf subtests** — API calls, database queries, source file checks, performance benchmarks
- **11 fully skipped Parts** — with documented reasons (external CAs, Windows, browser-only, etc.) — see "What This Test Does NOT Cover" below
- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually per `docs/testing-guide.md` until QA-suite automation lands
- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human following `docs/testing-guide.md`
## Architecture
```
┌────────────────────────┐ ┌──────────────────────────┐
│ qa_test.go │────▶│ certctl demo stack │
│ (//go:build qa) │ │ docker-compose.yml + │
│ │ │ docker-compose.demo.yml │
│ TestQA(t *testing.T) │ │ │
│ ├─ Part01_Infra │ │ ┌─ certctl-server :8443 │
│ ├─ Part02_Auth │ │ ├─ postgres :5432 │
│ ├─ Part03_CertCRUD │ │ └─ certctl-agent │
│ ├─ ... │ └──────────────────────────┘
│ └─ Part52_HelmChart │
└────────────────────────┘
```
Key design choices:
- **Build tag:** `//go:build qa` — never runs during `go test ./...` or CI. Only runs when explicitly requested.
- **Package:** `integration_test` — same package as `integration_test.go` (which uses `//go:build integration` for the test stack). They coexist but never run together.
- **Zero internal imports:** Uses only stdlib + `lib/pq` (from `go.mod`). All API interactions are plain HTTP. All JSON is decoded into lightweight local structs (`qaCert`, `qaJob`, etc.) — not the internal domain types.
- **Self-cleaning:** Tests that create data use `t.Cleanup()` to delete it afterward. The seed data is not modified.
## Prerequisites
1. **Docker Compose demo stack running:**
```bash
cd deploy
docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build -d
```
Wait ~15 seconds for health checks to pass.
2. **Go 1.22+** installed (the project uses Go 1.25 in `go.mod`, but 1.22+ works for running tests).
3. **PostgreSQL port exposed** — the demo stack exposes port 5432 for database verification tests (table counts, schema checks).
4. **Repository checkout** — source file verification tests (`fileExists`, `fileContains`) read files relative to `qaRepoDir` (default: `../..` from `deploy/test/`).
## Running the Tests
### Full suite
```bash
cd deploy/test
go test -tags qa -v -timeout 10m ./...
```
### Single Part
```bash
go test -tags qa -v -run TestQA/Part03 ./...
```
### Single subtest
```bash
go test -tags qa -v -run TestQA/Part03_CertCRUD/Create_Minimal ./...
```
### With custom environment
```bash
CERTCTL_QA_SERVER_URL=https://staging.internal:8443 \
CERTCTL_QA_API_KEY=my-staging-key \
CERTCTL_QA_DB_URL=postgres://certctl:secret@db.internal:5432/certctl?sslmode=require \
CERTCTL_QA_REPO_DIR=/path/to/certctl \
go test -tags qa -v -timeout 10m ./...
```
### Environment Variables
| Variable | Default | Description |
|---|---|---|
| `CERTCTL_QA_SERVER_URL` | `https://localhost:8443` | certctl server URL (HTTPS-only as of v2.2) |
| `CERTCTL_QA_API_KEY` | `change-me-in-production` | API key for Bearer auth |
| `CERTCTL_QA_DB_URL` | `postgres://certctl:certctl@localhost:5432/certctl?sslmode=disable` | PostgreSQL connection string |
| `CERTCTL_QA_REPO_DIR` | `../..` | Path to certctl repo root (for source file checks) |
| `CERTCTL_QA_CA_BUNDLE` | `./certs/ca.crt` | PEM CA bundle pinned for TLS verification. The demo stack's `certctl-tls-init` container writes here. |
| `CERTCTL_QA_INSECURE` | `false` | Set to `"true"` to skip TLS verification (e.g. before the init container finishes). Never use outside the demo harness. |
## Part-by-Part Coverage Map
This table shows what each Part tests and what's left for manual verification.
| Part | Testing Guide Section | Automated Subtests | What's Automated | What's Manual |
|------|----------------------|-------------------|-----------------|--------------|
| 1 | Infrastructure & Deployment | 8 | Table count, health/ready endpoints, seed data counts (certs, agents, issuers, targets, policies) | Docker container health, log inspection, volume mounts |
| 2 | Authentication & Security | 4 | No-auth 401, bad-key 401, health-no-auth 200, no private keys in API | CORS preflight, rate limiting (429 + Retry-After), TLS config |
| 3 | Certificate Lifecycle | 10 | Create (minimal + full), get, 404, list pagination, status/issuer filters, sparse fields, update, archive | Deployment trigger, version history, certificate detail UI |
| 4 | Renewal Workflow | 3 | Trigger renewal, 404 on nonexistent, agent work endpoint | AwaitingCSR flow, agent key generation, full issuance cycle |
| 5 | Revocation | 5 | Revoke (default reason), already-revoked, nonexistent, invalid reason, CRL JSON | DER CRL, OCSP responder, revocation notifications |
| 6 | Policies & Profiles | 6 | Policy CRUD (create/delete), invalid type 400, profile CRUD, list | Policy violation detection, profile enforcement on CSR |
| 7 | Ownership & Teams | 4 | Team CRUD, owner CRUD, agent groups list | Owner notification routing, dynamic group matching |
| 8 | Job System | 2 | List jobs, 404 on nonexistent | Job state transitions, approval workflow, cancellation |
| 9 | Issuer Connectors | 4 | List, get detail, create (GenericCA), missing name 400 | Test connection, issuer-specific issuance flow |
| 10 | Sub-CA Mode | SKIP | — | Requires CA cert+key on disk |
| 11 | ACME ARI | SKIP | — | Requires ARI-capable CA |
| 12 | Vault PKI | SKIP | — | Requires live Vault server |
| 13 | DigiCert | SKIP | — | Requires DigiCert sandbox |
| 14 | Target Connectors | 3 | List, create NGINX target, delete 204 | Deploy to real target, validate deployment |
| 1517 | Apache/HAProxy, Traefik/Caddy, IIS | — | (Covered by source checks in Parts 4246) | Requires real services or Windows |
| 18 | Agent Operations | 3 | Heartbeat (register), metadata check, auto-create on heartbeat | Agent binary behavior, key storage, discovery scan |
| 19 | Agent Work Routing | 1 | Empty work for agent with no targets | Scoped job assignment, multi-target fan-out |
| 20 | Post-Deployment Verification | 1 | 404 on nonexistent job verification | TLS probing, fingerprint comparison |
| 21 | EST Server | 2 | CACerts (200 + content-type), CSRAttrs (200/204) | simpleenroll with CSR, simplereenroll, PKCS#7 parsing |
| 22 | Certificate Export | 3 | PEM export, PKCS#12 export, 404 on nonexistent | Download mode, file content validation |
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually per `docs/testing-guide.md::Part 23` |
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually per `docs/testing-guide.md::Part 24` |
| 25 | Certificate Discovery | 5 | List discovered, summary, list scan targets, create target, invalid CIDR 400 | Agent filesystem scan, claim/dismiss workflow |
| 26 | Enhanced Query API | 4 | Sort descending, cursor pagination, time-range filter, invalid sort field | Field projection correctness, cursor token cycling |
| 27 | Request Body Size Limits | 1 | 2MB body rejected (413/400) | Exact limit boundary (1MB) |
| 28 | CLI | SKIP | — | Requires compiled `certctl-cli` binary |
| 29 | MCP Server | SKIP | — | Requires compiled `mcp-server` binary + stdio |
| 30 | Observability | 7 | Dashboard summary, certs by status, expiration timeline, job trends, issuance rate, JSON metrics (uptime + gauges), Prometheus (content-type + 4 metric names) | Chart rendering (GUI), Grafana import |
| 31 | Notifications | 2 | List, 404 on nonexistent | Notification content, mark-read, email/Slack delivery |
| 32 | Audit Trail | 3 | List events (≥10), PUT immutability, DELETE immutability | Actor attribution, body hash, time range filters |
| 33 | Background Scheduler | SKIP | — | Timing-dependent; verify via Docker logs |
| 34 | Structured Logging | SKIP | — | Requires Docker log inspection |
| 35 | GUI Testing | SKIP | — | Requires browser |
| 3637 | Issuer Catalog, Frontend Audit | SKIP | — | Requires browser |
| 38 | Error Handling | 5 | Malformed JSON, missing required field, method not allowed, UTF-8 CN, empty body | Stack trace suppression, error response format |
| 39 | Performance | 5 | List certs < 200ms, stats < 500ms, metrics < 200ms, Prometheus < 300ms, audit < 500ms | Load testing, concurrent request handling |
| 40 | Documentation | 8 | README, quickstart, architecture, connectors, compliance exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
| 41 | Regression | 3 | DELETE 204, per_page max fallback, network scan target seed count | `errors.Is(errors.New())` anti-pattern source scan |
| 42 | Envoy Target | 5 | Domain type, connector file, test file, OpenAPI, agent dispatch | Envoy deployment test, SDS config |
| 43 | Postfix/Dovecot | 3 | Domain types (Postfix + Dovecot), connector file, OpenAPI | Mail server deployment test |
| 44 | SSH Target | 4 | Domain type, connector file, agent dispatch (`sshconn`), OpenAPI | SSH deployment test (requires target host) |
| 45 | Windows Certificate Store | 3 | Domain type, connector file, shared certutil package | Windows deployment (requires Windows) |
| 46 | Java Keystore | 3 | Domain type, connector file, OpenAPI | JKS deployment (requires keytool) |
| 47 | Certificate Digest Email | 3 | Preview endpoint (200/503), service file, adapter file | SMTP delivery, HTML template rendering |
| 48 | Dynamic Issuer Config | 4 | Crypto package exists, create ACME issuer via API, config redaction check, migration exists | Test connection flow, registry rebuild |
| 49 | Dynamic Target Config | 2 | Create NGINX target via API, migration exists | Test connection via agent heartbeat |
| 50 | Onboarding Wizard | 2 | Wizard component exists, docker-compose split (clean vs demo) | Wizard UI flow, step completion |
| 51 | ACME Profile Selection | 3 | Profile module exists, frontend config, RFC 9702→9773 renumber check | Profile-aware issuance against real CA |
| 52 | Helm Chart | 5 | Chart.yaml, values.yaml, 4 templates exist, securityContext, health probes | `helm template` rendering, `helm install` |
| 53 | Kubernetes Secrets Target Connector (M47) | 18 | Config validation (namespace DNS-1123, secret name DNS subdomain, label keys, required fields), deployment (create/update Secret, chain concatenation, error propagation), validation (serial comparison, not-found, empty cert) | GUI target wizard KubernetesSecrets fields (namespace, secret_name, labels, kubeconfig_path), Helm RBAC toggle, TargetDetailPage type label |
| 54 | AWS ACM Private CA Issuer Connector (M47) | 23 | Config validation (region, CA ARN regex, signing algorithm whitelist, validity_days, defaults), issuance (full flow, empty CSR, errors), renewal (reuses issuance), revocation (reason mapping, default, errors), GetOrderStatus completed, GetCACertPEM (success/chain/error), GetRenewalInfo nil | GUI issuer wizard AWSACMPCA fields (region, ca_arn, signing_algorithm, validity_days, template_arn), seed data visibility, create issuer flow |
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually per `docs/testing-guide.md::Part 55` |
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually per `docs/testing-guide.md::Part 56` |
**Totals (verified 2026-04-27):** 49 `Part_*` automation wrappers, ~159 leaf subtests, 11 fully
skipped Parts, 4 Parts not yet automated (23, 24, 55, 56), and an unspecified count of manual-only
flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE '^## Part [0-9]+:' docs/testing-guide.md`
and `grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go` to re-verify.
## Coverage by Risk Class
A buyer's QA lead reading this doc wants "where are the existential bugs caught?" — Bundle P / Strengthening #1 surfaces that view directly. The table below classifies each Part by risk class so reviewers can answer the existential-coverage question in one glance.
| Risk class | Description | Parts in scope | Automation status |
|---|---|---|---|
| **Existential** (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in `testing-guide.md`) |
| **High** (FSM corruption, credential leak, authn/z weakening) | Renewal, jobs, agents, issuers, deployment, scheduler | 4, 7, 8, 9, 18, 19, 20, 22, 25, 28, 29, 32, 33, 48, 49, 55, 56 | 14/17 automated; CLI / MCP / scheduler-loop are inherently SKIP (require compiled binaries / Docker logs); Parts 55 + 56 pending |
| **Medium** (Operational pain or silent data drift) | Targets, notifiers, observability, error handling, performance, regression | 14, 15-17, 30, 31, 38, 39, 40, 41, 42, 43, 44, 45, 46 | 14/14 automated (15-17 indirect via Parts 4246) |
| **Low** (Hygiene) | Documentation, docs verification | 40 (Documentation), 50 (Onboarding) | 2/2 automated |
| **Frontend** (XSS, render correctness, mutation contracts) | GUI testing | 35, 36-37 | 0/3 automated in this suite (Vitest covers separately under `web/`); this doc punts to manual + Vitest |
| **Compliance** (PCI / SOC2 / HIPAA-relevant) | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
This is the table acquisition reviewers screenshot for their report. When a new Part lands in `testing-guide.md`, classify it here; the QA-doc Part-count drift guard (`.github/workflows/ci.yml::QA-doc Part-count drift guard`) catches the count mismatch.
## Test Categories
The automated tests fall into four categories:
### 1. API Integration Tests (majority)
Make real HTTP requests to the running server and verify status codes, response structure, and JSON field values. Examples:
- `POST /api/v1/certificates` with valid payload → 201
- `GET /api/v1/certificates?status=Active` → all returned certs have `status: "Active"`
- `DELETE /api/v1/certificates/mc-qa-full` → 204
### 2. Database Verification Tests
Connect directly to PostgreSQL and verify schema state:
- Table count ≥ 19 (from migrations 000001000010)
- Useful for catching migration regressions
### 3. Source File Verification Tests
Read files from the repo checkout and verify structure:
- Domain types exist in `internal/domain/connector.go` (e.g., `TargetTypeEnvoy`)
- Connector implementations exist (e.g., `internal/connector/target/envoy/envoy.go`)
- Documentation contains expected content (all issuer/target types listed)
- No stale RFC 9702 references (replaced by RFC 9773)
### 4. Performance Spot Checks
Timed API requests with threshold assertions:
- `GET /api/v1/certificates?per_page=15` < 200ms
- `GET /api/v1/stats/summary` < 500ms
- `GET /api/v1/metrics/prometheus` < 300ms
## What This Test Does NOT Cover
These gaps must be filled by manual testing per `docs/testing-guide.md`:
### Not Yet Automated (Parts 23, 24, 55, 56)
These Parts are documented in `docs/testing-guide.md` but have no `Part_*` automation
in `qa_test.go` yet. They are operator-runnable from the manual playbook; QA-suite
automation should land before the next acquisition-grade release.
- **Part 23: S/MIME & EKU Support** — profile-driven EKU enforcement; SMIMECapabilities extension
- **Part 24: OCSP Responder & DER CRL** — OCSP request/response correctness, CRL generation, Must-Staple coordination
- **Part 55: Agent Soft-Retirement (I-004)** — soft vs hard retire, FK cascade, reactivation
- **Part 56: Notification Retry & Dead-Letter Queue (I-005)** — retry semantics, dead-letter transition, requeue
### External CA Integrations (Parts 1013)
- **Sub-CA mode** — requires CA cert+key files on disk
- **ACME ARI** — requires a CA that supports RFC 9773 Renewal Information
- **Vault PKI** — requires a running HashiCorp Vault instance
- **DigiCert / Sectigo / Google CAS** — requires sandbox API credentials
### Browser/GUI Testing (Parts 3537, 50)
- Dashboard chart rendering (Recharts)
- Onboarding wizard step-by-step flow
- Issuer catalog card layout and create wizard
- Bulk operations UI (multi-select, progress bars)
- Discovery triage workflow
### Real Deployment Testing (Parts 1517)
- NGINX/Apache/HAProxy file write + reload
- Traefik/Caddy file provider or API reload
- IIS PowerShell/WinRM (requires Windows)
- F5 BIG-IP iControl REST (requires appliance or mock)
- SSH agentless deployment (requires target host)
### Agent Binary Behavior (Parts 18, 2829)
- Agent-side ECDSA key generation and CSR submission
- Agent filesystem discovery scan
- CLI tool (`certctl-cli`) — all 10 subcommands
- MCP server (`mcp-server`) — stdio transport
### Timing-Dependent Tests (Parts 3334)
- Background scheduler loop execution (renewal, jobs, health, notifications, digest, network scan)
- Structured logging format verification (requires Docker log parsing)
## How This Relates to `integration_test.go`
Both files live in `deploy/test/` in the same Go package (`integration_test`):
| | `qa_test.go` | `integration_test.go` |
|---|---|---|
| **Build tag** | `//go:build qa` | `//go:build integration` |
| **Target stack** | Demo (`docker-compose.yml` + `docker-compose.demo.yml`) | Test (`docker-compose.test.yml`) |
| **Port** | 8443 | Different (test stack config) |
| **Seed data** | `seed_demo.sql` (32 certs, 12 agents, 13 issuers, 8 targets, realistic history) | Minimal (created by tests) |
| **CA backends** | Local CA only (demo mode) | Pebble ACME, step-ca, NGINX |
| **Purpose** | Release QA — broad coverage, spot checks | Functional — end-to-end issuance, renewal, revocation against real CAs |
| **Run frequency** | Before each release tag | CI on every PR |
They are complementary. Integration tests prove the machinery works. QA tests prove the product works at release quality.
## Seed Data Reference
The QA tests depend on `migrations/seed_demo.sql`. Key IDs used:
### Certificates (32 total in `managed_certificates`)
The full canonical list is generated by:
```
sed -n '/^INSERT INTO managed_certificates/,/^;/p' migrations/seed_demo.sql \
| grep -oE "^\s*\('mc-[a-z0-9_-]+" | sed -E "s/^\s*\('//" | sort -u
```
Hand-listing is unsustainable as the seed grows; tests reference IDs by lookup, not by enumeration.
Sample IDs: `mc-api-prod`, `mc-web-prod`, `mc-pay-prod`, `mc-compromised`, `mc-smime-bob`, `mc-edge-eu`, `mc-k8s-ingress`, `mc-wildcard-prod`. See `migrations/seed_demo.sql:147` onward.
### Agents (12 total in `agents` table)
8 named workload agents + 1 server-side sentinel + 3 cloud-discovery sentinels:
- **Workload agents:** `ag-web-prod`, `ag-web-staging`, `ag-lb-prod`, `ag-iis-prod`, `ag-data-prod`, `ag-edge-01`, `ag-k8s-prod`, `ag-mac-dev`
- **Server-side sentinel:** `server-scanner`
- **Cloud-discovery sentinels:** `cloud-aws-sm`, `cloud-azure-kv`, `cloud-gcp-sm`
Full list via:
```
sed -n '/^INSERT INTO agents/,/^;/p' migrations/seed_demo.sql \
| grep -oE "^\s*\('[a-z][a-z0-9_-]+" | sed -E "s/^\s*\('//"
```
(The `agent_groups` table also contains entries with `ag-*` IDs — `ag-linux-prod`, `ag-windows`, `ag-datacenter-a`, `ag-arm64`, `ag-manual` — but those are *group* IDs, not agents. Don't confuse the two.)
### Issuers (13 total)
`iss-local`, `iss-acme-le`, `iss-stepca`, `iss-acme-zs`, `iss-openssl`, `iss-vault`, `iss-digicert`, `iss-sectigo`, `iss-googlecas`, `iss-awsacmpca`, `iss-entrust`, `iss-globalsign`, `iss-ejbca`.
Full list via:
```
sed -n '/^INSERT INTO issuers/,/^;/p' migrations/seed_demo.sql \
| grep -oE "^\s*\('iss-[a-z0-9_-]+" | sed -E "s/^\s*\('//"
```
### Targets (8 total in `deployment_targets`)
`tgt-nginx-prod`, `tgt-nginx-staging`, `tgt-haproxy-prod`, `tgt-apache-prod`, `tgt-iis-prod`, `tgt-traefik-prod`, `tgt-caddy-prod`, `tgt-nginx-data`
### Network Scan Targets (4 total in `network_scan_targets`)
`nst-dc1-web`, `nst-dc2-apps`, `nst-dmz`, `nst-edge`
**Maintenance note:** when adding new seed rows, also update this section, OR remove the
per-table counts and rely on the `sed | grep` commands so the doc stops drifting on every
seed-data change. A CI guard that fails when the doc count diverges from the seed file is
proposed in `coverage-audit-2026-04-27/tables/qa-doc-strengthening.md` (Strengthening #6).
## Troubleshooting
### "Server unreachable" on startup
The test pings `GET /health` before running anything. If this fails:
```bash
# Check if the stack is running
docker compose -f docker-compose.yml -f docker-compose.demo.yml ps
# Check server logs
docker compose -f docker-compose.yml -f docker-compose.demo.yml logs certctl-server
# Check if the port is exposed (self-signed cert — pin CA bundle)
curl --cacert ./deploy/test/certs/ca.crt -s https://localhost:8443/health
```
### "connect to QA DB" failure
The database tests connect directly to PostgreSQL. Ensure port 5432 is exposed:
```bash
docker compose -f docker-compose.yml -f docker-compose.demo.yml port postgres 5432
```
### Performance tests flaking
The performance thresholds (200ms, 300ms, 500ms) assume a local Docker stack. On slow CI runners or remote Docker hosts, increase the thresholds or skip Part 39:
```bash
go test -tags qa -v -run 'TestQA/Part(?!39)' ./...
```
### Source file checks failing
The `fileExists` and `fileContains` helpers read from `CERTCTL_QA_REPO_DIR` (default `../..`). If running from a non-standard location:
```bash
CERTCTL_QA_REPO_DIR=/absolute/path/to/certctl go test -tags qa -v ./...
```
## Release Day Sign-Off Matrix
Before tagging a release, the QA-on-call engineer signs off on each row. This matrix replaces the previous ad-hoc release checklist and ties test execution directly to release approval. Acquisition-grade releases have this kind of matrix; the doc previously didn't.
| Sign-off | Evidence | Owner | Result | Date |
|---|---|---|---|---|
| `make verify` clean on master | CI run URL | Eng-on-call | ☐ | |
| `go test -tags qa ./deploy/test/...` ≥ 95% pass rate (skips counted as pass) | Test output | QA-on-call | ☐ | |
| `go test -race -count=10 ./internal/...` 0 races | `tool-output/race-x10.txt` | QA-on-call | ☐ | |
| Coverage ≥ thresholds in `ci.yml` (service / handler / crypto / local-issuer / acme / stepca / mcp) | `tool-output/cover-summary.txt` | QA-on-call | ☐ | |
| Helm chart `helm lint && helm template` clean | `tool-output/helm.txt` | DevOps-on-call | ☐ | |
| All `t.Skip` sites have current rationales (see Bundle O audit; CI guard catches new orphans) | `make qa-stats` t.Skip count | QA-on-call | ☐ | |
| Frontend: Vitest run clean; per-page coverage ≥ 70% | `web/tool-output/vitest.txt` | Frontend-on-call | ☐ | |
| Manual Parts 23, 24, 55, 56 executed (or explicit defer with rationale) | This sheet | QA-on-call | ☐ | |
| Demo stack `docker compose up -d --build` smoke (`/health` 200, `/ready` 200) | curl receipt | QA-on-call | ☐ | |
| `govulncheck ./...` clean (or deferred-call advisories tracked in `gap-backlog`) | `tool-output/govulncheck.json` | Security-on-call | ☐ | |
| QA-doc drift guards green (Part-count + cert-count) | CI run URL | QA-on-call | ☐ | |
| FSM transition coverage tables (`coverage-audit-2026-04-27/tables/fsm-coverage.md`) — Existential FSMs ≥80% legal + 100% illegal | This sheet | QA-on-call | ☐ | |
**Sign-off owner:** ______________________ &nbsp;&nbsp;**Date:** ______ &nbsp;&nbsp;**Tag:** v__.__.__
## Mutation Testing Targets & Kill Rate
Mutation testing exposes which assertions are actually load-bearing — tests can pass against broken code if mutations survive, which is a coverage trap. The audit's Phase 0 attempted to run `go-mutesting` on the Existential cluster but was blocked by a Go 1.25 / arm64 incompatibility in `osutil@v1.6.1` (uses `syscall.Dup2` which is undefined on linux/arm64). The operator-runnable workaround uses a fork that targets `unix.Dup3` instead.
| Package | Risk class | Target kill rate | Last measured | Tool |
|---|---|---|---|---|
| `internal/crypto` | Existential | ≥90% | unmeasured (sandbox-blocked, operator-runnable) | go-mutesting |
| `internal/pkcs7` | Existential | ≥90% | unmeasured | go-mutesting |
| `internal/connector/issuer/local` | Existential | ≥90% | unmeasured | go-mutesting |
| `internal/connector/issuer/acme` | Existential | ≥80% (catch-up; failure-mode coverage 55.6% per Bundle J) | unmeasured | go-mutesting |
| `internal/connector/issuer/stepca` | Existential | ≥85% (post-Bundle-L.B coverage at 90.4%) | unmeasured | go-mutesting |
| `internal/api/middleware` | High | ≥80% | unmeasured | go-mutesting |
| `internal/validation` | Existential (CWE-78 / CWE-113 boundary) | ≥90% | unmeasured | go-mutesting |
| `web/src/utils/safeHtml.ts` | Frontend (XSS gate) | ≥90% | unmeasured | Stryker |
### Operator command (per package)
```bash
# Use the avito-tech fork that supports linux/arm64 + Go 1.25.
go install github.com/avito-tech/go-mutesting/cmd/go-mutesting@latest
mkdir -p tool-output
$(go env GOPATH)/bin/go-mutesting --debug ./internal/crypto/... \
> tool-output/mutation-crypto.txt 2>&1
grep -oE 'mutation score is [0-9.]+' tool-output/mutation-crypto.txt | tail -1
```
**Acceptance:** ≥80% (Existential) / ≥70% (High). Anything below is a Medium finding; triage entries go in `coverage-audit-2026-04-27/gap-backlog.md`. This subsection moves mutation testing from "future work" to "documented release gate."
## Adding New Tests
When a new feature ships:
1. **Add a Part section** in `qa_test.go` following the numbering in `docs/testing-guide.md`
2. **API tests**: use `c.get()`, `c.post()`, `c.bodyStr()`, `c.getJSON()`, `c.timedGet()`
3. **Source checks**: use `fileExists(t, "relative/path")` and `fileContains(t, "path", "substring")`
4. **DB checks**: use `openQADB(t)` and `db.queryInt(t, "SELECT ...")`
5. **Cleanup**: always use `t.Cleanup()` for data created during tests
6. **Skip if external**: use `t.Skip("Requires X — manual test")` with a clear reason
## Version History
- **v1.3** (April 2026, post-Bundle-P) — QA Doc Strengthening shipped. New top-of-doc Test Suite Health dashboard (regenerated via `make qa-stats`). New Coverage by Risk Class table after the Coverage Map. New Release Day Sign-Off Matrix and Mutation Testing Targets sections. CI seed-count + Part-count drift guards land in `.github/workflows/ci.yml` so future doc drift fails CI. Bundle P closes M-007 / M-010 / M-011 / M-012 (structural strengthening) + M-008 (Mutation Testing Targets).
- **v1.2** (April 2026, post-coverage-audit) — Documented Parts 5556 (I-004 Agent Soft-Retirement, I-005 Notification Retry & Dead-Letter) and surfaced Parts 2324 (S/MIME & EKU; OCSP/CRL) as not-yet-automated. 56 Parts total in `testing-guide.md`; 49 live `Part_*` automation wrappers in `qa_test.go` + 4 new `Skip` stubs for Parts 23/24/55/56 = 53 wrappers (Parts 1517 remain covered by source-checks in Parts 4246). Reconciled seed-data section to actual `seed_demo.sql` counts (12 agents, 13 issuers; certs were already accurate at 32). Bundle I of the 2026-04-27 coverage-audit closure plan.
- **v1.1** (April 2026) — Added Parts 5354 (M47: Kubernetes Secrets target + AWS ACM PCA issuer). 54 Parts total, ~164 automated subtests.
- **v1.0** (April 2026) — Initial release covering all 52 Parts of testing-guide.md v2.1. Replaces `qa-smoke-test.sh`.