Per operator decision the framework-mapping docs are gone. They
were aspirational (no audit, no certification, no validated
mapping); keeping them around was misleading.
Files deleted (1,883 lines):
- docs/compliance/index.md
- docs/compliance/soc2.md
- docs/compliance/pci-dss.md
- docs/compliance/nist-sp-800-57.md
Hyperlinks removed:
- README.md: 'Auditor / compliance' row in the doc table; the
'(compliance mapping included)' parenthetical in the
positioning paragraph
- docs/README.md: the '## Compliance' section table; the
'Auditor / compliance team' reading-order-by-role row
Prose name-drops swept across 24 files:
- README.md: 'FedRAMP boundary CAs / financial-services policy
CAs' → '4-level boundary CAs / 3-level policy CAs';
'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High,
SOC 2 Type II, HIPAA' → cut entirely
- getting-started/{quickstart,concepts,examples,why-certctl,
advanced-demo}.md: 'compliance' → 'audit' / 'policy';
'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut;
''pci': 'true'' tag example → ''environment': 'production''
- migration/cert-manager-coexistence.md: 'compliance rules' →
'policy rules'
- operator/approval-workflow.md: 'Compliance customers (PCI-DSS
Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' →
'Operators'; entire 'Compliance control mapping' table
(PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1
/ HIPAA §164.308(a)(4)) deleted; 'compliance contract' →
'two-person-integrity contract'; 'compliance auditors' →
'reviewers'
- operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5'
audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5
attestation' section retitled to 'TLS posture summary' and
rewritten without framework framing; 'PCI-DSS, NIST, and
major browsers will eventually deprecate TLS 1.2' →
'Major browsers and OS vendors will eventually deprecate
TLS 1.2'
- operator/database-tls.md: PCI-DSS Req 4 §2.2.5 audit-ref →
CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS
Req 4 v4.0 prose footing → cut
- operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI
procurement-team deliverable' → 'on-call deliverable';
'compliance auditors' → 'reviewers'
- reference/connectors/{acme,aws-acm,azure-kv,globalsign,
local-ca,openssl,ssh,index}.md: 'compliance reporting
(PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting';
'Compliance environments (PCI-DSS Level 1, FedRAMP High,
HIPAA)' → 'Regulated environments'; 'compliance audits' →
'audit'; 'FedRAMP boundary CA' pattern names →
'4-level boundary CA' (technically descriptive)
- reference/protocols/est.md: 'compliance-hook seam' →
'device-state hook seam'; 'compliance gating' → 'device-state
gating'; 'est_compliance_failed' → 'est_device_state_failed'
- reference/protocols/scep-intune.md: 'Optional compliance
check' → 'Optional device-state check'; failure-counter
'compliance_failed' → 'device_state_failed'; 'Conditional
Access compliance gating' → 'Conditional Access
device-state gating'
- reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA
deployments where the regulator requires...' →
'Boundary-CA deployments where you want separation of policy
and issuing authorities'; pattern A retitled '4-level FedRAMP
boundary CA' → '4-level boundary CA'
- reference/architecture.md: broken Related-docs link to
compliance.md removed; the rest of that block had stale
pre-Phase-2 paths (quickstart.md, demo-advanced.md,
connectors.md, openapi.md, testing-guide.md, test-env.md) —
retargeted to current locations
- reference/deployment-model.md: 'SOC 2 evidence-report
generator' → 'Audit-evidence report generator'
- reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this
into evidence packs' → 'reviewers paste this into
vendor-evaluation packs'
- contributor/qa-test-suite.md: 'compliance exist' coverage
description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)'
risk-class label → 'Audit-relevant'
What was kept:
- CWE references (legitimate technical pointers)
- Microsoft API/feature names that happen to use 'compliance'
literally ('Microsoft Graph compliance API',
'device-compliance validators' — these are MS product names,
not framework name-drops)
- 'NIST PQC' on the landing page (Post-Quantum Cryptography is
the actual NIST standard family, not a compliance framework)
Verified: zero hyperlinks into docs/compliance/ remain. All 24
ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean.
Net diff: 26 files / -1,883 deletions in compliance/ + -32 net
across the prose sweep.
Companion edits in cowork/ (CLAUDE.md doc-tree summary +
WORKSPACE-CHANGELOG.md retirement note) land separately.
QA Test Suite Guide (qa_test.go)
Last reviewed: 2026-05-05
Audience: Anyone running release QA for certctl — whether you're a first-time contributor or the maintainer cutting a release tag.
Self-contained. Through 2026-05-04 this doc was a companion to a separate docs/testing-guide.md (the what to test); that companion was pruned during the Phase 5 docs overhaul (its content dispersed across the audience-organized doc tree). The Part-by-Part Coverage Map below is now the canonical inventory of QA Parts.
Test Suite Health (regenerate via make qa-stats)
Snapshot at HEAD. Re-run make qa-stats to refresh; the QA-doc seed-count drift guard (.github/workflows/ci.yml::QA-doc seed-count drift guard) catches out-of-date cert / issuer counts on every PR. The Part-count drift guard was retired in the 2026-05-04 docs overhaul Phase 5 (testing-guide.md was pruned; Part counts are now tracked inside qa_test.go itself, not against an external doc). Last regenerated: 2026-04-27 (Bundle P).
| Metric | Value | Target | Status |
|---|---|---|---|
| Backend test files | 221 | n/a | ℹ |
| Backend Test* functions | 2,454 | n/a | ℹ |
| Backend t.Run subtests | 778 | n/a | ℹ |
| Frontend test files | 38 | n/a | ℹ |
| Fuzz targets | 11 | ≥10 (one per hand-rolled parser) | ✓ |
| t.Skip sites | 60 | each carries valid rationale (Bundle O audit) | ✓ |
| qa_test.go Part_* subtests | 53 | covers 49 of 56 historical QA Parts directly + Parts 15–17 indirectly via Parts 42–46 | ✓ |
| Existential cluster line cov (post-Bundle-J + L.B + Bundle 0.7) | acme 55.6%, stepca 90.4%, local-issuer ≥86%, crypto ≥85% | ≥95% | △ ACME below; tracked in coverage-matrix.md |
| Mutation kill rate (Existential) | unmeasured (operator-runnable per Strengthening #5) | ≥90% | ⚠ |
| Race detector clean (-count=10) | partial (-count=3 clean per Phase 0) | 0 races | ⚠ |
What Is This File?
deploy/test/qa_test.go is a single Go test file (~1700 lines) that automates the historical QA Part inventory (preserved in the Part-by-Part Coverage Map below) against a running certctl Docker Compose demo stack. It replaces the legacy qa-smoke-test.sh bash script.
It covers 49 of 56 Parts of the testing guide as automation; the remaining 7 are either manual-only by design or pending QA-suite coverage:
- 49 Part_* automation wrappers, ~159 leaf subtests — API calls, database queries, source file checks, performance benchmarks
- 11 fully skipped Parts — with documented reasons (external CAs, Windows, browser-only, etc.) — see "What This Test Does NOT Cover" below
- 4 Parts NOT YET AUTOMATED — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually until QA-suite automation lands; the Part-by-Part Coverage Map below describes the surface area each Part covers
- Manual-only flows in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human (Coverage Map below describes each)
Architecture
flowchart LR
QA["qa_test.go (//go:build qa)<br/><br/>TestQA(t *testing.T)<br/>├─ Part01_Infra<br/>├─ Part02_Auth<br/>├─ Part03_CertCRUD<br/>├─ ...<br/>└─ Part52_HelmChart"]
subgraph Stack["certctl demo stack<br/>docker-compose.yml + docker-compose.demo.yml"]
Server["certctl-server :8443"]
Postgres["postgres :5432"]
Agents["certctl-agent (×N)<br/>↑ seed_demo.sql provisions 12 agent rows<br/>(1 active, 2 retired, 9 reserved/sentinel)<br/>for the soft-retire / FSM coverage Parts 55–56 exercise"]
end
QA --> Stack
Multi-agent demo stack (Bundle Q / L-004 closure). The demo stack runs a single live certctl-agent container by default but the database is seeded with 12 agent rows (migrations/seed_demo.sql, grep mc-* | ag-* IDs). The "(×N)" notation reflects the seed-data reality: Parts 04 (Agents Listing), 05 (Agent Heartbeats), 55 (Agent Soft-Retirement), and FSM coverage tables in coverage-audit-2026-04-27/tables/fsm-coverage.md exercise the full multi-agent population, not the one live container. Operators running the QA suite in a parallel-agent topology should set AGENT_COUNT=N in compose-override and re-derive the seed counts via make qa-stats.
Key design choices:
- Build tag: //go:build qa — never runs during go test ./... or CI. Only runs when explicitly requested.
- Package: integration_test — same package as integration_test.go (which uses //go:build integration for the test stack). They coexist but never run together.
- Zero internal imports: Uses only stdlib + lib/pq (from go.mod). All API interactions are plain HTTP. All JSON is decoded into lightweight local structs (qaCert, qaJob, etc.) — not the internal domain types.
- Self-cleaning: Tests that create data use t.Cleanup() to delete it afterward. The seed data is not modified.
Prerequisites
- Docker Compose demo stack running:
  cd deploy
  docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build -d
  Wait ~15 seconds for health checks to pass.
- Go 1.22+ installed (the project uses Go 1.25 in go.mod, but 1.22+ works for running tests).
- PostgreSQL port exposed — the demo stack exposes port 5432 for database verification tests (table counts, schema checks).
- Repository checkout — source file verification tests (fileExists, fileContains) read files relative to qaRepoDir (default: ../.. from deploy/test/).
Running the Tests
Full suite
cd deploy/test
go test -tags qa -v -timeout 10m ./...
Single Part
go test -tags qa -v -run TestQA/Part03 ./...
Single subtest
go test -tags qa -v -run TestQA/Part03_CertCRUD/Create_Minimal ./...
With custom environment
CERTCTL_QA_SERVER_URL=https://staging.internal:8443 \
CERTCTL_QA_API_KEY=my-staging-key \
CERTCTL_QA_DB_URL=postgres://certctl:secret@db.internal:5432/certctl?sslmode=require \
CERTCTL_QA_REPO_DIR=/path/to/certctl \
go test -tags qa -v -timeout 10m ./...
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CERTCTL_QA_SERVER_URL | https://localhost:8443 | certctl server URL (HTTPS-only as of v2.2) |
| CERTCTL_QA_API_KEY | change-me-in-production | API key for Bearer auth |
| CERTCTL_QA_DB_URL | postgres://certctl:certctl@localhost:5432/certctl?sslmode=disable | PostgreSQL connection string |
| CERTCTL_QA_REPO_DIR | ../.. | Path to certctl repo root (for source file checks) |
| CERTCTL_QA_CA_BUNDLE | ./certs/ca.crt | PEM CA bundle pinned for TLS verification. The demo stack's certctl-tls-init container writes here. |
| CERTCTL_QA_INSECURE | false | Set to "true" to skip TLS verification (e.g. before the init container finishes). Never use outside the demo harness. |
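The defaults in the table above can be applied with a small lookup helper. The following is a minimal sketch, not the code in qa_test.go: the qaConfig struct, its field names, and the envOr and loadQAConfig helpers are hypothetical names chosen for illustration; only the variable names and default values come from the table.

```go
package main

import (
	"fmt"
	"os"
)

// qaConfig is a hypothetical container for the documented variables.
// It is not the type used inside qa_test.go.
type qaConfig struct {
	ServerURL string
	APIKey    string
	DBURL     string
	RepoDir   string
	CABundle  string
	Insecure  bool
}

// envOr returns the value of key, or def when the variable is unset or empty.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// loadQAConfig reads each variable and falls back to the documented default.
func loadQAConfig() qaConfig {
	return qaConfig{
		ServerURL: envOr("CERTCTL_QA_SERVER_URL", "https://localhost:8443"),
		APIKey:    envOr("CERTCTL_QA_API_KEY", "change-me-in-production"),
		DBURL:     envOr("CERTCTL_QA_DB_URL", "postgres://certctl:certctl@localhost:5432/certctl?sslmode=disable"),
		RepoDir:   envOr("CERTCTL_QA_REPO_DIR", "../.."),
		CABundle:  envOr("CERTCTL_QA_CA_BUNDLE", "./certs/ca.crt"),
		// Only the literal string "true" disables TLS verification.
		Insecure: envOr("CERTCTL_QA_INSECURE", "false") == "true",
	}
}

func main() {
	cfg := loadQAConfig()
	fmt.Println("server:", cfg.ServerURL)
	fmt.Println("insecure:", cfg.Insecure)
}
```

Treating an empty variable the same as an unset one is a design choice of this sketch; it keeps an accidental `CERTCTL_QA_API_KEY=` in a shell profile from silently overriding the default.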
Part-by-Part Coverage Map
This table shows what each Part tests and what's left for manual verification.
| Part | Testing Guide Section | Automated Subtests | What's Automated | What's Manual |
|---|---|---|---|---|
| 1 | Infrastructure & Deployment | 8 | Table count, health/ready endpoints, seed data counts (certs, agents, issuers, targets, policies) | Docker container health, log inspection, volume mounts |
| 2 | Authentication & Security | 4 | No-auth 401, bad-key 401, health-no-auth 200, no private keys in API | CORS preflight, rate limiting (429 + Retry-After), TLS config |
| 3 | Certificate Lifecycle | 10 | Create (minimal + full), get, 404, list pagination, status/issuer filters, sparse fields, update, archive | Deployment trigger, version history, certificate detail UI |
| 4 | Renewal Workflow | 3 | Trigger renewal, 404 on nonexistent, agent work endpoint | AwaitingCSR flow, agent key generation, full issuance cycle |
| 5 | Revocation | 5 | Revoke (default reason), already-revoked, nonexistent, invalid reason, CRL JSON | DER CRL, OCSP responder, revocation notifications |
| 6 | Policies & Profiles | 6 | Policy CRUD (create/delete), invalid type 400, profile CRUD, list | Policy violation detection, profile enforcement on CSR |
| 7 | Ownership & Teams | 4 | Team CRUD, owner CRUD, agent groups list | Owner notification routing, dynamic group matching |
| 8 | Job System | 2 | List jobs, 404 on nonexistent | Job state transitions, approval workflow, cancellation |
| 9 | Issuer Connectors | 4 | List, get detail, create (GenericCA), missing name 400 | Test connection, issuer-specific issuance flow |
| 10 | Sub-CA Mode | SKIP | — | Requires CA cert+key on disk |
| 11 | ACME ARI | SKIP | — | Requires ARI-capable CA |
| 12 | Vault PKI | SKIP | — | Requires live Vault server |
| 13 | DigiCert | SKIP | — | Requires DigiCert sandbox |
| 14 | Target Connectors | 3 | List, create NGINX target, delete 204 | Deploy to real target, validate deployment |
| 15–17 | Apache/HAProxy, Traefik/Caddy, IIS | — | (Covered by source checks in Parts 42–46) | Requires real services or Windows |
| 18 | Agent Operations | 3 | Heartbeat (register), metadata check, auto-create on heartbeat | Agent binary behavior, key storage, discovery scan |
| 19 | Agent Work Routing | 1 | Empty work for agent with no targets | Scoped job assignment, multi-target fan-out |
| 20 | Post-Deployment Verification | 1 | 404 on nonexistent job verification | TLS probing, fingerprint comparison |
| 21 | EST Server | 2 | CACerts (200 + content-type), CSRAttrs (200/204) | simpleenroll with CSR, simplereenroll, PKCS#7 parsing |
| 22 | Certificate Export | 3 | PEM export, PKCS#12 export, 404 on nonexistent | Download mode, file content validation |
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually — see the Coverage Map row |
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually — see the Coverage Map row |
| 25 | Certificate Discovery | 5 | List discovered, summary, list scan targets, create target, invalid CIDR 400 | Agent filesystem scan, claim/dismiss workflow |
| 26 | Enhanced Query API | 4 | Sort descending, cursor pagination, time-range filter, invalid sort field | Field projection correctness, cursor token cycling |
| 27 | Request Body Size Limits | 1 | 2MB body rejected (413/400) | Exact limit boundary (1MB) |
| 28 | CLI | SKIP | — | Requires compiled certctl-cli binary |
| 29 | MCP Server | SKIP | — | Requires compiled mcp-server binary + stdio |
| 30 | Observability | 7 | Dashboard summary, certs by status, expiration timeline, job trends, issuance rate, JSON metrics (uptime + gauges), Prometheus (content-type + 4 metric names) | Chart rendering (GUI), Grafana import |
| 31 | Notifications | 2 | List, 404 on nonexistent | Notification content, mark-read, email/Slack delivery |
| 32 | Audit Trail | 3 | List events (≥10), PUT immutability, DELETE immutability | Actor attribution, body hash, time range filters |
| 33 | Background Scheduler | SKIP | — | Timing-dependent; verify via Docker logs |
| 34 | Structured Logging | SKIP | — | Requires Docker log inspection |
| 35 | GUI Testing | SKIP | — | Requires browser |
| 36–37 | Issuer Catalog, Frontend Audit | SKIP | — | Requires browser |
| 38 | Error Handling | 5 | Malformed JSON, missing required field, method not allowed, UTF-8 CN, empty body | Stack trace suppression, error response format |
| 39 | Performance | 5 | List certs < 200ms, stats < 500ms, metrics < 200ms, Prometheus < 300ms, audit < 500ms | Load testing, concurrent request handling |
| 40 | Documentation | 8 | README, quickstart, architecture, connectors exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
| 41 | Regression | 3 | DELETE 204, per_page max fallback, network scan target seed count | errors.Is(errors.New()) anti-pattern source scan |
| 42 | Envoy Target | 5 | Domain type, connector file, test file, OpenAPI, agent dispatch | Envoy deployment test, SDS config |
| 43 | Postfix/Dovecot | 3 | Domain types (Postfix + Dovecot), connector file, OpenAPI | Mail server deployment test |
| 44 | SSH Target | 4 | Domain type, connector file, agent dispatch (sshconn), OpenAPI | SSH deployment test (requires target host) |
| 45 | Windows Certificate Store | 3 | Domain type, connector file, shared certutil package | Windows deployment (requires Windows) |
| 46 | Java Keystore | 3 | Domain type, connector file, OpenAPI | JKS deployment (requires keytool) |
| 47 | Certificate Digest Email | 3 | Preview endpoint (200/503), service file, adapter file | SMTP delivery, HTML template rendering |
| 48 | Dynamic Issuer Config | 4 | Crypto package exists, create ACME issuer via API, config redaction check, migration exists | Test connection flow, registry rebuild |
| 49 | Dynamic Target Config | 2 | Create NGINX target via API, migration exists | Test connection via agent heartbeat |
| 50 | Onboarding Wizard | 2 | Wizard component exists, docker-compose split (clean vs demo) | Wizard UI flow, step completion |
| 51 | ACME Profile Selection | 3 | Profile module exists, frontend config, RFC 9702→9773 renumber check | Profile-aware issuance against real CA |
| 52 | Helm Chart | 5 | Chart.yaml, values.yaml, 4 templates exist, securityContext, health probes | helm template rendering, helm install |
| 53 | Kubernetes Secrets Target Connector (M47) | 18 | Config validation (namespace DNS-1123, secret name DNS subdomain, label keys, required fields), deployment (create/update Secret, chain concatenation, error propagation), validation (serial comparison, not-found, empty cert) | GUI target wizard KubernetesSecrets fields (namespace, secret_name, labels, kubeconfig_path), Helm RBAC toggle, TargetDetailPage type label |
| 54 | AWS ACM Private CA Issuer Connector (M47) | 23 | Config validation (region, CA ARN regex, signing algorithm whitelist, validity_days, defaults), issuance (full flow, empty CSR, errors), renewal (reuses issuance), revocation (reason mapping, default, errors), GetOrderStatus completed, GetCACertPEM (success/chain/error), GetRenewalInfo nil | GUI issuer wizard AWSACMPCA fields (region, ca_arn, signing_algorithm, validity_days, template_arn), seed data visibility, create issuer flow |
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually — see the Coverage Map row |
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (POST /api/v1/notifications/{id}/requeue), idempotency on retry. Test manually — see the Coverage Map row |
Totals (verified 2026-04-27): 49 Part_* automation wrappers, ~159 leaf subtests, 11 fully
skipped Parts, 4 Parts not yet automated (23, 24, 55, 56), and an unspecified count of manual-only
flows (GUI, scheduler timing, Docker log inspection). Run grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go
to re-verify the Part_* automation wrapper count.
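For environments without grep, the same count can be approximated in Go. This is a sketch under assumptions: countPartWrappers is a hypothetical helper, and it counts matches rather than matching lines, which gives the same number as grep -c only when each wrapper's t.Run call sits on its own line.

```go
package main

import (
	"fmt"
	"regexp"
)

// partWrapperRE uses the same pattern as the grep command above.
var partWrapperRE = regexp.MustCompile(`t\.Run\("Part[0-9]+_`)

// countPartWrappers counts Part_* wrapper declarations in Go source text.
// Leaf subtests such as t.Run("Health", ...) do not match the pattern.
func countPartWrappers(src string) int {
	return len(partWrapperRE.FindAllString(src, -1))
}

func main() {
	// Illustrative input only; a real run would read deploy/test/qa_test.go.
	sample := `
t.Run("Part01_Infra", func(t *testing.T) {
	t.Run("Health", func(t *testing.T) {})
})
t.Run("Part02_Auth", func(t *testing.T) {})
`
	fmt.Println(countPartWrappers(sample))
}
```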
Coverage by Risk Class
A buyer's QA lead reading this doc wants "where are the existential bugs caught?" — Bundle P / Strengthening #1 surfaces that view directly. The table below classifies each Part by risk class so reviewers can answer the existential-coverage question in one glance.
| Risk class | Description | Parts in scope | Automation status |
|---|---|---|---|
| Existential (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in qa_test.go; manual playbook in the Coverage Map above) |
| High (FSM corruption, credential leak, authn/z weakening) | Renewal, jobs, agents, issuers, deployment, scheduler | 4, 7, 8, 9, 18, 19, 20, 22, 25, 28, 29, 32, 33, 48, 49, 55, 56 | 14/17 automated; CLI / MCP / scheduler-loop are inherently SKIP (require compiled binaries / Docker logs); Parts 55 + 56 pending |
| Medium (Operational pain or silent data drift) | Targets, notifiers, observability, error handling, performance, regression | 14, 15-17, 30, 31, 38, 39, 40, 41, 42, 43, 44, 45, 46 | 14/14 automated (15-17 indirect via Parts 42–46) |
| Low (Hygiene) | Documentation, docs verification | 40 (Documentation), 50 (Onboarding) | 2/2 automated |
| Frontend (XSS, render correctness, mutation contracts) | GUI testing | 35, 36-37 | 0/3 automated in this suite (Vitest covers separately under web/); this doc punts to manual + Vitest |
| Audit-relevant | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
This is the table acquisition reviewers screenshot for their report. When a new Part_* subtest lands in qa_test.go, classify it here.
Test Categories
The automated tests fall into four categories:
1. API Integration Tests (majority)
Make real HTTP requests to the running server and verify status codes, response structure, and JSON field values. Examples:
- POST /api/v1/certificates with valid payload → 201
- GET /api/v1/certificates?status=Active → all returned certs have status: "Active"
- DELETE /api/v1/certificates/mc-qa-full → 204
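The status-filter check can be sketched as follows. This is an illustration only: the stub server stands in for the demo stack, and the qaCert field names, the "data" envelope in listResponse, and the listCerts, allHaveStatus, and startStub helpers are all assumptions rather than the shapes used by qa_test.go or the real API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// qaCert is a lightweight local struct; the field names are assumptions.
type qaCert struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// listResponse assumes the list endpoint wraps results in a "data" array.
type listResponse struct {
	Data []qaCert `json:"data"`
}

// listCerts issues a GET with Bearer auth and decodes the certificate list.
func listCerts(baseURL, apiKey, path string) (int, []qaCert, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return 0, nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return resp.StatusCode, nil, nil
	}
	var out listResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return resp.StatusCode, nil, err
	}
	return resp.StatusCode, out.Data, nil
}

// allHaveStatus reports whether every returned cert carries the wanted status.
func allHaveStatus(certs []qaCert, want string) bool {
	for _, c := range certs {
		if c.Status != want {
			return false
		}
	}
	return true
}

// startStub serves a canned list and rejects requests without the expected key.
func startStub(apiKey string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+apiKey {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"data":[{"id":"mc-api-prod","status":"Active"},{"id":"mc-web-prod","status":"Active"}]}`)
	}))
}

func main() {
	srv := startStub("demo-key")
	defer srv.Close()
	code, certs, err := listCerts(srv.URL, "demo-key", "/api/v1/certificates?status=Active")
	fmt.Println(code, len(certs), allHaveStatus(certs, "Active"), err)
}
```

Decoding into a two-field local struct mirrors the suite's "zero internal imports" choice: the check depends only on the JSON contract, not on internal domain types.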
2. Database Verification Tests
Connect directly to PostgreSQL and verify schema state:
- Table count ≥ 19 (from migrations 000001–000010)
- Useful for catching migration regressions
3. Source File Verification Tests
Read files from the repo checkout and verify structure:
- Domain types exist in internal/domain/connector.go (e.g., TargetTypeEnvoy)
- Connector implementations exist (e.g., internal/connector/target/envoy/envoy.go)
- Documentation contains expected content (all issuer/target types listed)
- No stale RFC 9702 references (replaced by RFC 9773)
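A source check of this kind reduces to "does the file exist" and "does it contain a substring". The sketch below uses hypothetical helpers (repoFileExists, repoFileContains, writeSampleRepo) that take a repo directory explicitly; the suite's own fileExists and fileContains take a *testing.T and may behave differently. The sample file content is invented for the demonstration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// repoFileExists reports whether rel is a regular file under repoDir.
func repoFileExists(repoDir, rel string) bool {
	info, err := os.Stat(filepath.Join(repoDir, filepath.FromSlash(rel)))
	return err == nil && !info.IsDir()
}

// repoFileContains reports whether the file at rel contains substr.
func repoFileContains(repoDir, rel, substr string) (bool, error) {
	data, err := os.ReadFile(filepath.Join(repoDir, filepath.FromSlash(rel)))
	if err != nil {
		return false, err
	}
	return strings.Contains(string(data), substr), nil
}

// writeSampleRepo builds a throwaway directory tree for the demonstration
// and returns a cleanup function that removes it.
func writeSampleRepo() (string, func(), error) {
	dir, err := os.MkdirTemp("", "qa-repo-")
	if err != nil {
		return "", func() {}, err
	}
	cleanup := func() { os.RemoveAll(dir) }
	sub := filepath.Join(dir, "internal", "domain")
	if err := os.MkdirAll(sub, 0o755); err != nil {
		return dir, cleanup, err
	}
	content := []byte("const TargetTypeEnvoy = \"Envoy\"\n")
	err = os.WriteFile(filepath.Join(sub, "connector.go"), content, 0o644)
	return dir, cleanup, err
}

func main() {
	dir, cleanup, err := writeSampleRepo()
	defer cleanup()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println(repoFileExists(dir, "internal/domain/connector.go"))
	found, err := repoFileContains(dir, "internal/domain/connector.go", "TargetTypeEnvoy")
	fmt.Println(found, err)
}
```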
4. Performance Spot Checks
Timed API requests with threshold assertions:
- GET /api/v1/certificates?per_page=15 < 200ms
- GET /api/v1/stats/summary < 500ms
- GET /api/v1/metrics/prometheus < 300ms
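A timed request with a threshold assertion might look like the sketch below. The helper names (timedGet, checkLatencyMS, startStub) are hypothetical and are not the suite's c.timedGet(); the stub server with an artificial delay stands in for the demo stack so the example runs on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// timedGet issues a GET, drains the body, and returns the status code and
// the wall-clock time for the whole exchange.
func timedGet(url string) (int, time.Duration, error) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		return 0, 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return resp.StatusCode, 0, err
	}
	return resp.StatusCode, time.Since(start), nil
}

// checkLatencyMS fails when the request errors, returns a non-200 status,
// or takes longer than limitMS milliseconds.
func checkLatencyMS(url string, limitMS int) error {
	limit := time.Duration(limitMS) * time.Millisecond
	code, elapsed, err := timedGet(url)
	if err != nil {
		return err
	}
	if code != http.StatusOK {
		return fmt.Errorf("GET %s: status %d", url, code)
	}
	if elapsed > limit {
		return fmt.Errorf("GET %s took %v, limit %v", url, elapsed, limit)
	}
	return nil
}

// startStub returns a test server that waits delayMS before answering 200.
func startStub(delayMS int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(time.Duration(delayMS) * time.Millisecond)
		fmt.Fprint(w, "ok")
	}))
}

func main() {
	srv := startStub(0)
	defer srv.Close()
	fmt.Println(checkLatencyMS(srv.URL, 200))
}
```

A single sample per endpoint is a spot check, not a benchmark; one slow response on a loaded host will fail the assertion, which is why the thresholds assume a local Docker stack.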
What This Test Does NOT Cover
These gaps must be filled by manual testing — see each Coverage Map row for surface-area description:
Not Yet Automated (Parts 23, 24, 55, 56)
These historical QA Parts are listed in the Coverage Map above but have no Part_* automation
in qa_test.go yet. They are operator-runnable from the manual playbook; QA-suite
automation should land before the next acquisition-grade release.
- Part 23: S/MIME & EKU Support — profile-driven EKU enforcement; SMIMECapabilities extension
- Part 24: OCSP Responder & DER CRL — OCSP request/response correctness, CRL generation, Must-Staple coordination
- Part 55: Agent Soft-Retirement (I-004) — soft vs hard retire, FK cascade, reactivation
- Part 56: Notification Retry & Dead-Letter Queue (I-005) — retry semantics, dead-letter transition, requeue
External CA Integrations (Parts 10–13)
- Sub-CA mode — requires CA cert+key files on disk
- ACME ARI — requires a CA that supports RFC 9773 Renewal Information
- Vault PKI — requires a running HashiCorp Vault instance
- DigiCert / Sectigo / Google CAS — requires sandbox API credentials
Browser/GUI Testing (Parts 35–37, 50)
- Dashboard chart rendering (Recharts)
- Onboarding wizard step-by-step flow
- Issuer catalog card layout and create wizard
- Bulk operations UI (multi-select, progress bars)
- Discovery triage workflow
Real Deployment Testing (Parts 15–17)
- NGINX/Apache/HAProxy file write + reload
- Traefik/Caddy file provider or API reload
- IIS PowerShell/WinRM (requires Windows)
- F5 BIG-IP iControl REST (requires appliance or mock)
- SSH agentless deployment (requires target host)
Agent Binary Behavior (Parts 18, 28–29)
- Agent-side ECDSA key generation and CSR submission
- Agent filesystem discovery scan
- CLI tool (certctl-cli) — all 10 subcommands
- MCP server (mcp-server) — stdio transport
Timing-Dependent Tests (Parts 33–34)
- Background scheduler loop execution (renewal, jobs, health, notifications, digest, network scan)
- Structured logging format verification (requires Docker log parsing)
How This Relates to integration_test.go
Both files live in deploy/test/ in the same Go package (integration_test):
| | qa_test.go | integration_test.go |
|---|---|---|
| Build tag | //go:build qa | //go:build integration |
| Target stack | Demo (docker-compose.yml + docker-compose.demo.yml) | Test (docker-compose.test.yml) |
| Port | 8443 | Different (test stack config) |
| Seed data | seed_demo.sql (32 certs, 12 agents, 13 issuers, 8 targets, realistic history) | Minimal (created by tests) |
| CA backends | Local CA only (demo mode) | Pebble ACME, step-ca, NGINX |
| Purpose | Release QA — broad coverage, spot checks | Functional — end-to-end issuance, renewal, revocation against real CAs |
| Run frequency | Before each release tag | CI on every PR |
They are complementary. Integration tests prove the machinery works. QA tests prove the product works at release quality.
Seed Data Reference
The QA tests depend on migrations/seed_demo.sql. Key IDs used:
Certificates (32 total in managed_certificates)
The full canonical list is generated by:
sed -n '/^INSERT INTO managed_certificates/,/^;/p' migrations/seed_demo.sql \
| grep -oE "^\s*\('mc-[a-z0-9_-]+" | sed -E "s/^\s*\('//" | sort -u
Hand-listing is unsustainable as the seed grows; tests reference IDs by lookup, not by enumeration.
Sample IDs: mc-api-prod, mc-web-prod, mc-pay-prod, mc-compromised, mc-smime-bob, mc-edge-eu, mc-k8s-ingress, mc-wildcard-prod. See migrations/seed_demo.sql:147 onward.
Agents (12 total in agents table)
8 named workload agents + 1 server-side sentinel + 3 cloud-discovery sentinels:
- Workload agents: ag-web-prod, ag-web-staging, ag-lb-prod, ag-iis-prod, ag-data-prod, ag-edge-01, ag-k8s-prod, ag-mac-dev
- Server-side sentinel: server-scanner
- Cloud-discovery sentinels: cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm
Full list via:
sed -n '/^INSERT INTO agents/,/^;/p' migrations/seed_demo.sql \
| grep -oE "^\s*\('[a-z][a-z0-9_-]+" | sed -E "s/^\s*\('//"
(The agent_groups table also contains entries with ag-* IDs — ag-linux-prod, ag-windows, ag-datacenter-a, ag-arm64, ag-manual — but those are group IDs, not agents. Don't confuse the two.)
Issuers (13 total)
iss-local, iss-acme-le, iss-stepca, iss-acme-zs, iss-openssl, iss-vault, iss-digicert, iss-sectigo, iss-googlecas, iss-awsacmpca, iss-entrust, iss-globalsign, iss-ejbca.
Full list via:
sed -n '/^INSERT INTO issuers/,/^;/p' migrations/seed_demo.sql \
| grep -oE "^\s*\('iss-[a-z0-9_-]+" | sed -E "s/^\s*\('//"
Targets (8 total in deployment_targets)
tgt-nginx-prod, tgt-nginx-staging, tgt-haproxy-prod, tgt-apache-prod, tgt-iis-prod, tgt-traefik-prod, tgt-caddy-prod, tgt-nginx-data
Network Scan Targets (4 total in network_scan_targets)
nst-dc1-web, nst-dc2-apps, nst-dmz, nst-edge
Maintenance note: when adding new seed rows, also update this section, OR remove the
per-table counts and rely on the sed | grep commands so the doc stops drifting on every
seed-data change. A CI guard that fails when the doc count diverges from the seed file is
proposed in coverage-audit-2026-04-27/tables/qa-doc-strengthening.md (Strengthening #6).
Troubleshooting
"Server unreachable" on startup
The test pings GET /health before running anything. If this fails:
# Check if the stack is running
docker compose -f docker-compose.yml -f docker-compose.demo.yml ps
# Check server logs
docker compose -f docker-compose.yml -f docker-compose.demo.yml logs certctl-server
# Check if the port is exposed (self-signed cert — pin CA bundle)
curl --cacert ./deploy/test/certs/ca.crt -s https://localhost:8443/health
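If curl succeeds with --cacert but the Go tests still fail TLS verification, it can help to reproduce a CA-pinned client in isolation. The sketch below is an assumption about how such a client could be built with the standard library; newPinnedClient is a hypothetical helper, not the function used in qa_test.go, and the URL and bundle path are the documented defaults.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// newPinnedClient builds an HTTP client that trusts only the CAs found in
// pemBundle, ignoring the system trust store.
func newPinnedClient(pemBundle []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBundle) {
		return nil, errors.New("no certificates found in CA bundle")
	}
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:    pool,
				MinVersion: tls.VersionTLS12,
			},
		},
	}, nil
}

func main() {
	path := os.Getenv("CERTCTL_QA_CA_BUNDLE")
	if path == "" {
		path = "./certs/ca.crt"
	}
	bundle, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read CA bundle:", err)
		return
	}
	client, err := newPinnedClient(bundle)
	if err != nil {
		fmt.Println("build client:", err)
		return
	}
	resp, err := client.Get("https://localhost:8443/health")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("health status:", resp.StatusCode)
}
```

An empty or non-PEM bundle fails fast here instead of producing a later, less obvious "certificate signed by unknown authority" error.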
"connect to QA DB" failure
The database tests connect directly to PostgreSQL. Ensure port 5432 is exposed:
docker compose -f docker-compose.yml -f docker-compose.demo.yml port postgres 5432
Performance tests flaking
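The performance thresholds (200ms, 300ms, 500ms) assume a local Docker stack. On slow CI runners or remote Docker hosts, increase the thresholds or skip Part 39. Go's -run flag uses RE2 syntax, which does not support negative lookahead such as (?!39), so exclude the Part with -skip (Go 1.20+) instead:
go test -tags qa -v -skip 'TestQA/Part39' ./...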
Source file checks failing
The fileExists and fileContains helpers read from CERTCTL_QA_REPO_DIR (default ../..). If running from a non-standard location:
CERTCTL_QA_REPO_DIR=/absolute/path/to/certctl go test -tags qa -v ./...
Release Day Sign-Off Matrix
Before tagging a release, the QA-on-call engineer signs off on each row. This matrix replaces the previous ad-hoc release checklist and ties test execution directly to release approval. Acquisition-grade releases have this kind of matrix; the doc previously didn't.
| Sign-off | Evidence | Owner | Result | Date |
|---|---|---|---|---|
| make verify clean on master | CI run URL | Eng-on-call | ☐ | |
| go test -tags qa ./deploy/test/... ≥ 95% pass rate (skips counted as pass) | Test output | QA-on-call | ☐ | |
| go test -race -count=10 ./internal/... 0 races | tool-output/race-x10.txt | QA-on-call | ☐ | |
| Coverage ≥ thresholds in ci.yml (service / handler / crypto / local-issuer / acme / stepca / mcp) | tool-output/cover-summary.txt | QA-on-call | ☐ | |
| Helm chart helm lint && helm template clean | tool-output/helm.txt | DevOps-on-call | ☐ | |
| All t.Skip sites have current rationales (see Bundle O audit; CI guard catches new orphans) | make qa-stats t.Skip count | QA-on-call | ☐ | |
| Frontend: Vitest run clean; per-page coverage ≥ 70% | web/tool-output/vitest.txt | Frontend-on-call | ☐ | |
| Manual Parts 23, 24, 55, 56 executed (or explicit defer with rationale) | This sheet | QA-on-call | ☐ | |
| Demo stack docker compose up -d --build smoke (/health 200, /ready 200) | curl receipt | QA-on-call | ☐ | |
| govulncheck ./... clean (or deferred-call advisories tracked in gap-backlog) | tool-output/govulncheck.json | Security-on-call | ☐ | |
| QA-doc drift guards green (Part-count + cert-count) | CI run URL | QA-on-call | ☐ | |
| FSM transition coverage tables (coverage-audit-2026-04-27/tables/fsm-coverage.md) — Existential FSMs ≥80% legal + 100% illegal | This sheet | QA-on-call | ☐ | |
Sign-off owner: ______________________ Date: ______ Tag: v__..
Mutation Testing Targets & Kill Rate
Mutation testing exposes which assertions are actually load-bearing — tests can pass against broken code if mutations survive, which is a coverage trap. The audit's Phase 0 attempted to run go-mutesting on the Existential cluster but was blocked by a Go 1.25 / arm64 incompatibility in osutil@v1.6.1 (uses syscall.Dup2 which is undefined on linux/arm64). The operator-runnable workaround uses a fork that targets unix.Dup3 instead.
| Package | Risk class | Target kill rate | Last measured | Tool |
|---|---|---|---|---|
| internal/crypto | Existential | ≥90% | unmeasured (sandbox-blocked, operator-runnable) | go-mutesting |
| internal/pkcs7 | Existential | ≥90% | unmeasured | go-mutesting |
| internal/connector/issuer/local | Existential | ≥90% | unmeasured | go-mutesting |
| internal/connector/issuer/acme | Existential | ≥80% (catch-up; failure-mode coverage 55.6% per Bundle J) | unmeasured | go-mutesting |
| internal/connector/issuer/stepca | Existential | ≥85% (post-Bundle-L.B coverage at 90.4%) | unmeasured | go-mutesting |
| internal/api/middleware | High | ≥80% | unmeasured | go-mutesting |
| internal/validation | Existential (CWE-78 / CWE-113 boundary) | ≥90% | unmeasured | go-mutesting |
| web/src/utils/safeHtml.ts | Frontend (XSS gate) | ≥90% | unmeasured | Stryker |
Operator command (per package)
# Use the avito-tech fork that supports linux/arm64 + Go 1.25.
go install github.com/avito-tech/go-mutesting/cmd/go-mutesting@latest
mkdir -p tool-output
$(go env GOPATH)/bin/go-mutesting --debug ./internal/crypto/... \
> tool-output/mutation-crypto.txt 2>&1
grep -oE 'mutation score is [0-9.]+' tool-output/mutation-crypto.txt | tail -1
Acceptance: ≥80% (Existential) / ≥70% (High). Anything below is a Medium finding; triage entries go in coverage-audit-2026-04-27/gap-backlog.md. This subsection moves mutation testing from "future work" to "documented release gate."
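The grep-and-tail step in the operator command can be turned into a pass/fail gate. The sketch below is illustrative: lastMutationScore and meetsGate are hypothetical helpers, and it assumes the tool reports the score as a fraction between 0 and 1 (so the 80% acceptance bar is written as 0.80). Check the actual output format before relying on it.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// scoreRE uses the same pattern as the grep in the operator command.
var scoreRE = regexp.MustCompile(`mutation score is ([0-9.]+)`)

// lastMutationScore returns the last score reported in output, mirroring
// `grep -oE ... | tail -1`. The boolean is false when no score is found.
func lastMutationScore(output string) (float64, bool) {
	matches := scoreRE.FindAllStringSubmatch(output, -1)
	if len(matches) == 0 {
		return 0, false
	}
	// Trim a trailing period in case the score ends a sentence.
	raw := strings.TrimRight(matches[len(matches)-1][1], ".")
	score, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, false
	}
	return score, true
}

// meetsGate compares a score against a threshold; both are fractions.
func meetsGate(score, threshold float64) bool {
	return score >= threshold
}

func main() {
	// Invented sample text; a real run would read tool-output/mutation-crypto.txt.
	sample := "The mutation score is 0.500000 (partial)\nThe mutation score is 0.910000 (final)\n"
	score, ok := lastMutationScore(sample)
	fmt.Println(score, ok, meetsGate(score, 0.80))
}
```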
Adding New Tests
When a new feature ships:
- Add a Part section in qa_test.go following the numbering convention in the Coverage Map above
- API tests: use c.get(), c.post(), c.bodyStr(), c.getJSON(), c.timedGet()
- Source checks: use fileExists(t, "relative/path") and fileContains(t, "path", "substring")
- DB checks: use openQADB(t) and db.queryInt(t, "SELECT ...")
- Cleanup: always use t.Cleanup() for data created during tests
- Skip if external: use t.Skip("Requires X — manual test") with a clear reason
Version History
- v1.3 (April 2026, post-Bundle-P) — QA Doc Strengthening shipped. New top-of-doc Test Suite Health dashboard (regenerated via make qa-stats). New Coverage by Risk Class table after the Coverage Map. New Release Day Sign-Off Matrix and Mutation Testing Targets sections. CI seed-count + Part-count drift guards land in .github/workflows/ci.yml so future doc drift fails CI. Bundle P closes M-007 / M-010 / M-011 / M-012 (structural strengthening) + M-008 (Mutation Testing Targets).
- v1.2 (April 2026, post-coverage-audit) — Documented Parts 55–56 (I-004 Agent Soft-Retirement, I-005 Notification Retry & Dead-Letter) and surfaced Parts 23–24 (S/MIME & EKU; OCSP/CRL) as not-yet-automated. 56 Parts total in testing-guide.md; 49 live Part_* automation wrappers in qa_test.go + 4 new Skip stubs for Parts 23/24/55/56 = 53 wrappers (Parts 15–17 remain covered by source-checks in Parts 42–46). Reconciled seed-data section to actual seed_demo.sql counts (12 agents, 13 issuers; certs were already accurate at 32). Bundle I of the 2026-04-27 coverage-audit closure plan.
- v1.1 (April 2026) — Added Parts 53–54 (M47: Kubernetes Secrets target + AWS ACM PCA issuer). 54 Parts total, ~164 automated subtests.
- v1.0 (April 2026) — Initial release covering all 52 Parts of testing-guide.md v2.1. Replaces qa-smoke-test.sh.