shankar0123 9f7b5d89a5 docs(contributor): document the Auditable Codebase Bundle guards
Three doc changes for the bundle's discoverability:

1. New docs/contributor/ci-guards.md (185 lines)
   Entry-point doc for new contributors. Explains the four categories
   of guards (code-shape, contract-parity, build/dep, operational),
   the discipline that keeps them honest (allowlist + expiration),
   and how to add a new one. Cross-references scripts/ci-guards/README.md
   for the exhaustive list.

2. scripts/ci-guards/README.md — added a 'Forward-looking guards'
   subsection naming complete-path-config-coverage, doc-rot-detector,
   and cold-db-compose-smoke with their item references + a
   one-sentence description of what each catches. Replaced the
   stale '22 guards' header with 'Count: re-derive via ls' per the
   no-version-stamped-numbers convention from CLAUDE.md.

3. docs/README.md — wired ci-guards.md into the Contributor section
   navigation table.

Bumped 'Last reviewed:' to 2026-05-12 on the two docs touched
(docs/README.md, docs/contributor/ci-pipeline.md).

Verified: doc-rot-detector.sh green at 91 docs scanned, 89 dated, 0
warns, 0 fails.

Audit-Closes: post-v2.1.0-anti-rot/item-1
Audit-Closes: post-v2.1.0-anti-rot/item-2
Audit-Closes: post-v2.1.0-anti-rot/item-5
Audit-Closes: post-v2.1.0-anti-rot/item-6
2026-05-12 14:15:13 +00:00


CI guards

Last reviewed: 2026-05-12

CI guards are small scripts (shell and Python) and Go tests that pin invariants the v2 audit history showed are easy to lose. Each one runs on every push, fails the build on regression with a useful error message, and stays quiet on the happy path apart from a single confirmation line. The canonical source is scripts/ci-guards/ for shell guards and internal/ciparity/ for Go-based parity tests.

This page lives at docs/contributor/ci-guards.md and is the entry point for contributors who want to understand why a CI step is red, how to add a new guard, or where the allowlist for a given guard lives. The exhaustive list of shell guards is at scripts/ci-guards/README.md; this doc explains the categories + the discipline.

Why guards exist

Two failure modes the v2 audit cycle surfaced repeatedly:

The codebase grew faster than the docs and config could keep up. Env vars got added without consumers; OpenAPI ops were registered without router routes; docs went stale; a migration broke on cold-DB without any test catching it. Each of those classes can be fixed once per instance (re-read the doc, wire the env var) or structurally per class (write a guard that fails the next time it happens). CI guards are the structural fix.

The team grew. Reviewers had to remember what each commit author had forgotten. CI guards externalize the institutional knowledge into checks — the build refuses to ship the lying field, the stale doc, the broken migration. New contributors don't need to know the audit history.

Categories

The guards fall into four buckets, organized by what they pin:

Code-shape guards

Catch defects in source files BEFORE they ship. Examples: G-3-env-docs-drift.sh (no env var defined-but-undocumented or documented-but-undefined), complete-path-config-coverage.sh (every env var has a non-config consumer), T-1-frontend-page-coverage.sh (every new GUI page has a sibling test file).
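The "defined-but-undocumented or documented-but-undefined" check is, at its core, a two-way set difference. A minimal sketch, assuming the defined and documented names have already been extracted into two files with one name per line. How G-3-env-docs-drift.sh actually extracts them is not shown here, and env_drift and its arguments are hypothetical names used only for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C  # keep sort and comm in agreement on ordering

# env_drift DEFINED_FILE DOCUMENTED_FILE   (hypothetical helper)
# Each file holds one env var name per line.
# Prints ::error:: lines and returns 1 on drift; prints nothing when the sets match.
env_drift() {
  local defined="$1" documented="$2"
  local undocumented undefined name status=0

  # comm -23: lines only in the first input; comm -13: lines only in the second.
  undocumented="$(comm -23 <(sort -u "$defined") <(sort -u "$documented"))"
  undefined="$(comm -13 <(sort -u "$defined") <(sort -u "$documented"))"

  while IFS= read -r name; do
    [[ -n "$name" ]] || continue
    echo "::error::env var ${name} is defined but not documented"
    status=1
  done <<< "$undocumented"

  while IFS= read -r name; do
    [[ -n "$name" ]] || continue
    echo "::error::env var ${name} is documented but not defined"
    status=1
  done <<< "$undefined"

  return "$status"
}
```

Reporting both directions in one run means a contributor sees the full drift at once instead of fixing one side and hitting the other on the next push.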

Contract-parity guards

Catch drift across the four product surfaces — OpenAPI spec, HTTP router, MCP tool catalogue, CLI verb dispatcher. The router ↔ OpenAPI pin lives at internal/api/router/openapi_parity_test.go::TestRouter_OpenAPIParity. The MCP + CLI sweep lives at internal/ciparity/surface_parity_test.go (post-v2.1.0 anti-rot item 2). One hard gate: the MCP tool count cannot regress below mcpBaselineFloor. The CLI parity sweep is informational until the CLI surface stabilizes.

Build / dependency guards

H-001-bare-from.sh (Dockerfile pin to @sha256:), digest-validity.sh (every digest actually resolves on the registry), M-012-no-root-user.sh (no Dockerfile ends as root), bundle-8-*.sh (frontend XSS / reverse-tabnabbing surface). These come out of specific audits and pin the closure.

Operational guards

cold-db-compose-smoke.sh (wipe postgres volume, bring stack up cold, issue/renew/revoke, audit-row check). doc-rot-detector.sh (every doc reviewed within 120 days). These pin the operational reality, not the source shape.
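The 120-day review window can be checked with a plain date comparison. The sketch below is not the real doc-rot-detector.sh. It assumes each doc carries a literal "Last reviewed: YYYY-MM-DD" stamp, that GNU date is available (as on a typical Linux runner), and that undated docs are reported rather than failed; doc_review_status is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

MAX_AGE_DAYS=120  # review window stated above

# doc_review_status FILE [TODAY]   (hypothetical helper)
# Prints "fresh", "stale", or "undated"; returns 1 only for "stale".
doc_review_status() {
  local file="$1" today="${2:-$(date -u +%F)}"
  local reviewed cutoff

  # First "Last reviewed: YYYY-MM-DD" stamp in the file, if any.
  # grep exits 1 on no match; "|| true" keeps pipefail + set -e from aborting.
  reviewed="$(grep -m1 -oE 'Last reviewed: [0-9]{4}-[0-9]{2}-[0-9]{2}' "$file" | cut -d' ' -f3 || true)"
  if [[ -z "$reviewed" ]]; then
    echo "undated"
    return 0
  fi

  # GNU date syntax (assumption); BSD/macOS date would need -v instead of -d.
  cutoff="$(date -u -d "${today} ${MAX_AGE_DAYS} days ago" +%F)"

  # ISO-8601 dates sort lexicographically, so a string comparison is enough.
  if [[ "$reviewed" < "$cutoff" ]]; then
    echo "stale"
    return 1
  fi
  echo "fresh"
}
```

Passing TODAY explicitly keeps the check deterministic when you want to reproduce a CI result locally.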

When the build is red

Find the failing step in the GitHub Actions UI. Every guard's output starts with the guard's own identifier and ends with one of:

::error::<one-line description of the regression> followed by 2-4 remediation paths. The fastest path: read the remediation list, pick the option that fits, fix.

exit 1 without an ::error:: annotation: likely a set -e trap on an internal command. Re-run with bash -x scripts/ci-guards/<id>.sh locally to see where it died.

If a guard is fundamentally wrong (e.g., refactor moved the code it scans), update the guard in the same PR that triggered the failure. Don't add a one-off allowlist to silence a real bug.

Adding a new guard

The discipline in five steps. The first three are non-negotiable; the last two are courtesy.

1. Drop a new <id>.sh in scripts/ci-guards/ with a head-comment block that names the bug class, lists the audit finding (if any) it closes, and explains the failure mode. Mirror the shape of an existing guard: G-3-env-docs-drift.sh and digest-validity.sh are the canonical bash+Python and pure-bash examples, respectively.

2. Use set -e early; emit ::error:: annotations on regression; exit 0 with one happy-path confirmation line. Take no arguments and require no env vars, because the CI loop iterates every *.sh without args.

3. Write the allowlist file alongside (<id>-exceptions.yaml). Each entry is keyed by path: or name: and carries justification: and expires: fields. Make expires a required field: every exception has a hard expiration date, typically 90 days out.

4. Verify on a deliberately broken state: introduce the regression, confirm the guard fires with a useful message, revert, confirm green. Capture the negative-test output in your PR description.

5. Add a row to scripts/ci-guards/README.md. The CI loop auto-picks up the new file, so no ci.yml edit is required unless the guard needs Docker (in which case it gets its own dedicated job; see cold-db-compose-smoke for the pattern).
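The steps above can be sketched as a guard skeleton. Everything specific in it is hypothetical and chosen only for illustration: the id X-000-example, the DO-NOT-MERGE marker, and the internal/ scan root. For the real conventions, copy an existing guard as the steps recommend.

```shell
#!/usr/bin/env bash
# X-000-example.sh (hypothetical id)
# Bug class:      a temporary DO-NOT-MERGE marker left behind in Go source.
# Audit finding:  none (illustration only).
# Failure mode:   the marker ships and the temporary code becomes permanent.
set -euo pipefail  # set -e plus unset-variable and pipeline checks

GUARD_ID="X-000-example"

# run_guard ROOT: scan ROOT for the marker. Returns 1 with an ::error::
# annotation on regression; prints one confirmation line on the happy path.
run_guard() {
  local root="$1" hits
  # grep exits non-zero when nothing matches or ROOT is missing;
  # "|| true" keeps set -e from aborting on the happy path.
  hits="$(grep -rn --include='*.go' 'DO-NOT-MERGE' "$root" 2>/dev/null || true)"
  if [[ -n "$hits" ]]; then
    echo "::error::${GUARD_ID}: DO-NOT-MERGE marker found under ${root}/"
    echo "$hits"
    echo "Remediation: (1) remove the marker; (2) finish the work it stands for;"
    echo "             (3) add an allowlist entry with justification + expires."
    return 1
  fi
  echo "${GUARD_ID}: OK"
}

# The script itself takes no arguments and needs no env vars.
run_guard internal
```

For the negative test in step 4, add the marker to any Go file under the scan root, run the script, confirm the ::error:: line appears and the exit code is 1, then revert and confirm the single OK line.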

Discipline: the allowlist trap

Allowlists are dangerous. They start as a small concession ("this one env var is documented for an external script, not consumed by Go code") and become a junk drawer of unverified exemptions that mask real defects. The discipline that keeps that from happening:

Every entry MUST carry a justification: field with a one-line reason. "Tech debt" is not a reason; "documented contract surface consumed by the ACME DNS-01 helper script — see deploy/test/acme/dns01-export.sh" is.

Every entry MUST carry an expires: field with a hard date, typically 90 days out. The guards reject entries past their expiration. When an entry expires, the only paths forward are (a) close the underlying gap so the entry is no longer needed, or (b) re-justify with a fresh expiration. Both force a real review.

If you're adding more than one entry to an allowlist in a single PR, that's a smell — usually the underlying class needs a small refactor, not three allowlist rows.
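The two MUST rules above can be enforced mechanically. The sketch below is a line-oriented check, not the parser the real guards use. It assumes a flat file where each entry starts with "- path:" or "- name:" at column 0 and has indented justification: and expires: lines with ISO dates; check_allowlist is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_allowlist FILE [TODAY]   (hypothetical helper)
# Prints an ::error:: line per violation and returns 1; silent when clean.
check_allowlist() {
  local file="$1" today="${2:-$(date -u +%F)}"
  awk -v today="$today" -v file="$file" '
    # Validate the entry collected so far, then reset for the next one.
    function finish_entry() {
      if (entry == "") return
      if (reason == "") {
        printf "::error::%s: entry %s has no justification\n", file, entry
        bad = 1
      }
      if (expiry == "") {
        printf "::error::%s: entry %s has no expires date\n", file, entry
        bad = 1
      } else if ((expiry "") < (today "")) {
        # ISO dates compare correctly as strings.
        printf "::error::%s: entry %s expired on %s\n", file, entry, expiry
        bad = 1
      }
      entry = ""; reason = ""; expiry = ""
    }
    /^- (path|name):/     { finish_entry(); entry = $3 }
    /^[ ]+justification:/ { reason = $0; sub(/^[ ]+justification:[ ]*/, "", reason) }
    /^[ ]+expires:/       { expiry = $2; gsub(/"/, "", expiry) }
    END                   { finish_entry(); exit bad + 0 }
  ' "$file"
}
```

An entry that passes today starts failing the day after its expires date, which is what forces the review described above.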

Where the bundles live

The Audit-Closes: commit trailer convention (post-v2.1.0 anti-rot item 4) is the cross-reference between audit findings and the commits that closed them. Re-derive the closure history of any audit with:

git log --grep='Audit-Closes: <audit-id>'

The audit folder structure under cowork/ (workspace-local; not in this repo) carries the per-audit RESULTS.md + findings.yaml. CLAUDE.md's "Audit closures" subsection is the current-state index of which audits are open vs closed.

The exhaustive guard list — scripts/ci-guards/README.md. The CI pipeline architecture — docs/contributor/ci-pipeline.md. The QA test suite — docs/contributor/qa-test-suite.md.