Files
certctl/scripts/ci-guards
shankar0123 a975ccfca0 docs(b6): secret-custody reference + config-encryption upgrade runbook + private-key CI guard
Closes acquisition-diligence Bundle 6 findings on secret custody, config
encryption, and local artifact hygiene. Source IDs: S6, R4, SEC-M2,
RT-M1, RT-M2, RT-L1.

Surgical closures (artifact-only audit-framed memos stay out of the
public repo per the Bundle 5 lesson):

R4 / RT-L1 — local EC private key artifact
  rm cmd/agent/mc-001.key (gitignored, never in git history, leftover
  from a 2025-era agent dev run on the operator's workstation).
  Added scripts/ci-guards/B6-no-private-keys-in-tree.sh that fails the
  build if any TRACKED non-test file contains a PEM private-key block,
  so the next attempt to commit similar material gets caught at CI.
  Allowlist: *_test.go (hermetic-test PEMs), examples/*.md (sample
  walkthroughs), internal/scep/intune/testdata/ (certificates, not
  keys).

RT-M1 — landing-page HSM implication
  certctl.io/index.html: 'their hardware' / 'your hardware' colloquial
  comparisons rephrased to 'their custody' / 'your servers'. The phrase
  'Your keys. Your hardware. Your data. Your terms.' becomes 'Your
  keys. Your servers. Your data. Your terms.' to remove any inferred
  HSM-backed key-storage claim. The technical disclosure now lives in
  docs/operator/secret-custody.md (linked below); the landing page no
  longer makes a claim it cannot back.

S6 + SEC-M2 + RT-M2 (composite documentation closure)
  Added docs/operator/secret-custody.md — public operator reference
  enumerating every secret material on the control plane and on
  agents:
    - Local CA private key (FileDriver, file-on-disk, heap-resident
      with the L-014 carve-out documented in
      internal/connector/issuer/local/local.go).
    - Agent ECDSA P-256 keys (file on agent host, never transmitted).
    - OIDC client secret (AES-256-GCM v3, PBKDF2 600k).
    - Session signing key (same encryption regime).
    - Break-glass credential (Argon2id, never encrypted).
    - API-key bearer tokens (SHA-256 hash only; plaintext shown once).
    - CSR private keys mid-issuance (agent memory only).
    - Issuer-connector backend secrets (encrypted_config column,
      fail-closed for source='database', plaintext-by-design for
      source='env' with rationale).
  The Env-seeded-vs-DB-seeded plaintext policy is explained in plain
  text so a buyer review can independently verify the startup guard at
  cmd/server/main.go:222-262 makes sense.

  Added docs/operator/runbooks/config-encryption-upgrade.md — the
  procedural arm: how to force v1/v2 -> v3 re-seal across the
  database, plus the passphrase-rotation order. Documents the
  AEAD-driven read fallback (v3 -> v2 -> v1) and the fact that
  re-sealing happens passively on UPDATE. Open roadmap item: a
  certctl admin reseal --all command (tracked in
  WORKSPACE-ROADMAP.md).

  Both docs wired into docs/README.md Operator + Runbooks tables.

Verification:
  rg -n 'CONFIG_ENCRYPTION|encrypt|v1|private key|HSM|PKCS11|mc-001.key|\.key|Local CA' \
     internal cmd docs .gitignore README.md   # ambient (no NEW leaks)
  find . -name '*.key' \
     -not -path './.git/*' -not -path './web/node_modules/*'   # empty
  git ls-files | xargs grep -lE 'BEGIN .* PRIVATE KEY' \
     | grep -vE '_test\.go$|^examples/|^internal/scep/intune/testdata/'   # empty
  bash scripts/ci-guards/B6-no-private-keys-in-tree.sh   # PASS
  bash scripts/ci-guards/G-3-env-docs-drift.sh           # PASS
  bash scripts/ci-guards/doc-rot-detector.sh             # PASS

Residual roadmap (deliberately deferred):
  - signer.PKCS11Driver (HSM-token-backed CA-key custody).
  - signer.CloudKMSDriver (AWS/GCP/Azure KMS-backed CA-key custody).
  - FIPS 140-3 mode for the whole control plane.
  - HSM-backed session signing key.
  - Built-in 'certctl admin reseal --all' command.
  All five tracked in WORKSPACE-ROADMAP.md, not retracted.
2026-05-13 01:48:40 +00:00
..
2026-05-05 18:18:38 +00:00

scripts/ci-guards/ — Regression-guard scripts

Each <id>.sh script in this directory pins one closed audit finding from regressing. CI runs the full set on every push via the Regression guards step in .github/workflows/ci.yml. Operators can run any script locally:

bash scripts/ci-guards/G-3-env-docs-drift.sh

Contract

Every script in this directory MUST:

  1. Be exit-code 0 on a clean repo (no regression present).
  2. Be exit-code non-zero on regression, with a ::error:: annotation prefix so PR reviewers see the failing line in the GitHub Actions UI.
  3. Be runnable from repo root via bash scripts/ci-guards/<id>.sh with NO arguments and NO env-var requirements. The CI loop step (for g in scripts/ci-guards/*.sh; do bash "$g"; done) iterates every .sh here without args; any script that requires an arg or env var WILL fail in that loop.
  4. Carry a head-comment block matching the in-source justification from the original ci.yml entry: the audit-finding reference, the closure rationale, the exempt-surface list (if any).
  5. Use set -e early to fail-fast on internal command errors.
  6. Produce no output on the happy path beyond a final echo "<id>: clean." confirmation line.

Helpers vs guards

Scripts that consume input artifacts (a test-output log, a coverage.out file) or env vars (PR_NUMBER, GH_TOKEN) are HELPERS, not guards. They live in scripts/, NOT scripts/ci-guards/.

Current helpers:

  • scripts/vendor-e2e-skip-check.sh — consumes test-output.log arg from the deploy-vendor-e2e job
  • scripts/coverage-pr-comment.sh — consumes coverage.out + PR_NUMBER + GH_TOKEN env from the go-build-and-test job
  • scripts/check-coverage-thresholds.sh — consumes coverage.out
    • .github/coverage-thresholds.yml
  • scripts/qa-doc-part-count.sh + scripts/qa-doc-seed-count.sh — invoked via make verify-docs pre-tag, not in CI

Adding a new guard

  1. Drop a new <id>.sh in this directory with the head-comment block describing the audit finding it closes.
  2. Make it executable: chmod +x scripts/ci-guards/<id>.sh.
  3. Verify it fails on a deliberate regression and passes on clean repo.
  4. CI auto-picks up new scripts via the for g in scripts/ci-guards/*.sh loop in the Regression guards step — no ci.yml change required.

Guards in this directory

Count: re-derive on demand via ls scripts/ci-guards/*.sh | wc -l. The table below names each one — keep it in sync as guards are added.

Per-finding regression guards

ID Finding Catches
G-1-jwt-auth-literal G-1 JWT silent auth downgrade "jwt" literal in additive auth-type surfaces
L-001-insecure-skip-verify L-001 unjustified InsecureSkipVerify InsecureSkipVerify: true without //nolint:gosec
H-001-bare-from H-001 (CWE-829) tag-swap attack Bare FROM line without @sha256 digest pin
M-012-no-root-user M-012 (CWE-250) container-as-root Dockerfile missing terminal USER <non-root>
H-009-readme-jwt H-009 README JWT advertising README.md re-introducing JWT-as-supported claim
G-2-api-key-hash-json G-2 cat-s5-apikey_leak api_key_hash in JSON-emitting surface
U-2-plaintext-healthcheck U-2 healthcheck protocol mismatch Plaintext http:// in HEALTHCHECK directive
U-3-migration-mount U-3 seed initdb schema drift Migration file mounted into postgres initdb
D-1-D-2-statusbadge-phantom D-1 + D-2 dead keys + TS phantoms StatusBadge dead keys + 5 Certificate / 5 Agent / 1 Issuer / 1 Notification phantom fields
L-1-bulk-action-loop L-1 client-side bulk loops for ... await triggerRenewal/updateCertificate in CertificatesPage
B-1-orphan-crud B-1 orphan-CRUD client fns 8 update/create/delete fns lose their page consumer
S-2-strings-contains-err S-2 brittle error-dispatch strings.Contains(err.Error(), "not found"|"violates foreign key") in handlers
G-3-env-docs-drift G-3 env-var docs drift CERTCTL_* env var defined OR documented but not both
test-naming-convention I-001-extended func TestXxx (lowercase first letter) — Go silently skips
S-1-hardcoded-source-counts S-1 stale numeric prose Hardcoded "N issuer connectors" / "N MCP tools" in README + docs
P-1-documented-orphan-fns P-1 documented orphans 16 read-fn names removed from client.ts exports
T-1-frontend-page-coverage T-1 untested frontend pages New page in web/src/pages/ without sibling .test.tsx and not on the deferred allowlist
bundle-8-L-015-target-blank-rel-noopener L-015 (CWE-1022) reverse-tabnabbing target="_blank" without rel="noopener noreferrer"
bundle-8-L-019-dangerously-set-inner-html L-019 (CWE-79) XSS dangerouslySetInnerHTML outside safeHtml.ts
bundle-8-M-009-bare-usemutation M-009 + M-029 mutation contract Bare useMutation() outside useTrackedMutation wrapper
H-1-encryption-key-min-length H-1 closure follow-up (post-Phase-5 surfacing) CERTCTL_CONFIG_ENCRYPTION_KEY literal in any deploy/docker-compose*.yml shorter than the 32-byte floor enforced by internal/config/config.go::Validate()
test-compose-scep-coherence post-Phase-5 surfacing of dead SCEP test config CERTCTL_SCEP_ENABLED=true in test compose without (a) a CI job that runs the SCEP integration test, (b) the ra.crt + ra.key + intune_trust_anchor.pem fixtures committed to deploy/test/fixtures/, AND (c) the matching volume mount

Forward-looking guards (Auditable Codebase Bundle, post-v2.1.0 anti-rot)

These guards catch defect classes BEFORE they get audit findings — they pin invariants on the codebase that the v2.0 audit history showed are easy to lose.

ID Item Catches
complete-path-config-coverage post-v2.1.0 / item-1 "Lying field" — CERTCTL_* env var defined in internal/config/config.go that no consumer outside internal/config/ actually reads. Operator-facing config that the docs claim works but the code never honors. Companion Go test at internal/config/coverage_test.go.
doc-rot-detector post-v2.1.0 / item-5 Docs older than 90 days warn (yellow), older than 120 days fail (red). Uses HEAD commit timestamp for reproducibility. docs/archive/ allowlisted in bulk.

The cold-DB compose smoke (post-v2.1.0 / item-6) is NOT a script in this directory — it is inlined directly into .github/workflows/ci.yml::cold-db-compose-smoke because there is no value in a developer running it locally (the whole point of the gate is that CI owns the cold-DB state). To inspect or modify the smoke logic, read that workflow job; there is intentionally no scripts/ci-guards/cold-db-compose-smoke.sh.

The fourth Bundle artifact (internal/ciparity/) is Go tests, not shell guards — runs under the standard Go test step. Pins the MCP tool catalogue floor + naming convention; reports CLI/MCP/OpenAPI surface counts as a trend metric.

Guards explicitly NOT here

  • QA-doc Part-count drift + QA-doc seed-count drift — these protect docs-the-operator-reads, not anything the product depends on. Moved to make verify-docs (operator runs pre-tag, not on every push). See the ci-pipeline-cleanup spec, Phase 11.

Running the full set locally

for g in scripts/ci-guards/*.sh; do
  echo "=== $(basename "$g") ==="
  bash "$g" || echo "  FAILED"
done