Files
certctl/scripts/ci-guards
shankar0123 69a2b5c55a config: default hardening + operator docs (Phase 2 closure — SEC-H1, SEC-H3, SEC-M4, DEPL-H1, DEPL-M2 + doc-only carve-outs)
Eleven findings from the architecture diligence audit's Phase 2 bundle
closed in one PR. All touch the same backend config + Helm chart +
operator docs surface, so reviewing in one diff is the natural fit.

config.go: three new fail-closed Validate() branches behind sentinels
=====================================================================

Three new error sentinels exported from internal/config/config.go for
tests to pin via errors.Is + message-text:
  - ErrAgentBootstrapTokenRequired (SEC-H1)
  - ErrACMEInsecureWithoutAck      (SEC-M4)
  - ErrDemoModeAckExpired          (SEC-H3)

SEC-H1 (staged): introduces CERTCTL_AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY
as an opt-in feature flag. When true AND the bootstrap token is empty,
Validate() returns ErrAgentBootstrapTokenRequired and the server
refuses to start. Default in THIS release: false (warn-mode
pass-through preserved). WORKSPACE-ROADMAP.md schedules the default
flip to true for v2.2.0 — operators get one upgrade window.

SEC-M4: upgrades the existing boot-time WARN log for
CERTCTL_ACME_INSECURE=true into a hard refuse-to-start gate behind
CERTCTL_ACME_INSECURE_ACK=true. The ACK env var must be paired with
the existing INSECURE flag; either alone fails closed. The boot-time
WARN log at cmd/server/main.go:611 continues to fire for the ACK'd
case so every restart logs the reminder.

SEC-H3: tightens the sticky DemoModeAck bit so it expires after 24h.
When DemoModeAck=true, Validate() now requires CERTCTL_DEMO_MODE_ACK_TS
to be set as a unix-epoch timestamp within the last 24h (24h-tolerance
on the past side, 1-minute clock-skew on the future side). Catches the
"forgotten demo deployment promoted to production" failure mode —
next container restart past 24h refuses unless re-ack'd.

Tests in internal/config/config_test.go cover every new branch:
positive (passes when properly set), negative (each fail-closed path
fires with the matching sentinel + message-text). 11 new tests added.

Helm chart + HA runbook (DEPL-H1)
=================================

Created docs/operator/runbooks/ha.md documenting the three values
flips required for production HA: server.replicas, podDisruptionBudget,
service.sessionAffinity. Cross-link comments added to
deploy/helm/certctl/values.yaml next to the server.replicas (line 19)
and podDisruptionBudget (line 566) defaults. DEFAULTS DO NOT CHANGE
— that's the point per the prompt's 'do not flip networkPolicy default'
guidance: a default-enabled PDB blocks fresh helm install on
single-node clusters.

CI guard (DEPL-M2)
==================

scripts/ci-guards/no-change-me-in-prod-compose.sh grep-fails any
'change-me-' literal in compose files OTHER than docker-compose.demo.yml.
Catches the placeholder-credential-leak regression one layer earlier
than the runtime Validate() fail-closed guards from Bundle 2 (2026-05-12).
Excludes comment lines so docs explaining the pattern don't trip the
guard. Verified to fire on a synthetic leak; clean on the current tree.

Consolidated 'Security carve-outs' doc section
==============================================

docs/operator/security.md grows by one new section documenting the
seven existing carve-outs in one canonical place:
  - SEC-M3: 3 InsecureSkipVerify=true sites (Agent dev, verify probe, tlsprobe)
  - SEC-M5: F5 connector InsecureSkipVerify per-config field
  - SEC-M4: ACME insecure + new ACK gate
  - SEC-L1: CSP 'unsafe-inline' on style-src (Tailwind carve-out)
  - SEC-L2: break-glass Argon2id rest-defense reminder
  - SEC-L3: 1 MB body-size cap + CERTCTL_MAX_BODY_SIZE override
  - DEPL-M2: change-me-* placeholder credentials in demo overlay
  - DEPL-M3: K8s NetworkPolicy operator-opt-in default

Each entry cites the file:line, the rationale for the carve-out, and
the operator action.

CHANGELOG + ENVIRONMENTS coverage
==================================

CHANGELOG.md grows by one new '### Breaking changes (scheduled for
v2.2.0)' section under Unreleased, documenting SEC-H1 / SEC-M4 / SEC-H3
with explicit upgrade-window guidance for each.

deploy/ENVIRONMENTS.md adds five rows: AGENT_BOOTSTRAP_TOKEN +
AGENT_BOOTSTRAP_TOKEN_DENY_EMPTY + DEMO_MODE_ACK + DEMO_MODE_ACK_TS +
ACME_INSECURE_ACK. G-3 env-docs-drift CI guard stays clean.

WORKSPACE-ROADMAP.md (cowork-side) schedules the SEC-H1 default-flip
for v2.2.0.

Sandbox limitation
==================

The certctl repo's working tree is 6.1 GB which fills the sandbox
volume; the go1.25.10 toolchain download (go.mod requires it,
sandbox has 1.25.9) keeps failing on disk-full. Local 'go build' /
'go test' were NOT run in this commit's verification path.
make verify MUST be run on the operator's workstation before push
per CLAUDE.md operating rules.

CI guards (no-change-me, G-3 env-docs-drift, doc-rot-detector, +
all existing) verified clean by running each individually.

Closes: cowork/certctl-architecture-diligence-audit.html#fix-SEC-H1,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-H3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M4,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H1,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M2,
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M3,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-M5,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L1,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L2,
        cowork/certctl-architecture-diligence-audit.html#fix-SEC-L3
2026-05-13 19:50:00 +00:00
..

scripts/ci-guards/ — Regression-guard scripts

Each <id>.sh script in this directory pins one closed audit finding from regressing. CI runs the full set on every push via the Regression guards step in .github/workflows/ci.yml. Operators can run any script locally:

bash scripts/ci-guards/G-3-env-docs-drift.sh

Contract

Every script in this directory MUST:

  1. Be exit-code 0 on a clean repo (no regression present).
  2. Be exit-code non-zero on regression, with a ::error:: annotation prefix so PR reviewers see the failing line in the GitHub Actions UI.
  3. Be runnable from repo root via bash scripts/ci-guards/<id>.sh with NO arguments and NO env-var requirements. The CI loop step (for g in scripts/ci-guards/*.sh; do bash "$g"; done) iterates every .sh here without args; any script that requires an arg or env var WILL fail in that loop.
  4. Carry a head-comment block matching the in-source justification from the original ci.yml entry: the audit-finding reference, the closure rationale, the exempt-surface list (if any).
  5. Use set -e early to fail-fast on internal command errors.
  6. Produce no output on the happy path beyond a final echo "<id>: clean." confirmation line.

Helpers vs guards

Scripts that consume input artifacts (a test-output log, a coverage.out file) or env vars (PR_NUMBER, GH_TOKEN) are HELPERS, not guards. They live in scripts/, NOT scripts/ci-guards/.

Current helpers:

  • scripts/vendor-e2e-skip-check.sh — consumes test-output.log arg from the deploy-vendor-e2e job
  • scripts/coverage-pr-comment.sh — consumes coverage.out + PR_NUMBER + GH_TOKEN env from the go-build-and-test job
  • scripts/check-coverage-thresholds.sh — consumes coverage.out
    • .github/coverage-thresholds.yml

Adding a new guard

  1. Drop a new <id>.sh in this directory with the head-comment block describing the audit finding it closes.
  2. Make it executable: chmod +x scripts/ci-guards/<id>.sh.
  3. Verify it fails on a deliberate regression and passes on clean repo.
  4. CI auto-picks up new scripts via the for g in scripts/ci-guards/*.sh loop in the Regression guards step — no ci.yml change required.

Guards in this directory

Count: re-derive on demand via ls scripts/ci-guards/*.sh | wc -l. The table below names each one — keep it in sync as guards are added.

Per-finding regression guards

ID Finding Catches
G-1-jwt-auth-literal G-1 JWT silent auth downgrade "jwt" literal in additive auth-type surfaces
L-001-insecure-skip-verify L-001 unjustified InsecureSkipVerify InsecureSkipVerify: true without //nolint:gosec
H-001-bare-from H-001 (CWE-829) tag-swap attack Bare FROM line without @sha256 digest pin
M-012-no-root-user M-012 (CWE-250) container-as-root Dockerfile missing terminal USER <non-root>
H-009-readme-jwt H-009 README JWT advertising README.md re-introducing JWT-as-supported claim
G-2-api-key-hash-json G-2 cat-s5-apikey_leak api_key_hash in JSON-emitting surface
U-2-plaintext-healthcheck U-2 healthcheck protocol mismatch Plaintext http:// in HEALTHCHECK directive
U-3-migration-mount U-3 seed initdb schema drift Migration file mounted into postgres initdb
D-1-D-2-statusbadge-phantom D-1 + D-2 dead keys + TS phantoms StatusBadge dead keys + 5 Certificate / 5 Agent / 1 Issuer / 1 Notification phantom fields
L-1-bulk-action-loop L-1 client-side bulk loops for ... await triggerRenewal/updateCertificate in CertificatesPage
B-1-orphan-crud B-1 orphan-CRUD client fns 8 update/create/delete fns lose their page consumer
S-2-strings-contains-err S-2 brittle error-dispatch strings.Contains(err.Error(), "not found"|"violates foreign key") in handlers
G-3-env-docs-drift G-3 env-var docs drift CERTCTL_* env var defined OR documented but not both
test-naming-convention I-001-extended func TestXxx (lowercase first letter) — Go silently skips
S-1-hardcoded-source-counts S-1 stale numeric prose Hardcoded "N issuer connectors" / "N MCP tools" in README + docs
P-1-documented-orphan-fns P-1 documented orphans 16 read-fn names removed from client.ts exports
T-1-frontend-page-coverage T-1 untested frontend pages New page in web/src/pages/ without sibling .test.tsx and not on the deferred allowlist
bundle-8-L-015-target-blank-rel-noopener L-015 (CWE-1022) reverse-tabnabbing target="_blank" without rel="noopener noreferrer"
bundle-8-L-019-dangerously-set-inner-html L-019 (CWE-79) XSS dangerouslySetInnerHTML outside safeHtml.ts
bundle-8-M-009-bare-usemutation M-009 + M-029 mutation contract Bare useMutation() outside useTrackedMutation wrapper
H-1-encryption-key-min-length H-1 closure follow-up (post-Phase-5 surfacing) CERTCTL_CONFIG_ENCRYPTION_KEY literal in any deploy/docker-compose*.yml shorter than the 32-byte floor enforced by internal/config/config.go::Validate()
test-compose-scep-coherence post-Phase-5 surfacing of dead SCEP test config CERTCTL_SCEP_ENABLED=true in test compose without (a) a CI job that runs the SCEP integration test, (b) the ra.crt + ra.key + intune_trust_anchor.pem fixtures committed to deploy/test/fixtures/, AND (c) the matching volume mount

Forward-looking guards (Auditable Codebase Bundle, post-v2.1.0 anti-rot)

These guards catch defect classes BEFORE they get audit findings — they pin invariants on the codebase that the v2.0 audit history showed are easy to lose.

ID Item Catches
complete-path-config-coverage post-v2.1.0 / item-1 "Lying field" — CERTCTL_* env var defined in internal/config/config.go that no consumer outside internal/config/ actually reads. Operator-facing config that the docs claim works but the code never honors. Companion Go test at internal/config/coverage_test.go.
doc-rot-detector post-v2.1.0 / item-5 Docs older than 90 days warn (yellow), older than 120 days fail (red). Uses HEAD commit timestamp for reproducibility. docs/archive/ allowlisted in bulk.

The cold-DB compose smoke (post-v2.1.0 / item-6) is NOT a script in this directory — it is inlined directly into .github/workflows/ci.yml::cold-db-compose-smoke because there is no value in a developer running it locally (the whole point of the gate is that CI owns the cold-DB state). To inspect or modify the smoke logic, read that workflow job; there is intentionally no scripts/ci-guards/cold-db-compose-smoke.sh.

The fourth Bundle artifact (internal/ciparity/) is Go tests, not shell guards — runs under the standard Go test step. Pins the MCP tool catalogue floor + naming convention; reports CLI/MCP/OpenAPI surface counts as a trend metric.

Running the full set locally

for g in scripts/ci-guards/*.sh; do
  echo "=== $(basename "$g") ==="
  bash "$g" || echo "  FAILED"
done