mirror of https://github.com/shankar0123/certctl.git synced 2026-07-26 15:38:12 +00:00

shankar0123 02438ad9e1 ci: floor raise + doc drift (Phase 3 closure — TEST-H1/H2/M1/M2/M3/M4/L1, ARCH-H3/L1/L2/L3/L4)

Twelve findings from the architecture diligence audit's Phase 3 bundle,
closed in one PR. The changes fall into two groups: CI workflow updates,
and small doc-drift fixes across the production Go tree + migration headers.

CI workflow changes
====================

TEST-H1 — Race detection on ./... -short
  .github/workflows/ci.yml:106 was a 9-package explicit list. Audit
  finding TEST-H1 flagged that 25+ packages (internal/auth/*,
  internal/repository/*, internal/mcp, internal/scep, internal/pkcs7,
  internal/api/router, internal/api/acme, internal/cli, internal/cms,
  internal/config, internal/deploy, internal/integration,
  internal/ratelimit, internal/secret, internal/trustanchor, all of
  cmd/) silently dropped off race coverage.
  Post-fix: 'go test -race -short ./... -count=1 -timeout 600s'.
  76 testing.Short() guards already cover testcontainers + live-DB
  integration suites, so -short keeps the long-running tests out.

TEST-H2 — Cross-platform build matrix
  New 'cross-platform-build' job in ci.yml. Matrix:
  ubuntu-latest + windows-latest + macos-latest, fail-fast: false.
  Builds cmd/server + cmd/agent + cmd/cli + cmd/mcp-server on each.
  Catches Windows-specific regressions (path separators, file
  permissions, exec.Command semantics) the pre-Phase-3 Ubuntu-only
  CI missed.

TEST-L1 — actions/setup-go cache: true (explicit)
  setup-go v5 defaults cache: true; making it explicit so a future
  setup-go upgrade can't silently flip it. Re-runs hit the Go module
  + build cache instead of recompiling cold.

TEST-M1 — Mutation-testing floor at 55%
  security-deep-scan.yml::go-mutesting step rewritten. Removed
  continue-on-error + per-package '|| true'. New post-loop check
  extracts every 'The mutation score is X.YZ' line and fails the
  step if any package drops below 0.55. Floor rationale: starter
  ratio catches major regressions without rejecting the audit's
  'this is OK' steady state; raise quarterly.
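A minimal bash sketch of the post-loop floor check follows. It rests on two assumptions. First, the per-package go-mutesting output has been collected and is piped in on stdin. Second, each score line contains the phrase quoted above ("The mutation score is") followed by a 0-1 fraction. `check_mutation_floor` is a hypothetical name, and the real step in security-deep-scan.yml may parse the log differently.

```shell
# check_mutation_floor [FLOOR]
# Reads go-mutesting output on stdin. Returns 1 if any
# "The mutation score is X" line reports X below FLOOR (default 0.55),
# or if no score line is found at all.
check_mutation_floor() {
  awk -v min="${1:-0.55}" '
    BEGIN { min += 0; seen = 0; bad = 0 }
    /The mutation score is/ {
      line = $0
      # Drop everything up to the score; "+ 0" keeps the leading number.
      sub(/.*The mutation score is /, "", line)
      score = line + 0
      seen++
      if (score < min) {
        printf "::error::mutation score %.4f below floor %.2f\n", score, min
        bad++
      }
    }
    END {
      if (seen == 0) { print "::error::no mutation scores found"; exit 1 }
      exit (bad > 0 ? 1 : 0)
    }'
}
```

Usage: `cat mutesting.log | check_mutation_floor 0.55`. This sketch also fails when no score lines are present, so an empty log cannot pass silently. That is a design choice made here; the commit does not say whether the real step does the same.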

TEST-M2 — 3 advisory deep-scan gates promoted to blocking
  Removed continue-on-error: true from:
    - gosec (filtered to G201/G202/G304/G108 high-signal rules:
      SQL-injection + path-traversal + pprof-exposed)
    - osv-scanner (multi-ecosystem CVE; complements govulncheck
      which is already blocking in ci.yml)
    - trivy image scan (--severity HIGH,CRITICAL --exit-code 1)
  continue-on-error count: 15 → 11.
  ZAP / schemathesis / nuclei / testssl stay advisory because their
  false-positive rates on https://localhost:8443-targeted DAST runs
  are high.

TEST-M3 — Playwright harness stub
  web/package.json adds '@playwright/test' devDep + 'e2e' / 'e2e:install'
  npm scripts. web/playwright.config.ts ships single chromium project
  with webServer block pointing at 'npm run dev'. web/src/__tests__/
  e2e/smoke.spec.ts proves the harness wires through. The full 15-flow
  suite ships in frontend-design-audit Phase 8 (TEST-H1 in THAT audit);
  this is the wiring + a single smoke test as the regression floor.
  New Makefile target: 'make e2e-test'.

Doc/code drift fixes
====================

TEST-M4 + ARCH-L2 — Skip inventory artifact + CI guard
  scripts/skip-inventory.sh walks every t.Skip site under cmd/ +
  internal/ + deploy/test/ and emits docs/testing/skip-inventory.md
  grouped by package with file:line:expression triples. Current
  inventory: 142 t.Skip sites, 76 testing.Short() guards.
  scripts/ci-guards/skip-inventory-drift.sh regenerates and fails on
  diff (excluding the 'Last reviewed' timestamp line which drifts
  daily). The Markdown is the canonical acquisition-diligence artifact
  for 'what tests are being skipped and why.'
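The drift check amounts to a diff that ignores the timestamp line. This bash sketch makes two assumptions: the timestamp line contains the literal text "Last reviewed", and the freshly regenerated inventory has been written to a second file. `skip_inventory_drift` is a hypothetical name, not the contents of scripts/ci-guards/skip-inventory-drift.sh.

```shell
# skip_inventory_drift COMMITTED REGENERATED
# Compares two Markdown inventories, ignoring any line containing
# "Last reviewed" (the timestamp that drifts daily). Returns 1 on drift.
skip_inventory_drift() {
  committed_tmp=$(mktemp) regen_tmp=$(mktemp)
  grep -v 'Last reviewed' "$1" > "$committed_tmp" || true
  grep -v 'Last reviewed' "$2" > "$regen_tmp" || true
  if diff -u "$committed_tmp" "$regen_tmp"; then
    rc=0
  else
    echo "::error::skip inventory drifted; re-run scripts/skip-inventory.sh"
    rc=1
  fi
  rm -f "$committed_tmp" "$regen_tmp"
  return "$rc"
}
```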

ARCH-H3 — MCP catalogue floor reconciliation
  Audit framing was '121 vs floor 150 — doc/code drift.' Live count
  via the test's actual regex over all 5 tool files (tools.go +
  tools_audit_fix.go + tools_auth.go + tools_auth_bundle2.go +
  tools_est.go): 155 unique 'Name: "certctl_*"' declarations.
  Pre-Phase-3 audit measured tools.go in isolation (121) and missed
  the other 4 files (+34 unique names). The test at
  internal/ciparity/surface_parity_test.go::TestSurfaceParity_MCP
  passes today (155 ≥ 150). Added a clarifying comment near
  mcpBaselineFloor explaining the measurement scope so future
  reviewers don't repeat the audit's framing error.
  STATUS: stale — no code drift, just a measurement scoping error in
  the audit.
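The scoping difference is easy to reproduce with a grep-based approximation. This bash sketch does not reuse the Go test's regex. It assumes each declaration looks like `Name: "certctl_..."`, with any run of spaces after the colon, and that tool names use only letters, digits and underscores. `count_mcp_tools` is a hypothetical helper.

```shell
# count_mcp_tools FILE...
# Counts unique `Name: "certctl_..."` declarations across all given files;
# a name declared in more than one file is counted once.
count_mcp_tools() {
  grep -ho 'Name: *"certctl_[A-Za-z0-9_]*"' "$@" \
    | grep -o 'certctl_[A-Za-z0-9_]*' \
    | sort -u | wc -l | tr -d ' '
}
```

Run from the directory holding the tool files, `count_mcp_tools tools.go` reproduces the isolated measurement. Passing all five files (`tools.go tools_audit_fix.go tools_auth.go tools_auth_bundle2.go tools_est.go`) gives the combined, de-duplicated count.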

ARCH-L1 — panic() rationale comments
  5 panic sites in production Go (excluding _test.go):
    - internal/repository/postgres/tx.go:84
    - internal/service/issuer.go:861 (mustJSON)
    - internal/service/est.go:728 (mustParseTime)
    - internal/service/acme.go:1288 (rand source failure — already documented)
    - internal/pkcs7/certrep.go:270 (OID marshal — already documented)
  Added ARCH-L1 rationale comments to the 3 sites that didn't have
  them. All 5 are defensible impossible-path / rethrow / hardcoded-
  constant guards.

ARCH-L3 — Migration IF-NOT-EXISTS carve-outs
  4 migrations skip the literal 'IF NOT EXISTS' token but ARE
  idempotent via different Postgres patterns:
    - 000014_policy_violation_severity_check.up.sql: ALTER TABLE
      ADD CONSTRAINT CHECK doesn't accept IF NOT EXISTS; idempotency
      via DROP CONSTRAINT IF EXISTS preamble.
    - 000018_audit_events_worm.up.sql: CREATE OR REPLACE FUNCTION
      + DROP TRIGGER IF EXISTS + CREATE TRIGGER + DO $$ pg_roles
      existence check. CREATE TRIGGER doesn't take IF NOT EXISTS.
    - 000030_rbac_admin_perms.up.sql: INSERT ... ON CONFLICT DO NOTHING.
    - 000039_audit_crit1_perms.up.sql: same INSERT + ON CONFLICT pattern.
  Added ARCH-L3 header comments to each explaining the carve-out so
  reviewers don't flag the missing literal token.
  STATUS: largely stale — migrations are already idempotent.

ARCH-L4 — TODO/FIXME → see #<descriptor>
  5 TODOs rewritten to the allowed 'see #<descriptor>' pattern:
    - internal/repository/postgres/auth.go:220 → see #bundle-2-scope-fk
    - internal/connector/discovery/gcpsm/gcpsm.go:547 → see #gcpsm-pagination
    - internal/service/audit.go:244 → see #audit-pagination-count
    - internal/service/job.go:295, 299 → see #validation-job-impl
  New CI guard scripts/ci-guards/no-todo-in-prod.sh grep-fails any
  new TODO/FIXME in cmd/ + internal/ (excluding _test.go); allows
  'see #N' / 'see #<descriptor>' patterns.
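A rough bash approximation of the no-todo-in-prod rule follows. The exact allowlist syntax is an assumption here: any TODO/FIXME line that also contains `see #` is treated as allowed, and `_test.go` files are filtered out by filename. `find_prod_todos` is a hypothetical name, and the shipped guard may match differently.

```shell
# find_prod_todos DIR...
# Prints TODO/FIXME lines from non-test .go files under the given
# directories, skipping lines that carry a "see #..." reference.
# Returns 1 if any remain.
find_prod_todos() {
  hits=$(grep -rnwE --include='*.go' 'TODO|FIXME' "$@" 2>/dev/null \
    | grep -v '_test\.go:' \
    | grep -v 'see #' || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits" | sed 's/^/::error::/'
    return 1
  fi
  echo "no TODO/FIXME outside test files."
}
```

Usage from the repo root: `find_prod_todos cmd internal`.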

Sandbox limitation
==================
The 6.1 GB certctl working tree fills the sandbox volume; go1.25.10
toolchain download fails with 'no space left on device' (sandbox has
1.25.9; go.mod requires 1.25.10). Local 'go test' / 'go build' NOT
run in this commit. Operator must run 'make verify' on their
workstation before push per CLAUDE.md operating rules.

smoke.spec.ts was NOT executed in the sandbox (no chromium installed).
Operator runs 'cd web && npm install && npx playwright install
--with-deps chromium && npm run e2e' on first wire-up.

All CI guards (no-todo-in-prod, skip-inventory-drift, G-3
env-docs-drift, doc-rot-detector, and every existing guard) verified
clean by running each individually.

Closes: cowork/certctl-architecture-diligence-audit.html#fix-TEST-H1,
        cowork/certctl-architecture-diligence-audit.html#fix-TEST-H2,
        cowork/certctl-architecture-diligence-audit.html#fix-TEST-M1,
        cowork/certctl-architecture-diligence-audit.html#fix-TEST-M2,
        cowork/certctl-architecture-diligence-audit.html#fix-TEST-M3,
        cowork/certctl-architecture-diligence-audit.html#fix-TEST-M4,
        cowork/certctl-architecture-diligence-audit.html#fix-TEST-L1,
        cowork/certctl-architecture-diligence-audit.html#fix-ARCH-H3,
        cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L1,
        cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L2,
        cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L3,
        cowork/certctl-architecture-diligence-audit.html#fix-ARCH-L4

2026-05-13 20:10:08 +00:00

B2-compose-base-no-demo-env.sh

fix(security): close BUNDLE 2 — safe first run, demo mode, agent bootstrap

2026-05-13 00:14:59 +00:00

B3-helm-chart-coherence.sh

fix(helm): close BUNDLE 3 — Helm chart hardening + enterprise deploy

2026-05-13 00:40:42 +00:00

B6-no-private-keys-in-tree.sh

docs(b6): secret-custody reference + config-encryption upgrade runbook + private-key CI guard

2026-05-13 01:48:40 +00:00

B-1-orphan-crud.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-1-compat-regression.sh

auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +

2026-05-10 06:22:25 +00:00

bundle-1-to-2-upgrade-regression.sh

auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +

2026-05-10 06:22:25 +00:00

bundle-8-L-015-target-blank-rel-noopener.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-8-L-019-dangerously-set-inner-html.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-8-M-009-bare-usemutation.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

complete-path-config-coverage-exceptions.yaml

feat(ci): item-1 complete-path config-coverage guard (PARTIAL — sandbox could not verify Go test)

2026-05-12 14:02:04 +00:00

complete-path-config-coverage.sh

feat(ci): item-1 complete-path config-coverage guard (PARTIAL — sandbox could not verify Go test)

2026-05-12 14:02:04 +00:00

cors-wildcard-allowlist.sh

fix(api/cors): narrow Bundle-2 routes from wildcard to NewCORS(corsCfg)

2026-05-10 20:12:19 +00:00

D-1-D-2-statusbadge-phantom.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

digest-validity.sh

ci(digest-validity): exclude Windows IIS digest — image is doc-only, not pulled by Linux CI

2026-05-01 03:06:49 +00:00

doc-rot-detector-exceptions.yaml

feat(ci): item-5 doc rot detector (90d warn / 120d fail)

2026-05-12 14:10:27 +00:00

doc-rot-detector.sh

feat(ci): item-5 doc rot detector (90d warn / 120d fail)

2026-05-12 14:10:27 +00:00

G-1-jwt-auth-literal.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

G-2-api-key-hash-json.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

G-3-env-docs-drift.sh

docs: remove internal engineering docs; docs must be tool- or story-relevant

2026-05-13 02:44:27 +00:00

H-1-encryption-key-min-length.sh

fix(deploy/test) + ci(guard): unblock deploy-vendor-e2e — encryption-key length

2026-05-01 00:57:43 +00:00

H-001-bare-from.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

H-009-readme-jwt.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

L-1-bulk-action-loop.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

L-001-insecure-skip-verify.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

M-012-no-root-user.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

multi-tenant-query-coverage.sh

chore(ci-guards): close 4 CI-guard regressions surfaced by v2.1.0 release-gate Phase 5

2026-05-11 14:19:35 +00:00

N-bundle-2-security-empty-preserved.sh

auth-bundle-2 Phase 5: OIDC + session HTTP surface (13 endpoints),

2026-05-10 06:08:27 +00:00

no-change-me-in-prod-compose.sh

config: default hardening + operator docs (Phase 2 closure — SEC-H1, SEC-H3, SEC-M4, DEPL-H1, DEPL-M2 + doc-only carve-outs)

2026-05-13 19:50:00 +00:00

no-new-synthetic-admin.sh

harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8)

2026-05-11 11:45:54 +00:00

no-precompiled-binary.sh

ci: supply-chain hardening (Phase 1 closure — RED-1, RED-2, TEST-L2)

2026-05-13 19:30:53 +00:00

no-tag-pinned-actions.sh

ci: supply-chain hardening (Phase 1 closure — RED-1, RED-2, TEST-L2)

2026-05-13 19:30:53 +00:00

no-todo-in-prod.sh

ci: floor raise + doc drift (Phase 3 closure — TEST-H1/H2/M1/M2/M3/M4/L1, ARCH-H3/L1/L2/L3/L4)

2026-05-13 20:10:08 +00:00

openapi-handler-parity.sh

ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job

2026-04-30 20:50:52 +00:00

P-1-documented-orphan-fns.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

README.md

docs: remove internal engineering docs; docs must be tool- or story-relevant

2026-05-13 02:44:27 +00:00

S-1-hardcoded-source-counts.sh

docs: remove internal engineering docs; docs must be tool- or story-relevant

2026-05-13 02:44:27 +00:00

S-2-strings-contains-err.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

skip-inventory-drift.sh

ci: floor raise + doc drift (Phase 3 closure — TEST-H1/H2/M1/M2/M3/M4/L1, ARCH-H3/L1/L2/L3/L4)

2026-05-13 20:10:08 +00:00

surface-parity-mcp-exemptions.yaml

feat(ci): item-2 cross-surface contract parity (stdlib-only package)

2026-05-12 14:09:32 +00:00

T-1-frontend-page-coverage.sh

web, docs: IssuerHierarchyPage + sysadmin runbook + connectors row (Rank 8 commit 5)

2026-05-04 02:33:48 +00:00

test-compose-scep-coherence.sh

fix(deploy/test) + ci(guard): drop dead SCEP profile from test compose

2026-05-01 01:39:18 +00:00

test-naming-convention.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

U-2-plaintext-healthcheck.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

U-3-migration-mount.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

README.md

`scripts/ci-guards/` — Regression-guard scripts

Each `<id>.sh` script in this directory keeps one closed audit finding from regressing. CI runs the full set on every push via the Regression guards step in `.github/workflows/ci.yml`. Operators can run any script locally:

```shell
bash scripts/ci-guards/G-3-env-docs-drift.sh
```

Contract

Every script in this directory MUST:

- Exit 0 on a clean repo (no regression present).
- Exit non-zero on regression, with a `::error::` annotation prefix so PR reviewers see the failing line in the GitHub Actions UI.
- Be runnable from repo root via `bash scripts/ci-guards/<id>.sh` with NO arguments and NO env-var requirements. The CI loop step (`for g in scripts/ci-guards/*.sh; do bash "$g"; done`) iterates every `.sh` here without args; any script that requires an arg or env var WILL fail in that loop.
- Carry a head-comment block matching the in-source justification from the original `ci.yml` entry: the audit-finding reference, the closure rationale, and the exempt-surface list (if any).
- Use `set -e` early to fail fast on internal command errors.
- Produce no output on the happy path beyond a final `echo "<id>: clean."` confirmation line.
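The contract above can be illustrated with a skeleton guard. Everything in it is hypothetical: the `example-insecure-skip-verify` ID, the `guard_main` function, and the grep pattern. They are modeled loosely on the L-001 row of the guard table, not copied from `L-001-insecure-skip-verify.sh`. The skeleton shows the required pieces:

- `set -e` at the top
- a head-comment block
- `::error::` annotations on regression
- a single `<id>: clean.` line on the happy path

It takes no arguments and scans `cmd/` and `internal/` relative to the current directory, skipping either if absent.

```shell
#!/usr/bin/env bash
# example-insecure-skip-verify: HYPOTHETICAL guard illustrating the contract.
#
# Finding:   modeled on L-001 (unjustified InsecureSkipVerify).
# Rationale: TLS verification may be disabled only on a line that carries
#            an explicit //nolint:gosec justification.
# Exempt:    *_test.go files.
set -e

guard_main() {
  id="example-insecure-skip-verify"
  hits=""
  for dir in cmd internal; do
    if [ -d "$dir" ]; then
      found=$(grep -rn --include='*.go' 'InsecureSkipVerify: *true' "$dir" \
        | grep -v '_test\.go:' | grep -v 'nolint:gosec' || true)
      if [ -n "$found" ]; then
        hits="$hits$found
"
      fi
    fi
  done
  if [ -n "$hits" ]; then
    # The ::error:: prefix turns each hit into a GitHub Actions annotation.
    printf '%s' "$hits" | sed "s/^/::error::$id: /"
    return 1
  fi
  echo "$id: clean."
}

guard_main
```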

Helpers vs guards

Scripts that consume input artifacts (a test-output log, a coverage.out file) or env vars (PR_NUMBER, GH_TOKEN) are HELPERS, not guards. They live in scripts/, NOT scripts/ci-guards/.

Current helpers:

- `scripts/vendor-e2e-skip-check.sh`: consumes a `test-output.log` arg from the deploy-vendor-e2e job
- `scripts/coverage-pr-comment.sh`: consumes `coverage.out` + `PR_NUMBER` + `GH_TOKEN` env from the go-build-and-test job
- `scripts/check-coverage-thresholds.sh`: consumes `coverage.out` + `.github/coverage-thresholds.yml`

Adding a new guard

1. Drop a new `<id>.sh` in this directory with the head-comment block describing the audit finding it closes.
2. Make it executable: `chmod +x scripts/ci-guards/<id>.sh`.
3. Verify it fails on a deliberate regression and passes on a clean repo.
4. CI picks up new scripts automatically via the `for g in scripts/ci-guards/*.sh` loop in the Regression guards step; no `ci.yml` change is required.
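The executable-bit and clean-run parts of this checklist can be automated with a small pre-push check. `check_guard_ready` is a hypothetical helper, not part of the repo. It does three things:

- confirms the executable bit
- runs the guard with `bash` from the current directory (which should be the repo root)
- requires exit 0 plus a final `<id>: clean.` line, per the contract

It cannot prove the guard catches a regression. That half of the verification step still means introducing the regression by hand and confirming a non-zero exit.

```shell
# check_guard_ready SCRIPT
# Pre-push sanity check for a new guard: it must be executable, exit 0
# on the current (clean) tree, and end with the "<id>: clean." line.
check_guard_ready() {
  script="$1"
  id=$(basename "$script" .sh)
  if [ ! -x "$script" ]; then
    echo "$id: not executable (run chmod +x $script)"
    return 1
  fi
  if ! out=$(bash "$script" 2>&1); then
    echo "$id: failed on the current tree"
    return 1
  fi
  last=$(printf '%s\n' "$out" | tail -n 1)
  if [ "$last" != "$id: clean." ]; then
    echo "$id: last line was '$last', expected '$id: clean.'"
    return 1
  fi
  echo "$id: ready."
}
```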

Guards in this directory

Count: re-derive on demand via `ls scripts/ci-guards/*.sh | wc -l`. The tables below name each guard; keep them in sync as guards are added.

Per-finding regression guards

| ID | Finding | Catches |
|----|---------|---------|
| `G-1-jwt-auth-literal` | G-1 JWT silent auth downgrade | `"jwt"` literal in additive auth-type surfaces |
| `L-001-insecure-skip-verify` | L-001 unjustified InsecureSkipVerify | `InsecureSkipVerify: true` without `//nolint:gosec` |
| `H-001-bare-from` | H-001 (CWE-829) tag-swap attack | Bare `FROM` line without `@sha256` digest pin |
| `M-012-no-root-user` | M-012 (CWE-250) container-as-root | Dockerfile missing terminal `USER <non-root>` |
| `H-009-readme-jwt` | H-009 README JWT advertising | README.md re-introducing JWT-as-supported claim |
| `G-2-api-key-hash-json` | G-2 cat-s5-apikey_leak | `api_key_hash` in JSON-emitting surface |
| `U-2-plaintext-healthcheck` | U-2 healthcheck protocol mismatch | Plaintext `http://` in HEALTHCHECK directive |
| `U-3-migration-mount` | U-3 seed initdb schema drift | Migration file mounted into postgres initdb |
| `D-1-D-2-statusbadge-phantom` | D-1 + D-2 dead keys + TS phantoms | StatusBadge dead keys + 5 Certificate / 5 Agent / 1 Issuer / 1 Notification phantom fields |
| `L-1-bulk-action-loop` | L-1 client-side bulk loops | `for ... await triggerRenewal/updateCertificate` in CertificatesPage |
| `B-1-orphan-crud` | B-1 orphan-CRUD client fns | 8 update/create/delete fns lose their page consumer |
| `S-2-strings-contains-err` | S-2 brittle error-dispatch | `strings.Contains(err.Error(), "not found"\|"violates foreign key")` in handlers |
| `G-3-env-docs-drift` | G-3 env-var docs drift | `CERTCTL_*` env var defined OR documented but not both |
| `test-naming-convention` | I-001-extended | `func TestXxx` (lowercase first letter), which Go silently skips |
| `S-1-hardcoded-source-counts` | S-1 stale numeric prose | Hardcoded "N issuer connectors" / "N MCP tools" in README + docs |
| `P-1-documented-orphan-fns` | P-1 documented orphans | 16 read-fn names removed from client.ts exports |
| `T-1-frontend-page-coverage` | T-1 untested frontend pages | New page in `web/src/pages/` without sibling `.test.tsx` and not on the deferred allowlist |
| `bundle-8-L-015-target-blank-rel-noopener` | L-015 (CWE-1022) reverse-tabnabbing | `target="_blank"` without `rel="noopener noreferrer"` |
| `bundle-8-L-019-dangerously-set-inner-html` | L-019 (CWE-79) XSS | `dangerouslySetInnerHTML` outside `safeHtml.ts` |
| `bundle-8-M-009-bare-usemutation` | M-009 + M-029 mutation contract | Bare `useMutation()` outside `useTrackedMutation` wrapper |
| `H-1-encryption-key-min-length` | H-1 closure follow-up (post-Phase-5 surfacing) | `CERTCTL_CONFIG_ENCRYPTION_KEY` literal in any `deploy/docker-compose*.yml` shorter than the 32-byte floor enforced by `internal/config/config.go::Validate()` |
| `test-compose-scep-coherence` | post-Phase-5 surfacing of dead SCEP test config | `CERTCTL_SCEP_ENABLED=true` in test compose without (a) a CI job that runs the SCEP integration test, (b) the `ra.crt` + `ra.key` + `intune_trust_anchor.pem` fixtures committed to `deploy/test/fixtures/`, AND (c) the matching volume mount |
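The G-3 rule is essentially a two-way set difference. This bash sketch assumes "defined" means the name appears in one code file and "documented" means it appears in at least one doc file. Which files the real `G-3-env-docs-drift.sh` scans is not stated here, and `env_docs_drift` is a hypothetical name.

```shell
# env_docs_drift CODE_FILE DOC_FILE...
# Reports CERTCTL_* names found in CODE_FILE but in none of the DOC_FILEs,
# and vice versa. Returns 1 if the two name sets differ.
env_docs_drift() {
  code="$1"; shift
  tmp=$(mktemp -d)
  # At least one character after the prefix, so prose like "CERTCTL_*" is ignored.
  grep -ho 'CERTCTL_[A-Z0-9_][A-Z0-9_]*' "$code" | sort -u > "$tmp/defined"
  grep -ho 'CERTCTL_[A-Z0-9_][A-Z0-9_]*' "$@" | sort -u > "$tmp/documented"
  undocumented=$(comm -23 "$tmp/defined" "$tmp/documented")
  undefined=$(comm -13 "$tmp/defined" "$tmp/documented")
  rm -rf "$tmp"
  for name in $undocumented; do echo "::error::$name defined but not documented"; done
  for name in $undefined; do echo "::error::$name documented but not defined"; done
  [ -z "$undocumented$undefined" ]
}
```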

Forward-looking guards (Auditable Codebase Bundle, post-v2.1.0 anti-rot)

These guards catch defect classes BEFORE they get audit findings — they pin invariants on the codebase that the v2.0 audit history showed are easy to lose.

| ID | Item | Catches |
|----|------|---------|
| `complete-path-config-coverage` | post-v2.1.0 / item-1 | "Lying field": `CERTCTL_*` env var defined in `internal/config/config.go` that no consumer outside `internal/config/` actually reads. Operator-facing config that the docs claim works but the code never honors. Companion Go test at `internal/config/coverage_test.go`. |
| `doc-rot-detector` | post-v2.1.0 / item-5 | Docs older than 90 days warn (yellow), older than 120 days fail (red). Uses HEAD commit timestamp for reproducibility. `docs/archive/` allowlisted in bulk. |
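The doc-rot thresholds reduce to a comparison of two commit timestamps. In this bash sketch (hypothetical name `doc_age_status`), the reference point is HEAD's commit time rather than the wall clock, which is what keeps re-runs on the same commit reproducible. How the real script treats an age of exactly 90 or 120 days is an assumption here: only ages strictly greater than a threshold, in whole days, trip it.

```shell
# doc_age_status DOC_EPOCH HEAD_EPOCH
# Classifies a doc by the age of its last commit relative to HEAD's
# commit time. Prints "ok", "warn" (> 90 days) or "fail" (> 120 days).
doc_age_status() {
  age_days=$(( ($2 - $1) / 86400 ))
  if [ "$age_days" -gt 120 ]; then
    echo fail
  elif [ "$age_days" -gt 90 ]; then
    echo warn
  else
    echo ok
  fi
}
```

In a checkout, the two epochs could come from `git log -1 --format=%ct -- <doc>` and `git log -1 --format=%ct HEAD`.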

The cold-DB compose smoke (post-v2.1.0 / item-6) is NOT a script in this directory — it is inlined directly into .github/workflows/ci.yml::cold-db-compose-smoke because there is no value in a developer running it locally (the whole point of the gate is that CI owns the cold-DB state). To inspect or modify the smoke logic, read that workflow job; there is intentionally no scripts/ci-guards/cold-db-compose-smoke.sh.

The fourth Bundle artifact (`internal/ciparity/`) consists of Go tests, not shell guards, and runs under the standard Go test step. It pins the MCP tool catalogue floor and naming convention, and reports CLI/MCP/OpenAPI surface counts as a trend metric.
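The instruction to keep the guard tables in sync with the directory can itself be checked mechanically. This is a hypothetical bash helper, assuming every documented guard's ID appears in README.md wrapped in backticks, as in the tables.

```shell
# guards_missing_from_readme [DIR]
# Prints every <id>.sh in DIR whose `<id>` (in backticks) does not appear
# in DIR/README.md. Returns 1 if any guard is undocumented.
guards_missing_from_readme() {
  dir="${1:-scripts/ci-guards}"
  missing=0
  for g in "$dir"/*.sh; do
    [ -e "$g" ] || continue   # glob matched nothing
    id=$(basename "$g" .sh)
    if ! grep -qF "\`$id\`" "$dir/README.md"; then
      echo "$id"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

On the README as shown here, guards such as `no-todo-in-prod` and `skip-inventory-drift` have no table row yet, so this check would report them.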

Running the full set locally

```shell
for g in scripts/ci-guards/*.sh; do
  echo "=== $(basename "$g") ==="
  bash "$g" || echo "  FAILED"
done
```
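The loop above prints FAILED but still exits 0, because `|| echo` absorbs each guard's failure. Below is a variant that keeps running after a failure and returns non-zero at the end if any guard failed. `run_all_guards` is a hypothetical name, not how the CI step is written.

```shell
# run_all_guards [DIR]
# Runs every guard in DIR (default scripts/ci-guards), continues past
# failures, and returns 1 at the end if any guard failed.
run_all_guards() {
  dir="${1:-scripts/ci-guards}"
  failed=""
  for g in "$dir"/*.sh; do
    [ -e "$g" ] || continue   # glob matched nothing
    echo "=== $(basename "$g") ==="
    if ! bash "$g"; then
      failed="$failed $(basename "$g" .sh)"
    fi
  done
  if [ -n "$failed" ]; then
    echo "FAILED:$failed"
    return 1
  fi
  echo "all guards clean."
}
```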
