Pure mode-change commit. The previous a3599ad commit dropped the
executable bit (100755 → 100644) on five files in scripts/ci-guards/
plus scripts/qa-doc-seed-count.sh and scripts/dev-setup.sh — a
sandbox-tooling artefact, not intentional. The CI pipeline calls
each guard via 'bash "$g"' so the missing exec bit didn't break
anything operationally, but operators who run a guard directly via
'./scripts/ci-guards/<id>.sh' would hit a permission-denied error.
Restore to 100755 to match the rest of scripts/ci-guards/*.sh.
No content changes.
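For spotting this class of regression before a push, a minimal sketch of a check that lists guard scripts lacking the executable bit. The helper name `list_non_executable` is hypothetical and not part of the repo; it only reports and changes nothing.

```bash
# list_non_executable DIR
# Prints every *.sh file directly under DIR that lacks the executable bit.
# Hypothetical helper for illustration; not part of the repo.
list_non_executable() {
  local dir="$1" f
  for f in "$dir"/*.sh; do
    [ -e "$f" ] || continue              # glob matched nothing
    [ -x "$f" ] || printf '%s\n' "$f"    # report files we cannot execute
  done
}
```

Running `list_non_executable scripts/ci-guards` on a tree with the bits restored should print nothing.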
CI run on the d8aa3fc push surfaced two real failures rooted in the
2026-05-04 docs overhaul:
1. G-3 env-docs-drift caught two phantom CERTCTL_* env vars I'd
introduced in the Phase 4 follow-on connector pages
(CERTCTL_CA_CERT_PATH_NEW in adcs.md was a placeholder I made
up; CERTCTL_EJBCA_POLL_MAX_WAIT_SECONDS in ejbca.md does not
exist in source). Both removed.
2. QA-doc Part-count drift guard tried to grep
docs/qa-test-guide.md and docs/testing-guide.md, both of which
were renamed/deleted in Phase 2/Phase 5. The Part-count drift
class died with testing-guide.md (Phase 5 prune dispersed its
content); the seed-count drift class is still live but pointed
at the wrong path.
Fixes:
- Removed the QA-doc Part-count drift guard from ci.yml (premise
dead) plus its standalone scripts/qa-doc-part-count.sh peer.
- Retargeted the QA-doc seed-count drift guard from
docs/qa-test-guide.md → docs/contributor/qa-test-suite.md (the
Phase 2 target). Updated both ci.yml inline copy and
scripts/qa-doc-seed-count.sh.
- Updated Makefile qa-stats: target to drop the testing-guide.md
Parts metric (file is gone).
- Updated Makefile verify-docs: target to drop the part-count step.
G-3 was also failing in the second direction (env vars defined in
config.go but never documented anywhere). 16 vars surfaced —
features.md (deleted Phase 6) and testing-guide.md (deleted Phase 5)
had been their canonical home. Created
docs/reference/configuration.md as the new home: a compact
operator-facing env-var reference covering scheduler intervals, job
lifecycle, rate limiting, audit, deploy verify, database,
agent-side, and SCEP profile binding. Added to docs/README.md
Reference table.
Updated qa-test-suite.md to reframe its references to
the deleted testing-guide.md (it's now self-contained: the
Part-by-Part Coverage Map IS the canonical Part inventory).
Cosmetic comment-only updates in ci.yml + scripts/ci-guards/*.sh +
scripts/dev-setup.sh to point at the new audience-organized doc
paths (docs/operator/security.md, docs/operator/tls.md,
docs/reference/architecture.md, etc.) instead of the pre-Phase-2
flat layout.
Verified: all 24 ci-guards/*.sh pass locally; qa-doc-seed-count.sh
clean. Net diff: 178 additions / 112 deletions across 13 files.
One file deleted (qa-doc-part-count.sh) and one file added
(docs/reference/configuration.md).
Final commit of the 5-commit Rank 8 chain. Operator-facing surface
on top of the service + handler layers shipped in commits 1-4.
Frontend (web/src):
- api/client.ts: 3 new functions + IntermediateCA interface
(listIntermediateCAs, getIntermediateCA, retireIntermediateCA).
- pages/IssuerHierarchyPage.tsx: recursive nested <ul> render of
the hierarchy tree at /issuers/:id/hierarchy. buildHierarchyTree
is a pure helper that walks the flat list and groups children
on parent_ca_id; the dendrogram view is parking-lot work tracked
in WORKSPACE-ROADMAP. Two-phase retire UX surfaces 'Retire…'
then 'Confirm retire (terminal)' when the row is in retiring
state. Admin gate is enforced at the API; the page renders the
backend's 403 as ErrorState for non-admin callers.
- main.tsx: register the new /issuers/:id/hierarchy route.
CI guard update:
- scripts/ci-guards/T-1-frontend-page-coverage.sh: add
IssuerHierarchyPage to the deferred-test allowlist with the
standard 'why deferred' comment. Admin-gate + recursive build
semantics are already pinned at the backend layer
(intermediate_ca_test.go service tests + intermediate_ca_test.go
handler triplet). Vitest test deferred until next feature
change touches the page.
Docs:
- docs/intermediate-ca-hierarchy.md: new operator runbook
covering:
Concepts (HierarchyMode 'single' vs 'tree', defense-in-depth
on key bytes never persisting on rows).
Lifecycle states + drain-first semantics
(active → retiring → retired with active-children gate).
Three deployment patterns: 4-level FedRAMP boundary CA,
3-level financial-services policy CA, 2-level internal
PKI.
RFC 5280 enforcement (§3.2 self-signed, §4.2.1.9 path-length
tightening, §4.2.1.10 NameConstraints subset).
Migration from single → tree using the load-bearing
TestLocal_HierarchyMode_SingleVsTree_ByteIdentical pin as
the canary.
API reference + observability (IntermediateCAMetrics
Prometheus exposure).
Known limitations + Rank-8 follow-on roadmap.
- docs/connectors.md: extend the Built-in Local CA section with
a 'Tree mode (Rank 8)' paragraph describing the new chain
assembly path + cross-link to docs/intermediate-ca-hierarchy.md.
Roadmap:
- WORKSPACE-ROADMAP.md: 5 follow-on items under a new
'Intermediate CA hierarchy extensions (Rank 8 V2 follow-ons)'
bullet block:
HSM-backed roots (PKCS#11 / cloud KMS drivers via existing
signer.Driver interface — no service-layer change needed).
Automated CA rotation (parallel-validity windows ahead of
expiry).
Intra-hierarchy CRL chaining (per-CA CRL endpoints stitched
at issue time).
NameConstraints policy templates (FedRAMP / financial /
internal PKI declarative templates instead of hand-rolled
JSON).
D3 dendrogram visualization (separate page so the existing
list view stays the default + the dep stays opt-in).
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
tsc --noEmit (web/): exit 0 (no TypeScript errors).
go test -short -count=1 ./internal/api/handler/... + service +
local: ok across all three packages, 4-5s each.
All 24 CI guards: clean
(T-1 frontend-page-coverage with the new
IssuerHierarchyPage allowlist entry; openapi-handler-parity,
M-008 admin-gate, every other guard untouched).
Rank 8 chain complete:
468b75c domain, migrations: IntermediateCA type + intermediate_cas
+ Issuer.HierarchyMode (commit 1)
0562359 service: IntermediateCAService + IntermediateCAMetrics
+ RFC 5280 enforcement (commit 2)
5bf2f0c service: 10 IntermediateCAService tests + in-memory fake
repo (commit 2.5)
8ff5668 local: tree-mode chain assembly + byte-equivalence pin
(commit 3 — load-bearing backwards-compat refuse-to-ship
pin in TestLocal_HierarchyMode_SingleVsTree_ByteIdentical)
4d17ef9 api, handler: 4 admin-gated CA hierarchy endpoints +
OpenAPI (commit 4)
HEAD web, docs: IssuerHierarchyPage + sysadmin runbook +
connectors row (this commit)
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 5.
Phase 1b push (commit 27bd660) failed three CI guards. None were
caught by `make verify` locally because they're CI-only guards
that aren't part of the Makefile target. This commit fixes all
three.
1. go.mod tidy diff. The go-jose v4 dep was added with `// indirect`
in go.mod after the initial `go get`, but the codebase imports it
directly from internal/api/acme/jws.go + service/acme.go +
handler/acme.go. CI's `go mod tidy && git diff --exit-code go.mod
go.sum` flagged the staleness. Promoted to a direct require in
the same `require (...)` block as github.com/aws/aws-sdk-go-v2
etc.
2. G-3-env-docs-drift.sh. The guard greps `\bCERTCTL_[A-Z_]+\b` in
docs/ and complains when the bare-prefix forms don't match
anything defined in config.go. Phase 1a + 1b's docs/acme-server.md
intro and migration header use bare-prefix forms `CERTCTL_ACME_*`
and `CERTCTL_ACME_SERVER_*` to describe namespace separation
(consumer-side ACMEConfig vs server-side ACMEServerConfig). Same
precedent as the existing CERTCTL_SCEP_ + CERTCTL_TLS_ +
CERTCTL_QA_* prefix entries already in the guard's ALLOWED list.
Added CERTCTL_ACME_ + CERTCTL_ACME_SERVER_ to the ALLOWED list
with a justification comment block matching the existing
integration-surface allowlist convention.
3. openapi-handler-parity.sh. Distinct from
internal/api/router/openapi_parity_test.go (which runs at `go
test` time and has its own SpecParityExceptions map I extended
in 1a + 1b) — this is a separate CI-only guard that reads
api/openapi-handler-exceptions.yaml. The 6 Phase-1a routes + 4
Phase-1b routes (10 ACME endpoints total) were never added to
that yaml. Same rationale as the SCEP/SCEP-mTLS entries already
in the file: ACME is a JWS-signed-JSON wire protocol per
RFC 8555 + RFC 9773, not an OpenAPI-shape REST surface.
Documenting every endpoint in openapi.yaml would duplicate the
RFC. The canonical reference is docs/acme-server.md. Phases 2-4
will add their routes to this yaml in lockstep with router.go.
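Item 2 above (bare prefixes vs. defined variables) can be sketched roughly as follows. This is an illustration under assumptions, not the guard's actual code: the function name `doc_drift` and the idea of a pre-extracted "defined variables" file are hypothetical, and only the docs-to-config direction is covered.

```bash
# Bare prefixes the docs may mention without a matching config.go definition.
# The first three are the pre-existing entries; the last two are the additions.
ALLOWED_PREFIXES="CERTCTL_SCEP_ CERTCTL_TLS_ CERTCTL_QA_ CERTCTL_ACME_ CERTCTL_ACME_SERVER_"

# doc_drift DOC_FILE DEFINED_FILE
# Prints each CERTCTL_* token in DOC_FILE that is neither listed in
# DEFINED_FILE (one variable name per line) nor an allowed bare prefix.
# Returns 0 when nothing drifted, 1 otherwise.
doc_drift() {
  local doc="$1" defined="$2" tok p allowed drift=0
  for tok in $(grep -owE 'CERTCTL_[A-Z_]+' "$doc" | sort -u); do
    grep -qxF "$tok" "$defined" && continue   # defined in config: fine
    allowed=0
    for p in $ALLOWED_PREFIXES; do
      [ "$tok" = "$p" ] && allowed=1          # exact bare-prefix match
    done
    [ "$allowed" -eq 1 ] && continue
    printf '%s\n' "$tok"
    drift=1
  done
  return "$drift"
}
```

A doc that writes `CERTCTL_ACME_*` yields the token `CERTCTL_ACME_`, which is why the allowlist holds the prefix with its trailing underscore.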
Verified locally:
- bash scripts/ci-guards/G-3-env-docs-drift.sh → clean.
- bash scripts/ci-guards/openapi-handler-parity.sh → clean
(152 router routes, 136 OpenAPI ops, 18 documented exceptions).
- All other ci-guards/*.sh → clean.
- go.mod diff after `go mod tidy` is empty.
CI run #376 (commit 58ddf19, Frontend Build job) failed with:
digest does not resolve: mcr.microsoft.com/windows/servercore/iis:
windowsservercore-ltsc2022@sha256:8d0b0e651ad514e3fb05978db66f38036
118812e1b9314a48f10419cad8a3462
A re-run with no code changes went green. The digest itself is fine —
verified against MCR directly (HTTP 200 from
mcr.microsoft.com/v2/windows/servercore/iis/manifests/sha256:8d0b...),
and the tag `:windowsservercore-ltsc2022` currently resolves to that
exact digest. Microsoft hasn't rotated.
Root cause is registry-side rate-limiting. MCR throttles unauthenticated
GET-by-digest requests by source IP. GitHub-hosted runners share a small
pool of egress IPs across many users; bursts trip the throttle and
return non-200. Re-run = different runner = different IP = throttle
window has reset = pass. This will recur on some unmeasured fraction of pushes
indefinitely, until either (a) Microsoft loosens MCR rate limits, (b)
GitHub buys more runner IPs, or (c) we stop verifying digests CI doesn't
actually use.
The deeper issue is structural, not transient. The Windows IIS image is
gated behind compose `profiles: [deploy-e2e-windows]`
(deploy/docker-compose.test.yml:700). The comment block above the
service definition (lines 675-691) explicitly says "Linux CI never
activates this profile." All 10 TestVendorEdge_IIS_*_E2E tests are on
scripts/vendor-e2e-skip-allowlist.txt because the sidecar is never
started. The whole Windows matrix was DELETED in ci-pipeline-cleanup
Phase 6 / frozen decision 0.5 (revising Bundle II decision 0.4); IIS
validation moved to docs/connector-iis.md::Operator validation playbook.
So `digest-validity.sh` is verifying a digest that no CI job ever pulls
— paying CI brittleness against MCR rate-limiting we can't control, for
an image whose only purpose in compose is documentation for an
operator's manual workflow on a real Windows host.
The fix matches the guard's stated purpose ("every digest CI actually
depends on is valid"): exclude images CI never pulls.
Implementation. Add an EXCLUDED_PATTERNS array near the top of the
script with one entry — the IIS image path
`mcr.microsoft.com/windows/servercore/iis` — and a comment block above
it documenting:
- WHY it's excluded (gated profile, never started, all tests on
skip-allowlist)
- WHEN it would need re-inclusion (if a Windows CI runner is added
that actually starts the sidecar)
- WHAT this list is NOT for (transient flake silencing — that gets
fixed via retry logic in the script, not via exclusion)
The match is by image-path substring, not by digest, so future tag/
digest updates of the same image still hit the exclusion without
needing this list to be re-edited.
Loop logic gains a 6-line check that runs the exclusion match before
any registry work. Excluded refs log as "SKIP (excluded) <ref>" so
operator-facing CI logs stay informative — at a glance you can see
which digests were verified vs which were intentionally not.
The success message updates to differentiate verified vs excluded
counts: "digest-validity: clean — N verified, M excluded (CI never
pulls)" when M > 0; original message preserved when M == 0.
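A minimal sketch of the exclusion check described above, assuming refs arrive one per line. The names `is_excluded` and `classify_refs` and the `VERIFY` label are hypothetical; only the `EXCLUDED_PATTERNS` name, the IIS image path, and the "SKIP (excluded) <ref>" log shape come from this commit.

```bash
# Image-path substrings for images CI never pulls (doc-only compose entries).
EXCLUDED_PATTERNS="mcr.microsoft.com/windows/servercore/iis"

# is_excluded REF: succeeds when REF contains any excluded image path.
# Matching on the path, not the digest, survives tag and digest bumps.
is_excluded() {
  local ref="$1" pat
  for pat in $EXCLUDED_PATTERNS; do
    case "$ref" in
      *"$pat"*) return 0 ;;
    esac
  done
  return 1
}

# classify_refs: reads refs on stdin and prints the action the guard would
# take for each, before any registry work happens.
classify_refs() {
  local ref
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    if is_excluded "$ref"; then
      printf 'SKIP (excluded) %s\n' "$ref"
    else
      printf 'VERIFY %s\n' "$ref"
    fi
  done
}
```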
Verified manually:
- Clean repo: 15 verified, 1 excluded, exit 0.
- Fabricated bogus httpd digest: ::error:: emitted for the bad
digest, IIS still SKIP-excluded, exit 1. (Real regressions still
caught.)
- Restore: 15 verified, 1 excluded, exit 0 again.
Other doc-only MCR-hosted images would warrant the same treatment if
they get added later. The exclusion list pattern scales: each new entry
needs its own "WHY this is doc-only" justification block.
What this is NOT:
- Not a generic flake-silencer. The exclusion is justified by the
image being doc-only, not by the test being noisy.
- Not a global retry/resilience layer. If MCR rate-limits an image CI
DOES pull, that's a real CI dependency on an unreliable external
service — fix by retry-with-backoff, not by excluding.
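For the second case (an image CI does pull), retry-with-backoff could look like the sketch below. It is not part of this commit; the function name and argument order are assumptions made for illustration.

```bash
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay between attempts. Returns 0 on success, 1 once attempts run out.
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    [ "$attempt" -lt "$max" ] || return 1   # out of attempts
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  return 0
}
```

A digest lookup would then be wrapped as `retry_with_backoff 4 2 <lookup command>`, so a single throttled response no longer fails the job while a genuinely bad digest still fails after the last attempt.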
The deploy-vendor-e2e job has been failing with the certctl-test-server
container restarting endlessly. Diagnostic dump (added in 69266c8)
finally surfaced the actual cause:
Failed to load configuration: SCEP profile 0 (PathID="e2eintune")
has empty CHALLENGE_PASSWORD — refuse to start (CWE-306: per-profile
shared secret is the sole application-layer auth boundary; an empty
password would allow any client reaching /scep/e2eintune to enroll
a CSR against issuer "iss-local")
Same shape as the encryption-key fix that landed in 4bb7a74: a config
validation gate added in code that the test compose never got updated
to satisfy, hidden pre-Phase-5 because the matrix-collapse hadn't yet
forced the certctl-server to actually boot in CI.
Root cause is more interesting than just "missing env var." The
2026-04-29 SCEP RFC 8894 + Intune master bundle Phase I added an
`e2eintune` SCEP profile to docker-compose.test.yml expecting
deploy/test/scep_intune_e2e_test.go to exercise it. That integration
test does exist (//go:build integration) but **NO CI job ever
selects it** — ci.yml's deploy-vendor-e2e job runs only
`-run 'VendorEdge_'` (line 379), and no other job invokes
`go test -tags integration` with a SCEP selector. Confirmed via
`grep -rnE "scep_intune|SCEPIntune" .github/workflows/` returning
empty.
Worse: the supporting fixtures (ra.crt + ra.key + intune_trust_anchor.pem)
were documented in deploy/test/fixtures/README.md with the
regeneration recipe but never actually committed. Pre-Phase-5 the
test stack didn't fully boot the server in CI, so the entire stack
of debt — dead config + missing fixtures + no consumer test — sat
silent until the matrix collapse forced the boot path.
Fixing this with a fake CHALLENGE_PASSWORD value would silence the
immediate validator but leave the real problem in place: maintenance
cost on test config that no test exercises. Same critique applies
to "let me commit fake fixtures" — the fixtures alone don't add
test coverage when no CI job runs the SCEP test.
The complete-path fix is to make the test compose match what CI
actually exercises:
- deploy/docker-compose.test.yml: drop CERTCTL_SCEP_ENABLED + the
full e2eintune profile env var family (10 lines) + the
./test/fixtures volume mount (1 line). Replace with an in-line
comment explaining why SCEP is intentionally disabled and what
needs to come back together when SCEP is added to CI for real.
- scripts/ci-guards/test-compose-scep-coherence.sh (new, 22nd
guard): refuses any future state where CERTCTL_SCEP_ENABLED=true
in test compose without ALL of:
1. A CI job that runs the SCEP integration test (matched by
scep_intune | SCEPIntune | -run [Ss]cep in ci.yml)
2. The fixture files actually committed (ra.crt, ra.key,
intune_trust_anchor.pem)
3. The ./test/fixtures:/etc/certctl/scep:ro volume mount
Verified manually with the same pattern as the H-1 guard:
clean tree → exit 0; deliberate SCEP_ENABLED=true regression →
exit 1 with 5 ::error:: annotations covering each gap; restore
→ exit 0 again.
- scripts/ci-guards/README.md: 21 → 22 guards, new row.
The fixtures README at deploy/test/fixtures/README.md keeps the
regeneration recipe so the eventual SCEP CI job lands cleanly: the
operator who adds the SCEP job restores the env vars, regenerates
+ commits the fixtures, and the guard auto-passes.
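The coherence rule can be sketched as one function over three inputs. This is a hedged illustration, not the shipped guard: the function name, the error wording, and the regex used to detect the enabled flag are assumptions; the three conditions and the fixture names are from the description above.

```bash
# check_scep_coherence COMPOSE_FILE CI_FILE FIXTURES_DIR
# Passes when SCEP is off in the test compose file, or when it is on and
# all supporting pieces exist. Emits one ::error:: line per missing piece.
check_scep_coherence() {
  local compose="$1" ci="$2" fixtures="$3" errors=0 f
  # SCEP disabled (or absent) in the test compose file: nothing to enforce.
  grep -qE 'CERTCTL_SCEP_ENABLED[:=] *"?true' "$compose" || return 0

  if ! grep -qE -e 'scep_intune|SCEPIntune|-run [Ss]cep' "$ci"; then
    echo "::error::SCEP enabled but no CI job runs the SCEP integration test"
    errors=$((errors + 1))
  fi
  for f in ra.crt ra.key intune_trust_anchor.pem; do
    if [ ! -f "$fixtures/$f" ]; then
      echo "::error::SCEP enabled but fixture $f is missing"
      errors=$((errors + 1))
    fi
  done
  if ! grep -qF -e './test/fixtures:/etc/certctl/scep:ro' "$compose"; then
    echo "::error::SCEP enabled but the fixtures volume mount is missing"
    errors=$((errors + 1))
  fi
  [ "$errors" -eq 0 ]
}
```

With the flag on and nothing else in place, this shape produces five annotations (one CI job, three fixtures, one mount), in line with the manual verification noted above.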
Pattern (now firm across this CI-stabilization sequence):
- Pre-existing latent bug
- Old CI structurally hid it (per-vendor matrix, missing boot path)
- Phase-5 matrix collapse + new diagnostic infra exposed it
- Direct fix unblocks today
- Regression guard prevents the same shape of drift forever
Encryption-key (4bb7a74) was the same shape; this is its sibling.
Two-part complete-path fix for the deploy-vendor-e2e failure that has
been firing since the ci-pipeline-cleanup Phase 5 matrix collapse
started actually booting the certctl-test-server:
Failed to load configuration:
CERTCTL_CONFIG_ENCRYPTION_KEY too short (29 bytes; minimum 32).
Surfaced via the diagnostic-dump step landed in commit 69266c8 — the
server panicked on startup, Docker restarted it endlessly, compose
reported the dependency-chain symptom ("container certctl-test-server
is unhealthy"), but the actual cause was invisible in the previous
CI output. With the dump in place, the next failing run named the
problem in one line.
Root cause. The H-1 audit-closure master commit 6cb4414
("feat(security): bodyLimit on noAuth + security headers + encryption-
key validation (H-1 master)") added internal/config/config.go's
minEncryptionKeyLength = 32 byte floor + 5 unit tests that pin it.
The closure was incomplete: it never enforced the rule against the
literal CERTCTL_CONFIG_ENCRYPTION_KEY values certctl's own
deploy/docker-compose*.yml files pass. Pre-Phase-5 the test stack
didn't fully exercise the validator (the per-vendor matrix didn't
boot certctl-test-server in every job), so the gap was silent.
deploy/docker-compose.test.yml's literal value
`test-encryption-key-32chars!!` was 29 bytes — the name claimed 32
but the author miscounted (4+1+10+1+3+1+2+5+2 = 29). Pattern matches
every fix in this CI-stabilization sequence: pre-existing latent bug
that the old CI structurally hid.
Part 1 — direct fix (deploy/docker-compose.test.yml):
Replace the 29-byte literal with a clearly test-only,
self-documenting 49-byte value (`test-encryption-key-deterministic-
32-byte-fixture`). 17 bytes of safety margin so a future tightening
of the floor (32 → 33+) doesn't break this fixture again. Inline
comment block explains the byte-budget contract + points at the
H-1 closure commit. Production deploy/docker-compose.yml's default
(`change-me-32-char-encryption-key`) is exactly 32 bytes: it passes
with zero margin, right on the edge; not touched here because operators
are already told to override it via env (`${VAR:-default}`).
Part 2 — structural fix (scripts/ci-guards/H-1-encryption-key-min-
length.sh):
New regression guard. Scans every deploy/docker-compose*.yml for
literal CERTCTL_CONFIG_ENCRYPTION_KEY values + values inside
${VAR:-default} expansions, checks each against the 32-byte floor,
fails CI with `::error::` annotation pointing at the offending
file:line if any literal regresses. Bare ${VAR} env references with
no default are skipped — those are operator-supplied at runtime
and the validator handles them at boot.
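The extraction and length check can be sketched as below. This is an assumption-laden illustration rather than the guard itself: the helper names are hypothetical, and the line parsing only handles the simple `NAME: value` and `NAME=value` shapes.

```bash
MIN_KEY_BYTES=32   # mirrors minEncryptionKeyLength in internal/config/config.go

# byte_len STRING: prints the length of STRING in bytes.
byte_len() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# extract_key_literal LINE
# Prints the literal key carried by a compose line, taken either from a plain
# value or from the default inside ${VAR:-default}. Returns 1 for a bare
# ${VAR} reference, which has no literal to check.
extract_key_literal() {
  local val="${1#*CERTCTL_CONFIG_ENCRYPTION_KEY}"   # drop through the name
  val="${val#[:=]}"                                  # drop the separator
  val="${val#"${val%%[! ]*}"}"                       # trim leading spaces
  val="${val#\"}"; val="${val%\"}"                   # strip optional quotes
  case "$val" in
    '${'*':-'*'}') val="${val#*:-}"; val="${val%?}" ;;   # keep the default only
    '${'*'}') return 1 ;;                                 # operator-supplied
  esac
  printf '%s' "$val"
}

# key_meets_floor KEY: succeeds when KEY is at least MIN_KEY_BYTES long.
key_meets_floor() {
  [ "$(byte_len "$1")" -ge "$MIN_KEY_BYTES" ]
}
```

`byte_len 'test-encryption-key-32chars!!'` prints 29, and the replacement fixture `test-encryption-key-deterministic-32-byte-fixture` measures 49.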
Verified manually:
- Clean repo: `H-1-encryption-key-min-length: clean.` (exit 0)
- 5-byte regression: emits proper ::error:: annotation, exit 1
- Restore: clean again (exit 0)
CI auto-picks up the new guard via the `for g in
scripts/ci-guards/*.sh; do bash "$g"; done` loop in ci.yml's
Regression guards step (no ci.yml change required).
scripts/ci-guards/README.md updated: 20 → 21 guards, new row
explaining the closure rationale.
The structural piece is the more important half of this fix. The
direct fix unblocks today's CI; the guard prevents the same class of
drift from ever recurring silently. Future audit closures that add
new validation rules to internal/config/config.go now have a working
template for the matching CI guard — drop a sibling .sh in the
ci-guards directory.
Bonus — what the diagnostic-dump step (69266c8) bought us. Before
that step landed, the same failure looked like an opaque "container
unhealthy" with no actionable signal. With it, the actual error
message + the offending env var + the exact byte count came out in
one CI run. The diagnostic infrastructure paid for itself within one
push.
The 'Regression guards' loop step in ci.yml runs:
for g in scripts/ci-guards/*.sh; do bash "$g"; done
Per the directory's own contract (scripts/ci-guards/README.md), every
script there MUST be runnable bare with no args / no env. Three files
violated that contract — they're helpers consumed by specific CI job
steps with arguments, not regression guards. They were misplaced.
Moved (git mv):
scripts/ci-guards/vendor-e2e-skip-check.sh → scripts/
scripts/ci-guards/vendor-e2e-skip-allowlist.txt → scripts/
scripts/ci-guards/coverage-pr-comment.sh → scripts/
Updated ci.yml call sites:
- deploy-vendor-e2e job: bash scripts/vendor-e2e-skip-check.sh $LOG
- go-build-and-test job: bash scripts/coverage-pr-comment.sh
Tightened scripts/vendor-e2e-skip-check.sh arg parse from a silent
default ('LOG=${1:-test-output.log}') to a mandatory-arg form
('LOG=${1:?usage: ...}') so misuse fails loud at parse time rather
than at the missing-file check.
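The difference between the two parameter-expansion forms, as a small sketch. The wrapper function names are hypothetical; the usage text is elided in this message and is left as "..." here.

```bash
# Old form: a missing argument silently falls back to a default path.
parse_log_lenient() {
  local log="${1:-test-output.log}"
  printf '%s\n' "$log"
}

# New form: a missing or empty argument aborts the (non-interactive) shell
# and prints the message after ":?" on stderr.
parse_log_strict() {
  local log="${1:?usage: ...}"
  printf '%s\n' "$log"
}
```

With no argument, `parse_log_lenient` prints `test-output.log`, while `parse_log_strict` fails before any later file check gets a chance to run.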
Updated scripts/ci-guards/README.md contract to spell out the
guard-vs-helper distinction explicitly; lists current helpers under
scripts/ for future-author guidance.
Verified locally: 'for g in scripts/ci-guards/*.sh; do bash "$g"; done'
returns clean (22 guards pass) on HEAD post-move.
Closes the regression-guards-loop failure that surfaced in CI run
25192163943 (job 73864471346 'Frontend Build').
Bundle: ci-pipeline-cleanup, Phase 10 / frozen decision 0.9.
Self-hosted alternative to Codecov / Coveralls. Posts a per-package
coverage delta as a PR comment on every PR; updates the same comment
in place on subsequent pushes (avoids duplicate noise).
scripts/ci-guards/coverage-pr-comment.sh:
- Reads coverage.out from the prior Go Test step
- Builds per-package coverage table (mirrors check-coverage-thresholds
averaging logic)
- Searches existing PR comments for the '**Coverage report' marker
and PATCHes the existing one if found, else POSTs a new one
- No-op on non-PR builds (push to master, scheduled, etc.)
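A rough sketch of the per-package table step, assuming the standard Go cover profile layout (`file:start,end numStmts count` after a `mode:` header). The function name is hypothetical, and statement-weighted averaging is an assumption; the real script mirrors check-coverage-thresholds, whose logic is not reproduced here.

```bash
# per_package_coverage COVERAGE_FILE
# Prints "<package> <percent>%" per package, sorted by package path.
per_package_coverage() {
  awk '
    NR == 1 { next }                      # skip the "mode:" header
    {
      split($1, loc, ":")                 # loc[1] = import path of the file
      pkg = loc[1]
      sub("/[^/]*$", "", pkg)             # drop the file name, keep the package
      total[pkg] += $2                    # statements in this block
      if ($3 > 0) covered[pkg] += $2      # block executed at least once
    }
    END {
      for (p in total)
        if (total[p] > 0)
          printf "%s %.1f%%\n", p, 100 * covered[p] / total[p]
    }
  ' "$1" | sort
}
```

The output lines would still need wrapping into the comment body that carries the '**Coverage report' marker.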
Wired into go-build-and-test job after 'Upload Coverage Report' step
with if: github.event_name == 'pull_request' guard.
Operator can swap to Codecov/Coveralls later by replacing this script
+ step with a third-party action — the YAML manifest at
.github/coverage-thresholds.yml stays unchanged either way.
Bundle: ci-pipeline-cleanup, Phases 7-9 / frozen decisions 0.8 + 0.10 + 0.11.
NEW image-and-supply-chain job (Ubuntu, ~3 min). Three steps:
PHASE 7 — Digest validity
scripts/ci-guards/digest-validity.sh resolves every @sha256:<digest>
ref in deploy/**/*.{yml,Dockerfile*} against its registry. Closes the
H-001 lying-field gap that Bundle II hit (11 fabricated digests passed
H-001's regex-only check and failed docker pull in CI).
Sandbox verification: 16/16 digests in deploy/* + Dockerfiles all
return HTTP 200 from registry-1.docker.io / ghcr.io / mcr.microsoft.com.
PHASE 8 — Docker build smoke (all 4 Dockerfiles)
Per frozen decision 0.10: build Dockerfile, Dockerfile.agent,
deploy/test/f5-mock-icontrol/Dockerfile, deploy/test/libest/Dockerfile.
Catches syntax errors + COPY path drift before tag-time release.yml.
The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a
syntax error there silently breaks the e2e suite.
PHASE 9 — OpenAPI ↔ handler operationId parity
scripts/ci-guards/openapi-handler-parity.sh extracts router routes
(r.mux.Handle / r.Register "METHOD /path" syntax — Go 1.22+ ServeMux),
extracts OpenAPI operations (paths × HTTP methods), and fails if any
router route has no operationId AND is not documented in the new
api/openapi-handler-exceptions.yaml.
Verified gap at HEAD 1de61e91 (root-caused):
142 router routes, 136 OpenAPI operations
6 router-only routes — all SCEP wire-protocol endpoints (RFC-shaped,
not REST). Documented in api/openapi-handler-exceptions.yaml with
one-line why: justifications.
0 OpenAPI-only operations.
Going forward: any new gap fails the build unless documented.
Status checks per push: now 7 (was 6 after Phases 5+6 collapsed the
vendor matrix and dropped windows; this Phase adds 1). Final acceptance gate target.
ci.yml: 383 → 432 lines (+49 for the new job + steps).
Bundle: ci-pipeline-cleanup, Phases 5+6 / frozen decisions 0.4 + 0.5
+ 0.6. Revises Bundle II decisions 0.4 (Windows matrix) and 0.9 (per-
vendor granularity).
PHASE 5 — Linux vendor matrix collapsed (12 jobs → 1):
The previous per-vendor matrix produced 12 status-check rows for
~1 real assertion (115/116 vendor-edge tests are t.Log placeholders
per Bundle II Phase 2-13 design). Granularity was fake signal.
Single-job version: brings up all 11 sidecars at once via
docker compose --profile deploy-e2e up -d, runs go test -run
'VendorEdge_' once, tears down once.
Critical caveat: requireSidecar() in deploy/test/vendor_e2e_helpers.go
uses t.Skipf() when a sidecar isn't reachable — silent test skip,
not CI failure. The new Skip-count enforcement step
(scripts/ci-guards/vendor-e2e-skip-check.sh) counts SKIP lines and
fails the build if it exceeds the allowlist at
scripts/ci-guards/vendor-e2e-skip-allowlist.txt (15 windows-iis-
requiring tests legitimately skip on Linux per Phase 6).
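One way to enforce the skip allowlist, sketched under assumptions: the log is `go test -v` output, the allowlist holds one test name per line, and the comparison is by name rather than by raw count. The function name and error wording are hypothetical.

```bash
# check_skips LOG_FILE ALLOWLIST_FILE
# Fails when the log contains a skipped test that is not on the allowlist,
# printing one ::error:: line per unexpected skip.
check_skips() {
  local log="$1" allowlist="$2" extra
  extra="$(grep -E '^ *--- SKIP: ' "$log" \
    | awk '{ print $3 }' \
    | sort -u \
    | grep -vxFf "$allowlist" || true)"
  [ -n "$extra" ] || return 0
  printf '%s\n' "$extra" | sed 's/^/::error::unexpected skip: /'
  return 1
}
```

Comparing names instead of counts means a newly unreachable sidecar cannot hide behind an allowlisted test that happened to start passing.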
PHASE 6 — Windows matrix deleted entirely:
The deploy-vendor-e2e-windows job is removed. Two reasons:
1. Can't physically work on windows-latest today (Docker not started
in Windows-containers mode by default; bridge network driver
missing on Windows Docker — see CI run 25183374742 failure logs).
2. Even fixed, validates nothing — all 16 IIS + WinCertStore tests
are t.Log placeholders that exercise no IIS-specific behavior.
Per Bundle II frozen decision 0.14, the third criterion for
"verified" status in the vendor matrix is operator manual smoke
against a real instance. IIS + WinCertStore now satisfy that via
the playbook (Phase 6 follow-up adds docs/connector-iis.md::
Operator validation playbook).
The windows-iis-test sidecar STAYS in deploy/docker-compose.test.yml
under profiles: [deploy-e2e-windows] for operator local use. Linux
CI never activates this profile.
Operator-required action before merge: RAM headroom verification on
prototype branch (per frozen decision 0.14). If peak RSS > 12 GB on
ubuntu-latest with all 11 sidecars up, fall back to bucketed matrix
per cowork/ci-pipeline-cleanup/decisions-revised.md.
ci.yml: 417 → 383 lines (-34 net; -1105 cumulative since baseline 1488).
Status checks per push: 19 → 7 (collapse 12 vendor jobs into 1 = -11;
delete 2 windows jobs = -2; add image-and-supply-chain in Phase 7-9 = +1;
net 19-11-2+1 = 7).
Operator action for Phase 13: update GitHub branch protection rules
(required-checks list 19 → 7 entries). Documented in cowork/
ci-pipeline-cleanup/decisions-revised.md.
Bundle: ci-pipeline-cleanup, Phase 1.
Pure relocation — no behavior change. Each guard's bash logic is
byte-identical to the prior inline version; the only changes are:
(a) the guard becomes a sibling script under scripts/ci-guards/<id>.sh,
(b) ci.yml's per-guard step is replaced by a single loop step that
iterates all scripts.
20 scripts extracted (alphabetized):
B-1-orphan-crud.sh, D-1-D-2-statusbadge-phantom.sh,
G-1-jwt-auth-literal.sh, G-2-api-key-hash-json.sh,
G-3-env-docs-drift.sh, H-001-bare-from.sh, H-009-readme-jwt.sh,
L-001-insecure-skip-verify.sh, L-1-bulk-action-loop.sh,
M-012-no-root-user.sh, P-1-documented-orphan-fns.sh,
S-1-hardcoded-source-counts.sh, S-2-strings-contains-err.sh,
T-1-frontend-page-coverage.sh, U-2-plaintext-healthcheck.sh,
U-3-migration-mount.sh, bundle-8-L-015-target-blank-rel-noopener.sh,
bundle-8-L-019-dangerously-set-inner-html.sh,
bundle-8-M-009-bare-usemutation.sh, test-naming-convention.sh
Plus scripts/ci-guards/README.md documenting the contract:
- Each script must exit 0 on clean repo, non-zero with ::error::
prefix on regression
- Runnable from repo root via 'bash scripts/ci-guards/<id>.sh'
- Adding a new guard: drop a new <id>.sh; CI auto-picks it up
ci.yml dropped 1488 → 557 lines (-931, -63%).
Single CI loop step now collects ALL guard failures before failing
the build instead of fail-fast — UX win for regressions that hit
two guards at once.
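The collect-then-fail behaviour can be sketched as a loop that records failures instead of stopping at the first one. The function name and the summary annotation are assumptions for illustration; the actual step lives inline in ci.yml.

```bash
# run_all_guards DIR
# Runs every *.sh under DIR with bash, keeps going after a failure, and
# returns non-zero at the end if any guard failed.
run_all_guards() {
  local dir="$1" failed=0 g
  for g in "$dir"/*.sh; do
    [ -e "$g" ] || continue          # glob matched nothing
    if ! bash "$g"; then
      echo "::error::guard failed: $g"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Because the exit status is computed after the loop, a regression that trips two guards shows both annotations in a single CI run.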
Two guards (QA-doc Part-count + seed-count, ci.yml lines 868-917)
deliberately NOT extracted — they move to 'make verify-docs' in
Phase 11 because they protect docs-the-operator-reads, not the
product itself.
Verification (sandbox):
- All 20 scripts pass against HEAD (chmod +x scripts/ci-guards/*.sh; for g in scripts/ci-guards/*.sh; do bash "$g"; done)
- New ci.yml YAML-parses cleanly
- Job boundaries preserved: go-build-and-test, frontend-build,
helm-lint, deploy-vendor-e2e, deploy-vendor-e2e-windows
- Loop step appears twice (once at end of go-build-and-test, once
at end of frontend-build) so both jobs continue running their
set of guards