ci-pipeline-cleanup Phase 5+6: collapse vendor matrix; delete Windows matrix

Bundle: ci-pipeline-cleanup, Phases 5+6 / frozen decisions 0.4 + 0.5
+ 0.6. Revises Bundle II decisions 0.4 (Windows matrix) and 0.9 (per-
vendor granularity).

PHASE 5 — Linux vendor matrix collapsed (12 jobs → 1):

The previous per-vendor matrix produced 12 status-check rows for
~1 real assertion (115/116 vendor-edge tests are t.Log placeholders
per Bundle II Phase 2-13 design). Granularity was fake signal.

Single-job version: brings up all 11 sidecars at once via
docker compose --profile deploy-e2e up -d, runs go test -run
'VendorEdge_' once, tears down once.
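
As a rough sketch of the single-job sequence described above (not the actual ci.yml step): the run() wrapper, the DRY_RUN switch, the TEST_PKG path, and the -v flag (needed for `--- SKIP:` lines to appear in the output) are assumptions made for illustration. Only the compose profile, the compose file path, and the -run pattern come from this commit.

```shell
# Hypothetical sketch of the collapsed job: up once, test once, down once.
COMPOSE_FILE="deploy/docker-compose.test.yml"  # path named in this commit
TEST_PKG="./deploy/test/..."                   # assumption: package holding the vendor-edge tests

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

vendor_e2e() {
  status=0
  # 1. Bring up every sidecar in the deploy-e2e profile at once.
  if run docker compose -f "$COMPOSE_FILE" --profile deploy-e2e up -d; then
    # 2. Run the vendor-edge tests once; remember a failure but keep going.
    run go test -v -run 'VendorEdge_' "$TEST_PKG" || status=$?
  else
    status=1
  fi
  # 3. Tear down once, whether or not the earlier steps succeeded.
  run docker compose -f "$COMPOSE_FILE" --profile deploy-e2e down
  return "$status"
}
```

Setting DRY_RUN=1 before calling vendor_e2e prints the three commands in order without needing Docker. Capturing the test output to a log for the skip check is left out of this sketch.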

Critical caveat: requireSidecar() in deploy/test/vendor_e2e_helpers.go
calls t.Skipf() when a sidecar isn't reachable, so a dead sidecar
produces a silent test skip, not a CI failure. The new skip-check step
(scripts/ci-guards/vendor-e2e-skip-check.sh) extracts the skipped test
names from the test output and fails the build if any of them is
missing from the allowlist at
scripts/ci-guards/vendor-e2e-skip-allowlist.txt (15 windows-iis-requiring
tests legitimately skip on Linux per Phase 6).

PHASE 6 — Windows matrix deleted entirely:

The deploy-vendor-e2e-windows job is removed. Two reasons:
1. Can't physically work on windows-latest today (Docker not started
   in Windows-containers mode by default; bridge network driver
   missing on Windows Docker — see CI run 25183374742 failure logs).
2. Even fixed, validates nothing — all 16 IIS + WinCertStore tests
   are t.Log placeholders that exercise no IIS-specific behavior.

Per Bundle II frozen decision 0.14, the third criterion for
"verified" status in the vendor matrix is operator manual smoke
against a real instance. IIS + WinCertStore now satisfy that via
the playbook (Phase 6 follow-up adds docs/connector-iis.md::
Operator validation playbook).

The windows-iis-test sidecar STAYS in deploy/docker-compose.test.yml
under profiles: [deploy-e2e-windows] for operator local use. Linux
CI never activates this profile.

Operator-required action before merge: RAM headroom verification on
prototype branch (per frozen decision 0.14). If peak RSS > 12 GB on
ubuntu-latest with all 11 sidecars up, fall back to bucketed matrix
per cowork/ci-pipeline-cleanup/decisions-revised.md.
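
One possible way to run the headroom check on a Linux runner, offered as a sketch under assumptions rather than the procedure the decision document prescribes: it approximates peak usage as host MemTotal minus MemAvailable from /proc/meminfo (not per-process RSS), and it reads "12 GB" as 12 GiB. The function names are hypothetical.

```shell
# Used memory in KiB = MemTotal - MemAvailable, read from a meminfo-format file.
used_kib() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {print t - a}' "${1:-/proc/meminfo}"
}

# Exit 0 when the peak (KiB, $1) exceeds the limit (GiB, $2).
exceeds_limit() {
  [ "$1" -gt $(( $2 * 1024 * 1024 )) ]
}

# Sample host usage $1 times, $2 seconds apart, and print the peak in KiB.
sample_peak_kib() {
  samples=${1:-60}; interval=${2:-5}; peak=0; i=0
  while [ "$i" -lt "$samples" ]; do
    cur=$(used_kib)
    if [ "$cur" -gt "$peak" ]; then peak=$cur; fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$peak"
}
```

Run sample_peak_kib in the background while the sidecars are up and the tests execute, then pass its result to exceeds_limit with 12 to decide whether the bucketed fallback applies.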

ci.yml: 417 → 383 lines (-34 net; -1105 cumulative since baseline 1488).
Status checks per push: 19 → 7 (collapse 12 vendor rows into 1 = -11;
delete 2 windows = -2; add image-and-supply-chain in Phase 7-9 = +1;
net 19-11-2+1 = 7).
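
The counts above can be re-derived with plain shell arithmetic. This sketch assumes the collapsed vendor job still reports one status-check row, so the vendor change is 12 rows becoming 1; the variable names are illustrative only.

```shell
# ci.yml line counts from this commit message.
baseline=1488; before_lines=417; after_lines=383
net=$((before_lines - after_lines))       # lines removed by this commit
cumulative=$((baseline - after_lines))    # lines removed since baseline

# Status-check rows per push.
checks_before=19
vendor_delta=$((1 - 12))    # 12 per-vendor rows collapse into 1 row
windows_delta=-2            # both Windows rows deleted
supply_chain_delta=1        # image-and-supply-chain row added in Phase 7-9
checks_after=$((checks_before + vendor_delta + windows_delta + supply_chain_delta))

echo "ci.yml: -$net net, -$cumulative cumulative"
echo "status checks per push: $checks_before -> $checks_after"
```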

Operator action for Phase 13: update GitHub branch protection rules
(required-checks list 19 → 7 entries). Documented in cowork/
ci-pipeline-cleanup/decisions-revised.md.
shankar0123
2026-04-30 20:46:05 +00:00
parent 0f205a8cfd
commit 0157510d48
3 changed files with 157 additions and 110 deletions
@@ -0,0 +1,40 @@
# scripts/ci-guards/vendor-e2e-skip-allowlist.txt
#
# Test names that are EXPECTED to skip on Linux ubuntu-latest CI runners.
# Each entry: one Go test function name per line. Lines starting with `#`
# are comments / ignored. Blank lines ignored.
#
# Per ci-pipeline-cleanup bundle Phase 5 / frozen decision 0.6.
# The skip-detection guard (in the deploy-vendor-e2e job) collects the
# test names from `^--- SKIP:` lines in the test output and fails the
# build if any of them is not listed in this allowlist.
#
# When a sidecar fails to start, the affected tests' requireSidecar() call
# triggers t.Skipf() — those skips are NOT in this allowlist and surface
# as a build failure.
# Windows-only tests that legitimately skip on Linux because the
# windows-iis-test sidecar is gated by `profiles: [deploy-e2e-windows]`
# and CI runs only the `deploy-e2e` profile (per ci-pipeline-cleanup
# Phase 6 frozen decision 0.5 — Windows matrix deletion). Operators
# validate these via `docs/connector-iis.md::Operator validation playbook`
# on a real Windows host.
# IIS connector (9 tests listed; require windows-iis sidecar)
TestVendorEdge_IIS_ARRReverseProxyCertRotation_E2E
TestVendorEdge_IIS_AppPoolRecycle_OptInForCertChange_E2E
TestVendorEdge_IIS_BindingTypeHttpsValidated_E2E
TestVendorEdge_IIS_CCSCentralizedCertStoreVariant_DeployToSharedStore_E2E
TestVendorEdge_IIS_FriendlyNameUpdatedOnRotation_E2E
TestVendorEdge_IIS_HTTP2ALPNPreserved_E2E
TestVendorEdge_IIS_RemovePreviousBindingOnRotate_E2E
TestVendorEdge_IIS_SNIMultiBindingPerSite_DeployUpdatesCorrectBinding_E2E
TestVendorEdge_IIS_WinRMRemotePath_vs_LocalPowerShellPath_BothWork_E2E
# WinCertStore connector (6 tests; require windows-iis sidecar)
TestVendorEdge_WinCertStore_CertStoreACL_IISIUSRSAccess_E2E
TestVendorEdge_WinCertStore_CertStoreACL_NetworkServiceAccess_E2E
TestVendorEdge_WinCertStore_PrivateKeyExportableFlag_E2E
TestVendorEdge_WinCertStore_RemovePreviousThumbprintOnRotate_E2E
TestVendorEdge_WinCertStore_StoreLocationLocalMachineVsCurrentUser_E2E
TestVendorEdge_WinCertStore_ThumbprintBindingVsFriendlyNameBinding_E2E
@@ -0,0 +1,65 @@
#!/usr/bin/env bash
# scripts/ci-guards/vendor-e2e-skip-check.sh
#
# Extracts test names from `^--- SKIP:` lines in the vendor-e2e test
# output and fails the build if any skipped test is NOT in the allowlist at
# scripts/ci-guards/vendor-e2e-skip-allowlist.txt.
#
# Per ci-pipeline-cleanup bundle Phase 5 / frozen decision 0.6.
# requireSidecar() in deploy/test/vendor_e2e_helpers.go uses
# t.Skipf() when a sidecar isn't reachable. The collapsed
# deploy-vendor-e2e job brings up all 11 sidecars at once — if
# one fails to start, the affected tests skip silently. This
# guard catches that.
#
# Usage: bash scripts/ci-guards/vendor-e2e-skip-check.sh <test-output.log>
set -e
LOG="${1:-test-output.log}"
ALLOWLIST="scripts/ci-guards/vendor-e2e-skip-allowlist.txt"
if [ ! -f "$LOG" ]; then
  echo "::error::test output log not found: $LOG"
  exit 1
fi
if [ ! -f "$ALLOWLIST" ]; then
  echo "::error::skip allowlist not found: $ALLOWLIST"
  exit 1
fi
# Build the set of allowed-skip test names (strip comments + blanks).
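allowed=$(grep -vE '^[[:space:]]*(#|$)' "$ALLOWLIST" | sort -u)
# grep -c exits 1 on zero matches; `|| true` keeps `set -e` from aborting
# the script when the allowlist has no entries.
allowed_count=$(echo "$allowed" | grep -c . || true)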
# Extract skipped test names from `--- SKIP: TestName (0.00s)` style lines.
skipped=$(grep -E '^--- SKIP: ' "$LOG" | awk '{print $3}' | sort -u || true)
skipped_count=$(echo "$skipped" | grep -c . || true)
echo "Vendor-e2e skip-check:"
echo " allowlist size: $allowed_count"
echo " observed skips: $skipped_count"
# Find skips not in allowlist.
unexpected=$(comm -23 <(echo "$skipped") <(echo "$allowed") || true)
if [ -n "$unexpected" ]; then
  echo "::error::Unexpected test skips — a sidecar likely failed to start"
  echo "Unexpected skipped tests (not in $ALLOWLIST):"
  echo "$unexpected" | sed 's/^/ - /'
  echo ""
  echo "Either:"
  echo " (a) Fix the sidecar / network / docker-compose issue causing the skip, OR"
  echo " (b) If the skip is legitimate (e.g., a new Windows-only test added),"
  echo " add the test name to $ALLOWLIST with a one-line justification comment."
  exit 1
fi
# Also flag skips beyond the allowlist count (defensive — comm -23 catches
# this already but the explicit count check makes the error message clearer).
if [ "$skipped_count" -gt "$allowed_count" ]; then
  echo "::error::Skip count $skipped_count exceeds allowlist size $allowed_count"
  exit 1
fi
echo "vendor-e2e-skip-check: clean ($skipped_count skips ≤ $allowed_count allowed)."
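
To see how the set-difference step behaves, the following sketch runs the same grep/awk/sort/comm pipeline against a throwaway allowlist and log. The two "Hypothetical" test names and the sample log are invented for illustration. Temp files stand in for the script's process substitution so the sketch also runs under plain sh.

```shell
tmp=$(mktemp -d)

# A tiny allowlist: comments and blank lines are ignored by the guard.
cat > "$tmp/allow.txt" <<'EOF'
# Windows-only
TestVendorEdge_IIS_HTTP2ALPNPreserved_E2E

TestVendorEdge_WinCertStore_PrivateKeyExportableFlag_E2E
EOF

# Sample `go test -v` style output: one allowed skip, one unexpected skip.
cat > "$tmp/test.log" <<'EOF'
=== RUN   TestVendorEdge_IIS_HTTP2ALPNPreserved_E2E
--- SKIP: TestVendorEdge_IIS_HTTP2ALPNPreserved_E2E (0.00s)
=== RUN   TestVendorEdge_Hypothetical_SidecarDown_E2E
--- SKIP: TestVendorEdge_Hypothetical_SidecarDown_E2E (0.00s)
--- PASS: TestVendorEdge_Hypothetical_Ok_E2E (0.01s)
EOF

grep -vE '^[[:space:]]*(#|$)' "$tmp/allow.txt" | sort -u > "$tmp/allowed"
grep -E '^--- SKIP: ' "$tmp/test.log" | awk '{print $3}' | sort -u > "$tmp/skipped"

# comm -23 keeps lines that appear only in the first (skipped) file.
unexpected=$(comm -23 "$tmp/skipped" "$tmp/allowed")
echo "unexpected: $unexpected"

rm -rf "$tmp"
```

Only the skip that is missing from the allowlist survives the comm -23 step, which is the condition under which the guard exits 1.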