ci-pipeline-cleanup Phase 5+6: collapse vendor matrix; delete Windows matrix

Bundle: ci-pipeline-cleanup, Phases 5+6 / frozen decisions 0.4 + 0.5
+ 0.6. Revises Bundle II decisions 0.4 (Windows matrix) and 0.9 (per-
vendor granularity).

PHASE 5 — Linux vendor matrix collapsed (12 jobs → 1):

The previous per-vendor matrix produced 12 status-check rows for
~1 real assertion (115/116 vendor-edge tests are t.Log placeholders
per the Bundle II Phase 2-13 design). The per-vendor granularity was a
false signal.

Single-job version: brings up all 11 sidecars at once via
docker compose --profile deploy-e2e up -d, runs go test -run
'VendorEdge_' once, tears down once.

Critical caveat: requireSidecar() in deploy/test/vendor_e2e_helpers.go
calls t.Skipf() when a sidecar isn't reachable, so a dead sidecar shows
up as a silent test skip, not a CI failure. The new skip-count
enforcement step (scripts/ci-guards/vendor-e2e-skip-check.sh) collects
the names of skipped tests and fails the build if any of them is missing
from the allowlist at scripts/ci-guards/vendor-e2e-skip-allowlist.txt
(the 15 windows-iis-requiring tests, which legitimately skip on Linux
per Phase 6).

PHASE 6 — Windows matrix deleted entirely:

The deploy-vendor-e2e-windows job is removed, for two reasons:
1. It cannot work on windows-latest today: Docker is not started in
   Windows-containers mode by default, and the bridge network driver
   is missing on Windows Docker (see the failure logs of CI run
   25183374742).
2. Even if fixed, it would validate nothing: all 16 IIS + WinCertStore
   tests are t.Log placeholders that exercise no IIS-specific behavior.

Per Bundle II frozen decision 0.14, the third criterion for
"verified" status in the vendor matrix is a manual operator smoke test
against a real instance. IIS + WinCertStore now satisfy that via the
playbook (a Phase 6 follow-up adds
docs/connector-iis.md::Operator validation playbook).

The windows-iis-test sidecar STAYS in deploy/docker-compose.test.yml
under profiles: [deploy-e2e-windows] for operator local use. Linux
CI never activates this profile.

Operator-required action before merge: verify RAM headroom on the
prototype branch (per frozen decision 0.14). If peak RSS exceeds 12 GB
on ubuntu-latest with all 11 sidecars up, fall back to the bucketed
matrix described in cowork/ci-pipeline-cleanup/decisions-revised.md.
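A minimal sketch of how the RAM headroom check might be scripted on the prototype branch, assuming a Linux host that exposes /proc/meminfo. It approximates "peak RSS" as the highest sampled value of MemTotal minus MemAvailable, which is an assumption rather than the measurement frozen decision 0.14 specifies, and it reads "12 GB" as 12 GiB. The helper names are hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers (not part of the repository) for the pre-merge RAM
# headroom check. "Memory in use" is approximated as MemTotal - MemAvailable
# from a meminfo-format file, an assumption rather than a true peak-RSS probe.

# Print memory in use, in KiB, read from FILE (default: /proc/meminfo).
mem_used_kib() {
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2} END {print t - a}' \
    "${1:-/proc/meminfo}"
}

# Take SAMPLES readings INTERVAL seconds apart and print the highest one.
peak_mem_used_kib() {
  local samples="$1" interval="$2" file="${3:-/proc/meminfo}"
  local peak=0 used i
  for ((i = 0; i < samples; i++)); do
    used=$(mem_used_kib "$file")
    if ((used > peak)); then peak=$used; fi
    if ((i + 1 < samples)); then sleep "$interval"; fi
  done
  echo "$peak"
}

# Succeed if PEAK_KIB fits within BUDGET_GIB gibibytes; fail otherwise.
within_budget() {
  local peak_kib="$1" budget_gib="$2"
  ((peak_kib <= budget_gib * 1024 * 1024))
}
```

For example, with the deploy-e2e sidecars up and the test run in progress, `peak=$(peak_mem_used_kib 60 5)` samples for about five minutes, and `within_budget "$peak" 12 || echo "over budget: fall back to the bucketed matrix"` applies the threshold. Host-wide sampling also counts the runner's own processes, which errs on the conservative side.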

ci.yml: 417 → 383 lines (-34 net; -1105 cumulative since baseline 1488).
Status checks per push: 19 → 7 (collapse 12 vendor jobs into 1 = -11;
delete 2 windows jobs = -2; add image-and-supply-chain in Phases 7-9
= +1; net 19 - 11 - 2 + 1 = 7).

Operator action for Phase 13: update GitHub branch protection rules
(required-checks list 19 → 7 entries). Documented in
cowork/ci-pipeline-cleanup/decisions-revised.md.
shankar0123
2026-04-30 20:46:05 +00:00
parent 0f205a8cfd
commit 0157510d48
3 changed files with 157 additions and 110 deletions
+52 -110
@@ -308,20 +308,33 @@ jobs:
fi
# =============================================================================
# Deploy-Hardening II Phase 15 — per-vendor e2e matrix
# deploy-vendor-e2e — single-job (collapsed from 12-job matrix)
# =============================================================================
# Per frozen decision 0.9: each vendor's e2e tests run in their own
# matrix job so vendor failures surface independently in the CI status
# check (operator sees "K8s 1.31 vendor-edge fail" as a discrete check,
# not a generic "integration tests failed").
# Per ci-pipeline-cleanup bundle Phase 5 / frozen decision 0.4 (revises
# Bundle II decision 0.9): the per-vendor matrix produced 12 status-check
# rows for ~1 real assertion (115/116 vendor-edge tests are t.Log
# placeholders). Collapsed to one job that brings up all 11 sidecars
# at once and runs the full VendorEdge_ test set.
#
# Skip-detection guard (scripts/ci-guards/vendor-e2e-skip-check.sh)
# enforces that no test SKIPs except the documented allowlist
# (windows-iis-requiring tests on Linux). If a sidecar fails to come
# up, requireSidecar() in deploy/test/vendor_e2e_helpers.go calls
# t.Skipf() — the guard catches that.
#
# RAM headroom on ubuntu-latest (16 GB ceiling) — operator-confirmed
# in Phase 0 / frozen decision 0.14 prototype-branch run. If RAM
# regresses, fall back to bucketed matrix per
# cowork/ci-pipeline-cleanup/decisions-revised.md.
#
# The Windows matrix (deploy-vendor-e2e-windows) was deleted entirely
# per Phase 6 / frozen decision 0.5 (revises Bundle II decision 0.4).
# IIS + WinCertStore validation moved to the operator playbook at
# docs/connector-iis.md::Operator validation playbook.
deploy-vendor-e2e:
name: deploy-vendor-e2e (${{ matrix.vendor }})
name: deploy-vendor-e2e
runs-on: ubuntu-latest
needs: [go-build-and-test]
strategy:
fail-fast: false
matrix:
vendor: [nginx, apache, haproxy, traefik, caddy, envoy, postfix, dovecot, ssh, javakeystore, k8s, f5-mock]
timeout-minutes: 30
steps:
- uses: actions/checkout@v5
@@ -332,110 +345,39 @@ jobs:
go-version: '1.25.9'
cache: true
- name: Bring up vendor sidecar
# Map matrix.vendor → docker-compose service name. The naming is
# NOT 1:1 because (a) the legacy NGINX vendor-edge tests reuse the
# apache-test sidecar via requireSidecar(t,"apache") — see the
# comment in deploy/test/nginx_vendor_e2e_test.go; (b) the openssh
# service is named openssh-test (not ssh-test); (c) the kind
# cluster service is named k8s-kind-test; (d) the F5 mock service
# is named f5-mock-icontrol and must be built first because it
# has no published image; (e) the JavaKeystore tests are pure-Go
# placeholder stubs that exercise no sidecar.
run: |
set -e
case "${{ matrix.vendor }}" in
nginx) SVC=apache-test ;; # nginx tests reuse apache sidecar
apache) SVC=apache-test ;;
haproxy) SVC=haproxy-test ;;
traefik) SVC=traefik-test ;;
caddy) SVC=caddy-test ;;
envoy) SVC=envoy-test ;;
postfix) SVC=postfix-test ;;
dovecot) SVC=dovecot-test ;;
ssh) SVC=openssh-test ;;
k8s) SVC=k8s-kind-test ;;
f5-mock) SVC=f5-mock-icontrol ;;
javakeystore) SVC="" ;; # pure-Go placeholder stubs; no sidecar needed
*) echo "::error::unknown matrix vendor '${{ matrix.vendor }}'"; exit 1 ;;
esac
if [ -z "$SVC" ]; then
echo "vendor=${{ matrix.vendor }} runs without a sidecar (pure-Go placeholder tests)"
exit 0
fi
if [ "${{ matrix.vendor }}" = "f5-mock" ]; then
docker compose --profile deploy-e2e -f deploy/docker-compose.test.yml build "$SVC"
fi
docker compose --profile deploy-e2e -f deploy/docker-compose.test.yml up -d "$SVC"
sleep 5
- name: Build f5-mock-icontrol sidecar
# The only sidecar without a published image; built from the in-tree
# Go server at deploy/test/f5-mock-icontrol/.
run: docker compose --profile deploy-e2e -f deploy/docker-compose.test.yml build f5-mock-icontrol
- name: Run vendor-edge e2e
- name: Bring up all vendor sidecars
# Brings up every sidecar in the deploy-e2e profile (apache-test,
# haproxy-test, traefik-test, caddy-test, envoy-test, postfix-test,
# dovecot-test, openssh-test, f5-mock-icontrol, k8s-kind-test) plus the
# always-on legacy nginx. windows-iis-test is gated by the separate
# deploy-e2e-windows profile and does not start here.
run: |
docker compose --profile deploy-e2e -f deploy/docker-compose.test.yml up -d
sleep 15
- name: Run all vendor-edge e2e
# Captures test output for skip-count enforcement (next step).
env:
INTEGRATION: "1"
run: |
# Per frozen decision 0.6: discoverable via
# `go test -run 'VendorEdge_<vendor>'`. Match the matrix
# vendor (test names are CamelCase: TestVendorEdge_NGINX_*,
# TestVendorEdge_HAProxy_*, etc.).
case "${{ matrix.vendor }}" in
nginx) PATTERN='VendorEdge_NGINX' ;;
apache) PATTERN='VendorEdge_Apache' ;;
haproxy) PATTERN='VendorEdge_HAProxy' ;;
traefik) PATTERN='VendorEdge_Traefik' ;;
caddy) PATTERN='VendorEdge_Caddy' ;;
envoy) PATTERN='VendorEdge_Envoy' ;;
postfix) PATTERN='VendorEdge_Postfix' ;;
dovecot) PATTERN='VendorEdge_Dovecot' ;;
ssh) PATTERN='VendorEdge_SSH' ;;
javakeystore) PATTERN='VendorEdge_JavaKeystore' ;;
k8s) PATTERN='VendorEdge_K8s' ;;
f5-mock) PATTERN='VendorEdge_F5' ;;
esac
go test -tags integration -race -count=1 -run "$PATTERN" ./deploy/test/...
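# -v is required: go test prints `--- SKIP:` lines only in verbose mode.
# pipefail keeps tee from masking a go test failure; a run step with no
# explicit or default shell runs under `bash -e`, without pipefail.
set -o pipefail
go test -tags integration -race -count=1 -v -run 'VendorEdge_' \
./deploy/test/... 2>&1 | tee test-output.log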
- name: Tear down sidecar
- name: Skip-count enforcement
# ci-pipeline-cleanup Phase 5 / frozen decision 0.6:
# requireSidecar uses t.Skipf (not t.Fatal) when a sidecar isn't
# reachable — collapsing the per-vendor matrix removes the implicit
# guard each per-job matrix entry provided. This step counts SKIP
# lines in the test output and fails the build if it exceeds the
# allowlist (windows-iis-requiring tests; legitimately skipped
# on Linux per Phase 6 / frozen decision 0.5).
run: bash scripts/ci-guards/vendor-e2e-skip-check.sh test-output.log
- name: Tear down sidecars
if: always()
run: docker compose --profile deploy-e2e -f deploy/docker-compose.test.yml down -v
# =============================================================================
# Deploy-Hardening II Phase 15 — Windows-host vendor e2e matrix
# =============================================================================
# IIS + WinCertStore tests run on windows-latest runners per frozen
# decision 0.4 (Windows containers run only on Windows hosts).
# Linux-only operators skip via //go:build integration && !no_iis.
deploy-vendor-e2e-windows:
name: deploy-vendor-e2e-windows (${{ matrix.vendor }})
runs-on: windows-latest
needs: [go-build-and-test]
strategy:
fail-fast: false
matrix:
vendor: [iis, wincertstore]
timeout-minutes: 30
steps:
- uses: actions/checkout@v5
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: '1.25.9'
cache: true
- name: Bring up Windows IIS sidecar
shell: powershell
run: |
docker compose --profile deploy-e2e-windows -f deploy/docker-compose.test.yml up -d windows-iis-test
Start-Sleep -Seconds 10
- name: Run vendor-edge e2e (Windows)
env:
INTEGRATION: "1"
shell: powershell
run: |
$pattern = if ("${{ matrix.vendor }}" -eq "iis") { "VendorEdge_IIS" } else { "VendorEdge_WinCertStore" }
go test -tags integration -race -count=1 -run $pattern ./deploy/test/...
- name: Tear down sidecar
if: always()
shell: powershell
run: docker compose --profile deploy-e2e-windows -f deploy/docker-compose.test.yml down -v
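The Run all vendor-edge e2e step pipes `go test` into `tee`. On Linux, GitHub Actions runs a `run:` step that names no shell as `bash -e {0}`, without pipefail. Unless the workflow sets a default shell elsewhere (not visible in this diff) or the script enables pipefail itself, the pipeline's exit status is `tee`'s, and a failing `go test` does not fail the step. The sketch below shows the effect, with `false` standing in for a failing `go test`; `pipeline_status` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `false | tee /dev/null` (a stand-in for
# `go test ... | tee test-output.log`) with or without pipefail and print
# the pipeline's exit status.
pipeline_status() {
  local status=0
  (
    if [ "$1" = "pipefail" ]; then set -o pipefail; else set +o pipefail; fi
    false | tee /dev/null
  ) || status=$?
  echo "$status"
}

echo "without pipefail: $(pipeline_status plain)"    # 0: tee's status wins
echo "with pipefail:    $(pipeline_status pipefail)" # 1: false's status propagates
```

Setting `shell: bash` on the step has the same effect as `set -o pipefail`, because the explicit `bash` shell runs as `bash --noprofile --norc -eo pipefail {0}`.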
@@ -0,0 +1,40 @@
# scripts/ci-guards/vendor-e2e-skip-allowlist.txt
#
# Test names that are EXPECTED to skip on Linux ubuntu-latest CI runners.
# Each entry: one Go test function name per line. Lines starting with `#`
# are comments / ignored. Blank lines ignored.
#
# Per ci-pipeline-cleanup bundle Phase 5 / frozen decision 0.6.
# The skip-detection guard (in the deploy-vendor-e2e job) extracts test
# names from `^--- SKIP:` lines in the test output and fails the build if
# any skipped test is missing from this allowlist, or if the number of
# skips exceeds the number of unique entries here.
#
# When a sidecar fails to start, the affected tests' requireSidecar() call
# triggers t.Skipf() — those skips are NOT in this allowlist and surface
# as a build failure.
# Windows-only tests that legitimately skip on Linux because the
# windows-iis-test sidecar is gated by `profiles: [deploy-e2e-windows]`
# and CI runs only the `deploy-e2e` profile (per ci-pipeline-cleanup
# Phase 6 frozen decision 0.5 — Windows matrix deletion). Operators
# validate these via `docs/connector-iis.md::Operator validation playbook`
# on a real Windows host.
# IIS connector (10 tests; require windows-iis sidecar)
TestVendorEdge_IIS_ARRReverseProxyCertRotation_E2E
TestVendorEdge_IIS_AppPoolRecycle_OptInForCertChange_E2E
TestVendorEdge_IIS_BindingTypeHttpsValidated_E2E
TestVendorEdge_IIS_CCSCentralizedCertStoreVariant_DeployToSharedStore_E2E
TestVendorEdge_IIS_FriendlyNameUpdatedOnRotation_E2E
TestVendorEdge_IIS_HTTP2ALPNPreserved_E2E
TestVendorEdge_IIS_RemovePreviousBindingOnRotate_E2E
TestVendorEdge_IIS_SNIMultiBindingPerSite_DeployUpdatesCorrectBinding_E2E
TestVendorEdge_IIS_WinRMRemotePath_vs_LocalPowerShellPath_BothWork_E2E
# WinCertStore connector (6 tests; require windows-iis sidecar)
TestVendorEdge_WinCertStore_CertStoreACL_IISIUSRSAccess_E2E
TestVendorEdge_WinCertStore_CertStoreACL_NetworkServiceAccess_E2E
TestVendorEdge_WinCertStore_PrivateKeyExportableFlag_E2E
TestVendorEdge_WinCertStore_RemovePreviousThumbprintOnRotate_E2E
TestVendorEdge_WinCertStore_StoreLocationLocalMachineVsCurrentUser_E2E
TestVendorEdge_WinCertStore_ThumbprintBindingVsFriendlyNameBinding_E2E
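The allowlist format above (one Go test name per line; `#` comments and blank lines ignored) can be parsed with a filter like the one below. This is a sketch, not the skip-check script's exact code: the helper names are hypothetical, and the pattern uses the POSIX class `[[:space:]]` where the script's `grep -E` pattern uses the GNU `\s` extension, so indented comments and whitespace-only lines are skipped as well. The count uses `|| true` because `grep -c` exits 1 when it counts zero lines, which would abort a `set -e` script if the allowlist were ever emptied.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the unique test names in an allowlist file,
# dropping comment lines (indented or not) and blank or whitespace-only lines.
allowlist_names() {
  grep -vE '^[[:space:]]*(#|$)' "$1" | sort -u || true
}

# Count non-empty lines on stdin. grep -c prints 0 but exits 1 when nothing
# matches, so `|| true` keeps an empty list from aborting a set -e script.
count_lines() {
  grep -c . || true
}
```

For example, `allowlist_names scripts/ci-guards/vendor-e2e-skip-allowlist.txt | count_lines` reports how many distinct tests the allowlist covers.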
+65
@@ -0,0 +1,65 @@
#!/usr/bin/env bash
# scripts/ci-guards/vendor-e2e-skip-check.sh
#
# Counts `^--- SKIP:` lines in the vendor-e2e test output and fails
# the build if any test skipped that's NOT in the allowlist at
# scripts/ci-guards/vendor-e2e-skip-allowlist.txt.
#
# Per ci-pipeline-cleanup bundle Phase 5 / frozen decision 0.6.
# requireSidecar() in deploy/test/vendor_e2e_helpers.go uses
# t.Skipf() when a sidecar isn't reachable. The collapsed
# deploy-vendor-e2e job brings up all 11 sidecars at once — if
# one fails to start, the affected tests skip silently. This
# guard catches that.
#
# Usage: bash scripts/ci-guards/vendor-e2e-skip-check.sh <test-output.log>
set -e
LOG="${1:-test-output.log}"
ALLOWLIST="scripts/ci-guards/vendor-e2e-skip-allowlist.txt"
if [ ! -f "$LOG" ]; then
echo "::error::test output log not found: $LOG"
exit 1
fi
if [ ! -f "$ALLOWLIST" ]; then
echo "::error::skip allowlist not found: $ALLOWLIST"
exit 1
fi
# Build the set of allowed-skip test names (strip comments + blanks).
allowed=$(grep -vE '^\s*(#|$)' "$ALLOWLIST" | sort -u)
allowed_count=$(echo "$allowed" | grep -c . || true)
# Extract skipped test names from `--- SKIP: TestName (0.00s)` style lines.
skipped=$(grep -E '^--- SKIP: ' "$LOG" | awk '{print $3}' | sort -u || true)
skipped_count=$(echo "$skipped" | grep -c . || true)
echo "Vendor-e2e skip-check:"
echo " allowlist size: $allowed_count"
echo " observed skips: $skipped_count"
# Find skips not in allowlist.
unexpected=$(comm -23 <(echo "$skipped") <(echo "$allowed") || true)
if [ -n "$unexpected" ]; then
echo "::error::Unexpected test skips — a sidecar likely failed to start"
echo "Unexpected skipped tests (not in $ALLOWLIST):"
echo "$unexpected" | sed 's/^/ - /'
echo ""
echo "Either:"
echo " (a) Fix the sidecar / network / docker-compose issue causing the skip, OR"
echo " (b) If the skip is legitimate (e.g., a new Windows-only test added),"
echo " add the test name to $ALLOWLIST with a one-line justification comment."
exit 1
fi
# Also flag skips beyond the allowlist count (defensive — comm -23 catches
# this already but the explicit count check makes the error message clearer).
if [ "$skipped_count" -gt "$allowed_count" ]; then
echo "::error::Skip count $skipped_count exceeds allowlist size $allowed_count"
exit 1
fi
echo "vendor-e2e-skip-check: clean ($skipped_count skips ≤ $allowed_count allowed)."
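The core comparison in vendor-e2e-skip-check.sh can be exercised in isolation against a synthetic log. The sketch below restates it as a hypothetical function, `unexpected_skips`, that takes the log and allowlist paths as arguments instead of using the script's fixed allowlist path. Two properties of `go test` output matter here: `--- SKIP:` result lines appear only when tests run with `-v`, and subtest result lines are indented, so the `^--- SKIP: ` anchor counts top-level tests only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical restatement of vendor-e2e-skip-check.sh's core comparison:
# print skipped top-level tests in LOG that are not listed in ALLOWLIST.
unexpected_skips() {
  local log="$1" allowlist="$2"
  # `--- SKIP: TestName (0.00s)` -> TestName; indented subtest lines never
  # match the ^ anchor. comm -23 keeps names that appear only in the first list.
  comm -23 \
    <(grep -E '^--- SKIP: ' "$log" | awk '{print $3}' | sort -u || true) \
    <(grep -vE '^[[:space:]]*(#|$)' "$allowlist" | sort -u || true)
}
```

Given a log in which an unlisted test skipped, the function prints that test's name; empty output means every observed skip was allowlisted. Both inputs go through `sort -u` in the same locale, which is what `comm` requires.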