mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 23:11:32 +00:00
2643a427ac
CI run #376 (commit a1c7741, Frontend Build job) failed with:
  digest does not resolve: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022@sha256:8d0b0e651ad514e3fb05978db66f38036118812e1b9314a48f10419cad8a3462
A re-run with no code changes went green. The digest itself is fine:
verified against MCR directly (HTTP 200 from
mcr.microsoft.com/v2/windows/servercore/iis/manifests/sha256:8d0b...),
and the tag `:windowsservercore-ltsc2022` currently resolves to that
exact digest, so Microsoft hasn't rotated the image behind the tag.
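A manual check like the one described above could be sketched as follows. The helper names `manifest_url` and `manifest_status` are hypothetical and are not part of the script; the single Accept header is an assumption based on the script's note that the IIS image uses a manifest.v2 single-image digest.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a one-off manual check; not part of
# digest-validity.sh.

# Build the registry v2 manifest URL.
#   $1 = registry host, $2 = repository path, $3 = sha256:<hex> digest
manifest_url() {
  printf 'https://%s/v2/%s/manifests/%s\n' "$1" "$2" "$3"
}

# Print the HTTP status code for a manifest request.
# curl reports 000 when no HTTP response arrived at all.
manifest_status() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    "$(manifest_url "$1" "$2" "$3")" || true
}

# Example (performs a network request, so it is left commented out):
#   manifest_status mcr.microsoft.com windows/servercore/iis \
#     sha256:8d0b0e651ad514e3fb05978db66f38036118812e1b9314a48f10419cad8a3462
```

A 200 from this request would indicate the digest resolves; any other code from a shared runner IP may only reflect throttling.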
Root cause is registry-side rate-limiting. MCR throttles unauthenticated
GET-by-digest requests by source IP. GitHub-hosted runners share a small
pool of egress IPs across many users; bursts trip the throttle and
return non-200. Re-run = different runner = different IP = throttle
window has reset = pass. This will recur on roughly N% of pushes
indefinitely, until either (a) Microsoft loosens MCR rate limits, (b)
GitHub buys more runner IPs, or (c) we stop verifying digests CI doesn't
actually use.
The deeper issue is structural, not transient. The Windows IIS image is
gated behind compose `profiles: [deploy-e2e-windows]`
(deploy/docker-compose.test.yml:700). The comment block above the
service definition (lines 675-691) explicitly says "Linux CI never
activates this profile." All 10 TestVendorEdge_IIS_*_E2E tests are on
scripts/vendor-e2e-skip-allowlist.txt because the sidecar is never
started. The whole Windows matrix was DELETED in ci-pipeline-cleanup
Phase 6 / frozen decision 0.5 (revising Bundle II decision 0.4); IIS
validation moved to docs/connector-iis.md::Operator validation playbook.
So `digest-validity.sh` is verifying a digest that no CI job ever pulls
— paying CI brittleness against MCR rate-limiting we can't control, for
an image whose only purpose in compose is documentation for an
operator's manual workflow on a real Windows host.
The fix matches the guard's stated purpose ("every digest CI actually
depends on is valid"): exclude images CI never pulls.
Implementation. Add an EXCLUDED_PATTERNS array near the top of the
script with one entry — the IIS image path
`mcr.microsoft.com/windows/servercore/iis` — and a comment block above
it documenting:
  - WHY it's excluded (gated profile, never started, all tests on
    skip-allowlist)
  - WHEN it would need re-inclusion (if a Windows CI runner is added
    that actually starts the sidecar)
  - WHAT this list is NOT for (transient flake silencing; that gets
    fixed via retry logic in the script, not via exclusion)
The match is by image-path substring, not by digest, so future tag/
digest updates of the same image still hit the exclusion without
needing this list to be re-edited.
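The substring match can be illustrated in isolation. This is a minimal sketch: the function name `is_excluded` is hypothetical (the script inlines the same comparison in its loop), and the tags and all-zero digests in the usage comments are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the exclusion match. is_excluded is a hypothetical wrapper
# around the same [[ ... == *pattern* ]] test the script's loop uses.

EXCLUDED_PATTERNS=(
  "mcr.microsoft.com/windows/servercore/iis"
)

# Return 0 if the ref contains any excluded image path, 1 otherwise.
is_excluded() {
  local ref="$1" pat
  for pat in "${EXCLUDED_PATTERNS[@]}"; do
    # Quoting "$pat" makes it match literally, not as a glob.
    if [[ "$ref" == *"$pat"* ]]; then
      return 0
    fi
  done
  return 1
}

# Usage (tags and digests below are placeholders, not real images):
#   zeros=$(printf '0%.0s' {1..64})
#   is_excluded "mcr.microsoft.com/windows/servercore/iis:some-new-tag@sha256:$zeros"  # returns 0
#   is_excluded "httpd:2.4@sha256:$zeros"                                              # returns 1
```

Because only the image path is compared, a bumped tag or digest for the same image needs no change to the list.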
Loop logic gains a short check that runs the exclusion match before
any registry work. Excluded refs log as "SKIP (excluded) <ref>" so
operator-facing CI logs stay informative: at a glance you can see
which digests were verified and which were intentionally skipped.
The success message updates to differentiate verified vs excluded
counts: "digest-validity: clean — N verified, M excluded (CI never
pulls)" when M > 0; original message preserved when M == 0.
Verified manually:
  - Clean repo: 15 verified, 1 excluded, exit 0.
  - Fabricated bogus httpd digest: ::error:: emitted for the bad
    digest, IIS still SKIP-excluded, exit 1. (Real regressions still
    caught.)
  - Restore: 15 verified, 1 excluded, exit 0 again.
Other MCR-hosted images that CI never pulls would warrant the same
treatment if they get added later. The exclusion list pattern scales:
each new entry needs its own "WHY this is doc-only" justification block.
What this is NOT:
  - Not a generic flake-silencer. The exclusion is justified by the
    image being doc-only, not by the test being noisy.
  - Not a global retry/resilience layer. If MCR rate-limits an image CI
    DOES pull, that's a real CI dependency on an unreliable external
    service, and the fix is retry-with-backoff, not exclusion.
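For an image CI does pull, retry-with-backoff might look like the sketch below. Nothing here exists in the script today: `fetch_http_code`, `fetch_code_with_retry`, and the `FETCH_CMD` override are hypothetical names, and treating 429, 5xx, and 000 (no HTTP response) as the retryable codes is an assumption about how throttling surfaces, not something confirmed for MCR.

```bash
#!/usr/bin/env bash
# Hypothetical retry wrapper; not part of digest-validity.sh.

# Name of the command that prints an HTTP status code for a URL.
# Overridable so the retry loop can be exercised without network access.
FETCH_CMD=${FETCH_CMD:-fetch_http_code}

# Print the HTTP status code for a URL (000 if no response arrived).
fetch_http_code() {
  curl -sS -o /dev/null -w '%{http_code}' "$1" || true
}

# fetch_code_with_retry URL [MAX_ATTEMPTS] [INITIAL_DELAY_SECONDS]
# Prints the last status code seen. Retries only codes assumed to be
# transient, doubling the delay after each failed attempt.
fetch_code_with_retry() {
  local url="$1" max_attempts="${2:-4}" delay="${3:-2}"
  local attempt=1 code=""
  while :; do
    code="$("$FETCH_CMD" "$url")"
    case "$code" in
      429|5??|000) ;;   # assumed transient: fall through and retry
      *) break ;;       # 200, 404, etc. are final answers
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      break
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  printf '%s\n' "$code"
}
```

A 404 is deliberately not retried in this sketch, so a fabricated digest would still fail fast while a throttled request gets a few more chances.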
163 lines
6.1 KiB
Bash
Executable File
#!/usr/bin/env bash
# scripts/ci-guards/digest-validity.sh
#
# Verify every @sha256:<digest> reference in deploy/**/*.{yml,Dockerfile*}
# actually resolves on its registry. H-001 only checks for digest
# presence; this catches fabricated or stale digests.
#
# Per ci-pipeline-cleanup bundle Phase 7. The bug class this catches:
# Bundle II shipped 11 fabricated digests that passed H-001's
# regex-only check and failed `docker pull` in CI.
#
# Real registries supported:
#   - Docker Hub library/* and non-library (auth.docker.io)
#   - ghcr.io (lscr.io alias for linuxserver/*)
#   - mcr.microsoft.com (no auth required for public images;
#     Windows IIS image needs the manifest.v2 single-image digest,
#     not the multi-arch list digest)

set -e

# Find every digest reference in compose files + Dockerfiles
mapfile -t REFS < <(
  grep -rEho '[a-z0-9./-]+:[a-z0-9.-]+@sha256:[a-f0-9]{64}' \
    deploy/ Dockerfile* deploy/test/*/Dockerfile 2>/dev/null \
    | sort -u
)

if [ ${#REFS[@]} -eq 0 ]; then
  echo "No @sha256 refs found — nothing to verify."
  exit 0
fi

# ---------------------------------------------------------------------------
# Excluded refs: digests for images CI never pulls.
# ---------------------------------------------------------------------------
# The guard's purpose is "every digest CI actually depends on is valid."
# Images that exist in compose only as documentation for an operator's
# manual workflow (e.g., Windows containers we cannot start on Linux
# runners) shouldn't add CI brittleness against external-registry
# rate-limiting we don't control.
#
# Each entry below is a substring matched against the full ref line
# (`<image>:<tag>@sha256:<digest>`). When a ref matches, it is logged as
# `SKIP (excluded)` and the loop continues. The match is by image-path
# substring, not by digest, so a future tag/digest update still excludes
# the right image without needing this list to be re-edited.
#
# Add an entry only with a documented reason in the comment block above
# the entry. This list is NOT a place to silence transient flakes; those
# get fixed by retries in the script itself, not by exclusion.
EXCLUDED_PATTERNS=(
  # mcr.microsoft.com/windows/servercore/iis
  # Windows-only image gated behind compose profiles=[deploy-e2e-windows]
  # (deploy/docker-compose.test.yml:700). Linux CI runners cannot start
  # the windows-iis-test sidecar: the entire Windows matrix was deleted
  # per ci-pipeline-cleanup Phase 6 / frozen decision 0.5, and IIS
  # validation moved to docs/connector-iis.md::Operator validation
  # playbook. All 10 TestVendorEdge_IIS_*_E2E tests are on
  # scripts/vendor-e2e-skip-allowlist.txt for the same reason.
  #
  # Without this exclusion, Linux CI runners request this digest from MCR
  # on every push. MCR rate-limits unauthenticated requests by source IP;
  # GitHub-hosted runner IPs are heavily reused across users; the result
  # is ~one transient 4xx/5xx every N runs (CI run #376 hit it). Re-runs
  # pass because runner IPs rotate. The image itself is fine; we just
  # don't need Linux CI to verify it.
  "mcr.microsoft.com/windows/servercore/iis"
)

fail=0
verified=0
skipped=0
for ref in "${REFS[@]}"; do
  # Apply exclusion list before any work on the ref.
  excluded=0
  for pat in "${EXCLUDED_PATTERNS[@]}"; do
    if [[ "$ref" == *"$pat"* ]]; then
      echo "SKIP (excluded) $ref"
      excluded=1
      skipped=$((skipped + 1))
      break
    fi
  done
  if [ "$excluded" -eq 1 ]; then
    continue
  fi

  # Split `<image>:<tag>@sha256:<digest>` into its parts.
  digest="${ref##*@}"
  imgtag="${ref%@*}"
  tag="${imgtag##*:}"
  img="${imgtag%:*}"

  # Determine registry + auth flow.
  if [[ "$img" =~ ^lscr\.io/ ]]; then
    img="${img#lscr.io/}"
    registry="ghcr.io"
    auth_url="https://ghcr.io/token?scope=repository:${img}:pull"
  elif [[ "$img" =~ ^mcr\.microsoft\.com/ ]]; then
    img="${img#mcr.microsoft.com/}"
    registry="mcr.microsoft.com"
    auth_url=""
  elif [[ "$img" == */* ]]; then
    # Non-library Docker Hub (e.g., envoyproxy/envoy, boky/postfix)
    registry="registry-1.docker.io"
    auth_url="https://auth.docker.io/token?service=registry.docker.io&scope=repository:${img}:pull"
  else
    # Library Docker Hub (e.g., httpd, golang)
    img="library/$img"
    registry="registry-1.docker.io"
    auth_url="https://auth.docker.io/token?service=registry.docker.io&scope=repository:${img}:pull"
  fi

  # Get auth token if needed.
  auth_header=""
  if [ -n "$auth_url" ]; then
    # `|| tok=""` keeps `set -e` from aborting the script when the token
    # request or JSON parse fails, so the error branch below can report it.
    tok=$(curl -sS "$auth_url" | python3 -c "import sys,json; print(json.load(sys.stdin)['token'])" 2>/dev/null) || tok=""
    if [ -z "$tok" ]; then
      echo "::error::Failed to get auth token for $registry / $img"
      fail=1
      continue
    fi
    auth_header="Authorization: Bearer $tok"
  fi

  # Request the manifest by digest (a GET whose body is discarded).
  # `|| true` keeps `set -e` from aborting on a curl transport error;
  # curl then reports code 000, which the check below flags as a failure.
  if [ -n "$auth_header" ]; then
    code=$(curl -sS -o /dev/null -w "%{http_code}" \
      -H "$auth_header" \
      -H "Accept: application/vnd.oci.image.index.v1+json" \
      -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
      -H "Accept: application/vnd.oci.image.manifest.v1+json" \
      -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
      "https://${registry}/v2/${img}/manifests/${digest}") || true
  else
    code=$(curl -sS -o /dev/null -w "%{http_code}" \
      -H "Accept: application/vnd.oci.image.index.v1+json" \
      -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
      -H "Accept: application/vnd.oci.image.manifest.v1+json" \
      -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
      "https://${registry}/v2/${img}/manifests/${digest}") || true
  fi

  if [ "$code" != "200" ]; then
    echo "::error::digest does not resolve: ${ref}"
    echo " registry: $registry"
    echo " image: $img"
    echo " digest: $digest"
    echo " HTTP: $code"
    fail=1
  else
    echo "OK $ref"
    verified=$((verified + 1))
  fi
done

[ $fail -eq 0 ] || exit 1
echo ""
if [ "$skipped" -gt 0 ]; then
  echo "digest-validity: clean — ${verified} verified, ${skipped} excluded (CI never pulls)."
else
  echo "digest-validity: clean — all ${verified} digest references resolve."
fi