mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:21:30 +00:00
ci-pipeline-cleanup Phase 0: baseline + frozen decisions + Bundle II revisions
Bundle: ci-pipeline-cleanup, Phase 0.
Captures all 12 baseline measurements at HEAD 1de61e91 (tag v2.0.66):
- ci.yml shape (1488 lines, 53 named steps, 22 regression-guard steps)
- 4 Dockerfiles in repo
- 24/24 migration up/down balance
- 136 OpenAPI operationIds vs 149 router Register calls (13-route gap
for Phase 9 root-cause)
- 11 vendor sidecars + 1 always-on nginx in deploy/docker-compose.test.yml
- 19 status checks per push (target after cleanup: 7)
Locks the 14 Phase-0 frozen decisions in cowork/ci-pipeline-cleanup/
frozen-decisions.md. Two of them deliberately revise Bundle II
decisions:
- Decision 0.4 revises Bundle II 0.9 (vendor matrix collapse)
- Decision 0.5 revises Bundle II 0.4 (Windows IIS matrix deletion)
Both revisions are documented with rationale + preservation note in
cowork/ci-pipeline-cleanup/decisions-revised.md. Verified failure-log
evidence cited for the Windows matrix (CI run 25183374742) +
verified source-grep evidence for the t.Log-only vendor-edge tests
(115 of 116).
Two operator-on-workstation deliverables explicitly deferred to
their respective Phases:
- Live SA1019 site count (Phase 3 pre-flight)
- RAM headroom on prototype branch with collapsed vendor-e2e (Phase 5
pre-merge gate)
No code changes in this commit — Phase 0 is documentation + measurement
+ frozen-decision lock-in only.
@@ -0,0 +1,159 @@
# CI Pipeline Cleanup — Phase 0 Baseline

> Captured against repo HEAD `1de61e91cf07449356d9046a76499c86efe413b1` (operator tag `v2.0.66`) on 2026-04-30.
> Each subsequent Phase that changes a number references this baseline.

## Repo state

**HEAD SHA:** `1de61e91cf07449356d9046a76499c86efe413b1`

**Operator-stamped tag:** `v2.0.66`

## ci.yml shape

- Total lines: `1488`
- Total named steps: `53`
- Named regression-guard steps: 22 (enumerated below)

### The 22 regression-guard steps

```
81: - name: Forbidden auth-type literal regression guard (G-1)
144: - name: Forbidden bare InsecureSkipVerify regression guard (L-001)
180: - name: Forbidden bare FROM regression guard (H-001)
201: - name: Forbidden missing USER regression guard (M-012)
228: - name: Forbidden README JWT advertising regression guard (H-009)
254: - name: Forbidden api_key_hash JSON-shape regression guard (G-2)
311: - name: Forbidden plaintext HEALTHCHECK regression guard (U-2)
360: - name: Forbidden migration mount in compose initdb (U-3)
417: - name: Forbidden StatusBadge dead-key + TS phantom-field regression guard (D-1 + D-2)
569: - name: Forbidden client-side bulk-action loop regression guard (L-1)
613: - name: Forbidden orphan-CRUD client function regression guard (B-1)
665: - name: Forbidden strings.Contains(err.Error()) regression guard (S-2)
868: - name: QA-doc Part-count drift guard
886: - name: QA-doc seed-count drift guard
938: - name: Test-naming convention guard (hard-fail)
982: - name: Forbidden hardcoded source-count prose regression guard (S-1)
1027: - name: Documented orphan client fns sync guard (P-1)
1063: - name: Frontend page-coverage regression guard (T-1)
1118: - name: Bundle-8 / L-015 target=_blank rel=noopener regression guard
1147: - name: Bundle-8 / L-019 dangerouslySetInnerHTML regression guard
1176: - name: Bundle-8 / M-009 + M-029 Pass 1 mutation contract guard (hard zero)
1220: - name: Forbidden env-var docs drift regression guard (G-3)
```

## SA1019 site count

- **Operator-on-workstation deliverable** — sandbox cannot run `staticcheck`.
- ci.yml inline comment claims "6 sites" (`middleware.NewAuth × 3`, `csr.Attributes`, `elliptic.Marshal`).
- Source-grep at HEAD shows:
  - `internal/api/handler/scep.go`: `csr.Attributes` references present
  - `internal/connector/issuer/local/local.go`: `elliptic.Marshal` historic refs (already migrated per bundle9_coverage_test.go byte-equivalence test)
  - `cmd/server/main_test.go`: `middleware.NewAuth` references TBD
- Operator must run `staticcheck ./... 2>&1 | grep SA1019` on workstation and update Phase 3 plan with the actual site list.
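
As a rough aid for that workstation step, the Python sketch below groups SA1019 findings by file. It assumes staticcheck's plain-text output shape of `path:line:col: message (CHECK)`. The helper name `sa1019_sites`, the sample paths, line numbers and messages are all made up for illustration and are not measurements from the repo.

```python
import re
from collections import defaultdict

# Assumed text format: path:line:col: message (CHECK)
FINDING = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<msg>.*) \((?P<check>[A-Z]+\d+)\)$"
)


def sa1019_sites(output: str) -> dict:
    """Map each file to the sorted line numbers of its SA1019 findings."""
    sites = defaultdict(list)
    for raw in output.splitlines():
        match = FINDING.match(raw.strip())
        if match and match.group("check") == "SA1019":
            sites[match.group("path")].append(int(match.group("line")))
    return {path: sorted(lines) for path, lines in sites.items()}


# Hypothetical sample; real staticcheck messages and positions will differ.
SAMPLE = """\
internal/api/handler/scep.go:120:9: example deprecation message (SA1019)
cmd/server/main_test.go:88:3: example deprecation message (SA1019)
cmd/server/main_test.go:45:3: example deprecation message (SA1019)
internal/example/other.go:10:2: example unrelated finding (S1012)
"""

sites = sa1019_sites(SAMPLE)
total = sum(len(lines) for lines in sites.values())
for path, lines in sorted(sites.items()):
    print(path, lines)
print("SA1019 sites:", total)
```

Feeding it the captured output of `staticcheck ./... 2>&1` would give a per-file list that can be pasted into the Phase 3 plan.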

## Dockerfile inventory (verified 4)

```
./Dockerfile.agent
./Dockerfile
./deploy/test/f5-mock-icontrol/Dockerfile
./deploy/test/libest/Dockerfile
```

## Migration up/down balance

- ups: `24`
- downs: `24`
- missing downs: `0`
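
One way such a balance could be recomputed is sketched below in Python. It assumes migrations are paired as `<name>.up.sql` / `<name>.down.sql`, which this baseline does not state; the file names in the example are hypothetical.

```python
def migration_balance(filenames):
    """Return (ups, downs, names of up migrations that have no down)."""
    up_suffix, down_suffix = ".up.sql", ".down.sql"
    ups = {f[: -len(up_suffix)] for f in filenames if f.endswith(up_suffix)}
    downs = {f[: -len(down_suffix)] for f in filenames if f.endswith(down_suffix)}
    return len(ups), len(downs), sorted(ups - downs)


# Hypothetical file names, not the repo's real migrations.
names = [
    "000001_init.up.sql",
    "000001_init.down.sql",
    "000002_agents.up.sql",
]
result = migration_balance(names)
print(result)
```

In a checkout, the list could come from `os.listdir()` on the migrations directory, whose location is not recorded here.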

## OpenAPI ↔ handler parity gap (verified)

- operationIds in api/openapi.yaml: `136`
- r.Register calls in router.go: `149`
- Gap to root-cause in Phase 9: 13 routes
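
The two counts can be approximated with plain text matching. The following Python sketch assumes one `operationId:` key per operation and a literal `r.Register(` at each call site; the sample YAML, the route strings and the `Register` argument shape are hypothetical.

```python
import re


def count_operation_ids(openapi_text):
    """Count `operationId:` keys, one per line."""
    return len(re.findall(r"^\s*operationId:\s*\S+", openapi_text, flags=re.MULTILINE))


def count_register_calls(router_text):
    """Count literal `r.Register(` calls (commented-out calls are counted too)."""
    return len(re.findall(r"\br\.Register\(", router_text))


# Hypothetical excerpts, for illustration only.
OPENAPI = """\
paths:
  /api/v1/certificates:
    get:
      operationId: listCertificates
    post:
      operationId: createCertificate
"""
ROUTER = """\
r.Register("GET /api/v1/certificates", h.ListCertificates)
r.Register("POST /api/v1/certificates", h.CreateCertificate)
r.Register("GET /health", h.Health)
"""

ops = count_operation_ids(OPENAPI)
regs = count_register_calls(ROUTER)
gap = regs - ops
print(ops, regs, gap)
```

With the baseline numbers the same subtraction gives 149 - 136 = 13.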

## docker-compose.test.yml sidecars

```
52: certctl-tls-init:
107: postgres:
135: pebble-challtestsrv:
150: pebble:
178: step-ca:
213: certctl-server:
363: nginx:
391: certctl-agent:
449: libest-client:
488: apache-test:
502: haproxy-test:
515: traefik-test:
533: caddy-test:
548: envoy-test:
562: postfix-test:
577: dovecot-test:
591: openssh-test:
613: f5-mock-icontrol:
631: k8s-kind-test:
648: windows-iis-test:
666: certctl-test:
```

## Makefile::verify body (existing)

```
verify:
	@echo "==> fmt"
	@go fmt ./... | { ! grep -q '.'; } || (echo "gofmt produced changes — commit them" && exit 1)
	@echo "==> go vet ./..."
	@go vet ./...
	@echo "==> golangci-lint run ./... (incl. staticcheck ST*)"
	@which golangci-lint > /dev/null || (echo "Installing golangci-lint..." && go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest)
	@golangci-lint run ./... --timeout 5m
	@echo "==> go test -short ./..."
	@go test -short -count=1 ./...
	@echo ""
	@echo "verify: PASS — safe to commit"
```

## RAM headroom for collapsed vendor-e2e job

- **Operator-on-workstation deliverable** — requires a prototype branch with the collapsed job + `docker stats` polling.
- Per Phase 0 frozen decision 0.14: if peak RSS ≤ 12 GB on ubuntu-latest (16 GB ceiling), single-job collapse is approved.
- If > 12 GB, fall back to bucketed-matrix design documented in `cowork/ci-pipeline-cleanup/decisions-revised.md`.

## Coverage thresholds at HEAD

```
778: if [ "$(echo "$SERVICE_COV < 70" | bc -l)" -eq 1 ]; then
779: echo "::error::Service layer coverage ${SERVICE_COV}% is below 70% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
782: if [ "$(echo "$HANDLER_COV < 75" | bc -l)" -eq 1 ]; then
783: echo "::error::Handler layer coverage ${HANDLER_COV}% is below 75% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
786: if [ "$(echo "$DOMAIN_COV < 40" | bc -l)" -eq 1 ]; then
787: echo "::error::Domain layer coverage ${DOMAIN_COV}% is below 40% threshold"
790: if [ "$(echo "$MIDDLEWARE_COV < 30" | bc -l)" -eq 1 ]; then
791: echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
802: if [ "$(echo "$CRYPTO_COV < 88" | bc -l)" -eq 1 ]; then
803: echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 88% (Bundle R closure floor — add tests, do not lower the gate)"
832: if [ "$(echo "$LOCAL_ISSUER_COV < 86" | bc -l)" -eq 1 ]; then
833: echo "::error::Local-issuer coverage ${LOCAL_ISSUER_COV}% is below 86% (Bundle R closure floor — add tests, do not lower the gate)"
842: if [ "$(echo "$ACME_COV < 80" | bc -l)" -eq 1 ]; then
843: echo "::error::ACME issuer coverage ${ACME_COV}% is below 80% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
846: if [ "$(echo "$STEPCA_COV < 80" | bc -l)" -eq 1 ]; then
847: echo "::error::StepCA issuer coverage ${STEPCA_COV}% is below 80% (Bundle L.B closure floor — add tests, do not lower the gate)"
850: if [ "$(echo "$MCP_COV < 85" | bc -l)" -eq 1 ]; then
851: echo "::error::MCP coverage ${MCP_COV}% is below 85% (Bundle K closure floor — add tests, do not lower the gate)"
```

## CodeQL workflow (no changes)

- File: `.github/workflows/codeql.yml` (`81` lines)
- Matrix: `[go, javascript-typescript]` — 2 status checks per push
- Trigger: push to master, PR to master, weekly Sunday cron

## Status check accounting (verified)

Today: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 12 `deploy-vendor-e2e (<vendor>)` + 2 `deploy-vendor-e2e-windows (<vendor>)` + 2 `CodeQL Analyze (<lang>)` = **19 status checks per push**.

After cleanup: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 1 `deploy-vendor-e2e` + 1 `image-and-supply-chain` + 2 `CodeQL Analyze (<lang>)` = **7 status checks per push**.
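
The two totals can be double-checked with a few lines of Python. The dictionaries below simply restate the per-job counts from the two sentences above; nothing else is assumed.

```python
# Status checks per push, keyed by job name (matrix jobs count once per leg).
today = {
    "go-build-and-test": 1,
    "frontend-build": 1,
    "helm-lint": 1,
    "deploy-vendor-e2e": 12,
    "deploy-vendor-e2e-windows": 2,
    "CodeQL Analyze": 2,
}
after_cleanup = {
    "go-build-and-test": 1,
    "frontend-build": 1,
    "helm-lint": 1,
    "deploy-vendor-e2e": 1,
    "image-and-supply-chain": 1,
    "CodeQL Analyze": 2,
}

total_today = sum(today.values())
total_after = sum(after_cleanup.values())
print(total_today, total_after)
```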
@@ -0,0 +1,53 @@
# CI Pipeline Cleanup — Deliberate Revisions of Bundle II Decisions

This bundle deliberately revises two Bundle II frozen decisions. Both revisions are recorded here for audit trail and acknowledged in the per-Phase commits that implement them.

## Bundle II decision 0.4 → revised by ci-pipeline-cleanup decision 0.5

**Bundle II 0.4 (original):** "IIS e2e strategy — `mcr.microsoft.com/windows/servercore:ltsc2022` Windows containers via Docker Desktop on Windows hosts. Linux CI runners CAN'T run Windows containers, so the IIS e2e suite runs on a separate Windows-runner CI matrix job (or operator's local Windows host for development). Documented limitation."

**ci-pipeline-cleanup 0.5 (revision):** Delete the Windows-runner CI matrix entirely.

**Rationale for revision:**

1. The matrix can't physically work on `windows-latest` GitHub-hosted runners today. Verified via the failure logs from CI run `25183374742` (commit `1de61e9`):
   - `wincertstore` job: `error during connect: ... open //./pipe/docker_engine: The system cannot find the file specified` — Docker daemon not started in Windows-containers mode.
   - `iis` job: image pulled successfully (so the new digest is correct), then died at `failed to create network deploy_certctl-test: could not find plugin bridge in v1 plugin registry: plugin not found` — `bridge` network driver doesn't exist on Windows Docker (uses `nat`).

2. Even if both Docker-daemon and network-driver issues were fixed, the matrix would validate nothing of substance. Verified by source-grep: all 16 functions matching `TestVendorEdge_(IIS|WinCertStore)_*` in `deploy/test/vendor_e2e_phase3_to_13_test.go` are `t.Log` placeholders that exercise no IIS-specific behavior. The real IIS connector validation lives in `internal/connector/target/iis/` unit tests (run on Linux in `go-build-and-test` — already green per push).

3. Bundle II decision 0.14 explicitly required operator manual smoke against a real instance for "verified" status in the vendor matrix. Moving IIS + WinCertStore validation to a documented operator playbook in `docs/connector-iis.md` satisfies that criterion better than a fake CI matrix that passes by skipping.

**Preservation:** the `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` under `profiles: [deploy-e2e-windows]` — operators on a Windows host can opt in via `docker compose --profile deploy-e2e-windows up -d windows-iis-test`. Linux CI never activates this profile.

## Bundle II decision 0.9 → revised by ci-pipeline-cleanup decision 0.4

**Bundle II 0.9 (original):** "CI parallelism — Each vendor e2e gets its own GitHub Actions matrix job. Vendor failures surface independently in the CI status check (operator sees 'K8s 1.31 vendor-edge fail' as a discrete check, not a generic 'integration tests failed')."

**ci-pipeline-cleanup 0.4 (revision):** Single `deploy-vendor-e2e` job replaces the 12-job matrix; per-vendor visibility partially restored via skip-detection guard messages.

**Rationale for revision:**

1. The per-vendor granularity Bundle II decision 0.9 was designed to provide is fake signal. Verified by source-analysis at HEAD:
   ```
   $ grep -cE 't\.Log\(' deploy/test/{vendor_e2e_phase3_to_13,nginx_vendor_e2e}_test.go
   deploy/test/nginx_vendor_e2e_test.go:9
   deploy/test/vendor_e2e_phase3_to_13_test.go:106

   $ awk '/^func TestVendorEdge_/{in_test=1; name=$2; has_assert=0; next}
          in_test && /^}$/ {if (has_assert) print name; in_test=0}
          in_test && /t\.(Fatal|Error|Errorf|Fatalf|Fail|Failf)/ {has_assert=1}' \
       deploy/test/vendor_e2e_phase3_to_13_test.go deploy/test/nginx_vendor_e2e_test.go
   TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E
   ```
   115 of 116 vendor-edge test functions are `t.Log`-only — they spin up a sidecar, log a one-line description of the vendor quirk, and return. Only 1 has a real assertion.

2. Per-vendor status-check granularity costs ~9 sec setup overhead × 12 jobs = ~108 sec of pure runner waste per push (verified from CI run `25183374742` job timings).

3. The single-job version partially restores per-vendor visibility via the skip-detection guard (decision 0.6): if a sidecar fails to start, the affected tests' SKIP names print in the CI output and the build fails. Operators see "TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E SKIPPED: vendor sidecar 'k8s-kind' not reachable" — same per-vendor signal, just no longer rendered as a separate status-check row.
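
The source-grep in rationale item 1 could also be approximated in Python, which makes the classification rule easier to read and extend. This is a sketch under stated assumptions, not the check that produced the 115/116 figure: it only recognizes the standard `testing.T` failure methods (assertion helpers from third-party libraries would need to be added), it relies on gofmt placing a test's closing brace alone in column 0, and the sample Go source is invented.

```python
import re

FUNC_START = re.compile(r"^func (TestVendorEdge_\w+)\(")
# Failure methods of the standard library's testing.T.
ASSERT = re.compile(r"\bt\.(Fatal|Fatalf|Error|Errorf|Fail|FailNow)\b")


def classify(source):
    """Split TestVendorEdge_* functions into (asserting, log_only) name lists."""
    asserting, log_only = [], []
    name, has_assert = None, False
    for line in source.splitlines():
        start = FUNC_START.match(line)
        if start:
            name, has_assert = start.group(1), False
        elif name is not None:
            if line == "}":  # top-level closing brace ends the function
                (asserting if has_assert else log_only).append(name)
                name = None
            elif ASSERT.search(line):
                has_assert = True
    return asserting, log_only


# Invented Go source, for illustration only.
SAMPLE = """\
func TestVendorEdge_Example_LogOnly_E2E(t *testing.T) {
    t.Log("describes a vendor quirk, asserts nothing")
}

func TestVendorEdge_Example_Asserting_E2E(t *testing.T) {
    if err := deploy(); err != nil {
        t.Fatalf("deploy failed: %v", err)
    }
}
"""

asserting, log_only = classify(SAMPLE)
print(len(asserting), "asserting;", len(log_only), "log-only")
```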

**Preservation:** the per-test discoverability via `go test -run 'VendorEdge_<vendor>'` (Bundle II frozen decision 0.6) is unchanged. Only the matrix-jobs-per-vendor part of decision 0.9 is revised; the per-test naming convention stays.

## Forward-looking note

Both revisions are limited in scope to CI execution shape — they do NOT delete the test files, the sidecar definitions, or the documentation that Bundle II shipped. Future work could re-introduce per-vendor matrix jobs if the test bodies are filled in with real assertions (turning the `t.Log` placeholders into actual contract pins). At that point, ci-pipeline-cleanup decision 0.4 and the Bundle II decision 0.9 it revises should be re-evaluated.
@@ -0,0 +1,64 @@
# CI Pipeline Cleanup — Frozen Decisions

> 14 frozen decisions confirmed at Phase 0. Each subsequent Phase references the decision number it implements.

## 0.1 — Trigger model

Three-tier split, no mixing:
- **On push/PR to master:** blocking, fast, every check earns its keep, target <10 min wall-clock.
- **Daily cron + workflow_dispatch:** `security-deep-scan.yml` as-is; slow scans, best-effort, never blocks.
- **On tag push (`v*`):** `release.yml` as-is; cross-platform binaries, ghcr.io push, SLSA provenance.

## 0.2 — Extracted-script location

`scripts/ci-guards/` at repo root. Operator runs `bash scripts/ci-guards/<id>.sh` locally. Contract documented in `scripts/ci-guards/README.md`.

## 0.3 — Coverage threshold YAML format

`.github/coverage-thresholds.yml`. Top-level keys are package paths; each entry has `floor:` (integer pct) + `why:` (multi-line string for load-bearing context). Bash step uses Python (already on the runner) to read the YAML — no `yq` dependency.
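
A minimal sketch of the floor comparison follows. It assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, which would have to be available on the runner) and that measured coverage arrives as a mapping from package path to percentage. The package paths and measured values are hypothetical; the floors of 70 and 75 are example values.

```python
def floor_violations(thresholds, measured):
    """Return (package, measured_pct, floor) for every package below its floor.

    A package with no measurement is reported as a violation, so a missing
    coverage figure cannot silently pass the gate.
    """
    failures = []
    for package, entry in thresholds.items():
        pct = measured.get(package)
        if pct is None or pct < entry["floor"]:
            failures.append((package, pct, entry["floor"]))
    return failures


# Hypothetical parsed thresholds and measurements.
thresholds = {
    "internal/service": {"floor": 70, "why": "example context"},
    "internal/api/handler": {"floor": 75, "why": "example context"},
}
measured = {"internal/service": 71.4, "internal/api/handler": 74.9}

violations = floor_violations(thresholds, measured)
for package, pct, floor in violations:
    print(f"::error::{package} coverage {pct}% is below {floor}%")
```

Keeping the comparison in one function leaves the `why:` text free to be echoed next to each failure if that turns out to be useful.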

## 0.4 — Vendor matrix collapse policy (REVISES Bundle II decision 0.9)

Single `deploy-vendor-e2e` job replaces 12-job matrix. Bundle II decision 0.9 said "Each vendor e2e gets its own GitHub Actions matrix job" — this revision recognizes that 115/116 vendor-edge tests are `t.Log` placeholders, so per-vendor status-check granularity is fake signal. Skip-detection guard partially restores per-vendor visibility (SKIP messages name the vendor). Documented as deliberate revision in `cowork/ci-pipeline-cleanup/decisions-revised.md`.

## 0.5 — Windows IIS validation deletion (REVISES Bundle II decision 0.4)

Delete `deploy-vendor-e2e-windows` matrix entirely. Bundle II decision 0.4 said "the IIS e2e suite runs on a separate Windows-runner CI matrix job" — this revision recognizes that (a) the matrix can't physically work on `windows-latest` (Docker not started in Windows-containers mode; `bridge` driver missing on Windows Docker), and (b) all 16 IIS + WinCertStore tests are `t.Log` placeholders. Move validation to `docs/connector-iis.md::Operator validation playbook` per Bundle II decision 0.14's third criterion. The `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` for operator local use.

## 0.6 — Skip-detection guard semantics + EXPECTED_SKIPS allowlist

After `go test -tags integration -run 'VendorEdge_'`, collect the `^--- SKIP:` lines and fail the build if any skipped test is not on the allowlist. The allowlist covers the 6 JavaKeystore tests in `vendor_e2e_phase3_to_13_test.go` that legitimately `t.Log` without a sidecar. It lives at `scripts/ci-guards/vendor-e2e-skip-allowlist.txt`, one test name per line.
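
The guard's core comparison might look like the Python sketch below. It assumes the test run used verbose output, since `go test` only prints `--- SKIP:` lines with `-v`, and that the allowlist is plain text with one test name per line. The test names and output in the example are invented.

```python
import re

SKIP_LINE = re.compile(r"^--- SKIP: (\S+)")


def unexpected_skips(test_output, allowlist_text):
    """Return skipped test names that are not on the allowlist, in output order."""
    allowed = {line.strip() for line in allowlist_text.splitlines() if line.strip()}
    skipped = []
    for line in test_output.splitlines():
        match = SKIP_LINE.match(line)
        if match:
            skipped.append(match.group(1))
    return [name for name in skipped if name not in allowed]


# Invented `go test -v` output and allowlist content.
OUTPUT = """\
=== RUN   TestVendorEdge_K8s_Example_E2E
--- SKIP: TestVendorEdge_K8s_Example_E2E (0.00s)
=== RUN   TestVendorEdge_JavaKeystore_Example_E2E
--- SKIP: TestVendorEdge_JavaKeystore_Example_E2E (0.00s)
--- PASS: TestVendorEdge_NGINX_Example_E2E (1.20s)
"""
ALLOWLIST = "TestVendorEdge_JavaKeystore_Example_E2E\n"

offenders = unexpected_skips(OUTPUT, ALLOWLIST)
for name in offenders:
    print(f"::error::unexpected SKIP: {name}")
exit_code = 1 if offenders else 0
```

Because the pattern is anchored at column 0, indented subtest SKIP lines are ignored; whether subtests should count is left open here.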

## 0.7 — SA1019 closure approach

Close each site individually with byte-equivalence tests where the deprecated API was load-bearing. Then flip `continue-on-error: true` → `false` in the SAME commit. Do NOT split — shipping the gate without closing sites would fail CI on master. Live verification: `staticcheck ./... 2>&1 | grep -c SA1019` returns 0 BEFORE flipping the gate.

## 0.8 — Image-and-supply-chain placement

Separate top-level job (not steps in `go-build-and-test`). Two reasons: (a) digest-validity needs network egress to multiple registries (Docker Hub, ghcr.io, mcr.microsoft.com), so bundling it into `go-build-and-test` would block the Go tests on registry latency; (b) `docker build` does not depend on the Go tests, so isolating it lets the two run concurrently.

## 0.9 — Coverage PR-comment provider

Default: lightweight self-hosted action that posts a per-PR comment via `gh pr comment`. Avoids paid SaaS. Operator can swap to Codecov/Coveralls later.

## 0.10 — Docker build smoke scope

Build all 4 Dockerfiles in the repo: `Dockerfile`, `Dockerfile.agent`, `deploy/test/f5-mock-icontrol/Dockerfile`, `deploy/test/libest/Dockerfile`. The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a syntax error there silently breaks the e2e suite. Tagged `:smoke` and discarded.

## 0.11 — OpenAPI ↔ handler parity exception YAML

NEW `api/openapi-handler-exceptions.yaml`. Schema: `documented_exceptions:` list of `{route, why}` entries. The 13-route gap at HEAD is root-caused in Phase 9; most are likely health probes / metrics / SCEP-EST-OCSP wire endpoints that legitimately have no operationId.
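
To show how the exception list could feed a parity check, here is a small Python sketch. It assumes registered routes and OpenAPI operations have both been normalized to comparable strings such as `"GET /health"`, which this decision does not specify; the routes and reasons shown are hypothetical.

```python
def parity_gaps(registered, documented, exceptions):
    """Return (missing, stale).

    missing: registered routes with neither an OpenAPI entry nor an exception.
    stale:   exception entries whose route is no longer registered.
    """
    excepted = {entry["route"] for entry in exceptions}
    missing = sorted(set(registered) - set(documented) - excepted)
    stale = sorted(excepted - set(registered))
    return missing, stale


# Hypothetical inputs; `exceptions` mirrors the documented_exceptions schema.
registered = {"GET /api/v1/certificates", "GET /health", "GET /metrics"}
documented = {"GET /api/v1/certificates"}
exceptions = [
    {"route": "GET /health", "why": "liveness probe, no operationId by design"},
    {"route": "GET /old-endpoint", "why": "route has since been removed"},
]

missing, stale = parity_gaps(registered, documented, exceptions)
print("missing:", missing)
print("stale:", stale)
```

Reporting stale entries alongside missing ones keeps the exception file from accumulating routes that no longer exist.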

## 0.12 — Branch-protection-rule update timing

Operator updates GitHub branch-protection rules in Phase 13 AFTER the new pipeline ships and runs green on a feature branch + on the first push to master. Required-checks list changes from 19 → 7 entries. Operator action only — agent cannot do this.

## 0.13 — Make-target naming for new operator-side scripts

- `make verify` (existing) — required pre-commit; gofmt + vet + lint + tests
- `make verify-deploy` (new) — optional pre-push; digest-validity + OpenAPI parity + docker build smoke (server + agent only — fast subset for local)
- `make verify-docs` (new) — required pre-tag; QA-doc Part-count + seed-count drift

## 0.14 — RAM headroom verification methodology

Phase 0 deliverable. Operator creates `prototype/ci-pipeline-cleanup-vendor-collapse` branch, runs the collapsed `deploy-vendor-e2e` job once, captures peak RSS via `docker stats --no-stream` snapshots every 30 sec, records max in this baseline doc. If max > 12 GB (75% of 16 GB ceiling), fall back to bucketed matrix (3 jobs × ~4 sidecars). If max ≤ 12 GB, single-job collapse is approved.
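
The snapshot arithmetic could be done with a short Python script like the sketch below. It assumes each snapshot was captured with `docker stats --no-stream --format '{{.MemUsage}}'`, giving one `used / limit` string per container, and it treats the summed container usage as a proxy for peak RSS and the 12 GB threshold as 12 GiB. Both are simplifications, and the sample values are invented.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
# Matches the "used" half of a MemUsage string such as "1.5GiB / 15.6GiB".
MEM = re.compile(r"^\s*([\d.]+)\s*(B|KiB|MiB|GiB|TiB)\s*/")


def snapshot_bytes(mem_usage_lines):
    """Sum the used memory across all containers in one snapshot."""
    total = 0.0
    for line in mem_usage_lines:
        match = MEM.match(line)
        if match:
            total += float(match.group(1)) * UNITS[match.group(2)]
    return total


def peak_gib(snapshots):
    """Largest per-snapshot total, in GiB."""
    return max(snapshot_bytes(s) for s in snapshots) / 2**30


# Invented snapshots taken 30 seconds apart.
snapshots = [
    ["1.5GiB / 15.6GiB", "512MiB / 15.6GiB"],
    ["6GiB / 15.6GiB", "3GiB / 15.6GiB", "1GiB / 15.6GiB"],
]

peak = peak_gib(snapshots)
approved = peak <= 12
print(f"peak {peak:.1f} GiB, single-job collapse approved: {approved}")
```

Sampling every 30 seconds can miss short spikes, so a result close to the threshold probably deserves a second run.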