certctl

gsadmin/certctl

Fork 0

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 14:51:30 +00:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
shankar0123	c4157fd196	fix(deploy/test) + ci(guard): unblock deploy-vendor-e2e — encryption-key length Two-part complete-path fix for the deploy-vendor-e2e failure that has been firing since the ci-pipeline-cleanup Phase 5 matrix collapse started actually booting the certctl-test-server: Failed to load configuration: CERTCTL_CONFIG_ENCRYPTION_KEY too short (29 bytes; minimum 32). Surfaced via the diagnostic-dump step landed in commit `3b96b35` — the server panicked on startup, Docker restarted it endlessly, compose reported the dependency-chain symptom ("container certctl-test-server is unhealthy"), but the actual cause was invisible in the previous CI output. With the dump in place, the next failing run named the problem in one line. Root cause. The H-1 audit-closure master commit `3e78ecb` ("feat(security): bodyLimit on noAuth + security headers + encryption- key validation (H-1 master)") added internal/config/config.go's minEncryptionKeyLength = 32 byte floor + 5 unit tests that pin it. The closure was incomplete: it never enforced the rule against the literal CERTCTL_CONFIG_ENCRYPTION_KEY values certctl's own deploy/docker-compose.yml files pass. Pre-Phase-5 the test stack didn't fully exercise the validator (the per-vendor matrix didn't boot certctl-test-server in every job), so the gap was silent. deploy/docker-compose.test.yml's literal value `test-encryption-key-32chars!!` was 29 bytes — the name claimed 32 but the author miscounted (4+1+10+1+3+1+2+5+2 = 29). Pattern matches every fix in this CI-stabilization sequence: pre-existing latent bug that the old CI structurally hid. Part 1 — direct fix (deploy/docker-compose.test.yml): Replace the 29-byte literal with a clearly test-only, self-documenting 49-byte value (`test-encryption-key-deterministic- 32-byte-fixture`). 17 bytes of safety margin so a future tightening of the floor (32 → 33+) doesn't break this fixture again. Inline comment block explains the byte-budget contract + points at the H-1 closure commit. Production deploy/docker-compose.yml's default (`change-me-32-char-encryption-key`) is exactly 32 bytes — passes by 1 byte but on the edge; not touched here because operators are already told to override it via env (`${VAR:-default}`). Part 2 — structural fix (scripts/ci-guards/H-1-encryption-key-min- length.sh): New regression guard. Scans every deploy/docker-compose.yml for literal CERTCTL_CONFIG_ENCRYPTION_KEY values + values inside ${VAR:-default} expansions, checks each against the 32-byte floor, fails CI with `::error::` annotation pointing at the offending file:line if any literal regresses. Bare ${VAR} env references with no default are skipped — those are operator-supplied at runtime and the validator handles them at boot. Verified manually: - Clean repo: `H-1-encryption-key-min-length: clean.` (exit 0) - 5-byte regression: emits proper ::error:: annotation, exit 1 - Restore: clean again (exit 0) CI auto-picks up the new guard via the `for g in scripts/ci-guards/*.sh; do bash "$g"; done` loop in ci.yml's Regression guards step (no ci.yml change required). scripts/ci-guards/README.md updated: 20 → 21 guards, new row explaining the closure rationale. The structural piece is the more important half of this fix. The direct fix unblocks today's CI; the guard prevents the same class of drift from ever recurring silently. Future audit closures that add new validation rules to internal/config/config.go now have a working template for the matching CI guard — drop a sibling .sh in the ci-guards directory. Bonus — what the diagnostic-dump step (`3b96b35`) bought us. Before that step landed, the same failure looked like an opaque "container unhealthy" with no actionable signal. With it, the actual error message + the offending env var + the exact byte count came out in one CI run. The diagnostic infrastructure paid for itself within one push.	2026-05-01 00:57:43 +00:00
shankar0123	7b8cadcd02	refactor(scripts): move CI helpers out of scripts/ci-guards/ The 'Regression guards' loop step in ci.yml runs: for g in scripts/ci-guards/.sh; do bash "$g"; done Per the directory's own contract (scripts/ci-guards/README.md), every script there MUST be runnable bare with no args / no env. Three files violated that contract — they're helpers consumed by specific CI job steps with arguments, not regression guards. They were misplaced. Moved (git mv): scripts/ci-guards/vendor-e2e-skip-check.sh → scripts/ scripts/ci-guards/vendor-e2e-skip-allowlist.txt → scripts/ scripts/ci-guards/coverage-pr-comment.sh → scripts/ Updated ci.yml call sites: - deploy-vendor-e2e job: bash scripts/vendor-e2e-skip-check.sh $LOG - go-build-and-test job: bash scripts/coverage-pr-comment.sh Tightened scripts/vendor-e2e-skip-check.sh arg parse from a silent default ('LOG=${1:-test-output.log}') to a mandatory-arg form ('LOG=${1:?usage: ...}') so misuse fails loud at parse time rather than at the missing-file check. Updated scripts/ci-guards/README.md contract to spell out the guard-vs-helper distinction explicitly; lists current helpers under scripts/ for future-author guidance. Verified locally: 'for g in scripts/ci-guards/.sh; do bash $g; done' returns clean (22 guards pass) on HEAD post-move. Closes the regression-guards-loop failure that surfaced in CI run 25192163943 (job 73864471346 'Frontend Build').	2026-04-30 22:37:12 +00:00
shankar0123	1caedd5fd3	ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/ Bundle: ci-pipeline-cleanup, Phase 1. Pure relocation — no behavior change. Each guard's bash logic is byte-identical to the prior inline version; the only changes are: (a) the guard becomes a sibling script under scripts/ci-guards/<id>.sh, (b) ci.yml's per-guard step is replaced by a single loop step that iterates all scripts. 20 scripts extracted (alphabetized): B-1-orphan-crud.sh, D-1-D-2-statusbadge-phantom.sh, G-1-jwt-auth-literal.sh, G-2-api-key-hash-json.sh, G-3-env-docs-drift.sh, H-001-bare-from.sh, H-009-readme-jwt.sh, L-001-insecure-skip-verify.sh, L-1-bulk-action-loop.sh, M-012-no-root-user.sh, P-1-documented-orphan-fns.sh, S-1-hardcoded-source-counts.sh, S-2-strings-contains-err.sh, T-1-frontend-page-coverage.sh, U-2-plaintext-healthcheck.sh, U-3-migration-mount.sh, bundle-8-L-015-target-blank-rel-noopener.sh, bundle-8-L-019-dangerously-set-inner-html.sh, bundle-8-M-009-bare-usemutation.sh, test-naming-convention.sh Plus scripts/ci-guards/README.md documenting the contract: - Each script must exit 0 on clean repo, non-zero with ::error:: prefix on regression - Runnable from repo root via 'bash scripts/ci-guards/<id>.sh' - Adding a new guard: drop a new <id>.sh; CI auto-picks it up ci.yml dropped 1488 → 557 lines (-931, -63%). Single CI loop step now collects ALL guard failures before failing the build instead of fail-fast — UX win for regressions that hit two guards at once. Two guards (QA-doc Part-count + seed-count, ci.yml lines 868-917) deliberately NOT extracted — they move to 'make verify-docs' in Phase 11 because they protect docs-the-operator-reads, not the product itself. Verification (sandbox): - All 20 scripts pass against HEAD (chmod +x; for g in scripts/ci-guards/*.sh; do bash $g; done) - New ci.yml YAML-parses cleanly - Job boundaries preserved: go-build-and-test, frontend-build, helm-lint, deploy-vendor-e2e, deploy-vendor-e2e-windows - Loop step appears twice (once at end of go-build-and-test, once at end of frontend-build) so both jobs continue running their set of guards	2026-04-30 20:36:26 +00:00

shankar0123

c4157fd196

fix(deploy/test) + ci(guard): unblock deploy-vendor-e2e — encryption-key length

Two-part complete-path fix for the deploy-vendor-e2e failure that has
been firing since the ci-pipeline-cleanup Phase 5 matrix collapse
started actually booting the certctl-test-server:

    Failed to load configuration:
    CERTCTL_CONFIG_ENCRYPTION_KEY too short (29 bytes; minimum 32).

Surfaced via the diagnostic-dump step landed in commit 3b96b35 — the
server panicked on startup, Docker restarted it endlessly, compose
reported the dependency-chain symptom ("container certctl-test-server
is unhealthy"), but the actual cause was invisible in the previous
CI output. With the dump in place, the next failing run named the
problem in one line.

Root cause. The H-1 audit-closure master commit 3e78ecb
("feat(security): bodyLimit on noAuth + security headers + encryption-
key validation (H-1 master)") added internal/config/config.go's
minEncryptionKeyLength = 32 byte floor + 5 unit tests that pin it.
The closure was incomplete: it never enforced the rule against the
literal CERTCTL_CONFIG_ENCRYPTION_KEY values certctl's own
deploy/docker-compose*.yml files pass. Pre-Phase-5 the test stack
didn't fully exercise the validator (the per-vendor matrix didn't
boot certctl-test-server in every job), so the gap was silent.
deploy/docker-compose.test.yml's literal value
`test-encryption-key-32chars!!` was 29 bytes — the name claimed 32
but the author miscounted (4+1+10+1+3+1+2+5+2 = 29). Pattern matches
every fix in this CI-stabilization sequence: pre-existing latent bug
that the old CI structurally hid.

Part 1 — direct fix (deploy/docker-compose.test.yml):

  Replace the 29-byte literal with a clearly test-only,
  self-documenting 49-byte value (`test-encryption-key-deterministic-
  32-byte-fixture`). 17 bytes of safety margin so a future tightening
  of the floor (32 → 33+) doesn't break this fixture again. Inline
  comment block explains the byte-budget contract + points at the
  H-1 closure commit. Production deploy/docker-compose.yml's default
  (`change-me-32-char-encryption-key`) is exactly 32 bytes — passes
  by 1 byte but on the edge; not touched here because operators are
  already told to override it via env (`${VAR:-default}`).

Part 2 — structural fix (scripts/ci-guards/H-1-encryption-key-min-
length.sh):

  New regression guard. Scans every deploy/docker-compose*.yml for
  literal CERTCTL_CONFIG_ENCRYPTION_KEY values + values inside
  ${VAR:-default} expansions, checks each against the 32-byte floor,
  fails CI with `::error::` annotation pointing at the offending
  file:line if any literal regresses. Bare ${VAR} env references with
  no default are skipped — those are operator-supplied at runtime
  and the validator handles them at boot.

  Verified manually:
    - Clean repo: `H-1-encryption-key-min-length: clean.` (exit 0)
    - 5-byte regression: emits proper ::error:: annotation, exit 1
    - Restore: clean again (exit 0)

  CI auto-picks up the new guard via the `for g in
  scripts/ci-guards/*.sh; do bash "$g"; done` loop in ci.yml's
  Regression guards step (no ci.yml change required).

  scripts/ci-guards/README.md updated: 20 → 21 guards, new row
  explaining the closure rationale.

The structural piece is the more important half of this fix. The
direct fix unblocks today's CI; the guard prevents the same class of
drift from ever recurring silently. Future audit closures that add
new validation rules to internal/config/config.go now have a working
template for the matching CI guard — drop a sibling .sh in the
ci-guards directory.

Bonus — what the diagnostic-dump step (3b96b35) bought us. Before
that step landed, the same failure looked like an opaque "container
unhealthy" with no actionable signal. With it, the actual error
message + the offending env var + the exact byte count came out in
one CI run. The diagnostic infrastructure paid for itself within one
push.

2026-05-01 00:57:43 +00:00

shankar0123

7b8cadcd02

refactor(scripts): move CI helpers out of scripts/ci-guards/

The 'Regression guards' loop step in ci.yml runs:
    for g in scripts/ci-guards/*.sh; do bash "$g"; done

Per the directory's own contract (scripts/ci-guards/README.md), every
script there MUST be runnable bare with no args / no env. Three files
violated that contract — they're helpers consumed by specific CI job
steps with arguments, not regression guards. They were misplaced.

Moved (git mv):
  scripts/ci-guards/vendor-e2e-skip-check.sh         → scripts/
  scripts/ci-guards/vendor-e2e-skip-allowlist.txt    → scripts/
  scripts/ci-guards/coverage-pr-comment.sh           → scripts/

Updated ci.yml call sites:
  - deploy-vendor-e2e job: bash scripts/vendor-e2e-skip-check.sh $LOG
  - go-build-and-test job: bash scripts/coverage-pr-comment.sh

Tightened scripts/vendor-e2e-skip-check.sh arg parse from a silent
default ('LOG=${1:-test-output.log}') to a mandatory-arg form
('LOG=${1:?usage: ...}') so misuse fails loud at parse time rather
than at the missing-file check.

Updated scripts/ci-guards/README.md contract to spell out the
guard-vs-helper distinction explicitly; lists current helpers under
scripts/ for future-author guidance.

Verified locally: 'for g in scripts/ci-guards/*.sh; do bash $g; done'
returns clean (22 guards pass) on HEAD post-move.

Closes the regression-guards-loop failure that surfaced in CI run
25192163943 (job 73864471346 'Frontend Build').

2026-04-30 22:37:12 +00:00

shankar0123

1caedd5fd3

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

Bundle: ci-pipeline-cleanup, Phase 1.

Pure relocation — no behavior change. Each guard's bash logic is
byte-identical to the prior inline version; the only changes are:
(a) the guard becomes a sibling script under scripts/ci-guards/<id>.sh,
(b) ci.yml's per-guard step is replaced by a single loop step that
iterates all scripts.

20 scripts extracted (alphabetized):
  B-1-orphan-crud.sh, D-1-D-2-statusbadge-phantom.sh,
  G-1-jwt-auth-literal.sh, G-2-api-key-hash-json.sh,
  G-3-env-docs-drift.sh, H-001-bare-from.sh, H-009-readme-jwt.sh,
  L-001-insecure-skip-verify.sh, L-1-bulk-action-loop.sh,
  M-012-no-root-user.sh, P-1-documented-orphan-fns.sh,
  S-1-hardcoded-source-counts.sh, S-2-strings-contains-err.sh,
  T-1-frontend-page-coverage.sh, U-2-plaintext-healthcheck.sh,
  U-3-migration-mount.sh, bundle-8-L-015-target-blank-rel-noopener.sh,
  bundle-8-L-019-dangerously-set-inner-html.sh,
  bundle-8-M-009-bare-usemutation.sh, test-naming-convention.sh

Plus scripts/ci-guards/README.md documenting the contract:
- Each script must exit 0 on clean repo, non-zero with ::error::
  prefix on regression
- Runnable from repo root via 'bash scripts/ci-guards/<id>.sh'
- Adding a new guard: drop a new <id>.sh; CI auto-picks it up

ci.yml dropped 1488 → 557 lines (-931, -63%).

Single CI loop step now collects ALL guard failures before failing
the build instead of fail-fast — UX win for regressions that hit
two guards at once.

Two guards (QA-doc Part-count + seed-count, ci.yml lines 868-917)
deliberately NOT extracted — they move to 'make verify-docs' in
Phase 11 because they protect docs-the-operator-reads, not the
product itself.

Verification (sandbox):
- All 20 scripts pass against HEAD (chmod +x; for g in scripts/ci-guards/*.sh; do bash $g; done)
- New ci.yml YAML-parses cleanly
- Job boundaries preserved: go-build-and-test, frontend-build,
  helm-lint, deploy-vendor-e2e, deploy-vendor-e2e-windows
- Loop step appears twice (once at end of go-build-and-test, once
  at end of frontend-build) so both jobs continue running their
  set of guards

2026-04-30 20:36:26 +00:00

3 Commits