mirror of https://github.com/shankar0123/certctl.git synced 2026-07-26 14:48:13 +00:00

shankar0123 8191b1ee64 scheduler+db: close Phase 6 — scale hardening across pool, jitter, ETag, asyncpoll

Phase 6 of the certctl architecture diligence remediation. Five
findings across the same scheduler-and-DB-pool surface.

SCALE-M1 (Med) — DB pool default bumped 25 → 50
  internal/config/config.go line 1972:
    MaxConnections: getEnvInt("CERTCTL_DATABASE_MAX_CONNS", 50)
  Postgres default max_connections is 100; 50 leaves headroom for
  pg_dump + ad-hoc psql + a server replica without exhausting the
  DB-side cap. Operator override env var unchanged. Operator-tune
  ladder for larger fleets (5K / 50K certs) lives in
  docs/operator/scale.md as starter values pending Phase 8 load
  tests — explicitly marked TBD.

SCALE-M3 (Med) — async-CA poll budget operator-configurable
  The live code had already shipped part of this: all 4 async-CA
  connectors (digicert, entrust, globalsign, sectigo) already have
  per-connector CERTCTL_<NAME>_POLL_MAX_WAIT_SECONDS (Audit fix #5
  closed pre-Phase-6). What was missing: a global package-default
  override. Shipped:
    - internal/connector/issuer/asyncpoll/asyncpoll.go gains
      SetDefaultMaxWait(d) + effectiveDefaultMaxWait var + the
      currentDefaultMaxWait() priority resolver.
    - cmd/server/main.go reads CERTCTL_ASYNC_POLL_MAX_WAIT_SECONDS
      at boot and calls SetDefaultMaxWait.
    - deploy/ENVIRONMENTS.md documents the new env var (G-3 guard
      green).
  Naming deviation from the prompt's CERTCTL_ASYNC_POLL_MAX_ATTEMPTS:
  the live code tracks wall-clock time (MaxWait), not attempt count.
  Matched the existing per-connector nomenclature (_POLL_MAX_WAIT_SECONDS)
  so the priority chain reads naturally.

SCALE-M5 (Med) — JitteredTicker wrapper for all 15 scheduler loops
  internal/scheduler/jitter.go ships NewJitteredTicker(interval,
  jitterPct) + DefaultSchedulerJitter (±10%). All 15 sites in
  internal/scheduler/scheduler.go migrated from bare time.NewTicker
  to NewJitteredTicker(interval, DefaultSchedulerJitter). Base
  intervals are unchanged; each tick is shifted by a random offset
  within ±10% of the interval, so loops that share a nominal cadence
  don't co-fire and spike CPU + DB load at wall-clock boundaries.

  internal/scheduler/jitter_test.go pins:
    - Bounded envelope (each tick within ±jitterPct of interval)
    - Mean drift < 30% of nominal (sign-bug detector)
    - Stop() releases the goroutine + closes C
    - Stop() idempotent (no panic on repeat)
    - Zero-jitter behaves like time.NewTicker
    - Negative and >=1 jitterPct values clamped defensively

  CI guard scripts/ci-guards/no-bare-newticker-in-scheduler.sh blocks
  any future bare time.NewTicker in scheduler.go.

SCALE-L1 (Low) — renewal-sweep semaphore behavior documented
  docs/operator/scale.md "Scheduler tick budgets" section explains
  the per-tick concurrency semaphore (CERTCTL_RENEWAL_CONCURRENCY=25
  default), the ctx-cancellation drain on tick-budget overrun, and
  operator tuning advice (raise concurrency + DB pool together).
  No code change — the behavior is defensible as-is per the audit.

SCALE-L2 (Low) — ETag middleware for top-5 read endpoints
  internal/api/middleware/etag.go computes SHA-256 ETag over the
  buffered response body, respects If-None-Match, short-circuits
  to 304 Not Modified on match. GET/HEAD only; non-2xx responses
  pass through unchanged. 64 KiB buffer cap degrades gracefully on
  oversized responses (no caching, body still flushes intact).

  Wired around the top-5 read endpoints via etagged() helper in
  internal/api/router/router.go:
    GET /api/v1/certificates
    GET /api/v1/agents
    GET /api/v1/jobs
    GET /api/v1/audit
    GET /api/v1/discovered-certificates

  internal/api/middleware/etag_test.go pins 11 behaviors including
  304-on-repeat, 200-after-mutation-with-new-ETag, POST bypass,
  4xx/5xx pass-through, oversized-response degradation, wildcard
  match, HEAD-treated-like-GET, byte-equal pass-through.

Cross-cutting fixes:
  - internal/config/config_test.go::TestLoad_DefaultValues updated
    to assert the new 50 default (was 25).
  - deploy/helm/certctl/values.yaml comment corrected — agent
    pollInterval is hardcoded 30s, not env-configurable; the
    Phase 4 comment mistakenly referenced CERTCTL_AGENT_POLL_INTERVAL
    which G-3 caught as a phantom env var.
  - asyncpoll.go reformatted by gofmt; functionally unchanged.

Verification (all pass):
  grep -nE 'SetMaxOpenConns' internal/repository/postgres/db.go    # finds 1 site
  grep -nE 'CERTCTL_DATABASE_MAX_CONNS.*50' internal/config/config.go  # config default is 50
  grep -rnE 'CERTCTL_ASYNC_POLL_MAX_WAIT_SECONDS' internal/ deploy/ENVIRONMENTS.md  # wired
  grep -cE 'time\.NewTicker\(' internal/scheduler/scheduler.go    # 0 (all migrated)
  grep -cE 'JitteredTicker' internal/scheduler/scheduler.go         # 15
  ls internal/scheduler/jitter.go internal/api/middleware/etag.go   # both exist
  ls docs/operator/scale.md                                          # exists
  bash scripts/ci-guards/no-bare-newticker-in-scheduler.sh          # clean
  bash scripts/ci-guards/G-3-env-docs-drift.sh                      # clean
  go test ./internal/scheduler/ ./internal/api/middleware/ \
    ./internal/connector/issuer/asyncpoll/ ./internal/config/       # 4/4 packages green

Closes: cowork/certctl-architecture-diligence-audit.html#fix-SCALE-M1
        cowork/certctl-architecture-diligence-audit.html#fix-SCALE-M3
        cowork/certctl-architecture-diligence-audit.html#fix-SCALE-M5
        cowork/certctl-architecture-diligence-audit.html#fix-SCALE-L1
        cowork/certctl-architecture-diligence-audit.html#fix-SCALE-L2

2026-05-14 01:23:03 +00:00

B2-compose-base-no-demo-env.sh

fix(security): close BUNDLE 2 — safe first run, demo mode, agent bootstrap

2026-05-13 00:14:59 +00:00

B3-helm-chart-coherence.sh

fix(helm): close BUNDLE 3 — Helm chart hardening + enterprise deploy

2026-05-13 00:40:42 +00:00

B6-no-private-keys-in-tree.sh

docs(b6): secret-custody reference + config-encryption upgrade runbook + private-key CI guard

2026-05-13 01:48:40 +00:00

B-1-orphan-crud.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-1-compat-regression.sh

auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +

2026-05-10 06:22:25 +00:00

bundle-1-to-2-upgrade-regression.sh

auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +

2026-05-10 06:22:25 +00:00

bundle-8-L-015-target-blank-rel-noopener.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-8-L-019-dangerously-set-inner-html.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-8-M-009-bare-usemutation.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

complete-path-config-coverage-exceptions.yaml

feat(ci): item-1 complete-path config-coverage guard (PARTIAL — sandbox could not verify Go test)

2026-05-12 14:02:04 +00:00

complete-path-config-coverage.sh

feat(ci): item-1 complete-path config-coverage guard (PARTIAL — sandbox could not verify Go test)

2026-05-12 14:02:04 +00:00

cors-wildcard-allowlist.sh

fix(api/cors): narrow Bundle-2 routes from wildcard to NewCORS(corsCfg)

2026-05-10 20:12:19 +00:00

D-1-D-2-statusbadge-phantom.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

digest-validity.sh

ci: add exponential-backoff retry to digest-validity guard

2026-05-13 20:17:08 +00:00

doc-rot-detector-exceptions.yaml

feat(ci): item-5 doc rot detector (90d warn / 120d fail)

2026-05-12 14:10:27 +00:00

doc-rot-detector.sh

feat(ci): item-5 doc rot detector (90d warn / 120d fail)

2026-05-12 14:10:27 +00:00

G-1-jwt-auth-literal.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

G-2-api-key-hash-json.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

G-3-env-docs-drift.sh

docs: remove internal engineering docs; docs must be tool- or story-relevant

2026-05-13 02:44:27 +00:00

H-1-encryption-key-min-length.sh

fix(deploy/test) + ci(guard): unblock deploy-vendor-e2e — encryption-key length

2026-05-01 00:57:43 +00:00

H-001-bare-from.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

H-009-readme-jwt.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

helm-templates-lint.sh

deploy(helm): close Phase 4 — chart surface + DR + ops runbooks

2026-05-14 00:58:00 +00:00

L-1-bulk-action-loop.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

L-001-insecure-skip-verify.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

M-012-no-root-user.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

multi-tenant-query-coverage.sh

chore(ci-guards): close 4 CI-guard regressions surfaced by v2.1.0 release-gate Phase 5

2026-05-11 14:19:35 +00:00

N-bundle-2-security-empty-preserved.sh

auth-bundle-2 Phase 5: OIDC + session HTTP surface (13 endpoints),

2026-05-10 06:08:27 +00:00

no-bare-newticker-in-scheduler.sh

scheduler+db: close Phase 6 — scale hardening across pool, jitter, ETag, asyncpoll

2026-05-14 01:23:03 +00:00

no-change-me-in-prod-compose.sh

config: default hardening + operator docs (Phase 2 closure — SEC-H1, SEC-H3, SEC-M4, DEPL-H1, DEPL-M2 + doc-only carve-outs)

2026-05-13 19:50:00 +00:00

no-new-synthetic-admin.sh

harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8)

2026-05-11 11:45:54 +00:00

no-precompiled-binary.sh

ci: supply-chain hardening (Phase 1 closure — RED-1, RED-2, TEST-L2)

2026-05-13 19:30:53 +00:00

no-tag-pinned-actions.sh

ci: supply-chain hardening (Phase 1 closure — RED-1, RED-2, TEST-L2)

2026-05-13 19:30:53 +00:00

no-todo-in-prod.sh

ci: floor raise + doc drift (Phase 3 closure — TEST-H1/H2/M1/M2/M3/M4/L1, ARCH-H3/L1/L2/L3/L4)

2026-05-13 20:10:08 +00:00

openapi-codegen-drift.sh

ci: OpenAPI parity reconciliation + codegen scaffolding (Phase 5 — ARCH-H1 / ARCH-M6)

2026-05-13 20:24:20 +00:00

openapi-handler-parity.sh

ci: OpenAPI parity reconciliation + codegen scaffolding (Phase 5 — ARCH-H1 / ARCH-M6)

2026-05-13 20:24:20 +00:00

P-1-documented-orphan-fns.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

README.md

docs: remove internal engineering docs; docs must be tool- or story-relevant

2026-05-13 02:44:27 +00:00

S-1-hardcoded-source-counts.sh

docs: remove internal engineering docs; docs must be tool- or story-relevant

2026-05-13 02:44:27 +00:00

S-2-strings-contains-err.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

skip-inventory-drift.sh

ci: floor raise + doc drift (Phase 3 closure — TEST-H1/H2/M1/M2/M3/M4/L1, ARCH-H3/L1/L2/L3/L4)

2026-05-13 20:10:08 +00:00

surface-parity-mcp-exemptions.yaml

feat(ci): item-2 cross-surface contract parity (stdlib-only package)

2026-05-12 14:09:32 +00:00

T-1-frontend-page-coverage.sh

web, docs: IssuerHierarchyPage + sysadmin runbook + connectors row (Rank 8 commit 5)

2026-05-04 02:33:48 +00:00

test-compose-scep-coherence.sh

fix(deploy/test) + ci(guard): drop dead SCEP profile from test compose

2026-05-01 01:39:18 +00:00

test-naming-convention.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

U-2-plaintext-healthcheck.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

U-3-migration-mount.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

README.md

`scripts/ci-guards/` — Regression-guard scripts

Each <id>.sh script in this directory guards one closed audit finding against regression. CI runs the full set on every push via the Regression guards step in .github/workflows/ci.yml. Operators can run any script locally:

bash scripts/ci-guards/G-3-env-docs-drift.sh

Contract

Every script in this directory MUST:

Exit with code 0 on a clean repo (no regression present).
Exit with a non-zero code on regression, and emit a ::error:: annotation prefix so PR reviewers see the failing line in the GitHub Actions UI.
Be runnable from repo root via bash scripts/ci-guards/<id>.sh with NO arguments and NO env-var requirements. The CI loop step (for g in scripts/ci-guards/*.sh; do bash "$g"; done) iterates every .sh here without args; any script that requires an arg or env var WILL fail in that loop.
Carry a head-comment block matching the in-source justification from the original ci.yml entry: the audit-finding reference, the closure rationale, the exempt-surface list (if any).
Use set -e early to fail-fast on internal command errors.
Produce no output on the happy path beyond a final echo "<id>: clean." confirmation line.
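
The contract above can be sketched as a skeleton. This is a minimal illustration, not one of the guards in this directory: the ID `X-000-example`, the token `FORBIDDEN_EXAMPLE_TOKEN`, and the `scan_tree` helper are all hypothetical placeholders, and the scan root `internal` is only an assumed example of a fixed repo-relative path.

```shell
#!/usr/bin/env bash
# X-000-example.sh (hypothetical guard; ID, finding, and pattern are illustrative)
#
# Audit finding:     X-000 (placeholder, not a real finding)
# Closure rationale: FORBIDDEN_EXAMPLE_TOKEN must not reappear under internal/
# Exempt surfaces:   none
set -e

GUARD_ID="X-000-example"
PATTERN="FORBIDDEN_EXAMPLE_TOKEN"

# scan_tree DIR: print one ::error:: annotation per match; return 1 if any match.
scan_tree() {
  local dir hits
  dir="$1"
  # A missing directory has nothing to regress, so it counts as clean.
  [ -d "$dir" ] || return 0
  # grep exits 1 when nothing matches; "|| true" keeps set -e from aborting.
  hits="$(grep -rnF -- "$PATTERN" "$dir" || true)"
  [ -z "$hits" ] && return 0
  # grep -rn prints "file:line:content"; split on the first two colons.
  printf '%s\n' "$hits" | while IFS=: read -r file line rest; do
    echo "::error file=${file},line=${line}::${GUARD_ID}: ${PATTERN} reintroduced"
  done
  return 1
}

# No arguments and no env vars: the scan root is fixed relative to repo root.
scan_tree "internal" || exit 1
echo "${GUARD_ID}: clean."
```

Keeping the scan in a function makes the happy path a single confirmation line while leaving the matching logic easy to exercise against a scratch directory.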

Helpers vs guards

Scripts that consume input artifacts (a test-output log, a coverage.out file) or env vars (PR_NUMBER, GH_TOKEN) are HELPERS, not guards. They live in scripts/, NOT scripts/ci-guards/.

Current helpers:

scripts/vendor-e2e-skip-check.sh: consumes the test-output.log arg from the deploy-vendor-e2e job
scripts/coverage-pr-comment.sh: consumes coverage.out + PR_NUMBER + GH_TOKEN env from the go-build-and-test job
scripts/check-coverage-thresholds.sh: consumes coverage.out + .github/coverage-thresholds.yml

Adding a new guard

Drop a new <id>.sh in this directory with the head-comment block describing the audit finding it closes.
Make it executable: chmod +x scripts/ci-guards/<id>.sh.
Verify it fails on a deliberate regression and passes on clean repo.
CI auto-picks up new scripts via the for g in scripts/ci-guards/*.sh loop in the Regression guards step — no ci.yml change required.
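
The "fails on a deliberate regression, passes on a clean repo" step can be made repeatable with a small helper. The sketch below is an assumption about how one might do it, not a script that ships in this repo; `check_guard` is a hypothetical name.

```shell
# check_guard EXPECTATION SCRIPT
# Runs a guard from the current directory and compares its outcome with
# EXPECTATION, which is either "clean" or "regression".
check_guard() {
  local expectation script status
  expectation="$1"
  script="$2"
  status=0
  bash "$script" >/dev/null 2>&1 || status=$?
  case "$expectation" in
    clean)      [ "$status" -eq 0 ] ;;
    regression) [ "$status" -ne 0 ] ;;
    *)          echo "unknown expectation: $expectation" >&2; return 2 ;;
  esac
}

# Typical manual sequence from repo root:
#   check_guard clean scripts/ci-guards/<id>.sh        # before planting anything
#   ...plant the regression in a scratch branch or worktree...
#   check_guard regression scripts/ci-guards/<id>.sh   # the guard must trip
#   ...revert the planted edit...
```

Checking both directions matters: a guard that never fails is as uninformative as one that always fails.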

Guards in this directory

Count: re-derive on demand via ls scripts/ci-guards/*.sh | wc -l. The table below names each one — keep it in sync as guards are added.

Per-finding regression guards

| ID | Finding | Catches |
|---|---|---|
| `G-1-jwt-auth-literal` | G-1 JWT silent auth downgrade | `"jwt"` literal in additive auth-type surfaces |
| `L-001-insecure-skip-verify` | L-001 unjustified InsecureSkipVerify | `InsecureSkipVerify: true` without `//nolint:gosec` |
| `H-001-bare-from` | H-001 (CWE-829) tag-swap attack | Bare `FROM` line without `@sha256` digest pin |
| `M-012-no-root-user` | M-012 (CWE-250) container-as-root | Dockerfile missing terminal `USER <non-root>` |
| `H-009-readme-jwt` | H-009 README JWT advertising | README.md re-introducing JWT-as-supported claim |
| `G-2-api-key-hash-json` | G-2 cat-s5-apikey_leak | `api_key_hash` in JSON-emitting surface |
| `U-2-plaintext-healthcheck` | U-2 healthcheck protocol mismatch | Plaintext `http://` in HEALTHCHECK directive |
| `U-3-migration-mount` | U-3 seed initdb schema drift | Migration file mounted into postgres initdb |
| `D-1-D-2-statusbadge-phantom` | D-1 + D-2 dead keys + TS phantoms | StatusBadge dead keys + 5 Certificate / 5 Agent / 1 Issuer / 1 Notification phantom fields |
| `L-1-bulk-action-loop` | L-1 client-side bulk loops | `for ... await triggerRenewal/updateCertificate` in CertificatesPage |
| `B-1-orphan-crud` | B-1 orphan-CRUD client fns | 8 update/create/delete fns lose their page consumer |
| `S-2-strings-contains-err` | S-2 brittle error-dispatch | `strings.Contains(err.Error(), "not found"\|"violates foreign key")` in handlers |
| `G-3-env-docs-drift` | G-3 env-var docs drift | `CERTCTL_*` env var defined OR documented but not both |
| `test-naming-convention` | I-001-extended | `func Test` followed by a lowercase letter (e.g. `func Testfoo`); `go test` silently skips such functions |
| `S-1-hardcoded-source-counts` | S-1 stale numeric prose | Hardcoded "N issuer connectors" / "N MCP tools" in README + docs |
| `P-1-documented-orphan-fns` | P-1 documented orphans | 16 read-fn names removed from client.ts exports |
| `T-1-frontend-page-coverage` | T-1 untested frontend pages | New page in `web/src/pages/` without sibling `.test.tsx` and not on the deferred allowlist |
| `bundle-8-L-015-target-blank-rel-noopener` | L-015 (CWE-1022) reverse-tabnabbing | `target="_blank"` without `rel="noopener noreferrer"` |
| `bundle-8-L-019-dangerously-set-inner-html` | L-019 (CWE-79) XSS | `dangerouslySetInnerHTML` outside `safeHtml.ts` |
| `bundle-8-M-009-bare-usemutation` | M-009 + M-029 mutation contract | Bare `useMutation()` outside `useTrackedMutation` wrapper |
| `H-1-encryption-key-min-length` | H-1 closure follow-up (post-Phase-5 surfacing) | `CERTCTL_CONFIG_ENCRYPTION_KEY` literal in any `deploy/docker-compose*.yml` shorter than the 32-byte floor enforced by `internal/config/config.go::Validate()` |
| `test-compose-scep-coherence` | post-Phase-5 surfacing of dead SCEP test config | `CERTCTL_SCEP_ENABLED=true` in test compose without (a) a CI job that runs the SCEP integration test, (b) the `ra.crt` + `ra.key` + `intune_trust_anchor.pem` fixtures committed to `deploy/test/fixtures/`, AND (c) the matching volume mount |
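
The G-3 row describes a symmetric difference: a name that appears on only one of the two sides is drift. A minimal sketch of that comparison follows, assuming the two name lists have already been extracted into files with one name per line. `drift_between` is a hypothetical helper, and this is not a description of how `G-3-env-docs-drift.sh` is actually written.

```shell
# drift_between DEFINED_FILE DOCUMENTED_FILE
# Each input file holds one env-var name per line (any order, duplicates allowed).
# Prints "undocumented: NAME" or "undefined: NAME" for names present on only one
# side; returns 1 if any drift was found, 0 otherwise.
drift_between() {
  local a b out
  a="$(mktemp)"
  b="$(mktemp)"
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  # comm -23: lines only in the first file; comm -13: lines only in the second.
  out="$(
    comm -23 "$a" "$b" | sed 's/^/undocumented: /'
    comm -13 "$a" "$b" | sed 's/^/undefined: /'
  )"
  rm -f "$a" "$b"
  [ -z "$out" ] && return 0
  printf '%s\n' "$out"
  return 1
}

# One possible way to build the inputs (paths are illustrative):
#   grep -ohE 'CERTCTL_[A-Z0-9_]+' internal/config/config.go > defined.txt
#   grep -ohE 'CERTCTL_[A-Z0-9_]+' deploy/ENVIRONMENTS.md   > documented.txt
```

`comm` requires sorted input, which is why both lists pass through `sort -u` first.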

Forward-looking guards (Auditable Codebase Bundle, post-v2.1.0 anti-rot)

These guards catch defect classes BEFORE they get audit findings — they pin invariants on the codebase that the v2.0 audit history showed are easy to lose.

| ID | Item | Catches |
|---|---|---|
| `complete-path-config-coverage` | post-v2.1.0 / item-1 | "Lying field": `CERTCTL_*` env var defined in `internal/config/config.go` that no consumer outside `internal/config/` actually reads. Operator-facing config that the docs claim works but the code never honors. Companion Go test at `internal/config/coverage_test.go`. |
| `doc-rot-detector` | post-v2.1.0 / item-5 | Docs older than 90 days warn (yellow), older than 120 days fail (red). Uses HEAD commit timestamp for reproducibility. `docs/archive/` allowlisted in bulk. |
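
The 90-day / 120-day thresholds in the `doc-rot-detector` row reduce to a small age classification. The sketch below assumes ages are measured in whole days (rounded down) and that "older than" means strictly greater; `classify_doc_age` is a hypothetical helper, not the guard's real code.

```shell
# classify_doc_age REFERENCE_EPOCH DOC_EPOCH
# Both arguments are Unix timestamps in seconds.
# Prints "ok", "warn" (older than 90 days), or "fail" (older than 120 days).
classify_doc_age() {
  local reference doc age_days
  reference="$1"
  doc="$2"
  age_days=$(( (reference - doc) / 86400 ))   # whole days, rounded down
  if [ "$age_days" -gt 120 ]; then
    echo "fail"
  elif [ "$age_days" -gt 90 ]; then
    echo "warn"
  else
    echo "ok"
  fi
}

# Using commit timestamps instead of the wall clock keeps reruns reproducible:
#   reference="$(git log -1 --format=%ct HEAD)"
#   doc="$(git log -1 --format=%ct -- docs/some-file.md)"   # illustrative path
#   classify_doc_age "$reference" "$doc"
```

Anchoring the reference time to the HEAD commit means the same commit always yields the same verdict, regardless of when CI runs.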

The cold-DB compose smoke (post-v2.1.0 / item-6) is NOT a script in this directory — it is inlined directly into .github/workflows/ci.yml::cold-db-compose-smoke because there is no value in a developer running it locally (the whole point of the gate is that CI owns the cold-DB state). To inspect or modify the smoke logic, read that workflow job; there is intentionally no scripts/ci-guards/cold-db-compose-smoke.sh.

The fourth Bundle artifact (internal/ciparity/) consists of Go tests, not shell guards, and runs under the standard Go test step. It pins the MCP tool catalogue floor and naming convention, and reports CLI/MCP/OpenAPI surface counts as a trend metric.

Running the full set locally

for g in scripts/ci-guards/*.sh; do
  echo "=== $(basename "$g") ==="
  bash "$g" || echo "  FAILED"
done
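
The loop above prints FAILED but always finishes with a zero status. Where a single pass/fail result is useful (for example in a pre-push hook), a variant can count failures and return non-zero at the end. This is a sketch under that assumption; `run_all_guards` is a hypothetical name.

```shell
# run_all_guards DIR
# Runs every *.sh in DIR, reports each failure, and returns 1 if any guard failed.
run_all_guards() {
  local dir failed g
  dir="$1"
  failed=0
  for g in "$dir"/*.sh; do
    [ -e "$g" ] || continue          # unmatched glob: nothing to run
    echo "=== $(basename "$g") ==="
    if ! bash "$g"; then
      echo "  FAILED"
      failed=$((failed + 1))
    fi
  done
  echo "${failed} guard(s) failed"
  [ "$failed" -eq 0 ]
}

# From repo root:
#   run_all_guards scripts/ci-guards
```

Every guard still runs even after an earlier one fails, so one invocation surfaces all regressions at once.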
