mirror of https://github.com/shankar0123/certctl.git synced 2026-07-27 20:10:04 +00:00

shankar0123 47da13e7a1 fix(helm): close BUNDLE 3 — Helm chart hardening + enterprise deploy

Bundle 3 closure (2026-05-12 acquisition diligence audit). Closes the
"chart claims production-ready but lying-fields silently break it"
hazard cluster: README install command had wrong key, required secrets
weren't fail-fast, external Postgres rendered the bundled StatefulSet
hostname, container-only security hardening fields landed at pod scope
(silently dropped by K8s API), and three advertised template surfaces
(ServiceMonitor, PodDisruptionBudget, NetworkPolicy) didn't render at
all even when their values.yaml toggles were on.

Source findings closed:
  C2 C3 D1 D2 D3 D5 D7 D11 D12       (repo audit)
  OPS-L1 OPS-L2                       (cowork audit)
Source findings explicitly deferred (tracked in WORKSPACE-ROADMAP.md):
  D6 OPS-H1   (backup automation — operator must choose target storage)
  D10         (digest pinning of `:latest` tags)
  OPS-M1      (prometheus/client_golang migration)
  OPS-M2      (distributed tracing instrumentation)

Chart truth table (rendered with helm 3.16.3):
  -f values.yaml + tls.existingSecret + auth.apiKey + pg.auth.password
    → 12 resources (default mode, no monitoring/PDB/networkpolicy)
  + postgresql.enabled=false + externalDatabase.url=…
    → NO StatefulSet, NO postgres-secret, NO postgres-service (D2)
  + server.tls.certManager.enabled=true
    → +1 Certificate (cert-manager mode)
  + replicas=3 + monitoring.enabled=true + serviceMonitor.enabled=true
    + podDisruptionBudget.enabled=true + networkPolicy.enabled=true
    → +1 ServiceMonitor + 1 PodDisruptionBudget + 1 NetworkPolicy (D5+D11)
  tls.existingSecret AND tls.certManager.enabled both set
    → REFUSED with "EXACTLY ONE TLS ownership path" error (D7)
  Missing required secrets (apiKey / pg password / external URL)
    → REFUSED at template time with operator-actionable guidance (D1)

Closures by source ID:

C2 — README Helm install example fixed. Was `--set postgresql.password=…`
  (does not exist); now `--set postgresql.auth.password=…` matching
  the chart key. README install block also wires TLS, mentions
  fail-fast at template time, and links the external-Postgres example.

C3 — Kubernetes Secrets connector annotated PREVIEW in values.yaml.
  The chart still exposes `kubernetesSecrets.enabled` for the RBAC
  preview wiring, but the values block now states clearly that the
  production K8s client at internal/connector/target/k8ssecret/
  k8ssecret.go::realK8sClient is a stub (verified — go.mod imports
  zero k8s.io/client-go packages). Production landing tracked in
  WORKSPACE-ROADMAP.md.

D1 — `certctl.requiredSecrets` template helper. Fails fast at render
  time when (a) server.auth.type=api-key + apiKey empty, (b)
  postgresql.enabled=true + pg.auth.password empty, (c)
  postgresql.enabled=false + externalDatabase.url + legacy env
  CERTCTL_DATABASE_URL all empty. Each branch emits an
  operator-actionable diagnostic with the openssl rand command or
  values override needed. postgres-secret template additionally
  uses Helm's `required` builtin so it can't render with the empty
  fallback that pre-Bundle-3 produced ("changeme" literal).

D2 — externalDatabase.url first-class. New top-level values block.
  certctl.databaseURL helper now branches on postgresql.enabled:
  bundled path uses the helper-emitted in-cluster URL; external
  path uses externalDatabase.url verbatim. postgres-secret,
  postgres-statefulset, and postgres-service ALL gate on
  postgresql.enabled — external mode renders ZERO postgres-*
  resources. POSTGRES_PASSWORD env in server-deployment also gates.

D3 — Container-vs-pod security context split. K8s API silently drops
  readOnlyRootFilesystem / allowPrivilegeEscalation / capabilities /
  privileged when they land at pod scope (`spec.securityContext`);
  they only work at container scope (`spec.containers[].securityContext`).
  Pre-Bundle-3 all fields sat at pod scope so the chart's documented
  "read-only rootfs + drop-all caps" hardening was effectively
  unenforced. New certctl.podSecurityContext + containerSecurityContext
  helpers split the operator-facing securityContext map by field-name
  whitelist so existing values keep working byte-for-byte while
  fields render at the K8s-valid scope. Applied to both
  server-deployment.yaml and agent-daemonset.yaml (DaemonSet + Deployment
  branches).

D5 — Prometheus ServiceMonitor template. New
  templates/servicemonitor.yaml. Renders when monitoring.enabled AND
  monitoring.serviceMonitor.enabled. Scrapes /api/v1/metrics/prometheus
  (rbac-gated on metrics.read — needs bearerTokenSecret with an API
  key holding that perm). values.yaml block extended with bearerTokenSecret,
  tlsConfig, and relabelings knobs and the operator-facing comment
  documenting the auth requirement.

D7 — TLS both-set rejection. certctl.tls.required helper extended.
  Pre-Bundle-3 only the NEITHER-set case was caught; setting BOTH
  rendered a dangling cert-manager Certificate alongside an
  existing-Secret mount, two conflicting TLS sources of truth.
  Now refuses with "EXACTLY ONE TLS ownership path" + remediation
  steps for both possible operator intents.

D11 — PodDisruptionBudget + NetworkPolicy templates. New
  templates/pdb.yaml (renders when podDisruptionBudget.enabled +
  server.replicas > 1) + templates/networkpolicy.yaml (renders when
  networkPolicy.enabled). PDB uses minAvailable / maxUnavailable
  exclusivity per K8s spec. NetworkPolicy default-allows in-namespace
  agent → server traffic, kube-DNS egress, and bundled-postgres
  egress (when postgresql.enabled), with operator-extensible
  extraIngress / extraEgress for CA / OIDC / SMTP egress. Both
  default off so existing deploys don't lose network reach
  unannounced.

D12 — Database max-conn config wired. Pre-Bundle-3
  internal/repository/postgres/db.go::NewDB hard-coded
  SetMaxOpenConns(25). config.go loaded CERTCTL_DATABASE_MAX_CONNS,
  Validate() enforced the >= 1 floor, values.yaml documented it,
  and docs/reference/configuration.md surfaced it — but the pool
  ignored every operator setting. New NewDBWithMaxConns threads
  the operator value into the pool with maxIdle = maxOpen / 5
  (≥ 1) so the historical ratio carries forward. cmd/server/main.go
  calls the new constructor; NewDB stays for compat at the default 25.

OPS-L1 — Chart version 0.1.0 → 1.0.0. Chart has shipped through 8 audit
  closures since 2026-02 (M-018, U-1, U-2, U-3, H-1, G-1, B1, B2);
  pre-1.0 version was implying instability the chart no longer has.

OPS-L2 — External-Postgres path is now properly documented in values.yaml
  (externalDatabase block with mode-2 example), README install command
  links the existing examples/values-external-db.yaml, and the chart
  truth table above proves the external mode renders cleanly.

Receipts:
  helm lint deploy/helm/certctl/                                # clean
  helm template c deploy/helm/certctl/ \
      --set server.tls.existingSecret=ci \
      --set postgresql.auth.password=p \
      --set server.auth.apiKey=k                                # 12 kinds, default
  helm template c deploy/helm/certctl/ \
      --set server.tls.existingSecret=ci \
      --set postgresql.enabled=false \
      --set externalDatabase.url='postgres://u:p@h:5432/db?sslmode=require' \
      --set server.auth.apiKey=k                                # 9 kinds, no postgres-*
  helm template c deploy/helm/certctl/ \
      --set server.tls.certManager.enabled=true \
      --set server.tls.certManager.issuerRef.name=letsencrypt \
      --set postgresql.auth.password=p --set server.auth.apiKey=k
                                                                # +1 Certificate (cert-manager)
  helm template c deploy/helm/certctl/ \
      --set server.tls.existingSecret=ci \
      --set postgresql.auth.password=p --set server.auth.apiKey=k \
      --set server.replicas=3 \
      --set monitoring.enabled=true \
      --set monitoring.serviceMonitor.enabled=true \
      --set podDisruptionBudget.enabled=true \
      --set networkPolicy.enabled=true                          # +ServiceMonitor +PDB +NetworkPolicy
  (TLS both-set + missing apiKey + missing pg password + missing extDb URL all REFUSED.)

  gofmt -l                                                      # clean
  go vet ./internal/repository/postgres ./cmd/server            # clean
  go build ./cmd/server                                         # clean
  bash scripts/ci-guards/B3-helm-chart-coherence.sh             # clean

Remaining operator warnings (deferred, tracked in WORKSPACE-ROADMAP.md):
  - Backup CronJob + restore script (D6 + OPS-H1): operator chooses
    target (S3, GCS, Azure Blob, NFS). Sample CronJob yaml may ship
    in deploy/helm/examples/ once an operator workstation has run
    one full backup-restore cycle.
  - Distributed tracing (OPS-M2): otel/* are go.mod indirect deps,
    not actively instrumented. Adding spans is a v3 work item.
  - Prometheus client_golang migration (OPS-M1): the hand-rolled
    /metrics/prometheus exposition format works today; client_golang
    migration unlocks histograms + exemplars + native label sets.

Audit-Closes: BUNDLE-3 C2 C3 D1 D2 D3 D5 D7 D11 D12 OPS-L1 OPS-L2
Audit-Defers: D6 D10 OPS-H1 OPS-M1 OPS-M2

2026-05-13 00:40:42 +00:00

B2-compose-base-no-demo-env.sh

fix(security): close BUNDLE 2 — safe first run, demo mode, agent bootstrap

2026-05-13 00:14:59 +00:00

B3-helm-chart-coherence.sh

fix(helm): close BUNDLE 3 — Helm chart hardening + enterprise deploy

2026-05-13 00:40:42 +00:00

B-1-orphan-crud.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-1-compat-regression.sh

auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +

2026-05-10 06:22:25 +00:00

bundle-1-to-2-upgrade-regression.sh

auth-bundle-2 Phase 6: session middleware + CSRF token plumbing +

2026-05-10 06:22:25 +00:00

bundle-8-L-015-target-blank-rel-noopener.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-8-L-019-dangerously-set-inner-html.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

bundle-8-M-009-bare-usemutation.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

complete-path-config-coverage-exceptions.yaml

feat(ci): item-1 complete-path config-coverage guard (PARTIAL — sandbox could not verify Go test)

2026-05-12 14:02:04 +00:00

complete-path-config-coverage.sh

feat(ci): item-1 complete-path config-coverage guard (PARTIAL — sandbox could not verify Go test)

2026-05-12 14:02:04 +00:00

cors-wildcard-allowlist.sh

fix(api/cors): narrow Bundle-2 routes from wildcard to NewCORS(corsCfg)

2026-05-10 20:12:19 +00:00

D-1-D-2-statusbadge-phantom.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

digest-validity.sh

ci(digest-validity): exclude Windows IIS digest — image is doc-only, not pulled by Linux CI

2026-05-01 03:06:49 +00:00

doc-rot-detector-exceptions.yaml

feat(ci): item-5 doc rot detector (90d warn / 120d fail)

2026-05-12 14:10:27 +00:00

doc-rot-detector.sh

feat(ci): item-5 doc rot detector (90d warn / 120d fail)

2026-05-12 14:10:27 +00:00

G-1-jwt-auth-literal.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

G-2-api-key-hash-json.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

G-3-env-docs-drift.sh

chore(ci-guards): close 4 CI-guard regressions surfaced by v2.1.0 release-gate Phase 5

2026-05-11 14:19:35 +00:00

H-1-encryption-key-min-length.sh

fix(deploy/test) + ci(guard): unblock deploy-vendor-e2e — encryption-key length

2026-05-01 00:57:43 +00:00

H-001-bare-from.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

H-009-readme-jwt.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

L-1-bulk-action-loop.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

L-001-insecure-skip-verify.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

M-012-no-root-user.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

multi-tenant-query-coverage.sh

chore(ci-guards): close 4 CI-guard regressions surfaced by v2.1.0 release-gate Phase 5

2026-05-11 14:19:35 +00:00

N-bundle-2-security-empty-preserved.sh

auth-bundle-2 Phase 5: OIDC + session HTTP surface (13 endpoints),

2026-05-10 06:08:27 +00:00

no-new-synthetic-admin.sh

harden(auth): demo-mode residual-grants detector + cleanup endpoint + CI guard (A-8)

2026-05-11 11:45:54 +00:00

openapi-handler-parity.sh

ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job

2026-04-30 20:50:52 +00:00

P-1-documented-orphan-fns.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

README.md

ci(cold-db-smoke): inline into workflow; remove the script (operator: not a per-commit gate)

2026-05-12 14:22:19 +00:00

S-1-hardcoded-source-counts.sh

2026-05-05 18:18:38 +00:00

S-2-strings-contains-err.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

surface-parity-mcp-exemptions.yaml

feat(ci): item-2 cross-surface contract parity (stdlib-only package)

2026-05-12 14:09:32 +00:00

T-1-frontend-page-coverage.sh

web, docs: IssuerHierarchyPage + sysadmin runbook + connectors row (Rank 8 commit 5)

2026-05-04 02:33:48 +00:00

test-compose-scep-coherence.sh

fix(deploy/test) + ci(guard): drop dead SCEP profile from test compose

2026-05-01 01:39:18 +00:00

test-naming-convention.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

U-2-plaintext-healthcheck.sh

ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)

2026-05-05 04:56:43 +00:00

U-3-migration-mount.sh

ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/

2026-04-30 20:36:26 +00:00

README.md

`scripts/ci-guards/` — Regression-guard scripts

Each <id>.sh script in this directory guards one closed audit finding against regression. CI runs the full set on every push via the Regression guards step in .github/workflows/ci.yml. Operators can run any script locally:

bash scripts/ci-guards/G-3-env-docs-drift.sh

Contract

Every script in this directory MUST:

Exit with code 0 on a clean repo (no regression present).
Exit with a non-zero code on regression, with a ::error:: annotation prefix so PR reviewers see the failing line in the GitHub Actions UI.
Be runnable from repo root via bash scripts/ci-guards/<id>.sh with NO arguments and NO env-var requirements. The CI loop step (for g in scripts/ci-guards/*.sh; do bash "$g"; done) iterates every .sh here without args; any script that requires an arg or env var WILL fail in that loop.
Carry a head-comment block matching the in-source justification from the original ci.yml entry: the audit-finding reference, the closure rationale, the exempt-surface list (if any).
Use set -e early to fail-fast on internal command errors.
Produce no output on the happy path beyond a final echo "<id>: clean." confirmation line.
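
The contract above can be sketched as a skeleton guard. This is a minimal illustration, not one of the repo's scripts: the ID `X-000-example`, the forbidden token, and the `deploy/*.yml` scan scope are all hypothetical placeholders.

```bash
#!/usr/bin/env bash
# X-000-example.sh (hypothetical guard, for illustration only)
#
# Audit finding:     X-000 (hypothetical): forbidden token in deploy YAML.
# Closure rationale: placeholder text; a real guard quotes the audit entry.
# Exempt surfaces:   none.
set -e

# scan_tree ROOT: print one ::error:: annotation per offending line under
# ROOT/deploy and return non-zero if any were found.
scan_tree() {
  local root="$1" hits
  # A missing directory means there is nothing that could regress.
  [ -d "$root/deploy" ] || return 0
  # grep exits 1 when nothing matches; "|| true" keeps set -e from aborting.
  hits="$(grep -rnE --include='*.yml' 'FORBIDDEN_EXAMPLE_TOKEN' "$root/deploy" || true)"
  if [ -z "$hits" ]; then
    return 0
  fi
  while IFS= read -r hit; do
    echo "::error::X-000 regression: $hit"
  done <<<"$hits"
  return 1
}

# No arguments, no env vars: always scan from the repo root (the cwd).
if scan_tree .; then
  echo "X-000-example: clean."
else
  exit 1
fi
```

Keeping the scan in a function that takes the root as a parameter is a design choice of this sketch: the script itself still runs with no arguments, while the function can be pointed at a scratch directory when checking that the guard really fires.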

Helpers vs guards

Scripts that consume input artifacts (a test-output log, a coverage.out file) or env vars (PR_NUMBER, GH_TOKEN) are HELPERS, not guards. They live in scripts/, NOT scripts/ci-guards/.

Current helpers:

scripts/vendor-e2e-skip-check.sh — consumes test-output.log arg from the deploy-vendor-e2e job
scripts/coverage-pr-comment.sh — consumes coverage.out + PR_NUMBER + GH_TOKEN env from the go-build-and-test job
scripts/check-coverage-thresholds.sh — consumes coverage.out + .github/coverage-thresholds.yml
scripts/qa-doc-part-count.sh + scripts/qa-doc-seed-count.sh — invoked via make verify-docs pre-tag, not in CI

Adding a new guard

Drop a new <id>.sh in this directory with the head-comment block describing the audit finding it closes.
Make it executable: chmod +x scripts/ci-guards/<id>.sh.
Verify it fails on a deliberate regression and passes on a clean repo.
CI auto-picks up new scripts via the for g in scripts/ci-guards/*.sh loop in the Regression guards step — no ci.yml change required.
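
The verification step (fails on a deliberate regression, passes on a clean repo) could be exercised along these lines. This is a sketch under stated assumptions: `guard_status`, `toy-guard.sh`, and `config.txt` are hypothetical names, and the demonstration runs in a scratch directory rather than in the real repo.

```bash
# guard_status GUARD ROOT: run GUARD (absolute path) from ROOT with no
# arguments; print "pass" on exit code 0 and "fail" otherwise.
guard_status() {
  local guard="$1" root="$2"
  if (cd "$root" && bash "$guard" >/dev/null 2>&1); then
    echo "pass"
  else
    echo "fail"
  fi
}

# Toy demonstration in a scratch directory (hypothetical guard and file).
scratch="$(mktemp -d)"
cat > "$scratch/toy-guard.sh" <<'EOF'
set -e
# Regression = the string "http://" appearing in config.txt.
if grep -n 'http://' config.txt; then
  echo "::error::toy regression: plaintext URL in config.txt"
  exit 1
fi
echo "toy-guard: clean."
EOF

# Clean state: the guard is expected to pass.
echo 'url=https://example.test' > "$scratch/config.txt"
clean_result="$(guard_status "$scratch/toy-guard.sh" "$scratch")"

# Deliberate regression: the guard is expected to fail.
echo 'url=http://example.test' >> "$scratch/config.txt"
regressed_result="$(guard_status "$scratch/toy-guard.sh" "$scratch")"

echo "clean: $clean_result, regressed: $regressed_result"
rm -rf "$scratch"
```

For a real guard, the same two-sided check would be run against a scratch copy of the repo, so the planted regression never lands in the working tree.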

Guards in this directory

Count: re-derive on demand via ls scripts/ci-guards/*.sh | wc -l. The table below names each one — keep it in sync as guards are added.

Per-finding regression guards

| ID | Finding | Catches |
|---|---|---|
| `G-1-jwt-auth-literal` | G-1 JWT silent auth downgrade | `"jwt"` literal in additive auth-type surfaces |
| `L-001-insecure-skip-verify` | L-001 unjustified InsecureSkipVerify | `InsecureSkipVerify: true` without `//nolint:gosec` |
| `H-001-bare-from` | H-001 (CWE-829) tag-swap attack | Bare `FROM` line without `@sha256` digest pin |
| `M-012-no-root-user` | M-012 (CWE-250) container-as-root | Dockerfile missing terminal `USER <non-root>` |
| `H-009-readme-jwt` | H-009 README JWT advertising | README.md re-introducing JWT-as-supported claim |
| `G-2-api-key-hash-json` | G-2 cat-s5-apikey_leak | `api_key_hash` in JSON-emitting surface |
| `U-2-plaintext-healthcheck` | U-2 healthcheck protocol mismatch | Plaintext `http://` in HEALTHCHECK directive |
| `U-3-migration-mount` | U-3 seed initdb schema drift | Migration file mounted into postgres initdb |
| `D-1-D-2-statusbadge-phantom` | D-1 + D-2 dead keys + TS phantoms | StatusBadge dead keys + 5 Certificate / 5 Agent / 1 Issuer / 1 Notification phantom fields |
| `L-1-bulk-action-loop` | L-1 client-side bulk loops | `for ... await triggerRenewal/updateCertificate` in CertificatesPage |
| `B-1-orphan-crud` | B-1 orphan-CRUD client fns | 8 update/create/delete fns lose their page consumer |
| `S-2-strings-contains-err` | S-2 brittle error-dispatch | `strings.Contains(err.Error(), "not found"\|"violates foreign key")` in handlers |
| `G-3-env-docs-drift` | G-3 env-var docs drift | `CERTCTL_*` env var defined OR documented but not both |
| `test-naming-convention` | I-001-extended | `func TestXxx` (lowercase first letter) — Go silently skips |
| `S-1-hardcoded-source-counts` | S-1 stale numeric prose | Hardcoded "N issuer connectors" / "N MCP tools" in README + docs |
| `P-1-documented-orphan-fns` | P-1 documented orphans | 16 read-fn names removed from client.ts exports |
| `T-1-frontend-page-coverage` | T-1 untested frontend pages | New page in `web/src/pages/` without sibling `.test.tsx` and not on the deferred allowlist |
| `bundle-8-L-015-target-blank-rel-noopener` | L-015 (CWE-1022) reverse-tabnabbing | `target="_blank"` without `rel="noopener noreferrer"` |
| `bundle-8-L-019-dangerously-set-inner-html` | L-019 (CWE-79) XSS | `dangerouslySetInnerHTML` outside `safeHtml.ts` |
| `bundle-8-M-009-bare-usemutation` | M-009 + M-029 mutation contract | Bare `useMutation()` outside `useTrackedMutation` wrapper |
| `H-1-encryption-key-min-length` | H-1 closure follow-up (post-Phase-5 surfacing) | `CERTCTL_CONFIG_ENCRYPTION_KEY` literal in any `deploy/docker-compose*.yml` shorter than the 32-byte floor enforced by `internal/config/config.go::Validate()` |
| `test-compose-scep-coherence` | post-Phase-5 surfacing of dead SCEP test config | `CERTCTL_SCEP_ENABLED=true` in test compose without (a) a CI job that runs the SCEP integration test, (b) the `ra.crt` + `ra.key` + `intune_trust_anchor.pem` fixtures committed to `deploy/test/fixtures/`, AND (c) the matching volume mount |

Forward-looking guards (Auditable Codebase Bundle, post-v2.1.0 anti-rot)

These guards catch defect classes BEFORE they get audit findings — they pin invariants on the codebase that the v2.0 audit history showed are easy to lose.

| ID | Item | Catches |
|---|---|---|
| `complete-path-config-coverage` | post-v2.1.0 / item-1 | "Lying field" — `CERTCTL_*` env var defined in `internal/config/config.go` that no consumer outside `internal/config/` actually reads. Operator-facing config that the docs claim works but the code never honors. Companion Go test at `internal/config/coverage_test.go`. |
| `doc-rot-detector` | post-v2.1.0 / item-5 | Docs older than 90 days warn (yellow), older than 120 days fail (red). Uses HEAD commit timestamp for reproducibility. `docs/archive/` allowlisted in bulk. |
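
The 90-day warn / 120-day fail rule of `doc-rot-detector` can be illustrated with a small age classifier. This is a sketch only: the function name `classify_doc_age` is hypothetical, and it assumes "older than" means strictly greater than the threshold, measured in whole days between two Unix timestamps. The real script may draw its boundaries differently.

```bash
# classify_doc_age HEAD_EPOCH DOC_EPOCH
# Both arguments are Unix timestamps in seconds. Age is measured against the
# HEAD commit timestamp rather than the wall clock, so the same commit always
# yields the same verdict.
classify_doc_age() {
  local head_ts="$1" doc_ts="$2"
  local age_days=$(( (head_ts - doc_ts) / 86400 ))
  if   [ "$age_days" -gt 120 ]; then echo "fail"
  elif [ "$age_days" -gt 90 ];  then echo "warn"
  else                               echo "ok"
  fi
}
```

In a checkout, the two timestamps could plausibly come from `git log -1 --format=%ct HEAD` and `git log -1 --format=%ct -- <path>`; whether the actual guard sources them this way is an assumption.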

The cold-DB compose smoke (post-v2.1.0 / item-6) is NOT a script in this directory — it is inlined directly into .github/workflows/ci.yml::cold-db-compose-smoke because there is no value in a developer running it locally (the whole point of the gate is that CI owns the cold-DB state). To inspect or modify the smoke logic, read that workflow job; there is intentionally no scripts/ci-guards/cold-db-compose-smoke.sh.

The fourth Bundle artifact (internal/ciparity/) is a set of Go tests, not shell guards; it runs under the standard Go test step. It pins the MCP tool catalogue floor + naming convention and reports CLI/MCP/OpenAPI surface counts as a trend metric.

Guards explicitly NOT here

QA-doc Part-count drift + QA-doc seed-count drift — these protect docs-the-operator-reads, not anything the product depends on. Moved to make verify-docs (operator runs pre-tag, not on every push). See the ci-pipeline-cleanup spec, Phase 11.

Running the full set locally

for g in scripts/ci-guards/*.sh; do
  echo "=== $(basename "$g") ==="
  bash "$g" || echo "  FAILED"
done
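
The loop above keeps going after a failure but always ends with exit code 0. A variant that also reports a summary and a non-zero status might look like the sketch below; the function name `run_guards` is hypothetical and not part of the repo.

```bash
# run_guards DIR: run every DIR/*.sh with no arguments, print a summary, and
# return non-zero if at least one guard failed.
run_guards() {
  local dir="$1" g
  local failed=()
  for g in "$dir"/*.sh; do
    [ -e "$g" ] || continue            # the glob matched nothing
    echo "=== $(basename "$g") ==="
    bash "$g" || failed+=("$(basename "$g")")
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "FAILED (${#failed[@]}): ${failed[*]}"
    return 1
  fi
  echo "all guards clean."
}
```

From the repo root this would be invoked as `run_guards scripts/ci-guards`, which makes the result usable in a pre-push hook or a `make` target.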
