mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 13:51:36 +00:00
47da13e7a1
Bundle 3 closure (2026-05-12 acquisition diligence audit). Closes the
"chart claims production-ready but lying-fields silently break it"
hazard cluster: README install command had wrong key, required secrets
weren't fail-fast, external Postgres rendered the bundled StatefulSet
hostname, container-only security hardening fields landed at pod scope
(silently dropped by K8s API), and three advertised template surfaces
(ServiceMonitor, PodDisruptionBudget, NetworkPolicy) didn't render at
all even when their values.yaml toggles were on.
Source findings closed:
C2 C3 D1 D2 D3 D5 D7 D11 D12 (repo audit)
OPS-L1 OPS-L2 (cowork audit)
Source findings explicitly deferred (tracked in WORKSPACE-ROADMAP.md):
D6 OPS-H1 (backup automation — operator must choose target storage)
D10 (digest pinning of `:latest` tags)
OPS-M1 (prometheus/client_golang migration)
OPS-M2 (distributed tracing instrumentation)
Chart truth table (rendered with helm 3.16.3):
  -f values.yaml + tls.existingSecret + auth.apiKey + pg.auth.password
    → 12 resources (default mode, no monitoring/PDB/networkpolicy)
  + postgresql.enabled=false + externalDatabase.url=…
    → NO StatefulSet, NO postgres-secret, NO postgres-service (D2)
  + server.tls.certManager.enabled=true
    → +1 Certificate (cert-manager mode)
  + replicas=3 + monitoring.enabled=true + serviceMonitor.enabled=true
  + podDisruptionBudget.enabled=true + networkPolicy.enabled=true
    → +1 ServiceMonitor + 1 PodDisruptionBudget + 1 NetworkPolicy (D5+D11)
  tls.existingSecret AND tls.certManager.enabled both set
    → REFUSED with "EXACTLY ONE TLS ownership path" error (D7)
  Missing required secrets (apiKey / pg password / external URL)
    → REFUSED at template time with operator-actionable guidance (D1)
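The two REFUSED rows of the truth table can be sketched as plain shell logic. This is a minimal illustration only: `validate_values`, its argument order, and its messages are hypothetical, the real checks live in the chart's template helpers, and the legacy CERTCTL_DATABASE_URL fallback is left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the refusal rules (D7 + D1); not the chart's code.
# Args: existing_secret cert_manager_enabled auth_type api_key \
#       pg_enabled pg_password external_url
validate_values() {
  local existing_secret=$1 cert_manager=$2 auth_type=$3 api_key=$4
  local pg_enabled=$5 pg_password=$6 external_url=$7

  # D7: exactly one TLS ownership path must be chosen.
  if [ -n "$existing_secret" ] && [ "$cert_manager" = "true" ]; then
    echo "REFUSED: both TLS paths set"; return 1
  fi
  if [ -z "$existing_secret" ] && [ "$cert_manager" != "true" ]; then
    echo "REFUSED: no TLS path set"; return 1
  fi

  # D1 (a): api-key auth needs a non-empty key.
  if [ "$auth_type" = "api-key" ] && [ -z "$api_key" ]; then
    echo "REFUSED: apiKey empty"; return 1
  fi
  # D1 (b): bundled Postgres needs a password.
  if [ "$pg_enabled" = "true" ] && [ -z "$pg_password" ]; then
    echo "REFUSED: postgres password empty"; return 1
  fi
  # D1 (c): external Postgres needs a URL.
  if [ "$pg_enabled" != "true" ] && [ -z "$external_url" ]; then
    echo "REFUSED: external database URL empty"; return 1
  fi

  echo "OK"
}
```

Calling `validate_values ci false api-key k true p ""` corresponds to the default row of the table; setting the second argument to `true` as well reproduces the both-set refusal.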
Closures by source ID:
C2 — README Helm install example fixed. Was `--set postgresql.password=…`
(does not exist); now `--set postgresql.auth.password=…` matching
the chart key. README install block also wires TLS, mentions
fail-fast at template time, and links the external-Postgres example.
C3 — Kubernetes Secrets connector annotated PREVIEW in values.yaml.
The chart still exposes `kubernetesSecrets.enabled` for the RBAC
preview wiring, but the values block now states clearly that the
production K8s client at internal/connector/target/k8ssecret/
k8ssecret.go::realK8sClient is a stub (verified — go.mod imports
zero k8s.io/client-go packages). Production landing tracked in
WORKSPACE-ROADMAP.md.
D1 — `certctl.requiredSecrets` template helper. Fails fast at render
time when (a) server.auth.type=api-key and server.auth.apiKey is empty,
(b) postgresql.enabled=true and postgresql.auth.password is empty, or
(c) postgresql.enabled=false and both externalDatabase.url and the
legacy CERTCTL_DATABASE_URL env are empty. Each branch emits an
operator-actionable diagnostic with the openssl rand command or
values override needed. The postgres-secret template additionally
uses Helm's `required` builtin so it can no longer render with the
fallback value that pre-Bundle-3 produced (the literal "changeme").
D2 — externalDatabase.url first-class. New top-level values block.
certctl.databaseURL helper now branches on postgresql.enabled:
bundled path uses the helper-emitted in-cluster URL; external
path uses externalDatabase.url verbatim. postgres-secret,
postgres-statefulset, and postgres-service ALL gate on
postgresql.enabled — external mode renders ZERO postgres-*
resources. POSTGRES_PASSWORD env in server-deployment also gates.
D3 — Container-vs-pod security context split. K8s API silently drops
readOnlyRootFilesystem / allowPrivilegeEscalation / capabilities /
privileged when they land at pod scope (`spec.securityContext`);
they only work at container scope (`spec.containers[].securityContext`).
Pre-Bundle-3 all fields sat at pod scope so the chart's documented
"read-only rootfs + drop-all caps" hardening was effectively
unenforced. New certctl.podSecurityContext + containerSecurityContext
helpers split the operator-facing securityContext map by field-name
whitelist so existing values keep working byte-for-byte while
fields render at the K8s-valid scope. Applied to both
server-deployment.yaml and agent-daemonset.yaml (DaemonSet + Deployment
branches).
D5 — Prometheus ServiceMonitor template. New
templates/servicemonitor.yaml. Renders when monitoring.enabled AND
monitoring.serviceMonitor.enabled. Scrapes /api/v1/metrics/prometheus
(rbac-gated on metrics.read — needs bearerTokenSecret with an API
key holding that perm). The values.yaml block is extended with
bearerTokenSecret, tlsConfig, and relabelings knobs, plus an
operator-facing comment documenting the auth requirement.
D7 — TLS both-set rejection. certctl.tls.required helper extended.
Pre-Bundle-3 only the NEITHER-set case was caught; setting BOTH
rendered a dangling cert-manager Certificate alongside an
existing-Secret mount, two conflicting TLS sources of truth.
Now refuses with "EXACTLY ONE TLS ownership path" + remediation
steps for both possible operator intents.
D11 — PodDisruptionBudget + NetworkPolicy templates. New
templates/pdb.yaml (renders when podDisruptionBudget.enabled +
server.replicas > 1) + templates/networkpolicy.yaml (renders when
networkPolicy.enabled). PDB uses minAvailable / maxUnavailable
exclusivity per K8s spec. NetworkPolicy default-allows in-namespace
agent → server traffic, kube-DNS egress, and bundled-postgres
egress (when postgresql.enabled), with operator-extensible
extraIngress / extraEgress for CA / OIDC / SMTP egress. Both
default off so existing deploys don't lose network reach
unannounced.
D12 — Database max-conn config wired. Pre-Bundle-3
internal/repository/postgres/db.go::NewDB hard-coded
SetMaxOpenConns(25). config.go loaded CERTCTL_DATABASE_MAX_CONNS,
Validate() enforced the >= 1 floor, values.yaml documented it,
and docs/reference/configuration.md surfaced it — but the pool
ignored every operator setting. New NewDBWithMaxConns threads
the operator value into the pool with maxIdle = maxOpen / 5
(≥ 1) so the historical ratio carries forward. cmd/server/main.go
calls the new constructor; NewDB stays for compat at the default 25.
OPS-L1 — Chart version 0.1.0 → 1.0.0. Chart has shipped through 8 audit
closures since 2026-02 (M-018, U-1, U-2, U-3, H-1, G-1, B1, B2);
the pre-1.0 version implied instability the chart no longer has.
OPS-L2 — External-Postgres path is now properly documented in values.yaml
(externalDatabase block with mode-2 example), README install command
links the existing examples/values-external-db.yaml, and the chart
truth table above proves the external mode renders cleanly.
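The D12 pool sizing rule (maxIdle = maxOpen / 5, floored at 1) is simple enough to check by hand. The sketch below restates it in shell arithmetic; `max_idle_conns` is a hypothetical name used only for illustration, and integer division is an assumption about how the Go constructor computes the ratio.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the D12 ratio; the real logic is Go code in
# NewDBWithMaxConns. Assumes integer division, then a floor of 1.
max_idle_conns() {
  local max_open=$1
  local idle=$(( max_open / 5 ))
  if [ "$idle" -lt 1 ]; then
    idle=1
  fi
  echo "$idle"
}

max_idle_conns 25   # prints 5, the ratio implied by the default of 25
max_idle_conns 3    # prints 1, because 3 / 5 rounds down to 0 and is floored
```

Under these assumptions, any CERTCTL_DATABASE_MAX_CONNS value below 5 yields a single idle connection.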
Receipts:
  helm lint deploy/helm/certctl/            # clean
  helm template c deploy/helm/certctl/ \
    --set server.tls.existingSecret=ci \
    --set postgresql.auth.password=p \
    --set server.auth.apiKey=k              # 12 kinds, default
  helm template c deploy/helm/certctl/ \
    --set server.tls.existingSecret=ci \
    --set postgresql.enabled=false \
    --set externalDatabase.url='postgres://u:p@h:5432/db?sslmode=require' \
    --set server.auth.apiKey=k              # 9 kinds, no postgres-*
  helm template c deploy/helm/certctl/ \
    --set server.tls.certManager.enabled=true \
    --set server.tls.certManager.issuerRef.name=letsencrypt \
    --set postgresql.auth.password=p --set server.auth.apiKey=k
                                            # +1 Certificate (cert-manager)
  helm template c deploy/helm/certctl/ \
    --set server.tls.existingSecret=ci \
    --set postgresql.auth.password=p --set server.auth.apiKey=k \
    --set server.replicas=3 \
    --set monitoring.enabled=true \
    --set monitoring.serviceMonitor.enabled=true \
    --set podDisruptionBudget.enabled=true \
    --set networkPolicy.enabled=true        # +ServiceMonitor +PDB +NetworkPolicy
  (TLS both-set + missing apiKey + missing pg password + missing extDb URL all REFUSED.)
  gofmt -l                                  # clean
  go vet ./internal/repository/postgres ./cmd/server   # clean
  go build ./cmd/server                     # clean
  bash scripts/ci-guards/B3-helm-chart-coherence.sh    # clean
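The "N kinds" annotations in the receipts can be reproduced by tallying top-level `kind:` lines in the rendered stream. The helper below is a rough sketch, not part of the repository: `count_kinds` is a hypothetical name, and it assumes each resource's `kind:` sits at column 0, the same assumption the guard script's `^kind:` grep makes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a multi-document YAML stream on stdin and
# prints "<count> <kind>" per distinct top-level kind, sorted by kind.
count_kinds() {
  grep -E '^kind: ' \
    | sed 's/^kind: //' \
    | sort \
    | uniq -c \
    | awk '{print $1, $2}'
}

# Small inline sample so the sketch runs without helm installed.
printf 'kind: Secret\n---\nkind: Service\n---\nkind: Secret\n' | count_kinds
```

Piping any of the `helm template` commands above into `count_kinds` gives a per-kind breakdown that can be diffed between modes. Nested fields such as an issuer reference's `kind` are indented and therefore not counted.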
Remaining operator warnings (deferred, tracked in WORKSPACE-ROADMAP.md):
- Backup CronJob + restore script (D6 + OPS-H1): operator chooses
  target (S3, GCS, Azure Blob, NFS). Sample CronJob YAML may ship
  in deploy/helm/examples/ once an operator workstation has run
  one full backup-restore cycle.
- Distributed tracing (OPS-M2): otel/* are go.mod indirect deps,
  not actively instrumented. Adding spans is a v3 work item.
- Prometheus client_golang migration (OPS-M1): the hand-rolled
  /metrics/prometheus exposition format works today; client_golang
  migration unlocks histograms + exemplars + native label sets.
Audit-Closes: BUNDLE-3 C2 C3 D1 D2 D3 D5 D7 D11 D12 OPS-L1 OPS-L2
Audit-Defers: D6 D10 OPS-H1 OPS-M1 OPS-M2
162 lines
6.2 KiB
Bash
Executable File
#!/usr/bin/env bash
# scripts/ci-guards/B3-helm-chart-coherence.sh
#
# Bundle 3 closure (2026-05-12) — Helm chart coherence guard.
#
# Catches regressions in the chart-truth surface the Bundle 3 closure
# locked in:
#
#   1. README's Helm install example must use the canonical
#      `postgresql.auth.password` key (audit C2). The pre-Bundle-3
#      example used the wrong `postgresql.password` shape.
#
#   2. The chart renders all 5 advertised production modes:
#        - default (TLS existingSecret + secrets)
#        - external Postgres (postgresql.enabled=false + externalDatabase.url)
#        - cert-manager TLS
#        - production hardening (NetworkPolicy + PDB + ServiceMonitor)
#        - both-TLS-set is REJECTED (D7)
#
#   3. The chart still fail-fasts on missing required secrets (D1):
#        - server.auth.apiKey empty when type=api-key
#        - postgresql.auth.password empty when postgresql.enabled=true
#        - externalDatabase.url empty when postgresql.enabled=false
#
#   4. The bundled-Postgres Secret template does NOT render when
#      postgresql.enabled=false (D2 / clean external mode).
#
# Per the contract documented in scripts/ci-guards/README.md:
# bare callable, no args, no env, exit 0 on clean. Skips quietly when
# `helm` is not on PATH (developer workstations without Helm installed),
# but the GH Actions runner always has it.

set -e

GUARD_NAME="B3-helm-chart-coherence"
CHART="deploy/helm/certctl/"
README="README.md"

if ! command -v helm > /dev/null 2>&1; then
  echo "${GUARD_NAME}: helm not on PATH — skipping (install helm ≥ 3.13 to enable locally)."
  exit 0
fi

failed=0

# Check 1 — README Helm install command uses postgresql.auth.password,
# never the pre-Bundle-3 postgresql.password shape.
# [[:space:]] is the POSIX spelling of \s, so the pattern also works with
# non-GNU grep on developer workstations.
if grep -nE -- '--set[[:space:]]+postgresql\.password=' "$README"; then
  echo "::error file=${README}::Bundle 3 audit C2 regression: README references --set postgresql.password=... — the canonical key is postgresql.auth.password (matches values.yaml + Bitnami-style chart). Update the install command."
  failed=1
fi

# Check 2 — production-mode renders pass. We use a tmp dir so partial
# failures don't leave stray files.
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT

# Default mode.
if ! helm template c "$CHART" \
    --set server.tls.existingSecret=ci \
    --set postgresql.auth.password=p \
    --set server.auth.apiKey=k \
    > "$TMP/default.yaml" 2> "$TMP/default.err"; then
  echo "::error file=${CHART}::B3 regression: default mode (TLS + secrets) fails to render."
  cat "$TMP/default.err"
  failed=1
fi

# External Postgres mode.
if ! helm template c "$CHART" \
    --set server.tls.existingSecret=ci \
    --set postgresql.enabled=false \
    --set externalDatabase.url='postgres://u:p@h:5432/db?sslmode=require' \
    --set server.auth.apiKey=k \
    > "$TMP/external.yaml" 2> "$TMP/external.err"; then
  echo "::error file=${CHART}::B3 regression: external Postgres mode fails to render."
  cat "$TMP/external.err"
  failed=1
fi

# Bundle 3 D2 check: bundled-Postgres Secret + StatefulSet + Service
# must NOT appear in external-Postgres render.
for resource in StatefulSet "postgres-secret.yaml" "postgres-service.yaml"; do
  if grep -q "$resource" "$TMP/external.yaml" 2>/dev/null; then
    echo "::error file=${CHART}::B3 regression (D2): external-Postgres render still emits $resource. postgresql.enabled=false must skip ALL postgres-* templates."
    failed=1
  fi
done

# cert-manager TLS mode. The header and the final summary line both list
# this mode, so render it here with the same flags the commit receipts use.
if ! helm template c "$CHART" \
    --set server.tls.certManager.enabled=true \
    --set server.tls.certManager.issuerRef.name=letsencrypt \
    --set postgresql.auth.password=p \
    --set server.auth.apiKey=k \
    > "$TMP/certmanager.yaml" 2> "$TMP/certmanager.err"; then
  echo "::error file=${CHART}::B3 regression: cert-manager TLS mode fails to render."
  cat "$TMP/certmanager.err"
  failed=1
fi

# Production hardening mode.
if ! helm template c "$CHART" \
    --set server.tls.existingSecret=ci \
    --set postgresql.auth.password=p \
    --set server.auth.apiKey=k \
    --set server.replicas=3 \
    --set monitoring.enabled=true \
    --set monitoring.serviceMonitor.enabled=true \
    --set podDisruptionBudget.enabled=true \
    --set networkPolicy.enabled=true \
    > "$TMP/prod.yaml" 2> "$TMP/prod.err"; then
  echo "::error file=${CHART}::B3 regression: production hardening mode fails to render."
  cat "$TMP/prod.err"
  failed=1
fi

# Bundle 3 D5 + D11 check: production hardening render MUST include
# ServiceMonitor + PodDisruptionBudget + NetworkPolicy.
for kind in ServiceMonitor PodDisruptionBudget NetworkPolicy; do
  if ! grep -q "^kind: $kind\$" "$TMP/prod.yaml" 2>/dev/null; then
    echo "::error file=${CHART}::B3 regression: production hardening render is missing kind: $kind."
    failed=1
  fi
done

# Check 3 — D7 TLS both-set rejection.
if helm template c "$CHART" \
    --set server.tls.existingSecret=existing \
    --set server.tls.certManager.enabled=true \
    --set server.tls.certManager.issuerRef.name=foo \
    --set postgresql.auth.password=p \
    --set server.auth.apiKey=k \
    > /dev/null 2> "$TMP/both-tls.err"; then
  echo "::error file=${CHART}::B3 regression (D7): TLS both-set rendered successfully. Chart must refuse when existingSecret AND certManager.enabled are both populated."
  failed=1
fi

# Check 4 — D1 fail-fast on missing apiKey.
if helm template c "$CHART" \
    --set server.tls.existingSecret=ci \
    --set postgresql.auth.password=p \
    > /dev/null 2> "$TMP/missing-apikey.err"; then
  echo "::error file=${CHART}::B3 regression (D1): missing server.auth.apiKey rendered successfully when auth.type=api-key. Chart must refuse."
  failed=1
fi

# Check 5 — D1 fail-fast on missing postgres password (bundled mode).
if helm template c "$CHART" \
    --set server.tls.existingSecret=ci \
    --set server.auth.apiKey=k \
    > /dev/null 2> "$TMP/missing-pg.err"; then
  echo "::error file=${CHART}::B3 regression (D1): missing postgresql.auth.password rendered successfully when postgresql.enabled=true. Chart must refuse."
  failed=1
fi

# Check 6 — D1 fail-fast on missing external DB URL.
if helm template c "$CHART" \
    --set server.tls.existingSecret=ci \
    --set postgresql.enabled=false \
    --set server.auth.apiKey=k \
    > /dev/null 2> "$TMP/missing-extdb.err"; then
  echo "::error file=${CHART}::B3 regression (D1): missing externalDatabase.url rendered successfully when postgresql.enabled=false. Chart must refuse."
  failed=1
fi

if [ "$failed" -ne 0 ]; then
  echo ""
  echo "${GUARD_NAME}: FAILED — Helm chart coherence regression."
  exit 1
fi

echo "${GUARD_NAME}: clean (default + external-Postgres + cert-manager + production hardening + TLS both-set rejection + 3 fail-fast gates all green)."
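The rejection checks in the guard treat any non-zero exit from `helm template` as a refusal, so an unrelated template error would also pass them. One possible tightening, sketched here as an assumption rather than as part of the guard, is to require that stderr also carries the expected diagnostic. `expect_refusal` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not in the repository.
# Usage: expect_refusal PATTERN CMD [ARGS...]
# Succeeds only when CMD exits non-zero AND its stderr contains PATTERN.
expect_refusal() {
  local pattern=$1
  shift
  local err
  # 2>&1 >/dev/null captures stderr only; stdout is discarded.
  if err=$("$@" 2>&1 >/dev/null); then
    echo "expected a refusal, but the command succeeded: $*" >&2
    return 1
  fi
  # -F matches the pattern as a fixed string, not a regex.
  if ! printf '%s\n' "$err" | grep -qF -- "$pattern"; then
    echo "command failed, but without the expected message: $pattern" >&2
    return 1
  fi
  return 0
}
```

With this helper, the D7 check could call `expect_refusal "EXACTLY ONE TLS ownership path" helm template c "$CHART" ...` with the same `--set` flags, using the error text quoted in the commit message.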