fix(helm): DEPL-004 follow-up — default tlsConfig to real verify; fix ill-formed required-nil

Sprint 6 ACQ DEPL-004 closure follow-up. CI run on commit 58a15e0
caught two issues:

1. The fail-closed guard in templates/servicemonitor.yaml used
   `{{ required "msg" nil }}`, which is wrong Helm syntax — the
   bareword `nil` isn't valid in Go templates and Helm interprets
   it as no value, hitting "wrong number of args for required:
   want 2 got 0". The B3-helm-chart-coherence ci-guard's
   production-hardening render
   (`--set monitoring.serviceMonitor.enabled=true` without
   explicit tlsConfig) failed with this error AND with the
   downstream "missing kind: ServiceMonitor / PodDisruptionBudget /
   NetworkPolicy" cascades (the entire render aborted before
   producing the matrix).

2. The original DEPL-004 framing — "operators MUST explicitly
   choose tlsConfig or you get a chart-render error" — was the
   right intent but the wrong default. The chart's existingSecret
   integration mounts the CA bundle at a canonical path
   (/etc/prometheus/secrets/certctl-ca/ca.crt); defaulting to that
   path closes the implicit-skipVerify gap without forcing every
   operator to repeat the same boilerplate.

Fixes
=====

deploy/helm/certctl/values.yaml — flips
monitoring.serviceMonitor.tlsConfig from commented-out (which fell
through to implicit insecureSkipVerify: true) to a real verify
default:

  tlsConfig:
    caFile: /etc/prometheus/secrets/certctl-ca/ca.crt
    serverName: certctl-server

Operators with a different CA mount path override caFile;
operators who genuinely want skipVerify back must set
`{ insecureSkipVerify: true }` explicitly. Operators who blank
tlsConfig entirely (`tlsConfig: null` or `tlsConfig: {}`) still
trip the fail-closed guard.

deploy/helm/certctl/templates/servicemonitor.yaml — replaces
`required "msg" nil` with `fail "msg"`. The `fail` builtin is
the correct Helm pattern for an unconditional render-time error;
`required` is for "this value MUST be non-empty" which is the
wrong semantic here (we want to fail when the operator went OUT OF
THEIR WAY to blank the default). Failure message updated to
reflect the new default + the operator-action recipes.

docs/operator/helm-deployment.md — rewrites the
"2026-05-16 — ServiceMonitor TLS default flipped" subsection to
match the new default-on-real-verify semantics. The three operator
recipes (default install / different CA mount / explicit
skipVerify) are documented; the explicit "there is no way to
inherit pre-2026-05-16 implicit-skipVerify behavior silently"
guarantee is preserved.

Verified locally: python3 YAML parse on values.yaml clean; the
helm-templates-lint and B3-helm-chart-coherence ci-guards require
helm itself which isn't in the sandbox — both should pass on the
CI re-run.
This commit is contained in:
shankar0123
2026-05-16 22:09:42 +00:00
parent 58a15e0b3d
commit 9155ec9174
3 changed files with 58 additions and 35 deletions
@@ -43,24 +43,24 @@ spec:
scrapeTimeout: {{ .Values.monitoring.serviceMonitor.scrapeTimeout | default "10s" }}
tlsConfig:
# Acquisition-audit DEPL-004 closure (Sprint 6 ACQ, 2026-05-16).
# The default flipped from `insecureSkipVerify: true` to a
# fail-closed posture: operators MUST either supply a
# `monitoring.serviceMonitor.tlsConfig` block (caFile / ca /
# serverName for a real TLS verify) or opt back in explicitly
# with `tlsConfig: { insecureSkipVerify: true }`. The {{ required }}
# check below renders an error at `helm template` / `helm upgrade`
# time if neither is supplied, surfacing the misconfiguration
# before the ServiceMonitor lands in-cluster.
#
# In-cluster scrapes from a Prometheus pod that already trusts the
# certctl CA (via existingSecret + cert-manager) keep working with
# zero operator action — they just point at the right caFile.
# Out-of-cluster Prometheus deployments now require the operator
# to surface the trust decision explicitly.
# Pre-Sprint-6 the default was an implicit `insecureSkipVerify:
# true` (template fell through the else-branch). Post-Sprint-6
# values.yaml ships a real-verify default
# (caFile + serverName matching the chart's existingSecret /
# cert-manager-emitted Secret at /etc/prometheus/secrets/
# certctl-ca/), so the truthy `{{ if .Values.monitoring.
# serviceMonitor.tlsConfig }}` branch always fires for the
# default install. Operators who genuinely want skipVerify
# back must override with `tlsConfig: { insecureSkipVerify:
# true }` explicitly. Operators who blank tlsConfig entirely
# (`tlsConfig: null` or `tlsConfig: {}`) hit the else-branch
# below and trip the `{{ fail }}` at chart-render time —
# there is no way to inherit the pre-Sprint-6 implicit-
# skipVerify behavior silently.
{{- if .Values.monitoring.serviceMonitor.tlsConfig }}
{{- toYaml .Values.monitoring.serviceMonitor.tlsConfig | nindent 8 }}
{{- else }}
{{- required "monitoring.serviceMonitor.tlsConfig is required when monitoring.serviceMonitor.enabled=true (Sprint 6 ACQ DEPL-004 closure, 2026-05-16). Supply { caFile: \"/etc/prometheus/secrets/.../ca.crt\", serverName: \"certctl-server\" } to verify against your CA, or { insecureSkipVerify: true } to opt back into the pre-2026-05-16 default. See docs/operator/helm-deployment.md for the upgrade-path note." nil }}
{{- fail "monitoring.serviceMonitor.tlsConfig was explicitly blanked but monitoring.serviceMonitor.enabled=true (Sprint 6 ACQ DEPL-004 closure, 2026-05-16). The values.yaml default ships caFile=/etc/prometheus/secrets/certctl-ca/ca.crt + serverName=certctl-server which matches the existingSecret mount pattern. If your Prometheus pod mounts the CA bundle at a different path, override caFile rather than blanking the block. If you genuinely need skipVerify, set `tlsConfig: { insecureSkipVerify: true }` explicitly — never blank. See docs/operator/helm-deployment.md for the upgrade-path note." }}
{{- end }}
{{- with .Values.monitoring.serviceMonitor.bearerTokenSecret }}
bearerTokenSecret:
+23 -15
View File
@@ -681,25 +681,33 @@ monitoring:
# key: api-key
# bearerTokenSecret: {}
# TLS config for the scrape endpoint. Acquisition-audit DEPL-004
# closure (Sprint 6 ACQ, 2026-05-16): the default flipped from
# `insecureSkipVerify: true` to fail-closed. Operators MUST supply
# tlsConfig — either a real verify (caFile / ca / serverName) for
# production, or explicit `{ insecureSkipVerify: true }` to opt
# back into the pre-2026-05-16 default. The ServiceMonitor template
# `{{ required ... }}` guard surfaces missing tlsConfig at chart-
# render time before it lands in-cluster.
# closure (Sprint 6 ACQ, 2026-05-16): pre-Sprint-6 the default was
# an implicit `insecureSkipVerify: true` (fell through the
# template's else-branch). Post-Sprint-6 the default is a real
# verify against the chart's CA at the canonical mount path the
# existingSecret pattern produces (Prometheus mounts the
# certctl-ca Secret as a volume at /etc/prometheus/secrets/
# certctl-ca/). Operators whose Prometheus pod mounts the bundle
# at a different path override `caFile` below; operators who
# genuinely want skipVerify back can do so explicitly. Operators
# who blank tlsConfig entirely (`tlsConfig: null` or
# `tlsConfig: {}`) trip the `{{ fail }}` guard in
# templates/servicemonitor.yaml at chart-render time — there is
# no way to inherit the pre-Sprint-6 implicit-skipVerify behavior
# silently.
#
# In-cluster Prometheus that already trusts the certctl CA via
# the chart's existingSecret / cert-manager-emitted bundle: point
# caFile at that path (typically /etc/prometheus/secrets/<name>/ca.crt
# once you mount the Secret into the Prometheus pod).
# Production default (verify against the chart's CA):
tlsConfig:
caFile: /etc/prometheus/secrets/certctl-ca/ca.crt
serverName: certctl-server
#
# Production-shaped example (verify against the chart's CA):
# Operator override — different CA mount path:
# tlsConfig:
# caFile: /etc/prometheus/secrets/certctl-ca/ca.crt
# serverName: certctl-server
# caFile: /path/to/your/ca.crt
# serverName: your-cert-CN
#
# Demo / dev-cluster escape hatch (operator-acknowledged):
# Operator override — demo / dev-cluster escape hatch
# (operator-acknowledged unsafe):
# tlsConfig:
# insecureSkipVerify: true
# Optional relabeling for the scrape job.
+20 -5
View File
@@ -96,20 +96,35 @@ Postgres state survives the upgrade (the PVC is retained). The server / agent im
### 2026-05-16 — ServiceMonitor TLS default flipped (DEPL-004)
Acquisition-audit DEPL-004 closure. `monitoring.serviceMonitor.tlsConfig` was previously empty by default and the chart template fell through to `insecureSkipVerify: true`. Post-2026-05-16, the template emits a `{{ required ... }}` fail-closed message at `helm template` / `helm upgrade` time if neither a real verify nor an explicit opt-back is supplied.
Acquisition-audit DEPL-004 closure. Pre-2026-05-16, `monitoring.serviceMonitor.tlsConfig` was empty by default and the chart template fell through to an implicit `insecureSkipVerify: true`. Post-2026-05-16, the values.yaml default is a real TLS verify against the chart's CA (caFile + serverName matching the existingSecret mount path the chart's Prometheus integration produces).
Operators with `monitoring.serviceMonitor.enabled: true` MUST set one of:
The new default works out of the box for the canonical install (the chart's `existingSecret` or cert-manager-emitted Secret mounted at `/etc/prometheus/secrets/certctl-ca/`):
```yaml
# A. Real TLS verify against the chart's CA (production-shaped).
# Default in values.yaml (no operator action required for the
# canonical install path).
monitoring:
serviceMonitor:
enabled: true
tlsConfig:
caFile: /etc/prometheus/secrets/certctl-ca/ca.crt
serverName: certctl-server
```
# B. Demo / dev-cluster — operator-acknowledged opt-back to pre-flip default.
Operators whose Prometheus pod mounts the CA bundle at a different path override `caFile`:
```yaml
monitoring:
serviceMonitor:
enabled: true
tlsConfig:
caFile: /path/to/your/ca.crt
serverName: your-cert-CN
```
Operators who genuinely need `insecureSkipVerify` (demo / dev clusters) must opt in **explicitly** — blanking the `tlsConfig` block trips the chart's `{{ fail }}` guard at render time:
```yaml
monitoring:
serviceMonitor:
enabled: true
@@ -117,7 +132,7 @@ monitoring:
insecureSkipVerify: true
```
Operators with `monitoring.serviceMonitor.enabled: false` (the chart default) need no action — the template short-circuits before the `tlsConfig` block.
There is no way to inherit the pre-2026-05-16 implicit-skipVerify behavior silently. Operators with `monitoring.serviceMonitor.enabled: false` (the chart default) need no action — the template short-circuits before the `tlsConfig` block.
## Configuration reference