diff --git a/deploy/helm/certctl/templates/servicemonitor.yaml b/deploy/helm/certctl/templates/servicemonitor.yaml index b7abc7c..d338c70 100644 --- a/deploy/helm/certctl/templates/servicemonitor.yaml +++ b/deploy/helm/certctl/templates/servicemonitor.yaml @@ -43,24 +43,24 @@ spec: scrapeTimeout: {{ .Values.monitoring.serviceMonitor.scrapeTimeout | default "10s" }} tlsConfig: # Acquisition-audit DEPL-004 closure (Sprint 6 ACQ, 2026-05-16). - # The default flipped from `insecureSkipVerify: true` to a - # fail-closed posture: operators MUST either supply a - # `monitoring.serviceMonitor.tlsConfig` block (caFile / ca / - # serverName for a real TLS verify) or opt back in explicitly - # with `tlsConfig: { insecureSkipVerify: true }`. The {{ required }} - # check below renders an error at `helm template` / `helm upgrade` - # time if neither is supplied, surfacing the misconfiguration - # before the ServiceMonitor lands in-cluster. - # - # In-cluster scrapes from a Prometheus pod that already trusts the - # certctl CA (via existingSecret + cert-manager) keep working with - # zero operator action — they just point at the right caFile. - # Out-of-cluster Prometheus deployments now require the operator - # to surface the trust decision explicitly. + # Pre-Sprint-6 the default was an implicit `insecureSkipVerify: + # true` (template fell through the else-branch). Post-Sprint-6 + # values.yaml ships a real-verify default + # (caFile + serverName matching the chart's existingSecret / + # cert-manager-emitted Secret at /etc/prometheus/secrets/ + # certctl-ca/), so the truthy `{{ if .Values.monitoring. + # serviceMonitor.tlsConfig }}` branch always fires for the + # default install. Operators who genuinely want skipVerify + # back must override with `tlsConfig: { insecureSkipVerify: + # true }` explicitly. Operators who blank tlsConfig entirely + # (`tlsConfig: null` or `tlsConfig: {}`) hit the else-branch + # below and trip the `{{ fail }}` at chart-render time — + # there is no way to inherit the pre-Sprint-6 implicit- + # skipVerify behavior silently. {{- if .Values.monitoring.serviceMonitor.tlsConfig }} {{- toYaml .Values.monitoring.serviceMonitor.tlsConfig | nindent 8 }} {{- else }} - {{- required "monitoring.serviceMonitor.tlsConfig is required when monitoring.serviceMonitor.enabled=true (Sprint 6 ACQ DEPL-004 closure, 2026-05-16). Supply { caFile: \"/etc/prometheus/secrets/.../ca.crt\", serverName: \"certctl-server\" } to verify against your CA, or { insecureSkipVerify: true } to opt back into the pre-2026-05-16 default. See docs/operator/helm-deployment.md for the upgrade-path note." nil }} + {{- fail "monitoring.serviceMonitor.tlsConfig was explicitly blanked but monitoring.serviceMonitor.enabled=true (Sprint 6 ACQ DEPL-004 closure, 2026-05-16). The values.yaml default ships caFile=/etc/prometheus/secrets/certctl-ca/ca.crt + serverName=certctl-server which matches the existingSecret mount pattern. If your Prometheus pod mounts the CA bundle at a different path, override caFile rather than blanking the block. If you genuinely need skipVerify, set `tlsConfig: { insecureSkipVerify: true }` explicitly — never blank. See docs/operator/helm-deployment.md for the upgrade-path note." }} {{- end }} {{- with .Values.monitoring.serviceMonitor.bearerTokenSecret }} bearerTokenSecret: diff --git a/deploy/helm/certctl/values.yaml b/deploy/helm/certctl/values.yaml index 28d6828..9fece21 100644 --- a/deploy/helm/certctl/values.yaml +++ b/deploy/helm/certctl/values.yaml @@ -681,25 +681,33 @@ monitoring: # key: api-key # bearerTokenSecret: {} # TLS config for the scrape endpoint. Acquisition-audit DEPL-004 - # closure (Sprint 6 ACQ, 2026-05-16): the default flipped from - # `insecureSkipVerify: true` to fail-closed. Operators MUST supply - # tlsConfig — either a real verify (caFile / ca / serverName) for - # production, or explicit `{ insecureSkipVerify: true }` to opt - # back into the pre-2026-05-16 default. The ServiceMonitor template - # `{{ required ... }}` guard surfaces missing tlsConfig at chart- - # render time before it lands in-cluster. + # closure (Sprint 6 ACQ, 2026-05-16): pre-Sprint-6 the default was + # an implicit `insecureSkipVerify: true` (fell through the + # template's else-branch). Post-Sprint-6 the default is a real + # verify against the chart's CA at the canonical mount path the + # existingSecret pattern produces (Prometheus mounts the + # certctl-ca Secret as a volume at /etc/prometheus/secrets/ + # certctl-ca/). Operators whose Prometheus pod mounts the bundle + # at a different path override `caFile` below; operators who + # genuinely want skipVerify back can do so explicitly. Operators + # who blank tlsConfig entirely (`tlsConfig: null` or + # `tlsConfig: {}`) trip the `{{ fail }}` guard in + # templates/servicemonitor.yaml at chart-render time — there is + # no way to inherit the pre-Sprint-6 implicit-skipVerify behavior + # silently. # - # In-cluster Prometheus that already trusts the certctl CA via - # the chart's existingSecret / cert-manager-emitted bundle: point - # caFile at that path (typically /etc/prometheus/secrets//ca.crt - # once you mount the Secret into the Prometheus pod). + # Production default (verify against the chart's CA): + tlsConfig: + caFile: /etc/prometheus/secrets/certctl-ca/ca.crt + serverName: certctl-server # - # Production-shaped example (verify against the chart's CA): + # Operator override — different CA mount path: # tlsConfig: - # caFile: /etc/prometheus/secrets/certctl-ca/ca.crt - # serverName: certctl-server + # caFile: /path/to/your/ca.crt + # serverName: your-cert-CN # - # Demo / dev-cluster escape hatch (operator-acknowledged): + # Operator override — demo / dev-cluster escape hatch + # (operator-acknowledged unsafe): # tlsConfig: # insecureSkipVerify: true # Optional relabeling for the scrape job. diff --git a/docs/operator/helm-deployment.md b/docs/operator/helm-deployment.md index d83e4ec..3416105 100644 --- a/docs/operator/helm-deployment.md +++ b/docs/operator/helm-deployment.md @@ -96,20 +96,35 @@ Postgres state survives the upgrade (the PVC is retained). The server / agent im ### 2026-05-16 — ServiceMonitor TLS default flipped (DEPL-004) -Acquisition-audit DEPL-004 closure. `monitoring.serviceMonitor.tlsConfig` was previously empty by default and the chart template fell through to `insecureSkipVerify: true`. Post-2026-05-16, the template emits a `{{ required ... }}` fail-closed message at `helm template` / `helm upgrade` time if neither a real verify nor an explicit opt-back is supplied. +Acquisition-audit DEPL-004 closure. Pre-2026-05-16, `monitoring.serviceMonitor.tlsConfig` was empty by default and the chart template fell through to an implicit `insecureSkipVerify: true`. Post-2026-05-16, the values.yaml default is a real TLS verify against the chart's CA (caFile + serverName matching the existingSecret mount path the chart's Prometheus integration produces). -Operators with `monitoring.serviceMonitor.enabled: true` MUST set one of: +The new default works out of the box for the canonical install (the chart's `existingSecret` or cert-manager-emitted Secret mounted at `/etc/prometheus/secrets/certctl-ca/`): ```yaml -# A. Real TLS verify against the chart's CA (production-shaped). +# Default in values.yaml (no operator action required for the +# canonical install path). monitoring: serviceMonitor: enabled: true tlsConfig: caFile: /etc/prometheus/secrets/certctl-ca/ca.crt serverName: certctl-server +``` -# B. Demo / dev-cluster — operator-acknowledged opt-back to pre-flip default. +Operators whose Prometheus pod mounts the CA bundle at a different path override `caFile`: + +```yaml +monitoring: + serviceMonitor: + enabled: true + tlsConfig: + caFile: /path/to/your/ca.crt + serverName: your-cert-CN +``` + +Operators who genuinely need `insecureSkipVerify` (demo / dev clusters) must opt in **explicitly** — blanking the `tlsConfig` block trips the chart's `{{ fail }}` guard at render time: + +```yaml monitoring: serviceMonitor: enabled: true @@ -117,7 +132,7 @@ monitoring: insecureSkipVerify: true ``` -Operators with `monitoring.serviceMonitor.enabled: false` (the chart default) need no action — the template short-circuits before the `tlsConfig` block. +There is no way to inherit the pre-2026-05-16 implicit-skipVerify behavior silently. Operators with `monitoring.serviceMonitor.enabled: false` (the chart default) need no action — the template short-circuits before the `tlsConfig` block. ## Configuration reference