mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 15:01:32 +00:00

shankar0123 9155ec9174 fix(helm): DEPL-004 follow-up — default tlsConfig to real verify; fix ill-formed required-nil

Sprint 6 ACQ DEPL-004 closure follow-up. CI run on commit 58a15e0
caught two issues:

1. The fail-closed guard in templates/servicemonitor.yaml used
   `{{ required "msg" nil }}`, which is wrong Helm syntax — the
   bareword `nil` isn't valid in Go templates and Helm interprets
   it as no value, hitting "wrong number of args for required:
   want 2 got 0". The B3-helm-chart-coherence ci-guard's
   production-hardening render
   (`--set monitoring.serviceMonitor.enabled=true` without
   explicit tlsConfig) failed with this error AND with the
   downstream "missing kind: ServiceMonitor / PodDisruptionBudget /
   NetworkPolicy" cascades (the entire render aborted before
   producing the matrix).

2. The original DEPL-004 framing — "operators MUST explicitly
   choose tlsConfig or you get a chart-render error" — was the
   right intent but the wrong default. The chart's existingSecret
   integration mounts the CA bundle at a canonical path
   (/etc/prometheus/secrets/certctl-ca/ca.crt); defaulting to that
   path closes the implicit-skipVerify gap without forcing every
   operator to repeat the same boilerplate.

Fixes
=====

deploy/helm/certctl/values.yaml — flips
monitoring.serviceMonitor.tlsConfig from commented-out (which fell
through to implicit insecureSkipVerify: true) to a real verify
default:

  tlsConfig:
    caFile: /etc/prometheus/secrets/certctl-ca/ca.crt
    serverName: certctl-server

Operators with a different CA mount path override caFile;
operators who genuinely want skipVerify back must set
`{ insecureSkipVerify: true }` explicitly. Operators who blank
tlsConfig entirely (`tlsConfig: null` or `tlsConfig: {}`) still
trip the fail-closed guard.

deploy/helm/certctl/templates/servicemonitor.yaml — replaces
`required "msg" nil` with `fail "msg"`. The `fail` builtin is
the correct Helm pattern for an unconditional render-time error;
`required` is for "this value MUST be non-empty" which is the
wrong semantic here (we want to fail when the operator went OUT OF
THEIR WAY to blank the default). Failure message updated to
reflect the new default + the operator-action recipes.

docs/operator/helm-deployment.md — rewrites the
"2026-05-16 — ServiceMonitor TLS default flipped" subsection to
match the new default-on-real-verify semantics. The three operator
recipes (default install / different CA mount / explicit
skipVerify) are documented; the explicit "there is no way to
inherit pre-2026-05-16 implicit-skipVerify behavior silently"
guarantee is preserved.

Verified locally: python3 YAML parse on values.yaml clean; the
helm-templates-lint and B3-helm-chart-coherence ci-guards require
helm itself which isn't in the sandbox — both should pass on the
CI re-run.

2026-05-16 22:09:42 +00:00

6.4 KiB

Helm Deployment

Last reviewed: 2026-05-05

Operator runbook for deploying certctl on Kubernetes via the bundled Helm chart at deploy/helm/certctl/.

Prereqs

Kubernetes cluster, v1.27+
kubectl configured and authenticated
helm v3.13+
Storage class for the PostgreSQL StatefulSet PVC
TLS cert source: either an operator-supplied kubernetes.io/tls Secret OR a cert-manager ClusterIssuer / Issuer. The chart refuses to render without one. See tls.md for the four cert provisioning patterns.

Install

helm install certctl deploy/helm/certctl/ \
  --namespace certctl \
  --create-namespace \
  --set server.apiKey=$(openssl rand -hex 32) \
  --set postgres.password=$(openssl rand -hex 32) \
  --set server.tls.existingSecret=certctl-server-tls

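server.apiKey and postgres.password should be high-entropy values. The example above generates them inline, which is convenient for a first install but exposes the values in the process list while helm runs; production deployments should source them from a secrets manager (Vault, External Secrets Operator, AWS Secrets Manager) instead.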
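
Where no secrets manager is available yet, one option is to write the generated values to an owner-only values file and pass it with helm's -f flag instead of --set. The sketch below assumes that approach. The helper names gen_secret and write_secret_values and the file name are hypothetical; only the server.apiKey and postgres.password keys come from the install command above.

```shell
# Hypothetical helper: print a 64-character hex secret (32 random bytes).
# Uses openssl when present, otherwise reads /dev/urandom directly.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  fi
}

# Hypothetical helper: write both secrets to the values file given as $1.
# The body runs in a subshell so the umask change does not leak to the caller.
write_secret_values() (
  out="$1"
  rm -f "$out"   # recreate the file so the umask below applies to it
  umask 077      # new file is readable and writable by the owner only
  api_key="$(gen_secret)"
  pg_password="$(gen_secret)"
  cat > "$out" <<EOF
server:
  apiKey: "${api_key}"
postgres:
  password: "${pg_password}"
EOF
)

# Usage (file name is illustrative):
#   write_secret_values ./certctl-secrets.yaml
#   helm install certctl deploy/helm/certctl/ \
#     --namespace certctl --create-namespace \
#     -f ./certctl-secrets.yaml \
#     --set server.tls.existingSecret=certctl-server-tls
```

The file still holds the secrets in plain text, so it belongs outside version control and should be deleted or moved into a secrets store once the release is installed.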

What you get

Server Deployment with a configurable replica count (default 1; HA needs sticky sessions on the ACME server's nonce path)
PostgreSQL StatefulSet with PVC-backed persistence
Agent DaemonSet with one agent per node (set agent.daemonset.enabled=false to skip the in-cluster agent)
Health probes (/health liveness + /ready readiness)
Security contexts: non-root, read-only root filesystem
Optional Ingress (off by default; opt in via ingress.enabled=true)

Cert source patterns

Pattern 1 — operator-supplied Secret (recommended for non-cert-manager shops)

kubectl create secret tls certctl-server-tls \
  --cert=server.crt --key=server.key \
  --namespace certctl

helm install certctl deploy/helm/certctl/ \
  --namespace certctl \
  --set server.tls.existingSecret=certctl-server-tls

Pattern 2 — cert-manager Certificate CR (recommended for cert-manager shops)

helm install certctl deploy/helm/certctl/ \
  --namespace certctl \
  --set server.tls.certManager.enabled=true \
  --set server.tls.certManager.issuerRef.name=my-cluster-issuer \
  --set server.tls.certManager.issuerRef.kind=ClusterIssuer

Refuses to render without one of the above

helm install certctl deploy/helm/certctl/ --namespace certctl
# Error: server.tls.existingSecret OR server.tls.certManager.enabled must be set

The render-time guard catches the missing config at helm install time, not at pod-crash-loop time.

Verify the install

kubectl wait --for=condition=Ready --timeout=3m \
  -n certctl pod -l app.kubernetes.io/name=certctl-server

kubectl port-forward -n certctl svc/certctl-server 8443:8443 &

# Extract the CA certificate from the Secret and use it to verify the server
kubectl get secret -n certctl certctl-server-tls -o jsonpath='{.data.ca\.crt}' \
  | base64 -d > /tmp/certctl-ca.crt
curl --cacert /tmp/certctl-ca.crt https://localhost:8443/health
# {"status":"healthy"}

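If the Secret has no ca.crt key (operator-supplied Secrets often don't), use tls.crt as the bundle instead. For a self-signed cert the two files are identical; for a cert issued by a CA chain, distribute the root CA bundle separately, for example via a ConfigMap. curl also checks the hostname, so the certificate needs a SAN covering localhost for the command above to pass; otherwise curl's --resolve option can map the certificate's DNS name to 127.0.0.1.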
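
The ca.crt-then-tls.crt fallback can be scripted. The sketch below is a hypothetical helper, not part of the chart. It relies on kubectl's default behavior of printing nothing for a missing JSONPath key, and it uses base64 -d as the verify commands above do.

```shell
# Hypothetical helper: print the PEM bundle to trust for a TLS Secret,
# preferring ca.crt and falling back to tls.crt when ca.crt is absent.
# Arguments: $1 = namespace, $2 = Secret name.
extract_ca_bundle() (
  namespace="$1"
  secret="$2"
  bundle_b64="$(kubectl get secret -n "$namespace" "$secret" \
    -o jsonpath='{.data.ca\.crt}')"
  if [ -z "$bundle_b64" ]; then
    # No ca.crt key: fall back to the serving certificate itself.
    bundle_b64="$(kubectl get secret -n "$namespace" "$secret" \
      -o jsonpath='{.data.tls\.crt}')"
  fi
  if [ -z "$bundle_b64" ]; then
    echo "no ca.crt or tls.crt found in Secret $secret" >&2
    exit 1
  fi
  printf '%s' "$bundle_b64" | base64 -d
)

# Usage:
#   extract_ca_bundle certctl certctl-server-tls > /tmp/certctl-ca.crt
#   curl --cacert /tmp/certctl-ca.crt https://localhost:8443/health
```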

Upgrade

helm upgrade certctl deploy/helm/certctl/ \
  --namespace certctl \
  --reuse-values

Postgres state survives the upgrade because the PVC is retained. The server and agent images move to the version set by the chart's image.tag. See docs/archive/upgrades/ for version-specific upgrade guidance.

2026-05-16 — ServiceMonitor TLS default flipped (DEPL-004)

Acquisition-audit DEPL-004 closure. Before 2026-05-16, monitoring.serviceMonitor.tlsConfig was empty by default and the chart template fell through to an implicit insecureSkipVerify: true. From 2026-05-16 on, the values.yaml default performs real TLS verification against the chart's CA: caFile points at the path where the chart's Prometheus existingSecret integration mounts the CA bundle, and serverName is set to certctl-server.

The new default works out of the box for the canonical install (the chart's existingSecret or cert-manager-emitted Secret mounted at /etc/prometheus/secrets/certctl-ca/):

# Default in values.yaml (no operator action required for the
# canonical install path).
monitoring:
  serviceMonitor:
    enabled: true
    tlsConfig:
      caFile: /etc/prometheus/secrets/certctl-ca/ca.crt
      serverName: certctl-server

Operators whose Prometheus pod mounts the CA bundle at a different path override caFile:

monitoring:
  serviceMonitor:
    enabled: true
    tlsConfig:
      caFile: /path/to/your/ca.crt
      serverName: your-cert-CN

Operators who genuinely need insecureSkipVerify (demo / dev clusters) must opt in explicitly — blanking the tlsConfig block trips the chart's {{ fail }} guard at render time:

monitoring:
  serviceMonitor:
    enabled: true
    tlsConfig:
      insecureSkipVerify: true

There is no way to inherit the pre-2026-05-16 implicit-skipVerify behavior silently. Operators with monitoring.serviceMonitor.enabled: false (the chart default) need no action — the template short-circuits before the tlsConfig block.
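
To confirm which of the three recipes a given set of values produces, one option is to render only the ServiceMonitor and inspect its tlsConfig. The sketch below is a hypothetical helper, not part of the chart. It assumes the rendered manifest writes caFile and insecureSkipVerify as block-style YAML keys on their own lines.

```shell
# Hypothetical helper: read a rendered manifest on stdin and print the TLS
# mode it contains: "skip-verify", "verify", or "unknown".
servicemonitor_tls_mode() (
  manifest="$(cat)"
  if printf '%s\n' "$manifest" \
      | grep -Eq '^[[:space:]]*insecureSkipVerify:[[:space:]]*true'; then
    echo "skip-verify"
  elif printf '%s\n' "$manifest" \
      | grep -Eq '^[[:space:]]*caFile:[[:space:]]*[^[:space:]]'; then
    echo "verify"
  else
    echo "unknown"
  fi
)

# Usage: render just the ServiceMonitor template and classify it.
#   helm template certctl deploy/helm/certctl/ \
#     --set server.tls.existingSecret=certctl-server-tls \
#     --set monitoring.serviceMonitor.enabled=true \
#     --show-only templates/servicemonitor.yaml \
#     | servicemonitor_tls_mode
```

A blanked tlsConfig never reaches this helper, because the chart's fail guard aborts the render first and helm exits non-zero.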

Configuration reference

Every value is documented in deploy/helm/certctl/values.yaml. Common tweaks:

server.replicaCount — replica count (default 1)
server.resources.{requests,limits} — pod resource bounds
agent.daemonset.enabled — toggle the in-cluster agent (default true)
postgres.storageSize — PVC size (default 10Gi)
ingress.enabled + ingress.host — opt into Ingress

Troubleshooting

Pod crash-loops with TLS error. The cert and key in the Secret don't form a pair. For RSA keys, compare openssl x509 -modulus -in server.crt -noout | md5 against openssl rsa -modulus -in server.key -noout | md5 (use md5sum instead of md5 on Linux); the two digests must match.
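
The modulus comparison only works for RSA keys. A sketch that also covers EC keys compares the public key embedded in the certificate with the one derived from the private key. The function name cert_matches_key is hypothetical; the openssl x509 -pubkey and openssl pkey -pubout subcommands are standard.

```shell
# Hypothetical helper: check that a certificate and a private key belong
# together by comparing their public keys.
# Exit status: 0 = pair, 1 = mismatch, 2 = openssl could not read a file.
cert_matches_key() (
  cert="$1"
  key="$2"
  # Public key embedded in the certificate, in PEM form.
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey)" || exit 2
  # Public key derived from the private key, in the same PEM form.
  key_pub="$(openssl pkey -in "$key" -pubout)" || exit 2
  [ "$cert_pub" = "$key_pub" ]
)

# Usage:
#   if cert_matches_key server.crt server.key; then
#     echo "cert and key pair"
#   else
#     echo "cert and key do NOT pair"
#   fi
```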

Agent DaemonSet pods can't reach the server. Service DNS / NetworkPolicy issue. Confirm the agent's CERTCTL_SERVER_URL env points at the in-cluster service name (https://certctl-server.certctl.svc.cluster.local:8443).

Postgres won't start. PVC permissions. Check kubectl describe pvc -n certctl certctl-postgres and confirm the storage class supports fsGroup.

tls.md — cert provisioning patterns + SIGHUP rotation
security.md — production security posture
runbooks/disaster-recovery.md — Postgres restore + recovery procedures
docs/archive/upgrades/ — version-specific upgrade procedures
