mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 14:51:30 +00:00
fix(helm): DEPL-003 + DEPL-006 — render viaHook env, sessionAffinity, HA backend default

Sprint 3 unified-master-audit closure: two Helm-chart correctness
defects with overlapping CI-guard surface.

DEPL-003 (CERTCTL_MIGRATIONS_VIA_HOOK never rendered):
  Before this fix, the env var was documented in values.yaml and in the
  migration-job.yaml comment but never made it into the server
  Deployment env block. With migrations.viaHook=true the operator's
  intent is 'the pre-install/pre-upgrade Helm Job owns migrations,'
  but the server pods, missing the env, ran their own
  cmd/server/migrations.go::runBootMigrations alongside the hook
  Job, racing on the schema lock.
  Fix: render '- name: CERTCTL_MIGRATIONS_VIA_HOOK / value: "true"'
  in server-deployment.yaml under '{{- if .Values.migrations.viaHook }}'.
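
The rendered value matters because, per the comment added in server-deployment.yaml, cmd/server/migrations.go only short-circuits when the env var is exactly "true". The shell function below is a hypothetical stand-in for that gate, not the project's code: the real check is Go and is not shown in this commit, and the name should_run_boot_migrations is invented for illustration.

```shell
# Hypothetical stand-in for the server's boot-time gate. The real logic
# lives in cmd/server/migrations.go; this only mirrors the documented rule
# that boot migrations are skipped when the env var equals "true".
should_run_boot_migrations() {
  [ "${CERTCTL_MIGRATIONS_VIA_HOOK:-}" != "true" ]
}

if should_run_boot_migrations; then
  echo "server pod runs migrations at boot"
else
  echo "server pod skips migrations; the Helm hook Job owns them"
fi
```

With the env var absent, which was the state before this fix, the first branch is taken even when the hook Job is also running.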

DEPL-006 (HA example missing rate-limit backend + sessionAffinity):
  values-prod-ha.yaml sets replicas:3 but inherited the chart-wide
  default rateLimiting.backend=memory, which gives each pod its
  own bucket map and effectively triples the cap on a 3-replica fleet.
  The chart also had no render path for server.service.sessionAffinity,
  even though docs/operator/runbooks/ha.md instructed operators to
  set it for ClientIP-routed sticky sessions.
  Fix:
  - server-service.yaml gains a conditional sessionAffinity +
    sessionAffinityConfig.clientIP.timeoutSeconds render.
  - values.yaml grows the matching schema entries (default empty
    so single-replica deploys are unaffected).
  - values-prod-ha.yaml flips rateLimiting.backend=postgres and
    service.sessionAffinity=ClientIP.
  - NOTES.txt emits a loud warning when replicas>1 and either toggle
    is still in its default state, so the misconfig surfaces at
    helm install time instead of in a confused login-flow bug
    report a week later.
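
As a worked example of the per-pod bucket arithmetic, the sketch below uses the replicas, rps, and burst values from values-prod-ha.yaml. It assumes the worst case, in which one client's requests are spread across every pod; the variable names are illustrative only.

```shell
# Worst-case effective limits when each pod keeps its own in-memory buckets.
# replicas, rps and burst are the values set in values-prod-ha.yaml.
replicas=3
rps=500
burst=1000

# Each pod enforces the full limit independently, so the caps add up.
effective_rps=$((replicas * rps))
effective_burst=$((replicas * burst))

echo "configured:                     ${rps} rps / burst ${burst}"
echo "worst case with backend=memory: ${effective_rps} rps / burst ${effective_burst}"
```

With backend=postgres the buckets are shared across replicas, so the configured 500 rps / burst 1000 is the limit the client actually sees.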

CI:
  scripts/ci-guards/B3-helm-chart-coherence.sh gains 'Check 7'
  (DEPL-003 viaHook env render, both positive and negative; the
  inverse case catches future drift that drops the {{- if }} guard)
  and 'Check 8' (DEPL-006 sessionAffinity render). Both checks run
  helm template and assert that the rendered YAML carries the
  expected text.
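
A positive-plus-negative render check of this kind could look roughly like the sketch below. It is not the contents of B3-helm-chart-coherence.sh: the function name check_via_hook_env, the CHART variable, and the release name certctl are assumptions made for illustration, and the real chart may need additional values before helm template succeeds.

```shell
#!/bin/sh
# Hypothetical sketch of a render guard, not the project's actual CI script.
set -eu

# check_via_hook_env RENDERED EXPECT
#   RENDERED: rendered manifest text
#   EXPECT:   "present" or "absent"
# Succeeds only when the env var's presence matches the expectation.
check_via_hook_env() {
  case "$1" in
    *"name: CERTCTL_MIGRATIONS_VIA_HOOK"*) [ "$2" = "present" ] ;;
    *) [ "$2" = "absent" ] ;;
  esac
}

CHART="${CHART:-deploy/helm/certctl}"  # assumed chart location

if command -v helm >/dev/null 2>&1 && [ -d "$CHART" ]; then
  # Positive case: the env var must render when the hook owns migrations.
  on="$(helm template certctl "$CHART" --set migrations.viaHook=true)"
  check_via_hook_env "$on" present ||
    { echo "FAIL: env missing with viaHook=true" >&2; exit 1; }

  # Negative case: the env var must not render when the toggle is off.
  off="$(helm template certctl "$CHART" --set migrations.viaHook=false)"
  check_via_hook_env "$off" absent ||
    { echo "FAIL: env rendered with viaHook=false" >&2; exit 1; }

  echo "viaHook render guard: ok"
else
  echo "helm or chart not found; skipping render guard"
fi
```

The negative case is what fails if a later change drops the conditional and renders the env var unconditionally.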
Closes DEPL-003, DEPL-006.
@@ -72,3 +72,28 @@ IMPORTANT NOTES FOR PRODUCTION:
   - All containers run as non-root
   - Implement network policies to restrict traffic between components
   - Consider pod security policies or security standards for your cluster
+{{- /*
+DEPL-006 closure (Sprint 3, 2026-05-16). Loud notice when the
+operator runs a multi-replica deploy without crossing the two
+required HA toggles. Per-pod rate-limit buckets and round-robin
+load balancing both silently break correctness above replicas:1.
+*/}}
+{{- if gt (int .Values.server.replicas) 1 }}
+
+⚠️ HA MISCONFIGURATION WARNINGS (replicas={{ .Values.server.replicas }}):
+{{- $backend := .Values.server.rateLimiting.backend | default "memory" }}
+{{- if eq $backend "memory" }}
+  - server.rateLimiting.backend = "memory" with replicas > 1 gives each
+    pod its own bucket map, so the configured cap is effectively
+    multiplied by the replica count. Set
+    `--set server.rateLimiting.backend=postgres` (see DEPL-006 /
+    docs/operator/runbooks/ha.md).
+{{- end }}
+{{- if not .Values.server.service.sessionAffinity }}
+  - server.service.sessionAffinity is empty. Round-robin Service load
+    balancing routes login → /api/v1/auth/login → /api/v1/auth/csrf
+    across different pods, breaking the CSRF token + session cookie
+    handshake. Set
+    `--set server.service.sessionAffinity=ClientIP`.
+{{- end }}
+{{- end }}
@@ -51,6 +51,20 @@ spec:
               containerPort: {{ .Values.server.port }}
               protocol: TCP
           env:
+            # DEPL-003 closure (Sprint 3, 2026-05-16). Pre-fix the
+            # CERTCTL_MIGRATIONS_VIA_HOOK env var was documented in
+            # values.yaml (L797-810) and migration-job.yaml comments
+            # but was never rendered into the server Deployment env
+            # block. With migrations.viaHook=true the operator's
+            # intent is "the pre-install/pre-upgrade Helm Job owns
+            # migrations" — but the server pods, missing the env,
+            # ran their own boot-time RunMigrations alongside the
+            # hook Job, racing on the schema lock. cmd/server/migrations.go
+            # only short-circuits when this env is "true" (line 144).
+            {{- if .Values.migrations.viaHook }}
+            - name: CERTCTL_MIGRATIONS_VIA_HOOK
+              value: "true"
+            {{- end }}
             - name: CERTCTL_SERVER_HOST
               value: "0.0.0.0"
             - name: CERTCTL_SERVER_PORT
@@ -11,6 +11,23 @@ metadata:
   {{- end }}
 spec:
   type: {{ .Values.server.service.type }}
+  {{- /*
+  DEPL-006 closure (Sprint 3, 2026-05-16). Render the optional
+  sessionAffinity field. docs/operator/runbooks/ha.md instructs
+  operators to set sessionAffinity: ClientIP for replicas > 1 so
+  login + CSRF flows stay on the same pod; pre-fix the chart did
+  not actually pass the value through. sessionAffinityConfig
+  clientIP.timeoutSeconds renders only when set, otherwise
+  Kubernetes applies its default (10800s / 3h).
+  */}}
+  {{- if .Values.server.service.sessionAffinity }}
+  sessionAffinity: {{ .Values.server.service.sessionAffinity }}
+  {{- with .Values.server.service.sessionAffinityTimeoutSeconds }}
+  sessionAffinityConfig:
+    clientIP:
+      timeoutSeconds: {{ . }}
+  {{- end }}
+  {{- end }}
   ports:
     - port: {{ .Values.server.service.port }}
       targetPort: https
@@ -160,6 +160,17 @@ server:
     type: ClusterIP
     port: 8443
     annotations: {}
+    # DEPL-006 closure (Sprint 3, 2026-05-16). Optional sticky-session
+    # routing. REQUIRED when server.replicas > 1 so login + CSRF token
+    # rows stay on the same pod for the duration of a session — the
+    # default round-robin load balancing breaks those flows. Set to
+    # "ClientIP" for production HA (see deploy/helm/examples/values-prod-ha.yaml).
+    # Leave empty for single-replica deploys.
+    sessionAffinity: ""
+    # When sessionAffinity is set, timeout window (in seconds) the
+    # Service maps a source IP to the same pod. Default null →
+    # Kubernetes applies its built-in default (10800s / 3h).
+    sessionAffinityTimeoutSeconds: null
 
   # Authentication configuration.
   # Valid types: "api-key" (production) or "none" (demo only — disables
@@ -36,6 +36,14 @@ server:
 
   service:
     type: ClusterIP
+    # DEPL-006 closure (Sprint 3, 2026-05-16): with replicas:3, the
+    # default round-robin Service load balancing breaks login/CSRF
+    # flows because the session cookie + the CSRF token row land on
+    # different pods between requests. sessionAffinity: ClientIP
+    # routes every connection from a given source IP to the same
+    # pod for the configured timeout window. docs/operator/runbooks/ha.md
+    # documents this; pre-fix the chart did not actually render it.
+    sessionAffinity: ClientIP
     annotations:
       prometheus.io/scrape: "true"
       prometheus.io/port: "8443"
@@ -53,6 +61,14 @@ server:
   rateLimiting:
     rps: 500
     burst: 1000
+    # DEPL-006 closure (Sprint 3, 2026-05-16): replicas > 1 REQUIRES
+    # the postgres backend so per-key buckets are cross-replica-
+    # consistent. The default 'memory' backend gives each pod its
+    # own bucket map, so a 3-replica fleet effectively triples the
+    # configured cap (a client churning across pods bypasses the
+    # limit). See deploy/helm/certctl/values.yaml L217-226 for the
+    # canonical comment.
+    backend: postgres
 
 postgresql:
   enabled: true