certctl/deploy/helm/certctl/README.md
Shankar 54a41603de fix(security,config): remove unimplemented JWT auth-type, close silent downgrade (G-1)
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.

We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.

The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.

Files changed:

Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
  AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
  literal 'jwt' through a dedicated multi-line diagnostic naming the
  authenticating-gateway pattern, then cross-checks against
  ValidAuthTypes(). Secret-required branch simplified to api-key-only.
  Field comment on AuthConfig.Type rewritten to drop jwt and point at
  the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
  references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
  config.Load() — exits 1 on any unsupported auth-type that bypassed the
  validator. Auth-disabled startup log explicitly names the
  authenticating-gateway pattern.

Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
  (two table rows pinning the dedicated G-1 error fires regardless of
  whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
  guard against future re-introduction),
  TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
  TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
  hit the generic invalid-auth-type error). Removed the prior
  TestValidate_JWTAuth_MissingSecret happy-path since its premise is
  inverted post-G-1.
- internal/api/handler/health_test.go: removed
  TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
  into the regression suite). Pre-existing _APIKey test continues to
  cover the api-key happy path.

Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
  inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
  jwt and point at the gateway pattern; secret-required conditional
  simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
  JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
  section explaining the design rationale and listing oauth2-proxy /
  Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
  forward_auth / Apache mod_auth_openidc / nginx auth_request as the
  standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
  with preconditions, what-changes, both recovery paths, complete
  docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
  ext_authz patterns, rollback posture.

Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
  helper mirroring the existing certctl.tls.required pattern. Fails
  template render on any server.auth.type outside {api-key, none} with
  a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
  server-configmap.yaml, server-secret.yaml: invoke the helper at the
  top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
  G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
  the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
  gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
  walkthrough.

Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
  Added / Changed sections; explicit pointer at
  docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.

Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
  guard (G-1)' step. Scoped patterns catch the actual regression shapes
  (map literal, slice literal, switch case, OpenAPI enum, env-file
  default, AuthType('jwt') cast). Comments and the dedicated rejection
  branch are intentionally exempt; connector-package JWT references
  (Google OAuth2 / step-ca) are exempt as out-of-scope external
  protocols. Verified locally: the guard passes on the actual tree and
  fires on all 4 synthetic regression patterns.

Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
  account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
  one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
  external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.

Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
  diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
  regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
  with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
  longer accepted (G-1 silent auth downgrade): no JWT middleware ships
  with certctl. To use JWT/OIDC, run an authenticating gateway
  (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
  front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
  See docs/architecture.md "Authenticating-gateway pattern" and
  docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'

config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.

Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
      §2 P1 cluster, cat-g-jwt_silent_auth_downgrade
      Audit recommendation followed verbatim: 'Remove jwt from
      validAuthTypes until middleware ships'.
2026-04-25 00:22:23 +00:00

certctl Helm Chart

Production-ready Helm chart for deploying certctl on Kubernetes. Wires up the certctl server (Deployment), PostgreSQL (StatefulSet with PVC), and the agent (DaemonSet — one per node) on a private cluster, with health probes, security contexts, and optional Ingress.

Quick install

helm install certctl deploy/helm/certctl/ \
  --create-namespace --namespace certctl \
  --set server.auth.apiKey="$(openssl rand -base64 32)" \
  --set postgresql.auth.password="$(openssl rand -base64 24)"

This brings up:

  • <release>-server Deployment (HTTPS-only on port 8443; TLS 1.3)
  • <release>-postgres StatefulSet (PostgreSQL 16-alpine, 1 replica, 10Gi PVC by default)
  • <release>-agent DaemonSet (polls server, generates ECDSA P-256 keys locally)
  • Service objects, optional Ingress, and ServiceAccount with RBAC

See values.yaml for the full configuration surface — issuer settings, target connectors, scheduler intervals, notifier credentials, and resource requests/limits all live there.

Operational notes

Postgres password rotation — read this before changing postgresql.auth.password

The trap. postgresql.auth.password is bound to pg_authid exactly once — when the StatefulSet's PVC is provisioned and initdb runs. The official postgres:16-alpine image only runs initdb when /var/lib/postgresql/data is empty, so on every subsequent rollout the POSTGRES_PASSWORD env var is read into the container but ignored by postgres itself. The certctl-server container also picks up the new value (via the database URL helper template), so the two halves diverge: server presents the new password, postgres still expects the old one.

Symptom. The certctl-server pod's startup log shows:

failed to ping database: postgres rejected the configured credentials
(SQLSTATE 28P01 — invalid_password). If you recently rotated POSTGRES_PASSWORD ...

That diagnostic is emitted by internal/repository/postgres/db.go::wrapPingError — it points operators at the two remediation paths below.
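To spot this symptom without reading the log by hand, a small filter can look for the SQLSTATE quoted in the diagnostic. This is a minimal sketch: the function name `is_pg_password_mismatch` is hypothetical, and it assumes the log text still contains the literal string `SQLSTATE 28P01` shown above.

```shell
# Hypothetical helper: reads log text on stdin and exits 0 when it contains
# the invalid_password SQLSTATE quoted in the diagnostic above.
is_pg_password_mismatch() {
  grep -q 'SQLSTATE 28P01'
}

# Example usage (requires cluster access, so it is left commented out):
# kubectl -n certctl logs deployment/<release>-server \
#   | is_pg_password_mismatch && echo "postgres password mismatch: see remediation below"
```

The check only classifies the failure; it does not change any state, so it is safe to run before choosing between the two remediation paths.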

Remediation, non-destructive (preferred for any environment with real data):

# 1. Rotate the password in postgres directly
kubectl -n certctl exec -it <release>-postgres-0 -- \
  psql -U certctl -c "ALTER ROLE certctl PASSWORD '<new-password>';"

# 2. Update the secret / Helm values to the same value
helm upgrade <release> deploy/helm/certctl/ \
  --reuse-values \
  --set postgresql.auth.password='<new-password>'

# 3. Bounce the certctl-server pod so it re-reads the secret
kubectl -n certctl rollout restart deployment/<release>-server

Remediation, destructive (DESTROYS ALL CERTCTL DATA — only acceptable on dev/demo clusters):

helm uninstall <release> -n certctl
kubectl -n certctl delete pvc -l \
  app.kubernetes.io/name=certctl,app.kubernetes.io/component=postgres
helm install <release> deploy/helm/certctl/ \
  --namespace certctl \
  --set postgresql.auth.password='<new-password>'

The PVC re-creates empty, initdb runs on first boot of the new postgres pod, and pg_authid is seeded with the new password.

Why we don't fix this in the chart. The env-vs-pg_authid divergence is intrinsic to how the upstream postgres image bootstraps — initdb is run-once-per-empty-data-dir, and there is no upstream-supported way to make subsequent boots re-seed pg_authid from POSTGRES_PASSWORD. The ergonomic answer is the runtime diagnostic plus this operational note.

Cross-references. Same root cause is documented for the docker-compose path in docs/quickstart.md (Warning callout after the cp .env.example .env block) and in deploy/ENVIRONMENTS.md (Stateful volume — first-boot password binding section). The runtime diagnostic itself lives in internal/repository/postgres/db.go::wrapPingError with regression coverage in internal/repository/postgres/db_test.go.

Server API key rotation

Unlike the postgres password, server.auth.apiKey accepts a comma-separated list, so zero-downtime rotation is straightforward:

# 1. Add the new key alongside the old.
#    Helm's --set splits on bare commas, so escape the separator as \,
helm upgrade <release> deploy/helm/certctl/ \
  --reuse-values \
  --set server.auth.apiKey='new-key\,old-key'

# 2. Roll your agents / clients over to the new key

# 3. Remove the old key
helm upgrade <release> deploy/helm/certctl/ \
  --reuse-values \
  --set server.auth.apiKey='new-key'
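Helm's `--set` parser treats an unescaped comma as a separator between assignments, so a comma-separated key list has to reach Helm with each comma written as `\,`. The sketch below builds the list and escapes it; `join_keys` and `helm_escape_commas` are hypothetical helper names, not part of the chart.

```shell
# Hypothetical helper: join the given keys with commas, first argument first.
join_keys() {
  _joined=""
  for _key in "$@"; do
    _joined="${_joined:+$_joined,}$_key"
  done
  printf '%s\n' "$_joined"
}

# Hypothetical helper: escape every comma as \, so that --set keeps the
# whole list as a single string value.
helm_escape_commas() {
  printf '%s\n' "$1" | sed 's/,/\\,/g'
}

# Example usage (commented out; needs a cluster and real keys):
# helm upgrade <release> deploy/helm/certctl/ --reuse-values \
#   --set server.auth.apiKey="$(helm_escape_commas "$(join_keys "$NEW_KEY" "$OLD_KEY")")"
```

Putting the key list in a values file passed with `-f` avoids the escaping entirely and also keeps the keys out of shell history.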

JWT / OIDC via authenticating gateway

certctl's in-process auth surface is intentionally narrow: server.auth.type=api-key for production deployments, and server.auth.type=none for development or for deployments fronted by an authenticating gateway. There is no in-process JWT, OIDC, mTLS, or SAML middleware. (server.auth.type=jwt was accepted pre-G-1 but silently routed every request through the api-key bearer middleware, a silent auth downgrade. The chart now fails template rendering during helm install or helm upgrade, via the certctl.validateAuthType helper, if you set it. See ../../../docs/upgrade-to-v2-jwt-removal.md if you previously had this in your values.)
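A deploy script can reject a bad value before Helm is ever invoked. The following is a minimal sketch of such a pre-flight check: it mirrors the allowed set {api-key, none} described above, but `check_auth_type` and its messages are hypothetical and are not the chart's own `certctl.validateAuthType` helper.

```shell
# Hypothetical pre-flight check: exits 0 only for the auth types the chart
# accepts, and prints a short reason on stderr otherwise.
check_auth_type() {
  case "$1" in
    api-key|none)
      return 0
      ;;
    jwt)
      echo "server.auth.type=jwt is no longer accepted: front certctl with an authenticating gateway and set type=none" >&2
      return 1
      ;;
    *)
      echo "unsupported server.auth.type: $1 (allowed: api-key, none)" >&2
      return 1
      ;;
  esac
}

# Example usage (commented out; AUTH_TYPE is an assumed variable name):
# check_auth_type "$AUTH_TYPE" && helm upgrade --install certctl deploy/helm/certctl/ \
#   --namespace certctl --set server.auth.type="$AUTH_TYPE"
```

The chart's template-time validation remains the authoritative gate; a check like this only surfaces the error earlier in a pipeline.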

For deployments that need JWT/OIDC, the canonical Kubernetes-flavored shape is to put oauth2-proxy in front of the certctl Service, attach an authenticating Ingress middleware, and run certctl with server.auth.type=none:

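# 1. Install oauth2-proxy (or any OIDC-terminating proxy) in the same namespace.
#    The config file is passed with --set-file; a literal "|" inside --set would
#    end up in the file and break it.
cat > oauth2-proxy.cfg <<'EOF'
provider = "oidc"
oidc_issuer_url = "https://your-issuer/"
# certctl is HTTPS-only on 8443, so the upstream must use https:// and
# oauth2-proxy must trust the server certificate.
upstreams = ["https://<release>-server.certctl.svc.cluster.local:8443"]
pass_authorization_header = true
set_authorization_header = true
email_domains = ["*"]
EOF

helm install oauth2-proxy oauth2-proxy/oauth2-proxy \
  --namespace certctl --create-namespace \
  --set config.clientID="$OIDC_CLIENT_ID" \
  --set config.clientSecret="$OIDC_CLIENT_SECRET" \
  --set config.cookieSecret="$(openssl rand -base64 32 | tr -- '+/' '-_')" \
  --set-file config.configFile=oauth2-proxy.cfg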

# 2. Install certctl with type=none (gateway terminates auth)
helm install certctl deploy/helm/certctl/ \
  --namespace certctl \
  --set server.auth.type=none \
  --set postgresql.auth.password="$(openssl rand -base64 24)"

# 3. Attach an Ingress that routes through oauth2-proxy
#    (Traefik ForwardAuth, nginx auth_request, Envoy ext_authz, etc.)

Same root pattern works with Pomerium, Authelia, Caddy forward_auth, Apache mod_auth_openidc, or any service-mesh ext_authz. See ../../../docs/architecture.md "Authenticating-gateway pattern" for the full design rationale and ../../../docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough.

TLS certificate sourcing

By default the chart provisions a self-signed cert via the same init-container pattern as the docker-compose deploy. For production, supply an operator-managed Secret (cert-manager, internal CA, etc.) — see docs/tls.md for the full provisioning matrix and docs/upgrade-to-tls.md for upgrade-from-HTTP procedures.

Disabling embedded postgres

If you have an existing PostgreSQL cluster, disable the embedded one and point at it directly:

helm install certctl deploy/helm/certctl/ \
  --set postgresql.enabled=false \
  --set server.databaseUrl='postgres://certctl:<pw>@my-pg-host:5432/certctl?sslmode=require'

The volume-trap section above does not apply to this configuration — your postgres operator (or cloud DB) handles password rotation, and you control pg_authid directly.
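One practical detail when filling in `<pw>`: connection URLs treat characters such as `/`, `@`, `:` and `+` as syntax, so a generated password that contains them needs percent-encoding before it goes into the URL. This sketch assumes certctl parses server.databaseUrl with standard URI rules; `urlencode_all` is a hypothetical helper that simply encodes every byte, which is verbose but always valid.

```shell
# Hypothetical helper: percent-encode every byte of the argument.
# od prints the bytes as hex, tr strips the spacing, sed prefixes each pair with %.
urlencode_all() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n' | sed 's/\(..\)/%\1/g'
}

# Example usage (commented out; PG_PASSWORD is an assumed variable name):
# helm install certctl deploy/helm/certctl/ \
#   --set postgresql.enabled=false \
#   --set server.databaseUrl="postgres://certctl:$(urlencode_all "$PG_PASSWORD")@my-pg-host:5432/certctl?sslmode=require"
```

Encoding everything avoids having to decide which characters are reserved, at the cost of a longer URL.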

Uninstall

helm uninstall <release> -n certctl
# Optional — also delete the postgres PVC (DESTROYS DATA):
kubectl -n certctl delete pvc -l \
  app.kubernetes.io/name=certctl,app.kubernetes.io/component=postgres

By default helm uninstall retains the StatefulSet's PVCs, so reinstalling with the same release name preserves the database. If you've changed postgresql.auth.password in your values between uninstall and reinstall, the reinstall hits the password-rotation trap described under Operational notes: postgres still expects the old password. Apply the non-destructive remediation above, or also delete the PVC.