mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-08 09:20:02 +00:00
54a41603de
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.
We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.
The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.
Files changed:
Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
literal 'jwt' through a dedicated multi-line diagnostic naming the
authenticating-gateway pattern, then cross-checks against
ValidAuthTypes(). Secret-required branch simplified to api-key-only.
Field comment on AuthConfig.Type rewritten to drop jwt and point at
the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
config.Load() — exits 1 on any unsupported auth-type that bypassed the
validator. Auth-disabled startup log explicitly names the
authenticating-gateway pattern.
Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
(two table rows pinning the dedicated G-1 error fires regardless of
whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
guard against future re-introduction),
TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
hit the generic invalid-auth-type error). Removed the prior
TestValidate_JWTAuth_MissingSecret happy-path since its premise is
inverted post-G-1.
- internal/api/handler/health_test.go: removed
TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
into the regression suite). Pre-existing _APIKey test continues to
cover the api-key happy path.
Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
jwt and point at the gateway pattern; secret-required conditional
simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
section explaining the design rationale and listing oauth2-proxy /
Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
forward_auth / Apache mod_auth_openidc / nginx auth_request as the
standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
with preconditions, what-changes, both recovery paths, complete
docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
ext_authz patterns, rollback posture.
Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
helper mirroring the existing certctl.tls.required pattern. Fails
template render on any server.auth.type outside {api-key, none} with
a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
server-configmap.yaml, server-secret.yaml: invoke the helper at the
top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
walkthrough.
Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
Added / Changed sections; explicit pointer at
docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.
Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
guard (G-1)' step. Scoped patterns catch the actual regression shapes
(map literal, slice literal, switch case, OpenAPI enum, env-file
default, AuthType('jwt') cast). Comments and the dedicated rejection
branch are intentionally exempt; connector-package JWT references
(Google OAuth2 / step-ca) are exempt as out-of-scope external
protocols. Verified locally: the guard passes on the actual tree and
fires on all 4 synthetic regression patterns.
Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.
Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
longer accepted (G-1 silent auth downgrade): no JWT middleware ships
with certctl. To use JWT/OIDC, run an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
See docs/architecture.md "Authenticating-gateway pattern" and
docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'
config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-g-jwt_silent_auth_downgrade
Audit recommendation followed verbatim: 'Remove jwt from
validAuthTypes until middleware ships'.
149 lines
8.0 KiB
Markdown
149 lines
8.0 KiB
Markdown
# certctl Helm Chart
|
|
|
|
Production-ready Helm chart for deploying [certctl](https://github.com/shankar0123/certctl) on Kubernetes. Wires up the certctl server (Deployment), PostgreSQL (StatefulSet with PVC), and the agent (DaemonSet — one per node) on a private cluster, with health probes, security contexts, and optional Ingress.
|
|
|
|
## Quick install
|
|
|
|
```bash
|
|
helm install certctl deploy/helm/certctl/ \
|
|
--create-namespace --namespace certctl \
|
|
--set server.auth.apiKey="$(openssl rand -base64 32)" \
|
|
--set postgresql.auth.password="$(openssl rand -base64 24)"
|
|
```
|
|
|
|
This brings up:
|
|
|
|
- `<release>-server` Deployment (HTTPS-only on port 8443; TLS 1.3)
|
|
- `<release>-postgres` StatefulSet (PostgreSQL 16-alpine, 1 replica, 10Gi PVC by default)
|
|
- `<release>-agent` DaemonSet (polls server, generates ECDSA P-256 keys locally)
|
|
- Service objects, optional Ingress, and ServiceAccount with RBAC
|
|
|
|
See [`values.yaml`](values.yaml) for the full configuration surface — issuer settings, target connectors, scheduler intervals, notifier credentials, and resource requests/limits all live there.
|
|
|
|
## Operational notes
|
|
|
|
### Postgres password rotation — read this before changing `postgresql.auth.password`
|
|
|
|
**The trap.** `postgresql.auth.password` is bound to `pg_authid` exactly once — when the StatefulSet's PVC is provisioned and `initdb` runs. The official `postgres:16-alpine` image only runs `initdb` when `/var/lib/postgresql/data` is empty, so on every subsequent rollout the `POSTGRES_PASSWORD` env var is read into the container but **ignored** by postgres itself. The certctl-server container also picks up the new value (via the database URL helper template), so the two halves diverge: server presents the new password, postgres still expects the old one.
|
|
|
|
**Symptom.** The certctl-server pod's startup log shows:
|
|
|
|
```
|
|
failed to ping database: postgres rejected the configured credentials
|
|
(SQLSTATE 28P01 — invalid_password). If you recently rotated POSTGRES_PASSWORD ...
|
|
```
|
|
|
|
That diagnostic is emitted by `internal/repository/postgres/db.go::wrapPingError` — it points operators at the two remediation paths below.
|
|
|
|
**Remediation, non-destructive (preferred for any environment with real data):**
|
|
|
|
```bash
|
|
# 1. Rotate the password in postgres directly
|
|
kubectl -n certctl exec -it <release>-postgres-0 -- \
|
|
psql -U certctl -c "ALTER ROLE certctl PASSWORD '<new-password>';"
|
|
|
|
# 2. Update the secret / Helm values to the same value
|
|
helm upgrade <release> deploy/helm/certctl/ \
|
|
--reuse-values \
|
|
--set postgresql.auth.password='<new-password>'
|
|
|
|
# 3. Bounce the certctl-server pod so it re-reads the secret
|
|
kubectl -n certctl rollout restart deployment/<release>-server
|
|
```
|
|
|
|
**Remediation, destructive (DESTROYS ALL CERTCTL DATA — only acceptable on dev/demo clusters):**
|
|
|
|
```bash
|
|
helm uninstall <release> -n certctl
|
|
kubectl -n certctl delete pvc -l \
|
|
app.kubernetes.io/name=certctl,app.kubernetes.io/component=postgres
|
|
helm install <release> deploy/helm/certctl/ \
|
|
--namespace certctl \
|
|
--set postgresql.auth.password='<new-password>'
|
|
```
|
|
|
|
The PVC re-creates empty, `initdb` runs on first boot of the new postgres pod, and `pg_authid` is seeded with the new password.
|
|
|
|
**Why we don't fix this in the chart.** The env-vs-`pg_authid` divergence is intrinsic to how the upstream `postgres` image bootstraps — `initdb` is run-once-per-empty-data-dir, and there is no upstream-supported way to make subsequent boots re-seed `pg_authid` from `POSTGRES_PASSWORD`. The ergonomic answer is the runtime diagnostic plus this operational note.
|
|
|
|
**Cross-references.** Same root cause is documented for the docker-compose path in [`docs/quickstart.md`](../../../docs/quickstart.md) (Warning callout after the `cp .env.example .env` block) and in [`deploy/ENVIRONMENTS.md`](../../ENVIRONMENTS.md) (Stateful volume — first-boot password binding section). The runtime diagnostic itself lives in `internal/repository/postgres/db.go::wrapPingError` with regression coverage in `internal/repository/postgres/db_test.go`.
|
|
|
|
### Server API key rotation
|
|
|
|
Unlike the postgres password, `server.auth.apiKey` accepts a comma-separated list, so zero-downtime rotation is straightforward:
|
|
|
|
```bash
|
|
# 1. Add the new key alongside the old
|
|
helm upgrade <release> deploy/helm/certctl/ \
|
|
--reuse-values \
|
|
--set server.auth.apiKey='new-key,old-key'
|
|
|
|
# 2. Roll your agents / clients over to the new key
|
|
|
|
# 3. Remove the old key
|
|
helm upgrade <release> deploy/helm/certctl/ \
|
|
--reuse-values \
|
|
--set server.auth.apiKey='new-key'
|
|
```
|
|
|
|
### JWT / OIDC via authenticating gateway
|
|
|
|
certctl's in-process auth surface is intentionally narrow: `server.auth.type=api-key` for production deployments and `server.auth.type=none` for development. There is no in-process JWT, OIDC, mTLS, or SAML middleware. (`server.auth.type=jwt` was accepted pre-G-1 but silently routed every request through the api-key bearer middleware — silent auth downgrade. The chart now fails at `helm install`/`helm upgrade` template time via the `certctl.validateAuthType` helper if you set it. See [`../../../docs/upgrade-to-v2-jwt-removal.md`](../../../docs/upgrade-to-v2-jwt-removal.md) if you previously had this in your values.)
|
|
|
|
For deployments that need JWT/OIDC, the canonical Kubernetes-flavored shape is to put oauth2-proxy in front of the certctl Service, attach an authenticating Ingress middleware, and run certctl with `server.auth.type=none`:
|
|
|
|
```bash
|
|
# 1. Install oauth2-proxy (or any OIDC-terminating sidecar) in the same namespace
|
|
helm install oauth2-proxy oauth2-proxy/oauth2-proxy \
|
|
--namespace certctl \
|
|
--set config.clientID="$OIDC_CLIENT_ID" \
|
|
--set config.clientSecret="$OIDC_CLIENT_SECRET" \
|
|
--set config.cookieSecret="$(openssl rand -base64 32)" \
|
|
--set config.configFile='|
|
|
provider = "oidc"
|
|
oidc_issuer_url = "https://your-issuer/"
|
|
upstreams = ["http://<release>-server.certctl.svc.cluster.local:8443"]
|
|
pass_authorization_header = true
|
|
set_authorization_header = true
|
|
email_domains = ["*"]
|
|
'
|
|
|
|
# 2. Install certctl with type=none (gateway terminates auth)
|
|
helm install certctl deploy/helm/certctl/ \
|
|
--namespace certctl \
|
|
--set server.auth.type=none \
|
|
--set postgresql.auth.password="$(openssl rand -base64 24)"
|
|
|
|
# 3. Attach an Ingress that routes through oauth2-proxy
|
|
# (Traefik ForwardAuth, nginx auth_request, Envoy ext_authz, etc.)
|
|
```
|
|
|
|
Same root pattern works with Pomerium, Authelia, Caddy `forward_auth`, Apache `mod_auth_openidc`, or any service-mesh `ext_authz`. See [`../../../docs/architecture.md`](../../../docs/architecture.md) "Authenticating-gateway pattern" for the full design rationale and [`../../../docs/upgrade-to-v2-jwt-removal.md`](../../../docs/upgrade-to-v2-jwt-removal.md) for the migration walkthrough.
|
|
|
|
### TLS certificate sourcing
|
|
|
|
By default the chart provisions a self-signed cert via the same init-container pattern as the docker-compose deploy. For production, supply an operator-managed Secret (cert-manager, internal CA, etc.) — see [`docs/tls.md`](../../../docs/tls.md) for the full provisioning matrix and [`docs/upgrade-to-tls.md`](../../../docs/upgrade-to-tls.md) for upgrade-from-HTTP procedures.
|
|
|
|
## Disabling embedded postgres
|
|
|
|
If you have an existing PostgreSQL cluster, disable the embedded one and point at it directly:
|
|
|
|
```bash
|
|
helm install certctl deploy/helm/certctl/ \
|
|
--set postgresql.enabled=false \
|
|
--set server.databaseUrl='postgres://certctl:<pw>@my-pg-host:5432/certctl?sslmode=require'
|
|
```
|
|
|
|
The volume-trap section above does **not** apply to this configuration — your postgres operator (or cloud DB) handles password rotation, and you control `pg_authid` directly.
|
|
|
|
## Uninstall
|
|
|
|
```bash
|
|
helm uninstall <release> -n certctl
|
|
# Optional — also delete the postgres PVC (DESTROYS DATA):
|
|
kubectl -n certctl delete pvc -l \
|
|
app.kubernetes.io/name=certctl,app.kubernetes.io/component=postgres
|
|
```
|
|
|
|
By default `helm uninstall` retains the StatefulSet's PVCs, so reinstalling with the same release name preserves the database. If you've changed `postgresql.auth.password` in your values between uninstall and reinstall, you'll hit the trap on the reinstall — apply the non-destructive remediation above, or also delete the PVC.
|