fix(security,config): remove unimplemented JWT auth-type, close silent downgrade (G-1)

The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.

We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.

The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.

Files changed:

Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
  AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
  literal 'jwt' through a dedicated multi-line diagnostic naming the
  authenticating-gateway pattern, then cross-checks against
  ValidAuthTypes(). Secret-required branch simplified to api-key-only.
  Field comment on AuthConfig.Type rewritten to drop jwt and point at
  the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
  references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
  config.Load() — exits 1 on any unsupported auth-type that bypassed the
  validator. Auth-disabled startup log explicitly names the
  authenticating-gateway pattern.

Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
  (two table rows pinning the dedicated G-1 error fires regardless of
  whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
  guard against future re-introduction),
  TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
  TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
  hit the generic invalid-auth-type error). Removed the prior
  TestValidate_JWTAuth_MissingSecret happy-path since its premise is
  inverted post-G-1.
- internal/api/handler/health_test.go: removed
  TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
  into the regression suite). Pre-existing _APIKey test continues to
  cover the api-key happy path.

Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
  inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
  jwt and point at the gateway pattern; secret-required conditional
  simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
  JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
  section explaining the design rationale and listing oauth2-proxy /
  Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
  forward_auth / Apache mod_auth_openidc / nginx auth_request as the
  standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
  with preconditions, what-changes, both recovery paths, complete
  docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
  ext_authz patterns, rollback posture.

Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
  helper mirroring the existing certctl.tls.required pattern. Fails
  template render on any server.auth.type outside {api-key, none} with
  a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
  server-configmap.yaml, server-secret.yaml: invoke the helper at the
  top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
  G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
  the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
  gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
  walkthrough.

Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
  Added / Changed sections; explicit pointer at
  docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.

Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
  guard (G-1)' step. Scoped patterns catch the actual regression shapes
  (map literal, slice literal, switch case, OpenAPI enum, env-file
  default, AuthType('jwt') cast). Comments and the dedicated rejection
  branch are intentionally exempt; connector-package JWT references
  (Google OAuth2 / step-ca) are exempt as out-of-scope external
  protocols. Verified locally: the guard passes on the actual tree and
  fires on all 4 synthetic regression patterns.

Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
  account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
  one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
  external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.

Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
  diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
  regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
  with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
  longer accepted (G-1 silent auth downgrade): no JWT middleware ships
  with certctl. To use JWT/OIDC, run an authenticating gateway
  (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
  front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
  See docs/architecture.md "Authenticating-gateway pattern" and
  docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'

config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.

Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
      §2 P1 cluster, cat-g-jwt_silent_auth_downgrade
      Audit recommendation followed verbatim: 'Remove jwt from
      validAuthTypes until middleware ships'.
This commit is contained in:
shankar0123
2026-04-25 00:22:23 +00:00
parent 3192cd15c5
commit 9c1d446e40
19 changed files with 629 additions and 65 deletions
+149 -15
View File
@@ -458,6 +458,8 @@ func TestValidate_InvalidAuthType(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {
@@ -477,6 +479,8 @@ func TestValidate_APIKeyAuth_MissingSecret(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {
@@ -484,25 +488,133 @@ func TestValidate_APIKeyAuth_MissingSecret(t *testing.T) {
}
}
func TestValidate_JWTAuth_MissingSecret(t *testing.T) {
cfg := &Config{
Server: validServerConfig(t),
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
Log: LogConfig{Level: "info", Format: "json"},
Auth: AuthConfig{Type: "jwt", Secret: ""},
Keygen: KeygenConfig{Mode: "agent"},
Scheduler: SchedulerConfig{
RenewalCheckInterval: 1 * time.Hour,
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
},
// TestValidate_JWTAuth_RejectedDedicated locks down the G-1 fix: pre-G-1
// `CERTCTL_AUTH_TYPE=jwt` was accepted by the validator (the bare error
// path was the empty-secret one previously). Post-G-1 the literal "jwt"
// value is rejected with a dedicated diagnostic regardless of whether
// Secret is set, because there is no JWT middleware in the binary —
// operators who need JWT/OIDC must front certctl with an authenticating
// gateway.
//
// Two table rows pin the contract: missing-secret cannot paper over the
// rejection (the dedicated error fires first, before the secret check),
// and a populated secret also cannot paper over it. Both paths must
// hit the dedicated G-1 diagnostic, not the generic "invalid auth
// type" or "auth secret is required".
func TestValidate_JWTAuth_RejectedDedicated(t *testing.T) {
t.Parallel()
cases := []struct {
name string
secret string
}{
{"jwt rejected (no secret)", ""},
{"jwt rejected (with secret — operator can't paper over)", "anything"},
}
if err := cfg.Validate(); err == nil {
t.Error("Validate() should return error when jwt auth has empty secret")
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
cfg := &Config{
Server: validServerConfig(t),
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
Log: LogConfig{Level: "info", Format: "json"},
Auth: AuthConfig{Type: "jwt", Secret: tc.secret},
Keygen: KeygenConfig{Mode: "agent"},
Scheduler: SchedulerConfig{
RenewalCheckInterval: 1 * time.Hour,
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
err := cfg.Validate()
if err == nil {
t.Fatal("Validate() returned nil; expected dedicated G-1 rejection")
}
const wantSubstr = "CERTCTL_AUTH_TYPE=jwt is no longer accepted"
if !strings.Contains(err.Error(), wantSubstr) {
t.Errorf("Validate() = %v\nwant substring %q (the dedicated G-1 diagnostic)", err, wantSubstr)
}
})
}
}
// TestValidAuthTypesDoesNotContainJWT is a property-level guard against
// a future PR silently re-introducing "jwt" into the allowed set. If
// someone adds JWT back to ValidAuthTypes(), this test fails immediately
// with a pointer at the audit finding. The matching CI grep guardrail
// in .github/workflows/ci.yml provides a secondary check at build time.
func TestValidAuthTypesDoesNotContainJWT(t *testing.T) {
t.Parallel()
for _, at := range ValidAuthTypes() {
if at == "jwt" {
t.Fatalf("jwt is in ValidAuthTypes — silent auth downgrade regressed (G-1)")
}
}
}
// TestValidAuthTypesIsExactly_APIKey_None pins the current allowed set.
// If a future change adds a new auth type, this test must be updated
// alongside the validator and the helm-chart `validateAuthType` helper —
// keeping all three surfaces in sync.
func TestValidAuthTypesIsExactly_APIKey_None(t *testing.T) {
t.Parallel()
got := ValidAuthTypes()
if len(got) != 2 {
t.Fatalf("ValidAuthTypes() returned %d entries, want 2: %v", len(got), got)
}
want := map[AuthType]bool{AuthTypeAPIKey: true, AuthTypeNone: true}
for _, at := range got {
if !want[at] {
t.Errorf("unexpected auth type in ValidAuthTypes: %q", at)
}
}
}
// TestValidate_GenericInvalidAuthType ensures that values outside the
// allowed set (other than the special-cased "jwt") still surface the
// generic "invalid auth type" error. Pins that the dedicated G-1
// rejection didn't accidentally swallow non-jwt typos.
func TestValidate_GenericInvalidAuthType(t *testing.T) {
t.Parallel()
for _, badType := range []string{"", "garbage", "oidc", "mtls", "API-KEY"} {
t.Run("type="+badType, func(t *testing.T) {
cfg := &Config{
Server: validServerConfig(t),
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
Log: LogConfig{Level: "info", Format: "json"},
Auth: AuthConfig{Type: badType, Secret: "x"},
Keygen: KeygenConfig{Mode: "agent"},
Scheduler: SchedulerConfig{
RenewalCheckInterval: 1 * time.Hour,
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
err := cfg.Validate()
if err == nil {
t.Fatalf("Validate(type=%q) returned nil; expected invalid-auth-type rejection", badType)
}
if !strings.Contains(err.Error(), "invalid auth type") {
t.Errorf("Validate(type=%q) = %v; want \"invalid auth type\" error", badType, err)
}
if strings.Contains(err.Error(), "G-1 silent auth") {
t.Errorf("Validate(type=%q) = %v; should not hit the dedicated G-1 path for non-jwt values", badType, err)
}
})
}
}
// G-1 (P1): no need to add `TestValidate_NoneAuth_AcceptsEmptySecret` or
// `TestValidate_APIKeyAuth_RequiresSecret` here — the pre-existing tests
// `TestValidate_AuthTypeNone` (above) and `TestValidate_APIKeyAuth_MissingSecret`
// (above) already cover those paths. Documented for the next reader: the
// G-1 fix flipped jwt off but did not disturb either the
// none-bypasses-secret or the api-key-requires-secret behavior.
func TestValidate_InvalidKeygenMode(t *testing.T) {
cfg := &Config{
Server: validServerConfig(t),
@@ -515,6 +627,8 @@ func TestValidate_InvalidKeygenMode(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {
@@ -544,6 +658,8 @@ func TestValidate_InvalidPort(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {
@@ -574,6 +690,8 @@ func TestValidate_TLSCertPathEmpty(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
err := cfg.Validate()
@@ -605,6 +723,8 @@ func TestValidate_TLSKeyPathEmpty(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
err := cfg.Validate()
@@ -637,6 +757,8 @@ func TestValidate_TLSCertFileMissing(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
err := cfg.Validate()
@@ -668,6 +790,8 @@ func TestValidate_TLSKeyFileMissing(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
err := cfg.Validate()
@@ -701,6 +825,8 @@ func TestValidate_TLSMismatchedPair(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
err := cfg.Validate()
@@ -724,6 +850,8 @@ func TestValidate_EmptyDatabaseURL(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {
@@ -743,6 +871,8 @@ func TestValidate_InvalidLogLevel(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {
@@ -762,6 +892,8 @@ func TestValidate_InvalidLogFormat(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {
@@ -840,6 +972,8 @@ func TestValidate_DatabaseMaxConnectionsZero(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
NotificationRetryInterval: 2 * time.Minute,
RetryInterval: 5 * time.Minute,
},
}
if err := cfg.Validate(); err == nil {