mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 18:31:37 +00:00
9c1d446e40
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.

The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET: real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.
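
To make the downgrade concrete, here is a minimal Go sketch of the
behaviour described above. It is not certctl's middleware: bearerEquals,
demoHandler, statusFor, the request path and the token values are all
hypothetical names chosen for illustration. The only thing taken from the
text is the shape of the bug: the bearer token is compared against the
configured secret no matter which auth type was requested.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearerEquals is a hypothetical stand-in for an api-key bearer check:
// the request passes only if the bearer token equals the secret byte for byte.
func bearerEquals(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		if !ok || subtle.ConstantTimeCompare([]byte(token), []byte(secret)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoHandler models the pre-G-1 wiring: the auth type is never consulted,
// so a "jwt" deployment still gets the api-key comparison.
func demoHandler(secret string) http.Handler {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return bearerEquals(secret, ok)
}

// statusFor sends one request with the given bearer token and returns the status code.
func statusFor(h http.Handler, token string) int {
	req := httptest.NewRequest(http.MethodGet, "/hypothetical/path", nil)
	req.Header.Set("Authorization", "Bearer "+token)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// What an operator who believed they were configuring JWT might have set.
	secret := "hmac-signing-secret"
	h := demoHandler(secret)

	fmt.Println("JWT-shaped token:", statusFor(h, "header.payload.signature")) // 401
	fmt.Println("raw secret:      ", statusFor(h, secret))                     // 200
}
```

The second call is the security finding: anyone who learns the string the
operator thought of as a signing secret can present it directly as a
bearer token.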

We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant JWKS vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path. That is a feature, not a fix.
Operators who genuinely need JWT/OIDC should put an authenticating
gateway (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) in front of certctl and run the upstream certctl with
CERTCTL_AUTH_TYPE=none. The same shape works on docker-compose and Helm.

The change is comprehensive across 7 phases: every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.

Files changed:

Phase 1 - production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
  AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
  literal 'jwt' through a dedicated multi-line diagnostic naming the
  authenticating-gateway pattern, then cross-checks against
  ValidAuthTypes(). Secret-required branch simplified to api-key-only.
  Field comment on AuthConfig.Type rewritten to drop jwt and point at
  the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
  references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
  config.Load(); exits 1 on any unsupported auth-type that bypassed the
  validator. Auth-disabled startup log explicitly names the
  authenticating-gateway pattern.
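
The Phase 1 validator shape can be sketched roughly as follows. This is an
illustration under assumptions, not the code in internal/config/config.go:
the names AuthType, AuthTypeAPIKey, AuthTypeNone, ValidAuthTypes() and
Validate() come from the list above, while the AuthConfig fields, the
method receiver and the exact error wording are guesses made for the
example.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
	"strings"
)

// AuthType is a typed alias, so an unexpected string literal stands out in review.
type AuthType string

const (
	AuthTypeAPIKey AuthType = "api-key"
	AuthTypeNone   AuthType = "none"
)

// ValidAuthTypes returns the allowed set; "jwt" is deliberately absent.
func ValidAuthTypes() []AuthType {
	return []AuthType{AuthTypeAPIKey, AuthTypeNone}
}

// AuthConfig is a trimmed-down, hypothetical stand-in for the real struct.
type AuthConfig struct {
	Type   AuthType
	Secret string
}

// Validate applies the checks in the order described above (assumed wording).
func (c AuthConfig) Validate() error {
	// Dedicated branch: "jwt" gets a diagnostic that names the migration path
	// instead of falling through to the generic invalid-type error.
	if c.Type == "jwt" {
		return errors.New("CERTCTL_AUTH_TYPE=jwt is no longer accepted (G-1 silent auth downgrade): " +
			"run an authenticating gateway in front of certctl and set CERTCTL_AUTH_TYPE=none")
	}
	// Generic cross-check against the allowed set.
	if !slices.Contains(ValidAuthTypes(), c.Type) {
		return fmt.Errorf("invalid auth type %q (allowed: %v)", c.Type, ValidAuthTypes())
	}
	// Only api-key needs a secret.
	if c.Type == AuthTypeAPIKey && strings.TrimSpace(c.Secret) == "" {
		return errors.New("CERTCTL_AUTH_SECRET is required when CERTCTL_AUTH_TYPE=api-key")
	}
	return nil
}

func main() {
	configs := []AuthConfig{
		{Type: AuthTypeAPIKey, Secret: "s3cret"},
		{Type: AuthTypeNone},
		{Type: "jwt", Secret: "s3cret"},
		{Type: "basic"},
	}
	for _, cfg := range configs {
		fmt.Printf("%-8s -> %v\n", cfg.Type, cfg.Validate())
	}
}
```

Keeping the 'jwt' branch ahead of the generic check is what lets the error
fire whether or not a secret is set, which is the contract the Phase 2
table tests pin.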

Phase 2 - tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
  (two table rows pinning the dedicated G-1 error fires regardless of
  whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
  guard against future re-introduction),
  TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
  TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
  hit the generic invalid-auth-type error). Removed the prior
  TestValidate_JWTAuth_MissingSecret happy-path since its premise is
  inverted post-G-1.
- internal/api/handler/health_test.go: removed
  TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
  into the regression suite). Pre-existing _APIKey test continues to
  cover the api-key happy path.

Phase 3 - spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
  inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
  jwt and point at the gateway pattern; secret-required conditional
  simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
  JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
  section explaining the design rationale and listing oauth2-proxy /
  Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
  forward_auth / Apache mod_auth_openidc / nginx auth_request as the
  standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
  with preconditions, what-changes, both recovery paths, complete
  docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
  ext_authz patterns, rollback posture.

Phase 4 - Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
  helper mirroring the existing certctl.tls.required pattern. Fails
  template render on any server.auth.type outside {api-key, none} with
  a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
  server-configmap.yaml, server-secret.yaml: invoke the helper at the
  top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
  G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
  the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
  gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
  walkthrough.

Phase 5 - release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
  Added / Changed sections; explicit pointer at
  docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.

Phase 6 - CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
  guard (G-1)' step. Scoped patterns catch the actual regression shapes
  (map literal, slice literal, switch case, OpenAPI enum, env-file
  default, AuthType('jwt') cast). Comments and the dedicated rejection
  branch are intentionally exempt; connector-package JWT references
  (Google OAuth2 / step-ca) are exempt as out-of-scope external
  protocols. Verified locally: the guard passes on the actual tree and
  fires on all 4 synthetic regression patterns.
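
As a rough illustration of what the scoped patterns do and do not catch,
the sketch below ports the guard's grep -E expressions to Go's regexp
package and runs them over a few synthetic lines. It is only a model: the
real guard is the shell step in ci.yml, it filters grep's file:line:
prefixed output rather than raw lines, and isRegression plus the sample
lines are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// forbidden joins the CI step's grep -E patterns into one alternation.
var forbidden = regexp.MustCompile(strings.Join([]string{
	`"jwt"\s*:\s*true`,          // Go map literal
	`"jwt"\s*,`,                 // Go slice literal
	`case\s+"jwt"`,              // Go switch case
	`enum:.*\bjwt\b`,            // inline YAML enum
	`^\s*-\s*jwt\s*$`,           // block-style YAML list item
	`AUTH_TYPE\s*=\s*jwt\s*$`,   // env-file default
	`AUTH_TYPE\s*=\s*jwt\s*#`,   // env-file default with trailing comment
	`auth\.type\s*=\s*jwt\s*$`,  // helm-style setting
	`AuthType\("jwt"\)`,         // typed cast
}, "|"))

// commentLine approximates the guard's exemption for // and # comment lines.
var commentLine = regexp.MustCompile(`^\s*(//|#)`)

// isRegression reports whether a single source line would trip the guard.
func isRegression(line string) bool {
	if commentLine.MatchString(line) || strings.Contains(line, "is no longer accepted") {
		return false
	}
	return forbidden.MatchString(line)
}

func main() {
	samples := []string{
		`"api-key": true, "jwt": true,`,
		`case "jwt":`,
		`enum: [api-key, jwt, none]`,
		`CERTCTL_AUTH_TYPE=jwt`,
		`// case "jwt" was removed by G-1`,
		`if c.Auth.Type == "jwt" {`,
		`enum: [api-key, none]`,
	}
	for _, s := range samples {
		fmt.Printf("%-5v %s\n", isRegression(s), s)
	}
}
```

Note that the rejection branch (c.Auth.Type == "jwt") passes without any
special casing: none of the additive shapes match a bare equality test,
which is why the patterns are scoped rather than a blanket search for the
word.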

Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go: Google OAuth2
  service-account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go: same.
- internal/connector/issuer/stepca/stepca.go: step-ca's provisioner
  one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md: describe
  external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.

Verification (all gates pass):
- go build ./... - clean
- go vet ./... - clean
- go test -short ./... - every package green
- go test -short -race ./internal/config/... ./internal/api/... - clean
- govulncheck ./... - no vulnerabilities in our code
- helm lint deploy/helm/certctl/ - clean
- helm template with auth.type=api-key - renders OK
- helm template with auth.type=none - renders OK
- helm template with auth.type=jwt - fails with validateAuthType
  diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml - parses
- CI guardrail mirror - clean on real tree, fires on all 4 synthetic
  regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
  with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
  longer accepted (G-1 silent auth downgrade): no JWT middleware ships
  with certctl. To use JWT/OIDC, run an authenticating gateway
  (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
  front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
  See docs/architecture.md "Authenticating-gateway pattern" and
  docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'

config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.

Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
  §2 P1 cluster, cat-g-jwt_silent_auth_downgrade

Audit recommendation followed verbatim: 'Remove jwt from
validAuthTypes until middleware ships'.

245 lines
10 KiB
YAML
name: CI

on:
  push:
    branches:
      - master
      - v2-dev
  pull_request:
    branches:
      - master

jobs:
  go-build-and-test:
    name: Go Build & Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.25.9'

      - name: Go Build
        run: |
          go build ./cmd/server/...
          go build ./cmd/agent/...
          go build ./cmd/mcp-server/...
          go build ./cmd/cli/...

      - name: Go Vet
        run: go vet ./...

      - name: Install golangci-lint
        run: |
          curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v2.11.4

      - name: Run golangci-lint
        run: golangci-lint run ./... --timeout 5m

      - name: Install govulncheck
        run: go install golang.org/x/vuln/cmd/govulncheck@latest

      - name: Run govulncheck
        run: govulncheck ./...

      - name: Forbidden auth-type literal regression guard (G-1)
        # G-1 closed the JWT silent auth downgrade by removing "jwt" from the
        # accepted CERTCTL_AUTH_TYPE values. This step grep-fails the build
        # if "jwt" reappears in any of the *additive* auth-type surfaces:
        # the validAuthTypes / ValidAuthTypes() set, the OpenAPI enum, the
        # helm chart's allowed-types list, or the .env.example default.
        # Comment lines and the dedicated rejection branch in config.go
        # (`c.Auth.Type == "jwt"`) are intentionally exempt: those are the
        # G-1 fix itself, not a regression.
        #
        # Connector packages (internal/connector/) are exempt because the
        # Google OAuth2 service-account JWT and step-ca provisioner one-
        # time-token JWT are external-protocol uses, unrelated to certctl's
        # own auth shape. Test files (_test.go) are exempt so negative
        # tests can pass the literal.
        #
        # See docs/upgrade-to-v2-jwt-removal.md for the closure rationale,
        # or internal/config/config.go::ValidAuthTypes for the allowed set.
        run: |
          set -e

          # Scoped patterns that indicate "jwt" being added back to an
          # allowed-set surface. Each catches a regression shape we've
          # actually seen in pre-G-1 code:
          #   - Go map/slice literal:   "jwt": true   or   "jwt",
          #   - Go switch case:         case "jwt"
          #   - YAML enum:              enum: [..., jwt, ...]   or   - jwt
          #   - .env conditional:       AUTH_TYPE.*"jwt"|=jwt$
          BAD=$(grep -rnEH \
            -e '"jwt"\s*:\s*true' \
            -e '"jwt"\s*,' \
            -e 'case\s+"jwt"' \
            -e 'enum:.*\bjwt\b' \
            -e '^\s*-\s*jwt\s*$' \
            -e 'AUTH_TYPE\s*=\s*jwt\s*$' \
            -e 'AUTH_TYPE\s*=\s*jwt\s*#' \
            -e 'auth\.type\s*=\s*jwt\s*$' \
            -e 'AuthType\("jwt"\)' \
            internal/config/ \
            internal/api/ \
            cmd/ \
            api/openapi.yaml \
            .env.example \
            deploy/.env.example \
            deploy/helm/certctl/values.yaml \
            deploy/helm/certctl/templates/ \
            2>/dev/null \
            | grep -v '_test.go' \
            | grep -vE '^\s*[^:]+:[0-9]+:\s*(//|#)' \
            | grep -v 'is no longer accepted' \
            || true)
          if [ -n "$BAD" ]; then
            echo "G-1 regression: \"jwt\" reappeared in an allowed-set surface:"
            echo "$BAD"
            echo ""
            echo "Allowed surface for 'jwt' literals: comment lines, the"
            echo "dedicated rejection branch in internal/config/config.go,"
            echo "and connector packages (Google OAuth2, step-ca)."
            echo "See docs/upgrade-to-v2-jwt-removal.md and"
            echo "internal/config/config.go::ValidAuthTypes()."
            exit 1
          fi

      - name: Race Detection
        run: go test -race ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/scheduler/... ./internal/connector/... ./internal/crypto/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -timeout 300s

      - name: Go Test with Coverage
        run: |
          go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out

      - name: Check Coverage Thresholds
        run: |
          # Extract per-package coverage from test output
          echo "=== Coverage Report ==="
          go tool cover -func=coverage.out | tail -1

          # Check service layer coverage (target: 60%+)
          SERVICE_COV=$(go tool cover -func=coverage.out | grep 'internal/service' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
          echo "Service layer coverage: ${SERVICE_COV}%"

          # Check handler layer coverage (target: 60%+)
          HANDLER_COV=$(go tool cover -func=coverage.out | grep 'internal/api/handler' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
          echo "Handler layer coverage: ${HANDLER_COV}%"

          # Check domain layer coverage (target: 40%+)
          DOMAIN_COV=$(go tool cover -func=coverage.out | grep 'internal/domain' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
          echo "Domain layer coverage: ${DOMAIN_COV}%"

          # Check middleware layer coverage (target: 50%+)
          MIDDLEWARE_COV=$(go tool cover -func=coverage.out | grep 'internal/api/middleware' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
          echo "Middleware layer coverage: ${MIDDLEWARE_COV}%"

          # Check crypto package coverage (target: 85%+)
          # M-8 rationale: encryption primitives are a security-critical gate.
          # v2 format, key-derivation, fallback, and fail-closed sentinel paths
          # all need exhaustive coverage to avoid silent regressions (CWE-916 / CWE-329).
          CRYPTO_COV=$(go tool cover -func=coverage.out | grep 'internal/crypto' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
          echo "Crypto package coverage: ${CRYPTO_COV}%"

          # Fail if thresholds not met
          if [ "$(echo "$SERVICE_COV < 55" | bc -l)" -eq 1 ]; then
            echo "::error::Service layer coverage ${SERVICE_COV}% is below 55% threshold"
            exit 1
          fi
          if [ "$(echo "$HANDLER_COV < 60" | bc -l)" -eq 1 ]; then
            echo "::error::Handler layer coverage ${HANDLER_COV}% is below 60% threshold"
            exit 1
          fi
          if [ "$(echo "$DOMAIN_COV < 40" | bc -l)" -eq 1 ]; then
            echo "::error::Domain layer coverage ${DOMAIN_COV}% is below 40% threshold"
            exit 1
          fi
          if [ "$(echo "$MIDDLEWARE_COV < 30" | bc -l)" -eq 1 ]; then
            echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
            exit 1
          fi
          if [ "$(echo "$CRYPTO_COV < 85" | bc -l)" -eq 1 ]; then
            echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 85% threshold"
            exit 1
          fi
          echo "Coverage thresholds passed!"

      - name: Upload Coverage Report
        uses: actions/upload-artifact@v4
        with:
          name: go-coverage
          path: coverage.out
          retention-days: 30

  frontend-build:
    name: Frontend Build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Install Dependencies
        working-directory: web
        run: npm ci

      - name: TypeScript Check
        working-directory: web
        run: npx tsc --noEmit

      - name: Run Frontend Tests
        working-directory: web
        run: npx vitest run

      - name: Build Frontend
        working-directory: web
        run: npx vite build

  helm-lint:
    name: Helm Chart Validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Helm
        uses: azure/setup-helm@v4
        with:
          version: '3.13.0'

      # HTTPS-Everywhere (v2.0.47): the chart fails render when no TLS source is
      # configured. Every lint/template invocation below must pick exactly one
      # provisioning mode; see deploy/helm/certctl/templates/_helpers.tpl
      # (certctl.tls.required) and docs/tls.md.
      - name: Lint Helm Chart
        run: |
          helm lint deploy/helm/certctl/ \
            --set server.tls.existingSecret=certctl-tls-ci

      - name: Template Helm Chart (existingSecret mode)
        run: |
          helm template certctl deploy/helm/certctl/ \
            --set server.tls.existingSecret=certctl-tls-ci \
            > /dev/null

      - name: Template Helm Chart (cert-manager mode)
        run: |
          helm template certctl deploy/helm/certctl/ \
            --set server.tls.certManager.enabled=true \
            --set server.tls.certManager.issuerRef.name=letsencrypt-prod \
            > /dev/null

      - name: Template Helm Chart (guard fails without TLS)
        run: |
          # Inverse test: the chart MUST refuse to render when no TLS source is
          # configured. If this ever renders successfully, the fail-loud guard
          # in certctl.tls.required has regressed.
          if helm template certctl deploy/helm/certctl/ > /dev/null 2>&1; then
            echo "::error::Helm chart rendered without a TLS source — fail-loud guard regressed"
            exit 1
          fi