mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 18:01:37 +00:00
9c1d446e40
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.
We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.
The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.
Files changed:
Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
literal 'jwt' through a dedicated multi-line diagnostic naming the
authenticating-gateway pattern, then cross-checks against
ValidAuthTypes(). Secret-required branch simplified to api-key-only.
Field comment on AuthConfig.Type rewritten to drop jwt and point at
the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
config.Load() — exits 1 on any unsupported auth-type that bypassed the
validator. Auth-disabled startup log explicitly names the
authenticating-gateway pattern.
Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
(two table rows pinning the dedicated G-1 error fires regardless of
whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
guard against future re-introduction),
TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
hit the generic invalid-auth-type error). Removed the prior
TestValidate_JWTAuth_MissingSecret happy-path since its premise is
inverted post-G-1.
- internal/api/handler/health_test.go: removed
TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
into the regression suite). Pre-existing _APIKey test continues to
cover the api-key happy path.
Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
jwt and point at the gateway pattern; secret-required conditional
simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
section explaining the design rationale and listing oauth2-proxy /
Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
forward_auth / Apache mod_auth_openidc / nginx auth_request as the
standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
with preconditions, what-changes, both recovery paths, complete
docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
ext_authz patterns, rollback posture.
Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
helper mirroring the existing certctl.tls.required pattern. Fails
template render on any server.auth.type outside {api-key, none} with
a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
server-configmap.yaml, server-secret.yaml: invoke the helper at the
top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
walkthrough.
Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
Added / Changed sections; explicit pointer at
docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.
Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
guard (G-1)' step. Scoped patterns catch the actual regression shapes
(map literal, slice literal, switch case, OpenAPI enum, env-file
default, AuthType('jwt') cast). Comments and the dedicated rejection
branch are intentionally exempt; connector-package JWT references
(Google OAuth2 / step-ca) are exempt as out-of-scope external
protocols. Verified locally: the guard passes on the actual tree and
fires on all 4 synthetic regression patterns.
Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.
Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
longer accepted (G-1 silent auth downgrade): no JWT middleware ships
with certctl. To use JWT/OIDC, run an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
See docs/architecture.md "Authenticating-gateway pattern" and
docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'
config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-g-jwt_silent_auth_downgrade
Audit recommendation followed verbatim: 'Remove jwt from
validAuthTypes until middleware ships'.
Certctl Helm Chart
Production-ready Helm chart for deploying certctl (self-hosted certificate lifecycle management platform) on Kubernetes.
Table of Contents
- Quick Start
- Chart Features
- Prerequisites
- Installation
- Configuration
- Usage Examples
- Upgrading
- Uninstalling
- Architecture
- Security Considerations
- Troubleshooting
Quick Start
# Add the chart repository (when available)
helm repo add certctl https://charts.example.com
helm repo update
# Install with default values
helm install certctl certctl/certctl \
--set server.auth.apiKey="your-secure-api-key" \
--set postgresql.auth.password="your-secure-password"
# Check installation status
kubectl get pods -l app.kubernetes.io/instance=certctl
Chart Features
- Server Deployment — certctl control plane with configurable replicas
- PostgreSQL StatefulSet — Persistent database with automatic schema migration
- Agent DaemonSet or Deployment — Flexible agent deployment (per-node or custom replicas)
- Ingress Support — Optional HTTPS ingress with cert-manager integration
- Security Contexts — Non-root containers, read-only filesystems, minimal capabilities
- Resource Limits — Configurable CPU and memory requests/limits
- Health Checks — Liveness and readiness probes on all containers
- ConfigMaps and Secrets — Centralized configuration management
- Service Account and RBAC — Optional cluster role bindings
- Pod Disruption Budgets — HA-ready with configurable disruption budgets
- Monitoring — Optional Prometheus ServiceMonitor support
Prerequisites
- Kubernetes 1.19 or later
- Helm 3.0 or later
- Optional: cert-manager (for automatic TLS certificate provisioning)
- Optional: Prometheus (for metrics scraping)
Installation
1. Using Chart from Repository
helm repo add certctl https://charts.example.com
helm repo update
helm install certctl certctl/certctl -f my-values.yaml
2. Using Local Chart
cd deploy/helm
helm install certctl certctl/ \
--set server.auth.apiKey="$(openssl rand -base64 32)" \
--set postgresql.auth.password="$(openssl rand -base64 32)"
3. Minimal Production Installation
helm install certctl certctl/certctl \
--namespace certctl \
--create-namespace \
--set server.auth.apiKey="change-me" \
--set postgresql.auth.password="change-me" \
--set server.replicas=2 \
--set server.resources.requests.cpu=200m \
--set server.resources.requests.memory=256Mi \
--set ingress.enabled=true \
--set ingress.className=nginx \
--set ingress.hosts[0].host=certctl.example.com
Configuration
Server Configuration
server:
replicas: 1 # Number of server replicas
port: 8443 # Service port
auth:
type: api-key # Authentication type
apiKey: "your-api-key" # REQUIRED for production
logging:
level: info # Log level (debug, info, warn, error)
format: json # Output format
issuer:
local:
enabled: true # Enable local CA issuer
acme:
enabled: false # Enable ACME issuer
directoryURL: "" # ACME directory URL
email: "" # ACME registration email
challengeType: "http-01" # Challenge type (http-01, dns-01, dns-persist-01)
PostgreSQL Configuration
postgresql:
enabled: true # Use managed PostgreSQL
auth:
database: certctl
username: certctl
password: "your-password" # REQUIRED
storage:
size: 10Gi # PVC size
storageClass: "" # Use default StorageClass
Agent Configuration
agent:
enabled: true # Deploy agents
kind: DaemonSet # DaemonSet (one per node) or Deployment
replicas: 1 # For Deployment kind only
discoveryDirs: "" # Comma-separated cert discovery paths
nodeSelector: {} # Node affinity for DaemonSet
Ingress Configuration
ingress:
enabled: false
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: certctl.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: certctl-tls
hosts:
- certctl.example.com
See values.yaml for all available configuration options.
Usage Examples
Example 1: High Availability Setup
# ha-values.yaml
server:
replicas: 3
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 1000m
memory: 512Mi
postgresql:
storage:
size: 50Gi
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/component
operator: In
values: [server]
topologyKey: kubernetes.io/hostname
Deploy with:
helm install certctl certctl/certctl -f ha-values.yaml
Example 2: External PostgreSQL Database
# external-db-values.yaml
postgresql:
enabled: false
server:
env:
CERTCTL_DATABASE_URL: "postgres://user:password@rds.example.com:5432/certctl?sslmode=require"
Deploy with:
helm install certctl certctl/certctl -f external-db-values.yaml
Example 3: ACME + Let's Encrypt
# acme-values.yaml
server:
issuer:
acme:
enabled: true
directoryURL: https://acme-v02.api.letsencrypt.org/directory
email: admin@example.com
challengeType: dns-01
dnsPresentScript: /scripts/dns-present.sh
dnsCleanupScript: /scripts/dns-cleanup.sh
dnsPropagationWait: 30s
Example 4: Email Notifications via Slack + SMTP
# notifications-values.yaml
server:
smtp:
enabled: true
host: smtp.example.com
port: 587
username: certctl@example.com
password: "smtp-password"
fromAddress: certctl@example.com
useTLS: true
notifiers:
slack:
enabled: true
webhookUrl: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
channel: "#certificates"
Upgrading
# Update chart repository
helm repo update
# Upgrade release
helm upgrade certctl certctl/certctl -f values.yaml
# View upgrade history
helm history certctl
# Rollback to previous version
helm rollback certctl 1
Uninstalling
# Delete the release (keeps data by default)
helm uninstall certctl
# Also delete persistent data
kubectl delete pvc --all -l app.kubernetes.io/instance=certctl
# Delete namespace
kubectl delete namespace certctl
Architecture
Components
┌──────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
├──────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌──────────────────┐ │
│ │ Ingress/LB │ │ Agent Pod 1 │ │
│ │ (optional) │ │ (DaemonSet) │ │
│ └────────┬────────┘ └──────────────────┘ │
│ │ │
│ ▼ ┌──────────────────┐ │
│ ┌─────────────────────────┐ │ Agent Pod 2 │ │
│ │ Server Deployment │ │ (DaemonSet) │ │
│ │ (1 to N replicas) │ └──────────────────┘ │
│ │ - REST API │ │
│ │ - Scheduler │ ┌──────────────────┐ │
│ │ - UI Dashboard │ │ Agent Pod N │ │
│ └────────┬────────────────┘ │ (DaemonSet) │ │
│ │ └──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ PostgreSQL StatefulSet │ │
│ │ - Database │ │
│ │ - PVC (persistent) │ │
│ └──────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
Network Communication
- Server → PostgreSQL: Internal cluster DNS (
certctl-postgres:5432) - Agent → Server: Internal cluster DNS (
certctl-server:8443) - External → Server: Via Ingress or Service (ClusterIP/LoadBalancer/NodePort)
Security Considerations
1. Secrets Management
All sensitive data is stored in Kubernetes Secrets:
- PostgreSQL credentials
- API keys
- SMTP passwords
- ACME account secrets
Best Practices:
- Use sealed-secrets or external-secrets operator
- Enable encryption at rest in etcd
- Rotate secrets regularly
# Example: Using sealed-secrets
kubectl create secret generic certctl-api-key --from-literal=api-key="$(openssl rand -base64 32)" --dry-run=client -o yaml | kubeseal -f - | kubectl apply -f -
2. RBAC
The chart creates minimal RBAC by default:
- ServiceAccount per release
- ClusterRole (empty, extensible)
- ClusterRoleBinding
To restrict further:
rbac:
create: true
# Add specific rules here
3. Pod Security
All containers run with:
- Non-root user (UID 1000)
- Read-only root filesystem
- No privilege escalation
- Dropped capabilities (ALL)
4. Network Policies
Restrict pod-to-pod communication:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: certctl-default-deny
spec:
podSelector:
matchLabels:
app.kubernetes.io/instance: certctl
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: certctl
egress:
- to:
- namespaceSelector:
matchLabels:
name: certctl
- to:
- podSelector: {}
ports:
- protocol: TCP
port: 53 # DNS
- protocol: UDP
port: 53
5. TLS/HTTPS
Enable HTTPS with cert-manager:
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set installCRDs=true
Then configure Ingress with TLS.
6. API Key Security
For production:
- Generate a strong API key:
openssl rand -base64 32 - Store securely (Vault, sealed-secrets, etc.)
- Never commit to Git
- Rotate periodically
# Generate and deploy API key
NEW_KEY=$(openssl rand -base64 32)
kubectl patch secret certctl-server -p "{\"data\":{\"api-key\":\"$(echo -n $NEW_KEY | base64)\"}}"
Troubleshooting
1. Pods Not Starting
# Check pod status
kubectl get pods -l app.kubernetes.io/instance=certctl
kubectl describe pod <pod-name>
kubectl logs <pod-name>
2. Database Connection Issues
# Verify PostgreSQL is running
kubectl get pods -l app.kubernetes.io/component=postgres
kubectl logs -l app.kubernetes.io/component=postgres
# Test connection from server pod
kubectl exec -it <server-pod> -- \
psql postgres://certctl:password@certctl-postgres:5432/certctl
3. Agent Not Connecting
# Check agent logs
kubectl logs -l app.kubernetes.io/component=agent
# Verify server is reachable
kubectl exec -it <agent-pod> -- \
wget -q -O - http://certctl-server:8443/health
4. Persistent Data Loss
# Check PVC status
kubectl get pvc
# Verify data is being stored
kubectl exec -it <postgres-pod> -- \
ls -lah /var/lib/postgresql/data/postgres
5. Permission Denied Errors
The chart runs containers as non-root (UID 1000). If you see permission errors:
# Temporarily allow root for debugging
server:
securityContext:
runAsUser: 0 # NOT FOR PRODUCTION
6. Out of Memory
Increase resource limits:
helm upgrade certctl certctl/certctl \
--set server.resources.limits.memory=1Gi \
--set postgresql.resources.limits.memory=2Gi
7. Certificate Validation Issues
For self-signed certificates:
kubectl exec -it <pod> -- \
CERTCTL_TLS_INSECURE_SKIP_VERIFY=true <command>
Common Issues and Solutions
| Issue | Solution |
|---|---|
ImagePullBackOff |
Update server.image.repository to your registry |
CrashLoopBackOff |
Check logs with kubectl logs <pod> |
Pending PVC |
Check storage class availability |
| Connection timeout | Verify network policies and service DNS |
| High memory usage | Adjust postgresql.resources.limits and server.resources.limits |
Support and Contributing
For issues, questions, or contributions, visit:
- GitHub: https://github.com/shankar0123/certctl
- Documentation: https://github.com/shankar0123/certctl/tree/main/docs
License
BSL-1.1 (converts to Apache 2.0 in 2033)