mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-09 12:08:56 +00:00
52248be717
Breaking change release. Plaintext HTTP listener removed. The certctl control plane now terminates TLS 1.3 on :8443 via http.Server.ListenAndServeTLS. No CERTCTL_TLS_ENABLED=false escape hatch. No dual-listener mode. One-step cutover per docs/upgrade-to-tls.md. Server - cmd/server/tls.go: certHolder with SIGHUP hot-reload + atomic cert swap, buildServerTLSConfig (TLS 1.3 min, GetCertificate callback), preflightServerTLS validation - cmd/server/main.go: ListenAndServeTLS in place of ListenAndServe, watchSIGHUP wiring, cert/key path config threading - tls_test.go: 418-line regression coverage of reload, preflight, callback behavior, SAN validation Config - CERTCTL_TLS_CERT_PATH / CERTCTL_TLS_KEY_PATH (required) - Plaintext rejection: agents/CLI/MCP pre-flight-fail on http:// URLs with a pointer to docs/upgrade-to-tls.md Agents, CLI, MCP - All three pre-flight-reject http:// URLs with fail-loud diagnostic - CERTCTL_SERVER_CA_BUNDLE_PATH for private-CA trust - CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY for dev-only bypass (loud warning on startup) - install-agent.sh emits both vars as commented template lines docker-compose - certctl-tls-init sidecar generates SAN-valid self-signed cert into deploy/test/certs/ on first boot - All demo-stack curls pin against ca.crt with --cacert Helm chart - Three TLS provisioning modes, exactly one required: - server.tls.existingSecret (operator-supplied) - server.tls.certManager.enabled (cert-manager integration) - server.tls.selfSigned.enabled (eval only — not for production) - server-certificate.yaml template for cert-manager mode - helm install without a TLS source fails at template render with a pointer to docs/tls.md CI - .github/workflows/ci.yml Helm Chart Validation step renders the chart in both existingSecret and cert-manager modes, plus an inverse guard-regression test that asserts helm template MUST refuse to render when no TLS source is configured. Previously the single `helm template` invocation hit the certctl.tls.required fail-loud guard and exit-1'd CI. Four invocations now: lint (existingSecret), template (existingSecret), template (cert-manager), template (no args — must fail). Integration tests - deploy/test/integration_test.go stands up the Compose stack over HTTPS, extracts the CA bundle, and exercises every certctl API over https://localhost:8443 - All 34 integration subtests green (per Phase 8 local CI-parity) Documentation - New: docs/tls.md (provisioning patterns, rotation, SIGHUP reload) - New: docs/upgrade-to-tls.md (one-step cutover, no-downgrade warnings, fleet-roll sequencing) - CHANGELOG.md: v2.2.0 "HTTPS Everywhere — The Irony" entry (file heading unchanged; release tag is v2.0.47) - All curls in docs/, examples/, deploy/helm/ guides use https://localhost:8443 --cacert Verification - grep -rn "ListenAndServe[^T]" cmd/ internal/ → 0 hits - grep -rn "\"http://" cmd/ internal/ → 2 benign hits (Caddy admin API default, SSRF doc comment) — zero certctl endpoints - Tasks #197–#206 (Phases 0–8) all closed in the tracker Files: 65 changed, 3489 insertions, 372 deletions (pre-CI-fix).
180 lines
14 KiB
Markdown
180 lines
14 KiB
Markdown
# TLS on the Control Plane
|
|
|
|
certctl's control plane is HTTPS-only as of v2.2. There is no plaintext `http://` listener, no `auto` mode, no dual-listener bridge, no TLS 1.2 escape hatch. The server refuses to start without a cert+key pair, the agent/CLI/MCP clients reject `http://` URLs at startup, and the Helm chart refuses to render without either an operator-supplied Secret or a cert-manager Certificate CR.
|
|
|
|
This doc covers four cert provisioning patterns, SIGHUP-based cert rotation, and the client-side CA-trust configuration agents and the CLI need to talk to the server. If you are upgrading from a pre-HTTPS release and want the step-by-step cutover procedure, read [`upgrade-to-tls.md`](upgrade-to-tls.md) first and come back here for reference.
|
|
|
|
## What you get
|
|
|
|
The server binds TLS 1.3 only with an explicit curve preference of `[X25519, P-256]`. TLS 1.3 cipher suites are non-negotiable (all three mandatory suites — AES-128-GCM-SHA256, AES-256-GCM-SHA384, CHACHA20-POLY1305-SHA256 — are always offered), so there is no `CipherSuites` knob to misconfigure. No TLS 1.2 fallback is available.
|
|
|
|
Two env vars are required on the server:
|
|
|
|
- `CERTCTL_SERVER_TLS_CERT_PATH` — filesystem path to the PEM-encoded server certificate
|
|
- `CERTCTL_SERVER_TLS_KEY_PATH` — filesystem path to the PEM-encoded private key that signs the cert
|
|
|
|
Both paths are read during a fail-loud preflight in `cmd/server/main.go` (see `preflightServerTLS` in `cmd/server/tls.go`). If either is unset, unreadable, or the cert+key pair does not round-trip through `tls.LoadX509KeyPair`, the process refuses to start and emits a diagnostic pointing back at this doc. The rationale lives in §3 of the HTTPS-Everywhere milestone: a cert-lifecycle product should not silently bind plaintext.
|
|
|
|
## Pattern 1 — Self-signed bootstrap for docker-compose demos
|
|
|
|
This is the default for the `deploy/docker-compose.yml` stack. It exists so `docker compose up -d --build` just works on a laptop without the operator standing up a CA first. It is not appropriate for any non-demo environment.
|
|
|
|
An init container named `certctl-tls-init` runs once before the server starts. It uses the `alpine/openssl` image and generates an ed25519 self-signed cert:
|
|
|
|
```
|
|
openssl req -x509 -newkey ed25519 -nodes \
|
|
-keyout /etc/certctl/tls/server.key \
|
|
-out /etc/certctl/tls/server.crt \
|
|
-days 3650 \
|
|
-subj "/CN=certctl-server" \
|
|
-addext "subjectAltName=DNS:certctl-server,DNS:localhost,IP:127.0.0.1,IP:::1"
|
|
```
|
|
|
|
The cert, its matching key, and a copy of the cert published as `ca.crt` land in a named volume (`certs`) mounted at `/etc/certctl/tls/` in the server container (read-only) and the agent container (read-only). The bootstrap is idempotent — if `server.crt`, `server.key`, and `ca.crt` are already present on the volume, the init container logs `TLS cert already present at …` and exits cleanly.
|
|
|
|
Single-cert design. CN is `certctl-server` to match the Docker-network hostname. The SAN list is `[certctl-server, localhost, 127.0.0.1, ::1]`, which covers both container-internal agent→server traffic and operator browser/curl access to `https://localhost:8443`. There is no separate intermediate/root chain — the server cert and the CA bundle are the same PEM. This is the whole point of a demo bootstrap.
|
|
|
|
To force regeneration (rotate the demo cert), tear the volume down: `docker compose down -v`. The next `up` re-runs the init container.
|
|
|
|
The server's Docker healthcheck and the agent both verify against `/etc/certctl/tls/ca.crt`; no `-k` / `InsecureSkipVerify` anywhere in the default stack.
|
|
|
|
## Pattern 2 — Operator-supplied `kubernetes.io/tls` Secret (Helm)
|
|
|
|
This is the default path for Helm installs. The operator provisions a Secret of type `kubernetes.io/tls` holding `tls.crt` + `tls.key` (and optionally `ca.crt` for mounting a CA bundle to clients in the same cluster) from whatever source they already trust — their internal CA, a manually-issued cert, step-ca, AWS ACM PCA exported to PEM, or the output of the self-signed bootstrap pattern above copied into a cluster Secret.
|
|
|
|
```
|
|
kubectl create secret tls certctl-server-tls \
|
|
--cert=server.crt \
|
|
--key=server.key \
|
|
--namespace certctl
|
|
```
|
|
|
|
Then:
|
|
|
|
```
|
|
helm install certctl deploy/helm/certctl \
|
|
--namespace certctl \
|
|
--set server.tls.existingSecret=certctl-server-tls
|
|
```
|
|
|
|
The Secret is mounted read-only at `/etc/certctl/tls/` in the server pod. The `CERTCTL_SERVER_TLS_CERT_PATH` and `CERTCTL_SERVER_TLS_KEY_PATH` env vars are wired to `tls.crt` and `tls.key` keys inside that mount. If `ca.crt` is absent from the Secret, clients that need a CA bundle should use `tls.crt` as the bundle (self-signed case) or mount a separate ConfigMap with the root chain (operator-CA case).
|
|
|
|
If the operator sets neither `server.tls.existingSecret` nor `server.tls.certManager.enabled=true`, `helm template` / `helm install` fails at render-time with a diagnostic pointing at this doc. The guard is implemented in `deploy/helm/certctl/templates/_helpers.tpl` under the `certctl.tls.required` helper. This is deliberate: the HTTPS-only server would crash-loop on an empty path, so we fail earlier at Helm-render time.
|
|
|
|
## Pattern 3 — cert-manager `Certificate` CR (Helm, opt-in)
|
|
|
|
For clusters that already run cert-manager, the chart can provision a `Certificate` CR that writes into the Secret the server pod reads from. This is opt-in — the default is `server.tls.certManager.enabled: false` — because not every cluster has cert-manager installed, and we refuse to ship a chart that silently depends on an external controller.
|
|
|
|
```
|
|
helm install certctl deploy/helm/certctl \
|
|
--namespace certctl \
|
|
--set server.tls.certManager.enabled=true \
|
|
--set server.tls.certManager.issuerRef.name=my-cluster-issuer \
|
|
--set server.tls.certManager.issuerRef.kind=ClusterIssuer
|
|
```
|
|
|
|
The rendered `Certificate` (see `deploy/helm/certctl/templates/server-certificate.yaml`) writes `tls.crt` + `tls.key` + `ca.crt` into the Secret named by `server.tls.certManager.secretName` (defaults to `<fullname>-tls`). The server pod reads from that same Secret; the agent DaemonSet mounts the same Secret as its CA bundle source.
|
|
|
|
cert-manager handles rotation. certctl-server handles in-place reload — see the SIGHUP section below.
|
|
|
|
The chart enforces that if `server.tls.certManager.enabled=true`, `server.tls.certManager.issuerRef.name` must also be set. An empty `issuerRef.name` makes `helm template` fail with a diagnostic naming the missing flag.
|
|
|
|
## Pattern 4 — Manually-issued from an internal CA
|
|
|
|
For operators running neither Helm nor docker-compose (bare-metal / custom orchestration), the server just needs two files on disk pointed at by `CERTCTL_SERVER_TLS_CERT_PATH` and `CERTCTL_SERVER_TLS_KEY_PATH`. Issue the cert from your internal CA with:
|
|
|
|
- CN matching the hostname your agents and operators use to dial the server (e.g., `certctl.prod.example.com`)
|
|
- SAN list covering every hostname and IP that appears in `CERTCTL_SERVER_URL` values across your agent fleet
|
|
- Key usage: digital signature + key encipherment
|
|
- Extended key usage: server auth
|
|
|
|
Store the key with mode `0600` and owner matching the UID the server runs as (`1000` in our shipped Dockerfile). The server process reads both files during `preflightServerTLS` at startup and again on every SIGHUP.
|
|
|
|
The full CA chain that signed the server cert should be distributed to agents, CLI operators, and MCP clients as their `CERTCTL_SERVER_CA_BUNDLE_PATH` — see the client section below.
|
|
|
|
## SIGHUP cert rotation
|
|
|
|
The server wraps its cert+key pair in a `*certHolder` (see `cmd/server/tls.go`) that guards the loaded `*tls.Certificate` under a `sync.Mutex`. The `*tls.Config` wires `GetCertificate` to the holder, so every new inbound TLS handshake reads whatever cert the holder currently has.
|
|
|
|
Send `SIGHUP` to the server PID and the holder re-reads both files from disk. On success, the next new connection uses the new cert; in-flight requests finish on the previous cert. A log line goes out:
|
|
|
|
```
|
|
TLS cert reloaded via SIGHUP cert_path=/etc/certctl/tls/server.crt key_path=/etc/certctl/tls/server.key
|
|
```
|
|
|
|
On failure (missing file, malformed PEM, key does not sign cert), the old cert is retained and an error logs:
|
|
|
|
```
|
|
TLS cert reload failed; continuing with previous cert cert_path=… key_path=… error=…
|
|
```
|
|
|
|
This is deliberately fail-safe on reload (as opposed to fail-loud on startup). A cert-manager renewal race, a partially-copied file, a typo in a rotation script — none of those should crash a running server and drop every agent connection. The operator sees the error in logs, fixes the underlying issue, and sends another `SIGHUP`.
|
|
|
|
Pair with cert-manager, certbot `--post-hook`, or any rotation tool that can fire a signal. For docker-compose, `docker compose kill -s HUP certctl-server` works. For Kubernetes, reload is typically handled by cert-manager updating the Secret and the mounted file changing on the next kubelet sync — no explicit SIGHUP needed if the volume mount is `subPath`-free.
|
|
|
|
Startup is a different story. If the cert is missing or malformed at process start, the server exits non-zero rather than binding plaintext or attempting a retry loop. That's the HTTPS-only contract.
|
|
|
|
## Client-side TLS: agents, CLI, MCP
|
|
|
|
Everything that talks to the server enforces HTTPS on the URL.
|
|
|
|
### Agent
|
|
|
|
`CERTCTL_SERVER_URL` must be `https://…`. `http://`, bare hostnames, `ftp://`, `ws://`, and empty strings are rejected at startup by `validateHTTPSScheme` in `cmd/agent/main.go` with a diagnostic pointing at `upgrade-to-tls.md`. There is no warning-and-proceed path.
|
|
|
|
Two additional env vars control how the agent verifies the server cert:
|
|
|
|
- `CERTCTL_SERVER_CA_BUNDLE_PATH` — filesystem path to a PEM-encoded CA bundle that signed the server cert. Loaded into `*tls.Config.RootCAs` on the agent's HTTP client. If unset, the agent falls back to the OS system trust store.
|
|
- `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY` — defaults to `false`. Setting it to `true` skips verification entirely. **Dev-only escape hatch.** The agent logs a prominent warning at startup (`TLS certificate verification is disabled … never enable this in production`). Use this only when dialing a demo server whose cert you haven't bothered to mount into the agent container.
|
|
|
|
Equivalent CLI flags: `--ca-bundle <path>` and `--insecure-skip-verify`.
|
|
|
|
If both the CA bundle and `InsecureSkipVerify=true` are set, `InsecureSkipVerify` wins — it's the whole point of the flag. Don't do this in production.
|
|
|
|
### CLI (`certctl-cli`)
|
|
|
|
Same contract as the agent:
|
|
|
|
- `CERTCTL_SERVER_URL` defaults to `https://` scheme; `http://` rejected at startup
|
|
- `--ca-bundle <path>` flag or `CERTCTL_SERVER_CA_BUNDLE_PATH` env var — CA bundle for server cert verification
|
|
- `--insecure` flag or `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true` — skip verification (dev only)
|
|
- Error diagnostic on empty URL explicitly mentions both `--server` and `CERTCTL_SERVER_URL` so operators see the right knob to turn
|
|
|
|
The CLI shares the URL-scheme validation with the agent; the test pins in `cmd/cli/main_test.go:TestValidateHTTPSScheme` cover the full rejection matrix.
|
|
|
|
### MCP server (`certctl-mcp-server`)
|
|
|
|
Same three controls as CLI, env-var-driven only (no flags — MCP runs as a stdio subprocess and inherits env from the launching LLM client):
|
|
|
|
- `CERTCTL_SERVER_URL` must start with `https://`
|
|
- `CERTCTL_SERVER_CA_BUNDLE_PATH` optional CA bundle
|
|
- `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY` optional skip
|
|
|
|
Claude Desktop / other MCP client configs should set all three in the tool's env block.
|
|
|
|
## Troubleshooting: fail-loud preflight errors
|
|
|
|
Every preflight failure message ends with `(see docs/tls.md)` so this doc is the first hit when an operator searches. Common failures:
|
|
|
|
**`CERTCTL_SERVER_TLS_CERT_PATH is empty: HTTPS-only control plane refuses to start`**
|
|
Set the env var. For docker-compose this is already set to `/etc/certctl/tls/server.crt` in the shipped compose file — if you're seeing this, check the `certctl-tls-init` service logs to see why the init container didn't populate the volume. For Helm, check that `server.tls.existingSecret` or `server.tls.certManager.enabled=true` is set.
|
|
|
|
**`TLS cert file "…" unreadable: …`**
|
|
The cert path is set but `os.Stat` failed. Check filesystem permissions — the server runs as UID 1000 in our shipped Dockerfile; the cert needs to be readable by that UID. Typos in the path also land here.
|
|
|
|
**`TLS cert/key pair invalid (cert="…" key="…"): …`**
|
|
Both files exist but `tls.LoadX509KeyPair` refused them. Typical causes: the private key does not sign the certificate, the key is encrypted with a passphrase (not supported — remove the passphrase with `openssl pkey` before mounting), or one of the two is DER-encoded instead of PEM. Re-issue the pair from the same CA call and re-mount.
|
|
|
|
**Client side: `tls: failed to verify certificate: x509: certificate signed by unknown authority`**
|
|
The client did not trust the CA that signed the server cert. Either mount the CA bundle via `CERTCTL_SERVER_CA_BUNDLE_PATH`, add the CA to the system trust store on the client host, or (dev only) set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true`.
|
|
|
|
**Client side: `tls: first record does not look like a TLS handshake`**
|
|
The client is speaking plaintext HTTP to an HTTPS server (or vice-versa). Check that `CERTCTL_SERVER_URL` starts with `https://`. If you are upgrading from a pre-v2.2 release and your agents are old, they will surface this error until you roll the DaemonSet — see [`upgrade-to-tls.md`](upgrade-to-tls.md).
|
|
|
|
## Related docs
|
|
|
|
- [`upgrade-to-tls.md`](upgrade-to-tls.md) — one-step cutover from pre-HTTPS releases
|
|
- [`quickstart.md`](quickstart.md) — docker-compose walkthrough with HTTPS examples
|
|
- [`test-env.md`](test-env.md) — integration test environment (also HTTPS-only)
|
|
- Milestone spec: `prompts/https-everywhere-milestone.md` (authoritative source for locked decisions)
|