fix(security): close BUNDLE 1 — server+agent connector config validation chain

Bundle 1 closure (2026-05-12 acquisition diligence audit). Closes the
acquisition-blocker chain: target.edit (default r-operator grant per
migrations/000029_rbac.up.sql:196) → arbitrary reload_command stored
without validation → agent createTargetConnector json.Unmarshal-only
→ sh -c on agent host. README's 'shell injection prevention on all
connector scripts' claim is now true at the chain level.

Server-side: new internal/connector/target/configcheck package + a
configcheck.Validate call in target.go::Create + ::Update +
::CreateTarget + ::UpdateTarget (all 4 entry points). Rejects shell
metacharacters in reload_command / validate_command / restart_command
for nginx, apache, haproxy, postfix/dovecot, javakeystore, ssh. Sentinel
errors.Is(err, service.ErrInvalidConnectorConfig) available for handler
400 mapping. Non-shell connector types (F5, IIS, Caddy, Traefik, Envoy,
cloud targets, K8s) are no-ops by design.

Agent-side: defense-in-depth connector.ValidateConfig(ctx, configJSON)
call in cmd/agent/main.go inserted between createTargetConnector and
DeployCertificate. This catches (a) configs pre-dating the server gate,
(b) encrypted-blob tampering, (c) per-connector filesystem invariants
that the server can't check.

F5 (S2 finding): proven docs-vs-code drift, not a security bug. The
applyDefaults function never set Insecure=true; runtime default has
always been Go zero-value (false → TLS verified). Three lying 'default
true' comments in f5/f5.go (lines 30, 45-47, 126) rewritten to match
actual code behavior.

Docs (C4 + C9): README L12 + L68 narrowed — 'any CA / any server' →
'Twelve native CA connectors plus an OpenSSL adapter; fifteen native
deployment-target connectors plus a proxy-agent pattern.' 'Every deploy
goes through atomic-write + ...' narrowed to file-based connectors with
inline link to per-target guarantee matrix. New deployment-model.md §1.6
ships a 15-target × 8-property guarantee table covering atomic write /
owner-perms / SHA-256 idempotency / pre-deploy snapshot / on-failure
rollback / post-deploy TLS verify / Prometheus counters / shell-injection
validation — including the K8s preview honesty marker (CLAIM-H4).

Tests: internal/connector/target/configcheck/configcheck_test.go covers
14 shell-injection payloads (semicolon, pipe, backtick, dollar-paren,
redirect, and-chain, newline, double-quote, escape, dollar-var) × 7
shell-using connectors + benign-command acceptance + non-shell no-op
behavior + empty config + malformed JSON. All pass.

Verification (run from /sessions/gifted-blissful-pasteur/mnt/cowork/certctl):
  go fmt ./...              # clean (no diffs)
  go vet ./...              # clean (no findings)
  go test -short -count=1 ./internal/... ./cmd/...
                            # 60+ packages all ok, zero FAIL

Audit-Closes: BUNDLE-1 RT-C1 SEC-M4 CLAIM-M2 CLAIM-L3
Audit-Verifies-False: S2 (F5 'default insecure' was a comment lie, code was always secure)
This commit is contained in:
shankar0123
2026-05-12 23:48:08 +00:00
parent 96d4b1e623
commit d60a0ac297
7 changed files with 403 additions and 9 deletions
+40
View File
@@ -28,6 +28,46 @@ a single shared primitive:
This document describes the operator-visible surface. The Go-level
contract lives at `internal/deploy/doc.go`.
## 1.6. Per-target guarantee matrix
Added 2026-05-12 (Bundle 1 / CLAIM-M2 closure). The README previously
claimed "every deploy goes through atomic-write + ownership-preservation
+ SHA-256 idempotency + per-target Prometheus counters + pre-deploy
snapshot + on-failure rollback." That claim is true for the file-based
deploy primitive only. Cloud / API targets use vendor-SDK semantics and
do not share the same primitive. This matrix is the authoritative
per-target answer.
Legend: ✓ = supported / always on. ✗ = not applicable to this target
family. ◐ = partial / vendor-specific equivalent. preview = ships but
the production code path is a stub (see CLAIM-H4).
| Target | Atomic write | Owner/perms preserved | SHA-256 idempotency | Pre-deploy snapshot | On-failure rollback | Post-deploy TLS verify | Prometheus counters | Server+agent shell-injection validation |
|---|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| NGINX | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Apache | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| HAProxy | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Caddy | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ (no operator commands) |
| Traefik | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Envoy | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Postfix / Dovecot| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SSH known-hosts | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ (no TLS endpoint) | ✓ | ✓ |
| JavaKeystore | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ (file format, no socket) | ✓ | ✓ |
| IIS | ◐ (Windows cert store API) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| WinCertStore | ◐ (Windows cert store API) | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ |
| F5 BIG-IP | ✓ (iControl REST transaction) | ✗ (no FS) | ◐ (cert object name) | ◐ (transaction rollback) | ✓ (transaction rollback) | ✓ (mgmt API GET) | ✓ | ✗ |
| AWS ACM | ✗ (SDK call) | ✗ (no FS) | ◐ (ACM-side replace) | ✗ | ◐ (re-import old ARN) | ✗ | ✓ | ✗ |
| Azure Key Vault | ✗ (SDK call) | ✗ (no FS) | ◐ (KV-side versioning) | ✗ | ◐ (KV versioning) | ✗ | ✓ | ✗ |
| Kubernetes Secrets | preview | preview | preview | preview | preview | preview | preview | ✗ |
**Notes on the matrix:**
- **Atomic write / owner-perms / SHA-256 idempotency / snapshot / rollback** are properties of the shared `deploy.Apply` primitive in `internal/deploy/`. They apply to file-based targets where certctl writes to disk.
- **Cloud / API targets** (AWS ACM, Azure Key Vault) use the vendor SDK's import / replace operation. The vendor handles versioning and atomicity at their layer. certctl tracks the operation outcome via Prometheus counters; "rollback" in this row means "re-import the previous cert ARN" rather than the file-primitive's `os.Rename` rollback.
- **F5** uses iControl REST transactions for atomicity (deploy-hardening I docs above). It does not touch a filesystem; the snapshot/rollback semantics live in the F5 transaction protocol.
- **Kubernetes Secrets** ships but the production client (`realK8sClient`) returns `"real Kubernetes client not implemented"` for all methods (see `internal/connector/target/k8ssecret/k8ssecret.go:395+`). Operators evaluating against a real cluster should treat this connector as preview until the production client lands.
- **Server+agent shell-injection validation** (Bundle 1 / RT-C1 closure 2026-05-12) is on for every connector that accepts operator-supplied command strings: `reload_command`, `validate_command`, `restart_command`. Validation runs at API ingestion (`internal/service/target.go::Create` + `::Update` + `::CreateTarget` + `::UpdateTarget` via `internal/connector/target/configcheck`) AND on the agent before deploy (`cmd/agent/main.go` post-`createTargetConnector`, calling each connector's full `ValidateConfig` method). Connectors that do not accept operator shell strings (Caddy / Traefik / Envoy / cloud targets) skip this check by design.
## 1.5. Audit closure status (2026-05-02 deployment-target audit)
The 2026-05-02 deployment-target coverage audit