Phase 14 of the deploy-hardening II master bundle. The procurement-team headline doc + per-connector operator guides for the top 5 most-deployed connectors.

NEW docs/deployment-vendor-matrix.md (~30 rows):
- Per (connector × vendor-version) status: ✓ / CI / mock / pending / n/a
- Known issues + workarounds + e2e test name reference
- LTS + current-stable scope per frozen decision 0.1
- Quarterly re-pin cadence guidance for sidecar digests
- "How to add a new vendor version" recipe

Per frozen decision 0.14: a (connector × vendor-version) cell is "verified" only when ALL apply: ≥1 happy-path e2e green; ≥1 specific-quirk test green for that version; operator manual smoke completed at least once. Cells lacking the third criterion show "CI" status (auto-tests green but pending operator validation).

Status snapshot at bundle close:
- NGINX 1.25 + 1.27: CI
- Apache 2.4: CI
- HAProxy 2.6 + 2.8 + 3.0: CI
- Traefik 2.x + 3.x: CI
- Caddy 2.x: CI
- Envoy 1.30 + 1.32: CI (file-mode SDS only; gRPC SDS V3-Pro)
- Postfix 3.6 + 3.8: CI
- Dovecot 2.3: CI
- IIS 10 (2019, 2022): pending (Windows-host-only CI)
- F5 v15.1 + v17.0 + v17.5: mock (real-F5 vagrant box documented)
- SSH OpenSSH 8.x + 9.x: CI
- WinCertStore (2019, 2022): pending (Windows-host-only)
- JavaKeystore JDK 11 + 17 + 21: pending
- K8s 1.28 + 1.30 + 1.31: CI

NEW per-connector deep-dive docs:
- docs/connector-nginx.md (~150 lines, 10 quirks documented)
- docs/connector-k8s.md (~110 lines, 10 quirks)
- docs/connector-iis.md (~120 lines, 10 quirks; Windows-host-only CI constraint loud)
- docs/connector-apache.md (~80 lines, 10 quirks)
- docs/connector-f5.md (~190 lines, 10 quirks; two-tier validation recipe for operator-supplied real-F5 vagrant box)

Each doc follows the same structure:
- Overview
- Vendor versions tested
- Per-quirk operator guidance (one section per TestVendorEdge_<vendor>_<edge>_E2E)
- Troubleshooting matrix
- V3-Pro deferrals
- Related docs cross-refs

Other connector docs (HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, SSH, WinCertStore, JavaKeystore) live in docs/connectors.md + are referenced from the matrix.

Phase 15 next: per-vendor CI matrix job in .github/workflows/ci.yml.
NGINX Connector — Operator Deep-Dive
Per Phase 14 of the deploy-hardening II master bundle. Operator-grade documentation for the NGINX target connector.
Overview
The NGINX connector (internal/connector/target/nginx/) is the
canonical implementation of the deploy-hardening I atomic + verify
+ rollback contract (Bundle I Phase 4). Every other file-based connector is modeled on this one.
Vendor versions tested
- NGINX 1.25 LTS (current LTS branch)
- NGINX 1.27 stable (current stable branch)
Older versions (1.18 EOL'd 2021, 1.20 EOL'd 2022) are explicitly out of scope per frozen decision 0.1.
Deploy contract
Every cert deploy follows the Bundle I deploy.Apply(ctx, plan) flow:
- Idempotency check: SHA-256 over cert+chain+key bytes; skip if all match destination.
- Pre-deploy backup: copy existing files to <path>.certctl-bak.<unix-nanos>.
- Atomic write: temp-file + chown + atomic rename per destination.
- PreCommit (validate): runs nginx -t per the operator's validate_command. Failure aborts; no live cert touched.
- Atomic rename: temp → final for every File entry.
- PostCommit (reload): runs nginx -s reload per the operator's reload_command.
- Post-deploy TLS verify: dials the configured endpoint; pulls leaf cert SHA-256; compares against deployed bytes. Mismatch triggers automatic rollback.
Per-quirk operator guidance
SSL session cache holds old cert
TestVendorEdge_NGINX_SSLSessionCacheHoldsOldCert_E2E
When ssl_session_cache is enabled (for example shared:SSL:10m; NGINX's
built-in default is none), NGINX keeps TLS session IDs valid for
ssl_session_timeout (default 5m). Clients that resume via session ID
see the OLD cert until their session expires.
Operator action: this is documented behavior, not a bug.
Tune via ssl_session_timeout 5m; (default) or shorter if your
cert rotation cadence demands. Post-deploy verify in certctl will
return the NEW cert from a fresh handshake (no session resumption);
warm clients see the OLD cert until session-cache eviction.
SNI multi-server-name binding
TestVendorEdge_NGINX_SNIMultiServerName_DeployBindsCorrectVhost_E2E
When NGINX has multiple server { server_name a.example b.example; }
blocks, the operator deploys with metadata pointing at the
specific vhost. The connector binds to that vhost only; other vhosts
remain unchanged.
IPv6 dual-stack
TestVendorEdge_NGINX_IPv6DualStackBindsBoth_E2E
NGINX listening on 0.0.0.0:443 + [::]:443 serves the new cert
on both stacks after a single deploy.
Operator action: if your post-deploy verify endpoint resolves
to IPv6 only on some networks but IPv4 only on others, configure
PostDeployVerifyAttempts: 5 to cover both paths.
Reload vs restart
TestVendorEdge_NGINX_ReloadVsRestart_NoConnectionDrop_E2E
nginx -s reload (graceful) preserves in-flight TLS connections
via worker handoff. A full restart (nginx -s stop && nginx) drops them.
Operator action: never use restart for cert rotation. The
connector's default reload_command: nginx -s reload is correct.
Binary upgrade
TestVendorEdge_NGINX_UpgradeBinaryHotReload_E2E
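NGINX's on-the-fly binary upgrade (sending the USR2 signal to the
master process; nginx -s itself only accepts stop, quit, reopen, and
reload) rolls out a new binary without dropping connections. Not
commonly used; documented for ops teams that do rolling NGINX binary
upgrades.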
Config syntax error → rollback
TestVendorEdge_NGINX_ConfigSyntaxError_RollbackRestoresPreviousCert_E2E
If nginx -t rejects the staged config, the deploy package's
PreCommit gate fires before the atomic rename — no live file is
touched. The cert directory is exactly as it was.
Missing intermediate
TestVendorEdge_NGINX_MissingIntermediate_DeployedButValidationCatchesAtPostVerify_E2E
If the operator deploys a leaf-only cert (no intermediate), NGINX will start serving it but downstream clients fail chain validation. The connector's post-deploy TLS verify catches this via cert chain walk; rollback fires automatically.
Access log privacy
TestVendorEdge_NGINX_AccessLogPrivacy_NoCertBytesLeakInLogs_E2E
NGINX's default access_log and error_log formats do NOT include
SSL key bytes. The connector does not modify NGINX's logging config.
Operator action: if you've customized log_format to include
$ssl_* variables, audit the format string for sensitive fields.
Per-version reload-command compat
TestVendorEdge_NGINX_NGINX125_vs_127_ReloadCommandCompatible_E2E
nginx -s reload semantics are identical between 1.25 LTS and
1.27 stable. No per-version branch needed in operator config.
High-concurrency deploy under load
TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E
NGINX's worker handoff during reload is graceful; concurrent TLS handshakes during a deploy succeed without 5xx errors.
Troubleshooting matrix
| Symptom | Test name | Root cause | Operator action |
|---|---|---|---|
| Old cert returned 5min after deploy | SSLSessionCacheHoldsOldCert_E2E | session cache TTL | tune ssl_session_timeout |
| Wrong vhost serves new cert | SNIMultiServerName_E2E | misconfigured server_name selector | verify vhost metadata |
| Post-verify fails on IPv6 | IPv6DualStackBindsBoth_E2E | flaky DNS resolution | PostDeployVerifyAttempts: 5 |
| Connection drops on cert change | n/a | using restart instead of reload | use nginx -s reload |
| Deploy aborts with nginx -t error | ConfigSyntaxError_RollbackRestoresPreviousCert_E2E | bad config (not deploy's fault) | fix config; redeploy |
| Chain-validation failure post-deploy | MissingIntermediate_E2E | leaf-only cert | include full chain in deploy |
V3-Pro deferrals
- Pin NGINX ssl_session_ticket_key rotation interaction with cert rotation (rare; documented but not tested).
- NGINX Plus dyn_pem API integration (commercial; not V2 scope).
Related docs
- Atomic deploy + post-verify + rollback — the Bundle I primitive every connector consumes.
- Vendor compatibility matrix
- Connectors reference