mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 22:21:30 +00:00
0834bc1ad5
Phase 14 of the deploy-hardening II master bundle. The procurement- team headline doc + per-connector operator guides for the top 5 most-deployed connectors. NEW docs/deployment-vendor-matrix.md (~30 rows): - Per (connector × vendor-version) status: ✓ / CI / mock / pending / n/a - Known issues + workarounds + e2e test name reference - LTS + current-stable scope per frozen decision 0.1 - Quarterly re-pin cadence guidance for sidecar digests - "How to add a new vendor version" recipe Per frozen decision 0.14: a (connector × vendor-version) cell is "verified" only when ALL apply: ≥1 happy-path e2e green; ≥1 specific-quirk test green for that version; operator manual smoke completed at least once. Cells lacking the third criterion show "CI" status (auto-tests green but pending operator validation). Status snapshot at bundle close: - NGINX 1.25 + 1.27: CI - Apache 2.4: CI - HAProxy 2.6 + 2.8 + 3.0: CI - Traefik 2.x + 3.x: CI - Caddy 2.x: CI - Envoy 1.30 + 1.32: CI (file-mode SDS only; gRPC SDS V3-Pro) - Postfix 3.6 + 3.8: CI - Dovecot 2.3: CI - IIS 10 (2019, 2022): pending (Windows-host-only CI) - F5 v15.1 + v17.0 + v17.5: mock (real-F5 vagrant box documented) - SSH OpenSSH 8.x + 9.x: CI - WinCertStore (2019, 2022): pending (Windows-host-only) - JavaKeystore JDK 11 + 17 + 21: pending - K8s 1.28 + 1.30 + 1.31: CI NEW per-connector deep-dive docs: - docs/connector-nginx.md (~150 lines, 10 quirks documented) - docs/connector-k8s.md (~110 lines, 10 quirks) - docs/connector-iis.md (~120 lines, 10 quirks; Windows-host-only CI constraint loud) - docs/connector-apache.md (~80 lines, 10 quirks) - docs/connector-f5.md (~190 lines, 10 quirks; two-tier validation recipe for operator-supplied real-F5 vagrant box) Each doc follows the same structure: - Overview - Vendor versions tested - Per-quirk operator guidance (one section per TestVendorEdge_<vendor>_<edge>_E2E) - Troubleshooting matrix - V3-Pro deferrals - Related docs cross-refs Other connector docs (HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, SSH, WinCertStore, JavaKeystore) live in docs/connectors.md + are referenced from the matrix. Phase 15 next: per-vendor CI matrix job in .github/workflows/ci.yml.
160 lines
5.9 KiB
Markdown
160 lines
5.9 KiB
Markdown
# NGINX Connector — Operator Deep-Dive
|
|
|
|
> Per Phase 14 of the deploy-hardening II master bundle. Operator-
|
|
> grade documentation for the NGINX target connector.
|
|
|
|
## Overview
|
|
|
|
The NGINX connector (`internal/connector/target/nginx/`) is the
|
|
canonical implementation of the deploy-hardening I atomic + verify
|
|
+ rollback contract (Bundle I Phase 4). Every other file-based
|
|
connector models on this one.
|
|
|
|
## Vendor versions tested
|
|
|
|
- **NGINX 1.25 LTS** (current LTS branch)
|
|
- **NGINX 1.27 stable** (current stable branch)
|
|
|
|
Older versions (1.18 EOL'd 2021, 1.20 EOL'd 2022) are explicitly
|
|
out of scope per frozen decision 0.1.
|
|
|
|
## Deploy contract
|
|
|
|
Every cert deploy follows the Bundle I `deploy.Apply(ctx, plan)`
|
|
flow:
|
|
|
|
1. **Idempotency check** — SHA-256 over cert+chain+key bytes; skip
|
|
if all match destination.
|
|
2. **Pre-deploy backup** — copy existing files to
|
|
`<path>.certctl-bak.<unix-nanos>`.
|
|
3. **Atomic write** — temp-file + chown + atomic rename per
|
|
destination.
|
|
4. **PreCommit (validate)** — runs `nginx -t` per the operator's
|
|
`validate_command`. Failure aborts; no live cert touched.
|
|
5. **Atomic rename** — temp → final for every File entry.
|
|
6. **PostCommit (reload)** — runs `nginx -s reload` per the
|
|
operator's `reload_command`.
|
|
7. **Post-deploy TLS verify** — dials the configured endpoint;
|
|
pulls leaf cert SHA-256; compares against deployed bytes.
|
|
Mismatch triggers automatic rollback.
|
|
|
|
## Per-quirk operator guidance
|
|
|
|
### SSL session cache holds old cert
|
|
|
|
`TestVendorEdge_NGINX_SSLSessionCacheHoldsOldCert_E2E`
|
|
|
|
NGINX's `ssl_session_cache` (default `shared:SSL:10m`) keeps TLS
|
|
session IDs valid for `ssl_session_timeout` (default 5min). Clients
|
|
that resume via session ID see the OLD cert until their session
|
|
expires.
|
|
|
|
**Operator action:** this is documented behavior, not a bug.
|
|
Tune via `ssl_session_timeout 5m;` (default) or shorter if your
|
|
cert rotation cadence demands. Post-deploy verify in certctl will
|
|
return the NEW cert from a fresh handshake (no session resumption);
|
|
warm clients see the OLD cert until session-cache eviction.
|
|
|
|
### SNI multi-server-name binding
|
|
|
|
`TestVendorEdge_NGINX_SNIMultiServerName_DeployBindsCorrectVhost_E2E`
|
|
|
|
When NGINX has multiple `server { server_name a.example b.example; }`
|
|
blocks, the operator deploys with metadata pointing at the
|
|
specific vhost. Connector binds to that vhost only; other vhosts
|
|
remain unchanged.
|
|
|
|
### IPv6 dual-stack
|
|
|
|
`TestVendorEdge_NGINX_IPv6DualStackBindsBoth_E2E`
|
|
|
|
NGINX listening on `0.0.0.0:443` + `[::]:443` serves the new cert
|
|
on both stacks after a single deploy.
|
|
|
|
**Operator action:** if your post-deploy verify endpoint resolves
|
|
to IPv6 only on some networks but IPv4 only on others, configure
|
|
`PostDeployVerifyAttempts: 5` to cover both paths.
|
|
|
|
### Reload vs restart
|
|
|
|
`TestVendorEdge_NGINX_ReloadVsRestart_NoConnectionDrop_E2E`
|
|
|
|
`nginx -s reload` (graceful) preserves in-flight TLS connections
|
|
via worker handoff. `nginx -s stop && nginx` drops them.
|
|
|
|
**Operator action:** never use restart for cert rotation. The
|
|
connector's default `reload_command: nginx -s reload` is correct.
|
|
|
|
### Binary upgrade
|
|
|
|
`TestVendorEdge_NGINX_UpgradeBinaryHotReload_E2E`
|
|
|
|
`nginx -s upgrade` rolls out a new binary without dropping
|
|
connections. Not commonly used; documented for ops teams that do
|
|
rolling NGINX binary upgrades.
|
|
|
|
### Config syntax error → rollback
|
|
|
|
`TestVendorEdge_NGINX_ConfigSyntaxError_RollbackRestoresPreviousCert_E2E`
|
|
|
|
If `nginx -t` rejects the staged config, the deploy package's
|
|
PreCommit gate fires before the atomic rename — no live file is
|
|
touched. The cert directory is exactly as it was.
|
|
|
|
### Missing intermediate
|
|
|
|
`TestVendorEdge_NGINX_MissingIntermediate_DeployedButValidationCatchesAtPostVerify_E2E`
|
|
|
|
If the operator deploys a leaf-only cert (no intermediate), NGINX
|
|
will start serving it but downstream clients fail chain validation.
|
|
The connector's post-deploy TLS verify catches this via cert chain
|
|
walk; rollback fires automatically.
|
|
|
|
### Access log privacy
|
|
|
|
`TestVendorEdge_NGINX_AccessLogPrivacy_NoCertBytesLeakInLogs_E2E`
|
|
|
|
NGINX's default `access_log` and `error_log` formats do NOT include
|
|
SSL key bytes. The connector does not modify NGINX's logging config.
|
|
|
|
**Operator action:** if you've customized `log_format` to include
|
|
`$ssl_*` variables, audit the format string for sensitive fields.
|
|
|
|
### Per-version reload-command compat
|
|
|
|
`TestVendorEdge_NGINX_NGINX125_vs_127_ReloadCommandCompatible_E2E`
|
|
|
|
`nginx -s reload` semantics are identical between 1.25 LTS and
|
|
1.27 stable. No per-version branch needed in operator config.
|
|
|
|
### High-concurrency deploy under load
|
|
|
|
`TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E`
|
|
|
|
NGINX's worker handoff during reload is graceful; concurrent TLS
|
|
handshakes during a deploy succeed without 5xx errors.
|
|
|
|
## Troubleshooting matrix
|
|
|
|
| Symptom | Test name | Root cause | Operator action |
|
|
|---|---|---|---|
|
|
| Old cert returned 5min after deploy | `SSLSessionCacheHoldsOldCert_E2E` | session cache TTL | tune `ssl_session_timeout` |
|
|
| Wrong vhost serves new cert | `SNIMultiServerName_E2E` | misconfigured server_name selector | verify vhost metadata |
|
|
| Post-verify fails on IPv6 | `IPv6DualStackBindsBoth_E2E` | flaky DNS resolution | `PostDeployVerifyAttempts: 5` |
|
|
| Connection drops on cert change | n/a | using restart instead of reload | use `nginx -s reload` |
|
|
| Deploy aborts with `nginx -t` error | `ConfigSyntaxError_RollbackRestoresPreviousCert_E2E` | bad config (not deploy's fault) | fix config; redeploy |
|
|
| Chain-validation failure post-deploy | `MissingIntermediate_E2E` | leaf-only cert | include full chain in deploy |
|
|
|
|
## V3-Pro deferrals
|
|
|
|
- Pin NGINX `ssl_session_ticket_key` rotation interaction with cert
|
|
rotation (rare; documented but not tested).
|
|
- NGINX Plus `dyn_pem` API integration (commercial; not V2 scope).
|
|
|
|
## Related docs
|
|
|
|
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
|
— the Bundle I primitive every connector consumes.
|
|
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
|
- [Connectors reference](connectors.md)
|