mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-10 01:08:52 +00:00
docs: deployment vendor matrix + per-connector deep-dive docs (NGINX + K8s + IIS + Apache + F5)
Phase 14 of the deploy-hardening II master bundle. The procurement- team headline doc + per-connector operator guides for the top 5 most-deployed connectors. NEW docs/deployment-vendor-matrix.md (~30 rows): - Per (connector × vendor-version) status: ✓ / CI / mock / pending / n/a - Known issues + workarounds + e2e test name reference - LTS + current-stable scope per frozen decision 0.1 - Quarterly re-pin cadence guidance for sidecar digests - "How to add a new vendor version" recipe Per frozen decision 0.14: a (connector × vendor-version) cell is "verified" only when ALL apply: ≥1 happy-path e2e green; ≥1 specific-quirk test green for that version; operator manual smoke completed at least once. Cells lacking the third criterion show "CI" status (auto-tests green but pending operator validation). Status snapshot at bundle close: - NGINX 1.25 + 1.27: CI - Apache 2.4: CI - HAProxy 2.6 + 2.8 + 3.0: CI - Traefik 2.x + 3.x: CI - Caddy 2.x: CI - Envoy 1.30 + 1.32: CI (file-mode SDS only; gRPC SDS V3-Pro) - Postfix 3.6 + 3.8: CI - Dovecot 2.3: CI - IIS 10 (2019, 2022): pending (Windows-host-only CI) - F5 v15.1 + v17.0 + v17.5: mock (real-F5 vagrant box documented) - SSH OpenSSH 8.x + 9.x: CI - WinCertStore (2019, 2022): pending (Windows-host-only) - JavaKeystore JDK 11 + 17 + 21: pending - K8s 1.28 + 1.30 + 1.31: CI NEW per-connector deep-dive docs: - docs/connector-nginx.md (~150 lines, 10 quirks documented) - docs/connector-k8s.md (~110 lines, 10 quirks) - docs/connector-iis.md (~120 lines, 10 quirks; Windows-host-only CI constraint loud) - docs/connector-apache.md (~80 lines, 10 quirks) - docs/connector-f5.md (~190 lines, 10 quirks; two-tier validation recipe for operator-supplied real-F5 vagrant box) Each doc follows the same structure: - Overview - Vendor versions tested - Per-quirk operator guidance (one section per TestVendorEdge_<vendor>_<edge>_E2E) - Troubleshooting matrix - V3-Pro deferrals - Related docs cross-refs Other connector docs (HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, SSH, WinCertStore, JavaKeystore) live in docs/connectors.md + are referenced from the matrix. Phase 15 next: per-vendor CI matrix job in .github/workflows/ci.yml.
This commit is contained in:
@@ -0,0 +1,159 @@
|
||||
# NGINX Connector — Operator Deep-Dive
|
||||
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. Operator-
|
||||
> grade documentation for the NGINX target connector.
|
||||
|
||||
## Overview
|
||||
|
||||
The NGINX connector (`internal/connector/target/nginx/`) is the
|
||||
canonical implementation of the deploy-hardening I atomic + verify
|
||||
+ rollback contract (Bundle I Phase 4). Every other file-based
|
||||
connector models on this one.
|
||||
|
||||
## Vendor versions tested
|
||||
|
||||
- **NGINX 1.25 LTS** (current LTS branch)
|
||||
- **NGINX 1.27 stable** (current stable branch)
|
||||
|
||||
Older versions (1.18 EOL'd 2021, 1.20 EOL'd 2022) are explicitly
|
||||
out of scope per frozen decision 0.1.
|
||||
|
||||
## Deploy contract
|
||||
|
||||
Every cert deploy follows the Bundle I `deploy.Apply(ctx, plan)`
|
||||
flow:
|
||||
|
||||
1. **Idempotency check** — SHA-256 over cert+chain+key bytes; skip
|
||||
if all match destination.
|
||||
2. **Pre-deploy backup** — copy existing files to
|
||||
`<path>.certctl-bak.<unix-nanos>`.
|
||||
3. **Atomic write** — temp-file + chown + atomic rename per
|
||||
destination.
|
||||
4. **PreCommit (validate)** — runs `nginx -t` per the operator's
|
||||
`validate_command`. Failure aborts; no live cert touched.
|
||||
5. **Atomic rename** — temp → final for every File entry.
|
||||
6. **PostCommit (reload)** — runs `nginx -s reload` per the
|
||||
operator's `reload_command`.
|
||||
7. **Post-deploy TLS verify** — dials the configured endpoint;
|
||||
pulls leaf cert SHA-256; compares against deployed bytes.
|
||||
Mismatch triggers automatic rollback.
|
||||
|
||||
## Per-quirk operator guidance
|
||||
|
||||
### SSL session cache holds old cert
|
||||
|
||||
`TestVendorEdge_NGINX_SSLSessionCacheHoldsOldCert_E2E`
|
||||
|
||||
NGINX's `ssl_session_cache` (default `shared:SSL:10m`) keeps TLS
|
||||
session IDs valid for `ssl_session_timeout` (default 5min). Clients
|
||||
that resume via session ID see the OLD cert until their session
|
||||
expires.
|
||||
|
||||
**Operator action:** this is documented behavior, not a bug.
|
||||
Tune via `ssl_session_timeout 5m;` (default) or shorter if your
|
||||
cert rotation cadence demands. Post-deploy verify in certctl will
|
||||
return the NEW cert from a fresh handshake (no session resumption);
|
||||
warm clients see the OLD cert until session-cache eviction.
|
||||
|
||||
### SNI multi-server-name binding
|
||||
|
||||
`TestVendorEdge_NGINX_SNIMultiServerName_DeployBindsCorrectVhost_E2E`
|
||||
|
||||
When NGINX has multiple `server { server_name a.example b.example; }`
|
||||
blocks, the operator deploys with metadata pointing at the
|
||||
specific vhost. Connector binds to that vhost only; other vhosts
|
||||
remain unchanged.
|
||||
|
||||
### IPv6 dual-stack
|
||||
|
||||
`TestVendorEdge_NGINX_IPv6DualStackBindsBoth_E2E`
|
||||
|
||||
NGINX listening on `0.0.0.0:443` + `[::]:443` serves the new cert
|
||||
on both stacks after a single deploy.
|
||||
|
||||
**Operator action:** if your post-deploy verify endpoint resolves
|
||||
to IPv6 only on some networks but IPv4 only on others, configure
|
||||
`PostDeployVerifyAttempts: 5` to cover both paths.
|
||||
|
||||
### Reload vs restart
|
||||
|
||||
`TestVendorEdge_NGINX_ReloadVsRestart_NoConnectionDrop_E2E`
|
||||
|
||||
`nginx -s reload` (graceful) preserves in-flight TLS connections
|
||||
via worker handoff. `nginx -s stop && nginx` drops them.
|
||||
|
||||
**Operator action:** never use restart for cert rotation. The
|
||||
connector's default `reload_command: nginx -s reload` is correct.
|
||||
|
||||
### Binary upgrade
|
||||
|
||||
`TestVendorEdge_NGINX_UpgradeBinaryHotReload_E2E`
|
||||
|
||||
`nginx -s upgrade` rolls out a new binary without dropping
|
||||
connections. Not commonly used; documented for ops teams that do
|
||||
rolling NGINX binary upgrades.
|
||||
|
||||
### Config syntax error → rollback
|
||||
|
||||
`TestVendorEdge_NGINX_ConfigSyntaxError_RollbackRestoresPreviousCert_E2E`
|
||||
|
||||
If `nginx -t` rejects the staged config, the deploy package's
|
||||
PreCommit gate fires before the atomic rename — no live file is
|
||||
touched. The cert directory is exactly as it was.
|
||||
|
||||
### Missing intermediate
|
||||
|
||||
`TestVendorEdge_NGINX_MissingIntermediate_DeployedButValidationCatchesAtPostVerify_E2E`
|
||||
|
||||
If the operator deploys a leaf-only cert (no intermediate), NGINX
|
||||
will start serving it but downstream clients fail chain validation.
|
||||
The connector's post-deploy TLS verify catches this via cert chain
|
||||
walk; rollback fires automatically.
|
||||
|
||||
### Access log privacy
|
||||
|
||||
`TestVendorEdge_NGINX_AccessLogPrivacy_NoCertBytesLeakInLogs_E2E`
|
||||
|
||||
NGINX's default `access_log` and `error_log` formats do NOT include
|
||||
SSL key bytes. The connector does not modify NGINX's logging config.
|
||||
|
||||
**Operator action:** if you've customized `log_format` to include
|
||||
`$ssl_*` variables, audit the format string for sensitive fields.
|
||||
|
||||
### Per-version reload-command compat
|
||||
|
||||
`TestVendorEdge_NGINX_NGINX125_vs_127_ReloadCommandCompatible_E2E`
|
||||
|
||||
`nginx -s reload` semantics are identical between 1.25 LTS and
|
||||
1.27 stable. No per-version branch needed in operator config.
|
||||
|
||||
### High-concurrency deploy under load
|
||||
|
||||
`TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E`
|
||||
|
||||
NGINX's worker handoff during reload is graceful; concurrent TLS
|
||||
handshakes during a deploy succeed without 5xx errors.
|
||||
|
||||
## Troubleshooting matrix
|
||||
|
||||
| Symptom | Test name | Root cause | Operator action |
|
||||
|---|---|---|---|
|
||||
| Old cert returned 5min after deploy | `SSLSessionCacheHoldsOldCert_E2E` | session cache TTL | tune `ssl_session_timeout` |
|
||||
| Wrong vhost serves new cert | `SNIMultiServerName_E2E` | misconfigured server_name selector | verify vhost metadata |
|
||||
| Post-verify fails on IPv6 | `IPv6DualStackBindsBoth_E2E` | flaky DNS resolution | `PostDeployVerifyAttempts: 5` |
|
||||
| Connection drops on cert change | n/a | using restart instead of reload | use `nginx -s reload` |
|
||||
| Deploy aborts with `nginx -t` error | `ConfigSyntaxError_RollbackRestoresPreviousCert_E2E` | bad config (not deploy's fault) | fix config; redeploy |
|
||||
| Chain-validation failure post-deploy | `MissingIntermediate_E2E` | leaf-only cert | include full chain in deploy |
|
||||
|
||||
## V3-Pro deferrals
|
||||
|
||||
- Pin NGINX `ssl_session_ticket_key` rotation interaction with cert
|
||||
rotation (rare; documented but not tested).
|
||||
- NGINX Plus `dyn_pem` API integration (commercial; not V2 scope).
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
||||
— the Bundle I primitive every connector consumes.
|
||||
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
||||
- [Connectors reference](connectors.md)
|
||||
Reference in New Issue
Block a user