From 0834bc1ad59689673a37da2eac75fffaa125c90f Mon Sep 17 00:00:00 2001
From: shankar0123
Date: Thu, 30 Apr 2026 16:16:48 +0000
Subject: [PATCH] docs: deployment vendor matrix + per-connector deep-dive docs (NGINX + K8s + IIS + Apache + F5)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 14 of the deploy-hardening II master bundle. The
procurement-team headline doc + per-connector operator guides for the
top 5 most-deployed connectors.

NEW docs/deployment-vendor-matrix.md (~30 rows):
- Per (connector × vendor-version) status: ✓ / CI / mock / pending / n/a
- Known issues + workarounds + e2e test name reference
- LTS + current-stable scope per frozen decision 0.1
- Quarterly re-pin cadence guidance for sidecar digests
- "How to add a new vendor version" recipe

Per frozen decision 0.14: a (connector × vendor-version) cell is
"verified" only when ALL apply: ≥1 happy-path e2e green; ≥1
specific-quirk test green for that version; operator manual smoke
completed at least once. Cells lacking the third criterion show "CI"
status (auto-tests green but pending operator validation).

Status snapshot at bundle close:
- NGINX 1.25 + 1.27: CI
- Apache 2.4: CI
- HAProxy 2.6 + 2.8 + 3.0: CI
- Traefik 2.x + 3.x: CI
- Caddy 2.x: CI
- Envoy 1.30 + 1.32: CI (file-mode SDS only; gRPC SDS V3-Pro)
- Postfix 3.6 + 3.8: CI
- Dovecot 2.3: CI
- IIS 10 (2019, 2022): pending (Windows-host-only CI)
- F5 v15.1 + v17.0 + v17.5: mock (real-F5 vagrant box documented)
- SSH OpenSSH 8.x + 9.x: CI
- WinCertStore (2019, 2022): pending (Windows-host-only)
- JavaKeystore JDK 11 + 17 + 21: pending
- K8s 1.28 + 1.30 + 1.31: CI

NEW per-connector deep-dive docs:
- docs/connector-nginx.md (~150 lines, 10 quirks documented)
- docs/connector-k8s.md (~110 lines, 10 quirks)
- docs/connector-iis.md (~120 lines, 10 quirks; Windows-host-only CI
  constraint loud)
- docs/connector-apache.md (~80 lines, 10 quirks)
- docs/connector-f5.md (~190 lines, 10 quirks; two-tier validation
  recipe for operator-supplied real-F5 vagrant box)

Each doc follows the same structure:
- Overview
- Vendor versions tested
- Per-quirk operator guidance (one section per TestVendorEdge___E2E)
- Troubleshooting matrix
- V3-Pro deferrals
- Related docs cross-refs

Other connector docs (HAProxy, Traefik, Caddy, Envoy, Postfix,
Dovecot, SSH, WinCertStore, JavaKeystore) live in docs/connectors.md +
are referenced from the matrix.

Phase 15 next: per-vendor CI matrix job in .github/workflows/ci.yml.
---
 docs/connector-apache.md         | 101 +++++++++++++++++++
 docs/connector-f5.md             | 166 +++++++++++++++++++++++++++++++
 docs/connector-iis.md            | 130 ++++++++++++++++++++++++
 docs/connector-k8s.md            | 117 ++++++++++++++++++++++
 docs/connector-nginx.md          | 159 +++++++++++++++++++++++++++++
 docs/deployment-vendor-matrix.md |  91 +++++++++++++++++
 6 files changed, 764 insertions(+)
 create mode 100644 docs/connector-apache.md
 create mode 100644 docs/connector-f5.md
 create mode 100644 docs/connector-iis.md
 create mode 100644 docs/connector-k8s.md
 create mode 100644 docs/connector-nginx.md
 create mode 100644 docs/deployment-vendor-matrix.md

diff --git a/docs/connector-apache.md b/docs/connector-apache.md
new file mode 100644
index 0000000..2181fcf
--- /dev/null
+++ b/docs/connector-apache.md
@@ -0,0 +1,101 @@
+# Apache httpd Connector — Operator Deep-Dive
+
+> Per Phase 14 of the deploy-hardening II master bundle.
+
+## Overview
+
+The Apache connector (`internal/connector/target/apache/`) deploys
+TLS certs to Apache 2.4 LTS via separate cert/chain/key files +
+`apachectl configtest` validate + `apachectl graceful` reload.
+Mirrors the canonical NGINX template (Bundle I Phase 5).
+
+## Vendor versions tested
+
+- **Apache httpd 2.4 LTS** (only LTS branch; 2.6 is dev branch)
+
+## Per-quirk operator guidance
+
+### Multi-vhost cert-by-vhost
+
+`TestVendorEdge_Apache_MultiVhostCertByVhost_DeployIsolated_E2E`
+
+When Apache has multiple `` blocks each with its own
+`SSLCertificateFile`, connector deploys to the matching vhost
+only. Other vhosts unchanged.
+
+### `apachectl graceful-stop` drains cleanly
+
+`TestVendorEdge_Apache_ApachectlGracefulStop_DrainsCleanly_E2E`
+
+`apachectl graceful` (the connector default) preserves in-flight
+TLS connections. `apachectl restart` drops them.
+
+### `mod_ssl` absent
+
+`TestVendorEdge_Apache_ModSSLAbsent_DeployFailsWithActionableError_E2E`
+
+If `mod_ssl` isn't loaded, `apachectl configtest` fails with
+"Invalid command 'SSLCertificateFile'". Connector surfaces this
+verbatim — operator action: `LoadModule ssl_module modules/mod_ssl.so`.
+
+### `.htaccess` interactions
+
+`TestVendorEdge_Apache_HtaccessRequireSSL_NotImpactedByDeploy_E2E`
+
+`.htaccess` rules requiring SSL are not impacted by cert rotation.
+The `Require` directive evaluates per-request against the
+connection's TLS state, not the cert file.
+
+### Apache 2.4 LTS reload semantics pinned
+
+`TestVendorEdge_Apache_Apache24LTSReloadSemanticsPinned_E2E`
+
+`apachectl graceful` semantics stable across 2.4.x patch versions.
+No per-version branch needed.
+
+### Syntax error rollback
+
+`TestVendorEdge_Apache_SyntaxErrorRollback_E2E`
+
+`apachectl configtest` failure aborts before atomic rename. Live
+cert untouched.
+
+### Per-vhost key ownership
+
+`TestVendorEdge_Apache_PerVhostKeyOwnership_E2E`
+
+When multiple vhosts share the same key file, ownership is
+preserved across rotation. When each vhost has its own key,
+per-file ownership is preserved per Bundle I Phase 5.
+
+### Reload preserves connections
+
+`TestVendorEdge_Apache_ReloadVsRestart_PreservesConnections_E2E`
+
+In-flight TLS sessions survive `apachectl graceful` worker
+swap. Documented in `docs/deployment-atomicity.md`.
+
+### SNI server_name binding
+
+`TestVendorEdge_Apache_SNIServerNameDeployBindsCorrect_E2E`
+
+When deploy specifies `server_name` metadata, connector targets
+the matching `` block.
+
+### Cert chain ordering
+
+`TestVendorEdge_Apache_ChainOrderingNormalized_E2E`
+
+Apache requires leaf cert FIRST in `SSLCertificateFile` (or
+chain in `SSLCertificateChainFile`). Connector preserves
+operator-supplied ordering across rotation.
+
+## V3-Pro deferrals
+
+- Apache 2.6 (when it ships LTS).
+- mod_md (Apache's built-in ACME) interop.
+
+## Related docs
+
+- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
+- [Vendor compatibility matrix](deployment-vendor-matrix.md)
diff --git a/docs/connector-f5.md b/docs/connector-f5.md
new file mode 100644
index 0000000..1de9c35
--- /dev/null
+++ b/docs/connector-f5.md
@@ -0,0 +1,166 @@
+# F5 BIG-IP Connector — Operator Deep-Dive
+
+> Per Phase 14 of the deploy-hardening II master bundle.
+
+## Overview
+
+The F5 connector (`internal/connector/target/f5/`) deploys TLS
+certs to F5 BIG-IP load balancers via the iControl REST API.
+F5's transactional API gives certctl atomic-update semantics for
+free at the API level — the Bundle I rollback wire layers
+on-failure cleanup of orphaned crypto objects.
+
+## Vendor versions tested
+
+- **F5 v15.1 LTS**
+- **F5 v17.0 LTS**
+- **F5 v17.5**
+
+## Two-tier validation strategy (frozen decision 0.3)
+
+1. **CI tier**: `f5-mock-icontrol` sidecar — in-tree Go server at
+   `deploy/test/f5-mock-icontrol/` implementing the iControl REST
+   surface this bundle exercises (auth, file upload, transactions,
+   SSL profile CRUD). All `TestVendorEdge_F5_*_E2E` tests run
+   against this in CI.
+2. **Customer-grade tier**: operator-supplied real F5 vagrant box.
+   Documented setup recipe below. Manual smoke required for
+   "verified" status in `docs/deployment-vendor-matrix.md`.
+
+The mock implements a SUBSET of iControl REST. A real F5 may
+diverge on quirks the mock doesn't model. Customer-grade
+validation against the vagrant box is the validation tier above
+the mock.
+
+## Setting up the operator-supplied real F5
+
+```bash
+# F5 Networks publishes BIG-IP VE (Virtual Edition) on:
+#   https://downloads.f5.com → BIG-IP VE → 17.5.0 → Vagrant
+# Download the .box file (requires F5 account; free tier ok).
+vagrant box add f5/big-ip-17.5.0 ~/Downloads/BIGIP-17.5.0.0.0.box
+vagrant init f5/big-ip-17.5.0
+vagrant up
+
+# Then point certctl at vagrant's mapped management interface:
+#   https://localhost:8443 with admin/
+# Per-target Config:
+#   Host: "localhost"
+#   Port: 8443
+#   Username: "admin"
+#   Password: ""
+```
+
+Run the F5 vendor-edge tests against the real F5 by setting:
+
+```
+F5_REAL_HOST=localhost:8443 \
+F5_REAL_USER=admin \
+F5_REAL_PASS= \
+INTEGRATION=1 go test -tags integration \
+  -run 'TestVendorEdge_F5' ./deploy/test/...
+```
+
+(Test bodies opt into the real-F5 path when these env vars are
+set; otherwise default to the mock sidecar.)
+
+## Per-quirk operator guidance
+
+### SSL profile reference counting
+
+`TestVendorEdge_F5_SSLProfileReferenceCounting_TransactionWithNVS_AtomicCommit_E2E`
+
+When a transaction binds the new SSL profile to N virtual
+servers, F5 commits all N atomically. Failure aborts all N.
+
+### Client SSL vs server SSL profile
+
+`TestVendorEdge_F5_ClientSSLProfileVsServerSSLProfile_DeployUpdatesCorrect_E2E`
+
+F5 has separate `client-ssl` profiles (terminating TLS from clients)
+and `server-ssl` profiles (originating TLS to backends). Connector
+targets the operator-named profile only.
+
+### Partition handling
+
+`TestVendorEdge_F5_PartitionCommonVsCustom_DeployRespectsPartition_E2E`
+
+F5 partitions namespace objects (Common, custom-tenant). Connector
+respects the operator-supplied `Partition`.
+
+### v15 vs v17 API stability
+
+`TestVendorEdge_F5_F5v15_vs_v17_TransactionAPIShapeStable_E2E`
+
+`mgmt/tm/transaction` API shape stable across v15.1 LTS and v17.x.
+No per-version branch needed.
+
+### Large cert chain (>4 links)
+
+`TestVendorEdge_F5_LargeCertChainHandling_E2E`
+
+v15.x had a known issue with cert chains >4 links (silent
+truncation of the deep links). v17.x lifted this limit.
+
+**Operator action:** if on v15.x, keep chains ≤4 links OR upgrade
+to v17.x. Documented loud in this doc.
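
The chain-length guidance above lends itself to a pre-deploy check. The sketch below is illustrative only and is not taken from the connector: `countCertificates`, `chainTooLongForV15`, and `fakeBundle` are hypothetical names, and it assumes the 4-link limit counts every `CERTIFICATE` block in the PEM bundle, leaf included.

```go
package main

import (
	"encoding/pem"
	"fmt"
)

// maxChainLinksV15 mirrors the "keep chains at 4 links or fewer" guidance
// for F5 v15.x. Whether the leaf counts toward the limit is an assumption.
const maxChainLinksV15 = 4

// countCertificates returns how many CERTIFICATE blocks a PEM bundle holds.
func countCertificates(bundle []byte) int {
	n := 0
	for {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			return n
		}
		if block.Type == "CERTIFICATE" {
			n++
		}
	}
}

// chainTooLongForV15 reports whether a bundle exceeds the v15.x guidance.
func chainTooLongForV15(bundle []byte) bool {
	return countCertificates(bundle) > maxChainLinksV15
}

// fakeBundle builds n placeholder CERTIFICATE blocks for the demo.
// The payloads are not real certificates; only the block count matters here.
func fakeBundle(n int) []byte {
	var out []byte
	for i := 0; i < n; i++ {
		out = append(out, pem.EncodeToMemory(&pem.Block{
			Type:  "CERTIFICATE",
			Bytes: []byte{byte(i)},
		})...)
	}
	return out
}

func main() {
	bundle := fakeBundle(5)
	fmt.Printf("links=%d tooLong=%v\n", countCertificates(bundle), chainTooLongForV15(bundle))
}
```

An operator on v15.x could run a check like this before handing a bundle to the connector, so that a too-long chain is rejected up front rather than silently truncated.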
+
+### Auth token expiry
+
+`TestVendorEdge_F5_AuthTokenExpiryRefresh_E2E`
+
+F5 auth tokens expire (default 1200s). Connector re-authenticates
+on 401 transparently.
+
+### Transaction timeout cleanup
+
+`TestVendorEdge_F5_TransactionTimeoutCleanup_E2E`
+
+Open transactions timeout after 120s. Bundle I rollback wire
+catches orphaned crypto objects (uploaded files not committed via
+transaction).
+
+### Same-VS update
+
+`TestVendorEdge_F5_VirtualServerBindingOnSameVS_E2E`
+
+Re-binding an SSL profile on the same Virtual Server is atomic
+at the F5 API level. No listener disruption.
+
+### SSL options preservation
+
+`TestVendorEdge_F5_SSLOptionsPreservedAcrossRotation_E2E`
+
+Operator-supplied `cipher-list`, `no-tls-v1`, `secure-renegotiate`
+options on the SSL profile preserved across cert rotation.
+
+### iControl REST rate limit
+
+`TestVendorEdge_F5_iControlRESTRateLimit_E2E`
+
+F5 iControl REST defaults to 100 req/s. Connector backs off on
+429 with exponential retry.
+
+## Troubleshooting matrix
+
+| Symptom | Test name | Operator action |
+|---|---|---|
+| Cert deploys but only 4 chain links served | `LargeCertChainHandling_E2E` | upgrade to v17.x or shorten chain |
+| Frequent 401 retries | `AuthTokenExpiryRefresh_E2E` | benign; tune token lifetime if needed |
+| Orphaned `/Common/cert-` objects | `TransactionTimeoutCleanup_E2E` | run cleanup script; check for hung deploys |
+| Wrong partition deployed to | `PartitionCommonVsCustom_E2E` | verify `Partition` in connector config |
+| Cipher list reset post-rotate | `SSLOptionsPreservedAcrossRotation_E2E` | bug — file an issue |
+
+## V3-Pro deferrals
+
+- F5 GTM (DNS-load-balancer cert deploys).
+- F5 NGINX Plus cert deploy via the F5 API (when F5 ships the
+  unified API).
+- AS3 declarative deploy (operator-friendly JSON declaration vs
+  the imperative iControl REST flow).
+
+## Related docs
+
+- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
+- [Vendor compatibility matrix](deployment-vendor-matrix.md)
+- F5 official iControl REST docs:
diff --git a/docs/connector-iis.md b/docs/connector-iis.md
new file mode 100644
index 0000000..9446226
--- /dev/null
+++ b/docs/connector-iis.md
@@ -0,0 +1,130 @@
+# Microsoft IIS Connector — Operator Deep-Dive
+
+> Per Phase 14 of the deploy-hardening II master bundle.
+
+## Overview
+
+The IIS connector (`internal/connector/target/iis/`) deploys TLS
+certs to Windows IIS servers via PowerShell (`Import-PfxCertificate`
++ `New-WebBinding` + SNI binding). Pre-deploy snapshot of the
+existing thumbprint allows rollback if the new binding fails.
+
+## Vendor versions tested
+
+- **Windows Server 2019** with IIS 10
+- **Windows Server 2022** with IIS 10
+
+## CI runner constraint
+
+Per frozen decision 0.4: Windows containers run only on Windows
+hosts. Linux CI runners CAN'T run the IIS sidecar. IIS e2e tests
+run on a separate `windows-vendor-e2e` GitHub Actions matrix job
+on `windows-latest` runners. Operators on Linux-only CI use
+`//go:build integration && !no_iis` to skip.
+
+## Per-quirk operator guidance
+
+### App-pool recycle (opt-in)
+
+`TestVendorEdge_IIS_AppPoolRecycle_OptInForCertChange_E2E`
+
+By default, IIS picks up new SSL bindings without app-pool
+recycle (the binding-edit path is hot). Some sites need recycle
+to fully reload (e.g., apps that cache cert handles).
+
+**Operator action:** set `AppPoolRecycle: true` per-target. The
+connector then runs `Restart-WebAppPool ` after binding update.
+
+### SNI multi-binding per site
+
+`TestVendorEdge_IIS_SNIMultiBindingPerSite_DeployUpdatesCorrectBinding_E2E`
+
+When a site has multiple SNI bindings (different hostnames on
+the same site), connector targets the binding matching the
+operator-supplied hostname. Other bindings unchanged.
+
+### CCS (Centralized Certificate Store)
+
+`TestVendorEdge_IIS_CCSCentralizedCertStoreVariant_DeployToSharedStore_E2E`
+
+CCS is the file-based variant where multiple IIS servers share
+a UNC path of cert files. Connector writes to the shared path;
+all IIS servers pick it up automatically.
+
+### WinRM remote vs local PowerShell
+
+`TestVendorEdge_IIS_WinRMRemotePath_vs_LocalPowerShellPath_BothWork_E2E`
+
+Two code paths produce equivalent cert installs:
+- `WinRMHost: ""` → local PowerShell (agent runs on the IIS server)
+- `WinRMHost: "iis.example"` → remote PowerShell via WinRM
+
+Both rotate the same way. WinRM path requires network reachability
+to port 5985/5986.
+
+### Server 2019 vs 2022 PowerShell compat
+
+`TestVendorEdge_IIS_WindowsServer2019_vs_2022_PowerShellCompat_E2E`
+
+`Import-PfxCertificate` + `New-WebBinding` semantics are stable
+across server versions. PowerShell 5.1 (2019) + PowerShell 7.x
+(2022) both work.
+
+### Friendly name
+
+`TestVendorEdge_IIS_FriendlyNameUpdatedOnRotation_E2E`
+
+Connector preserves operator-supplied `FriendlyName` on the cert
+across rotation. Useful for IIS GUI identification.
+
+### HTTP/2 + ALPN
+
+`TestVendorEdge_IIS_HTTP2ALPNPreserved_E2E`
+
+IIS h2 negotiation preserved across cert rotation. The
+`netsh http show sslcert` ALPN attribute survives the binding swap.
+
+### Binding-type validation
+
+`TestVendorEdge_IIS_BindingTypeHttpsValidated_E2E`
+
+Connector refuses to deploy to non-`https` bindings (e.g., `http`,
+`net.tcp`). Surfaces actionable error.
+
+### ARR reverse-proxy
+
+`TestVendorEdge_IIS_ARRReverseProxyCertRotation_E2E`
+
+Sites using Application Request Routing as reverse proxy: cert
+rotation does not invalidate ARR routes. The cert-binding edit
+is independent of the ARR config.
+
+### Atomic SNI binding swap
+
+`TestVendorEdge_IIS_RemovePreviousBindingOnRotate_E2E`
+
+Connector removes the previous SNI binding BEFORE inserting the
+new one (atomicity at the IIS API level). Prevents brief
+window where two bindings serve different certs for the same
+hostname.
+
+## Troubleshooting matrix
+
+| Symptom | Test name | Operator action |
+|---|---|---|
+| Cert installed but app pool serving old cert | `AppPoolRecycle_OptInForCertChange_E2E` | set `AppPoolRecycle: true` |
+| Wrong SNI binding updated | `SNIMultiBindingPerSite_E2E` | verify hostname selector |
+| Permission denied on cert install | n/a | agent must run as administrator |
+| WinRM connection failed | `WinRMRemotePath_vs_LocalPowerShellPath_E2E` | check WinRM port 5985/5986 reachability |
+| h2 negotiation broken post-rotate | `HTTP2ALPNPreserved_E2E` | re-run `netsh http add sslcert` with `appid + clientcertnegotiation=enable` |
+
+## V3-Pro deferrals
+
+- IIS Application Initialization module integration (warm cert
+  cache after rotation).
+- Azure Key Vault + IIS integration (operator opt-in).
+
+## Related docs
+
+- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
+- [Vendor compatibility matrix](deployment-vendor-matrix.md)
diff --git a/docs/connector-k8s.md b/docs/connector-k8s.md
new file mode 100644
index 0000000..7550125
--- /dev/null
+++ b/docs/connector-k8s.md
@@ -0,0 +1,117 @@
+# Kubernetes Secrets Connector — Operator Deep-Dive
+
+> Per Phase 14 of the deploy-hardening II master bundle.
+
+## Overview
+
+The K8s connector (`internal/connector/target/k8ssecret/`) deploys
+TLS certs into `kubernetes.io/tls` Secrets. Atomic at the API
+server level (Update is transactional); the post-deploy verify
+SHA-256-compares the returned Secret data against deployed bytes
+(defends against admission webhooks that modify cert data).
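
The read-back comparison described in the K8s overview can be sketched with the standard library alone. This is a minimal illustration, not the connector's code: `mismatchedKeys` is a hypothetical helper, and plain maps stand in for the `data` field of the Secret that a real client would read back from the API server.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// mismatchedKeys compares the bytes that were deployed with the Secret data
// read back after Update and returns, sorted, every key whose SHA-256 digest
// differs or that is missing from the returned data.
func mismatchedKeys(deployed, returned map[string][]byte) []string {
	var bad []string
	for key, want := range deployed {
		got, ok := returned[key]
		if !ok || sha256.Sum256(got) != sha256.Sum256(want) {
			bad = append(bad, key)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	deployed := map[string][]byte{
		"tls.crt": []byte("cert-bytes"),
		"tls.key": []byte("key-bytes"),
	}
	// Simulate an admission webhook that rewrote the certificate on Update.
	returned := map[string][]byte{
		"tls.crt": []byte("cert-bytes-mutated"),
		"tls.key": []byte("key-bytes"),
	}
	fmt.Println("mismatched keys:", mismatchedKeys(deployed, returned))
}
```

A non-empty result corresponds to the deploy-failure case: the API server accepted the Update, but what is stored is not what was sent.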
+
+## Vendor versions tested
+
+- **Kubernetes 1.28 LTS**
+- **Kubernetes 1.30**
+- **Kubernetes 1.31** (current stable)
+
+## Per-quirk operator guidance
+
+### Kubelet sync wait contract
+
+`TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E`
+
+After Secret update, kubelet projects new cert bytes into
+pod-mounted volumes. Default sync interval ~60s. The connector
+waits up to `CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT` (default
+60s).
+
+**Operator action:** for slow clusters (large pod count, slow
+node DNS), tune the env var upward. For fast clusters, the
+default is fine.
+
+### Admission webhook mutation
+
+`TestVendorEdge_K8s_AdmissionWebhookModifiesSecretData_DeployDetectsViaSHA256Compare_E2E`
+
+Some admission webhooks (Vault Agent Injector, OPA Gatekeeper)
+mutate Secret data on Update. The connector pulls the Secret
+back after Update and SHA-256-compares against deployed bytes.
+Mismatch surfaces as deploy failure.
+
+### Multi-version API stability
+
+`TestVendorEdge_K8s_K8s128LTS_vs_130_vs_131_SecretAPIContractStable_E2E`
+
+`kubernetes.io/tls` Secret schema (data.tls.crt + data.tls.key)
+is stable across 1.28-1.31. No per-version branch needed.
+
+### Typed vs Opaque Secret
+
+`TestVendorEdge_K8s_TypedKubernetesIOTLSVsUntypedOpaque_DeployRespectsType_E2E`
+
+Connector preserves operator-supplied Secret type. Typed
+`kubernetes.io/tls` is the canonical form; untyped `Opaque` is
+preserved for operators with legacy automation that expects it.
+
+### Cert-manager interop
+
+`TestVendorEdge_K8s_CertManagerInterop_RawSecretVsCertificateCRD_E2E`
+
+Connector targets raw Secrets, NOT cert-manager `Certificate` CRs.
+Operators using cert-manager should NOT also point certctl at the
+same Secret name (cert-manager will overwrite). Documented
+coexistence: certctl handles non-cert-manager Secrets;
+cert-manager handles its own.
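
One way an operator could guard against the cert-manager overlap described above is to inspect a Secret's annotations before targeting it. The sketch below assumes cert-manager's `cert-manager.io/certificate-name` annotation on Secrets it manages; `managedByCertManager` and `preflight` are hypothetical helpers, and nothing here is confirmed connector behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// certManagerAnnotation is an annotation cert-manager places on Secrets it
// manages. Relying on it as an ownership signal is an assumption of this sketch.
const certManagerAnnotation = "cert-manager.io/certificate-name"

// managedByCertManager reports whether the Secret's annotations suggest
// that cert-manager owns it.
func managedByCertManager(annotations map[string]string) bool {
	_, ok := annotations[certManagerAnnotation]
	return ok
}

// preflight refuses to target a Secret that cert-manager would overwrite.
func preflight(secretName string, annotations map[string]string) error {
	if managedByCertManager(annotations) {
		return errors.New("secret " + secretName + " appears to be managed by cert-manager")
	}
	return nil
}

func main() {
	owned := map[string]string{certManagerAnnotation: "web-cert"}
	fmt.Println(preflight("web-tls", owned))
	fmt.Println(preflight("legacy-tls", map[string]string{"team": "payments"}))
}
```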
+
+### Multi-namespace
+
+`TestVendorEdge_K8s_MultiNamespaceDeploy_DeployUpdatesCorrectNamespace_E2E`
+
+Connector targets the configured `Namespace` only. Cross-namespace
+deploys require multiple connector entries.
+
+### RBAC errors
+
+`TestVendorEdge_K8s_RBACInsufficientPermissions_DeployFailsWithActionableError_E2E`
+
+Connector surfaces the K8s API's `forbidden: secrets is restricted`
+error verbatim. Operator action: bind a Role with
+`secrets: get,update,create` verbs to the agent's ServiceAccount.
+
+### Labels + annotations preservation
+
+`TestVendorEdge_K8s_LabelsAnnotationsPreserved_E2E`
+
+Connector merges (not replaces) operator-supplied metadata. Custom
+labels/annotations on the Secret survive cert rotation.
+
+### Pod-mounted Secret rollover
+
+`TestVendorEdge_K8s_PodMountedSecretRollover_E2E`
+
+When a pod mounts the Secret as a volume, kubelet projects new
+cert bytes into the pod's filesystem after sync. Pods watching
+the file (via inotify or polling) pick up the new cert without
+restart.
+
+### Immutable Secret flag
+
+`TestVendorEdge_K8s_ImmutableSecretFlag_E2E`
+
+K8s Secrets can be marked `immutable: true` for performance.
+Update fails with actionable error; operator must drop the flag,
+update, then re-apply if desired.
+
+## V3-Pro deferrals
+
+- cert-manager `Certificate` CR interop as first-class deploy
+  target (V3-Pro: certctl as cert-manager external issuer).
+- Multi-cluster federation (deploy a single cert across N
+  clusters with single connector entry).
+
+## Related docs
+
+- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
+- [Vendor compatibility matrix](deployment-vendor-matrix.md)
diff --git a/docs/connector-nginx.md b/docs/connector-nginx.md
new file mode 100644
index 0000000..cfd21a6
--- /dev/null
+++ b/docs/connector-nginx.md
@@ -0,0 +1,159 @@
+# NGINX Connector — Operator Deep-Dive
+
+> Per Phase 14 of the deploy-hardening II master bundle.
+> Operator-grade documentation for the NGINX target connector.
+
+## Overview
+
+The NGINX connector (`internal/connector/target/nginx/`) is the
+canonical implementation of the deploy-hardening I atomic + verify
++ rollback contract (Bundle I Phase 4). Every other file-based
+connector models on this one.
+
+## Vendor versions tested
+
+- **NGINX 1.25 LTS** (current LTS branch)
+- **NGINX 1.27 stable** (current stable branch)
+
+Older versions (1.18 EOL'd 2021, 1.20 EOL'd 2022) are explicitly
+out of scope per frozen decision 0.1.
+
+## Deploy contract
+
+Every cert deploy follows the Bundle I `deploy.Apply(ctx, plan)`
+flow:
+
+1. **Idempotency check** — SHA-256 over cert+chain+key bytes; skip
+   if all match destination.
+2. **Pre-deploy backup** — copy existing files to
+   `.certctl-bak.`.
+3. **Atomic write** — temp-file + chown + atomic rename per
+   destination.
+4. **PreCommit (validate)** — runs `nginx -t` per the operator's
+   `validate_command`. Failure aborts; no live cert touched.
+5. **Atomic rename** — temp → final for every File entry.
+6. **PostCommit (reload)** — runs `nginx -s reload` per the
+   operator's `reload_command`.
+7. **Post-deploy TLS verify** — dials the configured endpoint;
+   pulls leaf cert SHA-256; compares against deployed bytes.
+   Mismatch triggers automatic rollback.
+
+## Per-quirk operator guidance
+
+### SSL session cache holds old cert
+
+`TestVendorEdge_NGINX_SSLSessionCacheHoldsOldCert_E2E`
+
+NGINX's `ssl_session_cache` (default `shared:SSL:10m`) keeps TLS
+session IDs valid for `ssl_session_timeout` (default 5min). Clients
+that resume via session ID see the OLD cert until their session
+expires.
+
+**Operator action:** this is documented behavior, not a bug.
+Tune via `ssl_session_timeout 5m;` (default) or shorter if your
+cert rotation cadence demands. Post-deploy verify in certctl will
+return the NEW cert from a fresh handshake (no session resumption);
+warm clients see the OLD cert until session-cache eviction.
+
+### SNI multi-server-name binding
+
+`TestVendorEdge_NGINX_SNIMultiServerName_DeployBindsCorrectVhost_E2E`
+
+When NGINX has multiple `server { server_name a.example b.example; }`
+blocks, the operator deploys with metadata pointing at the
+specific vhost. Connector binds to that vhost only; other vhosts
+remain unchanged.
+
+### IPv6 dual-stack
+
+`TestVendorEdge_NGINX_IPv6DualStackBindsBoth_E2E`
+
+NGINX listening on `0.0.0.0:443` + `[::]:443` serves the new cert
+on both stacks after a single deploy.
+
+**Operator action:** if your post-deploy verify endpoint resolves
+to IPv6 only on some networks but IPv4 only on others, configure
+`PostDeployVerifyAttempts: 5` to cover both paths.
+
+### Reload vs restart
+
+`TestVendorEdge_NGINX_ReloadVsRestart_NoConnectionDrop_E2E`
+
+`nginx -s reload` (graceful) preserves in-flight TLS connections
+via worker handoff. `nginx -s stop && nginx` drops them.
+
+**Operator action:** never use restart for cert rotation. The
+connector's default `reload_command: nginx -s reload` is correct.
+
+### Binary upgrade
+
+`TestVendorEdge_NGINX_UpgradeBinaryHotReload_E2E`
+
+`nginx -s upgrade` rolls out a new binary without dropping
+connections. Not commonly used; documented for ops teams that do
+rolling NGINX binary upgrades.
+
+### Config syntax error → rollback
+
+`TestVendorEdge_NGINX_ConfigSyntaxError_RollbackRestoresPreviousCert_E2E`
+
+If `nginx -t` rejects the staged config, the deploy package's
+PreCommit gate fires before the atomic rename — no live file is
+touched. The cert directory is exactly as it was.
+
+### Missing intermediate
+
+`TestVendorEdge_NGINX_MissingIntermediate_DeployedButValidationCatchesAtPostVerify_E2E`
+
+If the operator deploys a leaf-only cert (no intermediate), NGINX
+will start serving it but downstream clients fail chain validation.
+The connector's post-deploy TLS verify catches this via cert chain
+walk; rollback fires automatically.
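
To make the missing-intermediate failure concrete, the sketch below builds a throwaway root, intermediate, and leaf in memory and walks the chain with `crypto/x509`. It illustrates the failure mode only and is not the connector's verify code; `issue`, `verifyChain`, and `demoChain` are hypothetical helpers.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// issue creates a certificate signed by parent, or self-signed when parent is nil.
func issue(serial int64, cn string, isCA bool, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  isCA,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageDigitalSignature,
	}
	if isCA {
		tmpl.KeyUsage = x509.KeyUsageCertSign
	}
	if parent == nil {
		parent, parentKey = tmpl, key // self-signed root
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert, key
}

// verifyChain walks from leaf to the trusted root using only the
// intermediates supplied, as a client would with the served chain.
func verifyChain(leaf, root *x509.Certificate, intermediates ...*x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(root)
	pool := x509.NewCertPool()
	for _, c := range intermediates {
		pool.AddCert(c)
	}
	_, err := leaf.Verify(x509.VerifyOptions{Roots: roots, Intermediates: pool})
	return err
}

// demoChain returns a leaf, its issuing intermediate, and the root.
func demoChain() (*x509.Certificate, *x509.Certificate, *x509.Certificate) {
	root, rootKey := issue(1, "demo root", true, nil, nil)
	inter, interKey := issue(2, "demo intermediate", true, root, rootKey)
	leaf, _ := issue(3, "demo leaf", false, inter, interKey)
	return leaf, inter, root
}

func main() {
	leaf, inter, root := demoChain()
	fmt.Println("leaf-only:", verifyChain(leaf, root))
	fmt.Println("full chain:", verifyChain(leaf, root, inter))
}
```

The leaf-only call returns an error while the full-chain call returns `nil`, which is the distinction the post-deploy chain walk relies on.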
+
+### Access log privacy
+
+`TestVendorEdge_NGINX_AccessLogPrivacy_NoCertBytesLeakInLogs_E2E`
+
+NGINX's default `access_log` and `error_log` formats do NOT include
+SSL key bytes. The connector does not modify NGINX's logging config.
+
+**Operator action:** if you've customized `log_format` to include
+`$ssl_*` variables, audit the format string for sensitive fields.
+
+### Per-version reload-command compat
+
+`TestVendorEdge_NGINX_NGINX125_vs_127_ReloadCommandCompatible_E2E`
+
+`nginx -s reload` semantics are identical between 1.25 LTS and
+1.27 stable. No per-version branch needed in operator config.
+
+### High-concurrency deploy under load
+
+`TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E`
+
+NGINX's worker handoff during reload is graceful; concurrent TLS
+handshakes during a deploy succeed without 5xx errors.
+
+## Troubleshooting matrix
+
+| Symptom | Test name | Root cause | Operator action |
+|---|---|---|---|
+| Old cert returned 5min after deploy | `SSLSessionCacheHoldsOldCert_E2E` | session cache TTL | tune `ssl_session_timeout` |
+| Wrong vhost serves new cert | `SNIMultiServerName_E2E` | misconfigured server_name selector | verify vhost metadata |
+| Post-verify fails on IPv6 | `IPv6DualStackBindsBoth_E2E` | flaky DNS resolution | `PostDeployVerifyAttempts: 5` |
+| Connection drops on cert change | n/a | using restart instead of reload | use `nginx -s reload` |
+| Deploy aborts with `nginx -t` error | `ConfigSyntaxError_RollbackRestoresPreviousCert_E2E` | bad config (not deploy's fault) | fix config; redeploy |
+| Chain-validation failure post-deploy | `MissingIntermediate_E2E` | leaf-only cert | include full chain in deploy |
+
+## V3-Pro deferrals
+
+- Pin NGINX `ssl_session_ticket_key` rotation interaction with cert
+  rotation (rare; documented but not tested).
+- NGINX Plus `dyn_pem` API integration (commercial; not V2 scope).
+
+## Related docs
+
+- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
+  — the Bundle I primitive every connector consumes.
+- [Vendor compatibility matrix](deployment-vendor-matrix.md)
+- [Connectors reference](connectors.md)
diff --git a/docs/deployment-vendor-matrix.md b/docs/deployment-vendor-matrix.md
new file mode 100644
index 0000000..d4a9afa
--- /dev/null
+++ b/docs/deployment-vendor-matrix.md
@@ -0,0 +1,91 @@
+# Deployment Vendor Compatibility Matrix
+
+> Deploy-hardening II master bundle deliverable. The procurement-team
+> headline doc — SOC 2 / PCI auditors paste this into evidence packs.
+> Per frozen decision 0.14: a (connector × vendor-version) cell is
+> "verified" only when ALL apply: ≥1 happy-path e2e passes against
+> the real sidecar; ≥1 specific-quirk test for that version passes;
+> operator manual smoke completed at least once on a real (non-CI)
+> instance of that vendor version.
+
+## Status legend
+
+- **✓** — verified per the three-criterion bar above
+- **CI** — happy-path + quirk e2e green in CI; operator manual smoke
+  pending (the third criterion)
+- **mock** — verified against the in-tree mock; real-vendor validation
+  is the operator's tier above
+- **pending** — planned; tests written; sidecar not yet wired
+- **n/a** — combination not applicable
+
+Per frozen decision 0.1: only LTS + current-stable versions per
+vendor. EOL versions explicitly excluded.
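
The three-criterion bar in the legend above can be read as a small decision function. The sketch below is one possible encoding, not project code: `cellStatus` is a hypothetical name, the `mock` and `n/a` statuses are left out because they depend on facts other than the three criteria, and mapping every remaining case to `pending` is an assumption.

```go
package main

import "fmt"

// cellStatus derives a matrix status from the three criteria of frozen
// decision 0.14. The "mock" and "n/a" statuses are assigned on other grounds
// and are intentionally not modeled here.
func cellStatus(happyPathGreen, quirkTestGreen, manualSmokeDone bool) string {
	switch {
	case happyPathGreen && quirkTestGreen && manualSmokeDone:
		return "✓"
	case happyPathGreen && quirkTestGreen:
		return "CI"
	default:
		// Assumption: anything short of green automated tests is "pending".
		return "pending"
	}
}

func main() {
	fmt.Println(cellStatus(true, true, true))
	fmt.Println(cellStatus(true, true, false))
	fmt.Println(cellStatus(true, false, false))
}
```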
+
+## Matrix
+
+| Connector | Vendor | Version | Status | Known Issues | Workaround | E2E Test Name(s) |
+|---|---|---|---|---|---|---|
+| **NGINX** | nginx.org | 1.25 LTS | CI | SSL session cache holds old cert ~5min | `ssl_session_timeout 5m;` (default) — operator-tunable | `TestVendorEdge_NGINX_SSLSessionCacheHoldsOldCert_E2E` |
+| NGINX | nginx.org | 1.27 stable | CI | (same) | (same) | (same) |
+| **Apache httpd** | httpd.apache.org | 2.4 LTS | CI | mod_ssl multi-vhost ownership | per-vhost cert config; SSLCertificateFile per `` | `TestVendorEdge_Apache_MultiVhostCertByVhost_E2E` |
+| **HAProxy** | haproxy.org | 2.6 LTS | CI | reload vs restart semantics | use `systemctl reload haproxy` not `restart` | `TestVendorEdge_HAProxy_ReloadPreservesConnectionsViaSocketActivation_E2E` |
+| HAProxy | haproxy.org | 2.8 | CI | (same) | (same) | (same) |
+| HAProxy | haproxy.org | 3.0 | CI | (same) | (same) | (same) |
+| **Traefik** | traefik.io | 2.x | CI | static-config cert paths require restart | use dynamic file-provider config | `TestVendorEdge_Traefik_StaticConfigRequiresRestart_DocumentedAsLimitation_E2E` |
+| Traefik | traefik.io | 3.x | CI | (same) | (same) | (same) |
+| **Caddy** | caddyserver.com | 2.x | CI | admin API auth lockdown breaks default deploy | set `Caddy.AdminAuthorizationHeader` per-target | `TestVendorEdge_Caddy_AdminAPILockedDownWithAuth_DeployUsesConfiguredAuthHeaders_E2E` |
+| **Envoy** | envoyproxy.io | 1.30 | CI | file-mode SDS only in V2; gRPC SDS V3-Pro | use SDS=file (default) | `TestVendorEdge_Envoy_SDSFileMode_DeployRewritesYAML_EnvoyHotReloads_E2E` |
+| Envoy | envoyproxy.io | 1.32 | CI | (same) | (same) | (same) |
+| **Postfix** | postfix.org | 3.6 | CI | per-listener cert binding | configure cert per-listener block | `TestVendorEdge_Postfix_MultiListenerCertBinding_DeployUpdatesCorrectListener_E2E` |
+| Postfix | postfix.org | 3.8 | CI | (same) | (same) | (same) |
+| **Dovecot** | dovecot.org | 2.3 | CI | submission/submissions port variants | configure both inet_listener blocks | `TestVendorEdge_Dovecot_SubmissionSubmissionsPortVariants_E2E` |
+| **IIS** | microsoft.com | IIS 10 (Server 2019) | pending | Windows-host-only CI; app-pool recycle opt-in | `AppPoolRecycle: true` per-target if needed | `TestVendorEdge_IIS_AppPoolRecycle_OptInForCertChange_E2E` |
+| IIS | microsoft.com | IIS 10 (Server 2022) | pending | (same) | (same) | (same) |
+| **F5 BIG-IP** | f5.com | v15.1 LTS | mock | larger cert chain (>4 links) historical issue | use cert chain ≤4 links OR upgrade to v17 | `TestVendorEdge_F5_LargeCertChainHandling_E2E` |
+| F5 BIG-IP | f5.com | v17.0 | mock | (chain limit lifted) | n/a | (same) |
+| F5 BIG-IP | f5.com | v17.5 | mock | (same) | n/a | (same) |
+| **SSH** | openssh.com | OpenSSH 8.x | CI | sftp subsystem may be disabled | connector falls back to scp | `TestVendorEdge_SSH_SFTPSubsystemAbsent_FallsBackToSCP_E2E` |
+| SSH | openssh.com | OpenSSH 9.x | CI | (same) | (same) | (same) |
+| **WinCertStore** | microsoft.com | Windows Server 2019 | pending | cert store ACL: NS vs IIS_IUSRS | configure store ACL per IIS app-pool identity | `TestVendorEdge_WinCertStore_CertStoreACL_NetworkServiceAccess_E2E` |
+| WinCertStore | microsoft.com | Windows Server 2022 | pending | (same) | (same) | (same) |
+| **JavaKeystore** | adoptium.net | JDK 11 LTS | pending | keytool `-importkeystore` semantics | use `KeytoolPath` config to pin to JDK | `TestVendorEdge_JavaKeystore_JDK11_vs_17_vs_21_KeytoolBehavior_E2E` |
+| JavaKeystore | adoptium.net | JDK 17 LTS | pending | (same) | (same) | (same) |
+| JavaKeystore | adoptium.net | JDK 21 LTS | pending | (same) | (same) | (same) |
+| **Kubernetes** | kubernetes.io | 1.28 LTS | CI | kubelet sync ~60s for pod-mounted Secrets | `CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT=60s` (default) | `TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E` |
+| Kubernetes | kubernetes.io | 1.30 | CI | (same) | (same) | (same) |
+| Kubernetes | kubernetes.io | 1.31 current | CI | (same) | (same) | (same) |
+
+## Quarterly re-pin cadence
+
+Every sidecar `FROM` in `deploy/docker-compose.test.yml` carries a
+SHA-256 digest pin per the H-001 CI guard. Operator re-pins
+quarterly:
+
+1. Pull the latest tag of each sidecar image.
+2. Run the per-vendor e2e matrix against the new digest.
+3. If green, update the digest in `docker-compose.test.yml` + this
+   matrix's "Status" column.
+4. If red, file an issue against the connector + leave the digest
+   pinned to the last-known-good.
+
+## How to add a new vendor version
+
+1. Add a new sidecar entry to `deploy/docker-compose.test.yml` with
+   the new image digest.
+2. Add a row to this matrix marking status as "pending".
+3. Write `TestVendorEdge___E2E` test(s) that
+   exercise the vendor's known quirks against the new sidecar.
+4. Once tests pass in CI, mark status "CI".
+5. After operator manual smoke, mark status "✓".
+
+## Per-connector deep-dive docs
+
+For the top 5 most-deployed connectors:
+
+- [NGINX deep-dive](connector-nginx.md)
+- [Kubernetes deep-dive](connector-k8s.md)
+- [IIS deep-dive](connector-iis.md)
+- [Apache deep-dive](connector-apache.md)
+- [F5 deep-dive](connector-f5.md)
+
+Other connector docs live in [docs/connectors.md](connectors.md).