mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:01:30 +00:00
docs: deployment vendor matrix + per-connector deep-dive docs (NGINX + K8s + IIS + Apache + F5)
Phase 14 of the deploy-hardening II master bundle. The procurement-team
headline doc + per-connector operator guides for the top 5 most-deployed
connectors.

NEW docs/deployment-vendor-matrix.md (~30 rows):
- Per (connector × vendor-version) status: ✓ / CI / mock / pending / n/a
- Known issues + workarounds + e2e test name reference
- LTS + current-stable scope per frozen decision 0.1
- Quarterly re-pin cadence guidance for sidecar digests
- "How to add a new vendor version" recipe

Per frozen decision 0.14: a (connector × vendor-version) cell is
"verified" only when ALL apply: ≥1 happy-path e2e green; ≥1
specific-quirk test green for that version; operator manual smoke
completed at least once. Cells lacking the third criterion show "CI"
status (auto-tests green but pending operator validation).

Status snapshot at bundle close:
- NGINX 1.25 + 1.27: CI
- Apache 2.4: CI
- HAProxy 2.6 + 2.8 + 3.0: CI
- Traefik 2.x + 3.x: CI
- Caddy 2.x: CI
- Envoy 1.30 + 1.32: CI (file-mode SDS only; gRPC SDS V3-Pro)
- Postfix 3.6 + 3.8: CI
- Dovecot 2.3: CI
- IIS 10 (2019, 2022): pending (Windows-host-only CI)
- F5 v15.1 + v17.0 + v17.5: mock (real-F5 vagrant box documented)
- SSH OpenSSH 8.x + 9.x: CI
- WinCertStore (2019, 2022): pending (Windows-host-only)
- JavaKeystore JDK 11 + 17 + 21: pending
- K8s 1.28 + 1.30 + 1.31: CI

NEW per-connector deep-dive docs:
- docs/connector-nginx.md (~150 lines, 10 quirks documented)
- docs/connector-k8s.md (~110 lines, 10 quirks)
- docs/connector-iis.md (~120 lines, 10 quirks; Windows-host-only CI
  constraint loud)
- docs/connector-apache.md (~80 lines, 10 quirks)
- docs/connector-f5.md (~190 lines, 10 quirks; two-tier validation
  recipe for operator-supplied real-F5 vagrant box)

Each doc follows the same structure:
- Overview
- Vendor versions tested
- Per-quirk operator guidance (one section per
  TestVendorEdge_<vendor>_<edge>_E2E)
- Troubleshooting matrix
- V3-Pro deferrals
- Related docs cross-refs

Other connector docs (HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot,
SSH, WinCertStore, JavaKeystore) live in docs/connectors.md + are
referenced from the matrix.

Phase 15 next: per-vendor CI matrix job in .github/workflows/ci.yml.
@@ -0,0 +1,101 @@
# Apache httpd Connector — Operator Deep-Dive

> Per Phase 14 of the deploy-hardening II master bundle.

## Overview

The Apache connector (`internal/connector/target/apache/`) deploys
TLS certs to Apache 2.4 LTS as separate cert/chain/key files,
validates with `apachectl configtest`, and reloads with
`apachectl graceful`. It mirrors the canonical NGINX template
(Bundle I Phase 5).

## Vendor versions tested

- **Apache httpd 2.4 LTS** (the only LTS branch; 2.6 is still a development branch)

## Per-quirk operator guidance

### Multi-vhost cert-by-vhost

`TestVendorEdge_Apache_MultiVhostCertByVhost_DeployIsolated_E2E`

When Apache has multiple `<VirtualHost>` blocks, each with its own
`SSLCertificateFile`, the connector deploys to the matching vhost
only. Other vhosts are left unchanged.

### `apachectl graceful-stop` drains cleanly

`TestVendorEdge_Apache_ApachectlGracefulStop_DrainsCleanly_E2E`

`apachectl graceful` (the connector default) preserves in-flight
TLS connections. `apachectl restart` drops them.

### `mod_ssl` absent

`TestVendorEdge_Apache_ModSSLAbsent_DeployFailsWithActionableError_E2E`

If `mod_ssl` isn't loaded, `apachectl configtest` fails with
"Invalid command 'SSLCertificateFile'". The connector surfaces this
verbatim. Operator action: add `LoadModule ssl_module modules/mod_ssl.so`.

### `.htaccess` interactions

`TestVendorEdge_Apache_HtaccessRequireSSL_NotImpactedByDeploy_E2E`

`.htaccess` rules requiring SSL are not impacted by cert rotation.
The `Require` directive evaluates per-request against the
connection's TLS state, not the cert file.

### Apache 2.4 LTS reload semantics pinned

`TestVendorEdge_Apache_Apache24LTSReloadSemanticsPinned_E2E`

`apachectl graceful` semantics are stable across 2.4.x patch versions.
No per-version branch is needed.

### Syntax error rollback

`TestVendorEdge_Apache_SyntaxErrorRollback_E2E`

An `apachectl configtest` failure aborts the deploy before the atomic
rename. The live cert is untouched.

### Per-vhost key ownership

`TestVendorEdge_Apache_PerVhostKeyOwnership_E2E`

When multiple vhosts share the same key file, ownership is
preserved across rotation. When each vhost has its own key,
per-file ownership is preserved per Bundle I Phase 5.

### Reload preserves connections

`TestVendorEdge_Apache_ReloadVsRestart_PreservesConnections_E2E`

In-flight TLS sessions survive the `apachectl graceful` worker
swap. Documented in `docs/deployment-atomicity.md`.

### SNI server_name binding

`TestVendorEdge_Apache_SNIServerNameDeployBindsCorrect_E2E`

When the deploy specifies `server_name` metadata, the connector targets
the matching `<VirtualHost>` block.

### Cert chain ordering

`TestVendorEdge_Apache_ChainOrderingNormalized_E2E`

Apache requires the leaf cert FIRST in `SSLCertificateFile` (or the
chain in `SSLCertificateChainFile`, a directive that is deprecated as
of 2.4.8). The connector preserves operator-supplied ordering across
rotation.

## V3-Pro deferrals

- Apache 2.6 (when it ships LTS).
- mod_md (Apache's built-in ACME) interop.

## Related docs

- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
@@ -0,0 +1,166 @@
# F5 BIG-IP Connector — Operator Deep-Dive

> Per Phase 14 of the deploy-hardening II master bundle.

## Overview

The F5 connector (`internal/connector/target/f5/`) deploys TLS
certs to F5 BIG-IP load balancers via the iControl REST API.
F5's transactional API gives certctl atomic-update semantics for
free at the API level; the Bundle I rollback wire layers
on-failure cleanup of orphaned crypto objects on top.

## Vendor versions tested

- **F5 v15.1 LTS**
- **F5 v17.0**
- **F5 v17.5**

## Two-tier validation strategy (frozen decision 0.3)

1. **CI tier**: `f5-mock-icontrol` sidecar, an in-tree Go server at
   `deploy/test/f5-mock-icontrol/` implementing the iControl REST
   surface this bundle exercises (auth, file upload, transactions,
   SSL profile CRUD). All `TestVendorEdge_F5_*_E2E` tests run
   against this in CI.
2. **Customer-grade tier**: operator-supplied real F5 vagrant box.
   The setup recipe is documented below. A manual smoke is required for
   "verified" status in `docs/deployment-vendor-matrix.md`.

The mock implements a SUBSET of iControl REST. A real F5 may
diverge on quirks the mock doesn't model. Customer-grade
validation against the vagrant box is the validation tier above
the mock.

## Setting up the operator-supplied real F5

```bash
# F5 Networks publishes BIG-IP VE (Virtual Edition) on:
#   https://downloads.f5.com → BIG-IP VE → 17.5.0 → Vagrant
# Download the .box file (requires F5 account; free tier ok).
vagrant box add f5/big-ip-17.5.0 ~/Downloads/BIGIP-17.5.0.0.0.box
vagrant init f5/big-ip-17.5.0
vagrant up

# Then point certctl at vagrant's mapped management interface:
#   https://localhost:8443 with admin/<vagrant-default-password>
# Per-target Config:
#   Host:     "localhost"
#   Port:     8443
#   Username: "admin"
#   Password: "<from vagrant>"
```

Run the F5 vendor-edge tests against the real F5 by setting:

```bash
# Replace <vagrant-pass> with the real password; the quotes keep the
# shell from treating the angle brackets as redirections.
F5_REAL_HOST=localhost:8443 \
F5_REAL_USER=admin \
F5_REAL_PASS='<vagrant-pass>' \
INTEGRATION=1 go test -tags integration \
  -run 'TestVendorEdge_F5' ./deploy/test/...
```

(Test bodies opt into the real-F5 path when these env vars are
set; otherwise they default to the mock sidecar.)

## Per-quirk operator guidance

### SSL profile reference counting

`TestVendorEdge_F5_SSLProfileReferenceCounting_TransactionWithNVS_AtomicCommit_E2E`

When a transaction binds the new SSL profile to N virtual
servers, F5 commits all N atomically. A failure aborts all N.

### Client SSL vs server SSL profile

`TestVendorEdge_F5_ClientSSLProfileVsServerSSLProfile_DeployUpdatesCorrect_E2E`

F5 has separate `client-ssl` profiles (terminating TLS from clients)
and `server-ssl` profiles (originating TLS to backends). The connector
targets the operator-named profile only.

### Partition handling

`TestVendorEdge_F5_PartitionCommonVsCustom_DeployRespectsPartition_E2E`

F5 partitions namespace objects (Common, custom-tenant). The connector
respects the operator-supplied `Partition`.

### v15 vs v17 API stability

`TestVendorEdge_F5_F5v15_vs_v17_TransactionAPIShapeStable_E2E`

The `mgmt/tm/transaction` API shape is stable across v15.1 LTS and v17.x.
No per-version branch is needed.

### Large cert chain (>4 links)

`TestVendorEdge_F5_LargeCertChainHandling_E2E`

v15.x had a known issue with cert chains >4 links (silent
truncation of the deep links). v17.x lifted this limit.

**Operator action:** if on v15.x, keep chains ≤4 links OR upgrade
to v17.x.
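
The chain-length guidance above can be checked before a deploy reaches a v15.x box. The following is a minimal sketch, not the connector's code: `countChainLinks`, `fakeBundle`, and `maxLinksV15` are hypothetical names, and it assumes the chain arrives as concatenated PEM and that every `CERTIFICATE` block counts as one link.

```go
package main

import (
	"encoding/pem"
	"fmt"
)

// maxLinksV15 mirrors the "keep chains ≤4 links" guidance for v15.x.
const maxLinksV15 = 4

// countChainLinks returns the number of CERTIFICATE blocks in a PEM bundle.
// It counts blocks only; it does not parse or validate the certificates.
func countChainLinks(bundle []byte) int {
	n := 0
	for {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			return n
		}
		if block.Type == "CERTIFICATE" {
			n++
		}
	}
}

// fakeBundle builds a bundle of n placeholder blocks for the demo.
// The bytes are not real certificates.
func fakeBundle(n int) []byte {
	var out []byte
	for i := 0; i < n; i++ {
		out = append(out, pem.EncodeToMemory(&pem.Block{
			Type:  "CERTIFICATE",
			Bytes: []byte{byte(i)},
		})...)
	}
	return out
}

func main() {
	links := countChainLinks(fakeBundle(5))
	if links > maxLinksV15 {
		fmt.Printf("chain has %d links; exceeds the v15.x limit of %d\n", links, maxLinksV15)
	}
}
```

A pre-flight check like this only flags the risk; whether to block the deploy or just warn is a policy choice left to the operator.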

### Auth token expiry

`TestVendorEdge_F5_AuthTokenExpiryRefresh_E2E`

F5 auth tokens expire (default 1200s). Connector re-authenticates
on 401 transparently.

### Transaction timeout cleanup

`TestVendorEdge_F5_TransactionTimeoutCleanup_E2E`

Open transactions time out after 120s. The Bundle I rollback wire
catches orphaned crypto objects (uploaded files not committed via
transaction).

### Same-VS update

`TestVendorEdge_F5_VirtualServerBindingOnSameVS_E2E`

Re-binding an SSL profile on the same Virtual Server is atomic
at the F5 API level. There is no listener disruption.

### SSL options preservation

`TestVendorEdge_F5_SSLOptionsPreservedAcrossRotation_E2E`

Operator-supplied `cipher-list`, `no-tls-v1`, `secure-renegotiate`
options on the SSL profile are preserved across cert rotation.

### iControl REST rate limit

`TestVendorEdge_F5_iControlRESTRateLimit_E2E`

F5 iControl REST defaults to 100 req/s. The connector backs off on
429 with exponential retry.

## Troubleshooting matrix

| Symptom | Test name | Operator action |
|---|---|---|
| Cert deploys but only 4 chain links served | `LargeCertChainHandling_E2E` | upgrade to v17.x or shorten chain |
| Frequent 401 retries | `AuthTokenExpiryRefresh_E2E` | benign; tune token lifetime if needed |
| Orphaned `/Common/cert-<timestamp>` objects | `TransactionTimeoutCleanup_E2E` | run cleanup script; check for hung deploys |
| Wrong partition deployed to | `PartitionCommonVsCustom_E2E` | verify `Partition` in connector config |
| Cipher list reset post-rotate | `SSLOptionsPreservedAcrossRotation_E2E` | bug — file an issue |

## V3-Pro deferrals

- F5 GTM (DNS-load-balancer cert deploys).
- F5 NGINX Plus cert deploy via the F5 API (when F5 ships the
  unified API).
- AS3 declarative deploy (operator-friendly JSON declaration vs
  the imperative iControl REST flow).

## Related docs

- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
- F5 official iControl REST docs: <https://clouddocs.f5.com/api/icontrol-rest/>
@@ -0,0 +1,130 @@
# Microsoft IIS Connector — Operator Deep-Dive

> Per Phase 14 of the deploy-hardening II master bundle.

## Overview

The IIS connector (`internal/connector/target/iis/`) deploys TLS
certs to Windows IIS servers via PowerShell (`Import-PfxCertificate`,
`New-WebBinding`, and SNI binding). A pre-deploy snapshot of the
existing thumbprint allows rollback if the new binding fails.

## Vendor versions tested

- **Windows Server 2019** with IIS 10
- **Windows Server 2022** with IIS 10

## CI runner constraint

Per frozen decision 0.4: Windows containers run only on Windows
hosts. Linux CI runners CAN'T run the IIS sidecar. IIS e2e tests
run on a separate `windows-vendor-e2e` GitHub Actions matrix job
on `windows-latest` runners. The IIS tests carry the build constraint
`//go:build integration && !no_iis`, so operators on Linux-only CI
can skip them by adding the `no_iis` build tag.

## Per-quirk operator guidance

### App-pool recycle (opt-in)

`TestVendorEdge_IIS_AppPoolRecycle_OptInForCertChange_E2E`

By default, IIS picks up new SSL bindings without an app-pool
recycle (the binding-edit path is hot). Some sites need a recycle
to fully reload (e.g., apps that cache cert handles).

**Operator action:** set `AppPoolRecycle: true` per-target. The
connector then runs `Restart-WebAppPool <pool>` after the binding update.

### SNI multi-binding per site

`TestVendorEdge_IIS_SNIMultiBindingPerSite_DeployUpdatesCorrectBinding_E2E`

When a site has multiple SNI bindings (different hostnames on
the same site), the connector targets the binding matching the
operator-supplied hostname. Other bindings are left unchanged.

### CCS (Centralized Certificate Store)

`TestVendorEdge_IIS_CCSCentralizedCertStoreVariant_DeployToSharedStore_E2E`

CCS is the file-based variant where multiple IIS servers share
a UNC path of cert files. The connector writes to the shared path;
all IIS servers pick it up automatically.

### WinRM remote vs local PowerShell

`TestVendorEdge_IIS_WinRMRemotePath_vs_LocalPowerShellPath_BothWork_E2E`

Two code paths produce equivalent cert installs:

- `WinRMHost: ""` → local PowerShell (agent runs on the IIS server)
- `WinRMHost: "iis.example"` → remote PowerShell via WinRM

Both rotate the same way. The WinRM path requires network reachability
to port 5985/5986.

### Server 2019 vs 2022 PowerShell compat

`TestVendorEdge_IIS_WindowsServer2019_vs_2022_PowerShellCompat_E2E`

`Import-PfxCertificate` + `New-WebBinding` semantics are stable
across server versions. Windows PowerShell 5.1 (2019) and PowerShell 7.x
(2022) both work. Note that Server 2022 also ships Windows PowerShell 5.1
in-box; PowerShell 7.x is a separate install.

### Friendly name

`TestVendorEdge_IIS_FriendlyNameUpdatedOnRotation_E2E`

The connector preserves the operator-supplied `FriendlyName` on the cert
across rotation. This is useful for identifying certs in the IIS GUI.

### HTTP/2 + ALPN

`TestVendorEdge_IIS_HTTP2ALPNPreserved_E2E`

IIS h2 negotiation is preserved across cert rotation. The
`netsh http show sslcert` ALPN attribute survives the binding swap.

### Binding-type validation

`TestVendorEdge_IIS_BindingTypeHttpsValidated_E2E`

The connector refuses to deploy to non-`https` bindings (e.g., `http`,
`net.tcp`) and surfaces an actionable error.
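
The binding-type guard amounts to a protocol check that runs before any certificate is touched. A minimal sketch, assuming the binding protocol is available as a string; `validateBindingProtocol` and the error wording are hypothetical, not the connector's actual API or message:

```go
package main

import (
	"fmt"
	"strings"
)

// validateBindingProtocol rejects any IIS binding protocol other than https,
// returning an error that names the offending protocol.
func validateBindingProtocol(protocol string) error {
	if strings.EqualFold(strings.TrimSpace(protocol), "https") {
		return nil
	}
	return fmt.Errorf("binding protocol %q cannot carry a TLS certificate; target an https binding", protocol)
}

func main() {
	for _, p := range []string{"https", "http", "net.tcp"} {
		fmt.Printf("%s -> %v\n", p, validateBindingProtocol(p))
	}
}
```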

### ARR reverse-proxy

`TestVendorEdge_IIS_ARRReverseProxyCertRotation_E2E`

Sites using Application Request Routing as reverse proxy: cert
rotation does not invalidate ARR routes. The cert-binding edit
is independent of the ARR config.

### Atomic SNI binding swap

`TestVendorEdge_IIS_RemovePreviousBindingOnRotate_E2E`

The connector removes the previous SNI binding BEFORE inserting the
new one (atomicity at the IIS API level). This prevents a brief
window in which two bindings serve different certs for the same
hostname.

## Troubleshooting matrix

| Symptom | Test name | Operator action |
|---|---|---|
| Cert installed but app pool serving old cert | `AppPoolRecycle_OptInForCertChange_E2E` | set `AppPoolRecycle: true` |
| Wrong SNI binding updated | `SNIMultiBindingPerSite_E2E` | verify hostname selector |
| Permission denied on cert install | n/a | agent must run as administrator |
| WinRM connection failed | `WinRMRemotePath_vs_LocalPowerShellPath_E2E` | check WinRM port 5985/5986 reachability |
| h2 negotiation broken post-rotate | `HTTP2ALPNPreserved_E2E` | re-run `netsh http add sslcert` with `appid + clientcertnegotiation=enable` |

## V3-Pro deferrals

- IIS Application Initialization module integration (warm cert
  cache after rotation).
- Azure Key Vault + IIS integration (operator opt-in).

## Related docs

- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
@@ -0,0 +1,117 @@
# Kubernetes Secrets Connector — Operator Deep-Dive

> Per Phase 14 of the deploy-hardening II master bundle.

## Overview

The K8s connector (`internal/connector/target/k8ssecret/`) deploys
TLS certs into `kubernetes.io/tls` Secrets. The deploy is atomic at the
API server level (Update is transactional); the post-deploy verify
SHA-256-compares the returned Secret data against the deployed bytes
(this defends against admission webhooks that modify cert data).

## Vendor versions tested

- **Kubernetes 1.28 LTS**
- **Kubernetes 1.30**
- **Kubernetes 1.31** (current stable)

## Per-quirk operator guidance

### Kubelet sync wait contract

`TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E`

After a Secret update, kubelet projects the new cert bytes into
pod-mounted volumes. The default sync interval is ~60s. The connector
waits up to `CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT` (default
60s).

**Operator action:** for slow clusters (large pod count, slow
node DNS), tune the env var upward. For fast clusters, the
default is fine.

### Admission webhook mutation

`TestVendorEdge_K8s_AdmissionWebhookModifiesSecretData_DeployDetectsViaSHA256Compare_E2E`

Some admission webhooks (Vault Agent Injector, OPA Gatekeeper)
mutate Secret data on Update. The connector pulls the Secret
back after Update and SHA-256-compares against deployed bytes.
Mismatch surfaces as deploy failure.
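
The read-back comparison can be sketched without a cluster by treating the Secret's `data` field as a plain map. This is an illustration, not the connector's code: `mutatedKeys` is a hypothetical helper, and the byte values are placeholders rather than real PEM.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// mutatedKeys returns, in sorted order, the keys whose returned bytes differ
// by SHA-256 from the deployed bytes. A key missing from the returned Secret
// also counts as mutated.
func mutatedKeys(deployed, returned map[string][]byte) []string {
	var diff []string
	for key, want := range deployed {
		got, ok := returned[key]
		if !ok || sha256.Sum256(got) != sha256.Sum256(want) {
			diff = append(diff, key)
		}
	}
	sort.Strings(diff)
	return diff
}

func main() {
	deployed := map[string][]byte{
		"tls.crt": []byte("leaf+chain PEM"),
		"tls.key": []byte("private key PEM"),
	}
	// Simulate a webhook that appended bytes to tls.crt during Update.
	returned := map[string][]byte{
		"tls.crt": []byte("leaf+chain PEM\n# injected by webhook"),
		"tls.key": []byte("private key PEM"),
	}
	if diff := mutatedKeys(deployed, returned); len(diff) > 0 {
		fmt.Println("deploy failed, mutated keys:", diff)
	}
}
```

Reporting which keys changed, rather than a single pass/fail, makes it easier to tell a webhook that rewrites the cert from one that touches the key.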

### Multi-version API stability

`TestVendorEdge_K8s_K8s128LTS_vs_130_vs_131_SecretAPIContractStable_E2E`

The `kubernetes.io/tls` Secret schema (data.tls.crt + data.tls.key)
is stable across 1.28-1.31. No per-version branch is needed.

### Typed vs Opaque Secret

`TestVendorEdge_K8s_TypedKubernetesIOTLSVsUntypedOpaque_DeployRespectsType_E2E`

The connector preserves the operator-supplied Secret type. Typed
`kubernetes.io/tls` is the canonical form; untyped `Opaque` is
preserved for operators with legacy automation that expects it.

### Cert-manager interop

`TestVendorEdge_K8s_CertManagerInterop_RawSecretVsCertificateCRD_E2E`

The connector targets raw Secrets, NOT cert-manager `Certificate` CRs.
Operators using cert-manager should NOT also point certctl at the
same Secret name (cert-manager will overwrite it). Documented
coexistence: certctl handles non-cert-manager Secrets;
cert-manager handles its own.

### Multi-namespace

`TestVendorEdge_K8s_MultiNamespaceDeploy_DeployUpdatesCorrectNamespace_E2E`

The connector targets the configured `Namespace` only. Cross-namespace
deploys require multiple connector entries.

### RBAC errors

`TestVendorEdge_K8s_RBACInsufficientPermissions_DeployFailsWithActionableError_E2E`

The connector surfaces the K8s API's `forbidden: secrets is restricted`
error verbatim. Operator action: bind a Role granting the `get`,
`update`, and `create` verbs on `secrets` to the agent's ServiceAccount.

### Labels + annotations preservation

`TestVendorEdge_K8s_LabelsAnnotationsPreserved_E2E`

The connector merges (not replaces) operator-supplied metadata. Custom
labels/annotations on the Secret survive cert rotation.
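
Merge-not-replace semantics for labels or annotations reduce to overlaying one map on a copy of another. A minimal sketch under assumptions: `mergeMetadata` is a hypothetical helper, and the `certctl.example/managed` key is invented for the example rather than taken from the connector.

```go
package main

import "fmt"

// mergeMetadata copies the existing labels or annotations and overlays the
// connector-managed keys, so operator-supplied entries survive rotation.
// Neither input map is modified.
func mergeMetadata(existing, managed map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+len(managed))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range managed {
		merged[k] = v
	}
	return merged
}

func main() {
	existing := map[string]string{"team": "payments"}
	managed := map[string]string{"certctl.example/managed": "true"} // hypothetical key
	fmt.Println(mergeMetadata(existing, managed))
}
```

In this sketch a managed key wins when both maps define it; whether the real connector resolves such conflicts the same way is not stated in the doc.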

### Pod-mounted Secret rollover

`TestVendorEdge_K8s_PodMountedSecretRollover_E2E`

When a pod mounts the Secret as a volume, kubelet projects new
cert bytes into the pod's filesystem after sync. Pods watching
the file (via inotify or polling) pick up the new cert without
restart.

### Immutable Secret flag

`TestVendorEdge_K8s_ImmutableSecretFlag_E2E`

K8s Secrets can be marked `immutable: true` for performance.
Update fails with actionable error; operator must drop the flag,
update, then re-apply if desired.

## V3-Pro deferrals

- cert-manager `Certificate` CR interop as first-class deploy
  target (V3-Pro: certctl as cert-manager external issuer).
- Multi-cluster federation (deploy a single cert across N
  clusters with single connector entry).

## Related docs

- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
@@ -0,0 +1,159 @@
# NGINX Connector — Operator Deep-Dive

> Per Phase 14 of the deploy-hardening II master bundle.
> Operator-grade documentation for the NGINX target connector.

## Overview

The NGINX connector (`internal/connector/target/nginx/`) is the
canonical implementation of the deploy-hardening I atomic, verify,
and rollback contract (Bundle I Phase 4). Every other file-based
connector models on this one.

## Vendor versions tested

- **NGINX 1.25 LTS** (current LTS branch)
- **NGINX 1.27 stable** (current stable branch)

Older versions (1.18 EOL'd 2021, 1.20 EOL'd 2022) are explicitly
out of scope per frozen decision 0.1.

## Deploy contract

Every cert deploy follows the Bundle I `deploy.Apply(ctx, plan)`
flow:

1. **Idempotency check** — SHA-256 over cert+chain+key bytes; skip
   if all match destination.
2. **Pre-deploy backup** — copy existing files to
   `<path>.certctl-bak.<unix-nanos>`.
3. **Atomic write** — temp-file + chown per destination; the
   rename itself is deferred to step 5, after validation.
4. **PreCommit (validate)** — runs `nginx -t` per the operator's
   `validate_command`. Failure aborts; no live cert touched.
5. **Atomic rename** — temp → final for every File entry.
6. **PostCommit (reload)** — runs `nginx -s reload` per the
   operator's `reload_command`.
7. **Post-deploy TLS verify** — dials the configured endpoint;
   pulls leaf cert SHA-256; compares against deployed bytes.
   Mismatch triggers automatic rollback.

## Per-quirk operator guidance

### SSL session cache holds old cert

`TestVendorEdge_NGINX_SSLSessionCacheHoldsOldCert_E2E`

NGINX's `ssl_session_cache` (commonly set to `shared:SSL:10m`; the
upstream default is `none`) keeps TLS session IDs valid for
`ssl_session_timeout` (default 5min). Clients that resume via session
ID see the OLD cert until their session expires.

**Operator action:** this is documented behavior, not a bug.
Tune via `ssl_session_timeout 5m;` (default) or shorter if your
cert rotation cadence demands. Post-deploy verify in certctl will
return the NEW cert from a fresh handshake (no session resumption);
warm clients see the OLD cert until session-cache eviction.

### SNI multi-server-name binding

`TestVendorEdge_NGINX_SNIMultiServerName_DeployBindsCorrectVhost_E2E`

When NGINX has multiple `server { server_name a.example b.example; }`
blocks, the operator deploys with metadata pointing at the
specific vhost. The connector binds to that vhost only; other vhosts
remain unchanged.

### IPv6 dual-stack

`TestVendorEdge_NGINX_IPv6DualStackBindsBoth_E2E`

NGINX listening on `0.0.0.0:443` + `[::]:443` serves the new cert
on both stacks after a single deploy.

**Operator action:** if your post-deploy verify endpoint resolves
to IPv6 only on some networks but IPv4 only on others, configure
`PostDeployVerifyAttempts: 5` to cover both paths.

### Reload vs restart

`TestVendorEdge_NGINX_ReloadVsRestart_NoConnectionDrop_E2E`

`nginx -s reload` (graceful) preserves in-flight TLS connections
via worker handoff. `nginx -s stop && nginx` drops them.

**Operator action:** never use restart for cert rotation. The
connector's default `reload_command: nginx -s reload` is correct.

### Binary upgrade

`TestVendorEdge_NGINX_UpgradeBinaryHotReload_E2E`

A binary upgrade (sending `USR2` to the master process, then `QUIT`
to the old master) rolls out a new binary without dropping
connections. Note that `nginx -s` accepts only `stop`, `quit`,
`reopen`, and `reload`, so there is no `nginx -s upgrade` form.
Not commonly used; documented for ops teams that do rolling NGINX
binary upgrades.

### Config syntax error → rollback

`TestVendorEdge_NGINX_ConfigSyntaxError_RollbackRestoresPreviousCert_E2E`

If `nginx -t` rejects the staged config, the deploy package's
PreCommit gate fires before the atomic rename, so no live file is
touched. The cert directory is exactly as it was.

### Missing intermediate

`TestVendorEdge_NGINX_MissingIntermediate_DeployedButValidationCatchesAtPostVerify_E2E`

If the operator deploys a leaf-only cert (no intermediate), NGINX
will start serving it but downstream clients fail chain validation.
The connector's post-deploy TLS verify catches this via cert chain
walk; rollback fires automatically.

### Access log privacy

`TestVendorEdge_NGINX_AccessLogPrivacy_NoCertBytesLeakInLogs_E2E`

NGINX's default `access_log` and `error_log` formats do NOT include
SSL key bytes. The connector does not modify NGINX's logging config.

**Operator action:** if you've customized `log_format` to include
`$ssl_*` variables, audit the format string for sensitive fields.

### Per-version reload-command compat

`TestVendorEdge_NGINX_NGINX125_vs_127_ReloadCommandCompatible_E2E`

`nginx -s reload` semantics are identical between 1.25 LTS and
1.27 stable. No per-version branch is needed in operator config.

### High-concurrency deploy under load

`TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E`

NGINX's worker handoff during reload is graceful; concurrent TLS
handshakes during a deploy succeed without 5xx errors.

## Troubleshooting matrix

| Symptom | Test name | Root cause | Operator action |
|---|---|---|---|
| Old cert returned 5min after deploy | `SSLSessionCacheHoldsOldCert_E2E` | session cache TTL | tune `ssl_session_timeout` |
| Wrong vhost serves new cert | `SNIMultiServerName_E2E` | misconfigured server_name selector | verify vhost metadata |
| Post-verify fails on IPv6 | `IPv6DualStackBindsBoth_E2E` | flaky DNS resolution | `PostDeployVerifyAttempts: 5` |
| Connection drops on cert change | n/a | using restart instead of reload | use `nginx -s reload` |
| Deploy aborts with `nginx -t` error | `ConfigSyntaxError_RollbackRestoresPreviousCert_E2E` | bad config (not deploy's fault) | fix config; redeploy |
| Chain-validation failure post-deploy | `MissingIntermediate_E2E` | leaf-only cert | include full chain in deploy |

## V3-Pro deferrals

- Pin NGINX `ssl_session_ticket_key` rotation interaction with cert
  rotation (rare; documented but not tested).
- NGINX Plus `dyn_pem` API integration (commercial; not V2 scope).

## Related docs

- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
  — the Bundle I primitive every connector consumes.
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
- [Connectors reference](connectors.md)
@@ -0,0 +1,91 @@
# Deployment Vendor Compatibility Matrix

> Deploy-hardening II master bundle deliverable. The procurement-team
> headline doc — SOC 2 / PCI auditors paste this into evidence packs.
> Per frozen decision 0.14: a (connector × vendor-version) cell is
> "verified" only when ALL apply: ≥1 happy-path e2e passes against
> the real sidecar; ≥1 specific-quirk test for that version passes;
> operator manual smoke completed at least once on a real (non-CI)
> instance of that vendor version.

## Status legend

- **✓** — verified per the three-criterion bar above
- **CI** — happy-path + quirk e2e green in CI; operator manual smoke
  pending (the third criterion)
- **mock** — verified against the in-tree mock; real-vendor validation
  is the operator's tier above
- **pending** — planned; tests written; sidecar not yet wired
- **n/a** — combination not applicable
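
The legend can be read as a small decision function over the evidence collected for a cell. The sketch below is one interpretation and not tooling from the repository: the `evidence` type, its field names, and `cellStatus` are hypothetical, and the `n/a` case is left out because it is a property of the combination rather than of test results.

```go
package main

import "fmt"

// evidence records which of the "verified" criteria a matrix cell has met.
type evidence struct {
	HappyPathE2E bool // ≥1 happy-path e2e green
	QuirkE2E     bool // ≥1 version-specific quirk test green
	ManualSmoke  bool // operator manual smoke completed on a real instance
	MockOnly     bool // automated tests ran against the in-tree mock only
}

// cellStatus maps evidence to a legend value. Mock-only cells stay "mock"
// because the first criterion requires the real sidecar.
func cellStatus(e evidence) string {
	switch {
	case !e.HappyPathE2E || !e.QuirkE2E:
		return "pending"
	case e.MockOnly:
		return "mock"
	case e.ManualSmoke:
		return "✓"
	default:
		return "CI"
	}
}

func main() {
	fmt.Println(cellStatus(evidence{HappyPathE2E: true, QuirkE2E: true}))
	fmt.Println(cellStatus(evidence{HappyPathE2E: true, QuirkE2E: true, MockOnly: true}))
}
```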
|
||||
|
||||
Per frozen decision 0.1: only LTS + current-stable versions per
|
||||
vendor. EOL versions explicitly excluded.
|
||||
|
||||
## Matrix

| Connector | Vendor | Version | Status | Known Issues | Workaround | E2E Test Name(s) |
|---|---|---|---|---|---|---|
| **NGINX** | nginx.org | 1.25 LTS | CI | SSL session cache holds old cert ~5min | `ssl_session_timeout 5m;` (default) — operator-tunable | `TestVendorEdge_NGINX_SSLSessionCacheHoldsOldCert_E2E` |
| NGINX | nginx.org | 1.27 stable | CI | (same) | (same) | (same) |
| **Apache httpd** | httpd.apache.org | 2.4 LTS | CI | mod_ssl multi-vhost ownership | per-vhost cert config; SSLCertificateFile per `<VirtualHost>` | `TestVendorEdge_Apache_MultiVhostCertByVhost_E2E` |
| **HAProxy** | haproxy.org | 2.6 LTS | CI | reload vs restart semantics | use `systemctl reload haproxy` not `restart` | `TestVendorEdge_HAProxy_ReloadPreservesConnectionsViaSocketActivation_E2E` |
| HAProxy | haproxy.org | 2.8 | CI | (same) | (same) | (same) |
| HAProxy | haproxy.org | 3.0 | CI | (same) | (same) | (same) |
| **Traefik** | traefik.io | 2.x | CI | static-config cert paths require restart | use dynamic file-provider config | `TestVendorEdge_Traefik_StaticConfigRequiresRestart_DocumentedAsLimitation_E2E` |
| Traefik | traefik.io | 3.x | CI | (same) | (same) | (same) |
| **Caddy** | caddyserver.com | 2.x | CI | admin API auth lockdown breaks default deploy | set `Caddy.AdminAuthorizationHeader` per-target | `TestVendorEdge_Caddy_AdminAPILockedDownWithAuth_DeployUsesConfiguredAuthHeaders_E2E` |
| **Envoy** | envoyproxy.io | 1.30 | CI | file-mode SDS only in V2; gRPC SDS V3-Pro | use SDS=file (default) | `TestVendorEdge_Envoy_SDSFileMode_DeployRewritesYAML_EnvoyHotReloads_E2E` |
| Envoy | envoyproxy.io | 1.32 | CI | (same) | (same) | (same) |
| **Postfix** | postfix.org | 3.6 | CI | per-listener cert binding | configure cert per-listener block | `TestVendorEdge_Postfix_MultiListenerCertBinding_DeployUpdatesCorrectListener_E2E` |
| Postfix | postfix.org | 3.8 | CI | (same) | (same) | (same) |
| **Dovecot** | dovecot.org | 2.3 | CI | submission/submissions port variants | configure both inet_listener blocks | `TestVendorEdge_Dovecot_SubmissionSubmissionsPortVariants_E2E` |
| **IIS** | microsoft.com | IIS 10 (Server 2019) | pending | Windows-host-only CI; app-pool recycle opt-in | `AppPoolRecycle: true` per-target if needed | `TestVendorEdge_IIS_AppPoolRecycle_OptInForCertChange_E2E` |
| IIS | microsoft.com | IIS 10 (Server 2022) | pending | (same) | (same) | (same) |
| **F5 BIG-IP** | f5.com | v15.1 LTS | mock | larger cert chain (>4 links) historical issue | use cert chain ≤4 links OR upgrade to v17 | `TestVendorEdge_F5_LargeCertChainHandling_E2E` |
| F5 BIG-IP | f5.com | v17.0 | mock | (chain limit lifted) | n/a | (same) |
| F5 BIG-IP | f5.com | v17.5 | mock | (same) | n/a | (same) |
| **SSH** | openssh.com | OpenSSH 8.x | CI | sftp subsystem may be disabled | connector falls back to scp | `TestVendorEdge_SSH_SFTPSubsystemAbsent_FallsBackToSCP_E2E` |
| SSH | openssh.com | OpenSSH 9.x | CI | (same) | (same) | (same) |
| **WinCertStore** | microsoft.com | Windows Server 2019 | pending | cert store ACL: NS vs IIS_IUSRS | configure store ACL per IIS app-pool identity | `TestVendorEdge_WinCertStore_CertStoreACL_NetworkServiceAccess_E2E` |
| WinCertStore | microsoft.com | Windows Server 2022 | pending | (same) | (same) | (same) |
| **JavaKeystore** | adoptium.net | JDK 11 LTS | pending | keytool `-importkeystore` semantics | use `KeytoolPath` config to pin to JDK | `TestVendorEdge_JavaKeystore_JDK11_vs_17_vs_21_KeytoolBehavior_E2E` |
| JavaKeystore | adoptium.net | JDK 17 LTS | pending | (same) | (same) | (same) |
| JavaKeystore | adoptium.net | JDK 21 LTS | pending | (same) | (same) | (same) |
| **Kubernetes** | kubernetes.io | 1.28 LTS | CI | kubelet sync ~60s for pod-mounted Secrets | `CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT=60s` (default) | `TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E` |
| Kubernetes | kubernetes.io | 1.30 | CI | (same) | (same) | (same) |
| Kubernetes | kubernetes.io | 1.31 current | CI | (same) | (same) | (same) |

## Quarterly re-pin cadence

Every sidecar `FROM` in `deploy/docker-compose.test.yml` carries a
SHA-256 digest pin per the H-001 CI guard. Operators re-pin
quarterly:

1. Pull the latest tag of each sidecar image.
2. Run the per-vendor e2e matrix against the new digest.
3. If green, update the digest in `docker-compose.test.yml` and this
   matrix's "Status" column.
4. If red, file an issue against the connector and leave the digest
   pinned to the last-known-good value.

## How to add a new vendor version

1. Add a new sidecar entry to `deploy/docker-compose.test.yml` with
   the new image digest.
2. Add a row to this matrix marking status as "pending".
3. Write `TestVendorEdge_<connector>_<edge>_E2E` test(s) that
   exercise the vendor's known quirks against the new sidecar.
4. Once tests pass in CI, mark status "CI".
5. After operator manual smoke, mark status "✓".

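Step 3 depends on the `TestVendorEdge_<connector>_<edge>_E2E` naming pattern, since the matrix references tests by name. As a rough sketch, a name can be checked and split with a regular expression. The helper `ParseEdgeTestName` is hypothetical, and the pattern assumes the connector segment contains no underscores while the edge segment may, which holds for the names listed in the matrix.

```go
package main

import (
	"fmt"
	"regexp"
)

// edgeTestName captures the connector and edge segments of a test name.
// Assumption: the connector has no underscores; the edge may contain them.
var edgeTestName = regexp.MustCompile(`^TestVendorEdge_([A-Za-z0-9]+)_([A-Za-z0-9_]+)_E2E$`)

// ParseEdgeTestName splits a test name into connector and edge.
// ok is false when the name does not follow the convention.
func ParseEdgeTestName(name string) (connector, edge string, ok bool) {
	m := edgeTestName.FindStringSubmatch(name)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	c, e, ok := ParseEdgeTestName("TestVendorEdge_SSH_SFTPSubsystemAbsent_FallsBackToSCP_E2E")
	fmt.Println(c, e, ok) // SSH SFTPSubsystemAbsent_FallsBackToSCP true

	_, _, ok = ParseEdgeTestName("TestSSHFallback")
	fmt.Println(ok) // false
}
```

A check like this could run over the test names in a new row before the row is promoted from "pending" to "CI".
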
## Per-connector deep-dive docs

For the top 5 most-deployed connectors:

- [NGINX deep-dive](connector-nginx.md)
- [Kubernetes deep-dive](connector-k8s.md)
- [IIS deep-dive](connector-iis.md)
- [Apache deep-dive](connector-apache.md)
- [F5 deep-dive](connector-f5.md)

Other connector docs live in [docs/connectors.md](connectors.md).