docs: deploy-hardening I — atomic deploy + post-verify operator guide + connectors / README updates

Phase 12 of the deploy-hardening I master bundle.

NEW docs/deployment-atomicity.md (12 sections, ~280 lines):
1. Overview — the three procurement-checklist gaps closed
2. The atomic-write primitive (Plan / File / Apply algorithm)
3. Per-connector atomic contract table (all 13 connectors)
4. Post-deploy TLS verification (handshake + SHA-256 + retries)
5. Rollback semantics (3 triggers + escalation path)
6. ValidateOnly dry-run mode (per-connector matrix)
7. File ownership + mode preservation (precedence + per-distro defaults)
8. Per-target deploy mutex (Phase 2)
9. Idempotency via SHA-256 (defends against retry storms)
10. Troubleshooting matrix (one row per failure mode)
11. V3-Pro deferrals (multi-region, pin manifests, SOC 2 export)
12. Per-connector quick reference (paste-able config snippets)

UPDATE README.md::Deployment Targets — every connector row now
notes the atomic + verify + rollback semantics that landed in
deploy-hardening I. Added a closing paragraph linking to the new
docs/deployment-atomicity.md.

UPDATE docs/features.md — two new env-var rows:
- CERTCTL_DEPLOY_BACKUP_RETENTION (default 3, -1 disables)
- CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT (default 60s)

The G-3 docs-drift CI guard is satisfied: every new
CERTCTL_DEPLOY_* env var documented here also appears in source
(internal/deploy/types.go for BACKUP_RETENTION, k8ssecret config
for KUBELET_SYNC_TIMEOUT).

S-1 stale-counts guard: no literal-number current-state counts in
the new doc — the per-connector tests are referenced via the
file:line pattern (internal/connector/target/<name>/<name>_atomic_test.go)
so the operator can grep for the actual count.

Phase 13 next: pre-commit verification (full matrix + CI guard
reproductions).
This commit is contained in:
shankar0123
2026-04-30 15:30:45 +00:00
parent ad13ef3e4c
commit b95a548f65
3 changed files with 328 additions and 14 deletions
+2
View File
@@ -415,6 +415,8 @@ Self-signed or sub-CA mode using `crypto/x509`.
| `CERTCTL_OCSP_RESPONDER_VALIDITY` | `30d` | How long each newly-issued responder cert is valid for. Short by design: relying parties cache OCSP responses, not the responder cert chain, and `id-pkix-ocsp-nocheck` blocks recursive revocation checking on the responder itself. |
| `CERTCTL_OCSP_RATE_LIMIT_PER_IP_MIN` | `1000` | **Production hardening II Phase 3.** Per-source-IP cap on OCSP requests per minute. Zero disables the limit. Trip returns the canonical OCSP "unauthorized" status (RFC 6960 §2.3) plus `Retry-After: 60`. The limiter does NOT honor `X-Forwarded-For` (OCSP is publicly reachable; spoofed headers would bypass the cap). |
| `CERTCTL_CERT_EXPORT_RATE_LIMIT_PER_ACTOR_HR` | `50` | **Production hardening II Phase 3.** Per-actor cap on cert-export requests (PEM + PKCS#12) per hour. Zero disables. Trip returns HTTP 429 + JSON `{"error":"rate_limit_exceeded","retry_after_seconds":3600}` plus `Retry-After: 3600`. Defends against bulk-export from a compromised admin token. |
| `CERTCTL_DEPLOY_BACKUP_RETENTION` | `3` | **Deploy-hardening I.** How many `<path>.certctl-bak.<unix-nanos>` backup files the connector janitor keeps per deployed file. Setting to `-1` disables backups entirely — rollback becomes impossible (documented foot-gun). Per-target override via the connector config's `backup_retention` field. |
| `CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT` | `60s` | **Deploy-hardening I Phase 9.** How long the K8s connector waits for kubelet sync after Secret update before timing out the post-deploy verify. Tunes for slow clusters (high pod count, slow node DNS). |
Sub-CA mode validates `IsCA=true` and `KeyUsageCertSign` on the loaded certificate. Falls back to self-signed when paths are not set. Supports CRL generation (`GenerateCRL`) and OCSP response signing (`SignOCSPResponse`). All CA-key signing flows through the `signer.Signer` interface (`internal/crypto/signer/`); the OCSP responder cert is signed by the CA via the existing issuance pipeline and OCSP responses are signed by the responder key (NOT the CA key directly) per RFC 6960 §2.6.