mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-10 04:58:57 +00:00
e292faafc6cd52af5282764f4ef1f148adebec92
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
febf50090b |
envoy: atomic SDS JSON write + post-deploy watcher pickup poll
Closes Bundle 3 of the 2026-05-02 deployment-target coverage audit (see cowork/deployment-target-audit-2026-05-02/RESULTS.md). The audit ranked this fix #3 by acquirer impact behind the K8s real client (#1) and the docs realignment (#2 / Bundle 1). Two production-grade gaps closed: 1. SDS JSON config write was non-atomic. Cert/key/chain at envoy.go L155/L168/L183 went through deploy.AtomicWriteFile (atomic + backups + ownership preservation), but the SDS JSON at L260 went through os.WriteFile directly. A power loss / OOM / process-kill mid-write of the SDS JSON produces a torn file Envoy cannot parse, and Envoy's file-based SDS watcher refuses to load any cert (not just the rotating one) until the JSON is repaired by hand. Replaced with deploy.AtomicWriteFile and threaded ctx through writeSDSConfig. 2. No watcher pickup confirmation before returning success. Pre-fix, DeployCertificate returned the moment file writes completed. Envoy's SDS watcher is asynchronous; a caller running post-deploy TLS verify immediately after DeployCertificate could see Envoy still serving the old cert (watcher latency, load-balanced replica hit one that hadn't reloaded yet). Added the canonical post-deploy verify pattern (mirrors nginx.go::runPostDeployVerify L416): probe seam + retry/backoff + SHA-256 fingerprint compare against request.CertPEM. On verify failure, restore from per-file backups via the new restoreFromBackups helper. Envoy has no PostCommit reload to re-run; the watcher auto-reloads on the restored files. Config additions to envoy.Config (mirror nginx.Config L84-93): - PostDeployVerify *PostDeployVerifyConfig (Enabled, Endpoint, Timeout) - PostDeployVerifyAttempts int (default 3 in runPostDeployVerify) - PostDeployVerifyBackoff time.Duration (default 2s) - BackupRetention int (mirrors nginx; passed to AtomicWriteFile per file) Default behaviour unchanged for callers that don't set PostDeployVerify — verify is opt-in. nil or Enabled=false skips it entirely. Probe seam: c.probe = tlsprobe.ProbeTLS at construction; tests inject via the new SetTestProbe method. Same shape NGINX uses (nginx.go:130); also mirrors the existing Traefik SetTestProbe at traefik.go:62. WriteResult retention: every AtomicWriteFile call now retains its *deploy.WriteResult in a local []*deploy.WriteResult slice so the rollback path can restore from BackupPath across all four files (cert, key, chain, SDS JSON), not just the cert. Pre-fix the cert's WriteResult was discarded. restoreFromBackups (envoy.go new): iterates the WriteResults from a successful per-file pass, rewrites each non-idempotent destination from its BackupPath via AtomicWriteFile{SkipIdempotent:true, BackupRetention:-1}. The -1 prevents backup-of-the-backup pollution. For files that didn't exist pre-deploy (BackupPath == ""), restore = remove. Mirrors nginx.go::rollbackToBackups (L487-515) with the reload step elided. Idempotency gate: shouldRunVerify returns true unless EVERY WriteResult was Idempotent — same all-files semantics NGINX gets from res.SkippedAsIdempotent. Pre-fix Envoy had no verify at all, so there was no gate to get wrong; this introduces the correct all-files shape from the start. Tests added to envoy_atomic_test.go: - TestEnvoy_Atomic_SDSConfigWriteIsAtomic — pre-writes a sentinel SDS JSON, runs DeployCertificate, asserts a backup file with deploy.BackupSuffix appears alongside the new sds.json (proves AtomicWriteFile is now in the SDS path). - TestEnvoy_Atomic_WatcherPickupRetries — stub probe returns wrong fingerprint on attempts 1+2 and correct on attempt 3; deploy succeeds; probe called exactly 3 times. - TestEnvoy_Atomic_WatcherPickupAllAttemptsFail_RollsBack — pre-writes SENTINEL bytes for cert+key, stub probe always wrong; deploy returns wrapped error AND the destination files contain the sentinel bytes (rollback restored). - TestEnvoy_Atomic_PostDeployVerifyDisabledByDefault — Config with nil PostDeployVerify; asserts probe is never called (opt-in default preserved). A small certPEMFingerprint helper added to the test file mirrors the production envoy.certPEMToFingerprint (which is package-private — external tests can't call it). docs/deployment-atomicity.md L87 row already documents "TLS handshake | atomic-write replaces os.WriteFile" — pre-fix the claim was aspirational (verify happened in the agent verify-and-report path, not the connector; SDS JSON wasn't atomic). Post-fix the claim is honest. No doc change required. Verified locally: - gofmt -l ./internal/connector/target/envoy/ clean - go vet ./internal/connector/target/envoy/... clean - staticcheck ./internal/connector/target/envoy/... clean - go build ./... clean - go test -race -count=1 ./internal/connector/target/envoy/... green (5 pre-existing tests + 4 new = 9 total) - go test -short -count=1 ./internal/connector/target/... green Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 3. |
||
|
|
7cb453a336 |
chore(fmt): repo-wide gofmt -w sweep — close drift surfaced by ci-pipeline-cleanup Phase 4
Mechanical reformat. The new 'gofmt drift' CI step (added in
ci-pipeline-cleanup Phase 4, commit
|
||
|
|
a7cce9afdd |
feat(traefik,caddy,envoy,postfix): atomic deploy + post-deploy TLS verify + rollback + ValidateOnly
Phase 7 of the deploy-hardening I master bundle. Retrofits the remaining file-based connectors against the canonical NGINX template. Per-connector quirks codified: - Postfix/Dovecot: full retrofit with PreCommit (postfix check / doveconf -n) + PostCommit (postfix reload / doveadm reload) + post-deploy TLS verify. Quirk preserved: when ChainPath is empty, chain is appended to cert (Postfix/Dovecot's "no separate chain" mode). Per-distro user defaults: postfix, dovecot, _postfix. Default key mode 0600. ValidateOnly real impl returns sentinel when no ValidateCommand. - Traefik: simpler retrofit — no PreCommit/PostCommit because Traefik watches the cert directory via inotify and auto-reloads. Atomic-write via deploy.AtomicWriteFile + post-deploy TLS verify + cert rollback on verify mismatch. Default key mode 0600. ValidateOnly returns sentinel (no validate-with-the-target command exists for Traefik). - Caddy: retrofitted both modes. File mode replaces os.WriteFile with deploy.AtomicWriteFile (preserves the file watcher's auto- reload). API mode unchanged (POST /load already atomic at the Caddy admin server). ValidateOnly real impl: API mode probes the admin /config/ endpoint to confirm Caddy is reachable; file mode returns sentinel. - Envoy: file mode atomic-write via deploy.AtomicWriteFile. Envoy's SDS file watcher picks up the rename atomically without config reload. ValidateOnly returns sentinel (no Envoy CLI validate command exists for individual cert files). Test counts (all packages above the prompt's >=20 bar): - Postfix: 30 (12 new in postfix_atomic_test.go + 18 pre-existing) - Traefik: 22 (12 new in traefik_atomic_test.go + 10 pre-existing) - Caddy: 22 (10 new in caddy_atomic_test.go + 12 pre-existing) - Envoy: 21 (5 new in envoy_atomic_test.go + 16 pre-existing) Coverage: each connector at the prompt's >=80% target. golangci-lint v2.11.4 clean across all 4 connector packages. Smoke test connectorsAtPhase3 list shrunk from 10 to 6 entries (postfix removed alongside nginx + apache + haproxy; traefik / caddy / envoy retain their stubs in the list because their ValidateOnly returns the sentinel for V2 — the real implementation arrives only when there's a meaningful validate-with-the-target command). Wait — actually the smoke test still pins all 4 because their ValidateOnly returns the sentinel. Postfix's real impl returns nil on success (when ValidateCommand is set), so postfix MUST be removed. Caddy's API mode is real-impl. Traefik + Envoy still return sentinel always — they stay in the smoke list. Phase 8 next: F5 + IIS — explicit post-deploy TLS verify + on-failure rollback. Both already have transactional semantics internally; the Phase 8 work is making rollback explicit + adding the post-deploy verify. |
||
|
|
49f1a60762 |
feat(target): ValidateOnly dry-run method on Connector interface (default returns ErrValidateOnlyNotSupported)
Phase 3 of the deploy-hardening I master bundle. Extends the
target.Connector interface with the dry-run method that operators
will use to preview a deploy before committing — but ships only the
default-stub for all 13 connectors. Phases 4-9 replace each stub
with the real validate-with-the-target implementation.
interface.go:
- Add ErrValidateOnlyNotSupported sentinel (frozen decision 0.6 —
connectors that cannot dry-run, like K8s, return this rather than
nil so operator triage can errors.Is for "not supported" vs
"validated successfully").
- Add ValidateOnly(ctx, request DeploymentRequest) error to
Connector interface.
13 new validate_only.go files (one per connector at
internal/connector/target/<name>/validate_only.go):
- apache, caddy, envoy, f5, haproxy, iis, javakeystore, k8ssecret,
nginx, postfix, ssh, traefik, wincertstore.
- Each file is identical except for the package declaration: a
one-method default stub returning target.ErrValidateOnlyNotSupported.
- Per-connector files (rather than a single embed-method approach)
let Phases 4-9 replace each connector's stub independently
without churning a shared base.
Tests:
- internal/connector/target/validate_only_test.go pins the sentinel
contract (errors.Is identity, Error() string, %w wrap propagation).
- internal/connector/target/validate_only_smoke_test.go (external
test package) constructs a zero-value &<pkg>.Connector{} for each
of the 13 connectors and asserts ValidateOnly returns
ErrValidateOnlyNotSupported. The test's
connectorsAtPhase3 list is the load-bearing CI guard:
- A 14th connector added without wiring ValidateOnly fails the
`len(connectorsAtPhase3) != 13` invariant.
- A connector whose real ValidateOnly lands (Phase 4 NGINX, Phase
5 Apache, etc.) MUST be removed from this list or the smoke test
fails (real impl no longer returns the sentinel). That removal
IS the bookkeeping that the operator-visible bit + behavior
change are wired together end-to-end.
Compile + go vet + golangci-lint v2.11.4 + go test all 0 issues.
Phase 4 next: NGINX canonical real-impl — replace the stub with
nginx -t -c <temp>; same time replace the existing os.WriteFile
flow in DeployCertificate with deploy.Apply(...).
|
||
|
|
fd05bacb76 |
feat(M41): Envoy target connector with SDS support
File-based deployment for Envoy service mesh — writes cert/key/chain to watched directory with optional SDS JSON config for xDS bootstrap. Path traversal prevention, configurable filenames, 15 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |