Closes Top-10 fix#8 of the 2026-05-02 deployment-target audit
re-run (see cowork/deployment-target-audit-2026-05-02-rerun/
RESULTS.md). Pre-fix, every connector's runPostDeployVerify used
linear backoff (default: 3 attempts with a fixed 2s wait). Linear
backoff misbehaves under load-balanced rollouts: each verify
probe hits a random LB-backed pod, and a 3 × 2s window often hits
the worst case where pods serving the matching fingerprint stop
responding by attempt 3 due to LB session-stickiness cycles.
This commit:
1. New shared helper
internal/tlsprobe/retry.go::VerifyWithExponentialBackoff.
Defaults: 3 attempts, 1s initial wait, 16s cap. The wait doubles
per attempt (1s → 2s → 4s → 8s → 16s) until it reaches the cap.
The probe has signature func(ctx) error so connectors can compose
handshake + fingerprint-compare into one closure.
2. Each connector's runPostDeployVerify (nginx, apache, haproxy,
traefik, envoy, postfix, dovecot) rewired to call the
shared helper. Per-connector signature unchanged.
3. New PostDeployVerifyMaxBackoff time.Duration field added to
each connector's Config. Operators preserving V2 linear
behavior set PostDeployVerifyMaxBackoff equal to
PostDeployVerifyBackoff.
4. Tests:
- tlsprobe/retry_test.go:
TestVerifyWithExponentialBackoff_GrowthAndCap,
TestVerifyWithExponentialBackoff_StopsOnFirstSuccess,
TestVerifyWithExponentialBackoff_CtxCancellation.
- One Test<Connector>_VerifyExponentialBackoff_GrowsBetweenAttempts
per connector (6 total across postfix, nginx, apache, haproxy;
the traefik and envoy connectors use unique test signatures, so
their test wiring is deferred to a future unification).
5. docs/deployment-atomicity.md Section 4 updated:
'linear backoff' → 'exponential backoff (1s → 16s cap)';
YAML example shows the new field.
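A minimal sketch of the retry loop described in item 1, for readers who
want the shape without opening retry.go. The lowercase name, the exact
parameter list, and the choice to sleep only between attempts are
assumptions for illustration; the real helper in internal/tlsprobe may
differ on all three.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// verifyWithExponentialBackoff is a hypothetical stand-in for the shared
// helper. It calls probe up to attempts times, sleeping between failed
// attempts and doubling the wait each time, clamped at maxWait.
func verifyWithExponentialBackoff(ctx context.Context, attempts int, initial, maxWait time.Duration, probe func(context.Context) error) error {
	if attempts < 1 {
		attempts = 1
	}
	wait := initial
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = probe(ctx); lastErr == nil {
			return nil // stop on the first success
		}
		if i == attempts-1 {
			break // assumption: no sleep after the final attempt
		}
		timer := time.NewTimer(wait)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err() // cancellation wins over further retries
		case <-timer.C:
		}
		wait *= 2
		if wait > maxWait {
			wait = maxWait
		}
	}
	return fmt.Errorf("verify failed after %d attempts: %w", attempts, lastErr)
}

// runDemo probes a fake target that starts matching on attempt succeedOn
// and reports how many probe calls were made. Millisecond waits keep the
// demo fast; they are not the production defaults.
func runDemo(succeedOn int) (int, error) {
	calls := 0
	probe := func(ctx context.Context) error {
		calls++
		if calls < succeedOn {
			return errors.New("fingerprint mismatch")
		}
		return nil
	}
	err := verifyWithExponentialBackoff(context.Background(), 3, time.Millisecond, 4*time.Millisecond, probe)
	return calls, err
}

func main() {
	calls, err := runDemo(2)
	fmt.Println(calls, err)
}
```

Passing the probe as a closure is what lets each connector fold its own
handshake and fingerprint comparison into a single call site.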
Backward-compat note: PostDeployVerifyBackoff was interpreted as
the linear interval pre-fix; post-fix it's interpreted as the
initial backoff (which doubles each attempt). Operators using
the default value (2s) see waits of 2s → 4s → 8s instead of
2s → 2s → 2s. For LB-rollout cases this is the intended
behavior; for single-target deploys the worst-case wall-clock is
longer (14s vs 6s of total wait for 3 attempts). To preserve V2
linear semantics, set PostDeployVerifyMaxBackoff equal to
PostDeployVerifyBackoff.
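The wait arithmetic in the note above can be checked with a small
schedule calculator. This is a sketch only: backoffSchedule and
totalSeconds are hypothetical names, and the 16s cap used for the
exponential case is assumed to match the shared helper's default.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the first n waits: start at initial, double
// each step, clamp at maxWait. Hypothetical helper for illustration.
func backoffSchedule(initial, maxWait time.Duration, n int) []time.Duration {
	waits := make([]time.Duration, 0, n)
	wait := initial
	for i := 0; i < n; i++ {
		if wait > maxWait {
			wait = maxWait
		}
		waits = append(waits, wait)
		wait *= 2
	}
	return waits
}

// totalSeconds sums a schedule and returns whole seconds.
func totalSeconds(waits []time.Duration) int {
	var sum time.Duration
	for _, w := range waits {
		sum += w
	}
	return int(sum / time.Second)
}

func main() {
	// Post-fix default: 2s initial, doubling, assumed 16s cap.
	exp := backoffSchedule(2*time.Second, 16*time.Second, 3)
	// V2-style linear: cap equal to the initial backoff.
	lin := backoffSchedule(2*time.Second, 2*time.Second, 3)
	fmt.Println(exp, totalSeconds(exp)) // [2s 4s 8s] 14
	fmt.Println(lin, totalSeconds(lin)) // [2s 2s 2s] 6
}
```

Setting the cap equal to the initial value clamps every doubled wait
back down, which is why that one setting is enough to recover the
linear schedule.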
Verified locally:
- gofmt clean.
- go test -short -count=1 ./internal/tlsprobe/...
./internal/connector/target/{postfix,nginx,apache,haproxy}/... green.
Audit reference: cowork/deployment-target-audit-2026-05-02-rerun/
RESULTS.md Top-10 fix#8.
Phase 13 verification surfaced gofmt-formatting drift in 6 files
across the bundle's new code:
- internal/api/handler/metrics.go (struct field alignment)
- internal/connector/target/k8ssecret/validate_only_test.go (alignment)
- internal/connector/target/nginx/nginx.go (alignment)
- internal/connector/target/postfix/postfix.go (alignment)
- internal/connector/target/ssh/validate_only_test.go (alignment)
- internal/service/deploy_counters.go (alignment)
Pure mechanical gofmt -w fixes; no behavior changes. CI's
make verify gate (which runs `go fmt ./...`) didn't catch these
because go fmt runs gofmt -l -w, which rewrites files in place
and exits 0 instead of failing on drift; golangci-lint v2.11.4 +
the explicit gofmt -l step in Phase 13 verification did.
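For illustration, a check that flags drift instead of rewriting it can
be sketched with the standard go/format package. The function names and
the sample struct are hypothetical; this is not the Phase 13 tooling,
only the same idea of comparing a file against its canonical form.

```go
package main

import (
	"fmt"
	"go/format"
)

// driftedExample has unaligned struct fields, the kind of drift listed
// above. The content is made up for this sketch.
const driftedExample = "package p\n\ntype T struct {\n\tA int\n\tLongName string\n}\n"

// gofmtSource returns src in canonical gofmt style.
func gofmtSource(src string) (string, error) {
	out, err := format.Source([]byte(src))
	return string(out), err
}

// isGofmtClean reports whether src is already canonical. Parse errors
// count as not clean.
func isGofmtClean(src string) bool {
	out, err := gofmtSource(src)
	return err == nil && out == src
}

func main() {
	fixed, err := gofmtSource(driftedExample)
	if err != nil {
		panic(err)
	}
	fmt.Println(isGofmtClean(driftedExample), isGofmtClean(fixed)) // false true
}
```

A gate built this way reports the drifted file and leaves it untouched,
so the failure is visible in CI.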
Phase 13 full-matrix verification all green:
- gofmt -l: empty across all bundle-touched files
- go vet ./internal/deploy/... ./internal/connector/target/... ./internal/service/ ./internal/api/handler/ ./cmd/agent/: clean
- golangci-lint v2.11.4 (the version CI runs): 0 issues
- go test -race -count=1 across deploy + nginx + apache + haproxy + agent + service: all green
- INTEGRATION=1 go test -tags integration -run Deploy ./deploy/test/...: 4/4 e2e tests green
Phase 14 next: release prep — Active Focus update, release notes,
Reddit-beat draft, final tag handoff to operator.
Phase 7 of the deploy-hardening I master bundle. Retrofits the
remaining file-based connectors against the canonical NGINX template.
Per-connector quirks codified:
- Postfix/Dovecot: full retrofit with PreCommit (postfix check /
doveconf -n) + PostCommit (postfix reload / doveadm reload) +
post-deploy TLS verify. Quirk preserved: when ChainPath is empty,
chain is appended to cert (Postfix/Dovecot's "no separate chain"
mode). Per-distro user defaults: postfix, dovecot, _postfix.
Default key mode 0600. The real ValidateOnly implementation returns
the sentinel when no ValidateCommand is configured.
- Traefik: simpler retrofit — no PreCommit/PostCommit because
Traefik watches the cert directory via inotify and auto-reloads.
Atomic-write via deploy.AtomicWriteFile + post-deploy TLS verify
+ cert rollback on verify mismatch. Default key mode 0600.
ValidateOnly returns sentinel (no validate-with-the-target
command exists for Traefik).
- Caddy: retrofitted both modes. File mode replaces os.WriteFile
with deploy.AtomicWriteFile (preserves the file watcher's auto-
reload). API mode unchanged (POST /load already atomic at the
Caddy admin server). ValidateOnly real impl: API mode probes
the admin /config/ endpoint to confirm Caddy is reachable;
file mode returns sentinel.
- Envoy: file mode atomic-write via deploy.AtomicWriteFile.
Envoy's SDS file watcher picks up the rename atomically without
config reload. ValidateOnly returns sentinel (no Envoy CLI
validate command exists for individual cert files).
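The atomic-write step these connectors share can be sketched as a
temp-file-plus-rename. atomicWriteFile below is a hypothetical stand-in:
the actual deploy.AtomicWriteFile signature and its durability
guarantees (for example, whether it also fsyncs the directory) are not
shown in this commit and may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWriteFile writes data to a temp file in the target directory,
// then renames it over path. On POSIX filesystems a watcher sees either
// the old file or the new one, never a partial write.
func atomicWriteFile(path string, data []byte, mode os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // fails harmlessly once the rename succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demoAtomicWrite replaces a key file twice in a scratch directory and
// returns the final content.
func demoAtomicWrite() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "server.key")
	for _, content := range []string{"old-key", "new-key"} {
		if err := atomicWriteFile(path, []byte(content), 0o600); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	fmt.Println(demoAtomicWrite())
}
```

Creating the temp file in the same directory as the target keeps the
rename on one filesystem, which is what makes the swap atomic.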
Test counts (all packages above the prompt's >=20 bar):
- Postfix: 30 (12 new in postfix_atomic_test.go + 18 pre-existing)
- Traefik: 22 (12 new in traefik_atomic_test.go + 10 pre-existing)
- Caddy: 22 (10 new in caddy_atomic_test.go + 12 pre-existing)
- Envoy: 21 (5 new in envoy_atomic_test.go + 16 pre-existing)
Coverage: each connector at the prompt's >=80% target. golangci-lint
v2.11.4 clean across all 4 connector packages.
Smoke test connectorsAtPhase3 list shrunk from 10 to 6 entries.
Postfix is removed alongside nginx + apache + haproxy: its real
ValidateOnly returns nil on success when ValidateCommand is set,
so it must not stay pinned as a stub. Traefik and Envoy always
return the sentinel, so they stay in the list until a meaningful
validate-with-the-target command exists. Caddy also stays for V2:
its API mode has a real implementation, but file mode still
returns the sentinel.
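The sentinel convention the smoke test relies on can be sketched as
follows. Every name here (errValidateOnlyNotSupported, postfixLike,
traefikLike, stillStub) is hypothetical; the sketch only shows how
errors.Is separates a real ValidateOnly from a stub.

```go
package main

import (
	"errors"
	"fmt"
)

// errValidateOnlyNotSupported is a hypothetical sentinel; the real
// identifier in the codebase is not shown in this commit.
var errValidateOnlyNotSupported = errors.New("validate-only not supported")

type connector interface {
	ValidateOnly() error
}

// postfixLike has a real implementation once a validate command is set.
type postfixLike struct{ validateCommand string }

func (c postfixLike) ValidateOnly() error {
	if c.validateCommand == "" {
		return errValidateOnlyNotSupported
	}
	// A real connector would run the command here; the sketch assumes
	// it succeeds.
	return nil
}

// traefikLike has no validate command at all, so it always returns the
// sentinel, wrapped with connector context.
type traefikLike struct{}

func (traefikLike) ValidateOnly() error {
	return fmt.Errorf("traefik: %w", errValidateOnlyNotSupported)
}

// stillStub reports whether a connector belongs in the smoke list.
func stillStub(c connector) bool {
	return errors.Is(c.ValidateOnly(), errValidateOnlyNotSupported)
}

func main() {
	fmt.Println(stillStub(postfixLike{validateCommand: "postfix check"})) // false
	fmt.Println(stillStub(postfixLike{}))                                 // true
	fmt.Println(stillStub(traefikLike{}))                                 // true
}
```

Using errors.Is rather than equality keeps the check working when a
connector wraps the sentinel with extra context.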
Phase 8 next: F5 + IIS — explicit post-deploy TLS verify +
on-failure rollback. Both already have transactional semantics
internally; the Phase 8 work is making rollback explicit + adding
the post-deploy verify.
Dual-mode TLS connector for mail servers — single package with mode
field selecting Postfix or Dovecot defaults. File-based cert/key
deployment with correct permissions (cert 0644, key 0600), optional
chain append, shell injection prevention, and configurable
reload/validate commands. 18 tests covering config validation,
deployment, and security. GUI wizard fields and OpenAPI enum updated.
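The optional chain append can be sketched as a pure planning step. The
deployPlan type and planCertFiles function are hypothetical; the
connector's real code paths, PEM handling, and file-mode application
(cert 0644, key 0600) are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// deployPlan lists what would be written to the cert and chain paths.
// Hypothetical type for this sketch.
type deployPlan struct {
	CertBody  string
	ChainBody string // empty when the chain is folded into CertBody
}

// planCertFiles appends the chain to the cert when no separate chain
// path is configured, and keeps the two apart otherwise.
func planCertFiles(certPEM, chainPEM, chainPath string) deployPlan {
	if chainPEM == "" {
		return deployPlan{CertBody: certPEM}
	}
	if chainPath != "" {
		return deployPlan{CertBody: certPEM, ChainBody: chainPEM}
	}
	// Make sure the leaf ends with a newline so the PEM blocks do not
	// run together.
	if !strings.HasSuffix(certPEM, "\n") {
		certPEM += "\n"
	}
	return deployPlan{CertBody: certPEM + chainPEM}
}

func main() {
	p := planCertFiles("LEAF", "INTERMEDIATE\n", "")
	fmt.Printf("%q %q\n", p.CertBody, p.ChainBody) // "LEAF\nINTERMEDIATE\n" ""
}
```

Keeping the decision separate from the file writes makes the quirk easy
to cover with table-driven tests.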
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>