Closes Top-10 fix#4 of the 2026-05-02 deployment-target audit
re-run (see cowork/deployment-target-audit-2026-05-02-rerun/
RESULTS.md). Pre-fix, both IIS and WinCertStore's realExecutor
invoked PowerShell via exec.CommandContext(ctx, ...) and relied
entirely on the caller's ctx to provide a deadline. If the caller
forgot to attach one (context.Background() in a deeply-nested
path; an operator running an ad-hoc deploy via a CLI that doesn't
default-deadline its ctx), a hung WinRM session blocked the
deploy worker thread indefinitely.
S2 (failure isolation) bar from the audit: "does a hung WinRM
take down the deploy worker pool?" — today's answer was
"potentially yes" for these two connectors. Post-fix the answer
is "no, capped at the configured ExecDeadline (default 60s)".
This commit:
1. Adds Config.ExecDeadline (time.Duration, json: "exec_deadline")
to both connectors, defaulted to 60 seconds. WinCertStore
defaults via the existing applyDefaults helper; IIS defaults
inline at New() and inside ValidateConfig (the IIS connector
has no shared applyDefaults helper today; out-of-scope to
refactor one in for this minor fix). Operators on slow
Windows links can override via the JSON config field
exec_deadline.
2. Wraps realExecutor.Execute with a fallback context.WithTimeout
that fires ONLY when ctx has no deadline of its own. Caller-
supplied deadlines always win — the wrapper is a safety net,
not a hard cap. defer cancel() guards against goroutine leaks.
3. Tests:
- TestIIS_RealExecutor_AttachesDefaultDeadlineWhenCallerHasNone
(passes context.Background; asserts the call returns within
500ms with an error). On Linux/macOS runners powershell.exe
is missing and exec.Cmd fails fast; on Windows the wrapper's
ctx deadline cancels the running PowerShell process. Either
path returns well under 500ms.
- TestIIS_RealExecutor_RespectsCallerDeadlineWhenSet (10s
fallback executor deadline, 50ms caller ctx; asserts caller
deadline wins).
- TestIIS_RealExecutor_NoDeadlineWiredWhenZero (deadline=0
means no fallback wrapper; caller's tight ctx still bounds).
- TestIIS_New_DefaultsExecDeadlineTo60s + TestIIS_New_RespectsExplicitExecDeadline
pin the constructor's defaulting behavior (uses winrm mode
so the test doesn't need powershell.exe in PATH).
- Same five tests in wincertstore_test.go.
4. docs/connectors.md IIS + WinCertStore sections document the
new exec_deadline field with: what it is (per-PowerShell-
subprocess cap), default (60 seconds), override semantics
(caller ctx deadline wins).
No change to behavior when the caller already attaches a deadline
(the common case in production code paths). Tests using the mock
executor (mockExecutor in iis_test.go / wincertstore_test.go)
are unaffected — they bypass realExecutor entirely.
S2 cross-cutting scorecard rating in
cowork/deployment-target-audit-2026-05-02-rerun/findings.json
flips from "gap" to "pass" for IIS and WinCertStore (in any
future re-audit).
Verified locally:
- gofmt / go vet / staticcheck clean across both packages.
- go test -race -count=1 ./internal/connector/target/iis/...
./internal/connector/target/wincertstore/... green.
Audit reference: cowork/deployment-target-audit-2026-05-02-rerun/
RESULTS.md Top-10 fix#4.
Closes Top-10 fix#3 of the 2026-05-02 deployment-target audit
re-run (see cowork/deployment-target-audit-2026-05-02-rerun/
RESULTS.md). Pre-fix, the three PowerShell-driven connectors
(IIS / WinCertStore / JavaKeystore) bypass internal/deploy.Apply
because they write to the Windows cert store / Java keystore via
PowerShell + keytool rather than the local filesystem. They don't
get deploy.Apply's SHA-256 idempotency short-circuit for free, so
every renewal triggers a full Remove+Import cycle even on byte-
identical material. Operators with 60-day rotation see unnecessary
cert-store / keystore churn, briefly bumping CPU and possibly
disrupting connections in flight.
This commit adds a per-connector idempotency probe modeled on
Bundle 9's Caddy api-mode SHA-256 short-circuit (commit 8cda860).
Each probe runs at the top of DeployCertificate, BEFORE the
destructive step, with a unique # CERTCTL_IDEM_PROBE PowerShell
comment tag so test mocks match deterministically.
IIS: Get-ChildItem Cert:\... + Get-WebBinding; matches when both
the cert is in the store AND the active binding's certificateHash
equals the new thumbprint.
WinCertStore: Get-ChildItem Cert:\...\<thumbprint>; matches when
the cert exists in the configured store AND its NotAfter is
still in the future.
JavaKeystore: keytool -list -alias -v; matches when the parsed
SHA-256 fingerprint equals sha256(certPEM_DER).
On match: return Success=true with Metadata["idempotent"]="true",
no destructive operation. On any error during the probe (network,
parse, etc.): fall through to today's full deploy path.
False negatives are safe; false positives are dangerous.
Tests added (one positive + one negative per connector):
- TestIIS_Idempotent_SkipsDeployWhenBindingMatches
- TestIIS_Idempotent_DifferentBinding_FallsThroughToDeploy
- TestWinCertStore_Idempotent_SkipsImportWhenCertInStore
- TestWinCertStore_Idempotent_NotInStore_FallsThroughToDeploy
- TestJKS_Idempotent_SkipsDeployWhenAliasMatches
- TestJKS_Idempotent_DifferentAlias_FallsThroughToDeploy
Verified locally:
- gofmt clean across all three connectors.
- Syntax-validated via gofmt.
Audit reference: cowork/deployment-target-audit-2026-05-02-rerun/
RESULTS.md Top-10 fix#3.
Closes Bundle 5 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix,
DeployCertificate at iis.go:235-436 imported the cert via
Import-PfxCertificate (atomic at cert-store level) then ran a
separate PowerShell script for the SNI binding update. If the
binding script failed, the new cert was orphaned in the store AND
the old binding stayed pointed at the old thumbprint.
docs/deployment-atomicity.md L91 promised "explicit pre-deploy
backup + post-rollback re-import"; the code didn't deliver.
This commit:
1. Pre-deploy snapshot. snapshotOldBinding runs Get-WebBinding
before the import; parses the bound SSL thumbprint into a local
`oldThumbprint` variable. Empty = first-time binding (no
rollback target).
2. On-failure rollback script. When the binding-update Execute
returns error, rollbackBinding runs a single PowerShell script
that:
- Remove-Item Cert:\LocalMachine\<store>\<newThumbprint> (delete
the cert we just imported but couldn't bind).
- If oldThumbprint != "", AddSslCertificate('<oldThumbprint>',
...) to re-bind the old cert. Falls through to New-WebBinding
+ AddSslCertificate when the old binding entry is also gone.
3. Post-rollback verification. verifyRollback re-reads
Get-WebBinding; asserts the bound thumbprint matches
oldThumbprint. On mismatch, warn in the DeploymentResult
message — the rollback ran but final state is suspect, operator
inspection required. Skipped when oldThumbprint == "" (no
binding to verify against).
4. Helper extraction. snapshotOldBinding / rollbackBinding /
verifyRollback are private methods on Connector for clean test
seams. Each emits a unique `# CERTCTL_*` PowerShell comment tag
so test mocks can match scripts deterministically — multiple
scripts call Get-WebBinding so substring matching otherwise
collides under Go's randomized map iteration order.
DeploymentResult shape on failure:
- rollback OK → Success=false, Message="binding update failed;
rolled back", clean error.
- rollback FAIL → Success=false, wrapped error containing both
binding error and rollback error; metadata
flags manual_action_required=true and surfaces
rollback_error / binding_error verbatim.
Tests added to iis_test.go:
- TestIIS_BindingUpdateFails_RemovesNewCert_RebindsOld — happy
rollback path. Mock executor queued with snapshot →
OLD_THUMBPRINT:abc123, import OK, binding fails, rollback →
REBOUND_EXISTING. Asserts rollback script contains both
Remove-Item for the new thumbprint AND
AddSslCertificate('abc123', ...).
- TestIIS_BindingUpdateFails_NoOldBinding_RemovesNewCertOnly —
first-time deploy variant. Snapshot returns NO_OLD_BINDING;
rollback removes the new cert but does NOT call
AddSslCertificate; verify script never runs.
- TestIIS_BindingUpdateFails_RollbackAlsoFails_OperatorActionable
— wrapped-error escalation. Asserts the returned error mentions
both `binding update failed` and `rollback also failed`, and
metadata flags manual_action_required=true.
Two existing tests (TestIISConnector_DeployCertificate_Success and
…_SNIEnabled) updated to expect 3 commands (snapshot, import,
binding) and to look for the binding script at commands[2].
docs/deployment-atomicity.md L91 unchanged from today's text — the
"Already explicit pre-deploy backup + post-rollback re-import"
claim is now honest. (Bundle 1 doc-realignment hasn't shipped yet,
so there's no softened-pending claim to restore.)
Verified locally (sandbox lacks staticcheck install due to disk
pressure, ran via go vet + go test -race; CI runs the full lint
gate):
- gofmt -l ./internal/connector/target/iis/ clean
- go vet ./internal/connector/target/iis/... clean
- go build ./internal/connector/target/iis/... clean
- go test -race -count=1 ./internal/connector/target/iis/ green
Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 5.
Mechanical reformat. The new 'gofmt drift' CI step (added in
ci-pipeline-cleanup Phase 4, commit 71b2245) surfaced 111 files
with accumulated gofmt drift across cmd/, internal/, and deploy/test/.
Each file's diff is gofmt-standard: whitespace adjustments, intra-
group import sorting (alphabetical by import path within blank-line-
separated groups), and struct-tag column alignment. No semantic
changes — verified via 'git diff --ignore-all-space' which shows only
the line-position deltas from import reordering.
The gate stays in place after this commit. Going forward it catches
gofmt drift at PR time.
Phase 8 of the deploy-hardening I master bundle. F5 + IIS already
have transactional / explicit-backup-restore rollback semantics
in their DeployCertificate paths. Phase 8 adds the explicit
ValidateOnly dry-run probe that operators use to preview a deploy
without touching the live cert.
F5 (validate_only.go):
- ValidateOnly probes the iControl REST API via Authenticate.
Cheap (no F5 transaction created) + cached after first success.
Failure surfaces as a wrapped error so operators see the actual
cause (auth provider down, invalid creds, BIG-IP unreachable,
etc.). nil client returns ErrValidateOnlyNotSupported.
- A true cert-bind dry-run requires F5's no-commit transaction
mode (v17.5+); V3-Pro can add per-version dispatch. V2 ships
the reachability probe as the load-bearing safety check.
- 5 new tests in validate_only_test.go covering: auth-success,
auth-fail wrapped, nil-client sentinel, error-message contains
BIG-IP context, recoverable auth-fail surfaces provider info.
IIS (validate_only.go):
- ValidateOnly runs `Get-WebSite -Name <SiteName>` via the
injected PowerShellExecutor. Confirms the IIS PS module is
loaded AND the site exists AND the agent has admin privileges.
Failure here surfaces the actual PowerShell stderr (site not
found / module missing / access denied).
- A true cert-bind dry-run would need IIS to expose a no-commit
New-WebBinding (it doesn't); V3-Pro can extend with a
temp-install + immediate-remove. V2 ships the permission +
module probe as the load-bearing check.
- 5 new tests in validate_only_test.go covering: get-website
succeeds, get-website fails, nil-executor sentinel, site-name
quoting (handles spaces in 'Default Web Site'), output-context
in error.
Smoke test connectorsAtPhase3 list shrunk from 10 to 7 entries
(f5 + iis + postfix removed). Caddy stays in (file-mode returns
sentinel; api-mode is real-impl). Envoy + Traefik stay in (no
validate-with-target command exists for either). javakeystore +
k8ssecret + ssh + wincertstore stay in pending Phase 9.
Coverage: F5 holds at ≥85%; IIS holds at ≥85%. Race detector
clean. golangci-lint v2.11.4 clean.
Phase 9 next: SSH + WinCertStore + JavaKeystore + K8s — the
non-file-server connectors.
Phase 3 of the deploy-hardening I master bundle. Extends the
target.Connector interface with the dry-run method that operators
will use to preview a deploy before committing — but ships only the
default-stub for all 13 connectors. Phases 4-9 replace each stub
with the real validate-with-the-target implementation.
interface.go:
- Add ErrValidateOnlyNotSupported sentinel (frozen decision 0.6 —
connectors that cannot dry-run, like K8s, return this rather than
nil so operator triage can errors.Is for "not supported" vs
"validated successfully").
- Add ValidateOnly(ctx, request DeploymentRequest) error to
Connector interface.
13 new validate_only.go files (one per connector at
internal/connector/target/<name>/validate_only.go):
- apache, caddy, envoy, f5, haproxy, iis, javakeystore, k8ssecret,
nginx, postfix, ssh, traefik, wincertstore.
- Each file is identical except for the package declaration: a
one-method default stub returning target.ErrValidateOnlyNotSupported.
- Per-connector files (rather than a single embed-method approach)
let Phases 4-9 replace each connector's stub independently
without churning a shared base.
Tests:
- internal/connector/target/validate_only_test.go pins the sentinel
contract (errors.Is identity, Error() string, %w wrap propagation).
- internal/connector/target/validate_only_smoke_test.go (external
test package) constructs a zero-value &<pkg>.Connector{} for each
of the 13 connectors and asserts ValidateOnly returns
ErrValidateOnlyNotSupported. The test's
connectorsAtPhase3 list is the load-bearing CI guard:
- A 14th connector added without wiring ValidateOnly fails the
`len(connectorsAtPhase3) != 13` invariant.
- A connector whose real ValidateOnly lands (Phase 4 NGINX, Phase
5 Apache, etc.) MUST be removed from this list or the smoke test
fails (real impl no longer returns the sentinel). That removal
IS the bookkeeping that the operator-visible bit + behavior
change are wired together end-to-end.
Compile + go vet + golangci-lint v2.11.4 + go test all 0 issues.
Phase 4 next: NGINX canonical real-impl — replace the stub with
nginx -t -c <temp>; same time replace the existing os.WriteFile
flow in DeployCertificate with deploy.Apply(...).
Closes Q-1 (cat-s3-58ce7e9840be) — 37 t.Skip / testing.Short() sites
across 9 test files audited. Per-site verdict matrix:
- cmd/agent/verify_test.go (1 site): defensive guard against unreachable
httptest.NewTLSServer code path. Document-skip with closure comment.
- deploy/test/qa_test.go (11 sites): file already gated by `//go:build qa`
tag. The 11 t.Skip("Requires X — manual test") markers are runtime
second-line guards for operators who run -tags qa against a stack
missing the required external service. File-level header comment
block added explaining the manual-test convention.
- deploy/test/healthcheck_test.go (5 sites): 3 docker-availability +
1 testing.Short + 1 hard-skip for not-yet-wired runtime probe
(image-spec contract above already covers the audit-flagged
regression). All correctly gated; file-level header comment block
added explaining each.
- deploy/test/integration_test.go (5 sites): in-flight-state guards
(poll-with-skip after 90s polling for agent-online, inter-test
Phase04→Phase07 ordering, scheduler-tick race for discovered certs,
inter-test issuer fallthrough, defensive PEM-empty assertion).
Each site now has a closure comment explaining why skip is the
right choice rather than fail (upstream phase already surfaces the
real failure; skipping prevents masking root cause behind cascading
noise).
- internal/repository/postgres/{testutil,seed,repo}_test.go (5 sites):
testing.Short() gates for testcontainers-backed live PostgreSQL
integration tests. All correctly gated; closure comments added
naming the run command.
- internal/connector/notifier/email/email_test.go (2 sites):
anti-fixture assertions (test asserts SMTP dial fails; if a captive
portal black-holes the call to success, skip rather than false-pass).
Closure comments added explaining the fixture assumption.
- internal/connector/target/iis/iis_test.go (2 sites): platform-gated
skip for powershell.exe absence on non-Windows hosts. Mirrors the
production iis_connector.go LookPath guard. Closure comments added.
Total: 17 closure comments anchor the 37 skip sites (some sites share a
single block-level comment). All skips remain in place; the change is
purely documentation. The audit recommendation was "audit each skip and
decide" — for these 37, the decision is uniformly **document-skip**:
the gating is correct, the t.Skip messages name the missing precondition,
and the closure comments now pin the rationale for future readers.
See coverage-gap-audit-2026-04-24-v5/unified-audit.md
cat-s3-58ce7e9840be for closure rationale.
Complete the IIS target connector with dual-mode deployment:
- WinRM proxy agent mode via masterzen/winrm for remote Windows servers
- Base64 PFX transfer with try/finally cleanup on remote host
- GUI wizard updated with 13 IIS config fields including WinRM settings
- TargetDetailPage sensitive field redaction (password/secret/token/key)
- OpenAPI TargetType enum updated (added Traefik, Caddy)
- connectors.md fully documented with WinRM proxy config example
- 38 total IIS tests (10 new WinRM tests), all passing with race detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement full IIS target connector with PEM-to-PFX conversion via
go-pkcs12, PowerShell-based deployment (Import-PfxCertificate, IIS
binding management), SHA-1 thumbprint computation, and SNI support.
Injectable PowerShellExecutor interface enables cross-platform testing.
Regex-validated config fields prevent PowerShell injection. 28 tests.
Restructure README from 563 to 313 lines: outcome-focused feature
descriptions, "Who Is This For" persona section, examples promoted
above the fold, configuration/API/security reference moved to docs.
All numbers verified against repo (25 GUI pages, 97 OpenAPI ops,
CI thresholds service 55%/handler 60%/domain 40%/middleware 30%).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes Go Report Card gofmt score from 52% to 100%.
Pure formatting changes — no logic modifications.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>