wincertstore: pre-deploy snapshot + on-import-failure rollback

Closes Bundle 7 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix,
DeployCertificate at wincertstore.go:162-215 ran a single PowerShell
script that imported the PFX, optionally set FriendlyName, and
optionally removed expired same-Subject certs. Import-PfxCertificate
is atomic at the cert-store level, but the wider sequence (import →
friendly name → remove expired) is not. Failure in any post-import
step left the new cert in the store with no clean recovery path.
docs/deployment-atomicity.md L93 promised "Get-ChildItem snapshot
for rollback"; the code didn't deliver.

This commit:

1. Pre-deploy snapshot. New PowerShell script (tagged
   `# CERTCTL_SNAPSHOT`) runs Get-ChildItem over the target store,
   captures every thumbprint, and for each cert with the same
   Subject as the new one calls Export-PfxCertificate to a tempdir
   using a transient snapshotExportPassword (32-byte random,
   distinct from the import PFX password). Output parsed into a
   snapshotState{Entries: []{Thumbprint, PfxPath}, AllThumbprints,
   TempDir, ExportPassword}. The new cert's Subject is parsed from
   request.CertPEM via certutil.ParseCertificatePEM before any
   cert-store mutation; PEM-parse failure aborts the deploy
   cleanly.

2. On-import-failure rollback. When the import-script Execute
   returns error, run a rollback script (tagged
   `# CERTCTL_ROLLBACK`) that:
   - Test-Path on the new cert path; Remove-Item if present.
   - Import-PfxCertificate -FilePath <pfxPath> for each snapshot
     entry (restores prior state).
   - Remove-Item -Recurse on the snapshot tempdir.

3. Post-rollback verification. Re-read Get-ChildItem (tagged
   `# CERTCTL_VERIFY`); assert every original thumbprint is back.
   On mismatch, append a warning to the DeploymentResult message
   (rollback ran but final state is suspect — operator inspection
   recommended). Skipped when AllThumbprints is empty (first-time
   deploy).

4. Success-path tempdir cleanup. New script tagged
   `# CERTCTL_CLEANUP` runs after a successful import to remove
   the snapshot tempdir on a best-effort basis. Failure here is
   non-fatal (debug log only).

5. Helper extraction. rollbackImport(ctx, snapshot, newThumbprint)
   + verifyRollback(ctx, snapshot) + cleanupSnapshot(ctx, snapshot)
   + parseSnapshotOutput are private methods/functions on
   Connector for clean test seams. Each script emits a unique
   `# CERTCTL_*` PowerShell comment tag so test mocks can match
   scripts deterministically — the snapshot/rollback/verify/cleanup
   scripts all reference Cert:\<store> paths, so the comment tags
   are the only deterministic substring under randomized map
   iteration.

DeploymentResult shape on failure:
- import OK, rollback OK   → Success=false, "PowerShell import
                              failed; rolled back" (clean
                              recoverable failure).
- import FAIL, rollback OK → same.
- rollback FAIL            → operator-actionable wrapped error
                              containing both errors; metadata
                              flags manual_action_required=true
                              and surfaces import_error /
                              rollback_error verbatim.

Tests added to wincertstore_test.go:
- TestWinCertStore_ImportFails_RemovesNewCert_RestoresOldFromSnapshot
  — happy rollback path with one same-Subject cert in the
  snapshot. Asserts rollback script contains Remove-Item for the
  new thumbprint AND Import-PfxCertificate referencing the
  snapshotted PFX path.
- TestWinCertStore_ImportFails_NoExistingSameSubject_RemovesNewCertOnly
  — snapshot has THUMB: lines but no SNAPSHOT: entries; rollback
  removes the new cert but does NOT call Import-PfxCertificate.
- TestWinCertStore_FriendlyNameFails_NewCertRemoved_OldCertsRestored
  — variant where the import script's failure originates from
  Set-ItemProperty FriendlyName; same rollback path. Asserts
  metadata.import_error preserves the FriendlyName-related
  PowerShell output for operator visibility.
- TestWinCertStore_ImportFails_RollbackAlsoFails_OperatorActionable
  — wrapped-error escalation. Asserts the error mentions both
  "PowerShell import failed" and "rollback also failed", and
  metadata flags manual_action_required=true.

Three existing tests (Success, ImportFailed, WithFriendlyName,
WithRemoveExpired) updated to match the new contract: success
path runs 3 PowerShell scripts (snapshot + import + cleanup),
import-failure path runs 4 (snapshot + import + rollback + verify),
and the import script lives at mock.scripts[1] not [0].

PowerShell injection note: the new cert's Subject DN is embedded
in the snapshot script as a single-quoted literal. Subject DNs can
contain apostrophes (e.g. CN=O'Reilly), so escapePowerShellSingleQuoted
doubles them per the PowerShell single-quoted-literal escape rule.
The export password and thumbprints come from
certutil.GenerateRandomPassword (alphanumeric only) and the cert's
SHA-1 thumbprint hex (alphanumeric); no escaping needed for those.

docs/deployment-atomicity.md L93 unchanged from today's text —
Bundle 1 doc-realignment hasn't shipped, so the "Get-ChildItem
snapshot for rollback" line was never softened. Post-Bundle-7 the
claim is honest (was aspirational pre-fix).

Verified locally (sandbox lacks staticcheck install due to disk
pressure; CI runs the full lint gate):
- gofmt -l ./internal/connector/target/wincertstore/  clean
- go vet ./internal/connector/target/wincertstore/  clean
- go build ./cmd/agent/...  clean
- go test -race -count=1 ./internal/connector/target/wincertstore/
  green

Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 7.
This commit is contained in:
shankar0123
2026-05-02 18:13:40 +00:00
parent c222c8b57a
commit 60ae92b0e8
2 changed files with 657 additions and 19 deletions
@@ -197,8 +197,14 @@ func TestDeployCertificate_Success(t *testing.T) {
t.Fatalf("generate cert: %v", err)
}
// Bundle 7: success path runs 3 PowerShell scripts in order:
// snapshot → import → cleanup. Seed responses for each.
mock := &mockExecutor{
responses: []string{"SUCCESS:AABBCCDD"},
responses: []string{
"TEMPDIR:/tmp/certctl-snapshot-abc",
"SUCCESS:AABBCCDD",
"CLEANUP_OK",
},
}
c := NewWithExecutor(&Config{
StoreName: "My",
@@ -222,16 +228,22 @@ func TestDeployCertificate_Success(t *testing.T) {
t.Errorf("expected store_name metadata 'My', got: %s", result.Metadata["store_name"])
}
// Verify the PowerShell script was called
if len(mock.scripts) != 1 {
t.Fatalf("expected 1 script call, got %d", len(mock.scripts))
// Bundle 7: 3 scripts on success path (snapshot + import + cleanup).
if len(mock.scripts) != 3 {
t.Fatalf("expected 3 script calls (snapshot + import + cleanup), got %d", len(mock.scripts))
}
script := mock.scripts[0]
if !strings.Contains(script, "Import-PfxCertificate") {
t.Error("expected Import-PfxCertificate in script")
if !strings.Contains(mock.scripts[0], "# CERTCTL_SNAPSHOT") {
t.Errorf("expected # CERTCTL_SNAPSHOT in first script, got: %s", mock.scripts[0])
}
if !strings.Contains(script, "Cert:\\LocalMachine\\My") {
t.Error("expected correct cert store path in script")
importScript := mock.scripts[1]
if !strings.Contains(importScript, "Import-PfxCertificate") {
t.Error("expected Import-PfxCertificate in second script")
}
if !strings.Contains(importScript, "Cert:\\LocalMachine\\My") {
t.Error("expected correct cert store path in second script")
}
if !strings.Contains(mock.scripts[2], "# CERTCTL_CLEANUP") {
t.Errorf("expected # CERTCTL_CLEANUP in third script, got: %s", mock.scripts[2])
}
}
@@ -267,9 +279,16 @@ func TestDeployCertificate_ImportFailed(t *testing.T) {
t.Fatalf("generate cert: %v", err)
}
// Bundle 7: snapshot returns empty (no thumbprints in store) → import
// fails → rollback runs (and succeeds since snapshot was empty, only
// removes the new cert if it landed). Mock seeds 3 responses.
mock := &mockExecutor{
responses: []string{"Access denied"},
errors: []error{fmt.Errorf("exit code 1")},
responses: []string{
"TEMPDIR:/tmp/certctl-snapshot-xyz",
"Access denied",
"ROLLBACK_OK",
},
errors: []error{nil, fmt.Errorf("exit code 1"), nil},
}
c := NewWithExecutor(&Config{}, testLogger(), mock)
@@ -280,6 +299,11 @@ func TestDeployCertificate_ImportFailed(t *testing.T) {
if err == nil || !strings.Contains(err.Error(), "PowerShell import failed") {
t.Fatalf("expected import failure error, got: %v", err)
}
// Bundle 7: error message must reference rollback so operators know
// the deploy left the store in a known state.
if !strings.Contains(err.Error(), "rolled back") {
t.Errorf("expected error to mention 'rolled back', got: %v", err)
}
}
func TestDeployCertificate_WithFriendlyName(t *testing.T) {
@@ -288,7 +312,13 @@ func TestDeployCertificate_WithFriendlyName(t *testing.T) {
t.Fatalf("generate cert: %v", err)
}
mock := &mockExecutor{responses: []string{"SUCCESS:AABB"}}
mock := &mockExecutor{
responses: []string{
"TEMPDIR:/tmp/certctl-snapshot-fn",
"SUCCESS:AABB",
"CLEANUP_OK",
},
}
c := NewWithExecutor(&Config{
StoreName: "My",
FriendlyName: "Production API Cert",
@@ -301,8 +331,12 @@ func TestDeployCertificate_WithFriendlyName(t *testing.T) {
if err != nil {
t.Fatalf("deploy failed: %v", err)
}
if !strings.Contains(mock.scripts[0], "FriendlyName") {
t.Error("expected FriendlyName in PowerShell script")
// Bundle 7: import script is now mock.scripts[1] (snapshot is [0]).
if len(mock.scripts) < 2 {
t.Fatalf("expected at least 2 scripts (snapshot + import), got %d", len(mock.scripts))
}
if !strings.Contains(mock.scripts[1], "FriendlyName") {
t.Error("expected FriendlyName in import script")
}
}
@@ -312,7 +346,13 @@ func TestDeployCertificate_WithRemoveExpired(t *testing.T) {
t.Fatalf("generate cert: %v", err)
}
mock := &mockExecutor{responses: []string{"SUCCESS:AABB"}}
mock := &mockExecutor{
responses: []string{
"TEMPDIR:/tmp/certctl-snapshot-re",
"SUCCESS:AABB",
"CLEANUP_OK",
},
}
c := NewWithExecutor(&Config{
StoreName: "My",
RemoveExpired: true,
@@ -325,8 +365,276 @@ func TestDeployCertificate_WithRemoveExpired(t *testing.T) {
if err != nil {
t.Fatalf("deploy failed: %v", err)
}
if !strings.Contains(mock.scripts[0], "Remove-Item") {
t.Error("expected Remove-Item for expired cert cleanup in script")
// Bundle 7: import script is mock.scripts[1].
if len(mock.scripts) < 2 {
t.Fatalf("expected at least 2 scripts (snapshot + import), got %d", len(mock.scripts))
}
if !strings.Contains(mock.scripts[1], "Remove-Item") {
t.Error("expected Remove-Item for expired cert cleanup in import script")
}
}
// --- Bundle 7: pre-deploy snapshot + on-import-failure rollback ---
//
// These four tests pin the load-bearing rollback contract added in
// Bundle 7 of the 2026-05-02 deployment-target audit:
// - happy rollback path: snapshot finds same-Subject cert → import
// fails → rollback removes new cert + re-imports snapshot;
// - first-time deploy: snapshot finds no same-Subject certs → import
// fails → rollback only removes the new cert (no re-import);
// - FriendlyName-step failure: import script fails on Set
// FriendlyName → same rollback path;
// - rollback-also-fails: operator-actionable wrapped error.
func TestWinCertStore_ImportFails_RemovesNewCert_RestoresOldFromSnapshot(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
// Snapshot finds one same-Subject cert and exports it.
// Import fails. Rollback succeeds. Verify confirms.
mock := &mockExecutor{
responses: []string{
"SNAPSHOT:OLDTHUMB123:/tmp/certctl-snapshot-abc/OLDTHUMB123.pfx\nTEMPDIR:/tmp/certctl-snapshot-abc",
"PFX import error",
"ROLLBACK_OK",
"VERIFY_OK",
},
errors: []error{nil, fmt.Errorf("exit code 1"), nil, nil},
}
c := NewWithExecutor(&Config{
StoreName: "My",
StoreLocation: "LocalMachine",
}, testLogger(), mock)
result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err == nil {
t.Fatal("expected error when import fails")
}
if result == nil {
t.Fatal("expected non-nil result on rollback path")
}
if result.Success {
t.Fatal("expected failure result")
}
if !strings.Contains(err.Error(), "PowerShell import failed") {
t.Errorf("expected error to mention import failure, got: %v", err)
}
if !strings.Contains(err.Error(), "rolled back") {
t.Errorf("expected error to mention 'rolled back', got: %v", err)
}
// 4 scripts: snapshot + import + rollback + verify.
if len(mock.scripts) != 4 {
t.Fatalf("expected 4 scripts (snapshot + import + rollback + verify), got %d", len(mock.scripts))
}
// Locate the rollback script and assert it contains BOTH a Remove-Item
// for the new thumbprint AND an Import-PfxCertificate for the
// snapshotted PFX.
var rollbackScript string
for _, s := range mock.scripts {
if strings.Contains(s, "# CERTCTL_ROLLBACK") {
rollbackScript = s
break
}
}
if rollbackScript == "" {
t.Fatal("expected rollback script to be executed")
}
if !strings.Contains(rollbackScript, "Remove-Item") {
t.Errorf("expected rollback to contain Remove-Item, got: %s", rollbackScript)
}
if !strings.Contains(rollbackScript, "Import-PfxCertificate") {
t.Errorf("expected rollback to Import-PfxCertificate the snapshot, got: %s", rollbackScript)
}
if !strings.Contains(rollbackScript, "OLDTHUMB123.pfx") {
t.Errorf("expected rollback to reference the snapshot pfx path, got: %s", rollbackScript)
}
if result.Metadata["rolled_back"] != "true" {
t.Errorf("expected rolled_back=true in metadata, got: %s", result.Metadata["rolled_back"])
}
}
func TestWinCertStore_ImportFails_NoExistingSameSubject_RemovesNewCertOnly(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
// Snapshot returns THUMB lines (different-Subject certs in store) but
// NO SNAPSHOT lines — no same-Subject cert was exported. Rollback
// removes the new cert but does NOT call Import-PfxCertificate.
mock := &mockExecutor{
responses: []string{
"THUMB:UNRELATED1\nTHUMB:UNRELATED2\nTEMPDIR:/tmp/certctl-snapshot-noss",
"PFX import error",
"ROLLBACK_OK",
"VERIFY_OK",
},
errors: []error{nil, fmt.Errorf("exit code 1"), nil, nil},
}
c := NewWithExecutor(&Config{
StoreName: "My",
StoreLocation: "LocalMachine",
}, testLogger(), mock)
_, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err == nil {
t.Fatal("expected error when import fails")
}
var rollbackScript string
for _, s := range mock.scripts {
if strings.Contains(s, "# CERTCTL_ROLLBACK") {
rollbackScript = s
break
}
}
if rollbackScript == "" {
t.Fatal("expected rollback script to be executed")
}
if !strings.Contains(rollbackScript, "Remove-Item") {
t.Errorf("expected rollback to remove new cert via Remove-Item, got: %s", rollbackScript)
}
// No same-Subject snapshots → no Import-PfxCertificate during rollback.
if strings.Contains(rollbackScript, "Import-PfxCertificate") {
t.Errorf("expected no Import-PfxCertificate when snapshot has no same-Subject entries, got: %s", rollbackScript)
}
}
func TestWinCertStore_FriendlyNameFails_NewCertRemoved_OldCertsRestored(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
// The connector cannot inspect WHICH step inside the import script
// failed — Execute returns a single (output, error). Operators see
// the FriendlyName failure surfaced via the error output. The
// rollback runs the same way regardless of which post-Import step
// failed (FriendlyName, Get-ChildItem verify, RemoveExpired).
mock := &mockExecutor{
responses: []string{
"SNAPSHOT:OLDTHUMB456:/tmp/certctl-snapshot-fn/OLDTHUMB456.pfx\nTEMPDIR:/tmp/certctl-snapshot-fn",
"Cannot set FriendlyName: invalid character in friendly name",
"ROLLBACK_OK",
"VERIFY_OK",
},
errors: []error{nil, fmt.Errorf("Set-ItemProperty failed"), nil, nil},
}
c := NewWithExecutor(&Config{
StoreName: "My",
StoreLocation: "LocalMachine",
FriendlyName: "Production",
}, testLogger(), mock)
result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err == nil {
t.Fatal("expected error when FriendlyName step fails")
}
if result.Success {
t.Fatal("expected failure result")
}
// Operator visibility: the import_error in metadata should preserve
// the PowerShell output so operators can tell what went wrong.
if !strings.Contains(result.Metadata["import_error"], "FriendlyName") {
t.Errorf("expected import_error to reference FriendlyName, got: %s", result.Metadata["import_error"])
}
var rollbackScript string
for _, s := range mock.scripts {
if strings.Contains(s, "# CERTCTL_ROLLBACK") {
rollbackScript = s
break
}
}
if rollbackScript == "" {
t.Fatal("expected rollback script to be executed")
}
if !strings.Contains(rollbackScript, "Remove-Item") {
t.Errorf("expected rollback to Remove-Item the new cert, got: %s", rollbackScript)
}
if !strings.Contains(rollbackScript, "OLDTHUMB456.pfx") {
t.Errorf("expected rollback to restore snapshotted cert, got: %s", rollbackScript)
}
if result.Metadata["rolled_back"] != "true" {
t.Errorf("expected rolled_back=true, got: %s", result.Metadata["rolled_back"])
}
}
func TestWinCertStore_ImportFails_RollbackAlsoFails_OperatorActionable(t *testing.T) {
certPEM, keyPEM, err := generateTestCertAndKey()
if err != nil {
t.Fatalf("generate cert: %v", err)
}
// Snapshot succeeds; import fails; rollback ALSO fails. Operator-
// actionable case: both errors must be surfaced + metadata flags
// manual_action_required.
mock := &mockExecutor{
responses: []string{
"SNAPSHOT:OLDTHUMB789:/tmp/certctl-snapshot-rbf/OLDTHUMB789.pfx\nTEMPDIR:/tmp/certctl-snapshot-rbf",
"Import error",
"Rollback step failed",
},
errors: []error{
nil,
fmt.Errorf("import-step exit code 1"),
fmt.Errorf("rollback-step exit code 2"),
},
}
c := NewWithExecutor(&Config{
StoreName: "My",
StoreLocation: "LocalMachine",
}, testLogger(), mock)
result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
CertPEM: certPEM,
KeyPEM: keyPEM,
})
if err == nil {
t.Fatal("expected error when both import and rollback fail")
}
if result.Success {
t.Fatal("expected failure result")
}
// Wrapped error must mention BOTH errors.
if !strings.Contains(err.Error(), "PowerShell import failed") {
t.Errorf("expected error to mention import failure, got: %v", err)
}
if !strings.Contains(err.Error(), "rollback also failed") {
t.Errorf("expected error to mention rollback failure, got: %v", err)
}
if !strings.Contains(err.Error(), "manual operator inspection required") {
t.Errorf("expected error to flag manual inspection, got: %v", err)
}
// Metadata flags manual action + surfaces both errors.
if result.Metadata["manual_action_required"] != "true" {
t.Errorf("expected manual_action_required=true, got: %s", result.Metadata["manual_action_required"])
}
if result.Metadata["rolled_back"] != "false" {
t.Errorf("expected rolled_back=false, got: %s", result.Metadata["rolled_back"])
}
if result.Metadata["rollback_error"] == "" {
t.Error("expected rollback_error in metadata")
}
if result.Metadata["import_error"] == "" {
t.Error("expected import_error in metadata")
}
}