wincertstore: pre-deploy snapshot + on-import-failure rollback

Closes Bundle 7 of the 2026-05-02 deployment-target coverage audit (see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix, DeployCertificate at wincertstore.go:162-215 ran a single PowerShell script that imported the PFX, optionally set FriendlyName, and optionally removed expired same-Subject certs. Import-PfxCertificate is atomic at the cert-store level, but the wider sequence (import → friendly name → remove expired) is not. Failure in any post-import step left the new cert in the store with no clean recovery path. docs/deployment-atomicity.md L93 promised "Get-ChildItem snapshot for rollback"; the code didn't deliver. This commit: 1. Pre-deploy snapshot. New PowerShell script (tagged `# CERTCTL_SNAPSHOT`) runs Get-ChildItem over the target store, captures every thumbprint, and for each cert with the same Subject as the new one calls Export-PfxCertificate to a tempdir using a transient snapshotExportPassword (32-byte random, distinct from the import PFX password). Output parsed into a snapshotState{Entries: []{Thumbprint, PfxPath}, AllThumbprints, TempDir, ExportPassword}. The new cert's Subject is parsed from request.CertPEM via certutil.ParseCertificatePEM before any cert-store mutation; PEM-parse failure aborts the deploy cleanly. 2. On-import-failure rollback. When the import-script Execute returns error, run a rollback script (tagged `# CERTCTL_ROLLBACK`) that: - Test-Path on the new cert path; Remove-Item if present. - Import-PfxCertificate -FilePath <pfxPath> for each snapshot entry (restores prior state). - Remove-Item -Recurse on the snapshot tempdir. 3. Post-rollback verification. Re-read Get-ChildItem (tagged `# CERTCTL_VERIFY`); assert every original thumbprint is back. On mismatch, append a warning to the DeploymentResult message (rollback ran but final state is suspect — operator inspection recommended). Skipped when AllThumbprints is empty (first-time deploy). 4. Success-path tempdir cleanup. New script tagged `# CERTCTL_CLEANUP` runs after a successful import to remove the snapshot tempdir on a best-effort basis. Failure here is non-fatal (debug log only). 5. Helper extraction. rollbackImport(ctx, snapshot, newThumbprint) + verifyRollback(ctx, snapshot) + cleanupSnapshot(ctx, snapshot) + parseSnapshotOutput are private methods/functions on Connector for clean test seams. Each script emits a unique `# CERTCTL_*` PowerShell comment tag so test mocks can match scripts deterministically — the snapshot/rollback/verify/cleanup scripts all reference Cert:\<store> paths, so the comment tags are the only deterministic substring under randomized map iteration. DeploymentResult shape on failure: - import OK, rollback OK → Success=false, "PowerShell import failed; rolled back" (clean recoverable failure). - import FAIL, rollback OK → same. - rollback FAIL → operator-actionable wrapped error containing both errors; metadata flags manual_action_required=true and surfaces import_error / rollback_error verbatim. Tests added to wincertstore_test.go: - TestWinCertStore_ImportFails_RemovesNewCert_RestoresOldFromSnapshot — happy rollback path with one same-Subject cert in the snapshot. Asserts rollback script contains Remove-Item for the new thumbprint AND Import-PfxCertificate referencing the snapshotted PFX path. - TestWinCertStore_ImportFails_NoExistingSameSubject_RemovesNewCertOnly — snapshot has THUMB: lines but no SNAPSHOT: entries; rollback removes the new cert but does NOT call Import-PfxCertificate. - TestWinCertStore_FriendlyNameFails_NewCertRemoved_OldCertsRestored — variant where the import script's failure originates from Set-ItemProperty FriendlyName; same rollback path. Asserts metadata.import_error preserves the FriendlyName-related PowerShell output for operator visibility. - TestWinCertStore_ImportFails_RollbackAlsoFails_OperatorActionable — wrapped-error escalation. Asserts the error mentions both "PowerShell import failed" and "rollback also failed", and metadata flags manual_action_required=true. Three existing tests (Success, ImportFailed, WithFriendlyName, WithRemoveExpired) updated to match the new contract: success path runs 3 PowerShell scripts (snapshot + import + cleanup), import-failure path runs 4 (snapshot + import + rollback + verify), and the import script lives at mock.scripts[1] not [0]. PowerShell injection note: the new cert's Subject DN is embedded in the snapshot script as a single-quoted literal. Subject DNs can contain apostrophes (e.g. CN=O'Reilly), so escapePowerShellSingleQuoted doubles them per the PowerShell single-quoted-literal escape rule. The export password and thumbprints come from certutil.GenerateRandomPassword (alphanumeric only) and the cert's SHA-1 thumbprint hex (alphanumeric); no escaping needed for those. docs/deployment-atomicity.md L93 unchanged from today's text — Bundle 1 doc-realignment hasn't shipped, so the "Get-ChildItem snapshot for rollback" line was never softened. Post-Bundle-7 the claim is honest (was aspirational pre-fix). Verified locally (sandbox lacks staticcheck install due to disk pressure; CI runs the full lint gate): - gofmt -l ./internal/connector/target/wincertstore/ clean - go vet ./internal/connector/target/wincertstore/ clean - go build ./cmd/agent/... clean - go test -race -count=1 ./internal/connector/target/wincertstore/ green Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 7.
2026-06-14 23:08:51 +00:00 · 2026-05-02 18:13:40 +00:00
parent c222c8b57a
commit 60ae92b0e8
2 changed files with 657 additions and 19 deletions
@@ -197,8 +197,14 @@ func TestDeployCertificate_Success(t *testing.T) {
 		t.Fatalf("generate cert: %v", err)
 	}

+	// Bundle 7: success path runs 3 PowerShell scripts in order:
+	// snapshot → import → cleanup. Seed responses for each.
 	mock := &mockExecutor{
-		responses: []string{"SUCCESS:AABBCCDD"},
+		responses: []string{
+			"TEMPDIR:/tmp/certctl-snapshot-abc",
+			"SUCCESS:AABBCCDD",
+			"CLEANUP_OK",
+		},
 	}
 	c := NewWithExecutor(&Config{
 		StoreName:     "My",
@@ -222,16 +228,22 @@ func TestDeployCertificate_Success(t *testing.T) {
 		t.Errorf("expected store_name metadata 'My', got: %s", result.Metadata["store_name"])
 	}

-	// Verify the PowerShell script was called
-	if len(mock.scripts) != 1 {
-		t.Fatalf("expected 1 script call, got %d", len(mock.scripts))
+	// Bundle 7: 3 scripts on success path (snapshot + import + cleanup).
+	if len(mock.scripts) != 3 {
+		t.Fatalf("expected 3 script calls (snapshot + import + cleanup), got %d", len(mock.scripts))
 	}
-	script := mock.scripts[0]
-	if !strings.Contains(script, "Import-PfxCertificate") {
-		t.Error("expected Import-PfxCertificate in script")
+	if !strings.Contains(mock.scripts[0], "# CERTCTL_SNAPSHOT") {
+		t.Errorf("expected # CERTCTL_SNAPSHOT in first script, got: %s", mock.scripts[0])
 	}
-	if !strings.Contains(script, "Cert:\\LocalMachine\\My") {
-		t.Error("expected correct cert store path in script")
+	importScript := mock.scripts[1]
+	if !strings.Contains(importScript, "Import-PfxCertificate") {
+		t.Error("expected Import-PfxCertificate in second script")
+	}
+	if !strings.Contains(importScript, "Cert:\\LocalMachine\\My") {
+		t.Error("expected correct cert store path in second script")
+	}
+	if !strings.Contains(mock.scripts[2], "# CERTCTL_CLEANUP") {
+		t.Errorf("expected # CERTCTL_CLEANUP in third script, got: %s", mock.scripts[2])
 	}
 }

@@ -267,9 +279,16 @@ func TestDeployCertificate_ImportFailed(t *testing.T) {
 		t.Fatalf("generate cert: %v", err)
 	}

+	// Bundle 7: snapshot returns empty (no thumbprints in store) → import
+	// fails → rollback runs (and succeeds since snapshot was empty, only
+	// removes the new cert if it landed). Mock seeds 3 responses.
 	mock := &mockExecutor{
-		responses: []string{"Access denied"},
-		errors:    []error{fmt.Errorf("exit code 1")},
+		responses: []string{
+			"TEMPDIR:/tmp/certctl-snapshot-xyz",
+			"Access denied",
+			"ROLLBACK_OK",
+		},
+		errors: []error{nil, fmt.Errorf("exit code 1"), nil},
 	}
 	c := NewWithExecutor(&Config{}, testLogger(), mock)

@@ -280,6 +299,11 @@ func TestDeployCertificate_ImportFailed(t *testing.T) {
 	if err == nil || !strings.Contains(err.Error(), "PowerShell import failed") {
 		t.Fatalf("expected import failure error, got: %v", err)
 	}
+	// Bundle 7: error message must reference rollback so operators know
+	// the deploy left the store in a known state.
+	if !strings.Contains(err.Error(), "rolled back") {
+		t.Errorf("expected error to mention 'rolled back', got: %v", err)
+	}
 }

 func TestDeployCertificate_WithFriendlyName(t *testing.T) {
@@ -288,7 +312,13 @@ func TestDeployCertificate_WithFriendlyName(t *testing.T) {
 		t.Fatalf("generate cert: %v", err)
 	}

-	mock := &mockExecutor{responses: []string{"SUCCESS:AABB"}}
+	mock := &mockExecutor{
+		responses: []string{
+			"TEMPDIR:/tmp/certctl-snapshot-fn",
+			"SUCCESS:AABB",
+			"CLEANUP_OK",
+		},
+	}
 	c := NewWithExecutor(&Config{
 		StoreName:    "My",
 		FriendlyName: "Production API Cert",
@@ -301,8 +331,12 @@ func TestDeployCertificate_WithFriendlyName(t *testing.T) {
 	if err != nil {
 		t.Fatalf("deploy failed: %v", err)
 	}
-	if !strings.Contains(mock.scripts[0], "FriendlyName") {
-		t.Error("expected FriendlyName in PowerShell script")
+	// Bundle 7: import script is now mock.scripts[1] (snapshot is [0]).
+	if len(mock.scripts) < 2 {
+		t.Fatalf("expected at least 2 scripts (snapshot + import), got %d", len(mock.scripts))
+	}
+	if !strings.Contains(mock.scripts[1], "FriendlyName") {
+		t.Error("expected FriendlyName in import script")
 	}
 }

@@ -312,7 +346,13 @@ func TestDeployCertificate_WithRemoveExpired(t *testing.T) {
 		t.Fatalf("generate cert: %v", err)
 	}

-	mock := &mockExecutor{responses: []string{"SUCCESS:AABB"}}
+	mock := &mockExecutor{
+		responses: []string{
+			"TEMPDIR:/tmp/certctl-snapshot-re",
+			"SUCCESS:AABB",
+			"CLEANUP_OK",
+		},
+	}
 	c := NewWithExecutor(&Config{
 		StoreName:     "My",
 		RemoveExpired: true,
@@ -325,8 +365,276 @@ func TestDeployCertificate_WithRemoveExpired(t *testing.T) {
 	if err != nil {
 		t.Fatalf("deploy failed: %v", err)
 	}
-	if !strings.Contains(mock.scripts[0], "Remove-Item") {
-		t.Error("expected Remove-Item for expired cert cleanup in script")
+	// Bundle 7: import script is mock.scripts[1].
+	if len(mock.scripts) < 2 {
+		t.Fatalf("expected at least 2 scripts (snapshot + import), got %d", len(mock.scripts))
+	}
+	if !strings.Contains(mock.scripts[1], "Remove-Item") {
+		t.Error("expected Remove-Item for expired cert cleanup in import script")
+	}
+}
+
+// --- Bundle 7: pre-deploy snapshot + on-import-failure rollback ---
+//
+// These four tests pin the load-bearing rollback contract added in
+// Bundle 7 of the 2026-05-02 deployment-target audit:
+//   - happy rollback path: snapshot finds same-Subject cert → import
+//     fails → rollback removes new cert + re-imports snapshot;
+//   - first-time deploy: snapshot finds no same-Subject certs → import
+//     fails → rollback only removes the new cert (no re-import);
+//   - FriendlyName-step failure: import script fails on Set
+//     FriendlyName → same rollback path;
+//   - rollback-also-fails: operator-actionable wrapped error.
+
+func TestWinCertStore_ImportFails_RemovesNewCert_RestoresOldFromSnapshot(t *testing.T) {
+	certPEM, keyPEM, err := generateTestCertAndKey()
+	if err != nil {
+		t.Fatalf("generate cert: %v", err)
+	}
+
+	// Snapshot finds one same-Subject cert and exports it.
+	// Import fails. Rollback succeeds. Verify confirms.
+	mock := &mockExecutor{
+		responses: []string{
+			"SNAPSHOT:OLDTHUMB123:/tmp/certctl-snapshot-abc/OLDTHUMB123.pfx\nTEMPDIR:/tmp/certctl-snapshot-abc",
+			"PFX import error",
+			"ROLLBACK_OK",
+			"VERIFY_OK",
+		},
+		errors: []error{nil, fmt.Errorf("exit code 1"), nil, nil},
+	}
+	c := NewWithExecutor(&Config{
+		StoreName:     "My",
+		StoreLocation: "LocalMachine",
+	}, testLogger(), mock)
+
+	result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
+		CertPEM: certPEM,
+		KeyPEM:  keyPEM,
+	})
+	if err == nil {
+		t.Fatal("expected error when import fails")
+	}
+	if result == nil {
+		t.Fatal("expected non-nil result on rollback path")
+	}
+	if result.Success {
+		t.Fatal("expected failure result")
+	}
+	if !strings.Contains(err.Error(), "PowerShell import failed") {
+		t.Errorf("expected error to mention import failure, got: %v", err)
+	}
+	if !strings.Contains(err.Error(), "rolled back") {
+		t.Errorf("expected error to mention 'rolled back', got: %v", err)
+	}
+
+	// 4 scripts: snapshot + import + rollback + verify.
+	if len(mock.scripts) != 4 {
+		t.Fatalf("expected 4 scripts (snapshot + import + rollback + verify), got %d", len(mock.scripts))
+	}
+
+	// Locate the rollback script and assert it contains BOTH a Remove-Item
+	// for the new thumbprint AND an Import-PfxCertificate for the
+	// snapshotted PFX.
+	var rollbackScript string
+	for _, s := range mock.scripts {
+		if strings.Contains(s, "# CERTCTL_ROLLBACK") {
+			rollbackScript = s
+			break
+		}
+	}
+	if rollbackScript == "" {
+		t.Fatal("expected rollback script to be executed")
+	}
+	if !strings.Contains(rollbackScript, "Remove-Item") {
+		t.Errorf("expected rollback to contain Remove-Item, got: %s", rollbackScript)
+	}
+	if !strings.Contains(rollbackScript, "Import-PfxCertificate") {
+		t.Errorf("expected rollback to Import-PfxCertificate the snapshot, got: %s", rollbackScript)
+	}
+	if !strings.Contains(rollbackScript, "OLDTHUMB123.pfx") {
+		t.Errorf("expected rollback to reference the snapshot pfx path, got: %s", rollbackScript)
+	}
+
+	if result.Metadata["rolled_back"] != "true" {
+		t.Errorf("expected rolled_back=true in metadata, got: %s", result.Metadata["rolled_back"])
+	}
+}
+
+func TestWinCertStore_ImportFails_NoExistingSameSubject_RemovesNewCertOnly(t *testing.T) {
+	certPEM, keyPEM, err := generateTestCertAndKey()
+	if err != nil {
+		t.Fatalf("generate cert: %v", err)
+	}
+
+	// Snapshot returns THUMB lines (different-Subject certs in store) but
+	// NO SNAPSHOT lines — no same-Subject cert was exported. Rollback
+	// removes the new cert but does NOT call Import-PfxCertificate.
+	mock := &mockExecutor{
+		responses: []string{
+			"THUMB:UNRELATED1\nTHUMB:UNRELATED2\nTEMPDIR:/tmp/certctl-snapshot-noss",
+			"PFX import error",
+			"ROLLBACK_OK",
+			"VERIFY_OK",
+		},
+		errors: []error{nil, fmt.Errorf("exit code 1"), nil, nil},
+	}
+	c := NewWithExecutor(&Config{
+		StoreName:     "My",
+		StoreLocation: "LocalMachine",
+	}, testLogger(), mock)
+
+	_, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{
+		CertPEM: certPEM,
+		KeyPEM:  keyPEM,
+	})
+	if err == nil {
+		t.Fatal("expected error when import fails")
+	}
+
+	var rollbackScript string
+	for _, s := range mock.scripts {
+		if strings.Contains(s, "# CERTCTL_ROLLBACK") {
+			rollbackScript = s
+			break
+		}
+	}
+	if rollbackScript == "" {
+		t.Fatal("expected rollback script to be executed")
+	}
+	if !strings.Contains(rollbackScript, "Remove-Item") {
+		t.Errorf("expected rollback to remove new cert via Remove-Item, got: %s", rollbackScript)
+	}
+	// No same-Subject snapshots → no Import-PfxCertificate during rollback.
+	if strings.Contains(rollbackScript, "Import-PfxCertificate") {
+		t.Errorf("expected no Import-PfxCertificate when snapshot has no same-Subject entries, got: %s", rollbackScript)
+	}
+}
+
+func TestWinCertStore_FriendlyNameFails_NewCertRemoved_OldCertsRestored(t *testing.T) {
+	certPEM, keyPEM, err := generateTestCertAndKey()
+	if err != nil {
+		t.Fatalf("generate cert: %v", err)
+	}
+
+	// The connector cannot inspect WHICH step inside the import script
+	// failed — Execute returns a single (output, error). Operators see
+	// the FriendlyName failure surfaced via the error output. The
+	// rollback runs the same way regardless of which post-Import step
+	// failed (FriendlyName, Get-ChildItem verify, RemoveExpired).
+	mock := &mockExecutor{
+		responses: []string{
+			"SNAPSHOT:OLDTHUMB456:/tmp/certctl-snapshot-fn/OLDTHUMB456.pfx\nTEMPDIR:/tmp/certctl-snapshot-fn",
+			"Cannot set FriendlyName: invalid character in friendly name",
+			"ROLLBACK_OK",
+			"VERIFY_OK",
+		},
+		errors: []error{nil, fmt.Errorf("Set-ItemProperty failed"), nil, nil},
+	}
+	c := NewWithExecutor(&Config{
+		StoreName:     "My",
+		StoreLocation: "LocalMachine",
+		FriendlyName:  "Production",
+	}, testLogger(), mock)
+
+	result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
+		CertPEM: certPEM,
+		KeyPEM:  keyPEM,
+	})
+	if err == nil {
+		t.Fatal("expected error when FriendlyName step fails")
+	}
+	if result.Success {
+		t.Fatal("expected failure result")
+	}
+	// Operator visibility: the import_error in metadata should preserve
+	// the PowerShell output so operators can tell what went wrong.
+	if !strings.Contains(result.Metadata["import_error"], "FriendlyName") {
+		t.Errorf("expected import_error to reference FriendlyName, got: %s", result.Metadata["import_error"])
+	}
+
+	var rollbackScript string
+	for _, s := range mock.scripts {
+		if strings.Contains(s, "# CERTCTL_ROLLBACK") {
+			rollbackScript = s
+			break
+		}
+	}
+	if rollbackScript == "" {
+		t.Fatal("expected rollback script to be executed")
+	}
+	if !strings.Contains(rollbackScript, "Remove-Item") {
+		t.Errorf("expected rollback to Remove-Item the new cert, got: %s", rollbackScript)
+	}
+	if !strings.Contains(rollbackScript, "OLDTHUMB456.pfx") {
+		t.Errorf("expected rollback to restore snapshotted cert, got: %s", rollbackScript)
+	}
+
+	if result.Metadata["rolled_back"] != "true" {
+		t.Errorf("expected rolled_back=true, got: %s", result.Metadata["rolled_back"])
+	}
+}
+
+func TestWinCertStore_ImportFails_RollbackAlsoFails_OperatorActionable(t *testing.T) {
+	certPEM, keyPEM, err := generateTestCertAndKey()
+	if err != nil {
+		t.Fatalf("generate cert: %v", err)
+	}
+
+	// Snapshot succeeds; import fails; rollback ALSO fails. Operator-
+	// actionable case: both errors must be surfaced + metadata flags
+	// manual_action_required.
+	mock := &mockExecutor{
+		responses: []string{
+			"SNAPSHOT:OLDTHUMB789:/tmp/certctl-snapshot-rbf/OLDTHUMB789.pfx\nTEMPDIR:/tmp/certctl-snapshot-rbf",
+			"Import error",
+			"Rollback step failed",
+		},
+		errors: []error{
+			nil,
+			fmt.Errorf("import-step exit code 1"),
+			fmt.Errorf("rollback-step exit code 2"),
+		},
+	}
+	c := NewWithExecutor(&Config{
+		StoreName:     "My",
+		StoreLocation: "LocalMachine",
+	}, testLogger(), mock)
+
+	result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{
+		CertPEM: certPEM,
+		KeyPEM:  keyPEM,
+	})
+	if err == nil {
+		t.Fatal("expected error when both import and rollback fail")
+	}
+	if result.Success {
+		t.Fatal("expected failure result")
+	}
+
+	// Wrapped error must mention BOTH errors.
+	if !strings.Contains(err.Error(), "PowerShell import failed") {
+		t.Errorf("expected error to mention import failure, got: %v", err)
+	}
+	if !strings.Contains(err.Error(), "rollback also failed") {
+		t.Errorf("expected error to mention rollback failure, got: %v", err)
+	}
+	if !strings.Contains(err.Error(), "manual operator inspection required") {
+		t.Errorf("expected error to flag manual inspection, got: %v", err)
+	}
+
+	// Metadata flags manual action + surfaces both errors.
+	if result.Metadata["manual_action_required"] != "true" {
+		t.Errorf("expected manual_action_required=true, got: %s", result.Metadata["manual_action_required"])
+	}
+	if result.Metadata["rolled_back"] != "false" {
+		t.Errorf("expected rolled_back=false, got: %s", result.Metadata["rolled_back"])
+	}
+	if result.Metadata["rollback_error"] == "" {
+		t.Error("expected rollback_error in metadata")
+	}
+	if result.Metadata["import_error"] == "" {
+		t.Error("expected import_error in metadata")
 	}
 }